HETEROGENEOUS COMPUTING

Heterogeneous Computing: Inventory

TYPE 4 NIC

We classify existing NICs into three types. A Type 1 NIC is the classic NIC that processes the standard MAC protocol. A Type 2 NIC has the networking stack (e.g., IP, OVS) offloaded to it from the kernel. A Type 3 NIC accommodates other specific functions that originally ran on the CPU, e.g., a key-value store (KVS). Type 2 and Type 3 NICs are called SmartNICs. They enable customized functions beyond L2, but hardware development is largely decoupled from the applications, which leads to a long time to market. We propose a Type 4 NIC to change the programming model. Instead of offloading fixed functions into the NIC, we abstract and implement programmable engines in hardware. New functions can then be implemented purely in software through a carefully defined host interface between the NIC and the host CPU, and/or an RPC interface between the NIC and remote servers. A programmable engine is not as general as a CPU; it is a domain-specific engine on an FPGA that provides programmable interfaces for solving problems in one domain, such as an L4 engine, a monitoring engine, or an AI engine. Users choose existing programmable engines or customize their own as the NIC hardware instantiation, and can then focus mainly on software that uses the interfaces of the programmable engines.
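
The programming model described above can be illustrated with a minimal software sketch. It is not the actual host interface: the names `ProgrammableEngine`, `Type4Nic`, `program`, and `Packet` are hypothetical, and the engine is modeled as a plain Python object rather than FPGA logic. The point is only that the engine is instantiated once, and a new function (here, a per-port byte counter on a monitoring engine) is added purely in software.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Packet:
    """Hypothetical, heavily simplified packet descriptor."""
    dport: int
    length: int


class ProgrammableEngine:
    """Software stand-in for a domain-specific engine instantiated on the NIC."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self._functions: Dict[str, Callable[[Packet], None]] = {}

    def program(self, name: str, fn: Callable[[Packet], None]) -> None:
        # Assumed host-interface call: install or replace a function
        # without changing the hardware instantiation.
        self._functions[name] = fn

    def process(self, pkt: Packet) -> None:
        for fn in self._functions.values():
            fn(pkt)


class Type4Nic:
    """Hypothetical NIC model holding the engines chosen by the user."""

    def __init__(self) -> None:
        self.engines: Dict[str, ProgrammableEngine] = {}

    def instantiate(self, engine: ProgrammableEngine) -> ProgrammableEngine:
        self.engines[engine.domain] = engine
        return engine

    def receive(self, pkt: Packet) -> None:
        for engine in self.engines.values():
            engine.process(pkt)


# Usage: pick a monitoring engine, then define a new function in software only.
nic = Type4Nic()
monitor = nic.instantiate(ProgrammableEngine("monitoring"))

bytes_per_port: Dict[int, int] = {}


def count_bytes(pkt: Packet) -> None:
    bytes_per_port[pkt.dport] = bytes_per_port.get(pkt.dport, 0) + pkt.length


monitor.program("bytes_per_port", count_bytes)

for pkt in (Packet(80, 100), Packet(443, 50), Packet(80, 200)):
    nic.receive(pkt)

print(bytes_per_port)  # {80: 300, 443: 50}
```

In this sketch, changing what the NIC monitors means calling `program` again with a different handler; the engine object itself, which stands in for the hardware, is left untouched.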

ADAPTIVE SWITCH: SWITCH FABRIC + FPGA

The paradigm of offloading computing to SmartNICs has been successfully deployed in data centers. We take a further step by pushing computing and programmability from the network edge (SmartNIC) to the network core, proposing a so-called "adaptive switch" architecture that aims to facilitate specialized computing acceleration and proprietary network protocols at line rate. The adaptive switch is related to, but not the same as, a P4 switch based on the Protocol Independent Switch Architecture (PISA): it addresses the limitations of PISA in event-driven management, stateful processing, and particular computations. The proposed adaptive switch architecture combines the advantages of a switching ASIC and an FPGA: the high throughput of the switching ASIC and the full programmability of the FPGA.

HARDWARE AND SOFTWARE ACCELERATION: PUBLIC BLOCKCHAIN

A blockchain is essentially a distributed ledger of transactions that is maintained by all the participating nodes of the P2P blockchain network. In public blockchains such as Bitcoin and Ethereum, anyone can join the network, and the consensus mechanism is based on proof-of-work (PoW) algorithms, which are computationally intensive. Previously, most efforts focused on accelerating PoW, while optimizations of the network transmission of ledger and control data were largely ignored. We propose a hardware/software co-designed architecture for the mining proxy to accelerate the broadcast of periodic jobs. We customize the Stratum protocol with a layer 2 broadcast mechanism instead of using TCP/IP connections. The proposed architecture is implemented on a Xilinx Zynq SoC board (ONetSwitch45), where the layer 2 broadcast mechanism is offloaded to the FPGA. Our experiments demonstrated a speedup of 2079× in transmission time with 225 miners connected to the proxy, compared to an implementation on an Intel i7 server.

Guanwen Zhong, Haris Javaid, Hassaan Saadat, Lingchao Xu, Chengchen Hu and Gordon Brebner, "FastProxy: Hardware and Software Acceleration of Stratum Mining Proxy," 2019 Crypto Valley Conference on Blockchain Technology (CVCBT), Rotkreuz, Switzerland, 2019, pp. 73-76.
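
The layer 2 broadcast idea above can be sketched in software as follows. This is an illustration under stated assumptions, not the FastProxy frame format: the EtherType `0x88B5` (an IEEE 802 local experimental EtherType), the 2-byte length prefix, the JSON encoding, and the placeholder `mining.notify` parameters are all choices made for this example. The sketch builds one Ethernet frame addressed to the broadcast MAC, so a single transmission reaches every miner on the segment, whereas a TCP-based proxy sends one copy per connected miner.

```python
import json
import struct

BROADCAST_MAC = bytes.fromhex("ffffffffffff")
# Assumption: local experimental EtherType; the one used by the real design is not stated.
ETHERTYPE_JOB = 0x88B5
MIN_FRAME_LEN = 60  # minimum Ethernet frame size without the FCS


def build_job_frame(src_mac: bytes, job: dict) -> bytes:
    """Wrap a Stratum-style job in a single layer 2 broadcast frame."""
    payload = json.dumps(job, separators=(",", ":")).encode("utf-8")
    header = struct.pack("!6s6sH", BROADCAST_MAC, src_mac, ETHERTYPE_JOB)
    # The length prefix lets the receiver strip the zero padding added below.
    frame = header + struct.pack("!H", len(payload)) + payload
    return frame.ljust(MIN_FRAME_LEN, b"\x00")


def parse_job_frame(frame: bytes) -> dict:
    """Recover the job on the miner side; raises if the frame is not a job frame."""
    dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if dst != BROADCAST_MAC or ethertype != ETHERTYPE_JOB:
        raise ValueError("not a broadcast job frame")
    (length,) = struct.unpack("!H", frame[14:16])
    return json.loads(frame[16:16 + length].decode("utf-8"))


# Usage with placeholder values (a real mining.notify carries more fields).
proxy_mac = bytes.fromhex("020000000001")
job = {"id": None, "method": "mining.notify", "params": ["job-1", True]}
frame = build_job_frame(proxy_mac, job)
recovered = parse_job_frame(frame)
```

Actually sending such a frame would require a raw socket and suitable privileges, which this sketch leaves out.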

HARDWARE AND SOFTWARE ACCELERATION: PRIVATE BLOCKCHAIN

In private or permissioned blockchains, the identity of the nodes is known and authenticated cryptographically. The consensus mechanism is delegated to a few selected nodes in order to reduce bottlenecks in the consensus. We focus on performance improvements for Hyperledger Fabric, one of the most popular platforms, as it is open source and has already been used to implement many enterprise applications. We re-architect the Fabric processing phase based on a fine-grained breakdown of the processing latency. Our optimizations include adding a chaincode cache, running database operations in parallel with validation, and hardware acceleration of ECDSA, database, and consensus operations.

Haris Javaid, Chengchen Hu and Gordon J. Brebner, "Optimizing Validation Phase of Hyperledger Fabric," MASCOTS 2019, pp. 269-275.
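
Two of the software optimizations named above, the chaincode cache and database operations running in parallel with validation, can be sketched as follows. This is a simplified illustration, not the re-architected Fabric code: every function and field name is hypothetical, the "database" is a dictionary, the endorsement policy is reduced to a required count, and conflicts between transactions in the same block are ignored.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# All names below are hypothetical stand-ins, not Hyperledger Fabric APIs.
LOAD_COUNT = {"n": 0}  # counts how often the expensive lookup really runs


@lru_cache(maxsize=128)
def load_chaincode_policy(chaincode_id: str) -> int:
    """Stand-in for an expensive chaincode lookup; cached after the first call."""
    LOAD_COUNT["n"] += 1
    return 2  # assumed policy: two endorsements required


def check_endorsements(tx: dict) -> bool:
    return len(tx["endorsements"]) >= load_chaincode_policy(tx["chaincode"])


def read_versions(state_db: dict, tx: dict) -> dict:
    """Stand-in for state database reads of the keys in the read set."""
    return {key: state_db.get(key, 0) for key in tx["read_set"]}


def validate_block(txs: list, state_db: dict) -> list:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Database reads are submitted first so they overlap with endorsement checks.
        reads = [pool.submit(read_versions, state_db, tx) for tx in txs]
        endorsed = [check_endorsements(tx) for tx in txs]
    results = []
    for tx, ok, fut in zip(txs, endorsed, reads):
        current = fut.result()
        # A transaction is fresh if every version it read is still current.
        fresh = all(current[k] == v for k, v in tx["read_set"].items())
        results.append(ok and fresh)
    return results


state_db = {"a": 1, "b": 3}
txs = [
    {"chaincode": "asset", "endorsements": ["org1", "org2"], "read_set": {"a": 1}},
    {"chaincode": "asset", "endorsements": ["org1"], "read_set": {"b": 3}},
    {"chaincode": "asset", "endorsements": ["org1", "org2"], "read_set": {"b": 2}},
]
results = validate_block(txs, state_db)
print(results)  # [True, False, False]
```

The second transaction fails for lack of endorsements and the third because it read a stale version of `b`. The policy for `asset` is loaded once and then served from the cache for the remaining transactions.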