Cyberbotics (CYB) has published a series of guidelines on how to access the JUMAX machine, use the DFE, install the robot simulation on the JUMAX machine, and develop multi-layer perceptrons and convolutional neural networks for autonomous car simulations. The complete guidelines are available online on the public wiki pages of the CYB OPTIMA project at https://github.com/cyberbotics/optima/wiki. We report here only the page entitled Basics of DFE Applications.
The Juelich Supercomputing Centre (JSC) is a partner of the OPTIMA project. The center owns multiple supercomputers for fast computation. Some of them provide a large number of CPU cores, others provide GPUs, and one of them, Jumax, provides FPGA cards. Jumax stands for Juelich-Maxeler. Maxeler is the partner that provides the FPGA hardware and the corresponding development environment.
DataFlow Engines (DFEs)
DataFlow Engines (DFEs) are processing systems developed by Maxeler that accelerate the execution of CPU programs. They are based on FPGAs (Field Programmable Gate Arrays) and enable dataflow computing. In contrast to CPU control flow computing, where the instructions of a program are executed more or less sequentially, dataflow computing allows operations on large data sets to be parallelized. DFEs are composed of thousands of dataflow cores that can be reprogrammed for each application, so that the hardware can be tailored to the application as closely as possible. Each DFE core is very simple and executes a very simple operation, but programming an application in space, rather than as a sequence of instructions, is considerably more complex. Fortunately, Maxeler provides a set of tools that make it much easier.
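The contrast between control flow and dataflow can be sketched in plain Python. This is only an analogy and not Maxeler code: the function names (`control_flow`, `scale`, `offset`, `dataflow`) are hypothetical, and Python generators still run sequentially on a CPU, whereas on a DFE each stage would be its own piece of hardware working on a different data item at the same time.

```python
from typing import Iterable, Iterator, List


def control_flow(data: List[float]) -> List[float]:
    """Control-flow style: one loop, instructions executed one after another."""
    result = []
    for x in data:
        y = x * 2.0   # step 1
        z = y + 1.0   # step 2
        result.append(z)
    return result


def scale(stream: Iterable[float], factor: float) -> Iterator[float]:
    """Dataflow stage: multiplies every item that streams through it."""
    for x in stream:
        yield x * factor


def offset(stream: Iterable[float], amount: float) -> Iterator[float]:
    """Dataflow stage: adds a constant to every item that streams through it."""
    for x in stream:
        yield x + amount


def dataflow(data: Iterable[float]) -> Iterator[float]:
    """Dataflow style: a fixed chain of stages; the data streams through it."""
    return offset(scale(data, 2.0), 1.0)


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0]
    print(control_flow(samples))
    print(list(dataflow(samples)))
```

Both versions compute the same values. The difference is in how the computation is described: as a sequence of steps in the first case, and as a fixed structure that the data flows through in the second.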
As the description above suggests, DFEs are essentially FPGAs with additional connections and memory added by Maxeler. In addition to the on-chip memory blocks, DFEs provide a large amount of external memory, which is slower to access. Fast connections, named MaxRing, allow communication between engines. On Jumax, DFEs are connected to the CPU with InfiniBand, a computer networking communications standard used in high-performance computing.
Jumax is equipped with an MPC-X card, which has 8 DFEs of the MAX5 generation, each containing a Virtex UltraScale VU9P FPGA from Xilinx. This chip contains 2,586,000 logic cells, 6,840 Digital Signal Processors for multiplications and 340Mb of memory blocks. The slower external memory has a capacity of 48Gb. For a more detailed configuration of Jumax, see this page: JuMax DFEs (MAX5 gen).
In general, FPGAs are programmed using an HDL (Hardware Description Language). However, an HDL is not a programming language in the usual sense and may not be intuitive to everybody, so Maxeler developed MaxCompiler to ease the process.
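As a quick worked example, the per-chip figures quoted above can be added up over the 8 DFEs of the card. The snippet below is plain arithmetic on those numbers. Whether a given application can actually use several DFEs at once depends on how it is written, so the totals are an upper bound for orientation only.

```python
# Per-DFE figures quoted in the text for the Virtex UltraScale VU9P.
DFES_PER_CARD = 8
LOGIC_CELLS_PER_DFE = 2_586_000
DSPS_PER_DFE = 6_840

# Simple totals over the whole MPC-X card.
total_logic_cells = DFES_PER_CARD * LOGIC_CELLS_PER_DFE
total_dsps = DFES_PER_CARD * DSPS_PER_DFE

print(f"{total_logic_cells:,} logic cells")  # 20,688,000 logic cells
print(f"{total_dsps:,} DSPs")                # 54,720 DSPs
```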
Thanks to MaxCompiler, DFEs can be programmed using .maxj files (written in a Java-based language). MaxCompiler compiles this dataflow program into a Java executable which produces the corresponding HDL files, depending on the vendor of the FPGA in the DFE. The synthesis step produces a set of logic operations from the HDL files. The placement step chooses the logic blocks on the chip that implement this set of logic operations. Finally, the routing step defines the wires that interconnect the blocks. One of the main tasks of the compiler is to find a placement and routing that satisfy the clock frequency. If the frequency is too high or the design too complex, the compiler can return a timing failure error. For a more illustrated view of the process, see: From Graphs to Hardware and Substrate Agnostic Compilation.
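The timing constraint mentioned above can be illustrated with a toy model. It assumes, as a simplification, that a design meets timing when the longest signal path between two registers is no longer than one clock period. The function name `meets_timing` and the path delays are hypothetical; a real tool derives the delays from the placed and routed design.

```python
from typing import Sequence


def meets_timing(clock_mhz: float, path_delays_ns: Sequence[float]) -> bool:
    """Return True if the slowest path fits within one clock period."""
    period_ns = 1000.0 / clock_mhz          # e.g. 200 MHz -> 5 ns
    critical_path_ns = max(path_delays_ns)  # the slowest path limits the clock
    return critical_path_ns <= period_ns


if __name__ == "__main__":
    delays = [2.1, 3.8, 4.6]  # made-up register-to-register delays in ns
    for freq in (100.0, 200.0, 250.0):
        status = "ok" if meets_timing(freq, delays) else "timing failure"
        print(f"{freq:.0f} MHz: {status}")
```

In this model, raising the clock frequency shortens the period, and a more complex design tends to lengthen the slowest path. Either change can push the design past the point where timing is met.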
MaxCompiler generates a .max file which contains the configuration of the DFEs for the given application. At runtime, the CPU program loads it onto the DFEs using MaxelerOS. MaxelerOS runs within Linux and on the DFEs, and is responsible for linking the CPU and the DFEs by loading, executing and unloading compiled .max files.
Finally, CPU programs can call functions that are defined in .max files and run on DFEs. To do that, the .max files must be linked to the CPU application using the standard GCC linker. The dataflow implementations can then be called using the SLiC (Simple Live CPU) interface.
A summary illustration of the whole process is available here: MaxCompiler Architecture.
It is also possible to call DFE functions from programs written in a language other than C. Sliccompile is a SLiC tool bundled with MaxCompiler that makes it possible to execute DFE functions from any supported language by generating SLiC Skins. These consist of all the class, function or script files needed to call the kernel functions from Matlab, Python, and other CPU applications. A _simutils_ folder is created to make the link to the related .max files.
A more detailed overview of the DFE architecture can be found in this presentation: Using, Understanding and Programming Data Flow Engines.
Programming a DFE application
So how are DFE applications programmed? In fact, it is not that hard: they are made of kernels (.maxj files), a manager (.maxj file) and CPU application code (in C, for example).
The kernels define functions that will run on DFEs. Kernel code is very close to Java but uses DFE-specific variables. In kernels, basic arithmetic functions can be used, as well as loops or conditionals. For an example of a kernel, head to this page: A Basic Kernel.
The Manager configures the kernels, connects them together and links the streams to and from the CPU. For an example of a manager, head to this page: Configuring a Manager.
The CPU program can directly call a dataflow implementation, provided that the corresponding .max files are linked. For an example of a CPU application in C, go to this page: Simple CPU application.
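The division of labour between the three parts can be mimicked in plain Python. This is a mock for orientation only: it uses none of the MaxCompiler or SLiC APIs, and every name in it (`moving_average_kernel`, `Manager`, `add_kernel`, `run`) is hypothetical. Real kernels and managers are written in .maxj files, and the CPU side is written in C, for example.

```python
from typing import Callable, Iterable, Iterator, List

Kernel = Callable[[Iterable[float]], Iterator[float]]


def moving_average_kernel(stream: Iterable[float]) -> Iterator[float]:
    """'Kernel': the computation applied to a stream (3-point average here)."""
    window: List[float] = []
    for x in stream:
        window.append(x)
        if len(window) > 3:
            window.pop(0)          # keep only the last three items
        if len(window) == 3:
            yield sum(window) / 3.0


class Manager:
    """'Manager': registers kernels and wires the CPU stream through them."""

    def __init__(self) -> None:
        self._kernels: List[Kernel] = []

    def add_kernel(self, kernel: Kernel) -> None:
        self._kernels.append(kernel)

    def run(self, cpu_input: Iterable[float]) -> List[float]:
        stream: Iterable[float] = cpu_input
        for kernel in self._kernels:  # connect the kernels in order
            stream = kernel(stream)
        return list(stream)           # stream the result back to the 'CPU'


if __name__ == "__main__":
    # 'CPU application': prepares data, calls the dataflow part, uses the result.
    manager = Manager()
    manager.add_kernel(moving_average_kernel)
    print(manager.run([1.0, 2.0, 3.0, 4.0, 5.0]))  # [2.0, 3.0, 4.0]
```

The point of the mock is the separation of roles: the kernel only describes a computation on a stream, the manager only describes how streams are connected, and the CPU code only supplies data and consumes results.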
A more detailed overview of the programming process can be found in this MaxCompiler tutorial.
Getting started on Jumax
To start creating DFE applications, you can launch MaxIDE on Jumax by following the instructions on this page: Start MaxIDE.