OPTIMA

Optimizing Industrial Applications for Heterogeneous HPC systems

Programming Environment

The development of parallel systems on heterogeneous clusters, especially FPGA-populated ones, is still an active area of research. Emerging FPGA-based HPC clusters often lack support in parallel programming models.

Thus, in the scope of the OPTIMA project, the programming environment has been carefully selected to use industry-proven software capable of supporting efficient programming of FPGAs and intercommunication between them. As a result, within OPTIMA we will utilize and further extend CARME, FPGA-ML, PARALLELWARE, GPI/GASPI, and MAXJ. All of these, except MAXJ, have been developed within several EC-funded projects, and each has unique characteristics, as described below.

  • CARME is a multi-user software stack for deep learning on HPC clusters. At its current stage it mainly supports GPU clusters and provides customized containers for machine learning (e.g. for TensorFlow, PyTorch, scikit-learn). Currently CARME is mainly used by the deep learning group at Fraunhofer ITWM and in the scope of external training courses on our internal GPU cluster.
  • FPGA-ML is being developed and tested based mainly on the requirements of two HPC applications: an oil reservoir modelling application and a city traffic analysis application. Based on the overall requirements analysis of ECOSCALE and the evaluation of the final prototype, certain parts of the programming environment and the runtime (e.g. the support for deep learning schemes and the MPI-based intercommunication mechanism) have room for improvement.
  • PARALLELWARE ANALYZER is the first static code analyzer specializing in performance. The Parallelware Artificial Intelligence (AI) engine leverages the expertise of senior performance optimization engineers who have been doing this work manually for decades. It provides actionable insights through performance optimization reports that help ensure best practices for speeding up code through parallel computing on modern heterogeneous multicore chips. At the current stage, it mainly supports the C, C++ and Fortran programming languages; the OpenMP and OpenACC parallel programming APIs; and vectorization, multicore CPU and GPU parallel hardware platforms.
  • GASPI/GPI-2 is industry-proven code. In the scope of the ExaNoDe and EuroExa projects, an Ethernet version providing host-triggered FPGA communication has been deployed. This version is based on TCP sockets. The communication works with the help of GASPI segments: the segments are allocated and registered in memory regions to which the DMA controller has access, and the data offload to the FPGA is handled by the DMA controller. In the scope of the EPiGRAM-HS project we are investigating methods to offload with OpenCL directives.
  • MaxJ is an industry-proven solution that has been used commercially, both internally by Maxeler and by its customers. Furthermore, it is used by a large community of academic users. MaxJ has been successfully used to accelerate applications from various domains including seismic modelling, finance, machine learning, genomics, and scientific applications with complex code bases. https://www.maxeler.com/solutions/universities/
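The segment mechanism described for GASPI/GPI-2 in the list above can be pictured with a small toy model. The sketch below is not the GPI-2 API: the names `Segment`, `DmaRegistry`, `register` and `write` are hypothetical, and the "DMA transfer" is simulated by a plain byte copy. It only illustrates the idea that data can be moved solely between segments that were registered beforehand.

```python
class Segment:
    """A contiguous memory region, identified by a segment id (hypothetical model)."""

    def __init__(self, seg_id: int, size: int) -> None:
        self.seg_id = seg_id
        self.buf = bytearray(size)


class DmaRegistry:
    """Toy stand-in for a DMA controller that only sees registered segments."""

    def __init__(self) -> None:
        self._segments = {}

    def register(self, seg: Segment) -> None:
        # After registration the simulated DMA controller may access the segment.
        self._segments[seg.seg_id] = seg

    def write(self, src_id: int, src_off: int,
              dst_id: int, dst_off: int, size: int) -> None:
        # Unregistered segments raise KeyError: the controller cannot reach them.
        src = self._segments[src_id]
        dst = self._segments[dst_id]
        if min(src_off, dst_off, size) < 0:
            raise ValueError("offsets and size must be non-negative")
        if src_off + size > len(src.buf) or dst_off + size > len(dst.buf):
            raise ValueError("transfer exceeds segment bounds")
        # Simulated one-sided transfer: copy bytes from source to destination.
        dst.buf[dst_off:dst_off + size] = src.buf[src_off:src_off + size]


# Usage: the host fills its segment and triggers the transfer to the "FPGA" segment.
registry = DmaRegistry()
host_seg, fpga_seg = Segment(0, 8), Segment(1, 8)
registry.register(host_seg)
registry.register(fpga_seg)
host_seg.buf[0:4] = b"data"
registry.write(src_id=0, src_off=0, dst_id=1, dst_off=4, size=4)
```

In this model the host triggers the transfer and the receiving side takes no action, which mirrors the host-triggered communication described above.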

An application in MaxJ is a combination of one or multiple kernels and a manager. The kernels express computation, and the manager describes the connection between the kernels and external interfaces such as PCIe, DRAM or networking. The concept behind MaxJ is to provide the right level of abstraction for productive development and high performance. MaxJ hides the tedious and time-consuming aspects found in conventional hardware description languages while retaining enough control over the key aspects of the application that the developer can reason about the computation and I/O and push the application towards maximum performance. For example, when developing in MaxJ one is not concerned with low-level details and optimisation of the dataflow graph, such as the scheduling of operations, but the developer does control pipelining, parallelism, and the balance of computation and communication. MaxJ designs are fully predictable in terms of performance, and developers typically analyse, plan and optimise a design before implementing it. It has been shown that this model can result in performance equivalent to a hand-optimised low-level VHDL implementation while requiring less development time than a high-level synthesis approach.
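The split between kernels and a manager can be illustrated with a small conceptual sketch. The code below is plain Python, not MaxJ, and does not use any MaxCompiler API: `scale_kernel`, `moving_sum_kernel`, `Manager` and `build_design` are hypothetical names chosen for illustration. Kernels are modelled as functions over data streams, and the manager only wires them together between a host input and a host output.

```python
from typing import Callable, Iterable, Iterator, List

# In this toy model a kernel maps an input stream to an output stream.
Kernel = Callable[[Iterator[float]], Iterator[float]]


def scale_kernel(stream: Iterator[float]) -> Iterator[float]:
    """Computation only: multiply every stream element by two."""
    for x in stream:
        yield 2.0 * x


def moving_sum_kernel(stream: Iterator[float]) -> Iterator[float]:
    """Computation only: add each element to its predecessor (0.0 at the start)."""
    prev = 0.0
    for x in stream:
        yield x + prev
        prev = x


class Manager:
    """Connectivity only: chains kernels between a host input and a host output."""

    def __init__(self) -> None:
        self._kernels: List[Kernel] = []

    def add_kernel(self, kernel: Kernel) -> "Manager":
        self._kernels.append(kernel)
        return self

    def run(self, host_input: Iterable[float]) -> List[float]:
        stream = iter(host_input)
        for kernel in self._kernels:
            stream = kernel(stream)  # the output of one kernel feeds the next
        return list(stream)


def build_design() -> Manager:
    return Manager().add_kernel(scale_kernel).add_kernel(moving_sum_kernel)


result = build_design().run([1.0, 2.0, 3.0])
```

The kernels contain no knowledge of where their data comes from, so the same kernel could be rewired to a different interface by changing only the manager, which is the separation of concerns the paragraph above describes.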

The MaxCompiler tools for MaxJ provide extensive functionality for application analysis, optimisation, simulation and debugging. One of the key challenges in developing FPGA accelerators is the extremely long place-and-route time of the FPGA vendor backend, which can exceed 24 hours for large FPGA devices. To avoid unnecessary place-and-route runs, MaxCompiler provides tools to simulate and debug MaxJ code without placing and routing it. Furthermore, the tools assist with powerful but complex optimisation procedures such as the use of custom number formats.
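To make the idea of custom number formats more concrete, the following sketch shows generic signed fixed-point quantisation in Python. It does not reproduce MaxCompiler's number-format tooling; the helper names `to_fixed` and `max_abs_error`, and the convention that `int_bits` includes the sign bit, are assumptions made for this illustration. The point is that the error introduced by a narrower format can be estimated in software before any hardware build.

```python
from typing import Iterable


def to_fixed(value: float, int_bits: int, frac_bits: int) -> float:
    """Round value to the nearest signed fixed-point number, saturating on overflow.

    int_bits counts the integer bits including the sign bit (an assumption of
    this sketch); frac_bits counts the bits after the binary point.
    """
    scale = 1 << frac_bits
    total_bits = int_bits + frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    raw = round(value * scale)              # nearest representable integer code
    raw = max(lowest, min(highest, raw))    # saturate instead of wrapping around
    return raw / scale


def max_abs_error(values: Iterable[float], int_bits: int, frac_bits: int) -> float:
    """Largest quantisation error over a set of sample values."""
    return max(abs(v - to_fixed(v, int_bits, frac_bits)) for v in values)


samples = [0.1, 0.2, 0.3]
error_4_frac_bits = max_abs_error(samples, int_bits=2, frac_bits=4)
error_8_frac_bits = max_abs_error(samples, int_bits=2, frac_bits=8)
```

For values inside the representable range, the error is bounded by half of the smallest step, 1 / 2^(frac_bits + 1), so each additional fractional bit halves the worst-case error at the cost of wider arithmetic.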

Another key benefit of MaxJ is that it is completely FPGA vendor and device agnostic. MaxJ code is not bound to any particular FPGA vendor tool chain or FPGA device architecture and therefore provides a simple route for moving applications between accelerators that use different FPGA devices. For example, older-generation MAX4 DFEs are based on Intel Stratix-V devices, and the MaxJ code developed for these devices can be recompiled for the new MAX5 DFEs, which use Xilinx UltraScale+ FPGAs. MaxCompiler also supports further Xilinx UltraScale+ based third-party platforms from Amazon Web Services and Xilinx. This is a significant benefit, since a common obstacle to the wider adoption of FPGA accelerators is the assumption that code development and optimisation are locked to one particular target device and that portability to other or newer devices is limited or even impossible. This can be avoided with device-agnostic code and tools.

Progress beyond the State-of-the-Art

CARME will be tested for its full support of FPGAs. Within OPTIMA we will also develop a library with FPGA kernels for deep learning and mathematical operations that can be run with CARME, thus extending CARME beyond the scope of a pure machine learning software stack. We propose that the frontend of this new library be written in Python to provide an easy-to-use interface that is widely accepted in the machine learning community. FPGA-ML will be improved by being combined with the other components of the OPTIMA programming environment. In particular, by replacing the CPU-demanding MPI intercommunication in FPGA-ML with the much lighter GASPI/GPI-2, we expect to increase the performance of the applications running on top of it. Moreover, CARME will allow FPGA-ML to properly support machine and deep learning applications, while PARALLELWARE will provide the means for better task distribution and debugging. PARALLELWARE will be extended to fully support FPGAs, in addition to the existing support for multicore CPUs and GPUs.
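As a rough illustration of what a Python frontend for such a kernel library might look like, the sketch below registers implementations per backend and falls back to a CPU version when no FPGA kernel is available. Everything here is hypothetical: the names `register`, `call` and the `"dot"` kernel are invented for this example and do not describe the actual OPTIMA library or CARME interfaces.

```python
from typing import Callable, Dict, Sequence

# Hypothetical kernel tables, one per backend.
_CPU_KERNELS: Dict[str, Callable] = {}
_FPGA_KERNELS: Dict[str, Callable] = {}


def register(name: str, backend: str) -> Callable[[Callable], Callable]:
    """Decorator that files an implementation under the given backend."""
    if backend not in ("cpu", "fpga"):
        raise ValueError(f"unknown backend: {backend}")

    def decorator(fn: Callable) -> Callable:
        table = _FPGA_KERNELS if backend == "fpga" else _CPU_KERNELS
        table[name] = fn
        return fn

    return decorator


@register("dot", backend="cpu")
def _dot_cpu(a: Sequence[float], b: Sequence[float]) -> float:
    """Reference CPU implementation of a dot product."""
    return sum(x * y for x, y in zip(a, b))


def call(name: str, *args, prefer_fpga: bool = True):
    """Run the FPGA kernel if one is registered, otherwise the CPU fallback."""
    if prefer_fpga and name in _FPGA_KERNELS:
        return _FPGA_KERNELS[name](*args)
    return _CPU_KERNELS[name](*args)


# No FPGA kernel is registered in this sketch, so the CPU fallback runs.
dot_result = call("dot", [1, 2, 3], [4, 5, 6])
```

A frontend of this shape would let users keep the same Python call regardless of whether an FPGA implementation exists, which is one way to provide the easy-to-use interface mentioned above.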

This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955739. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Greece, Germany, Italy, Netherlands, Spain, Switzerland.

Copyright © 2023 OPTIMA. All rights reserved.
