GPGPU
GPGPU stands for General-purpose computing on graphics processing units.
OpenCL
OpenCL (Open Computing Language) is an open, royalty-free parallel programming specification developed by the Khronos Group, a non-profit consortium.
The OpenCL specification describes a programming language, a general environment that is required to be present, and a C API to enable programmers to call into this environment.
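As a minimal sketch of what using that C API looks like (the file name platforms.cpp is arbitrary, and real code would need more error handling), the following program asks the library for the available OpenCL platforms and prints their names. It only needs the OpenCL headers and an ICD loader (both covered below) and can be built with, for example, g++ platforms.cpp -lOpenCL.

platforms.cpp
// Minimal use of the OpenCL C API: enumerate the available platforms.
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint count = 0;
    if (clGetPlatformIDs(0, nullptr, &count) != CL_SUCCESS || count == 0) {
        std::fprintf(stderr, "No OpenCL platforms found.\n");
        return 1;
    }
    std::vector<cl_platform_id> platforms(count);
    clGetPlatformIDs(count, platforms.data(), nullptr);
    for (cl_uint i = 0; i < count; ++i) {
        char name[256] = {0};
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, nullptr);
        std::printf("Platform %u: %s\n", i, name);
    }
    return 0;
}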
OpenCL Runtime
To execute programs that use OpenCL, a compatible hardware runtime needs to be installed.
AMD/ATI
- opencl-mesa: free runtime for AMDGPU and Radeon
- opencl-amdAUR: proprietary standalone runtime for AMDGPU and AMDGPU PRO
- rocm-opencl-runtimeAUR: Part of AMD's ROCm GPU compute stack, officially supporting GFX8 and later cards (Fiji, Polaris, Vega), with unofficial and partial support for Navi10 based cards.
- amdapp-sdkAUR: AMD CPU runtime
NVIDIA
- opencl-nvidia: official NVIDIA runtime
Intel
- intel-compute-runtime: a.k.a. the Neo OpenCL runtime, the open-source implementation for Intel HD Graphics GPU on Gen8 (Broadwell) and beyond.
- beignetAUR: the open-source implementation for Intel HD Graphics GPU on Gen7 (Ivy Bridge) and beyond; deprecated by Intel in favour of the NEO OpenCL driver, but it remains the recommended solution for legacy hardware platforms (e.g. Ivy Bridge, Haswell).
- intel-openclAUR: the proprietary implementation for Intel HD Graphics GPU on Gen7 (Ivy Bridge) and beyond; deprecated by Intel in favour of the NEO OpenCL driver, but it remains the recommended solution for legacy hardware platforms (e.g. Ivy Bridge, Haswell).
- intel-opencl-runtimeAUR: the implementation for Intel Core and Xeon processors. It also supports non-Intel CPUs.
Others
- pocl: LLVM-based OpenCL implementation (hardware independent)
OpenCL ICD loader (libOpenCL.so)
The OpenCL ICD loader is supposed to be a platform-agnostic library that provides the means to load device-specific drivers through the OpenCL API. Most OpenCL vendors provide their own implementation of an OpenCL ICD loader, and these should all work with the other vendors' OpenCL implementations. Unfortunately, most vendors do not provide completely up-to-date ICD loaders, and therefore Arch Linux has decided to provide this library from a separate project (ocl-icd) which currently provides a functioning implementation of the current OpenCL API.
The other ICD loader libraries are installed as part of each vendor's SDK. If you want to ensure that the ICD loader from the ocl-icd package is used, you can create a file in /etc/ld.so.conf.d which adds /usr/lib to the dynamic program loader's search directories:

/etc/ld.so.conf.d/00-usrlib.conf
/usr/lib

This is necessary because all the SDKs add their runtime's lib directories to the search path through ld.so.conf.d files.
The available packages containing various OpenCL ICDs are:
- ocl-icd: recommended, most up-to-date
- intel-openclAUR: by Intel, provides OpenCL 2.0; deprecated in favour of intel-compute-runtime.
OpenCL Development
For OpenCL development, the bare minimum additional packages required are the following (a small end-to-end example using only these follows the list):
- ocl-icd: OpenCL ICD loader implementation, up to date with the latest OpenCL specification.
- opencl-headers: OpenCL C/C++ API headers.
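As a sketch of what these two packages alone are enough for (the file name vecadd.cpp, the kernel and the array size are arbitrary, and most error checking is omitted for brevity), the following single-file host program builds a small kernel from source at run time, runs it on the first device it finds and reads the result back. It can be compiled with, for example, g++ vecadd.cpp -lOpenCL -o vecadd.

vecadd.cpp
// Minimal but complete OpenCL host program: compile a kernel at run time,
// run it on the first device of the first platform, read back the result.
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char *kernel_source = R"CLC(
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c) {
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}
)CLC";

int main() {
    const size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    // Pick the first platform and its default device.
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);

    cl_int err;
    cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue queue = clCreateCommandQueue(context, device, 0, &err);

    // Device buffers; the inputs are initialised from the host vectors.
    cl_mem d_a = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                n * sizeof(float), a.data(), &err);
    cl_mem d_b = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                n * sizeof(float), b.data(), &err);
    cl_mem d_c = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
                                n * sizeof(float), nullptr, &err);

    // Build the kernel for the selected device and set its arguments.
    cl_program program = clCreateProgramWithSource(context, 1, &kernel_source, nullptr, &err);
    clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(program, "vec_add", &err);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_a);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_b);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), &d_c);

    // Launch one work-item per element and read the result back (blocking).
    size_t global_size = n;
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global_size, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(queue, d_c, CL_TRUE, 0, n * sizeof(float), c.data(), 0, nullptr, nullptr);

    std::printf("c[0] = %.1f (expected 3.0)\n", c[0]);
    return 0;
}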
The vendors' SDKs provide a multitude of tools and support libraries:
- intel-opencl-sdkAUR: Intel OpenCL SDK (old version, new OpenCL SDKs are included in the INDE and Intel Media Server Studio)
- amdapp-sdkAUR: This package is installed as /opt/AMDAPP and, apart from SDK files, it also contains a number of code samples (/opt/AMDAPP/SDK/samples/). It also provides the clinfo utility, which lists OpenCL platforms and devices present in the system and displays detailed information about them. As the SDK itself contains a CPU OpenCL driver, no extra driver is needed to execute OpenCL on CPU devices (regardless of their vendor).
- cuda: NVIDIA's GPU SDK, which includes support for OpenCL 1.1.
Implementations
To see which OpenCL implementations are currently active on your system, use the following command:
$ ls /etc/OpenCL/vendors
To find out all possible (known) properties of the OpenCL platform and devices available on the system, install clinfo.
Language bindings
- JavaScript/HTML5: WebCL
- Python: python-pyopencl
- D: cl4d or DCompute
- Java: Aparapi or JOCL (a part of JogAmp)
- Mono/.NET: Open Toolkit
- Go: OpenCL bindings for Go
- Racket: Racket has a native interface on PLaneT that can be installed via raco.
- Rust: ocl
- Julia: OpenCL.jl
SYCL
SYCL is another open and royalty-free standard by the Khronos Group that defines a single-source heterogeneous programming model for C++ on top of OpenCL 1.2.
SYCL consists of a runtime part and a C++ device compiler. The device compiler may target any number and kind of accelerators. The runtime is required to fall back to a pure CPU code path in case no OpenCL implementation can be found.
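A minimal sketch of what "single-source" means in practice (SYCL 1.2.1 interface; the file name, kernel name and vector size are arbitrary): the kernel below is an ordinary C++ lambda in the same file as the host code, and the SYCL implementation compiles and dispatches it to a device chosen by the default selector.

sycl_vecadd.cpp
// Single-source SYCL (1.2.1 interface): host code and device kernel in one
// C++ file.
#include <CL/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    const size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    {
        cl::sycl::queue queue;  // default device selector
        cl::sycl::buffer<float, 1> buf_a(a.data(), cl::sycl::range<1>(n));
        cl::sycl::buffer<float, 1> buf_b(b.data(), cl::sycl::range<1>(n));
        cl::sycl::buffer<float, 1> buf_c(c.data(), cl::sycl::range<1>(n));

        queue.submit([&](cl::sycl::handler &cgh) {
            auto A = buf_a.get_access<cl::sycl::access::mode::read>(cgh);
            auto B = buf_b.get_access<cl::sycl::access::mode::read>(cgh);
            auto C = buf_c.get_access<cl::sycl::access::mode::write>(cgh);
            cgh.parallel_for<class vec_add>(cl::sycl::range<1>(n),
                [=](cl::sycl::id<1> i) { C[i] = A[i] + B[i]; });
        });
    }  // buffers go out of scope here; results are copied back to the host vectors

    std::cout << "c[0] = " << c[0] << " (expected 3)" << std::endl;
    return 0;
}

How this file is compiled differs between implementations (each ships its own device compiler or compiler wrapper), so consult the documentation of the one you installed.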
Implementations
- computecppAUR: Codeplay's proprietary implementation of SYCL 1.2.1. Can target SPIR, SPIR-V and, experimentally, PTX (NVIDIA) as device targets.
- trisycl-gitAUR: Open source implementation mainly driven by Xilinx.
- hipsycl-cuda-gitAUR and hipsycl-rocm-gitAUR: Free implementation built over AMD's HIP instead of OpenCL. It is able to run on AMD and NVIDIA GPUs.
Checking For SPIR Support
Most SYCL implementations are able to compile the accelerator code to SPIR or SPIR-V. Both are intermediate languages designed by Khronos that can be consumed by an OpenCL driver. To check whether SPIR or SPIR-V is supported, clinfo can be used:
$ clinfo | grep -i spir
Platform Extensions    cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer cl_intel_vec_len_hint
IL version             SPIR-V_1.0
SPIR versions          1.2
ComputeCpp additionally ships with a tool that summarizes the relevant system information:
$ computecpp_info
Device 0:
  Device is supported   : UNTESTED - Untested OS
  CL_DEVICE_NAME        : Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
  CL_DEVICE_VENDOR      : Intel(R) Corporation
  CL_DRIVER_VERSION     : 18.1.0.0920
  CL_DEVICE_TYPE        : CL_DEVICE_TYPE_CPU
Drivers known to at least partially support SPIR or SPIR-V include intel-compute-runtime, intel-opencl-runtimeAUR, pocl and amdgpu-pro-openclAUR[broken link: package not found].
SYCL Development
SYCL requires a working C++11 environment to be set up. There are a few open source libraries available:
- ComputeCpp SDK: Collection of code examples, cmake integration for ComputeCpp
- SYCL-DNN: Neural network performance primitives
- SYCL-BLAS: Linear algebra performance primitives
- VisionCpp: Computer Vision library
- SYCL Parallel STL: GPU implementation of the C++17 parallel algorithms
CUDA
CUDA (Compute Unified Device Architecture) is NVIDIA's proprietary, closed-source parallel computing architecture and framework. It requires an NVIDIA GPU, and consists of several components:
- Required:
  - Proprietary NVIDIA kernel module
  - CUDA "driver" and "runtime" libraries
- Optional:
  - Additional libraries: CUBLAS, CUFFT, CUSPARSE, etc.
  - CUDA toolkit, including the nvcc compiler
  - CUDA SDK, which contains many code samples and examples of CUDA and OpenCL programs
The kernel module and CUDA "driver" library are shipped in nvidia and opencl-nvidia. The "runtime" library and the rest of the CUDA toolkit are available in cuda. cuda-gdb needs ncurses5-compat-libsAUR to be installed, see FS#46598.
Development
The cuda package installs all components in the directory /opt/cuda. For compiling CUDA code, add /opt/cuda/include to your include path in the compiler instructions. For example, this can be accomplished by adding -I/opt/cuda/include to the compiler flags/options. To use nvcc, a gcc wrapper provided by NVIDIA, add /opt/cuda/bin to your path.

To check whether the installation was successful and whether CUDA is up and running, you can compile the samples installed in /opt/cuda/samples (you can simply run make inside the directory, although it is good practice to copy the /opt/cuda/samples directory to your home directory before compiling) and run the compiled samples. One way to check the installation is to run the deviceQuery sample.
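The deviceQuery sample is fairly large; a much smaller sketch in the same spirit is shown below (the file name is arbitrary, and /opt/cuda/lib64 is an assumption about where the cuda package places the runtime library). It uses only the CUDA runtime API on the host side, so it can be built either with nvcc device_query.cpp or with g++ device_query.cpp -I/opt/cuda/include -L/opt/cuda/lib64 -lcudart.

device_query.cpp
// List the CUDA devices visible to the runtime, similar in spirit to the
// deviceQuery sample. Host-only code: no kernels are launched.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("Device %d: %s, compute capability %d.%d, %zu MiB of global memory\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}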
Language bindings
- Fortran: PGI CUDA Fortran Compiler
- Haskell: The accelerate package lists available CUDA backends
- Java: JCuda
- Mathematica: CUDAlink
- Mono/.NET: CUDAfy.NET, managedCuda
- Perl: KappaCUDA, CUDA-Minimal
- Python: python-pycuda
- Ruby: rbcuda
- Rust: cuda-sys (bindings) or RustaCUDA (high-level wrapper)
ROCm
ROCm (Radeon Open Compute) is AMD's open-source parallel computing architecture and framework. Although it requires an AMD GPU, some ROCm tools are hardware-agnostic. See the ROCm for Arch Linux repository for more information and installation instructions.
OpenCL Image Support
The latest ROCm versions now include OpenCL image support, which is used by GPGPU-accelerated software such as Darktable. ROCm together with the AMDGPU open-source graphics driver is all that is required; AMDGPU PRO is not needed.
$ /opt/rocm/bin/clinfo | grep -i "image support"
Image support Yes
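If you prefer to check programmatically instead of parsing clinfo output, the same information is exposed through clGetDeviceInfo with CL_DEVICE_IMAGE_SUPPORT. A short sketch (arbitrary file name; build with, for example, g++ image_support.cpp -lOpenCL):

image_support.cpp
// Print, for every OpenCL device on every platform, whether it supports images.
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint num_platforms = 0;
    clGetPlatformIDs(0, nullptr, &num_platforms);
    std::vector<cl_platform_id> platforms(num_platforms);
    clGetPlatformIDs(num_platforms, platforms.data(), nullptr);

    for (cl_platform_id platform : platforms) {
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, nullptr, &num_devices) != CL_SUCCESS)
            continue;
        std::vector<cl_device_id> devices(num_devices);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, num_devices, devices.data(), nullptr);

        for (cl_device_id device : devices) {
            char name[256] = {0};
            cl_bool has_images = CL_FALSE;
            clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, nullptr);
            clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(has_images), &has_images, nullptr);
            std::printf("%s: image support %s\n", name, has_images ? "Yes" : "No");
        }
    }
    return 0;
}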
List of GPGPU accelerated software
- Bitcoin
- Blender – CUDA support for Nvidia GPUs and OpenCL support for AMD GPUs. More information here.
- BOINC
- FFmpeg – more information here.
- Folding@home
- GIMP – experimental – more information here.
- HandBrake
- Hashcat
- LibreOffice Calc – more information here.
- clinfo – Find all possible (known) properties of the OpenCL platform and devices available on the system.
- cuda_memtestAUR – a GPU memtest. Despite its name, it supports both CUDA and OpenCL.
- darktable – the OpenCL feature requires at least 1 GB of RAM on the GPU and image support (check the output of the clinfo command).
- DaVinci Resolve - a non-linear video editor. Can use both OpenCL and CUDA.
- imagemagick
- lc0AUR - Used for searching the neural network (supports tensorflow, OpenCL, CUDA, and openblas)
- opencv
- pyritAUR
- python-pytorch-cuda - PyTorch with CUDA backend
- tensorflow-cuda - Port of TensorFlow to CUDA
- tensorflow-computecppAUR - Port of TensorFlow to SYCL