Building OpenCV with CUDA for Python in 3 lines through conda
As part of a group project in a university course, I had to pick up OpenCV. Our group agreed to use Python, so I needed to install OpenCV for Python. Should be easy, my past self thought.
Pre-built CPU-only OpenCV packages for Python.
Check the manual build section if you wish to compile the bindings from source to enable additional modules such as CUDA.
~ https://github.com/opencv/opencv-python?tab=readme-ov-file#opencv-on-wheels
But reality is cruel. The opencv-python package on PyPI is built for CPU only. To be honest, the performance is not terrible, but it does not match the scale of the project, especially when I already have a GPU in the machine.
I had a little hope when I saw this project that builds opencv-python against CUDA, the NVIDIA Video Codec SDK and cuDNN. But the Python code still did not run. Thus, I decided to make a build myself.
Now, many online solutions are pretty manual: git clone, apt install, etc. Instead, I devised a solution that uses the standard Python tools: pip and conda (mamba is conda-compatible, so the instructions apply to mamba as well).
We will work on Ubuntu with bash; other environments should also work with a few adjustments.
First, we create a new environment using conda. Here I chose the name compvision and selected Python 3.11. Since NVIDIA provides conda packages for the CUDA SDK, I include them in the command. I also add libva, which provides Intel VA-API, for building the OpenCL support in OpenCV:
conda create -n compvision python=3.11 cuda cudnn tbb libva -c nvidia
There is one small issue in the package: cicc is not visible on PATH. It is needed if you are targeting a specific CUDA_ARCH_BIN. With the environment activated (conda activate compvision, so that $CONDA_PREFIX points to it), we can fix this by creating a link in the bin folder of the environment:
ln -s "$CONDA_PREFIX/nvvm/bin/cicc" "$CONDA_PREFIX/bin/cicc"
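To confirm that the link worked before starting a long build, here is a minimal sketch that checks whether the CUDA compiler tools are reachable from PATH. The helper name missing_tools and the choice of tools to check are my own assumptions for illustration, not something the CUDA packages require.

```python
import shutil


def missing_tools(names):
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# nvcc and cicc are the compiler pieces discussed above. The list is an
# assumption about what is worth checking, not an exhaustive requirement.
print(missing_tools(["nvcc", "cicc"]))
```

Run it inside the activated environment. An empty list means both tools were found; any name it prints still needs fixing.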
Now we will build opencv-python from source using pip. Since we are building against CUDA, we must use the contrib variant, because the CUDA modules live in opencv_contrib. And if, like me, you don’t need a GUI (I use OpenCV through Jupyter notebooks), you can opt for the headless variant. I also disabled the NVIDIA Video Codec SDK since I don’t need it. The following is my build command:
CMAKE_ARGS="\
-DWITH_CUDA=ON -DWITH_CUDNN=ON -DOPENCV_DNN_CUDA=ON \
-DWITH_NVCUVID=OFF -DWITH_NVCUVENC=OFF -DWITH_CUBLAS=ON \
-DCUDA_ARCH_BIN=7.5 -DWITH_TBB=ON \
" pip install --no-binary opencv-contrib-python-headless opencv-contrib-python-headless
Optionally, if you would like the library to run faster at the cost of floating-point precision, you can add -DENABLE_FAST_MATH=ON -DCUDA_FAST_MATH=ON to CMAKE_ARGS. You can also find more build examples and flags on Stack Overflow, in the OpenCV docs, and in the CMakeLists.txt of the OpenCV GitHub repository.
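If you end up toggling many flags, it can be easier to keep them in a dictionary and render the CMAKE_ARGS string from it. The following is only a sketch: the helper cmake_args is a hypothetical name of mine, and the flag values are the ones used in this post, so adjust them to your machine.

```python
def cmake_args(options):
    """Render {'WITH_CUDA': 'ON'} as '-DWITH_CUDA=ON', keeping insertion order."""
    return " ".join(f"-D{key}={value}" for key, value in options.items())


# Flags taken from the build command in this post.
# CUDA_ARCH_BIN should match the compute capability of your own GPU;
# 7.5 is simply the value used here.
base = {
    "WITH_CUDA": "ON", "WITH_CUDNN": "ON", "OPENCV_DNN_CUDA": "ON",
    "WITH_NVCUVID": "OFF", "WITH_NVCUVENC": "OFF", "WITH_CUBLAS": "ON",
    "CUDA_ARCH_BIN": "7.5", "WITH_TBB": "ON",
}

# Optional fast-math flags: faster, but less precise floating-point results.
fast_math = {"ENABLE_FAST_MATH": "ON", "CUDA_FAST_MATH": "ON"}

# Print a line that can be pasted in front of the pip install command.
print(f'CMAKE_ARGS="{cmake_args({**base, **fast_math})}"')
```

Dropping fast_math from the merge gives back the same set of flags as my original command.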
Now you can go have a coffee break while waiting for it to build. At this stage you are likely to hit several build errors caused by dependencies; I will not cover them here since they vary from machine to machine. Most issues will be missing libraries or files located in a different path. If you are using apt, you can use apt-file to search for the packages that contain the missing files.
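To speed up the apt-file hunt, here is a rough sketch that scans a saved build log for missing-header errors and prints one apt-file search command per file. It assumes GCC's message format ("fatal error: NAME: No such file or directory"); other compilers word this differently. The helper names are hypothetical.

```python
import re

# GCC reports a missing header as:
#   fatal error: <name>: No such file or directory
MISSING_FILE = re.compile(r"fatal error: (\S+): No such file or directory")


def missing_files(build_log):
    """Return the distinct missing file names mentioned in `build_log`, in order."""
    found = []
    for match in MISSING_FILE.finditer(build_log):
        name = match.group(1)
        if name not in found:
            found.append(name)
    return found


def apt_file_commands(build_log):
    """Suggest one `apt-file search` command per missing file."""
    return [f"apt-file search {name}" for name in missing_files(build_log)]
```

To use it, save the pip output to a file (for example with tee), read it into a string, and print the result of apt_file_commands on that string.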
After the build is complete (which took me roughly 30 minutes), we can check in Python by:
import cv2
print(cv2.getBuildInformation())
print(cv2.cuda.getCudaEnabledDeviceCount())
And there we go:
General configuration for OpenCV 4.10.0 =====================================
Version control: unknown
Extra modules:
Location (extra): /tmp/pip-install-q_5601f_/opencv-contrib-python-headless_8899128e35a84dcb90525b8113d5a9a5/opencv_contrib/modules
Version control (extra): unknown
Platform:
Timestamp: 2024-11-20T04:43:55Z
Host: Linux 6.8.0-45-generic x86_64
CMake: 3.31.0
CMake generator: Ninja
CMake build tool: /usr/bin/ninja
Configuration: Release
CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (16 files): + SSSE3 SSE4_1
SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (0 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (36 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
AVX512_SKX (5 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
...
1
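The build information is long, so it helps to filter it down to the lines you care about. Below is a small sketch that keeps only the lines mentioning given keywords; matching_lines is a hypothetical helper of mine, and the keywords are just a guess at what is interesting for a CUDA build.

```python
def matching_lines(build_info, keywords):
    """Return stripped lines of `build_info` that mention any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        line.strip()
        for line in build_info.splitlines()
        if any(keyword in line.lower() for keyword in lowered)
    ]


# Usage, assuming the freshly built cv2 is importable:
#   import cv2
#   for line in matching_lines(cv2.getBuildInformation(), ["cuda", "cudnn"]):
#       print(line)
```

If the filtered output shows nothing about CUDA, the build most likely fell back to a CPU-only configuration, and the device count printed above would not be 1 either.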