As part of a group project for a university course, I had to pick up OpenCV. Our group agreed to use Python, so I needed to install OpenCV for Python. Should be easy, my past self thought.

Pre-built CPU-only OpenCV packages for Python.

Check the manual build section if you wish to compile the bindings from source to enable additional modules such as CUDA.

~ https://github.com/opencv/opencv-python?tab=readme-ov-file#opencv-on-wheels

But reality is cruel. The opencv-python package on pypi.org is built for CPU only. The performance is not terrible, to be honest, but it does not match the scale of the project, especially when I already have a GPU in the machine.

I had a little hope when I found this project, which builds opencv-python against CUDA, the NVIDIA Video Codec SDK, and cuDNN. But the Python code still did not run, so I decided to make my own build.

Now, many online solutions are pretty manual: git clone, apt install, etc. Instead, I devised a solution that uses the tools Python already has: pip and conda. (mamba is conda-compatible, so the instructions apply to mamba as well.) I will work on Ubuntu with bash; other environments should work too with a few adjustments.

First, we create a new environment using conda. Here I chose the name compvision and selected Python 3.11. Since NVIDIA provides conda packages for the CUDA SDK, I include them in the command. I also add libva (Intel VA-API), which is used when building OpenCL support in OpenCV:

conda create -n compvision python=3.11 cuda cudnn tbb libva -c nvidia

There is only one small issue in the package: cicc is not visible on the PATH. Fixing this is necessary if you are targeting a specific CUDA_ARCH_BIN. We can fix it by creating a link in the bin folder of the environment (activate it first with conda activate compvision, so that CONDA_PREFIX points to it):

ln -s $CONDA_PREFIX/nvvm/bin/cicc $CONDA_PREFIX/bin/cicc
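If you would rather do this step from Python (for example in a setup script), here is a minimal sketch of the same link. The helper name link_cicc is my own invention for illustration; it assumes cicc sits under nvvm/bin inside the environment, as in the command above, and it skips the link when one already exists.

```python
import os
from pathlib import Path
from typing import Optional


def link_cicc(prefix: Optional[str] = None) -> Path:
    """Symlink <prefix>/nvvm/bin/cicc into <prefix>/bin (hypothetical helper).

    When prefix is omitted, CONDA_PREFIX is used; that variable is only set
    inside an activated conda environment.
    """
    prefix = prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; activate the environment first")

    source = Path(prefix) / "nvvm" / "bin" / "cicc"
    target = Path(prefix) / "bin" / "cicc"
    if not source.exists():
        raise FileNotFoundError(f"cicc not found at {source}")
    if target.exists() or target.is_symlink():
        return target  # something is already there; leave it alone

    target.parent.mkdir(parents=True, exist_ok=True)
    target.symlink_to(source)
    return target


# Usage, inside the activated environment:
#   link_cicc()
```

Running it twice is harmless, which the plain ln -s is not (it fails if the link already exists).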

Now we will build opencv-python from source using pip. Since we are building against CUDA, we must use the contrib variant. And if, like me, you don’t need a GUI (I use OpenCV through Jupyter notebooks), you can opt for the headless variant. I also disabled the NVIDIA Video Codec SDK since I don’t need it. The following is my build command:

CMAKE_ARGS="\
  -DWITH_CUDA=ON -DWITH_CUDNN=ON -DOPENCV_DNN_CUDA=ON \
  -DWITH_NVCUVID=OFF -DWITH_NVCUVENC=OFF -DWITH_CUBLAS=ON \
  -DCUDA_ARCH_BIN=7.5 -DWITH_TBB=ON \
  " pip install --no-binary opencv-contrib-python-headless opencv-contrib-python-headless
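Since the flag list tends to grow while you experiment, it can help to keep it in one place and generate CMAKE_ARGS from it. The following is only a sketch: the names FLAGS, cmake_args, pip_command, and shell_line are mine, and the flag values are simply the ones from my command above. In particular, 7.5 is the compute capability of my GPU, so replace it with yours.

```python
import shlex

PACKAGE = "opencv-contrib-python-headless"

# Same flags as in the build command above; CUDA_ARCH_BIN must match your GPU.
FLAGS = {
    "WITH_CUDA": "ON",
    "WITH_CUDNN": "ON",
    "OPENCV_DNN_CUDA": "ON",
    "WITH_NVCUVID": "OFF",
    "WITH_NVCUVENC": "OFF",
    "WITH_CUBLAS": "ON",
    "CUDA_ARCH_BIN": "7.5",
    "WITH_TBB": "ON",
}


def cmake_args(flags: dict) -> str:
    """Render the flags as a single '-DKEY=VALUE ...' string."""
    return " ".join(f"-D{key}={value}" for key, value in flags.items())


def pip_command(package: str = PACKAGE) -> list:
    """pip invocation that forces a source build of the given package."""
    return ["pip", "install", "--no-binary", package, package]


def shell_line(flags: dict, package: str = PACKAGE) -> str:
    """One-line shell equivalent, handy for copy-pasting into bash."""
    return f"CMAKE_ARGS={shlex.quote(cmake_args(flags))} {shlex.join(pip_command(package))}"


# To actually start the build from Python (takes a long time):
#   import os, subprocess
#   env = {**os.environ, "CMAKE_ARGS": cmake_args(FLAGS)}
#   subprocess.run(pip_command(), env=env, check=True)
```

Printing shell_line(FLAGS) gives a single-line version of the command above, so you can compare the two before committing to a long build.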

Optionally, if you would like the library to run faster at the cost of floating-point precision, you can add -DENABLE_FAST_MATH=ON -DCUDA_FAST_MATH=ON to CMAKE_ARGS. You can also find more compile examples and flags on Stack Overflow, in the OpenCV docs, and in the OpenCV GitHub CMakeLists.txt.

Now you can go have a coffee break while waiting for it to build. You will likely hit several build errors caused by dependencies at this stage; I will not cover them here since they vary from machine to machine. Most issues will be missing libraries or files located in a different path. If you are using apt, you can use apt-file to search for the packages that contain the missing files.

Once the build is complete (it took me roughly 30 minutes), we can check it in Python:

import cv2
print(cv2.getBuildInformation())
print(cv2.cuda.getCudaEnabledDeviceCount())

And there we go:


General configuration for OpenCV 4.10.0 =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            /tmp/pip-install-q_5601f_/opencv-contrib-python-headless_8899128e35a84dcb90525b8113d5a9a5/opencv_contrib/modules
    Version control (extra):     unknown

  Platform:
    Timestamp:                   2024-11-20T04:43:55Z
    Host:                        Linux 6.8.0-45-generic x86_64
    CMake:                       3.31.0
    CMake generator:             Ninja
    CMake build tool:            /usr/bin/ninja
    Configuration:               Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (16 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (8 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (36 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (5 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
...
1
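The build report is long, so it is easy to miss the lines that matter here. Below is a small sketch that keeps only the entries whose label mentions CUDA or cuDNN. The function name cuda_summary is made up for this post, and it assumes the report uses "label: value" lines with those words in the label; the exact labels may differ between OpenCV versions.

```python
def cuda_summary(build_info: str) -> dict:
    """Return the 'label: value' lines of a build report that mention CUDA or cuDNN."""
    summary = {}
    for line in build_info.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'label: value' line
        lowered = label.lower()
        if "cuda" in lowered or "cudnn" in lowered:
            summary[label.strip()] = value.strip()
    return summary


# Usage with the freshly built package:
#   import cv2
#   for label, value in cuda_summary(cv2.getBuildInformation()).items():
#       print(f"{label}: {value}")
```

If the result is empty, or the values say NO, the build most likely fell back to CPU only and the CMAKE_ARGS are worth a second look.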