PyOpenCL
Status: External library — Andreas Kloeckner's PyOpenCL
PyOpenCL is the Python binding to OpenCL maintained by Andreas Kloeckner. netcl uses PyOpenCL for all of its GPU interaction: buffer allocation, kernel build, kernel launch, event handling, context and queue management.
PyOpenCL is a thin wrapper around the OpenCL C API. It exposes
cl.Context, cl.CommandQueue, cl.Buffer, cl.Program,
cl.Kernel, cl.Event as Python objects, and a numpy-friendly
cl.array.Array class for high-level buffer arithmetic. The
Tensor type in netcl is essentially a
hand-written wrapper around a cl.Buffer plus a numpy view.
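To make the two layers concrete, the sketch below uses the high-level cl.array interface, which allocates a cl.Buffer and launches kernels behind the scenes. The helper name double_with_cl_array is hypothetical and not part of netcl or PyOpenCL. It picks the first device it finds and returns None when PyOpenCL or an OpenCL device is unavailable.

```python
def double_with_cl_array(values):
    """Double each value on an OpenCL device via pyopencl.array.

    Hypothetical helper for illustration. Returns None when PyOpenCL
    or a usable OpenCL device is not available.
    """
    try:
        import numpy as np
        import pyopencl as cl
        import pyopencl.array as cl_array
    except ImportError:
        return None

    try:
        # Pick the first device of the first platform without prompting.
        device = cl.get_platforms()[0].get_devices()[0]
        ctx = cl.Context([device])
        queue = cl.CommandQueue(ctx)

        # to_device allocates a device buffer and copies the host data in.
        dev = cl_array.to_device(queue, np.asarray(values, dtype=np.float32))

        # Arithmetic on Array objects generates and launches kernels for you.
        result = (2 * dev).get()
    except (cl.Error, IndexError):
        return None
    return result.tolist()


print(double_with_cl_array([1.0, 2.0, 3.0]))
```

netcl's Tensor plays a similar role to cl.array.Array here, but routes arithmetic through netcl's own op system instead.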
Overview
netcl depends on PyOpenCL at runtime for all GPU work. If PyOpenCL is
not installed, import netcl still succeeds, but every call into a GPU op
raises ImportError: pyopencl required. The [cpu] and [opencl]
extras in the netcl pyproject.toml install PyOpenCL.
PyOpenCL is licensed under the MIT license; the version netcl
targets is pyopencl >= 2024.1 (which exposes OpenCL 3.0 features
on drivers that support them). The bundled cl.array module is
used as a reference for netcl's own high-level arithmetic but is
not used at runtime — netcl's ops are dispatched through netcl's
own op system.
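A version floor like this can be checked at startup. The following is a minimal sketch using only the standard library; the function names are illustrative assumptions, not part of netcl, and the parser handles only plain dotted numeric versions.

```python
from importlib import metadata

MINIMUM = (2024, 1)  # from the "pyopencl >= 2024.1" requirement above


def parse_version(text):
    """Turn a version string such as '2024.2.7' into a tuple of ints.

    Parsing stops at the first non-numeric component (e.g. 'dev0').
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(text, minimum=MINIMUM):
    """True when the parsed version is at least `minimum`."""
    return parse_version(text) >= minimum


def installed_pyopencl_ok():
    """True/False for the installed PyOpenCL, or None if it is absent."""
    try:
        return meets_minimum(metadata.version("pyopencl"))
    except metadata.PackageNotFoundError:
        return None


print(installed_pyopencl_ok())
```

For anything beyond simple dotted versions, a dedicated version parser is the safer choice.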
Where It Lives
- netcl code that uses PyOpenCL is in core/backend/opencl.py and core/tensor.py.
- The PyOpenCL import is guarded with a try / except ImportError in every file that uses it, so the CPU-only build is importable.
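The guard described above usually follows the pattern sketched here. The helper names require_opencl and opencl_importable are assumptions for illustration; netcl's actual helpers may be named and structured differently.

```python
try:
    import pyopencl as cl
except ImportError:
    cl = None  # CPU-only install: this module still imports cleanly.


def opencl_importable():
    """True when the pyopencl module could be imported."""
    return cl is not None


def require_opencl():
    """Return the pyopencl module, or raise the error GPU ops report."""
    if cl is None:
        raise ImportError("pyopencl required")
    return cl
```

A GPU op would call require_opencl() as its first step, so the failure surfaces at the first GPU call and not at import time.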
How It Works
A typical PyOpenCL pattern in netcl:
import numpy as np
import pyopencl as cl

# Context + queue (netcl's DeviceManager wraps this).
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# Build a kernel.
src = """
__kernel void relu(__global const float* in, __global float* out) {
    int gid = get_global_id(0);
    out[gid] = in[gid] > 0.0f ? in[gid] : 0.0f;
}
"""
prg = cl.Program(ctx, src).build()
relu = prg.relu

# Allocate device buffers (4 bytes per float32 element).
n = 1024
in_buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, n * 4)
out_buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, n * 4)

# Copy input.
host_in = np.random.randn(n).astype(np.float32)
cl.enqueue_copy(queue, in_buf, host_in)

# Launch.
relu(queue, (n,), None, in_buf, out_buf)

# Copy back.
host_out = np.empty(n, dtype=np.float32)
cl.enqueue_copy(queue, host_out, out_buf).wait()
netcl's equivalent:
import netcl as nc
x = nc.Tensor.from_host(host_in)
y = nc.relu(x)
host_out = y.to_host()
The netcl version produces the same result from a much shorter call site; its implementation is built on top of the PyOpenCL calls shown above.
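One way to picture the relationship is to fold the PyOpenCL steps into a single function whose stages correspond to from_host, relu, and to_host. This is a sketch under stated assumptions, not netcl's implementation: relu_on_device and relu_reference are hypothetical names, the kernel parameters are renamed x and y, and the function returns None when PyOpenCL or a device is unavailable.

```python
RELU_SRC = """
__kernel void relu(__global const float* x, __global float* y) {
    int gid = get_global_id(0);
    y[gid] = x[gid] > 0.0f ? x[gid] : 0.0f;
}
"""


def relu_reference(values):
    """Pure-Python reference used to check the device result."""
    return [v if v > 0.0 else 0.0 for v in values]


def relu_on_device(values):
    """from_host -> relu -> to_host, written out as PyOpenCL calls."""
    try:
        import numpy as np
        import pyopencl as cl
    except ImportError:
        return None

    try:
        device = cl.get_platforms()[0].get_devices()[0]
        ctx = cl.Context([device])
        queue = cl.CommandQueue(ctx)
        kernel = cl.Program(ctx, RELU_SRC).build().relu

        host_in = np.asarray(values, dtype=np.float32)
        mf = cl.mem_flags
        # "from_host": allocate a buffer initialised from the host array.
        in_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_in)
        out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, host_in.nbytes)

        # "relu": one work-item per element.
        kernel(queue, host_in.shape, None, in_buf, out_buf)

        # "to_host": copy back into a fresh host array (blocking by default).
        host_out = np.empty_like(host_in)
        cl.enqueue_copy(queue, host_out, out_buf)
    except (cl.Error, IndexError):
        return None
    return host_out.tolist()


print(relu_on_device([-1.0, 0.0, 2.5]))
```

Comparing relu_on_device against relu_reference on the same input is a cheap sanity check when experimenting with kernels.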
Code Example
A common use of PyOpenCL directly in netcl is the KernelSpec helper, which lets you write a one-off kernel and run it from netcl:
import netcl as nc

# Assumes queue and n from the PyOpenCL example above,
# and x, the nc.Tensor from the netcl example.
spec = nc.KernelSpec(
    name="scale_by_two",
    src="__kernel void scale_by_two(__global float* x) {"
        " int gid = get_global_id(0);"
        " x[gid] *= 2.0f;"
        "}",
)
prg = spec.build(queue)
prg.run(queue, (n,), None, x)
The Tutorial: Custom OpenCL Kernel walks through a more complete example.
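To illustrate the idea behind such a helper, here is a rough sketch of a spec object that bundles a kernel's name and source with a recorded work-group size. MiniKernelSpec and its methods are hypothetical and do not mirror the real nc.KernelSpec API; only the cl.Program(...).build() call is standard PyOpenCL.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SCALE_SRC = """
__kernel void scale_by_two(__global float* x) {
    int gid = get_global_id(0);
    x[gid] *= 2.0f;
}
"""


@dataclass
class MiniKernelSpec:
    """Hypothetical stand-in: kernel name, source, and a tuned local size."""
    name: str
    src: str
    local_size: Optional[Tuple[int, ...]] = None  # None: runtime chooses

    def record_local_size(self, local_size):
        """Remember a work-group size, e.g. one chosen by a tuner."""
        self.local_size = tuple(local_size)

    def launch_sizes(self, global_size):
        """Return the (global, local) pair to pass to a kernel launch."""
        return tuple(global_size), self.local_size

    def build(self, ctx):
        """Compile the source and return the cl.Kernel named self.name."""
        import pyopencl as cl  # deferred so the class works without it
        return getattr(cl.Program(ctx, self.src).build(), self.name)


spec = MiniKernelSpec(name="scale_by_two", src=SCALE_SRC)
spec.record_local_size([64])
print(spec.launch_sizes((1024,)))
```

Keeping the tuned size on the spec means callers never have to repeat it at each launch site.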
Performance & Trade-offs
- PyOpenCL adds little overhead. The Python cost per call is in the low microseconds; for realistic kernels the GPU work is the bottleneck, not the binding.
- The cl.array module is convenient for prototyping but is not used at runtime in netcl. netcl's own op system has tighter integration with the BufferPool and the JIT Compiler.
- If you need to drop down to PyOpenCL for a one-off kernel, prefer nc.KernelSpec over cl.Program: netcl's spec records the work-group size that the autotuner picked, so subsequent runs use the right size automatically.
- PyOpenCL installs a SIGINT handler on first import to flush pending DMA. If you install your own handler, call cl.tools.install_sigint_handler() first.
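The per-call overhead claim is easy to check on your own machine. The sketch below is a generic wall-clock timer, not a netcl or PyOpenCL facility; seconds_per_call is a hypothetical name.

```python
import time


def seconds_per_call(fn, repeats=1000):
    """Average wall-clock seconds per call of fn() over `repeats` calls."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Baseline: the cost of calling an empty Python function.
print(seconds_per_call(lambda: None))

# With PyOpenCL (assumes kernel, queue, n and buf already exist):
#   overhead = seconds_per_call(lambda: kernel(queue, (n,), None, buf))
#   queue.finish()  # drain the queue after timing the enqueue calls
```

Timing only the enqueue calls approximates the host-side cost of the binding. Device execution time is better read from OpenCL profiling events.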
See also
- OpenCL — the underlying standard.
- PyOpenCL — the netcl-side use of PyOpenCL.
- Tensor — the netcl wrapper around cl.Buffer.
- BufferPool — the pool of cl.Buffers.
- KernelSpec — the netcl helper for one-off kernels.