netcl.nn — Modules, Layers, ResNet

netcl.nn is the model-building layer of netcl. It is shaped after PyTorch's torch.nn but tuned for the OpenCL pipeline: every Module is a Tensor container, every Parameter is just a Tensor with requires_grad=True, and the heavy lifting happens in the JIT Compiler through the ops API and the autograd API.

All public symbols are re-exported from the package root:

from netcl.nn import (
    Module, Linear, Conv2d, BatchNorm2d, ReLU, LeakyReLU, Sigmoid, Tanh,
    Dropout, MaxPool2d, Flatten, LayerNorm, MultiheadAttention,
    TransformerEncoderLayer, Sequential, Parameter,
    MSELoss, L1Loss, SmoothL1Loss, BCELoss, BCEWithLogitsLoss,
    mse_loss, cross_entropy,
    xavier_uniform, kaiming_uniform, constant,
    build_sequential, example_mlp_config, example_cnn_config,
)
# ResNet18 is lazy-loaded to avoid pulling in the full model at import time:
from netcl.nn import ResNet18

Class Hierarchy

Module                          # base class
├── Parameter                   # Tensor wrapper, auto-registered
├── Linear                      # y = x @ W^T + b
├── Conv2d                      # 2D convolution
├── MaxPool2d
├── BatchNorm2d                 # incl. fused BN+ReLU at inference
├── LayerNorm
├── MultiheadAttention
├── TransformerEncoderLayer
├── Dropout
├── Flatten
├── Sigmoid / Tanh / ReLU / LeakyReLU
├── MSELoss / L1Loss / SmoothL1Loss / BCELoss / BCEWithLogitsLoss
├── Sequential                  # ordered container of Modules
└── ResNet18                    # pre-built ResNet model

nn.Module

Module is the base class. You subclass it, assign submodules and parameters in __init__, and implement forward(x). The training loop just calls model(x), which is implemented as self.forward(x).

from netcl.nn import Module, Linear
import netcl.autograd as ag

class MyNet(Module):
    def __init__(self, queue):
        super().__init__()
        self.fc1 = Linear(queue, 784, 256)
        self.fc2 = Linear(queue, 256, 10)

    def forward(self, x):
        x = ag.relu(self.fc1(x))
        return self.fc2(x)

| Method | Purpose |
| --- | --- |
| parameters() | Returns a list of every Parameter (and requires_grad=True Tensor) in this module and all submodules. |
| train(mode=True) / eval() | Switches between training and eval mode; controls Dropout and BatchNorm2d behavior. |
| state_dict() / load_state_dict(d) | Serialization helpers; implemented on each concrete subclass (Linear, Conv2d, etc.). Not implemented on the base Module. |
| compile_forward(sample_input) | Compiles the eval-mode forward for a fixed input shape into a CompiledForward callable that skips Python graph construction on inference. |
| __call__(x) | Alias for self.forward(x). |
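
To make the parameters() and train()/eval() contracts concrete, here is a minimal pure-Python sketch of recursive parameter collection and mode propagation. TinyParam, TinyModule, TinyLinear and TinyNet are hypothetical stand-ins written for this illustration; they are not netcl classes, and the real Module may register attributes differently.

```python
class TinyParam:
    """Hypothetical stand-in for a trainable Tensor."""
    def __init__(self, name, requires_grad=True):
        self.name = name
        self.requires_grad = requires_grad


class TinyModule:
    """Hypothetical stand-in for netcl.nn.Module (illustration only)."""
    def __init__(self):
        self.training = True

    def parameters(self):
        # Walk instance attributes in assignment order and recurse into submodules.
        found = []
        for value in vars(self).values():
            if isinstance(value, TinyParam) and value.requires_grad:
                found.append(value)
            elif isinstance(value, TinyModule):
                found.extend(value.parameters())
        return found

    def train(self, mode=True):
        # Propagate the mode flag to every submodule.
        self.training = mode
        for value in vars(self).values():
            if isinstance(value, TinyModule):
                value.train(mode)
        return self

    def eval(self):
        return self.train(False)


class TinyLinear(TinyModule):
    def __init__(self, name):
        super().__init__()
        self.weight = TinyParam(name + ".weight")
        self.bias = TinyParam(name + ".bias")


class TinyNet(TinyModule):
    def __init__(self):
        super().__init__()
        self.fc1 = TinyLinear("fc1")
        self.fc2 = TinyLinear("fc2")


net = TinyNet()
param_names = [p.name for p in net.parameters()]
net.eval()  # flips `training` on net, fc1 and fc2
```

The flat list returned by parameters() is what an optimizer would consume, which is why the traversal has to reach nested submodules.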

nn.Parameter

Parameter is a thin wrapper around a Tensor that is automatically registered with the parent Module when assigned as an attribute. The Tensor it wraps has requires_grad=True by default.

from netcl.nn import Parameter
from netcl.core.device import manager

q = manager.default("auto").queue
w = Parameter.from_shape(q, (10, 784), dtype="float32")
# Once assigned to a Module attribute, `w` appears in module.parameters().

In practice you almost never instantiate Parameter by hand: layer constructors (Linear, Conv2d, BatchNorm2d, …) build the right parameters for you and initialize them through netcl.nn.init.

nn.Linear

from netcl.nn import Linear
layer = Linear(queue, in_features, out_features, bias=True)
# queue is optional — omit it and Linear auto-discovers the default device
layer = Linear(in_features=784, out_features=256)

A fully connected layer that computes y = x @ W^T + b. W has shape (out_features, in_features); b has shape (out_features,). Both are Parameters: W is initialized with Kaiming uniform (kaiming_uniform, the default listed under nn.init) and b with zeros. The forward pass dispatches to the matmul op (auto-tuned via the KernelSelector and the JIT Compiler) followed by a bias_add.
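
The shape convention is easy to get backwards, so here is a NumPy-only sketch of the same arithmetic. It does not use netcl; it only mirrors the stated formula and the stated parameter shapes, with the batch size chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, in_features, out_features = 4, 784, 256

x = rng.standard_normal((batch, in_features)).astype(np.float32)
W = rng.standard_normal((out_features, in_features)).astype(np.float32)  # (out, in)
b = np.zeros(out_features, dtype=np.float32)                             # (out,)

# y = x @ W^T + b: the weight is stored as (out, in) and transposed in the product.
y = x @ W.T + b   # shape (batch, out_features)
```

Storing W as (out_features, in_features) means each row of W holds the weights of one output unit.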

nn.Conv2d

from netcl.nn import Conv2d
layer = Conv2d(
    in_channels, out_channels, kernel_size,
    stride=1, padding=0, dilation=1, groups=1, bias=True,
)

2D convolution over NCHW tensors. Implemented as im2col plus a matmul by default; the optimized variant in ops/conv2d_optimized.py is selected by the KernelSelector when the workload and device warrant it. The 1×1 specialization goes through implicit-GEMM, and 3×3 stride-1 maps to the Winograd fused variant unless NETCL_CONV_WINOGRAD=0.
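
The default im2col-plus-matmul path can be sketched in NumPy as follows. This is an illustration of the general technique under simplifying assumptions (dilation=1, groups=1), not netcl's kernel code; im2col and conv2d_im2col are names chosen for this sketch.

```python
import numpy as np


def im2col(x, kh, kw, stride=1, padding=0):
    """Unfold NCHW input into patches of shape (N, C*kh*kw, out_h*out_w)."""
    n, c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    cols = np.empty((n, c, kh, kw, out_h, out_w), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            # All output positions that read kernel offset (i, j).
            cols[:, :, i, j] = xp[:, :, i:i + stride * out_h:stride,
                                  j:j + stride * out_w:stride]
    return cols.reshape(n, c * kh * kw, out_h * out_w), out_h, out_w


def conv2d_im2col(x, weight, bias=None, stride=1, padding=0):
    """Convolution expressed as one matmul over the unfolded patches."""
    out_c, in_c, kh, kw = weight.shape
    cols, out_h, out_w = im2col(x, kh, kw, stride, padding)
    w_mat = weight.reshape(out_c, in_c * kh * kw)
    out = w_mat @ cols                      # broadcasts to (N, out_c, out_h*out_w)
    if bias is not None:
        out = out + bias[None, :, None]
    return out.reshape(x.shape[0], out_c, out_h, out_w)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))       # NCHW
w = rng.standard_normal((4, 3, 3, 3))       # (out_channels, in_channels, kh, kw)
b = rng.standard_normal(4)
y = conv2d_im2col(x, w, b, stride=1, padding=1)
```

The unfolding trades extra memory (each input pixel is copied up to kh*kw times) for a single large matmul, which is the kind of workload a tuned GEMM kernel handles well.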

nn.MaxPool2d

2D max-pooling layer. Has a backward kernel registered in autograd/ops.py so it is fully differentiable.

from netcl.nn import MaxPool2d
pool = MaxPool2d(kernel_size=2, stride=2)
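
What "fully differentiable" means for max pooling can be shown with a NumPy sketch of the forward pass and the gradient routing. It assumes non-overlapping windows (kernel_size equal to stride) and input sizes divisible by the window; the function names are made up for this example and say nothing about how the netcl backward kernel is written.

```python
import numpy as np


def max_pool2d_forward(x, k=2):
    """Non-overlapping max pooling (kernel_size == stride == k) on NCHW input."""
    n, c, h, w = x.shape
    assert h % k == 0 and w % k == 0, "sketch assumes H and W divisible by k"
    windows = x.reshape(n, c, h // k, k, w // k, k)
    return windows.max(axis=(3, 5))


def max_pool2d_backward(x, grad_out, k=2):
    """Route each output gradient to the input position that held the window max.

    Ties receive the gradient at every tied position in this simple sketch.
    """
    out = max_pool2d_forward(x, k)
    up_out = out.repeat(k, axis=2).repeat(k, axis=3)
    up_grad = grad_out.repeat(k, axis=2).repeat(k, axis=3)
    return up_grad * (x == up_out)


x = np.array([[[[1., 2., 5., 6.],
                [3., 4., 7., 8.],
                [9., 1., 2., 3.],
                [1., 1., 4., 0.]]]])
y = max_pool2d_forward(x)
dx = max_pool2d_backward(x, np.ones_like(y))
```

Only the four positions that produced a window maximum receive a nonzero gradient; every other input position gets zero.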

nn.BatchNorm2d

Per-channel normalization. In training mode it computes batch statistics; in eval mode it uses the running mean and variance. The fused variant combines BN+ReLU into a single kernel (see fused-ops table).

from netcl.nn import BatchNorm2d
bn = BatchNorm2d(num_features=64, eps=1e-5, momentum=0.1)

When model.eval() is set, bn uses its running_mean / running_var buffers and skips the batch-statistics update.
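
The difference between the two modes can be sketched in NumPy. BatchNorm2dSketch is a hypothetical stand-in, not the netcl class; it assumes an exponential-moving-average update with the given momentum and uses the biased batch variance for the running estimate, which may differ from what netcl does.

```python
import numpy as np


class BatchNorm2dSketch:
    """Hypothetical NumPy stand-in for per-channel batch normalization."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.eps, self.momentum = eps, momentum
        self.weight = np.ones(num_features)
        self.bias = np.zeros(num_features)
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.training = True

    def __call__(self, x):
        if self.training:
            # Batch statistics over N, H, W for each channel.
            mean = x.mean(axis=(0, 2, 3))
            var = x.var(axis=(0, 2, 3))
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Eval mode: use stored statistics and leave them untouched.
            mean, var = self.running_mean, self.running_var
        shape = (1, -1, 1, 1)
        x_hat = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + self.eps)
        return self.weight.reshape(shape) * x_hat + self.bias.reshape(shape)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3, 4, 4)) * 2.0 + 5.0
bn = BatchNorm2dSketch(num_features=3)
y_train = bn(x)                       # normalizes with batch statistics
stats_after_train = bn.running_mean.copy()
bn.training = False
y_eval = bn(x)                        # normalizes with running statistics
```

After a single training step the running statistics are still close to their initial values, so y_eval differs noticeably from y_train; the two converge only after many updates.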

nn.LayerNorm

Normalizes across the last len(normalized_shape) dimensions of the input. No running-statistics state.

from netcl.nn import LayerNorm
ln = LayerNorm(normalized_shape=(64,))
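
The phrase "last len(normalized_shape) dimensions" can be made concrete with a NumPy sketch. The eps default and the omission of learnable scale and shift are assumptions of this sketch; layer_norm here is an illustrative function, not a netcl symbol.

```python
import numpy as np


def layer_norm(x, normalized_shape, eps=1e-5):
    """Normalize over the trailing len(normalized_shape) axes of x."""
    k = len(normalized_shape)
    assert x.shape[-k:] == tuple(normalized_shape)
    axes = tuple(range(x.ndim - k, x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 64)) * 3.0 + 1.0
y = layer_norm(x, (64,))          # statistics per (batch, position) pair
```

Because the statistics come from each sample alone, the result is the same in training and eval mode, which is why no running state is needed.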

nn.Dropout

Stochastic regularization. Inactive when model.eval() is in effect.

from netcl.nn import Dropout
drop = Dropout(p=0.5)
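
A NumPy sketch of the train/eval distinction follows. It assumes the common "inverted dropout" convention, where surviving activations are scaled by 1/(1-p) during training so that eval needs no rescaling; the text above does not say which convention netcl uses.

```python
import numpy as np


def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each element with probability p, rescale the rest."""
    if not training or p == 0.0:
        return x                               # identity in eval mode
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


x = np.ones((1000, 100))
y_train = dropout(x, p=0.5, training=True, rng=np.random.default_rng(0))
y_eval = dropout(x, p=0.5, training=False)
```

With this convention the expected value of each activation is the same in both modes, so downstream layers see consistent magnitudes.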

nn.build_sequential

Builds a Sequential model from a list of layer config dicts. Useful for config-file-driven architectures or hyperparameter search over layer widths.

from netcl.nn import build_sequential, example_mlp_config
from netcl.core.device import manager

q = manager.default("auto").queue

# Use a built-in config template
config = example_mlp_config(input_dim=784, hidden=256, num_classes=10)
model = build_sequential(q, config)

# Or write your own config
config = [
    {"type": "Linear", "args": {"in_features": 784, "out_features": 256}},
    {"type": "ReLU",   "args": {}},
    {"type": "Linear", "args": {"in_features": 256, "out_features": 10}},
]
model = build_sequential(q, config)

build_sequential_from_json(queue, path) does the same from a JSON file.
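
For a search over layer widths, the config list can be generated rather than written by hand. The sketch below uses only the standard library; make_mlp_config is a hypothetical helper, and the assumption that the JSON file holds the same list-of-dicts structure as the in-memory config is not confirmed by the text.

```python
import json
import os
import tempfile


def make_mlp_config(input_dim, hidden_sizes, num_classes):
    """Hypothetical helper: build a list-of-dicts config for an MLP."""
    config, prev = [], input_dim
    for width in hidden_sizes:
        config.append({"type": "Linear",
                       "args": {"in_features": prev, "out_features": width}})
        config.append({"type": "ReLU", "args": {}})
        prev = width
    config.append({"type": "Linear",
                   "args": {"in_features": prev, "out_features": num_classes}})
    return config


# One candidate config per hidden width.
candidates = {h: make_mlp_config(784, [h], 10) for h in (128, 256, 512)}

# Round-trip one candidate through a JSON file (assumed schema, see above).
path = os.path.join(tempfile.mkdtemp(), "mlp_256.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump(candidates[256], fh, indent=2)

with open(path, encoding="utf-8") as fh:
    loaded = json.load(fh)

# With netcl and a queue `q` available, either form could then be passed on:
#   model = build_sequential(q, candidates[256])
#   model = build_sequential_from_json(q, path)
```

Chaining in_features from the previous out_features inside the helper avoids the most common config mistake, a width mismatch between consecutive Linear layers.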

nn.ResNet18

Pre-built ResNet-18 model. Forward expects NCHW input. Default num_classes=10 (CIFAR-friendly); override for ImageNet heads.

from netcl.nn import ResNet18
from netcl.core.device import manager

q = manager.default("auto").queue
model = ResNet18(queue=q, num_classes=10)

Internally built from Conv2d, BatchNorm2d, ReLU, and the residual add — all dispatched to fused kernels where supported (see fused-ops table).

Fused Ops

| Fused function | Composed of | Where it lives |
| --- | --- | --- |
| linear_relu | Linear → ReLU | autograd/ops.py |
| conv2d_relu | Conv2d → ReLU | autograd/ops.py |
| conv2d_bias_relu | Conv2d → Bias → ReLU | autograd/ops.py |
| conv2d_relu_bn | Conv2d → ReLU → BatchNorm2d | autograd/ops.py |
| batch_norm2d_relu | BatchNorm2d → ReLU | autograd/ops.py |
| add_relu | elementwise a + b → ReLU | autograd/ops.py |
| bias_add_relu | Conv → Bias → ReLU | autograd/ops.py |
| matmul_bias_relu | matmul → bias_add → ReLU | autograd/ops.py |
| fused_ops.conv2d_relu_bn | Conv2d → ReLU → BatchNorm2d (op-level) | ops/fused_ops.py |
| fused_ops.batch_norm2d_relu | BatchNorm2d → ReLU (op-level) | ops/fused_ops.py |

The nn.functional wrappers call into autograd/ops.py, so every fused op is fully differentiable and participates in the Tape.
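
As an illustration of what a fused op has to guarantee, the NumPy sketch below checks that a single-buffer add_relu matches the two-step composition and shows the gradient both inputs receive. Real fusion happens inside an OpenCL kernel; NumPy can only mimic the "no intermediate buffer" idea, and the function names are chosen for this sketch.

```python
import numpy as np


def add_relu_unfused(a, b):
    s = a + b                          # intermediate buffer is materialized
    return np.maximum(s, 0.0)          # second pass over memory


def add_relu_fused(a, b):
    out = np.empty_like(a)
    np.add(a, b, out=out)
    np.maximum(out, 0.0, out=out)      # in place: no second buffer
    return out


def add_relu_backward(a, b, grad_out):
    """The gradient passes where a + b > 0 and is identical for both inputs."""
    g = grad_out * ((a + b) > 0)
    return g, g


a = np.array([[-1.0, 2.0], [3.0, -4.0]])
b = np.array([[0.5, -3.0], [1.0, 5.0]])
y_fused = add_relu_fused(a, b)
y_ref = add_relu_unfused(a, b)
grad_a, grad_b = add_relu_backward(a, b, np.ones_like(a))
```

A fused kernel is only a drop-in replacement if both the forward result and the gradients match the unfused composition, which is what the residual add in a ResNet block relies on.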

Initialization (nn.init)

Initializers are in netcl.nn.init and are also re-exported from netcl.nn. They work in-place on a Tensor parameter.

from netcl.nn import xavier_uniform, kaiming_uniform, constant

xavier_uniform(model.fc1.weight)      # Tanh-family activations
kaiming_uniform(model.fc1.weight)     # ReLU-family (the default for Linear/Conv2d)
constant(model.bn.weight, 1.0)        # any constant fill

| Initializer | Used for |
| --- | --- |
| kaiming_uniform | Linear.weight, Conv2d.weight (the default). |
| xavier_uniform | Tanh-family activations. |
| constant | Any constant fill. |
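
For reference, the textbook bounds behind the two uniform schemes can be written out in a few lines. The gain values and the use of fan_in for the Kaiming bound are the standard choices from the literature and are assumptions here; the text does not state which gain or fan mode netcl applies. The helper names are made up for this sketch.

```python
import math
import numpy as np


def kaiming_uniform_bound(fan_in, gain=math.sqrt(2.0)):
    """U(-bound, bound) with bound = gain * sqrt(3 / fan_in); gain sqrt(2) suits ReLU."""
    return gain * math.sqrt(3.0 / fan_in)


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """U(-bound, bound) with bound = gain * sqrt(6 / (fan_in + fan_out))."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def fill_uniform_(w, bound, rng):
    """In-place fill, mirroring the in-place contract described above."""
    w[...] = rng.uniform(-bound, bound, size=w.shape)
    return w


rng = np.random.default_rng(0)
w = np.empty((256, 784))                   # (out_features, in_features)
fan_out, fan_in = w.shape
kb = kaiming_uniform_bound(fan_in)
xb = xavier_uniform_bound(fan_in, fan_out)
fill_uniform_(w, kb, rng)
```

Both schemes pick the bound so that activation variance stays roughly constant from layer to layer; they differ in whether the fan-out is taken into account.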

Functional (nn.functional)

Stateless variants of activations and losses. These wrap the same autograd ops that the layer classes call, so they are fully differentiable under a Tape.

from netcl.nn import functional as F

y    = F.relu(x)
y    = F.sigmoid(x)
loss = F.cross_entropy(logits, targets)
loss = F.mse_loss(pred, target)
loss = F.binary_cross_entropy(pred, target)

| Function | Notes |
| --- | --- |
| relu, leaky_relu | Activations; both have backward kernels. |
| sigmoid, tanh | Standard activations. |
| cross_entropy | Takes raw logits and integer class targets. |
| mse_loss | Mean-squared error. |
| binary_cross_entropy | Element-wise BCE on probabilities. |
| binary_cross_entropy_with_logits | BCE on raw logits; the sigmoid is applied inside the loss. |
| l1_loss, smooth_l1_loss | L1 and Huber losses. |
| flatten | Reshape op for use inside a Tape. |
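
The input conventions of the losses are the usual source of confusion, so here is a NumPy reference sketch of three of them. It assumes mean reduction and the standard numerically stable formulations; whether netcl uses the same reduction is not stated above, and these functions are illustrations rather than the netcl implementations.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Logits of shape (B, C) and integer class targets of shape (B,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)        # stability shift
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def binary_cross_entropy(pred, target):
    """pred holds probabilities in (0, 1)."""
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()


def binary_cross_entropy_with_logits(z, target):
    """z holds raw logits; equivalent to BCE(sigmoid(z), target) but stable."""
    return (np.maximum(z, 0) - z * target + np.log1p(np.exp(-np.abs(z)))).mean()


ce = cross_entropy(np.zeros((2, 3)), np.array([0, 2]))   # uniform prediction

z = np.array([-2.0, 0.5, 3.0])
t = np.array([0.0, 1.0, 1.0])
bce_from_probs = binary_cross_entropy(1.0 / (1.0 + np.exp(-z)), t)
bce_from_logits = binary_cross_entropy_with_logits(z, t)
```

The logits form stays finite for very large |z|, where computing the sigmoid first and then taking a log would produce log(0).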

Putting It Together: a Small Classifier

import numpy as np
import netcl.autograd as ag
from netcl.core.device import manager
from netcl.nn import Linear, ReLU, Sequential, Dropout, cross_entropy
from netcl.optim import Adam

q = manager.default("auto").queue
model = Sequential(
    Linear(q, 784, 256), ReLU(), Dropout(p=0.1),
    Linear(q, 256, 128), ReLU(),
    Linear(q, 128, 10),
)
opt = Adam(model.parameters(), lr=3e-4)

for x, y in loader:                       # x: (B, 784),  y: (B,)
    with ag.Tape() as tape:
        logits = model(x)
        loss = cross_entropy(logits, y)
    tape.backward(loss)
    opt.step()
    opt.zero_grad()
