netcl.nn — Modules, Layers, ResNet
netcl.nn is the model-building layer of netcl. It is shaped after PyTorch's torch.nn
but tuned for the OpenCL pipeline: every Module is a Tensor
container, every Parameter is just a Tensor with
requires_grad=True, and the heavy lifting happens in the JIT Compiler
through the ops API and the autograd API.
All public symbols are re-exported from the package root:
from netcl.nn import (
    Module, Linear, Conv2d, BatchNorm2d, ReLU, LeakyReLU, Sigmoid, Tanh,
    Dropout, MaxPool2d, Flatten, LayerNorm, MultiheadAttention,
    TransformerEncoderLayer, Sequential, Parameter,
    MSELoss, L1Loss, SmoothL1Loss, BCELoss, BCEWithLogitsLoss,
    mse_loss, cross_entropy,
    xavier_uniform, kaiming_uniform, constant,
    build_sequential, example_mlp_config, example_cnn_config,
)
# ResNet18 is lazy-loaded to avoid pulling in the full model at import time:
from netcl.nn import ResNet18
Class Hierarchy
Module # base class
├── Parameter # Tensor wrapper, auto-registered
├── Linear # y = x @ W^T + b
├── Conv2d # 2D convolution
├── MaxPool2d
├── BatchNorm2d # incl. fused BN+ReLU at inference
├── LayerNorm
├── MultiheadAttention
├── TransformerEncoderLayer
├── Dropout
├── Flatten
├── Sigmoid / Tanh / ReLU / LeakyReLU
├── MSELoss / L1Loss / SmoothL1Loss / BCELoss / BCEWithLogitsLoss
├── Sequential # ordered container of Modules
└── ResNet18 # pre-built ResNet model
nn.Module
Module is the base class. You subclass it, assign submodules and parameters in
__init__, and implement forward(x). The training loop just calls model(x), which is
implemented as self.forward(x).
from netcl.nn import Module, Linear
import netcl.autograd as ag
class MyNet(Module):
    def __init__(self, queue):
        super().__init__()
        self.fc1 = Linear(queue, 784, 256)
        self.fc2 = Linear(queue, 256, 10)

    def forward(self, x):
        x = ag.relu(self.fc1(x))
        return self.fc2(x)
| Method | Purpose |
|---|---|
| parameters() | Returns a list of every Parameter (and requires_grad=True Tensor) in this module and all submodules. |
| train(mode=True) / eval() | Switches between training and eval mode; controls Dropout and BatchNorm2d behavior. |
| state_dict() / load_state_dict(d) | Serialization helpers; implemented on each concrete subclass (Linear, Conv2d, etc.). Not implemented on the base Module. |
| compile_forward(sample_input) | Compiles the eval-mode forward for a fixed input shape into a CompiledForward callable that skips Python graph construction on inference. |
| __call__(x) | Alias for self.forward(x). |
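The recursive behavior of parameters() and train() / eval() can be pictured with a small pure-Python sketch. SketchModule and SketchParameter are hypothetical stand-ins written for illustration; they are not netcl classes, and netcl's actual registration mechanism may differ.

```python
class SketchParameter:
    """Stand-in for a trainable tensor (hypothetical, not netcl's Parameter)."""
    def __init__(self, name):
        self.name = name
        self.requires_grad = True


class SketchModule:
    """Hypothetical base class showing parameter collection and mode switching."""
    def __init__(self):
        self.training = True

    def parameters(self):
        # Walk instance attributes: collect parameters, recurse into submodules.
        found = []
        for value in vars(self).values():
            if isinstance(value, SketchParameter):
                found.append(value)
            elif isinstance(value, SketchModule):
                found.extend(value.parameters())
        return found

    def train(self, mode=True):
        # Propagate the mode flag to every submodule.
        self.training = mode
        for value in vars(self).values():
            if isinstance(value, SketchModule):
                value.train(mode)
        return self

    def eval(self):
        return self.train(False)

    def __call__(self, x):
        return self.forward(x)


class Leaf(SketchModule):
    def __init__(self, tag):
        super().__init__()
        self.weight = SketchParameter(tag + ".weight")
        self.bias = SketchParameter(tag + ".bias")

    def forward(self, x):
        return x  # identity; the sketch only cares about bookkeeping


class Net(SketchModule):
    def __init__(self):
        super().__init__()
        self.fc1 = Leaf("fc1")
        self.fc2 = Leaf("fc2")

    def forward(self, x):
        return self.fc2(self.fc1(x))


net = Net()
names = [p.name for p in net.parameters()]
net.eval()
```

Here names lists the four leaf parameters in attribute-assignment order, and net.eval() flips the training flag on both submodules as well as on net itself.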
nn.Parameter
Parameter is a thin wrapper around a Tensor that is automatically
registered with the parent Module when assigned as an attribute. The Tensor it wraps
has requires_grad=True by default.
from netcl.nn import Parameter
from netcl.core.device import manager
q = manager.default("auto").queue
w = Parameter.from_shape(q, (10, 784), dtype="float32")
# Once assigned to a Module attribute, `w` appears in module.parameters().
In practice you almost never instantiate Parameter by hand: layer constructors (Linear,
Conv2d, BatchNorm2d, …) build the right parameters for you, with initial values taken
from netcl.nn.init.
nn.Linear
from netcl.nn import Linear
layer = Linear(queue, in_features, out_features, bias=True)
# queue is optional — omit it and Linear auto-discovers the default device
layer = Linear(in_features=784, out_features=256)
A fully-connected layer that computes y = x @ W^T + b. W has shape
(out_features, in_features); b has shape (out_features,). Both are Parameters:
W is initialized with kaiming_uniform (the default listed under Initialization)
and b with zeros. The forward pass dispatches to the matmul op (auto-tuned via the
KernelSelector and the JIT Compiler) followed by a bias_add.
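To make the shapes concrete, the sketch below evaluates y = x @ W^T + b on nested Python lists. It is a reference computation only, assuming a 2D input of shape (B, in_features); linear_forward is a hypothetical helper and does not reflect how netcl dispatches the work to OpenCL kernels.

```python
def linear_forward(x, weight, bias):
    """y = x @ W^T + b for x (B, in), weight (out, in), bias (out,)."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]


x = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]        # (B=2, in_features=3)
weight = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]    # (out_features=2, in_features=3)
bias = [0.1, -0.1]                             # (out_features=2,)
y = linear_forward(x, weight, bias)            # (B=2, out_features=2)
```

Each output row has out_features entries because every row of W produces one dot product with the input row.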
nn.Conv2d
from netcl.nn import Conv2d
layer = Conv2d(
    in_channels, out_channels, kernel_size,
    stride=1, padding=0, dilation=1, groups=1, bias=True,
)
2D convolution over NCHW tensors. Implemented as im2col plus a matmul by
default; the optimized variant in ops/conv2d_optimized.py is selected by the
KernelSelector when the workload and device warrant it. The 1×1 specialization
goes through implicit-GEMM, and 3×3 stride-1 maps to the Winograd fused variant
unless NETCL_CONV_WINOGRAD=0.
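The im2col-plus-matmul idea can be sketched in plain Python for a single image. This is an illustration under simplifying assumptions (square kernels, no dilation, groups=1, no bias, zero padding); conv2d_im2col is a hypothetical helper, not the netcl kernel. As in most deep-learning libraries, the operation shown is cross-correlation.

```python
def conv2d_im2col(x, w, stride=1, padding=0):
    """x: [C][H][W], w: [O][C][K][K] -> out: [O][H_out][W_out]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    O, K = len(w), len(w[0][0])
    H_out = (H + 2 * padding - K) // stride + 1
    W_out = (W + 2 * padding - K) // stride + 1

    def at(c, i, j):
        # Zero padding: reads outside the image return 0.
        return x[c][i][j] if 0 <= i < H and 0 <= j < W else 0.0

    # im2col: one column of length C*K*K per output position.
    cols = []
    for oi in range(H_out):
        for oj in range(W_out):
            cols.append([
                at(c, oi * stride - padding + ki, oj * stride - padding + kj)
                for c in range(C) for ki in range(K) for kj in range(K)
            ])

    # Flatten each filter to a row of length C*K*K, then multiply rows by columns.
    rows = [[w[o][c][ki][kj] for c in range(C) for ki in range(K) for kj in range(K)]
            for o in range(O)]
    flat = [[sum(a * b for a, b in zip(r, col)) for col in cols] for r in rows]

    # Reshape each flat output row back to H_out x W_out.
    return [[f[i * W_out:(i + 1) * W_out] for i in range(H_out)] for f in flat]


x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]   # C=1, 3x3 image
w = [[[[1.0, 0.0], [0.0, 1.0]]]]                             # O=1, C=1, 2x2 kernel
out = conv2d_im2col(x, w)
padded = conv2d_im2col(x, w, padding=1)
strided = conv2d_im2col(x, w, stride=2)
```

The output spatial size follows (H + 2*padding - K) // stride + 1, so the 3x3 input gives a 2x2 map by default, 4x4 with padding=1, and 1x1 with stride=2.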
nn.MaxPool2d
2D max-pooling layer. Has a backward kernel registered in autograd/ops.py so it is
fully differentiable.
from netcl.nn import MaxPool2d
pool = MaxPool2d(kernel_size=2, stride=2)
nn.BatchNorm2d
Per-channel normalization. In training mode it computes batch statistics; in eval mode it uses the running mean and variance. The fused variant combines BN+ReLU into a single kernel (see fused-ops table).
from netcl.nn import BatchNorm2d
bn = BatchNorm2d(num_features=64, eps=1e-5, momentum=0.1)
After model.eval() is called, bn normalizes with its running_mean / running_var buffers
and no longer updates them from batch statistics.
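The difference between the two modes can be sketched for a single channel. The running-statistics update below, an exponential moving average weighted by momentum, is an assumption borrowed from the common PyTorch-style convention; the source does not spell out netcl's exact rule, and the learnable scale and shift are omitted. BatchNormSketch is a hypothetical class.

```python
import math


class BatchNormSketch:
    """Single-channel batch norm over a flat list of activations (illustrative)."""
    def __init__(self, eps=1e-5, momentum=0.1):
        self.eps, self.momentum = eps, momentum
        self.running_mean, self.running_var = 0.0, 1.0
        self.training = True

    def __call__(self, values):
        if self.training:
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            # Assumed update rule: moving average weighted by momentum.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            # Eval mode: use the stored statistics and leave them untouched.
            mean, var = self.running_mean, self.running_var
        return [(v - mean) / math.sqrt(var + self.eps) for v in values]


bn = BatchNormSketch()
train_out = bn([1.0, 2.0, 3.0, 4.0])    # normalizes with batch mean 2.5, variance 1.25
bn.training = False
stats_before = (bn.running_mean, bn.running_var)
eval_out = bn([1.0, 2.0, 3.0, 4.0])     # normalizes with the running statistics
stats_after = (bn.running_mean, bn.running_var)
```

The same input therefore produces different outputs in the two modes until the running statistics have converged toward the data statistics.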
nn.LayerNorm
Normalizes across the last len(normalized_shape) dimensions of the input. No
running-statistics state.
from netcl.nn import LayerNorm
ln = LayerNorm(normalized_shape=(64,))
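For a one-element normalized_shape, layer normalization reduces to normalizing each row over its last dimension. The sketch below shows that case on nested lists; it omits any learnable scale and shift, and layer_norm_last_dim is a hypothetical helper rather than netcl code.

```python
import math


def layer_norm_last_dim(rows, eps=1e-5):
    """Normalize each row to zero mean and unit variance over its last dimension."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


normed = layer_norm_last_dim([[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]])
```

Because the statistics come from each sample alone, the result does not depend on the batch, which is why no running state is needed.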
nn.Dropout
Stochastic regularization. Inactive when model.eval() is in effect.
from netcl.nn import Dropout
drop = Dropout(p=0.5)
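A minimal sketch of the train/eval switch follows. It assumes the common "inverted dropout" convention, where kept activations are scaled by 1 / (1 - p) during training so that eval mode is a plain pass-through; the source does not confirm which scaling netcl uses. The dropout function here is a hypothetical helper.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Zero each element with probability p; scale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(values)          # eval mode: identity
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)               # seeded for reproducibility
train_out = dropout([1.0] * 8, p=0.5, training=True, rng=rng)
eval_out = dropout([1.0] * 8, p=0.5, training=False)
```

In training mode every output is either 0.0 or 2.0 for p=0.5; in eval mode the input comes back unchanged.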
nn.build_sequential
Builds a Sequential model from a list of layer config dicts. Useful for config-file-driven
architectures or hyperparameter search over layer widths.
from netcl.nn import build_sequential, example_mlp_config
from netcl.core.device import manager
q = manager.default("auto").queue
# Use a built-in config template
config = example_mlp_config(input_dim=784, hidden=256, num_classes=10)
model = build_sequential(q, config)
# Or write your own config
config = [
    {"type": "Linear", "args": {"in_features": 784, "out_features": 256}},
    {"type": "ReLU", "args": {}},
    {"type": "Linear", "args": {"in_features": 256, "out_features": 10}},
]
model = build_sequential(q, config)
build_sequential_from_json(queue, path) does the same from a JSON file.
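One plausible way to turn such a config into layers is a name-to-class registry. The sketch below uses stub classes and a hypothetical build_layers function to show the dispatch pattern; it is not netcl's build_sequential, and the assumption that a JSON file holds the same list-of-dicts structure is inferred from the description above rather than confirmed.

```python
import json


class StubLinear:
    """Placeholder layer that only records its constructor arguments."""
    def __init__(self, in_features, out_features):
        self.in_features, self.out_features = in_features, out_features


class StubReLU:
    """Placeholder activation with no arguments."""


# Hypothetical registry mapping the "type" string to a constructor.
LAYER_REGISTRY = {"Linear": StubLinear, "ReLU": StubReLU}


def build_layers(config):
    layers = []
    for entry in config:
        cls = LAYER_REGISTRY[entry["type"]]          # KeyError on unknown types
        layers.append(cls(**entry.get("args", {})))
    return layers


config = [
    {"type": "Linear", "args": {"in_features": 784, "out_features": 256}},
    {"type": "ReLU", "args": {}},
    {"type": "Linear", "args": {"in_features": 256, "out_features": 10}},
]

# Round-trip through JSON text to mimic loading the same config from a file.
layers = build_layers(json.loads(json.dumps(config)))
```

Keeping the config as plain dicts is what makes it easy to store in JSON and to vary layer widths programmatically during a hyperparameter search.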
nn.ResNet18
Pre-built ResNet-18 model. Forward expects NCHW input. Default num_classes=10
(CIFAR-friendly); override for ImageNet heads.
from netcl.nn import ResNet18
from netcl.core.device import manager
q = manager.default("auto").queue
model = ResNet18(queue=q, num_classes=10)
Internally built from Conv2d, BatchNorm2d, ReLU, and the residual add — all
dispatched to fused kernels where supported (see fused-ops table).
Fused Ops
| Fused function | Composed of | Where it lives |
|---|---|---|
| linear_relu | Linear → ReLU | autograd/ops.py |
| conv2d_relu | Conv2d → ReLU | autograd/ops.py |
| conv2d_bias_relu | Conv2d → Bias → ReLU | autograd/ops.py |
| conv2d_relu_bn | Conv2d → ReLU → BatchNorm2d | autograd/ops.py |
| batch_norm2d_relu | BatchNorm2d → ReLU | autograd/ops.py |
| add_relu | elementwise a + b → ReLU | autograd/ops.py |
| bias_add_relu | Conv → Bias → ReLU | autograd/ops.py |
| matmul_bias_relu | matmul → bias_add → ReLU | autograd/ops.py |
| fused_ops.conv2d_relu_bn | Conv2d → ReLU → BatchNorm2d (op-level) | ops/fused_ops.py |
| fused_ops.batch_norm2d_relu | BatchNorm2d → ReLU (op-level) | ops/fused_ops.py |
The nn.functional wrappers call into autograd/ops.py, so every fused op is fully
differentiable and participates in the Tape.
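Fusion changes how the work is scheduled, not what is computed. The pure-Python sketch below contrasts the add_relu composition done in two passes with a single-pass version; the function names are hypothetical, and the real netcl fusion happens inside OpenCL kernels rather than in Python.

```python
def add(a, b):
    return [x + y for x, y in zip(a, b)]


def relu(v):
    return [max(x, 0.0) for x in v]


def add_relu_unfused(a, b):
    # Two passes: the intermediate sum is materialized before the ReLU.
    return relu(add(a, b))


def add_relu_fused(a, b):
    # One pass: each element is summed and clamped without an intermediate list.
    return [max(x + y, 0.0) for x, y in zip(a, b)]


a = [1.0, -2.0, 0.5, -0.25]
b = [-3.0, 1.0, 0.5, 0.25]
unfused = add_relu_unfused(a, b)
fused = add_relu_fused(a, b)
```

Both paths give the same values; the fused form simply avoids writing and re-reading the intermediate buffer, which is the saving a fused device kernel aims for.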
Initialization (nn.init)
Initializers are in netcl.nn.init and are also re-exported from netcl.nn. They work
in-place on a Tensor parameter.
from netcl.nn import xavier_uniform, kaiming_uniform, constant
xavier_uniform(model.fc1.weight) # Tanh-family activations
kaiming_uniform(model.fc1.weight) # ReLU-family (the default for Linear/Conv2d)
constant(model.bn.weight, 1.0) # any constant fill
| Initializer | Used for |
|---|---|
| kaiming_uniform | Linear.weight, Conv2d.weight (the default). |
| xavier_uniform | Tanh-family activations. |
| constant | Any constant fill. |
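The two uniform schemes differ only in the sampling bound. The sketch below uses the textbook formulas: gain * sqrt(3 / fan_in) for Kaiming and gain * sqrt(6 / (fan_in + fan_out)) for Xavier. The default gains shown (sqrt(2) for ReLU, 1 for Xavier) are assumptions, since the source does not state which gain netcl applies, and all function names here are hypothetical.

```python
import math
import random


def kaiming_uniform_bound(fan_in, gain=math.sqrt(2.0)):
    """Bound for U(-b, b) under Kaiming/He uniform; gain sqrt(2) assumes ReLU."""
    return gain * math.sqrt(3.0 / fan_in)


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Bound for U(-b, b) under Xavier/Glorot uniform."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def fill_uniform(rows, cols, bound, rng):
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
k_bound = kaiming_uniform_bound(fan_in=784)
x_bound = xavier_uniform_bound(fan_in=784, fan_out=256)
weight = fill_uniform(4, 784, k_bound, rng)    # small (out=4, in=784) example
```

For a 784-input layer the Kaiming bound is sqrt(6 / 784), slightly wider than the Xavier bound for a 784 → 256 layer, which reflects the extra variance ReLU needs to compensate for zeroing half its inputs.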
Functional (nn.functional)
Stateless variants of activations and losses. These wrap the same autograd ops that the layer classes call, so they are fully differentiable under a Tape.
from netcl.nn import functional as F
y = F.relu(x)
y = F.sigmoid(x)
loss = F.cross_entropy(logits, targets)
loss = F.mse_loss(pred, target)
loss = F.binary_cross_entropy(pred, target)
| Function | Notes |
|---|---|
| relu, leaky_relu | Activations; both have backward kernels. |
| sigmoid, tanh | Standard activations. |
| cross_entropy | Logits + integer targets. |
| mse_loss | Mean-squared error. |
| binary_cross_entropy | Element-wise BCE on probabilities. |
| binary_cross_entropy_with_logits | BCE on raw logits; the sigmoid is applied internally. |
| l1_loss, smooth_l1_loss | L1 and Huber losses. |
| flatten | Reshape op for use inside a Tape. |
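The "logits" variants exist for numerical stability. The sketch below shows reference formulas in plain Python: cross-entropy via a max-shifted log-sum-exp, and BCE on logits via max(z, 0) - z*y + log(1 + exp(-|z|)). Mean reduction over the batch is an assumption, and these hypothetical helpers are not the netcl implementations.

```python
import math


def cross_entropy(logits, targets):
    """Mean negative log-likelihood for logit rows and integer class targets."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)                                    # shift by max for stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[t]
    return total / len(targets)


def bce_with_logits(logits, targets):
    """Mean BCE on raw logits, written so exp() never overflows."""
    terms = [max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
             for z, y in zip(logits, targets)]
    return sum(terms) / len(terms)


ce = cross_entropy([[0.0, 0.0], [2.0, 0.0]], [0, 0])
bce = bce_with_logits([0.0, 0.0], [1.0, 0.0])
big = bce_with_logits([1000.0], [1.0])    # a naive sigmoid-then-log would be fragile here
```

With zero logits the BCE equals log 2 per element, and the large-logit case evaluates to 0.0 without overflow.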
Putting It Together: a Small Classifier
import numpy as np
import netcl.autograd as ag
from netcl.core.device import manager
from netcl.nn import Linear, ReLU, Sequential, Dropout, cross_entropy
from netcl.optim import Adam
q = manager.default("auto").queue
model = Sequential(
    Linear(q, 784, 256), ReLU(), Dropout(p=0.1),
    Linear(q, 256, 128), ReLU(),
    Linear(q, 128, 10),
)
opt = Adam(model.parameters(), lr=3e-4)
# `loader` is assumed to be an iterable of (x, y) batches defined elsewhere.
for x, y in loader:  # x: (B, 784), y: (B,)
    with ag.Tape() as tape:
        logits = model(x)
        loss = cross_entropy(logits, y)
    tape.backward(loss)
    opt.step()
    opt.zero_grad()
See also
- Tensor — the Tensor that Module wraps.
- ops API — the elementwise, matmul, and conv2d primitives that the layers call into.
- autograd API — the Tape and Node that power the backward pass.
- JIT Compiler — how the elementwise, matmul, and conv kernels are compiled and fused.
- MNIST with MLP — a complete training example that uses Linear, Dropout, and Sequential.
- core API — the DeviceManager and Tensor foundation.
- optim API — the SGD / Adam / AdamW optimizers.