netcl.nn — Modules, Layers, ResNet

netcl.nn is the model-building layer of netcl. It is modeled on PyTorch's torch.nn but tuned for the OpenCL pipeline: every Module is a container of Tensors, every Parameter is a Tensor with requires_grad=True, and the actual computation is dispatched through the ops API and the autograd API to the JIT Compiler.

All public symbols are re-exported from the package root:

from netcl.nn import (
    Module, Linear, Conv2d, BatchNorm2d, ReLU, LeakyReLU, Sigmoid, Tanh,
    Dropout, MaxPool2d, Flatten, LayerNorm, MultiheadAttention,
    TransformerEncoderLayer, Sequential, Parameter,
    MSELoss, L1Loss, SmoothL1Loss, BCELoss, BCEWithLogitsLoss,
    mse_loss, cross_entropy,
    xavier_uniform, kaiming_uniform, constant,
    build_sequential, example_mlp_config, example_cnn_config,
)
# ResNet18 is lazy-loaded to avoid pulling in the full model at import time:
from netcl.nn import ResNet18

Class Hierarchy

Module                          # base class
├── Parameter                   # Tensor wrapper, auto-registered
├── Linear                      # y = x @ W^T + b
├── Conv2d                      # 2D convolution
├── MaxPool2d
├── BatchNorm2d                 # incl. fused BN+ReLU at inference
├── LayerNorm
├── MultiheadAttention
├── TransformerEncoderLayer
├── Dropout
├── Flatten
├── Sigmoid / Tanh / ReLU / LeakyReLU
├── MSELoss / L1Loss / SmoothL1Loss / BCELoss / BCEWithLogitsLoss
├── Sequential                  # ordered container of Modules
└── ResNet18                    # pre-built ResNet model

`nn.Module`

Module is the base class for every layer and model. Subclass it, assign submodules and parameters as attributes in __init__, and implement forward(x). A training loop then calls model(x), which invokes self.forward(x).

from netcl.nn import Module, Linear
import netcl.autograd as ag

class MyNet(Module):
    def __init__(self, queue):
        super().__init__()
        self.fc1 = Linear(queue, 784, 256)
        self.fc2 = Linear(queue, 256, 10)

    def forward(self, x):
        x = ag.relu(self.fc1(x))
        return self.fc2(x)

Method	Purpose
`parameters()`	Returns a list of every `Parameter` (and `requires_grad=True` Tensor) in this module and all submodules.
`train(mode=True)` / `eval()`	Switches training vs. eval mode — controls Dropout and BatchNorm2d behavior.
`state_dict()` / `load_state_dict(d)`	Serialization helpers; implemented on each concrete subclass (Linear, Conv2d, etc.). Not implemented on the base `Module`.
`compile_forward(sample_input)`	Compiles the eval-mode forward for a fixed input shape into a `CompiledForward` callable that skips Python graph construction on inference.
`__call__(x)`	Alias for `self.forward(x)`.
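
The registration behavior described above (attributes assigned in __init__ later show up in parameters()) can be pictured with a small pure-Python sketch. TinyTensor, TinyModule, TinyLinear, and TinyNet are hypothetical stand-ins invented for this illustration; this is not netcl's implementation, only one plausible way such a base class could collect parameters recursively.

```python
# Hypothetical stand-ins for illustration only; none of these are netcl classes.
class TinyTensor:
    def __init__(self, name, requires_grad=False):
        self.name = name
        self.requires_grad = requires_grad


class TinyModule:
    def __init__(self):
        # Create the registries without triggering our own __setattr__.
        object.__setattr__(self, "_params", {})
        object.__setattr__(self, "_children", {})

    def __setattr__(self, key, value):
        # Record trainable tensors and submodules at assignment time.
        if isinstance(value, TinyTensor) and value.requires_grad:
            self._params[key] = value
        elif isinstance(value, TinyModule):
            self._children[key] = value
        object.__setattr__(self, key, value)

    def parameters(self):
        # Own parameters first, then each submodule's, depth-first.
        found = list(self._params.values())
        for child in self._children.values():
            found.extend(child.parameters())
        return found

    def forward(self, x):
        raise NotImplementedError

    def __call__(self, x):
        # Calling the module is the same as calling forward.
        return self.forward(x)


class TinyLinear(TinyModule):
    def __init__(self, name):
        super().__init__()
        self.weight = TinyTensor(name + ".weight", requires_grad=True)
        self.bias = TinyTensor(name + ".bias", requires_grad=True)


class TinyNet(TinyModule):
    def __init__(self):
        super().__init__()
        self.fc1 = TinyLinear("fc1")
        self.fc2 = TinyLinear("fc2")


names = [p.name for p in TinyNet().parameters()]
print(names)
```

The point of the design is that an optimizer only needs the flat list: it never has to know how the model is nested.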

`nn.Parameter`

Parameter is a thin wrapper around a Tensor that is automatically registered with the parent Module when assigned as an attribute. The Tensor it wraps has requires_grad=True by default.

from netcl.nn import Parameter
from netcl.core.device import manager

q = manager.default("auto").queue
w = Parameter.from_shape(q, (10, 784), dtype="float32")
# Once assigned to a Module attribute, `w` appears in module.parameters().

In practice you almost never instantiate Parameter by hand: layer constructors (Linear, Conv2d, BatchNorm2d, …) build the right parameters for you, with initial values taken from nn.init.

`nn.Linear`

from netcl.nn import Linear
layer = Linear(queue, in_features, out_features, bias=True)
# queue is optional — omit it and Linear auto-discovers the default device
layer = Linear(in_features=784, out_features=256)

A fully-connected layer that computes y = x @ W^T + b. W has shape (out_features, in_features); b has shape (out_features,). Both are Parameters: W is initialized with Kaiming uniform (kaiming_uniform, the default listed under nn.init) and b with zeros. The forward pass dispatches to the matmul op (auto-tuned via the KernelSelector and the JIT Compiler) followed by a bias_add.
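
To make the shapes concrete, here is a minimal pure-Python sketch of the same arithmetic on nested lists. linear_forward is a hypothetical helper written for this page, not a netcl function; the real layer runs on OpenCL tensors.

```python
def linear_forward(x, weight, bias):
    """y = x @ W^T + b with x: (B, in), weight: (out, in), bias: (out,)."""
    out = []
    for row in x:
        out.append([
            sum(xi * wi for xi, wi in zip(row, w_row)) + b
            for w_row, b in zip(weight, bias)
        ])
    return out


x = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]   # (B=2, in_features=3)
W = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]    # (out_features=2, in_features=3)
b = [0.0, 1.0]                            # (out_features=2,)

y = linear_forward(x, W, b)
print(y)  # shape (B=2, out_features=2)
```

Each output column corresponds to one row of W, which is why W is stored as (out_features, in_features) and transposed in the formula.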

`nn.Conv2d`

from netcl.nn import Conv2d
layer = Conv2d(
    in_channels, out_channels, kernel_size,
    stride=1, padding=0, dilation=1, groups=1, bias=True,
)

2D convolution over NCHW tensors. Implemented as im2col plus a matmul by default; the optimized variant in ops/conv2d_optimized.py is selected by the KernelSelector when the workload and device warrant it. The 1×1 specialization goes through implicit-GEMM, and 3×3 stride-1 maps to the Winograd fused variant unless NETCL_CONV_WINOGRAD=0.
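
The im2col-plus-matmul idea can be sketched in a few lines of pure Python. This is a simplified illustration under stated assumptions: a single input channel, a single filter, no dilation or groups, and hypothetical helper names (im2col, conv2d_im2col) that are not part of netcl.

```python
def im2col(image, kh, kw, stride=1, padding=0):
    """Unfold a 2D image into one flattened patch per output position."""
    h, w = len(image), len(image[0])
    # Zero-pad the borders.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    cols = []
    for oi in range(out_h):
        for oj in range(out_w):
            top, left = oi * stride, oj * stride
            cols.append([padded[top + di][left + dj]
                         for di in range(kh) for dj in range(kw)])
    return cols, out_h, out_w


def conv2d_im2col(image, kernel, stride=1, padding=0):
    kh, kw = len(kernel), len(kernel[0])
    cols, out_h, out_w = im2col(image, kh, kw, stride, padding)
    flat_kernel = [v for row in kernel for v in row]
    # The "matmul" step: every patch row dotted with the flattened kernel.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in cols]
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0],
          [0.0, -1.0]]

result = conv2d_im2col(image, kernel)
padded_result = conv2d_im2col(image, kernel, padding=1)
print(result)
```

Unfolding trades memory for speed: the patch matrix duplicates overlapping pixels, but the convolution becomes one dense matrix product that a tuned matmul kernel can handle.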

`nn.MaxPool2d`

2D max-pooling layer. Has a backward kernel registered in autograd/ops.py so it is fully differentiable.

from netcl.nn import MaxPool2d
pool = MaxPool2d(kernel_size=2, stride=2)
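
As a rough picture of what "fully differentiable" means for max pooling, the sketch below records where each window's maximum came from and routes the gradient back to that position. It is a pure-Python illustration on a single 2D plane; max_pool2d_forward and max_pool2d_backward are hypothetical names, not the kernels registered in autograd/ops.py.

```python
def max_pool2d_forward(x, k=2, s=2):
    """Return pooled values and the input position of each window's maximum."""
    out_h = (len(x) - k) // s + 1
    out_w = (len(x[0]) - k) // s + 1
    out, argmax = [], []
    for oi in range(out_h):
        out_row, arg_row = [], []
        for oj in range(out_w):
            window = [(x[oi * s + di][oj * s + dj], (oi * s + di, oj * s + dj))
                      for di in range(k) for dj in range(k)]
            value, pos = max(window, key=lambda item: item[0])
            out_row.append(value)
            arg_row.append(pos)
        out.append(out_row)
        argmax.append(arg_row)
    return out, argmax


def max_pool2d_backward(grad_out, argmax, in_shape):
    """Send each output gradient to the input element that won its window."""
    grad_in = [[0.0] * in_shape[1] for _ in range(in_shape[0])]
    for grad_row, arg_row in zip(grad_out, argmax):
        for g, (i, j) in zip(grad_row, arg_row):
            grad_in[i][j] += g
    return grad_in


x = [[1.0, 3.0, 2.0, 0.0],
     [4.0, 2.0, 1.0, 5.0],
     [0.0, 1.0, 7.0, 2.0],
     [2.0, 6.0, 3.0, 1.0]]

pooled, argmax = max_pool2d_forward(x)
grad_x = max_pool2d_backward([[1.0, 1.0], [1.0, 1.0]], argmax, (4, 4))
print(pooled)
```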

`nn.BatchNorm2d`

Per-channel normalization. In training mode it computes batch statistics; in eval mode it uses the running mean and variance. The fused variant combines BN+ReLU into a single kernel (see fused-ops table).

from netcl.nn import BatchNorm2d
bn = BatchNorm2d(num_features=64, eps=1e-5, momentum=0.1)

After model.eval() is called, bn normalizes with its running_mean / running_var buffers and no longer updates them from batch statistics.
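
The difference between the two modes can be sketched for a single channel in pure Python. TinyBatchNorm is a hypothetical stand-in, not netcl's BatchNorm2d. The exponential-moving-average update with momentum follows the PyTorch convention and is an assumption here, and the sketch uses the biased variance everywhere for simplicity.

```python
import math


class TinyBatchNorm:
    """Single-channel stand-in for illustration; not netcl's BatchNorm2d."""

    def __init__(self, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True

    def __call__(self, values):
        if self.training:
            # Training: normalize with batch statistics and update the buffers.
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            m = self.momentum
            self.running_mean = (1.0 - m) * self.running_mean + m * mean
            self.running_var = (1.0 - m) * self.running_var + m * var
        else:
            # Eval: normalize with the stored buffers and leave them untouched.
            mean, var = self.running_mean, self.running_var
        scale = 1.0 / math.sqrt(var + self.eps)
        return [(v - mean) * scale for v in values]


bn = TinyBatchNorm()
train_out = bn([1.0, 2.0, 3.0, 4.0])
stats_after_train = (bn.running_mean, bn.running_var)

bn.training = False
eval_out = bn([1.0, 2.0, 3.0, 4.0])
stats_after_eval = (bn.running_mean, bn.running_var)
```

The same input produces different outputs in the two modes, which is why forgetting to switch modes before inference is a common source of accuracy drops.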

`nn.LayerNorm`

Normalizes across the last len(normalized_shape) dimensions of the input. No running-statistics state.

from netcl.nn import LayerNorm
ln = LayerNorm(normalized_shape=(64,))
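
For a one-dimensional normalized_shape, the computation reduces to normalizing each row by its own mean and variance. The sketch below shows that in pure Python; layer_norm is a hypothetical helper, eps=1e-5 is an assumed default, and any learnable scale or shift is left out.

```python
import math


def layer_norm(row, eps=1e-5):
    """Normalize one feature vector by its own mean and variance."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


batch = [[1.0, 2.0, 3.0],
         [10.0, 20.0, 30.0]]

# Every row is handled independently, so no batch statistics are involved.
normed = [layer_norm(row) for row in batch]
print(normed)
```

Because nothing depends on the rest of the batch, the output is the same in training and eval mode, which is why no running statistics are needed.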

`nn.Dropout`

Stochastic regularization. Inactive when model.eval() is in effect.

from netcl.nn import Dropout
drop = Dropout(p=0.5)
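
A minimal sketch of the train/eval difference, assuming the common "inverted dropout" formulation in which kept activations are scaled by 1 / (1 - p) during training. Whether netcl scales this way is an assumption; dropout here is a hypothetical pure-Python helper.

```python
import random


def dropout(values, p, training, rng):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not training or p == 0.0:
        return list(values)          # eval mode: identity
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
x = [1.0] * 10000

train_out = dropout(x, p=0.5, training=True, rng=rng)
eval_out = dropout(x, p=0.5, training=False, rng=rng)

# The rescaling keeps the expected activation unchanged between modes.
train_mean = sum(train_out) / len(train_out)
```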

`nn.build_sequential`

Builds a Sequential model from a list of layer config dicts. Useful for config-file-driven architectures or hyperparameter search over layer widths.

from netcl.nn import build_sequential, example_mlp_config
from netcl.core.device import manager

q = manager.default("auto").queue

# Use a built-in config template
config = example_mlp_config(input_dim=784, hidden=256, num_classes=10)
model = build_sequential(q, config)

# Or write your own config
config = [
    {"type": "Linear", "args": {"in_features": 784, "out_features": 256}},
    {"type": "ReLU",   "args": {}},
    {"type": "Linear", "args": {"in_features": 256, "out_features": 10}},
]
model = build_sequential(q, config)

build_sequential_from_json(queue, path) does the same from a JSON file.
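
When configs come from files or a search loop, it can help to sanity-check them before building. The sketch below parses a JSON document whose schema is assumed to mirror the dict list above (the actual file format expected by build_sequential_from_json is not specified here) and verifies that consecutive Linear widths agree. check_linear_chain is a hypothetical helper, not part of netcl, and only makes sense for plain MLP-style configs.

```python
import json

# Hypothetical file contents; the schema is assumed to match the dict list.
config_text = """
[
  {"type": "Linear", "args": {"in_features": 784, "out_features": 256}},
  {"type": "ReLU",   "args": {}},
  {"type": "Linear", "args": {"in_features": 256, "out_features": 10}}
]
"""


def check_linear_chain(config):
    """Return (in, out) widths of Linear layers; raise if neighbors disagree."""
    widths = [(layer["args"]["in_features"], layer["args"]["out_features"])
              for layer in config if layer["type"] == "Linear"]
    for (_, prev_out), (next_in, _) in zip(widths, widths[1:]):
        if prev_out != next_in:
            raise ValueError(f"width mismatch: {prev_out} -> {next_in}")
    return widths


config = json.loads(config_text)
widths = check_linear_chain(config)
print(widths)
```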

`nn.ResNet18`

Pre-built ResNet-18 model. Forward expects NCHW input. Default num_classes=10 (CIFAR-friendly); override for ImageNet heads.

from netcl.nn import ResNet18
from netcl.core.device import manager

q = manager.default("auto").queue
model = ResNet18(queue=q, num_classes=10)

Internally, the model is built from Conv2d, BatchNorm2d, ReLU, and residual additions, all dispatched to fused kernels where supported (see the Fused Ops table).

Fused Ops

Fused function	Composed of	Where it lives
`linear_relu`	`Linear` → `ReLU`	`autograd/ops.py`
`conv2d_relu`	`Conv2d` → `ReLU`	`autograd/ops.py`
`conv2d_bias_relu`	`Conv2d` → `Bias` → `ReLU`	`autograd/ops.py`
`conv2d_relu_bn`	`Conv2d` → `ReLU` → `BatchNorm2d`	`autograd/ops.py`
`batch_norm2d_relu`	`BatchNorm2d` → `ReLU`	`autograd/ops.py`
`add_relu`	elementwise `a + b` → `ReLU`	`autograd/ops.py`
`bias_add_relu`	`bias_add` → `ReLU`	`autograd/ops.py`
`matmul_bias_relu`	`matmul` → `bias_add` → `ReLU`	`autograd/ops.py`
`fused_ops.conv2d_relu_bn`	`Conv2d` → `ReLU` → `BatchNorm2d` (op-level)	`ops/fused_ops.py`
`fused_ops.batch_norm2d_relu`	`BatchNorm2d` → `ReLU` (op-level)	`ops/fused_ops.py`

The nn.functional wrappers call into autograd/ops.py, so every fused op is fully differentiable and participates in the Tape.
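
What fusion buys can be seen on the simplest entry in the table, add_relu. The pure-Python sketch below contrasts a two-pass version that materializes an intermediate with a one-pass fused version, and shows a matching backward rule. All three functions are hypothetical illustrations, not the kernels in autograd/ops.py.

```python
def add_then_relu(a, b):
    summed = [x + y for x, y in zip(a, b)]    # pass 1: writes a temporary
    return [max(v, 0.0) for v in summed]      # pass 2: reads it back


def add_relu_fused(a, b):
    # One pass over the data, no intermediate buffer.
    return [max(x + y, 0.0) for x, y in zip(a, b)]


def add_relu_backward(a, b, grad_out):
    # The gradient reaches both inputs wherever the pre-activation sum is positive.
    grad = [g if (x + y) > 0.0 else 0.0 for x, y, g in zip(a, b, grad_out)]
    return grad, list(grad)


a = [1.0, -2.0, 3.0]
b = [-0.5, 1.0, -4.0]

unfused = add_then_relu(a, b)
fused = add_relu_fused(a, b)
grad_a, grad_b = add_relu_backward(a, b, [1.0, 1.0, 1.0])
```

On a GPU the saving comes mostly from memory traffic: the fused form avoids writing the sum out and reading it back in a second kernel launch.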

Initialization (`nn.init`)

Initializers live in netcl.nn.init and are also re-exported from netcl.nn. Each one modifies a Tensor parameter in place.

from netcl.nn import xavier_uniform, kaiming_uniform, constant

xavier_uniform(model.fc1.weight)      # Tanh-family activations
kaiming_uniform(model.fc1.weight)     # ReLU-family (the default for Linear/Conv2d)
constant(model.bn.weight, 1.0)        # any constant fill

Initializer	Used for
`kaiming_uniform`	`Linear.weight`, `Conv2d.weight` (the default).
`xavier_uniform`	Tanh-family activations.
`constant`	Any constant fill.
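
For intuition about the two uniform schemes, the sketch below computes the sampling bounds from the standard He and Glorot formulas. It assumes netcl follows these textbook definitions (with a ReLU gain of sqrt(2) for Kaiming), which this page does not confirm. The helper names are hypothetical, and unlike the in-place netcl initializers this sketch returns a new nested list.

```python
import math
import random


def kaiming_uniform_bound(fan_in, gain=math.sqrt(2.0)):
    # He et al.: U(-bound, bound) with bound = gain * sqrt(3 / fan_in).
    return gain * math.sqrt(3.0 / fan_in)


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # Glorot and Bengio: bound = gain * sqrt(6 / (fan_in + fan_out)).
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def fill_uniform(rows, cols, bound, rng):
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
k_bound = kaiming_uniform_bound(fan_in=784)
x_bound = xavier_uniform_bound(fan_in=784, fan_out=256)

# Weight laid out as (out_features, in_features), matching Linear.
weight = fill_uniform(16, 784, k_bound, rng)
print(round(k_bound, 4), round(x_bound, 4))
```

Kaiming scales only by fan_in and compensates for ReLU zeroing roughly half the activations; Xavier balances fan_in and fan_out, which suits symmetric activations such as tanh.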

Functional (`nn.functional`)

Stateless variants of activations and losses. These wrap the same autograd ops that the layer classes call, so they are fully differentiable under a Tape.

from netcl.nn import functional as F

y    = F.relu(x)
y    = F.sigmoid(x)
loss = F.cross_entropy(logits, targets)
loss = F.mse_loss(pred, target)
loss = F.binary_cross_entropy(pred, target)

Function	Notes
`relu`, `leaky_relu`	Activations; both have backward kernels.
`sigmoid`, `tanh`	Standard activations.
`cross_entropy`	Logits + integer targets.
`mse_loss`	Mean-squared error.
`binary_cross_entropy`	Element-wise BCE.
`binary_cross_entropy_with_logits`	Takes raw logits and applies the sigmoid internally before computing BCE.
`l1_loss`, `smooth_l1_loss`	L1 and Huber losses.
`flatten`	Reshape op for use inside a Tape.
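
The reason a separate with-logits variant usually exists is numerical stability. The sketch below compares the naive composition (sigmoid, then BCE) with the widely used stable closed form max(x, 0) - x*z + log(1 + exp(-|x|)). These are pure-Python helpers for illustration; how netcl computes the loss internally is not stated on this page.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob, target):
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def bce_with_logits(logit, target):
    # Stable form: max(x, 0) - x * z + log(1 + exp(-|x|)).
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# For moderate logits the two routes agree.
naive = bce(sigmoid(2.0), 1.0)
stable = bce_with_logits(2.0, 1.0)

# For extreme logits the naive route would evaluate log(0); the stable form does not.
extreme = bce_with_logits(800.0, 0.0)
```

In practice this means passing raw logits to the with-logits function rather than applying sigmoid yourself and calling binary_cross_entropy.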

Putting It Together: a Small Classifier

import numpy as np
import netcl.autograd as ag
from netcl.core.device import manager
from netcl.nn import Linear, ReLU, Sequential, Dropout, cross_entropy
from netcl.optim import Adam

q = manager.default("auto").queue
model = Sequential(
    Linear(q, 784, 256), ReLU(), Dropout(p=0.1),
    Linear(q, 256, 128), ReLU(),
    Linear(q, 128, 10),
)
opt = Adam(model.parameters(), lr=3e-4)

for x, y in loader:                       # loader: your own iterable of batches (not defined here); x: (B, 784), y: (B,)
    with ag.Tape() as tape:
        logits = model(x)
        loss = cross_entropy(logits, y)
    tape.backward(loss)
    opt.step()
    opt.zero_grad()
