ResNet
Status: Public API
ResNet18 is lazy-exported from netcl.nn. The internal classes (BasicBlock, Bottleneck) and factory functions (resnet34, resnet50, …) are in netcl.nn.resnet but are not part of the public API.
ResNet is the family of residual networks introduced by He et al.
(2015) for ImageNet classification. The defining idea is the
residual connection: instead of learning a mapping H(x), the
block learns a residual F(x) = H(x) - x and the output is
F(x) + x. This makes it easy for the optimizer to learn an
identity mapping (just set F to zero), which is what the very
deep variants need to avoid the degradation problem (deeper
plain networks have higher training error than shallower ones).
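The identity argument above can be made concrete with a toy example. This is a minimal sketch in plain Python, not netcl code: `residual_block`, `zero_residual`, and `small_residual` are hypothetical names, and vectors are plain lists rather than tensors.

```python
def residual_block(x, f):
    """Return F(x) + x for a vector x and a residual function f."""
    fx = f(x)
    return [fi + xi for fi, xi in zip(fx, x)]


def zero_residual(x):
    # Stands in for a block whose weights have all been driven to zero.
    return [0.0 for _ in x]


def small_residual(x):
    # Hypothetical learned residual: a small correction on top of the input.
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)    # F = 0, so the block is the identity
perturbed_out = residual_block(x, small_residual)  # the input plus a small correction
```

With F set to zero the block passes its input through unchanged, so stacking more such blocks cannot make the representable function worse than the shallower network.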
netcl ships a faithful PyTorch-style ResNet implementation under
netcl.nn.resnet. The class hierarchy is:
- BasicBlock: two 3x3 convolutions, used in ResNet-18 and ResNet-34.
- Bottleneck: 1x1, 3x3, 1x1 convolutions, used in ResNet-50 and deeper.
- ResNet: the full network, parameterised by the block class, the list of layer widths, and the number of classes.
- resnet18, resnet34, resnet50, resnet101, resnet152: pre-configured factory functions matching the original paper.
Overview
The macro-architecture is:
conv1 : 7x7, stride 2, output 64
maxpool : 3x3, stride 2
layer1 : N blocks, output 64
layer2 : N blocks, output 128, first block stride 2
layer3 : N blocks, output 256, first block stride 2
layer4 : N blocks, output 512, first block stride 2
avgpool : global average pool
fc : 512 * expansion -> num_classes
For BasicBlock, expansion=1. For Bottleneck, expansion=4: the
third convolution (the final 1x1) multiplies the channel count
by 4, so a Bottleneck layer of width 512 outputs 2048 channels.
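The table above can be checked by tracing channel counts and spatial sizes through the stages. The sketch below is plain Python, not netcl code; it assumes the usual PyTorch-style padding (3 for the 7x7 stem, 1 for the max pool and for the strided 3x3 convolutions), which this page does not state, and `conv_out` and `trace_shapes` are hypothetical helper names.

```python
def conv_out(size, kernel, stride, padding):
    # Standard output-size formula for convolution and pooling windows.
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size=224, expansion=1):
    """Return (stage, output_channels, spatial_size) for each stage."""
    shapes = []
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # conv1 (assumed padding 3)
    shapes.append(("conv1", 64, size))
    size = conv_out(size, kernel=3, stride=2, padding=1)  # maxpool (assumed padding 1)
    shapes.append(("maxpool", 64, size))
    for i, width in enumerate([64, 128, 256, 512], start=1):
        stride = 1 if i == 1 else 2  # only layer2..layer4 downsample
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        shapes.append((f"layer{i}", width * expansion, size))
    shapes.append(("avgpool", 512 * expansion, 1))  # global average pool
    return shapes


for stage, channels, size in trace_shapes(224, expansion=4):
    print(f"{stage:8s} channels={channels:5d} size={size}x{size}")
```

Under these assumptions a 224x224 input reaches layer4 at 7x7, and the width passed to fc is 512 for BasicBlock networks and 2048 for Bottleneck networks.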
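The first block of layer2, layer3, and layer4 applies a strided
1x1 convolution on the shortcut so that the spatial size and the
channel count match the main path. In PyTorch-style Bottleneck
networks the first block of layer1 also needs a 1x1 projection
(with stride 1), because the channel count grows from 64 to 256
there. Every other shortcut is a plain identity.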
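The shortcut rule can be written as a small predicate. This is a sketch under the PyTorch-style convention that a projection is used whenever the stride or the channel count changes; whether netcl applies exactly this test is an assumption, and `needs_projection` and `projection_layers` are hypothetical names.

```python
def needs_projection(in_channels, width, expansion, stride):
    # An identity shortcut only works when neither the spatial size
    # nor the channel count changes across the block.
    return stride != 1 or in_channels != width * expansion


def projection_layers(expansion):
    """List the layers whose first block needs a projection shortcut."""
    in_channels = 64  # channels leaving the stem (conv1 + maxpool)
    result = []
    for i, width in enumerate([64, 128, 256, 512], start=1):
        stride = 1 if i == 1 else 2
        if needs_projection(in_channels, width, expansion, stride):
            result.append(f"layer{i}")
        in_channels = width * expansion  # output of this layer feeds the next
    return result


basic = projection_layers(expansion=1)       # BasicBlock networks
bottleneck = projection_layers(expansion=4)  # Bottleneck networks
```

Under this rule a BasicBlock network projects in layer2 through layer4, while a Bottleneck network also projects in layer1, where the stem's 64 channels have to be matched to the block's 256 outputs.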
Where It Lives
- File path: nn/resnet.py.
- Module path: netcl.nn.resnet.
- Public re-export: from netcl.nn import ResNet18 (lazy-loaded).
How It Works
Each BasicBlock is:
def forward(self, x):
    identity = x
    out = self.bn1(self.conv1(x))
    out = self.relu(out)
    out = self.bn2(self.conv2(out))
    if self.downsample is not None:
        identity = self.downsample(x)
    out = out + identity
    return self.relu(out)
The convolution strategies are selected by the same kernel selector used by Conv2d (im2col, Winograd, etc.). The batch-norm is the fused implementation; the addition is a single elementwise kernel.
The full ResNet forward pass is a sequence of about 50 convolutions
for ResNet-50, with one JIT compiler fusion opportunity per
residual tail (the bn-add-relu chain that ends each block). The
JIT compiler does not currently see the whole ResNet; that is a
job for the pattern-based TrainingGraphCompiler. The per-block
elementwise chains (mostly after the addition) are fused.
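The "about 50" figure can be reproduced by counting. The sketch below uses the ResNet-50 block counts from the original paper (3, 4, 6, 3); the number of projection convolutions (one per layer) assumes PyTorch-style shortcuts, and `count_convs` is a hypothetical helper, not part of netcl.

```python
BLOCKS_PER_LAYER = [3, 4, 6, 3]  # ResNet-50 configuration from the paper
CONVS_PER_BOTTLENECK = 3         # 1x1, 3x3, 1x1


def count_convs(blocks_per_layer, convs_per_block, projections):
    """Count the stem conv, the main-path convs, and the shortcut projections."""
    main_path = sum(blocks_per_layer) * convs_per_block
    return 1 + main_path + projections  # the leading 1 is conv1


main_only = count_convs(BLOCKS_PER_LAYER, CONVS_PER_BOTTLENECK, projections=0)
with_shortcuts = count_convs(BLOCKS_PER_LAYER, CONVS_PER_BOTTLENECK, projections=4)
residual_tails = sum(BLOCKS_PER_LAYER)  # one bn-add-relu tail per block

print(main_only, with_shortcuts, residual_tails)
```

The 49 main-path convolutions plus the fc layer give the 50 weighted layers in the name, and each of the 16 blocks ends in one fusable tail.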
Code Example
from netcl.nn import ResNet18
from netcl.core.device import manager
q = manager.default("auto").queue
# ResNet18 with a CIFAR-10 head (default num_classes=10)
model = ResNet18(queue=q, num_classes=10)
# ImageNet head
model = ResNet18(queue=q, num_classes=1000)
Performance & Trade-offs
- ResNet-50 runs at roughly 60% of the throughput of a hand-tuned cuDNN implementation on the same GPU, because the OpenCL convolution selector is more conservative than cuDNN's autotuner. This is a known gap and the focus of ongoing work.
- The first 7x7 convolution is the most expensive single op in the network on small input sizes. Some practitioners replace it with three 3x3 convolutions for a small accuracy win and a measurable speed-up on small inputs.
- Under AMP, ResNet-50 trains comfortably in fp16; the standard recipe uses lr=0.1 per 256-batch and a cosine schedule.
- The bottleneck variant is faster than the basic variant for the same accuracy on ImageNet; the 1x1 convs are cheap and the 3x3 conv runs on a quarter of the channels.
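The learning-rate recipe in the AMP bullet can be sketched as follows. This reads "lr=0.1 per 256-batch" as linear scaling with batch size and uses a plain cosine decay to zero without warmup; both readings are assumptions, and `scaled_base_lr` and `cosine_lr` are hypothetical helpers rather than netcl functions.

```python
import math


def scaled_base_lr(batch_size, lr_per_256=0.1):
    # Linear scaling: 0.1 at batch size 256, 0.4 at batch size 1024, and so on.
    return lr_per_256 * batch_size / 256


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    # Half-cosine decay from base_lr at step 0 to min_lr at total_steps.
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


base = scaled_base_lr(1024)
schedule = [cosine_lr(step, 100, base) for step in range(0, 101, 25)]
for step, lr in zip(range(0, 101, 25), schedule):
    print(f"step {step:3d}: lr={lr:.4f}")
```

Large-batch runs commonly add a few warmup epochs before the cosine decay; that is left out here to keep the sketch short.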