MLP
Status: Public API in netcl.nn.simple.MLP
MLP (Multi-Layer Perceptron) is a fully-connected feed-forward
neural network. In netcl it lives in netcl.nn.simple and is a
thin wrapper over a sequence of nn.Linear and nn.ReLU (or
other activation) modules, parameterised by the list of layer
widths.
The standard MLP constructor is:
MLP(input_dim, hidden_dim, output_dim, num_hidden_layers=1, activation=ReLU)
which builds a network with num_hidden_layers hidden layers of
width hidden_dim, a ReLU activation between every layer
(including before the final linear), and a final linear layer
to the output dimension.
Overview
A two-hidden-layer MLP, MLP(784, 256, 10, num_hidden_layers=2), expands to:
x = Linear(784, 256)(x)
x = ReLU()(x)
x = Linear(256, 256)(x)
x = ReLU()(x)
x = Linear(256, 10)(x)
The class also exposes a forward(x) method that runs the
forward pass and returns the logits. The forward pass is one
matmul per linear layer, followed by a ReLU on every layer except
the last. Every op is dispatched through the standard op system
and participates in Tape autograd.
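The expansion above follows a simple width rule. As a minimal sketch in plain Python, the layer shapes for a given set of constructor arguments can be enumerated as follows. The helper names mlp_layer_shapes and mlp_param_count are hypothetical and not part of netcl, and the parameter count assumes every Linear carries a bias vector.

```python
def mlp_layer_shapes(input_dim, hidden_dim, output_dim, num_hidden_layers=1):
    """Return the (in_features, out_features) pair of every linear layer."""
    if num_hidden_layers < 1:
        raise ValueError("num_hidden_layers must be at least 1")
    # input width, then num_hidden_layers copies of hidden_dim, then output width
    widths = [input_dim] + [hidden_dim] * num_hidden_layers + [output_dim]
    # consecutive width pairs give one linear layer each
    return list(zip(widths[:-1], widths[1:]))


def mlp_param_count(shapes):
    """Count weights plus biases (assumption: every linear layer has a bias)."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


shapes = mlp_layer_shapes(784, 256, 10, num_hidden_layers=2)
print(shapes)                   # [(784, 256), (256, 256), (256, 10)]
print(mlp_param_count(shapes))  # 269322
```

With num_hidden_layers=1 the same rule yields only two linear layers, (784, 256) and (256, 10).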
Where It Lives
- MLP is not a separate exported class. Build one with
Sequential or build_sequential from netcl.nn.
How It Works
MLP.__init__ constructs an nn.ModuleList of layers, with the
first layer being Linear(input_dim, hidden_dim), the last
layer being Linear(hidden_dim, output_dim), and the
intermediate layers being Linear(hidden_dim, hidden_dim). An
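activation module is inserted after every layer except the last.
The activation argument selects which activation module is used
(ReLU by default). Some users prefer the "pre-activation"
convention of inserting the activation before each layer; the
constructor signature documented here selects the activation type
rather than its placement, so that variant would need to be
assembled by hand, for example with Sequential.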
forward(x) walks the layer list and applies each in order.
The forward is fully differentiable; the parameters are picked
up by the standard model.parameters() call.
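The construction and forward walk described above can be illustrated without netcl. The following is a sketch in plain Python under stated assumptions: make_linear, relu, build_mlp and forward are hypothetical stand-ins, the weights are random, and nothing here reflects netcl's actual implementation, its op dispatch, or its autograd. It only shows the ordering rule: an activation after every linear layer except the last, then a sequential walk.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return a closure computing y = W x + b for a single input vector."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim

    def linear(x):
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

    return linear


def relu(x):
    """Elementwise max(0, v)."""
    return [max(0.0, v) for v in x]


def build_mlp(input_dim, hidden_dim, output_dim, num_hidden_layers=1,
              activation=relu, seed=0):
    """Build the layer list: linear layers with an activation after all but the last."""
    rng = random.Random(seed)
    widths = [input_dim] + [hidden_dim] * num_hidden_layers + [output_dim]
    layers = []
    for i in range(len(widths) - 1):
        layers.append(make_linear(widths[i], widths[i + 1], rng))
        if i < len(widths) - 2:  # no activation after the final linear layer
            layers.append(activation)
    return layers


def forward(layers, x):
    """Walk the layer list and apply each entry in order."""
    for layer in layers:
        x = layer(x)
    return x


layers = build_mlp(4, 8, 3, num_hidden_layers=2)
logits = forward(layers, [1.0, 2.0, 3.0, 4.0])
print(len(layers), len(logits))  # 5 3
```

The layer list has five entries (three linear layers and two activations), and the output has output_dim elements because the last entry is a linear layer with no activation after it.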
Code Example
A minimal MLP for MNIST:
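from netcl.nn import Linear, ReLU, Sequential
from netcl.core.device import manager

# Queue of the default device; it is passed to every Linear layer.
q = manager.default("auto").queue

# 784 -> 256 -> 256 -> 10, with ReLU after each hidden layer.
model = Sequential(
    Linear(q, 784, 256), ReLU(),
    Linear(q, 256, 256), ReLU(),
    Linear(q, 256, 10),
)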
A custom MLP with dropout:
from netcl.nn import Module, Linear, Dropout
import netcl.autograd as ag

class MyMLP(Module):
    def __init__(self, queue):
        super().__init__()
        self.fc1 = Linear(queue, 784, 256)
        self.fc2 = Linear(queue, 256, 10)
        self.dropout = Dropout(0.5)

    def forward(self, x):
        # fc1 -> ReLU -> dropout, then the output layer (no activation on the logits)
        x = self.dropout(ag.relu(self.fc1(x)))
        return self.fc2(x)
Performance & Trade-offs
- MLP is a thin convenience wrapper. There is no perf difference between using MLP and writing the layer list by hand; the wrapper is just less typing.
- The default activation is ReLU. Other activations (GELU, Tanh, Sigmoid) are passed via the activation argument.
- For wide hidden layers (e.g. hidden_dim=4096), the linear layers dominate the runtime and the activation is cheap. For narrow hidden layers (e.g. hidden_dim=16), the activation kernel-launch overhead is the bottleneck; use JIT Compiler to fuse the chain.
- Under AMP, the linear layers run in fp16 and the accumulator is fp32. MLPs are the canonical fp16-friendly model.
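The wide-versus-narrow claim can be made concrete with a back-of-the-envelope count. This is a rough sketch, not a benchmark: linear_flops and activation_ops are hypothetical helpers, the count covers a single sample through one hidden_dim x hidden_dim layer, and it ignores bias adds, memory traffic, and the fixed per-kernel launch cost.

```python
def linear_flops(in_dim, out_dim):
    """FLOPs of one matmul for a single sample: one multiply and one add per weight."""
    return 2 * in_dim * out_dim


def activation_ops(width):
    """One elementwise operation per output element."""
    return width


for hidden_dim in (4096, 16):
    ratio = linear_flops(hidden_dim, hidden_dim) / activation_ops(hidden_dim)
    print(hidden_dim, ratio)
# 4096 8192.0
# 16 32.0
```

The ratio is 2 * hidden_dim, so at hidden_dim=4096 the matmul does thousands of times more arithmetic than the activation, while at hidden_dim=16 both kernels do so little work that a fixed launch cost per kernel, which does not shrink with the width, can account for most of the runtime.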