MLP

Status: Public API in netcl.nn.simple.MLP

MLP (Multi-Layer Perceptron) is a fully-connected feed-forward neural network. In netcl it lives in netcl.nn.simple and is a thin wrapper over a sequence of nn.Linear and nn.ReLU (or other activation) modules, parameterised by the input, hidden, and output widths and the number of hidden layers.

The standard MLP constructor is:

MLP(input_dim, hidden_dim, output_dim, num_hidden_layers=1, activation=ReLU)

which builds a network with num_hidden_layers hidden layers of width hidden_dim, the activation (ReLU by default) after every hidden layer, and a final linear layer to the output dimension. No activation is applied after the final linear layer, so the output is raw logits.

Overview

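The default is one hidden layer. A two-hidden-layer network, MLP(784, 256, 10, num_hidden_layers=2), is equivalent to the following chain (shown schematically, with the queue argument to Linear omitted):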

x = Linear(784, 256)(x)
x = ReLU()(x)
x = Linear(256, 256)(x)
x = ReLU()(x)
x = Linear(256, 10)(x)

The class exposes a forward(x) method that runs the forward pass and returns the logits. The forward pass is one matmul per linear layer, followed by the activation on every layer except the last. All of it is dispatched through the standard op system and participates in Tape autograd.
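
As a quick size check on the 784 -> 256 -> 256 -> 10 network above, the parameter count can be worked out by hand. The sketch below is plain Python and does not use netcl; it assumes each Linear layer carries a bias vector, which this page does not state, and linear_param_count is a hypothetical helper name.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Weights plus (optionally) one bias per output feature."""
    # Assumption: every Linear layer has a bias term.
    return in_features * out_features + (out_features if bias else 0)


# (in_features, out_features) for each linear layer in the chain.
shapes = [(784, 256), (256, 256), (256, 10)]
per_layer = [linear_param_count(i, o) for i, o in shapes]
total = sum(per_layer)

print(per_layer)  # [200960, 65792, 2570]
print(total)      # 269322
```

Almost three quarters of the parameters sit in the first layer, because its input width (784) is much larger than the hidden width.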

Where It Lives

  • The MLP class is in netcl.nn.simple (see Status above). The same network can also be built by hand with Sequential or build_sequential from netcl.nn.

How It Works

MLP.__init__ constructs an nn.ModuleList of layers: the first is Linear(input_dim, hidden_dim), the intermediate layers are Linear(hidden_dim, hidden_dim), and the last is Linear(hidden_dim, output_dim). With num_hidden_layers=1 there are no intermediate layers. An activation module is inserted after every layer except the last. The activation argument selects which activation is used, not where it is placed; users who prefer the "pre-activation" convention of putting the activation before each layer need to build that chain by hand.
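
The layer layout described above can be sketched in plain Python, without netcl. plan_mlp_layers is a hypothetical helper, not part of the library; it only computes the shapes and whether an activation follows each layer. Rejecting num_hidden_layers < 1 is an assumption of this sketch, not documented behaviour.

```python
def plan_mlp_layers(input_dim, hidden_dim, output_dim, num_hidden_layers=1):
    """Return [(in_features, out_features, followed_by_activation), ...]."""
    if num_hidden_layers < 1:
        # Assumption: at least one hidden layer is required.
        raise ValueError("num_hidden_layers must be at least 1")
    # Full list of widths, e.g. [784, 256, 256, 10] for two hidden layers.
    widths = [input_dim] + [hidden_dim] * num_hidden_layers + [output_dim]
    last = len(widths) - 2  # index of the final linear layer
    return [
        (widths[i], widths[i + 1], i != last)  # no activation after the last layer
        for i in range(len(widths) - 1)
    ]


for in_f, out_f, has_act in plan_mlp_layers(784, 256, 10, num_hidden_layers=2):
    print(in_f, out_f, has_act)
```

A network with num_hidden_layers hidden layers therefore has num_hidden_layers + 1 linear layers and num_hidden_layers activations.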

forward(x) walks the layer list and applies each in order. The forward is fully differentiable; the parameters are picked up by the standard model.parameters() call.
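
To make the walk concrete, here is a minimal pure-Python sketch of the same control flow on a tiny 2 -> 2 -> 1 network. It uses nested lists instead of netcl tensors and has no autograd; linear, relu, and mlp_forward are illustrative names, not netcl functions.

```python
def linear(x, weight, bias):
    """y[j] = sum_i x[i] * weight[j][i] + bias[j], for one input vector x."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def mlp_forward(x, layers, activation=relu):
    """Apply each (weight, bias) pair in order; activate after all but the last."""
    for index, (weight, bias) in enumerate(layers):
        x = linear(x, weight, bias)
        if index < len(layers) - 1:
            x = activation(x)
    return x


# Hand-picked weights so the result is easy to check by hand.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer, followed by ReLU
    ([[1.0, 1.0]], [0.5]),                      # output layer, no activation
]
logits = mlp_forward([3.0, 1.0], layers)
print(logits)  # [2.5]
```

The hidden layer produces [2.0, -2.0], ReLU clips that to [2.0, 0.0], and the output layer returns 2.0 + 0.0 + 0.5 = 2.5 without any clipping.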

Code Example

A minimal MLP for MNIST:

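from netcl.nn import Linear, ReLU, Sequential
from netcl.core.device import manager

# Queue of the default device; each Linear takes it as its first argument.
q = manager.default("auto").queue

# 784 -> 256 -> 256 -> 10, with ReLU after every layer except the last.
model = Sequential(
    Linear(q, 784, 256), ReLU(),
    Linear(q, 256, 256), ReLU(),
    Linear(q, 256, 10),
)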

A custom MLP with dropout:

from netcl.nn import Module, Linear, Dropout
import netcl.autograd as ag

class MyMLP(Module):
    def __init__(self, queue):
        super().__init__()
        self.fc1 = Linear(queue, 784, 256)  # input -> hidden
        self.fc2 = Linear(queue, 256, 10)   # hidden -> logits
        self.dropout = Dropout(0.5)

    def forward(self, x):
        # Dropout is applied to the hidden activations only.
        x = self.dropout(ag.relu(self.fc1(x)))
        # No activation on the output: return raw logits.
        return self.fc2(x)

Performance & Trade-offs

  • MLP is a thin convenience wrapper. There is no performance difference between using MLP and writing the layer list by hand; the wrapper only saves typing.
  • The default activation is ReLU. Other activations (GELU, Tanh, Sigmoid) are passed via the activation argument.
  • For wide hidden layers (e.g. hidden_dim=4096), the linear layers dominate the runtime and the activation is cheap. For narrow hidden layers (e.g. hidden_dim=16), the activation kernel-launch overhead is the bottleneck; use JIT Compiler to fuse the chain.
  • Under AMP, the linear layers run in fp16 and the accumulator is fp32. MLPs are the canonical fp16-friendly model.
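
The wide-versus-narrow point can be illustrated with a rough cost model in plain Python. This is a back-of-the-envelope sketch, not a measurement of netcl: it counts a multiply-add as 2 FLOPs, treats the activation as one operation per element, and ignores memory traffic and the fixed per-kernel launch cost. The function names are hypothetical.

```python
def linear_flops(batch, in_features, out_features):
    """FLOPs for a (batch, in) x (in, out) matmul, 2 per multiply-add."""
    return 2 * batch * in_features * out_features


def activation_elements(batch, out_features):
    """An elementwise activation touches each output value once."""
    return batch * out_features


batch = 64
for hidden_dim in (16, 4096):
    ratio = (linear_flops(batch, hidden_dim, hidden_dim)
             / activation_elements(batch, hidden_dim))
    print(hidden_dim, ratio)
# 16 32.0
# 4096 8192.0
```

Under this model the matmul does 2 * hidden_dim times more arithmetic than the activation. At hidden_dim=4096 the activation is negligible; at hidden_dim=16 both kernels do so little work that a fixed launch cost per kernel can outweigh the arithmetic, which is where fusing the chain helps.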

See also

  • MLP — the API page.
  • Linear — the underlying layer type.
  • ReLU — the default activation.
  • BatchNorm — the typical normaliser.
  • JIT Compiler — fuses the activation chain.
  • AMP — fp16 forward / backward.