Bielik Anatomy 🧠 Ep.7 - Assembling a Full LLM from Custom Triton Kernels

This is the seventh episode of the Bielik Anatomy series, and it's the moment of truth: every custom Triton kernel built throughout the series (RMSNorm, MatMul, RoPE, Flash Attention, and SwiGLU) is now wired together into a fully functional Bielik 1.5B Instruct model architecture.

The episode walks through constructing the complete decoder layer step by step, tackles the unexpected challenge of handling bias in linear and activation layers, loads the official pretrained weights from HuggingFace safetensors, and spins up an interactive chat interface to talk with Bielik live.
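For orientation, here is a minimal NumPy sketch of how such a decoder layer might be wired. It is not the episode's Triton code: each helper is a plain reference stand-in for the corresponding kernel. The parameter names (`w_q`, `b_q`, `norm1`, and so on), the `init_params` helper, and the toy sizes are hypothetical. The sketch also assumes, purely for illustration, that only the query/key/value projections carry a bias and that attention is ordinary multi-head attention.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Scale each token vector by the reciprocal of its root-mean-square.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def linear(x, w, b=None):
    # w has shape (in_features, out_features); the bias is optional.
    y = x @ w
    return y if b is None else y + b

def rope(x, base=10000.0):
    # x: (heads, seq, head_dim). Rotate the two halves of each head vector
    # by a position-dependent angle.
    _, seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)
    angles = np.arange(seq)[:, None] * freqs[None, :]  # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def causal_attention(q, k, v):
    # Plain softmax attention with a causal mask (reference for Flash Attention).
    seq, dim = q.shape[1], q.shape[2]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dim)
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

def swiglu_mlp(x, p):
    # SwiGLU: SiLU(gate) * up, followed by the down projection.
    gate = linear(x, p["w_gate"])
    up = linear(x, p["w_up"])
    silu = gate / (1.0 + np.exp(-gate))
    return linear(silu * up, p["w_down"])

def decoder_layer(x, p, n_heads):
    # x: (seq, d_model). Pre-norm layout with two residual connections.
    seq, d_model = x.shape
    head_dim = d_model // n_heads

    def split(t):
        return t.reshape(seq, n_heads, head_dim).transpose(1, 0, 2)

    h = rms_norm(x, p["norm1"])
    # Assumption: bias on Q/K/V only; the output projection has none.
    q = split(linear(h, p["w_q"], p["b_q"]))
    k = split(linear(h, p["w_k"], p["b_k"]))
    v = split(linear(h, p["w_v"], p["b_v"]))
    attn = causal_attention(rope(q), rope(k), v)
    attn = attn.transpose(1, 0, 2).reshape(seq, d_model)
    x = x + linear(attn, p["w_o"])
    return x + swiglu_mlp(rms_norm(x, p["norm2"]), p)

def init_params(d_model, d_ff, seed=0):
    # Hypothetical helper: random toy weights, not the pretrained ones.
    rng = np.random.default_rng(seed)
    shapes = {
        "w_q": (d_model, d_model), "w_k": (d_model, d_model),
        "w_v": (d_model, d_model), "w_o": (d_model, d_model),
        "b_q": (d_model,), "b_k": (d_model,), "b_v": (d_model,),
        "w_gate": (d_model, d_ff), "w_up": (d_model, d_ff),
        "w_down": (d_ff, d_model),
    }
    p = {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}
    p["norm1"] = np.ones(d_model)
    p["norm2"] = np.ones(d_model)
    return p

if __name__ == "__main__":
    params = init_params(d_model=16, d_ff=32)
    tokens = np.random.default_rng(1).normal(size=(5, 16))
    print(decoder_layer(tokens, params, n_heads=4).shape)
```

A reference like this can also serve as a correctness check: feeding the same inputs to each Triton kernel and to its NumPy counterpart and comparing the results within a tolerance helps catch wiring mistakes, such as a missing bias, before the pretrained weights are loaded.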

Key engineering highlights covered:

Resources: