Bielik Anatomy 🧠 Ep.1 - Bielik LM in Triton - Can I Actually Pull This Off?

First episode of the Bielik Anatomy series - implementing the Polish language model Bielik 1.5 (1.6B parameters) from scratch using GPU kernels in OpenAI Triton.

This episode covers the Bielik 1.5 Instruct architecture, Grouped Query Attention (GQA) vs. Multi-Head Attention, SwiGLU activation and RMSNorm, and an introduction to GPU programming in Triton. It also lays out the full roadmap for the 8-episode series including Flash Attention, RoPE, and custom kernels.

Resources:

Polish πŸ‡΅πŸ‡± LLM Bielik v2.5 v2.6 v3.0 - Tools Calling and Structured Output πŸš€

This video (in Polish πŸ‡΅πŸ‡±) explores the new capabilities introduced across Bielik versions 2.5, 2.6, and 3.0 - specifically tools calling and structured output.

The episode walks through running Bielik on free Google Colab with the Unsloth library, explains what tools calling and structured output are and how to use them in practice, and traces the prompt formats used under the hood for both features. A sample application called e-bazarek demonstrates tools calling end-to-end, while a separate example shows practical use cases for structured output.

Resources:

Scaling Polish πŸ‡΅πŸ‡± LLM Bielik with Ray Cluster on Azure ☁️

This video (in Polish πŸ‡΅πŸ‡±) shows how to scale the Polish LLM Bielik across multiple machines simultaneously using Ray Cluster on Azure.

The guide covers preparing an Azure VM image, configuring a Ray Cluster to run LLMs across multiple nodes, launching Bielik and PLLuM on GPU instances in the cloud, and monitoring and optimizing model inference. The full setup is driven by Terraform, with Jupyter Lab as the interactive interface - making distributed LLM inference in the cloud accessible and reproducible.

Resources:

Running Polish πŸ‡΅πŸ‡± LLM Bielik on Azure VM πŸš€ Cheap and Fast AI in the Cloud

This video (in Polish πŸ‡΅πŸ‡±) shows how to run the Polish open-source LLM Bielik in the cloud using an Azure Spot VM - a cost-effective option for those without a local GPU.

The guide covers creating a free Azure trial account, installing Terraform, Azure CLI, and WSL2 on Windows, and automatically provisioning a GPU virtual machine with Terraform. Once the VM is running, it walks through installing NVIDIA CUDA, Docker, and the NVIDIA Container Toolkit, then launching the Bielik model (Q4/Q8 quantization) via mistral.rs. Finally, it demonstrates testing the model over SSH and through a Python API, and securing the endpoint with a Caddy reverse proxy.

Resources:

OnceUponAI πŸ€– Simplify Machine Learning Model Setup with Rust πŸ¦€ and Tauri

OnceUponAI is an app designed to simplify the setup and use of machine learning models - LLMs, embedding models, image generative models, and more. Whether you’re experimenting locally or building a scalable production solution, it streamlines the entire AI workflow.

The video covers setting up OnceUponAI on Linux and other platforms, installing GPU drivers and CUDA libraries, and running models on both CPU and CUDA-accelerated GPU environments. It also walks through running the app in desktop and headless server modes, spawning and interacting with models, and using REST APIs with OIDC security for a secure and scalable AI infrastructure.