First episode of the Bielik Anatomy series - implementing the Polish language model Bielik 1.5 (1.6B parameters) from scratch using GPU kernels in OpenAI Triton.
This episode covers the Bielik 1.5 Instruct architecture, Grouped Query Attention (GQA) vs. Multi-Head Attention, SwiGLU activation and RMSNorm, and an introduction to GPU programming in Triton. It also lays out the full roadmap for the 8-episode series including Flash Attention, RoPE, and custom kernels.
This video (in Polish π΅π±) explores the new capabilities introduced across Bielik versions 2.5, 2.6, and 3.0 - specifically tools calling and structured output.
The episode walks through running Bielik on free Google Colab with the Unsloth library, explains what tools calling and structured output are and how to use them in practice, and traces the prompt formats used under the hood for both features. A sample application called e-bazarek demonstrates tools calling end-to-end, while a separate example shows practical use cases for structured output.
This video (in Polish π΅π±) shows how to scale the Polish LLM Bielik across multiple machines simultaneously using Ray Cluster on Azure.
The guide covers preparing an Azure VM image, configuring a Ray Cluster to run LLMs across multiple nodes, launching Bielik and PLLuM on GPU instances in the cloud, and monitoring and optimizing model inference. The full setup is driven by Terraform, with Jupyter Lab as the interactive interface - making distributed LLM inference in the cloud accessible and reproducible.
This video (in Polish π΅π±) shows how to run the Polish open-source LLM Bielik in the cloud using an Azure Spot VM - a cost-effective option for those without a local GPU.
The guide covers creating a free Azure trial account, installing Terraform, Azure CLI, and WSL2 on Windows, and automatically provisioning a GPU virtual machine with Terraform. Once the VM is running, it walks through installing NVIDIA CUDA, Docker, and the NVIDIA Container Toolkit, then launching the Bielik model (Q4/Q8 quantization) via mistral.rs. Finally, it demonstrates testing the model over SSH and through a Python API, and securing the endpoint with a Caddy reverse proxy.
OnceUponAI is an app designed to simplify the setup and use of machine learning models - LLMs, embedding models, image generative models, and more. Whether youβre experimenting locally or building a scalable production solution, it streamlines the entire AI workflow.
The video covers setting up OnceUponAI on Linux and other platforms, installing GPU drivers and CUDA libraries, and running models on both CPU and CUDA-accelerated GPU environments. It also walks through running the app in desktop and headless server modes, spawning and interacting with models, and using REST APIs with OIDC security for a secure and scalable AI infrastructure.
We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will assume that you are happy with it.Ok