Scaling Polish 🇵🇱 LLM Bielik with Ray Cluster on Azure ☁️

This video (in Polish 🇵🇱) shows how to scale the Polish LLM Bielik across multiple machines simultaneously using Ray Cluster on Azure.

The guide covers preparing an Azure VM image, configuring a Ray Cluster to run LLMs across multiple nodes, launching Bielik and PLLuM on GPU instances in the cloud, and monitoring and optimizing model inference. The full setup is driven by Terraform, with Jupyter Lab as the interactive interface, which makes distributed LLM inference in the cloud accessible and reproducible.
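
Before launching a model across nodes, it can help to confirm that every worker has actually joined the cluster and exposes its GPU. The sketch below is not taken from the video. It is a minimal illustration that assumes a Ray cluster is already running and that the code is executed on one of its nodes, for example from Jupyter Lab. The helper names `summarize_nodes` and `summarize_live_cluster` are hypothetical; `ray.init`, `ray.nodes`, and `ray.shutdown` are standard Ray calls.

```python
from typing import Any, Dict, List


def summarize_nodes(nodes: List[Dict[str, Any]]) -> Dict[str, float]:
    """Count alive nodes and their GPUs from ray.nodes()-style records.

    Each record is expected to carry an "Alive" flag and a "Resources"
    mapping, which is the shape ray.nodes() returns.
    """
    alive = [node for node in nodes if node.get("Alive")]
    gpus = sum(node.get("Resources", {}).get("GPU", 0) for node in alive)
    return {"alive_nodes": len(alive), "gpus": gpus}


def summarize_live_cluster() -> Dict[str, float]:
    """Attach to a running Ray cluster and summarize it (hypothetical helper)."""
    import ray  # imported lazily so summarize_nodes works without Ray installed

    ray.init(address="auto")  # assumes a cluster is reachable from this machine
    try:
        return summarize_nodes(ray.nodes())
    finally:
        ray.shutdown()
```

Calling `summarize_live_cluster()` from a notebook on the cluster should report how many nodes are alive and how many GPUs they offer in total. If the GPU count is lower than expected, a worker has probably not joined or does not see its device.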

Resources: