Transform Your Coding Journey: Interactive Cheat Sheets with LLM Assistance


Cheat sheets are common companions in the journey through programming. They are incredibly helpful, offering quick references.

But what if we could take them a step further? Imagine these cheat sheets not just as static helpers, but as dynamic, interactive guides with the power of large language models. These enhanced cheat sheets don’t just provide information; they interact, they understand, and they assist. Let’s explore how we can make this leap.

Before you continue reading, please watch this short introduction:

As a first step, I built a Vue web application with a responsive cheat sheet layout.

Next, I brought Python into the browser using the Pyodide library. Pyodide is a port of CPython to WebAssembly, which means we can run Python code right in the web browser and integrate live coding examples and real-time feedback into the cheat sheets.

The final, and perhaps the most exciting, step was adding the LLM genie, our digital assistant. Using the mlc-llm library, I embedded large language models into the web application. Currently we can choose and test several models, such as RedPajama, Llama 2 or Mistral. Most importantly, the LLM is designed to run directly in the browser on your device. This means that once the model is downloaded, all processing and interaction happen locally, so performance depends on your device's capabilities. If you want, you can test it on my website:

https://www.onceuponai.dev/

Here, you can test the interactive cheat sheets and challenge the LLM with your code.

Data anonymization with AI


Data anonymization is the process of protecting private or sensitive information by erasing or encrypting identifiers that link an individual to stored data. This method is often used in situations where privacy is necessary, such as when sharing data or making it publicly available. The goal of data anonymization is to make it impossible (or at least very difficult) to identify individuals from the data, while still allowing the data to be useful for analysis and research purposes.

Before you continue reading, please watch this short introduction:

I decided to create a library that makes it simple to anonymize data with high performance, which is why I wrote it in Rust. The library uses three algorithms to anonymize data. The first, Named Entity Recognition (NER), enables the library to identify and anonymize sensitive named entities in your data, such as names, organizations, locations and other personal identifiers.

Here you can use existing models from Hugging Face for different languages, for example:

The models are based on external libraries like PyTorch. To avoid external dependencies, I used the Rust tract library, which is a Rust ONNX implementation.

To use the models, we need to convert them to the ONNX format using the transformers library.

import os
from pathlib import Path

import transformers
from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers.onnx import FeaturesManager

model_id='dslim/bert-base-NER'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

feature='token-classification'

model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
onnx_config = model_onnx_config(model.config)

output_dir = "./dslim"
os.makedirs(output_dir, exist_ok=True)

# export
onnx_inputs, onnx_outputs = transformers.onnx.export(
        preprocessor=tokenizer,
        model=model,
        config=onnx_config,
        opset=13,
        output=Path(output_dir+"/model.onnx")
)

print(onnx_inputs)
print(onnx_outputs)
tokenizer.save_pretrained(output_dir)

Now we are ready to use the NER algorithm. We can simply run the Docker image with a YAML configuration file in which we define an anonymization pipeline.

pipeline:
  - kind: ner
    model_path: ./dslim/model.onnx
    tokenizer_path: ./dslim/tokenizer.json
    token_type_ids_included: true
    id2label:
      "0": ["O", false]
      "1": ["B-MISC", true]
      "2": ["I-MISC", true]
      "3": ["B-PER", true]
      "4": ["I-PER", true]
      "5": ["B-ORG", true]
      "6": ["I-ORG", true]
      "7": ["B-LOC", true]
      "8": ["I-LOC", true]

docker run -it -v $(pwd):/app/ -p 8080:8080 qooba/anonymize-rs server --host 0.0.0.0 --port 8080 --config config.yaml
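One way to read the id2label section above is sketched here in Python. It assumes that the boolean next to each label marks whether entities with that label should be replaced, and it uses hypothetical placeholder names such as PER0. It is an illustration of BIO-tagged replacement, not the library's actual Rust implementation.

```python
# Assumption: the boolean marks labels whose entities should be replaced.
ID2LABEL = {
    0: ("O", False),
    1: ("B-MISC", True), 2: ("I-MISC", True),
    3: ("B-PER", True), 4: ("I-PER", True),
    5: ("B-ORG", True), 6: ("I-ORG", True),
    7: ("B-LOC", True), 8: ("I-LOC", True),
}


def anonymize_tokens(tokens, label_ids, id2label=ID2LABEL):
    """Replace flagged entities with numbered placeholders (hypothetical naming)."""
    out, items, counters = [], {}, {}
    current = None  # (entity type, placeholder) of the entity being extended
    for token, label_id in zip(tokens, label_ids):
        label, replace = id2label[label_id]
        if label == "O" or not replace:
            out.append(token)
            current = None
            continue
        position, entity = label.split("-", 1)
        if position == "I" and current is not None and current[0] == entity:
            # Continuation of the previous entity: extend its original text.
            items[current[1]] += " " + token
            continue
        index = counters.get(entity, 0)
        counters[entity] = index + 1
        placeholder = f"{entity}{index}"
        items[placeholder] = token
        out.append(placeholder)
        current = (entity, placeholder)
    return " ".join(out), items


tokens = ["John", "Smith", "works", "at", "Acme", "in", "Paris"]
labels = [3, 4, 0, 0, 5, 0, 7]
print(anonymize_tokens(tokens, labels))
```

Setting a flag to false in this sketch leaves the corresponding entities in the text untouched.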

For the NER algorithm, we can configure whether each predicted entity is replaced. For the example request below, we receive the anonymized text together with the replaced items.

curl -G "http://localhost:8080/api/anonymize" --data-urlencode "text=I like to eat apples and bananas and plums" -H "accept: application/json" -H "Content-Type: application/json"

Response:

{
    "text": "I like to eat FRUIT_FLASH0 and FRUIT_FLASH1 and FRUIT_REGEX0",
    "items": {
        "FRUIT_FLASH0": "apples",
        "FRUIT_FLASH1": "bananas",
        "FRUIT_REGEX0": "plums"
    }
}

If needed we can deanonymize the data using a separate endpoint.

curl -X POST "http://localhost:8080/api/deanonymize" -H "accept: application/json" -H "Content-Type: application/json" -d '{
    "text": "I like to eat FRUIT_FLASH0 and FRUIT_FLASH1 and FRUIT_REGEX0",
    "items": {
        "FRUIT_FLASH0": "apples",
        "FRUIT_FLASH1": "bananas",
        "FRUIT_REGEX0": "plums"
    }
}'

Response:

{
    "text": "I like to eat apples and bananas and plums"
}
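The round trip above works because the items map stores the original value for every placeholder. A minimal Python sketch of the restore step follows; it is an illustration only, not the library's implementation, and the function name deanonymize is simply chosen to mirror the endpoint.

```python
import re


def deanonymize(text: str, items: dict[str, str]) -> str:
    """Put the original values back in place of their placeholders."""
    if not items:
        return text
    # Try longer placeholders first so FRUIT_FLASH10 is not read as FRUIT_FLASH1.
    keys = sorted(items, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # A single pass ensures restored values are never rewritten again.
    return pattern.sub(lambda match: items[match.group(0)], text)


anonymized = "I like to eat FRUIT_FLASH0 and FRUIT_FLASH1 and FRUIT_REGEX0"
items = {"FRUIT_FLASH0": "apples", "FRUIT_FLASH1": "bananas", "FRUIT_REGEX0": "plums"}
print(deanonymize(anonymized, items))
```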

If we prefer, we can also use the library from Python code: we simply install the package and call it from Python.

We have discussed the first anonymization algorithm, but what if it is not enough? There are two additional methods. The first is the FlashText algorithm, a fast method for searching and replacing words in large datasets, used here to anonymize predefined sensitive information. For FlashText, the keywords can be read from a separate file, in which each line is a keyword, or listed in the keywords section of the configuration.
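To make the idea concrete, here is a simplified keyword matcher in the spirit of FlashText: the keywords are stored in a character trie and the text is scanned once, matching whole words only. This is a sketch, not the library's Rust code; the class name KeywordReplacer and the placeholder numbering are assumptions made for illustration.

```python
def _is_word_char(char):
    return char.isalnum() or char == "_"


class KeywordReplacer:
    """Simplified FlashText-style matcher: a character trie scanned once."""

    _END = "_end_"  # marks the end of a keyword inside the trie

    def __init__(self, keywords):
        self.trie = {}
        for keyword in keywords:
            node = self.trie
            for char in keyword.lower():
                node = node.setdefault(char, {})
            node[self._END] = True

    def replace(self, text, name):
        out, items = [], {}
        i, n = 0, len(text)
        while i < n:
            match_end = None
            # Only start matching at the beginning of a word.
            if i == 0 or not _is_word_char(text[i - 1]):
                node, j = self.trie, i
                while j < n and text[j].lower() in node:
                    node = node[text[j].lower()]
                    j += 1
                    # Accept only keywords that also end at a word boundary.
                    if self._END in node and (j == n or not _is_word_char(text[j])):
                        match_end = j  # keep the longest match
            if match_end is None:
                out.append(text[i])
                i += 1
            else:
                placeholder = f"{name}{len(items)}"
                items[placeholder] = text[i:match_end]
                out.append(placeholder)
                i = match_end
        return "".join(out), items


replacer = KeywordReplacer(["apple", "banana", "plum"])
print(replacer.replace("I like apple and banana, not pineapple", "FRUIT_FLASH"))
```

Because the scan follows the trie character by character, the cost depends on the length of the text rather than on the number of keywords, which is what makes this approach attractive for large keyword lists.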

The last method is a simple regex, where we define patterns whose matches will be replaced.
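The regex method can be sketched in a few lines of Python. The function name regex_anonymize and the placeholder numbering are assumptions for illustration and are not taken from the library.

```python
import re


def regex_anonymize(text, patterns, name):
    """Replace every pattern match with a numbered placeholder."""
    combined = re.compile("|".join(f"(?:{pattern})" for pattern in patterns))
    items = {}

    def _replace(match):
        placeholder = f"{name}{len(items)}"
        items[placeholder] = match.group(0)  # remember the original value
        return placeholder

    return combined.sub(_replace, text), items


patterns = [r"\bapple\w*\b", r"\bbanana\w*\b", r"\bplum\w*\b"]
print(regex_anonymize("I like to eat apples and bananas and plums", patterns, "FRUIT_REGEX"))
```

Unlike whole-word keyword matching, the trailing \w* in these patterns also catches plural forms such as "apples".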

We can combine several methods and build an anonymization pipeline:

pipeline:
  - kind: ner
    model_path: ./dslim/model.onnx
    tokenizer_path: ./dslim/tokenizer.json
    token_type_ids_included: true
    id2label:
      "0": ["O", false]
      "1": ["B-MISC", true]
      "2": ["I-MISC", true]
      "3": ["B-PER", true]
      "4": ["I-PER", true]
      "5": ["B-ORG", true]
      "6": ["I-ORG", true]
      "7": ["B-LOC", true]
      "8": ["I-LOC", true]
  - kind: flashText
    name: FRUIT_FLASH
    file: ./tests/config/fruits.txt
    keywords:
    - apple
    - banana
    - plum
  - kind: regex
    name: FRUIT_REGEX
    file: ./tests/config/fruits_regex.txt
    patterns:
    - \bapple\w*\b
    - \bbanana\w*\b
    - \bplum\w*\b

Remember that the library relies on automated detection mechanisms, and there is no guarantee that it will find all sensitive information. You should always ensure that your data protection measures are comprehensive and multi-layered.

How to use large language models on CPU with Rust?


Large language models are gaining popularity thanks to their impressive capabilities. However, running these models often requires powerful GPUs, which can be a barrier for many developers. LLM, a Rust library developed by the Rustformers GitHub organization, is designed to run several large language models on the CPU, making these tools more accessible.

Before you continue reading, please watch this short introduction:

Currently GGML, a tensor library written in C that provides a foundation for machine learning applications, is used as the LLM backend.

The GGML library uses a technique called model quantization. Model quantization is a process that reduces the precision of the numbers used in a machine learning model. For instance, a model might use 32-bit floating-point numbers in its calculations. Through quantization, these can be reduced to lower-precision formats, such as 16-bit floating-point numbers or even 8-bit integers.
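As an illustration of the idea, the sketch below quantizes a list of floats to 8-bit integers using one shared scale (absmax quantization) and then restores approximate values. It is a generic textbook scheme chosen for clarity, not GGML's actual storage format.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(value) for value in values), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(value / scale) for value in values], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.27]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized)
print(restored)
```

Each weight now needs one byte instead of four, at the cost of a rounding error of at most half the scale per value.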


The GGML library, which the LLM library is built upon, supports a number of different quantization strategies. These include 4-bit, 5-bit and 8-bit quantization. Each of these offers a different trade-off between efficiency and model quality. For instance, 4-bit quantization is more efficient in terms of memory and computational requirements, but it may degrade the model's output quality more than 8-bit quantization does.
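The trade-off can be made tangible with some rough arithmetic. The sketch below estimates the weight storage for a hypothetical 6-billion-parameter model at several bit widths and measures the average rounding error of a simple shared-scale quantizer. It ignores the per-block scale factors that real formats store, so the numbers are approximations rather than GGML file sizes.

```python
def model_size_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB, ignoring any metadata overhead."""
    return n_params * bits_per_weight / 8 / 2**30


def mean_quantization_error(values, bits):
    """Average absolute rounding error of symmetric quantization to `bits` bits."""
    levels = 2 ** (bits - 1) - 1  # 7 for 4-bit, 15 for 5-bit, 127 for 8-bit
    max_abs = max(abs(value) for value in values)
    scale = max_abs / levels if max_abs else 1.0
    restored = [round(value / scale) * scale for value in values]
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


n_params = 6_000_000_000  # hypothetical model size
sample = [i / 100 for i in range(-100, 101)]  # stand-in for real weights
for bits in (32, 8, 5, 4):
    size = model_size_gib(n_params, bits)
    error = 0.0 if bits == 32 else mean_quantization_error(sample, bits)
    print(f"{bits:>2}-bit: {size:6.2f} GiB, mean error {error:.4f}")
```

Under these assumptions the weights shrink from roughly 22 GiB at 32 bits to under 3 GiB at 4 bits, while the rounding error grows as the bit width drops.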


LLM supports a variety of large language models, including:

  • Bloom
  • GPT-2
  • GPT-J
  • GPT-NeoX
  • Llama
  • MPT

The models need to be converted into a form readable by the GGML library, but thanks to the authors you can find ready-to-use models on Hugging Face.

To test it, you can install the llm-cli package. Then you can chat with the model in the console.

cargo install llm-cli --git https://github.com/rustformers/llm

llm gptj infer -m ./gpt4all-j-q4_0-ggjt.bin -p "Rust is a cool programming language because"

To talk to the model over HTTP, I used the actix web server to build a REST API. The API exposes an endpoint that returns the response asynchronously.

The solution is completed with a simple UI.

To run it you need to clone the repository.

git clone https://github.com/qooba/llm-ui.git

Download the selected model from Hugging Face.


curl -LO https://huggingface.co/rustformers/gpt4all-j-ggml/resolve/main/gpt4all-j-q4_0-ggjt.bin

In our case we will use the gpt4all-j model with 4-bit quantization.

Finally, we use cargo run in release mode with additional arguments: host, port, model type and the path to the model.

cargo run --release -- --host 0.0.0.0 --port 8089 gptj ./gpt4all-j-q4_0-ggjt.bin


Now we are ready to call the REST API or talk with the model using the UI.

Unleash the Power of AI on Your Laptop with GPT4All


The world of artificial intelligence (AI) has seen significant advancements in recent years, with OpenAI’s GPT-4 being one of the most groundbreaking language models to date. However, running large language models usually requires high-end GPUs and expensive hardware, which puts them out of reach for many users. That’s where GPT4All comes into play! In this guide, we’ll introduce you to GPT4All, an optimized AI model that runs smoothly on your laptop using just your CPU.

Before you continue reading, please watch this short introduction:

GPT4All was trained on a massive, curated corpus of assistant interactions, covering a diverse range of tasks and scenarios. This includes word problems, story descriptions, multi-turn dialogues, and even code.

The authors have shared the data and the code used to train the model: https://github.com/nomic-ai/gpt4all. They have also prepared a technical report which describes all the details.

In the first stage, the authors collected roughly one million prompt-response pairs using the GPT-3.5-Turbo OpenAI API. Then they cleaned and curated the data using the Atlas project.


Finally, the released model was trained using the Low-Rank Adaptation (LoRA) approach, which reduces the number of trainable parameters and the required resources.
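The saving from Low-Rank Adaptation follows from simple arithmetic: instead of updating a full d × k weight matrix, only two small matrices of shapes d × r and r × k are trained. The sketch below compares the two counts for hypothetical dimensions (d = k = 4096, r = 8); these values are illustrative and are not taken from the GPT4All technical report.

```python
def full_finetune_params(d, k):
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d, k, r):
    """Trainable parameters for a rank-r update B (d x r) times A (r x k)."""
    return r * (d + k)


d, k, r = 4096, 4096, 8  # hypothetical layer size and rank
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With these numbers, the low-rank update trains 65,536 parameters instead of 16,777,216 for that single matrix.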

The authors have shared an awesome library which automatically downloads the model and exposes a simple Python API as well as a console interface.

To simplify interactions, I have added a simple web UI: https://github.com/qooba/gpt4all-ui

To install it, clone the repository and install the requirements. Then you are ready to run the app, open http://localhost:8000/ and start prompting.

git clone https://github.com/qooba/gpt4all-ui.git
cd gpt4all-ui
pip install -r requirements.txt

uvicorn main:app --reload


Now you are ready to run GPT4All on your everyday laptop, without expensive hardware or high-end GPUs, and prompt it in the browser.

Discover a Delicious Way to Use Delta Lake! Yummy Delta - #1 Introduction


Delta Lake is an open source storage framework for building lakehouse architectures on top of data lakes.

Additionally, it brings reliability to data lakes with features like ACID transactions, scalable metadata handling, schema enforcement, time travel and many more.

Before you continue reading, please watch this short introduction:

Delta Lake can be used with compute engines like Spark, Flink, Presto, Trino and Hive. It also has APIs for Scala, Java, Rust, Ruby and Python.


To simplify integrations with Delta Lake, I have built a REST API layer called Yummy Delta.

It abstracts multiple Delta Lake tables and provides operations such as creating a new Delta table, writing and querying, as well as optimizing and vacuuming. I have coded the overall solution in Rust, based on the delta-rs project.

Delta Lake keeps the data in Parquet files, an open source, column-oriented data file format.

Additionally, it writes the metadata to the transaction log: JSON files containing information about all performed operations.

The transaction log is stored in the table's _delta_log subdirectory.


For example, every data write creates a new Parquet file. After the data write is done, a new transaction log file is created, which completes the transaction. Update and delete operations are conducted in a similar way. When we read data from Delta Lake, the transaction log files are read first, and then the appropriate Parquet files are loaded according to the transaction data.

Thanks to this mechanism, Delta Lake guarantees ACID transactions.
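The write and read paths described above can be mimicked with a small simulation. The sketch below records "add" and "remove" actions in numbered JSON files under _delta_log and then replays them to find the files that make up the current table. It is heavily simplified: real log entries carry more fields (sizes, statistics, schema, protocol versions), no actual Parquet data is written here, and the helper names commit and live_files are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def commit(table: Path, version: int, actions: list[dict]) -> None:
    """Finish a transaction by writing one new log file for `version`."""
    log_dir = table / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"{version:020d}.json"
    # Mode "x" fails if the file exists, so a version can be committed only once.
    with open(log_file, "x") as handle:
        for action in actions:
            handle.write(json.dumps(action) + "\n")


def live_files(table: Path) -> list[str]:
    """Replay the log in version order to find the current data files."""
    files = set()
    for log_file in sorted((table / "_delta_log").glob("*.json")):
        for line in log_file.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "events"
    commit(table, 0, [{"add": {"path": "part-0000.parquet"}}])
    commit(table, 1, [{"add": {"path": "part-0001.parquet"}}])
    # An update rewrites a file: remove the old one and add its replacement.
    commit(table, 2, [{"remove": {"path": "part-0000.parquet"}},
                      {"add": {"path": "part-0002.parquet"}}])
    print(live_files(table))
```

A reader that only trusts committed log files never sees a half-written data file, which is the intuition behind the atomicity of the approach.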

There are several Delta Lake integrations, and one of them is the delta-rs Rust library.

Currently, the delta-rs implementation supports multiple storage backends, including the local filesystem, AWS S3, Azure Blob Storage, Azure Data Lake Storage Gen2 and Google Cloud Storage.

To manage multiple Delta tables on multiple stores, I have built the Yummy Delta server, which exposes a REST API.


Using the API we can list and create Delta tables, inspect a Delta table's schema, append or overwrite data in Delta tables, and run additional operations like optimize or vacuum.

You can find API reference here: https://www.yummyml.com/delta

Moreover, we can query the data using DataFusion SQL. Query results are returned as a stream, so we can process them in batches.

You can simply install Yummy delta as a python package:

pip3 install yummy[delta]

Then we need to prepare a config file:

stores:
  - name: local
    path: "/tmp/delta-test-1/"
  - name: az
    path: "az://delta-test-1/"

And you are ready to run the server from the command line:

yummy delta server -h 0.0.0.0 -p 8080 -f config.yaml

Now we are able to perform all operations using the REST API.