In this article I’d like to show how to predict video matte using machine learning model.
Before you will continue reading please watch short introduction:
In the previous article I have shown how to cut the background from the image:
AI Scissors – sharp cut with neural networks.
This time we will generate matte for video without green box using machine learning model.
Video matting, is a technique which helps to separate video into two or more layers, for example foreground and background.
Using this method, we generate alpha matte, which determine the boundaries between the layers,
and allows for example to substitute the background.
Nowadays these methods, are widely used in video conference software, and probably you know it very well.
But is it possible, to process 4K video and generate a high resolution alpha matte, without green screen props ?
Following the article: arxiv 2108.11515 we can achieve this using:
“The Robust High-Resolution Video Matting with Temporal Guidance method”.
The authors, have used recurrent architecture to exploit temporal information. Thus the model predictions,
are more coherent and this improves matting robustness.
Moreover, their proposed new training strategy, where they use both matting (VideoMatte240K, Distinctions-646, Adobe Image Matting)
and segmentation datasets (YouTubeVIS, COCO).
This mixture helps to achieve better quality, for complex datasets and prevents overfitting.
Neural network architecture, consists of three elements.
The first element is Feature-Extraction Encoder, which extracts individual frames features, especially accurately locating human subjects. The encoder, is based on the MobileNetV3-Large backbone.
The second element is Recurrent Decoder, that aggregates temporal information. Recurrent approach helps to learn, what information to keep and forget by itself, on a continuous stream of video.
And Finally Deep Guided Filter module for high-resolution upsampling.
Because the authors shared their work and models, I have prepared an easy to use docker based application which we can use to simply process your video.
To run it you will need docker and you can run it with GPU or without GPU card.
With GPU:
docker run -it--gpus all -p 8000:8000 --rm--name aimatting qooba/aimatting:robust
Without GPU:
docker run -it-p 8000:8000 --rm--name aimatting qooba/aimatting:robust
Then open address http://localhost:8000/ in your browser.
Because the model does not require any auxiliary inputs such as a trimap or a pre-captured background image we simply upload our video and choose required the background. Currently we can generate green screen background which can be then replaced in the video editing software.
We can also use predefined color, image or even video.
I have also prepared the app for the older algorithm version:
arxiv 2012.07810
To use please run:
docker run -it--gpus all -p 8000:8000 --rm--name aimatting qooba/aimatting:background
This version additionally requires the background image but sometimes achieves better results.
In this article I’d like to present a really delicious Feast extension Yummy.
Before you will continue reading please watch short introduction:
Last time I showed the Feast integration with the Dask
framework which helps to distribute ML solutions across the cluster
but doesn’t solve other problems.
Currently in Feast we have a warehouse based approach where Feast builds
and executes query appropriate for specific database engines.
Because of this architecture Feast can’t use multiple data sources
at the same time.
Moreover the logic which fetch historical features from offline data sources
is duplicated for every datasource implementation which makes it difficult to
maintain.
To solve this problems I have decided to create
Yummy
Feast extension, which is also published
as a pypi package.
In Yummy I have used a backend based approach which centralizes the
logic which fetches historical data from offline stores.
Currently: Spark, Dask,
Ray and Polars
backends are supported.
Moreover because the selected backend is responsible for joining the data we can use
multiple different data sources at the same time.
Additionally with Yummy we can start using a feature store on a single machine and then
distribute it using the selected cluster type.
We can also use ready to use platforms like: Databricks,
Coiled, Anyscale to scale our solution.
To use Yummy we have to install it:
pip install yummy
Then we have to prepare Feast configuration feature_store.yaml:
In this case we will use s3 as a feature store registry and redis as an online store.
The Yummy takes offline store responsibility and in this case we have selected
dask backend.
For dask, ray and polars backends we don’t have to set up the cluster to
work. In this case if we don’t provide cluster configuration they will run
locally. For Apache Spark additional configuration is required for local machines.
In the next step we need to provide feature store definition in the python file eg.
features.py
In this article I will show how we combine Feast and Dask library to create distributed feature store.
Before you will continue reading please watch short introduction:
The Feature Store is very important component of the MLops process which helps to manage historical and online features. With the Feast we can for example read historical features from the parquet files and then materialize them to the Redis as a online store.
But what to do if historical data size exceeds our machine capabilities ? The Dask library can help to solve this problem. Using Dask we can distribute the data and calculations across multiple machines. The Dask can be run on the single machine or on the cluster (k8s, yarn, cloud, HPC, SSH, manual setup). We can start with the single machine and then smoothly pass to the cluster if needed. Moreover thanks to the Dask we can read bunch of parquets using path pattern and evaluate distributed training using libraries like scikit-learn or XGBoost
I have prepared ready to use docker image thus you can simply reproduce all steps.
docker run --name feast -d--rm-p 8888:8888 -p 8787:8787 qooba/feast:dask
Then check the Jupyter notebook token which you will need to login:
But with the docker you will have the whole environment ready.
In the notebook you will can find all the steps:
Random data generation
I have used numpy and scikit-learn to generate 1M entities end historical data (10 features generated with make_hastie_10_2 function) for 14 days which I save as a parquet file (1.34GB).
Feast configuration and registry
feature_store.yaml - where I use local registry and Sqlite database as a online store.
features.py - with one file source (generate parquet) and features definition.
The create the Feast registry we have to run:
feast apply
Additionally I have created simple library which helps to inspect feast schema directly in the Jupyter notebook
Using Dask dataframe we can continue distributed training with the distributed data.
On the other hand if we will use Pandas dataframe the data will be computed to the one node.
To start distributed training with scikit-learn we can use Joblib library with the dask backend:
In this article I will show how to measure comments toxicity using Machine Learning models.
Before you will continue reading please watch short introduction:
Hate, rude and toxic comments are common problem in the internet which affects many people.
Today, we will prepare neural network,
which detects comments toxicity,
directly in the browser.
The goal is to create solution which will detect toxicity in the realtime and warn the user during writing,
which can discourage from writing toxic comments.
To do this, we will train the tensorflow lite model,
which will run in the browser using WebAssembly backend.
The WebAssembly (WASM) allows running C, C++ or RUST code at native speed.
Thanks to this, prediction performance will be better than running it using javascript tensorflowjs version.
Moreover, we can serve the model, on the static page, with no additional backend servers required.
Our model, will only classify, if the text is toxic, or not. Thus we need to start with preprocessing training data. Then we will use the tensorflow lite model maker library.
We will also use the Averaging Word Embedding specification which will create words embeddings and dictionary mappings using training data thus we can train the model in the different languages.
The Averaging Word Embedding specification based model will be small <1MB.
If we have small dataset we can use the pretrained embeddings. We can choose MobileBERT or BERT-Base specification.
In this case models will much more bigger 25MB w/ quantization 100MB w/o quantization for
MobileBERT and 300MB for BERT-Base (based on tutorial )
Using simple model architecture (Averaging Word Embedding), we can achieve about nighty five percent accuracy, and small model size, appropriate
for the web browser, and web assembly.
Now, let’s prepare the non-toxic forum web application,
where we can write the comments.
When we write non-toxic comments, the model won’t block it.
On the other hand, the toxic comments will be blocked,
and the user warned.
Of course, this is only client side validation, which can discourage users,
from writing toxic comments.
To run the example simply clone git repository and run simple server to serve the static page:
git clone https://github.com/qooba/ai-toxicless-texts.git
cd ai-toxicless-texts
python3 -m http.server
In this article I will show how we can extract music sources: bass, drums, vocals and other accompaniments using neural networks.
Before you will continue reading please watch short introduction:
Separation of individual instruments from arranged music is another area where machine learning
algorithms could help. Demucs solves this problem using neural networks.
The trained model (https://arxiv.org/pdf/1909.01174v1.pdf) use U-NET architecture which contains two parts encoder and decoder.
On the encoder input we put the original track and after processing we get bass, drums, vocals and other accompaniments at the decoder output.
The encoder, is connected to the decoder,
through additional LSTM layer,
as well as residual connections between subsequent layers.
Ok, we have neural network architecture but what about the training data ?
This is another difficulty which can be handled by the unlabeled data remixing pipeline.
We start with another classifier, which can find the parts of music,
which do not contain the specific instruments, for example drums.
Then, we mix it with well known drums signal, and separate the tracks
using the model.
Now we can compare, the separation results, with known drums track and mixture of other instruments.
According to this, we can calculate the loss (L1 loss), and use it during the training.
Additionally, we set different loss weights, for known track and the other.
The whole UI is kept in the docker image thus you can simply try it:
#for CPU
docker run --name aiaudioseparation -it-p 8000:8000 -v$(pwd)/checkpoints:/root/.cache/torch/hub/checkpoints --rm qooba/aimusicseparation
#for GPU
docker run --name aiaudioseparation --gpus all -it-p 8000:8000 -v$(pwd)/checkpoints:/root/.cache/torch/hub/checkpoints --rm qooba/aimusicseparation
We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will assume that you are happy with it.Ok