AI Scissors – sharp cut with neural networks
10 Sep 2020
Cutting out photo backgrounds is one of the most tedious graphical tasks. In this article I will show how to simplify it using neural networks.
I will use U[latex]^2[/latex]-Net networks, which are described in detail in the arXiv article, and the Python library rembg to create a ready-to-use drag-and-drop web application that you can run from a Docker image.
The project code is available on my GitHub: https://github.com/qooba/aiscissors
You can also use the ready-made Docker image: https://hub.docker.com/repository/docker/qooba/aiscissors
Before you continue reading, please watch the quick introduction:
Neural network
To remove the image background correctly, we need to select the most visually attractive objects in the image, a task known as Salient Object Detection (SOD). To combine low memory and computation cost with results that are competitive with state-of-the-art methods, I will use the novel U[latex]^2[/latex]-Net architecture.
U-Net convolutional networks have a characteristic U shape with a symmetric encoder-decoder structure. At each encoding stage the feature maps are downsampled (torch.nn.MaxPool2d), and at each decoding stage they are upsampled (torch.nn.functional.upsample, which newer PyTorch releases deprecate in favor of torch.nn.functional.interpolate). The downsampled features are passed across the U and concatenated with the upsampled features through skip connections.
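The downsample, upsample and concatenate steps can be sketched at the level of array shapes. This is only a minimal illustration in NumPy, not the PyTorch model: the convolution layers are left out, and the helper names max_pool_2x2, upsample_2x and unet_level are made up for this example.

```python
import numpy as np


def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Downsample a (channels, height, width) array by taking 2x2 maxima."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample_2x(x: np.ndarray) -> np.ndarray:
    """Upsample by repeating every pixel twice along height and width."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def unet_level(x: np.ndarray) -> np.ndarray:
    """One encoder/decoder level: downsample, upsample, concatenate."""
    skip = x                          # encoder features kept for the decoder
    bottom = max_pool_2x2(x)          # encoder step: halve the resolution
    restored = upsample_2x(bottom)    # decoder step: back to input resolution
    # Stack the upsampled features and the saved encoder features along
    # the channel axis (a real U-Net applies convolutions around this).
    return np.concatenate([restored, skip], axis=0)


features = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
merged = unet_level(features)
print(merged.shape)  # (4, 4, 4): 2 upsampled channels + 2 encoder channels
```

The channel count doubles after the concatenation, which is why the decoder convolutions in a real U-Net take twice as many input channels as the matching encoder stage produces.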
The U[latex]^2[/latex]-Net network uses a two-level nested U-structure: the main architecture is a U-Net-like encoder-decoder, and each of its stages contains a residual U-block. Each residual U-block repeats the downsampling/upsampling procedure internally, and these steps are also linked by residual connections.
The nested U-structure extracts and aggregates features at each level, which makes it possible to capture local and global information from both shallow and deep layers.
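The idea of a residual U-block can be sketched in the same shape-level way: an inner U-shaped pass whose result is added back to the block input. This is a simplified illustration, not the U2NET definition from the repository. All convolutions are omitted, a plain average stands in for the "concatenate and convolve" fusion step, and the names inner_u and residual_u_block are hypothetical.

```python
import numpy as np


def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Downsample a (channels, height, width) array by taking 2x2 maxima."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample_2x(x: np.ndarray) -> np.ndarray:
    """Upsample by repeating every pixel twice along height and width."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def inner_u(x: np.ndarray, depth: int) -> np.ndarray:
    """Go down `depth` levels, come back up, and fuse every level."""
    if depth == 0:
        return x
    deeper = inner_u(max_pool_2x2(x), depth - 1)
    restored = upsample_2x(deeper)
    # Assumption: averaging replaces the concatenation + convolution
    # that a real block would use to fuse the two feature maps.
    return 0.5 * (restored + x)


def residual_u_block(x: np.ndarray, depth: int) -> np.ndarray:
    """Inner U-shaped pass plus a residual connection to the input."""
    return inner_u(x, depth) + x


# Height and width must be divisible by 2 ** depth.
block_input = np.ones((1, 8, 8), dtype=np.float32)
block_output = residual_u_block(block_input, depth=3)
print(block_output.shape)  # (1, 8, 8): resolution is preserved
```

Because the block keeps the input resolution, it can be dropped into every stage of an outer encoder-decoder, which is what gives the nested, two-level structure.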
The U[latex]^2[/latex]-Net architecture is described in detail in the arXiv article. We can also go through the PyTorch model definitions of U2NET and U2NETP.
Additionally, the authors shared the pretrained models: U2NET (176.3 MB) and U2NETP (4.7 MB).
The lighter U2NETP version is only 4.7 MB, so it can be used in mobile applications.
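To get a feeling for what these file sizes mean, the number of weights can be estimated from the size alone. This is back-of-the-envelope arithmetic under two assumptions of mine: the weights are stored as 32-bit floats (4 bytes each) and the file contains little besides the weights. The function name is made up for this example.

```python
def estimate_parameters_millions(size_mb: float, bytes_per_weight: int = 4) -> float:
    """Rough parameter count, in millions, for a weight file of `size_mb` MB."""
    # size_mb * 1e6 bytes / bytes_per_weight / 1e6 simplifies to this:
    return size_mb / bytes_per_weight


model_sizes_mb = {"U2NET": 176.3, "U2NETP": 4.7}

for name, size_mb in model_sizes_mb.items():
    millions = estimate_parameters_millions(size_mb)
    print(f"{name}: {size_mb} MB, roughly {millions:.1f} million weights")

ratio = model_sizes_mb["U2NET"] / model_sizes_mb["U2NETP"]
print(f"U2NETP is about {ratio:.1f} times smaller")
```

Under these assumptions the full model has on the order of tens of millions of weights, while the light version has roughly one million.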
Web application
The neural network is wrapped by the rembg library, which automatically downloads the pretrained networks and provides a simple Python API. To simplify usage, I decided to create a drag-and-drop web application (https://github.com/qooba/aiscissors).
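For reference, calling rembg directly from Python can look roughly like the sketch below. It assumes a recent rembg release, where remove is importable from the package top level and accepts and returns image bytes; the version available when this article was written may expose it under a different import path. The helper names and the _nobg suffix are my own choices for the example.

```python
from pathlib import Path


def output_path_for(input_path: str) -> Path:
    """Place the result next to the input, e.g. photo.jpg -> photo_nobg.png."""
    src = Path(input_path)
    return src.with_name(src.stem + "_nobg.png")


def remove_background(input_path: str) -> Path:
    """Read an image, strip its background with rembg, and save a PNG."""
    # Imported lazily so this module loads even when rembg is not installed.
    # Assumption: recent rembg releases expose `remove` at the top level.
    from rembg import remove

    dst = output_path_for(input_path)
    # `remove` takes the encoded image bytes and returns PNG bytes
    # with a transparent background.
    dst.write_bytes(remove(Path(input_path).read_bytes()))
    return dst
```

Calling remove_background("photo.jpg") would then write photo_nobg.png next to the original. The first call may take longer because the pretrained model has to be downloaded.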
In the application you can drag and drop an image and then compare the versions with and without the background side by side.
You can simply run the application using the Docker image:
docker run --name aiscissors -d -p 8000:8000 --rm -v $(pwd)/u2net_models:/root/.u2net qooba/aiscissors
If you have a GPU, you can use it:
docker run --gpus all --name aiscissors -d -p 8000:8000 --rm -v $(pwd)/u2net_models:/root/.u2net qooba/aiscissors
To use the GPU, additional NVIDIA drivers (included in the NVIDIA CUDA Toolkit) are needed on the host. Docker's --gpus flag also relies on the NVIDIA Container Toolkit being installed.
When the container starts, the pretrained models are downloaded, so I mount the local directory u2net_models to /root/.u2net to avoid downloading them each time I run the container.
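If you prefer to start the container from a script, the two commands above can be assembled in Python. This is only a convenience sketch: the function name and its parameters are made up here, while the flags, the image name and the /root/.u2net mount target are the ones from the commands in this article.

```python
import os
import shlex


def build_docker_command(models_dir: str, use_gpu: bool = False, port: int = 8000) -> list[str]:
    """Assemble the `docker run` call for the aiscissors image."""
    command = ["docker", "run"]
    if use_gpu:
        command += ["--gpus", "all"]  # same flag as in the GPU variant
    command += [
        "--name", "aiscissors",
        "-d",
        "-p", f"{port}:8000",
        "--rm",
        # Bind mounts need an absolute host path, like $(pwd)/u2net_models.
        "-v", f"{os.path.abspath(models_dir)}:/root/.u2net",
        "qooba/aiscissors",
    ]
    return command


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(build_docker_command("u2net_models")))
print(shlex.join(build_docker_command("u2net_models", use_gpu=True)))
```

To actually start the container, pass the returned list to subprocess.run(command, check=True).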
References
https://arxiv.org/pdf/2005.09007.pdf
https://github.com/NathanUA/U-2-Net
https://github.com/danielgatis/rembg
Qin, Xuebin; Zhang, Zichen; Huang, Chenyang; Dehghan, Masood; Zaiane, Osmar; Jagersand, Martin. U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection. Pattern Recognition 106, 107404 (2020).