How to code a nice user-guided foreground extraction algorithm? (Addendum)

After writing my last article on the easy (user-guided) foreground extraction algorithm, I realized you might think I was exaggerating when claiming the whole algorithm can be re-coded very quickly from scratch. After all, I only illustrated how things work using G’MIC command lines, and since there are already a lot of image processing algorithms implemented in G’MIC, it was somehow a biased demonstration. So let me give you the corresponding C++ code for the algorithm. Here again, you may find I cheat a little bit, because I use some functions of a C++ image processing library (CImg) that actually does most of the hard implementation work for me.

But:

  1. You can use this library too in your own code. The CImg Library I use here works on multiple platforms and has a very permissive license, so you probably don’t have to re-code the key image processing algorithms yourself. And even if you want to do so, you can still look at the source code of CImg: it is quite clearly organized, and the functions are easy to find (the whole library is defined in a single header file, CImg.h).
  2. I never said the whole algorithm could be done in a few lines, only that it was easy to implement 🙂

So, dear C++ programmer, here is the simple prototype you need to make the foreground extraction algorithm work:

#include &lt;algorithm&gt;
#include "CImg.h"
using namespace cimg_library;

int main() {

  // Load input color image.
  const CImg<float> image("image.png");

  // Load input label image.
  CImg<float> labels("labels.png");
  labels.resize(image.width(),image.height(),1,1,0);  // Be sure labels has the correct size.

  // Compute gradients of a slightly blurred version of the input image.
  const float sigma = 0.002f*std::max(image.width(),image.height());
  const CImg<float> blurred = image.get_blur(sigma);
  const CImgList<float> gradient = blurred.get_gradient("xy");
    // gradient[0] and gradient[1] are two CImg<float> images which contain
    // respectively the X and Y derivatives of the blurred RGB image.

  // Compute the potential map P.
  CImg<float> P(labels);
  cimg_forXY(P,x,y)
    P(x,y) = 1/(1 +
                cimg::sqr(gradient(0,x,y,0,0)) + // Rx^2
                cimg::sqr(gradient(0,x,y,0,1)) + // Gx^2
                cimg::sqr(gradient(0,x,y,0,2)) + // Bx^2
                cimg::sqr(gradient(1,x,y,0,0)) + // Ry^2
                cimg::sqr(gradient(1,x,y,0,1)) + // Gy^2
                cimg::sqr(gradient(1,x,y,0,2))); // By^2

  // Run the label propagation algorithm.
  labels.watershed(P,true);

  // Display the result and exit.
  (image,P,labels).display("Image - Potential - Labels");

  return 0;
}

To compile it with g++ on Linux, for instance, you have to type something like this in a shell (I assume you have all the necessary headers and libraries installed):

$ g++ -o foo foo.cpp -Dcimg_use_png -lX11 -lpthread -lpng -lz

Now, execute it:

$ ./foo

and you get this kind of result:

[Screenshot: the display window opened by the program, showing the input image, the potential map and the propagated labels side by side.]

Once you have the resulting image of propagated labels, it is up to you to use it the way you want: split the original image into foreground/background layers, or keep it as a mask associated with the color image.
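For instance, here is one possible way of building the foreground layer as an RGBA image, reusing the image and labels variables of the program above (the 2 = foreground convention comes from the algorithm description; the output file name is an arbitrary choice of mine):

// Build a binary alpha channel from the propagated labels (sketch).
CImg<float> alpha(labels.width(),labels.height(),1,1,0);
cimg_forXY(labels,x,y) alpha(x,y) = labels(x,y)==2?255.0f:0.0f;  // 2 = foreground.

// Append it to the RGB channels to get an RGBA foreground layer.
CImg<float> foreground(image);
foreground.append(alpha,'c');
foreground.save("foreground.png");  // Needs PNG support (-Dcimg_use_png).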

The point is that even if you add the code of the important CImg methods I’ve used here, i.e. CImg<T>::get_blur(), CImg<T>::get_gradient() and CImg<T>::watershed(), you probably won’t exceed about 500 lines of C++ code. 500 lines, instead of a single line of G’MIC code. Now you see why I’m also a big fan of coding image processing stuff directly in G’MIC instead of C++ ;).

One last remark before ending this addendum:

Unfortunately, the label propagation algorithm itself is hardly parallelizable: it is based on a priority queue and, basically, the choice of the next pixel to label depends on how the previous one was labeled. It is a sequential algorithm by nature. So, when processing large images, a good idea is to do the computation and preview the foreground extraction result on a downscaled version of the original image, then compute the full-resolution mask only once, after all the key points have been placed by the user. A rough sketch of this preview trick is shown below.
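Here is what it could look like with CImg, reusing the image and labels variables of the program above (the 25% preview ratio and the variable names are arbitrary choices of mine, not something imposed by the algorithm):

// Downscale the inputs: 25% size, averaged for the image, nearest-neighbor for the labels.
const CImg<float> small_image = image.get_resize(-25,-25,1,3,2);
CImg<float> small_labels = labels.get_resize(small_image.width(),small_image.height(),1,1,1);

// ... compute a potential map 'small_P' from 'small_image', exactly as P was computed above ...
// small_labels.watershed(small_P,true);  // Fast, preview-sized label propagation.

// Upscale the propagated labels to get a full-resolution preview of the mask.
const CImg<float> preview = small_labels.get_resize(image.width(),image.height(),1,1,1);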

Well, that’s it, happy coding!

How to code a nice user-guided foreground extraction algorithm?


Principle of the algorithm:

About three months ago, I added a new filter to the G’MIC plug-in for GIMP, which allows the user to interactively extract a foreground object present in a color image from its background (considering a single-layered bitmap image as the input). But instead of doing this the “usual” way, e.g. by letting the user crop the object manually with any kind of selection tool, this filter relies on a semi-automatic (user-guided) algorithm to do the job. Basically, the only thing the user needs to do is place key points which tell about the nature of the underlying pixels: either they belong to the foreground (symbolized by green dots) or to the background (red dots). The algorithm then tries to propagate those labels all over the image, so that every image pixel gets a binary label (either foreground or background). The needed interaction is quite limited, and the user doesn’t need to be an image processing guru to control what happens: left/right mouse buttons to add a new foreground/background key point (or move an existing one), middle button and mouse wheel to zoom in/out and pan. Nothing really complicated, easy mode.


Fig.1. Principle of the user-guided foreground extraction algorithm: the user puts some foreground/background key points over a color image (left), and the algorithm propagates those labels for all pixels, then splits the initial image into two foreground/background layers (middle and right).

Of course, the algorithm which performs the propagation of these binary labels is quite dumb (as most image processing algorithms actually are), despite the fact that it tries to follow the contours detected in the image as much as possible. So, sometimes it just assigns the wrong label to a pixel. But from the user’s point of view, this is not so annoying, because it is really easy to locally correct the algorithm’s estimation by putting additional key points in the wrongly labelled regions and running a new iteration of the algorithm.

Here is a quick video that shows how foreground extraction can be done using this algorithm. Basically, we start by placing one or two key points, run the algorithm (i.e. press the Space bar), then correct the erroneous areas by adding or moving key points, run the algorithm again, etc., until the contours of the desired object are precise enough. As the label propagation algorithm is reasonably fast (hence dumb), the corresponding workflow runs smoothly.

At this stage, two observations have to be made:

1. As a user, you can easily experiment with this in GIMP, if you have the G’MIC plug-in installed (and if this is not the case, do not wait one second longer and go install it! 😉 ). This foreground extraction filter is located in Contours / Extract foreground [interactive]. When clicking on the Apply button, it opens a new interactive window where you can start adding your foreground/background key points. Press Space to update the extracted mask (i.e. run a new iteration of the algorithm) until it fits your requirements, then quit the window, and your foreground (and optionally background) selection will be transferred back to GIMP. I must say I have been quite surprised by the good feedback I got for this particular filter. I’m aware it can probably be a time-saver in some cases, but I can’t get the overall simplicity of the underlying algorithm out of my head.


Fig.2. Location of the “Extract foreground” filter in the G’MIC plug-in for GIMP.

2. As a developer, you need to convince yourself that this label propagation algorithm is trivial to code. Dear reader, if by chance you are a developer of an open-source software for image processing or photo retouching, and you think this kind of foreground extraction tool could be a cool feature to have, then be aware that it can be re-coded from scratch in a very short amount of time. I’m not lying: the hardest part is actually coding the GUI (a bare-bones sketch of such an interaction loop is given just below). I illustrate this fact in what follows, and give some details about how this propagation algorithm is currently implemented in G’MIC.
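To back up this claim, here is a minimal sketch of what such an interaction loop could look like, using the CImgDisplay class of the CImg library. This is of course not the actual G’MIC GUI (no zoom, no pan, no visible dots, and it assumes the display window keeps the size of the image); it is just meant to show the skeleton:

#include "CImg.h"
using namespace cimg_library;

int main() {
  const CImg<float> image("image.png");
  CImg<float> labels(image.width(),image.height(),1,1,0);  // 0 = unlabeled.
  CImgDisplay disp(image,"Left click = foreground, right click = background");
  while (!disp.is_closed()) {
    disp.wait();  // Wait for a user event.
    const int x = disp.mouse_x(), y = disp.mouse_y();
    if (x>=0 && y>=0) {
      if (disp.button()&1) labels(x,y) = 2;       // Left button: foreground key point.
      else if (disp.button()&2) labels(x,y) = 1;  // Right button: background key point.
    }
  }
  labels.save("labels.png");  // Ready for the label propagation step.
  return 0;
}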


Details of the implementation:

Let me assume that the hardest work has already been done: designing the GUI that lets the user place foreground/background key points. We then have two known images as the input of our algorithm.


Fig.3.a. The input color image (who’s that guy?)


Fig.3.b. An image of labels, containing the FG/BG key points

At this stage, the image of user-defined labels (in Fig.3.b.) contains pixels whose value can only be 0 (unlabeled pixels = black), 1 (labeled as background = gray) or 2 (labeled as foreground = white). The goal of the label propagation algorithm is then to set a value of 1 or 2 to every pixel that has a current label of 0. This label propagation is performed by a so-called watershed algorithm, which is in fact nothing more than the usual Dijkstra’s algorithm applied on a regular grid (the support of the image) instead of a generic graph. The toughest parts of the label propagation algorithm are:

  1. Managing a priority queue (look at the pseudo-code on the Wikipedia page of Dijkstra’s algorithm to see what I mean). This is a very classical algorithmic structure, so nothing scary for a programmer.
  2. Defining a scalar field P of potentials (or priorities) to drive the label propagation. We have to define a potential/priority value for each image pixel, which tells whether the current pixel should be labeled in priority or not (hence the need for a priority queue). So P has the same size as the input image, but has only one channel (see it as a matrix of floating-point values if you prefer). A minimal sketch combining these two ingredients is given just after this list.
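Here is a rough, illustrative sketch of such a propagation loop, written with the same CImg structures as in the addendum. It is my own simplified rewrite, not the actual implementation (in CImg, CImg<T>::watershed() does the real job, with more care and a higher connectivity):

#include <queue>
#include <tuple>
#include "CImg.h"
using namespace cimg_library;

// Sketch: propagate labels (1 or 2) to all pixels with label 0,
// visiting pixels by decreasing priority P.
void propagate_labels(CImg<float>& labels, const CImg<float>& P) {
  typedef std::tuple<float,int,int,float> Item;  // (priority, x, y, label)
  std::priority_queue<Item> Q;                   // Highest priority popped first.
  const int dx[] = { -1,1,0,0 }, dy[] = { 0,0,-1,1 };

  // Enqueue the unlabeled neighbors of every user-defined key point.
  cimg_forXY(labels,x,y) if (labels(x,y))
    for (int k = 0; k<4; ++k) {
      const int nx = x + dx[k], ny = y + dy[k];
      if (labels.containsXYZC(nx,ny) && !labels(nx,ny))
        Q.push(Item(P(nx,ny),nx,ny,labels(x,y)));
    }

  // Dijkstra-like sweep: the first (highest-priority) front to reach a pixel wins.
  while (!Q.empty()) {
    const Item t = Q.top(); Q.pop();
    const int x = std::get<1>(t), y = std::get<2>(t);
    if (labels(x,y)) continue;     // Already labeled by an earlier front.
    labels(x,y) = std::get<3>(t);  // Assign the winning label,
    for (int k = 0; k<4; ++k) {    // then extend the front to unlabeled neighbors.
      const int nx = x + dx[k], ny = y + dy[k];
      if (labels.containsXYZC(nx,ny) && !labels(nx,ny))
        Q.push(Item(P(nx,ny),nx,ny,labels(x,y)));
    }
  }
}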

Suppose you tell the propagation algorithm that all pixels have the same priority (i.e. you feed it with a constant potential image P(x,y)=1). Then your labels will propagate such that each reconstructed label takes the value of its nearest known (user-defined) label. This does not take the image data into account at all, so there is no chance of getting a satisfactory result. Instead, the key idea is to define the potential image P such that it depends on the contour variations of the color image you want to segment. More precisely, you want pixels with low variations to have a high priority, while pixels on image structures and contours should be labeled last. In G’MIC, I’ve personally defined the potential image P from the input RGB image with the following (heuristic) formula:

$$P(x,y) = \frac{1}{1 + \|\nabla R_\sigma(x,y)\|^2 + \|\nabla G_\sigma(x,y)\|^2 + \|\nabla B_\sigma(x,y)\|^2}$$

where

$$\nabla R_\sigma(x,y) = \left(\frac{\partial R_\sigma}{\partial x}(x,y),\ \frac{\partial R_\sigma}{\partial y}(x,y)\right)$$

is the usual definition of the image gradient, here appearing in the potential map P for each image channel R, G and B. To be more precise, we estimate those gradients from a slightly blurred version of the input image (hence the σ added to R, G and B, which stands for the standard deviation of the Gaussian blur applied to the RGB color image). Defined like this, the values of the potential map P are low for pixels on image contours, and high (with a maximum of 1) for pixels located in flat regions.

If you have the command-line interface gmic of G’MIC installed on your machine, you can easily compute and visualize such a potential map P for your own images. Just type this simple command line in a shell:

$ gmic image.jpg -blur 0.2% -gradient_norm -fill '1/(1+i^2)'

For our sample image above, the potential/priority map P looks like this (white corresponds to the value 1, and black to something close to 0):


Fig.4. Estimated potential/priority map P.

Now, you have all the ingredients to extract your foreground. Run the label propagation algorithm on your image of user-defined labels, with the priority map P estimated from the input color image. It will output the desired image of propagated labels:


Fig.5.a. Input color image.


Fig.5.b. Result of the label propagation algorithm.

And you won’t believe it, but you can actually do all these things in a single command line with gmic:

$ gmic image.jpg labels.png --blur[0] 0.2% -gradient_norm[-1] -fill[-1] '1/(1+i^2)' -watershed[1] [2] -reverse[-2,-1]

In Fig.5.b., I let the foreground/background frontier pixels keep their value 0, to better show the estimated background (in grey) and foreground (in white). Those pixels are of course also labeled in the final version of the algorithm. And once you have computed this image of propagated labels, the extraction work is almost done. Indeed, this label image is a binary mask that you can use to easily split the input color image into two foreground/background layers with alpha channels:


Fig.6.a. Estimated foreground layer, after label propagation.


Fig.6.b. Estimated background layer, after label propagation.

So, now you have a better idea of how all this works… Isn’t that a completely stupid algorithm? Apologies if I broke the magic of the thing! 😀

Of course, you can run the full G’MIC foreground extraction filter, not only from the plug-in, but also from the command line, by typing:

$ gmic image.jpg -x_segment , -output output.png

The input file should be a regular RGB image. This command generates an output file in RGBA mode, where only a binary alpha channel has been added to the image (the RGB colors of each pixel remain unchanged).

And now?

You know the secret of this nice (but simple) foreground extraction algorithm. So you have two options:

  1. If you are not a developer, harass the developers of your favorite graphics software to include this “so-simple-to-code” algorithm in it!
  2. If you are a developer interested in the algorithm, just try to code a prototype of it. You will be surprised by how quickly it can actually be done!

In any case, please don’t hate me if none of these suggestions works ;). See you in a next post!

See also the Addendum I wrote, that describes how this can be implemented in C++.