Principle of the algorithm:
About three months ago, I’ve added a new filter in the G’MIC plug-in for GIMP, which allows to interactively extract a foreground object present in a color image, from its background (considering a single-layered bitmap image as the input). But instead of doing this the “usual” way, e.g. by letting the user crop the object manually with any kind of selection tools, this filter relies on a semi-automatic (user-guided) algorithm to do the job. Basically, the only thing the user needs to do is placing key points which tell about the nature of the underlying pixels: either they belong to the foreground (symbolized by green dots) or to the background (red dots). The algorithm tries then to propagate those labels all over the image, so that every image pixel get a binary label (either foreground or background). The needed interaction is quite limited, and the user doesn’t need to be an image processing guru to control what he does: left/right mouse buttons to add a new foreground/background key point (or move an existing one), middle button and mouse wheel to zoom in/out and pan shift. Nothing really complicated, easy mode.
Of course, the algorithm which performs the propagation of these binary labels is quite dumb (as most image processing algorithms actually) despite the fact it tries to follow the contours detected in the image as much as possible. So, sometimes it just assign the wrong label to a pixel. But from the user point of view, this is not so annoying because it is really easy to locally correct the algorithm estimation by putting additional key points in the wrongly labelled regions, and run a new iteration of the algorithm.
Here is a quick video that shows how foreground extraction can be done when using this algorithm. Basically we start by placing one or two key points, run the algorithm (i.e. press the Space bar), then correct the erroneous areas by adding or moving key points, run the algorithm again, etc.. until the contours of the desired object to extract are precise enough. As the label propagation algorithm is reasonably fast (hence dumb), the corresponding workflow runs smoothly.
At this stage, two observations have to be done:
1. As a user, you can easily experiment this manipulation in GIMP, if you have the G’MIC plug-in installed (and if this is not the case, do not wait one second longer and go install it! 😉 ). This foreground extraction filter is located in Contours / Extract foreground [interactive]. When clicking on the Apply button, it opens a new interactive window where you can start adding your foreground/background key points. Press Space to update the extracted mask (i.e. run a new iteration of the algorithm) until it fits your requirements, then quit the window and your foreground (and optionally background) selection will be transferred back to GIMP. I must say I have been quite surprised by the good feedback I got for this particular filter. I’m aware this can be probably a time-saver in some cases, but I can’t get out of my head the overall simplicity of the underlying algorithm.
2. As a developer, you need to convince yourself that this label propagation algorithm is trivial to code. Dear reader, if by chance you are a developer of an open-source software for image processing or photo retouching, and you think this kind of foreground extraction tool could be a cool feature to have, then be aware this can be re-coded from scratch in a very short amount of time. I’m not lying, the hardest part is actually coding the GUI. I illustrate this fact in the followings, and give some details about how this propagation algorithm is currently implemented in G’MIC.
Details of the implementation:
Now I assume that the hardest work has been already done, which is the design of the GUI for letting the user place foreground/background key points. Now, we have then two known images as the input of our algorithm.
At this stage, the image of user-defined labels (in Fig.3.b.) contain pixels whose value can be only 0 (unlabeled pixels = black), 1 (labeled as background = gray) or 2 (labeled as foreground = white). The goal of the label propagation algorithm is then to set a value of 1 or 2 to pixels that have a current label of 0. This label propagation is performed by a so-called Watershed algorithm, which is in fact nothing more than the usual Dijkstra’s algorithm applied on a regular grid (the support of the image) instead of a generic graph. The toughest parts of the label propagation algorithm are:
- Managing a priority queue (look at the pseudo-code on the Wikipedia page of the Dijkstra’s algorithm to see what I mean). This is a very classical algorithmic structure, so nothing terrible for a programmer.
- Defining a scalar field P of potentials (or priorities) to drive the label propagation. We have to define a potential/priority value for each image pixel, that will tells if the current pixel should be labeled in priority or not (hence the need for a priority queue). So P has the same size as the input image, but has only one channel (see it as a matrix of floating-point values if you prefer).
Suppose that you tells the propagation algorithm that all pixels have the same priority (so you feed it with a constant potential image P(x,y)=1). Then, your labels will propagate such that each reconstructed label will take the value of its nearest known (user-defined) label. This will not take the image data into account, so there is no chance to get a satisfactory result. Instead, the key idea is to define the potential image P such that it depends on the contour variations of the color image you want to segment. More precisely, you want pixels with low variations to have a high priority, while pixels on image structures and contours should be labeled last. In G’MIC, I’ve personally defined the potential image P from the input RGB image with the following (heuristic) formula:
is the usual definition of the image gradient, here appearing in the potential map P for each image channel R, G and B. To be more precise, we estimate those gradients from a slightly blurred version of the input image (hence the sigma added to R,G and B, which stands for the standard variation of the Gaussian blur applied on the RGB color image). Defined like this, the values of this potential map P are low for pixels on image contours, and high (with a maximum of 1) for pixels located on flat regions.
If you have the command-line interface gmic of G’MIC installed on your machine, you can easily compute and visualize such a propagation map P for your own image. Just type this simple command line in a shell:
$ gmic image.jpg -blur 0.2% -gradient_norm -fill '1/(1+i^2)'
For our sample image above, the priority map P looks like this (color white correspond to the value 1, and black to something close to 0):
Now, you have all the ingredients to extract your foreground. Run the label propagation algorithm on your image of user-defined labels, with the priority map P estimated from the input color image. It will output the desired image of propagated labels:
And you won’t believe it, but you can actually do all these things in a single command line with gmic:
$ gmic image.jpg labels.png --blur 0.2% -gradient_norm[-1] -fill[-1] '1/(1+i^2)' -watershed  -reverse[-2,-1]
In Fig.5.b., I let the foreground/background frontier pixels keep their values 0, to better see the estimated background (in grey) and foreground (in white). Those pixels are of course also labeled in the final version of the algorithm. And once you have computed this image of propagated labels, the extraction work is almost done. Indeed, this label image is a binary mask that you can use to easily split the input color image into two foreground/background layers with alpha channels:
So, now you have a better idea on how all this works… Isn’t that a completely stupid algorithm ? Apologies if I broke the magic of the thing! 😀
Of course, you can run the full G’MIC foreground extraction filter, not only from the plug-in, but also from the command line, by typing:
$ gmic image.jpg -x_segment , -output output.png
The input file should be a regular RGB image. This command generates an output file in RGBA mode, where only a binary alpha channel has been added to the image (the RGB colors of each pixel remain unchanged).
And now ?
You know the secret of this nice (but simple) foreground extraction algorithm. So you have two options:
- If you are not a developer, harass the developers of your favorite graphics software to include this “so-simple-to-code” algorithm into it!
- If you are a developer interested by the algorithm, just try to code a prototype for it. You will be surprised by how fast this could be actually done!
In any case, please don’t hate me if none of these suggestions worked ;). See you in a next post!
See also the Addendum I wrote, that describes how this can be implemented in C++.