How to code a nice user-guided foreground extraction algorithm?

Principle of the algorithm:

About three months ago, I’ve added a new filter in the G’MIC plug-in for GIMP, which allows to interactively extract a foreground object present in a color image, from its background (considering a single-layered bitmap image as the input). But instead of doing this the “usual” way, e.g. by letting the user crop the object manually with any kind of selection tools, this filter relies on a semi-automatic (user-guided) algorithm to do the job. Basically, the only thing the user needs to do is placing key points which tell about the nature of the underlying pixels: either they belong to the foreground (symbolized by green dots) or to the background (red dots). The algorithm tries then to propagate those labels all over the image, so that every image pixel get a binary label (either foreground or background). The needed interaction is quite limited, and the user doesn’t need to be an image processing guru to control what he does: left/right mouse buttons to add a new foreground/background key point (or move an existing one), middle button and mouse wheel to zoom in/out and pan shift. Nothing really complicated, easy mode.


Fig.1. Principle of the user-guided foreground extraction algorithm: the user puts some foreground/background key points over a color image (left), and the algorithm propagates those labels for all pixels, then splits the initial image into two foreground/background layers (middle and right).

Of course, the algorithm which performs the propagation of these binary labels is quite dumb (as most image processing algorithms actually) despite the fact it tries to follow the contours detected in the image as much as possible. So, sometimes it just assign the wrong label to a pixel. But from the user point of view, this is not so annoying because it is really easy to locally correct the algorithm estimation by putting additional key points in the wrongly labelled regions, and run a new iteration of the algorithm.

Here is a quick video that shows how foreground extraction can be done when using this algorithm. Basically we start by placing one or two key points, run the algorithm (i.e. press the Space bar), then correct the erroneous areas by adding or moving key points, run the algorithm again, etc.. until the contours of the desired object to extract are precise enough. As the label propagation algorithm is reasonably fast (hence dumb), the corresponding workflow runs smoothly.

At this stage, two observations have to be done:

1. As a user, you can easily experiment this manipulation in GIMP, if you have the G’MIC plug-in installed (and if this is not the case, do not wait one second longer and go install it! 😉 ). This foreground extraction filter is located in Contours / Extract foreground [interactive]. When clicking on the Apply button, it opens a new interactive window where you can start adding your foreground/background key points. Press Space to update the extracted mask (i.e. run a new iteration of the algorithm) until it fits your requirements, then quit the window and your foreground (and optionally background) selection will be transferred back to GIMP. I must say I have been quite surprised by the good feedback I got for this particular filter. I’m aware this can be probably a time-saver in some cases, but I can’t get out of my head the overall simplicity of the underlying algorithm.


Fig.2. Location of the “Extract foreground” filter in the G’MIC plug-in for GIMP.

2. As a developer, you need to convince yourself that this label propagation algorithm is trivial to code. Dear reader, if by chance you are a developer of an open-source software for image processing or photo retouching, and you think this kind of foreground extraction tool could be a cool feature to have, then be aware this can be re-coded from scratch in a very short amount of time. I’m not lying, the hardest part is actually coding the GUI. I illustrate this fact in the followings, and give some details about how this propagation algorithm is currently implemented in G’MIC.

Details of the implementation:

Now I assume that the hardest work has been already done, which is the design of the GUI for letting the user place foreground/background key points. Now, we have then two known images as the input of our algorithm.


Fig.3.a. The input color image (who’s that guy ?)


Fig.3.b. An image of labels, containing the FG/BG key points

At this stage, the image of user-defined labels (in Fig.3.b.) contain pixels whose value can be only 0 (unlabeled pixels = black), 1 (labeled as background = gray) or 2 (labeled as foreground = white). The goal of the label propagation algorithm is then to set a value of 1 or to pixels that have a current label of 0. This label propagation is performed by a so-called Watershed algorithm, which is in fact nothing more than the usual Dijkstra’s algorithm applied on a regular grid (the support of the image) instead of a generic graph. The toughest parts of the label propagation algorithm are:

  1. Managing a priority queue (look at the pseudo-code on the Wikipedia page of the Dijkstra’s algorithm to see what I mean). This is a very classical algorithmic structure, so nothing terrible for a programmer.
  2. Defining a scalar field P of potentials (or priorities) to drive the label propagation. We have to define a potential/priority value for each image pixel, that will tells if the current pixel should be labeled in priority or not (hence the need for a priority queue). So P has the same size as the input image, but has only one channel (see it as a matrix of floating-point values if you prefer).

Suppose that you tells the propagation algorithm that all pixels have the same priority (so you feed it with a constant potential image P(x,y)=1). Then, your labels will propagate such that each reconstructed label will take the value of its nearest known (user-defined) label. This will not take the image data into account, so there is no chance to get a satisfactory result. Instead, the key idea is to define the potential image P such that it depends on the contour variations of the color image you want to segment. More precisely, you want pixels with low variations to have a high priority, while pixels on image structures and contours should be labeled last. In G’MIC, I’ve personally defined the potential image P from the input RGB image with the following (heuristic) formula:




is the usual definition of the image gradient, here appearing in the potential map P for each image channel R, G and B. To be more precise, we estimate those gradients from a slightly blurred version of the input image (hence the sigma added to R,G and B, which stands for the standard variation of the Gaussian blur applied on the RGB color image). Defined like this, the values of this potential map P are low for pixels on image contours, and high (with a maximum of 1) for pixels located on flat regions.

If you have the command-line interface gmic of G’MIC installed on your machine, you can easily compute and visualize such a propagation map P for your own image. Just type this simple command line in a shell:

$ gmic image.jpg -blur 0.2% -gradient_norm -fill '1/(1+i^2)'

For our sample image above, the priority map P looks like this (color white correspond to the value 1, and black to something close to 0):


Fig.4. Estimated potential/priority map P.

Now, you have all the ingredients to extract your foreground. Run the label propagation algorithm on your image of user-defined labels, with the priority map P estimated from the input color image. It will output the desired image of propagated labels:


Fig.5.a. Input color image.


Fig.5.b. Result of the labels propagation algorithm.

And you won’t believe it, but you can actually do all these things in a single command line with gmic:

$ gmic image.jpg labels.png --blur[0] 0.2% -gradient_norm[-1] -fill[-1] '1/(1+i^2)' -watershed[1] [2] -reverse[-2,-1]

In Fig.5.b., I let the foreground/background frontier pixels keep their values 0, to better see the estimated background (in grey) and foreground (in white). Those pixels are of course also labeled in the final version of the algorithm. And once you have computed this image of propagated labels, the extraction work is almost done. Indeed, this label image is a binary mask that you can use to easily split the input color image into two foreground/background layers with alpha channels:


Fig.6.a. Estimated foreground layer, after label propagation.


Fig.6.b. Estimated background layer, after label propagation.

So, now you have a better idea on how all this works… Isn’t that a completely stupid algorithm ? Apologies if I broke the magic of the thing! 😀

Of course, you can run the full G’MIC foreground extraction filter, not only from the plug-in, but also from the command line, by typing:

$ gmic image.jpg -x_segment , -output output.png

The input file should be a regular RGB image. This command generates an output file in RGBA mode, where only a binary alpha channel has been added to the image (the RGB colors of each pixel remain unchanged).

And now ?

You know the secret of this nice (but simple) foreground extraction algorithm. So you have two options:

  1. If you are not a developer, harass the developers of your favorite graphics software to include this “so-simple-to-code” algorithm into it!
  2. If you are a developer interested by the algorithm, just try to code a prototype for it. You will be surprised by how fast this could be actually done!

In any case, please don’t hate me if none of these suggestions worked ;). See you in a next post!

See also the Addendum I wrote, that describes how this can be implemented in C++.

7 thoughts on “How to code a nice user-guided foreground extraction algorithm?

  1. […] writing my last article on the Easy (user-guided) foreground extraction algorithm, I realized that you could maybe think I was exaggeratedly arguing the whole algorithm […]

  2. This is really an instructive post, David, and definitely something I would like to include in my PhotoFlow editor. I think I can probably manage the GUI part in a rather simple way.

    So, if I understand correctly the gmic command line that would generate the image with the propagated labels (sort-of “extraction mask”) looks something like this:

    $ gmic image.jpg labels.png –blur[0] 0.2% -gradient_norm[-1] -fill[-1] ‘1/(1+i^2)’ -watershed[1] [2] -reverse[-2,-1]

    In my case, the most simple approach would be to save the image data to a temporary file, let g’mic process the mask, and then load the mask directly from disk.

    There are actually lots of possible uses of such a foreground extraction mask: selective background blurring, selective contrast or color adjustment… definitely something worth being tried!

    Thanks for making this available to all of us.

  3. I’m trying to run the foreground extraction from the C++ g’mic interface, but it seems I cannot get the command syntax right… pfraw-ztx4di.tif is the input image, pfraw-CoXd7u.tif the “labels” image.

    Why the “-blur[0] 0.2%” command is interpreted as an input image filename?

    Here is the full command I’m using, and the G’MIC verbose output:

    -verbose + -input pfraw-ztx4di.tif -n 0,255 -input pfraw-CoXd7u.tif –blur[0] 0.2% -gradient_norm[-1] -fill[-1] ‘1/(1+i^2)’ -watershed[1] [2] -reverse[-2,-1] -output pfraw-9S1Z3K.tif,float,lzw
    [gmic]-0./ Start G’MIC interpreter.
    [gmic]-0./ Increment verbosity level (set to 1).
    [gmic]-0./ Input all frames of TIFF file ‘pfraw-ztx4di.tif’ at position 0 (1 image 2700x1761x1x3).
    [gmic]-1./ Normalize image [0] in range [0,255].
    [gmic]-1./ Input all frames of TIFF file ‘pfraw-CoXd7u.tif’ at position 1 (1 image 2700x1761x1x1).
    [gmic]-2./ Input file ‘–blur[0]’ at position 2
    [gmic]-2./ *** Error *** Unknown filename ‘–blur[0]’.terminate called after throwing an instance of ‘gmic_exception’


    • This should not be the case. What version of libgmic are you using ?

      • Since you told me that my command is in principle correct, I’ve done a bit of investigation of g’mic command parsing wit the debugger… it turns out that my copy-paste of the command from the browser and into emacs has added some non-printable characters that were screwing up gmic’s parsing. After writing the commands by hand it now accepts the -blur command, but stops when executing -watershed:

        [gmic]-2./ Blur image [0] with standard deviation 0.2%, neumann boundary conditions and quasi-gaussian kernel.
        [gmic]-1./ Compute gradient norm of image [1].
        [gmic]-1./gradient_norm/ Decrement verbosity level (set to 0).
        [gmic]-1./gradient_norm/*repeat/*local/ Compute gradient of image [0] along axes ‘x’, with rotation invariant scheme.
        [gmic]-2./gradient_norm/*repeat/*local/ Compute pointwise square function of image [1].
        [gmic]-2./gradient_norm/*repeat/*local/ Compute gradient of image [0] along axes ‘y’, with rotation invariant scheme.
        [gmic]-3./gradient_norm/*repeat/*local/ Compute pointwise square function of image [2].
        [gmic]-3./gradient_norm/*repeat/*local/ Add images [1,2].
        [gmic]-2./gradient_norm/*repeat/*local/ Compute gradient of image [0] along axes ‘z’, with rotation invariant scheme.
        [gmic]-2./gradient_norm/*repeat/*local/ Compute pointwise square function of image [0].
        [gmic]-2./gradient_norm/*repeat/*local/ Add images [0,1].
        [gmic]-1./gradient_norm/*repeat/*local/ Set local variable s=’1′.
        [gmic]-1./gradient_norm/*repeat/*local/ Split image [0] along the ‘c’-axis, into blocs of 1 pixels.
        [gmic]-1./gradient_norm/*repeat/*local/ Add image [0].
        [gmic]-1./gradient_norm/*repeat/*local/ Compute pointwise square root of image [0].
        [gmic]-1./gradient_norm/ Increment verbosity level (set to 1).
        [gmic]-2./ Fill image [1] with expression ‘1/(1+i^2)’.
        [gmic]./ *** Error *** Command ‘-watershed’: Invalid selection [2] (contains starting indice ‘2’, not in range -2..1).terminate called after throwing an instance of ‘gmic_exception’

        By the way, I’m using the official version, directly compiled into photoflow (no external library).

        Thanks for the help!

        • It seems to me you forgot one dash when invoking ‘–blur’. The double dash is important, otherwise you miss an image when invoking watershed afterwards.

  4. I maybe got it finally to work: my command was using “-blur[0] 0.2%” instead of “–blur[0] 0.2%” (one hyphen missing), which seems to make a difference… now the command runs without errors. Time now to check if the output image is also meaningful…

    Sorry for spamming.