How to code a nice user-guided foreground extraction algorithm? (Addendum)

After writing my last article on the Easy (user-guided) foreground extraction algorithm, I realized you might think I was exaggerating when arguing that the whole algorithm can be re-coded very quickly from scratch. After all, I've only illustrated how things work using G'MIC command lines, and since a lot of image processing algorithms are already implemented in G'MIC, it was somehow a biased demonstration. So let me just give you the corresponding C++ code for the algorithm. Here again, you may find I cheat a little bit, because I use some functions of a C++ image processing library (CImg) that actually does most of the hard implementation work for me.

But:

  1. You can use this library too in your own code. The CImg Library I use here works on multiple platforms and has a very permissive license, so you probably don’t have to re-code the key image processing algorithms by yourself. And even if you want to do so, you can still look at the source code of CImg. It is quite clearly organized and function codes are easy to find (the whole library is defined in a single header file CImg.h).
  2. I never said the whole algorithm could be done in a few lines, only that it was easy to implement 🙂

So, dear C++ programmer, here is the simple prototype you need to make the foreground extraction algorithm work:

#include "CImg.h"
using namespace cimg_library;

int main() {

  // Load input color image.
  const CImg<> image("image.png");

  // Load input label image.
  CImg<> labels("labels.png");
  labels.resize(image.width(),image.height(),1,1,0);  // Be sure labels has the correct size.

  // Compute gradients.
  const float sigma = 0.002f*cimg::max(image.width(),image.height());
  const CImg<> blurred = image.get_blur(sigma);
  const CImgList<> gradient = blurred.get_gradient("xy");
    // gradient[0] and gradient[1] are two CImg images which contain
    // respectively the X and Y derivatives of the blurred RGB image.

  // Compute the potential map P.
  CImg<> P(labels);
  cimg_forXY(P,x,y)
    P(x,y) = 1/(1 +
                cimg::sqr(gradient(0,x,y,0,0)) + // Rx^2
                cimg::sqr(gradient(0,x,y,0,1)) + // Gx^2
                cimg::sqr(gradient(0,x,y,0,2)) + // Bx^2
                cimg::sqr(gradient(1,x,y,0,0)) + // Ry^2
                cimg::sqr(gradient(1,x,y,0,1)) + // Gy^2
                cimg::sqr(gradient(1,x,y,0,2))); // By^2

  // Run the label propagation algorithm.
  labels.watershed(P,true);

  // Display the result and exit.
  (image,P,labels).display("Image - Potential - Labels");

  return 0;
}

To compile it, using g++ on Linux for instance, you have to type something like this in a shell (I assume you have all the necessary headers and libraries installed):

$ g++ -o foo foo.cpp -Dcimg_use_png -lX11 -lpthread -lpng -lz

Now, execute it:

$ ./foo

and you get this kind of result:

cimg_ef

Once you have the resulting image of propagated labels, it is up to you to use it the way you want: split the original image into foreground/background layers, or keep it as a mask associated with the color image.
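
If you want to do that last step in C++ as well, here is a minimal sketch of the layer splitting with CImg. It assumes the propagated labels follow the convention described in the main article (1 = background, 2 = foreground), and the filenames are just placeholders:

#include "CImg.h"
using namespace cimg_library;

int main() {
  const CImg<> image("image.png");    // Input color image (RGB).
  const CImg<> labels("labels.png");  // Propagated labels (assumed: 1 = background, 2 = foreground).

  // Build one binary alpha channel per layer from the propagated labels.
  CImg<> fg_alpha(image.width(),image.height(),1,1,0), bg_alpha(fg_alpha);
  cimg_forXY(labels,x,y) {
    if (labels(x,y)==2) fg_alpha(x,y) = 255; else bg_alpha(x,y) = 255;
  }

  // Append each alpha channel to the RGB image to get two RGBA layers.
  image.get_append(fg_alpha,'c').save("foreground.png");
  image.get_append(bg_alpha,'c').save("background.png");

  return 0;
}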

The point is that even if you add the code of the important CImg methods I’ve used here, i.e. CImg<T>::get_blur(), CImg<T>::get_gradient() and CImg<T>::watershed(), you probably won’t exceed about 500 lines of C++ code. Yes, instead of a single line of G’MIC code. Now you see why I’m also a big fan of coding image processing stuff directly in G’MIC instead of C++ ;).

One last remark before ending this addendum:

Unfortunately, the label propagation algorithm itself is hardly parallelizable. It is based on a priority queue and, basically, the choice of the next label to set depends on how the latest label has been set. This is a sequential algorithm by nature. So, when processing large images, a good idea is to run the computations and preview the foreground extraction result only on a smaller preview of the original image, and compute the full-resolution mask only once, after all the key points have been placed by the user (see the sketch below).
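
Here is a minimal C++ sketch of that idea with CImg. The preview width of 512 pixels is an arbitrary choice of mine, not something imposed by the algorithm:

#include "CImg.h"
using namespace cimg_library;

// Run the label propagation on a downscaled preview, so that the interactive
// edit/propagate loop stays fast even on large images.
CImg<> preview_propagation(const CImg<>& image, const CImg<>& labels, const CImg<>& P) {
  const int pw = 512, ph = image.height()*pw/image.width();  // Preserve the aspect ratio.
  CImg<> preview_labels = labels.get_resize(pw,ph,1,1,1);    // Nearest-neighbor keeps label values intact.
  const CImg<> preview_P = P.get_resize(pw,ph,1,1,3);        // Linear interpolation for the potential map.
  return preview_labels.watershed(preview_P,true);
}

The full-resolution call labels.watershed(P,true) is then done only once, when the user has finished placing key points.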

Well that’s it, happy coding!

How to code a nice user-guided foreground extraction algorithm?


Principle of the algorithm:

About three months ago, I added a new filter to the G’MIC plug-in for GIMP, which allows you to interactively extract a foreground object from the background of a color image (considering a single-layered bitmap image as the input). But instead of doing this the “usual” way, e.g. by letting the user crop the object manually with any kind of selection tool, this filter relies on a semi-automatic (user-guided) algorithm to do the job. Basically, the only thing the user needs to do is place key points which tell about the nature of the underlying pixels: either they belong to the foreground (symbolized by green dots) or to the background (red dots). The algorithm then tries to propagate those labels all over the image, so that every image pixel gets a binary label (either foreground or background). The needed interaction is quite limited, and the user doesn’t need to be an image processing guru to control what he does: left/right mouse buttons to add a new foreground/background key point (or move an existing one), middle button and mouse wheel to zoom in/out and pan. Nothing really complicated, easy mode.

gmic_extract

Fig.1. Principle of the user-guided foreground extraction algorithm: the user puts some foreground/background key points over a color image (left), and the algorithm propagates those labels for all pixels, then splits the initial image into two foreground/background layers (middle and right).

Of course, the algorithm which performs the propagation of these binary labels is quite dumb (as most image processing algorithms actually), despite the fact that it tries to follow the contours detected in the image as much as possible. So, sometimes it just assigns the wrong label to a pixel. But from the user’s point of view, this is not so annoying, because it is really easy to locally correct the algorithm’s estimation by putting additional key points in the wrongly labelled regions, and run a new iteration of the algorithm.

Here is a quick video that shows how foreground extraction can be done when using this algorithm. Basically, we start by placing one or two key points, run the algorithm (i.e. press the Space bar), then correct the erroneous areas by adding or moving key points, run the algorithm again, etc., until the contours of the desired object to extract are precise enough. As the label propagation algorithm is reasonably fast (hence dumb), the corresponding workflow runs smoothly.

At this stage, two observations should be made:

1. As a user, you can easily experiment with this manipulation in GIMP, if you have the G’MIC plug-in installed (and if this is not the case, do not wait one second longer and go install it! 😉 ). This foreground extraction filter is located in Contours / Extract foreground [interactive]. When you click on the Apply button, it opens a new interactive window where you can start adding your foreground/background key points. Press Space to update the extracted mask (i.e. run a new iteration of the algorithm) until it fits your requirements, then quit the window and your foreground (and optionally background) selection will be transferred back to GIMP. I must say I have been quite surprised by the good feedback I got for this particular filter. I’m aware it can probably be a time-saver in some cases, but I can’t get out of my head the overall simplicity of the underlying algorithm.

gmic_extract_gimp

Fig.2. Location of the “Extract foreground” filter in the G’MIC plug-in for GIMP.

2. As a developer, you need to convince yourself that this label propagation algorithm is trivial to code. Dear reader, if by chance you are a developer of an open-source software for image processing or photo retouching, and you think this kind of foreground extraction tool could be a cool feature to have, then be aware it can be re-coded from scratch in a very short amount of time. I’m not lying, the hardest part is actually coding the GUI. I illustrate this fact in what follows, and give some details about how this propagation algorithm is currently implemented in G’MIC.


Details of the implementation:

Now I assume that the hardest work has already been done, namely the design of the GUI for letting the user place foreground/background key points. We then have two known images as the input of our algorithm.

framed_colander

Fig.3.a. The input color image (who’s that guy ?)

framed_colander_labels

Fig.3.b. An image of labels, containing the FG/BG key points

At this stage, the image of user-defined labels (in Fig.3.b.) contains pixels whose value can be only 0 (unlabeled pixels = black), 1 (labeled as background = gray) or 2 (labeled as foreground = white). The goal of the label propagation algorithm is then to set a value of 1 or 2 to pixels that have a current label of 0. This label propagation is performed by a so-called Watershed algorithm, which is in fact nothing more than the usual Dijkstra’s algorithm applied on a regular grid (the support of the image) instead of a generic graph. The toughest parts of the label propagation algorithm are:

  1. Managing a priority queue (look at the pseudo-code on the Wikipedia page of Dijkstra’s algorithm to see what I mean). This is a very classical algorithmic structure, so nothing terrible for a programmer (see the short sketch right after this list).
  2. Defining a scalar field P of potentials (or priorities) to drive the label propagation. We have to define a potential/priority value for each image pixel, that tells whether the current pixel should be labeled in priority or not (hence the need for a priority queue). So P has the same size as the input image, but has only one channel (see it as a matrix of floating-point values if you prefer).
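
To make point 1 more concrete, here is a minimal, self-contained sketch of such a propagation core in plain C++ (this is only an illustration of the principle, not the actual implementation used in CImg/G’MIC). Labels are stored as 0 = unknown, 1 = background, 2 = foreground, and P holds one priority value per pixel:

#include <queue>
#include <vector>

void propagate_labels(std::vector<int>& labels, const std::vector<float>& P,
                      const int width, const int height) {
  struct Node { float priority; int x, y, label; };
  struct Cmp { bool operator()(const Node& a, const Node& b) const {
    return a.priority<b.priority;                   // Max-heap: highest priority first.
  }};
  std::priority_queue<Node,std::vector<Node>,Cmp> queue;

  // Push the still-unlabeled 4-neighbors of pixel (x,y), as candidates for 'label'.
  auto push_neighbors = [&](const int x, const int y, const int label) {
    const int dx[] = { -1,1,0,0 }, dy[] = { 0,0,-1,1 };
    for (int k = 0; k<4; ++k) {
      const int nx = x + dx[k], ny = y + dy[k];
      if (nx>=0 && nx<width && ny>=0 && ny<height && !labels[nx + ny*width])
        queue.push({ P[nx + ny*width], nx, ny, label });
    }
  };

  // Seed the queue with the neighbors of all user-defined key points.
  for (int y = 0; y<height; ++y)
    for (int x = 0; x<width; ++x)
      if (labels[x + y*width]) push_neighbors(x,y,labels[x + y*width]);

  // Propagation: always extend the highest-priority (flattest) candidate first.
  while (!queue.empty()) {
    const Node n = queue.top(); queue.pop();
    if (labels[n.x + n.y*width]) continue;          // Already labeled in the meantime.
    labels[n.x + n.y*width] = n.label;
    push_neighbors(n.x,n.y,n.label);
  }
}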

Suppose that you tell the propagation algorithm that all pixels have the same priority (i.e. you feed it with a constant potential image P(x,y)=1). Then, your labels will propagate such that each reconstructed label takes the value of its nearest known (user-defined) label. This does not take the image data into account, so there is no chance to get a satisfactory result. Instead, the key idea is to define the potential image P such that it depends on the contour variations of the color image you want to segment. More precisely, you want pixels with low variations to have a high priority, while pixels on image structures and contours should be labeled last. In G’MIC, I’ve personally defined the potential image P from the input RGB image with the following (heuristic) formula:

P(x,y) = \frac{1}{1 + \|\nabla R_\sigma(x,y)\|^2 + \|\nabla G_\sigma(x,y)\|^2 + \|\nabla B_\sigma(x,y)\|^2}

where

\nabla I_\sigma(x,y) = \left( \frac{\partial I_\sigma}{\partial x}(x,y),\ \frac{\partial I_\sigma}{\partial y}(x,y) \right)

is the usual definition of the image gradient, here appearing in the potential map P for each image channel R, G and B. To be more precise, we estimate those gradients from a slightly blurred version of the input image (hence the sigma added to R, G and B, which stands for the standard deviation of the Gaussian blur applied to the RGB color image). Defined like this, the values of this potential map P are low for pixels on image contours, and high (with a maximum of 1) for pixels located in flat regions.

If you have the command-line interface gmic of G’MIC installed on your machine, you can easily compute and visualize such a propagation map P for your own image. Just type this simple command line in a shell:

$ gmic image.jpg -blur 0.2% -gradient_norm -fill '1/(1+i^2)'

For our sample image above, the priority map P looks like this (white corresponds to the value 1, and black to something close to 0):

framed_colander_potential

Fig.4. Estimated potential/priority map P.

Now, you have all the ingredients to extract your foreground. Run the label propagation algorithm on your image of user-defined labels, with the priority map P estimated from the input color image. It will output the desired image of propagated labels:

framed_colander

Fig.5.a. Input color image.

framed_colander_mask

Fig.5.b. Result of the labels propagation algorithm.

And you won’t believe it, but you can actually do all these things in a single command line with gmic:

$ gmic image.jpg labels.png --blur[0] 0.2% -gradient_norm[-1] -fill[-1] '1/(1+i^2)' -watershed[1] [2] -reverse[-2,-1]

In Fig.5.b., I let the foreground/background frontier pixels keep their value 0, to better show the estimated background (in grey) and foreground (in white). Those pixels are of course also labeled in the final version of the algorithm. And once you have computed this image of propagated labels, the extraction work is almost done. Indeed, this label image is a binary mask that you can use to easily split the input color image into two foreground/background layers with alpha channels:

framed_colander_foreground

Fig.6.a. Estimated foreground layer, after label propagation.

framed_colander_background

Fig.6.b. Estimated background layer, after label propagation.

So, now you have a better idea of how all this works… Isn’t that a completely stupid algorithm ? Apologies if I broke the magic of the thing! 😀

Of course, you can run the full G’MIC foreground extraction filter, not only from the plug-in, but also from the command line, by typing:

$ gmic image.jpg -x_segment , -output output.png

The input file should be a regular RGB image. This command generates an output file in RGBA mode, where only a binary alpha channel has been added to the image (the RGB colors of each pixel remain unchanged).

And now ?

You know the secret of this nice (but simple) foreground extraction algorithm. So you have two options:

  1. If you are not a developer, harass the developers of your favorite graphics software to include this “so-simple-to-code” algorithm into it!
  2. If you are a developer interested in the algorithm, just try to code a prototype for it. You will be surprised by how fast this can actually be done!

In any case, please don’t hate me if none of these suggestions worked ;). See you in a next post!

See also the Addendum I wrote, that describes how this can be implemented in C++.

Visualizing the 3D point cloud of RGB colors


Preliminary note:

This is the very first post of my blog on Open Source Graphics. So, dear reader, welcome! I hope you will find some useful (or at least distracting) articles here. Please feel free to comment, and tell me about your advice and experience on graphics creation, photo retouching and image processing.


Understanding how the image colors are distributed:

According to Wikipedia, the histogram of an image is “a graphical representation of the tonal distribution in a digital image. It plots the number of pixels for each tonal value. By looking at the histogram for a specific image a viewer will be able to judge the entire tonal distribution at a glance.”

This is a very common way to better understand the global statistics of the pixel values in an image. Often, when we look at an image histogram (a feature that most image editors and photo retouching programs propose), we see something like this:

histogram_luminance1

Fig.1. Lena image (left), and histogram of the luminances (right).

Here, the histogram (on the right) tells about the distribution of the image luminances, for each possible value of the luminance, from 0 to 255 (assuming an 8-bit per channel color image as the input). Sometimes, the displayed histogram rather contains the distribution of the pixel values for the three RGB channels simultaneously, like in the figure below:

histogram_colors

Fig.2. Lena image (left), and its RGB histogram (right).

For instance, this RGB histogram clearly shows that the image contains a lot of pixels with high values of Red, which is in fact not so surprising for this particular image of Lena. But what we still miss is the relationship between the three color components: do the pixels with a lot of Red cover a wide range of different colors, or is it just the same reddish color repeated all over the image ? Look at the image below: it has exactly the same RGB histogram as the image above. I’ve just mirrored the Green and Blue channels along the x- and y-axes respectively. But the perception of the image colors is in fact very different: we see a lot more greens in it.

histogram_colors2

Fig.3. Partially mirrored Lena image (left), and its RGB histogram.

So clearly, it could be interesting to get more clues about the correlation between the different color components of the pixels in an image. This requires visualizing something different: the 3D distribution (or histogram) of the colors in the RGB space. Each pixel (x,y) of a color image can indeed be viewed as a point with 3D coordinates (R(x,y),G(x,y),B(x,y)) located inside a cube of size 256x256x256, in the 3D RGB color space. The whole set of image pixels then forms a 3D point cloud in this 3D space. To visualize this point cloud, each displayed point takes a color that can be either its actual RGB value (to get the 3D colors distribution), or a color expressing the number of occurrences of this RGB color in the initial image (to get the 3D colors histogram); a small code sketch of this construction is given right after Fig.4 below. For the Lena image, it looks like this. A wireframe representation of the RGB cube boundaries has also been added to better locate the image colors:

coldis_lena

Fig.4. Lena image (left), colors distribution in 3D RGB space (middle) and colors histogram in 3D RGB space (right).
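
For developers who wonder how this kind of view can be built, here is a minimal CImg-based sketch (my own illustration written for this post, not the code used to generate the figures; 'image.png' is just a placeholder filename). It fills a 256x256x256 volume with the number of occurrences of each RGB triplet, i.e. the 3D colors histogram, while the point cloud view simply plots each pixel at its (R,G,B) coordinates with its own color:

#include "CImg.h"
using namespace cimg_library;

int main() {
  const CImg<unsigned char> image("image.png");  // 8-bit RGB input.

  // 3D colors histogram: one voxel per possible RGB triplet, counting
  // how many image pixels have exactly that color.
  CImg<float> hist3d(256,256,256,1,0);
  cimg_forXY(image,x,y)
    ++hist3d(image(x,y,0,0),   // R -> x-coordinate in the RGB cube
             image(x,y,0,1),   // G -> y-coordinate
             image(x,y,0,2));  // B -> z-coordinate

  // Each pixel is also a 3D point (R,G,B): this is what the point cloud
  // views of this post display, each point colored with its own RGB value.
  return 0;
}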

With this way of representing colors distributions, we get a better feeling about:

  1. The global variety of the colors in an image.
  2. Locally, the amount of dispersion of color tones and shades around a given color.

I personally prefer the 3D colors distribution view (Fig.4, middle), as it is easier to see which color corresponds to which point, even if we lose the information about the color occurrences (but this is hardly visible in the 3D colors histogram anyway). Now, if we compare the 3D colors distributions of the Lena image and its partially-mirrored version (remember, they have the same RGB histogram), we can see a big difference !

coldis_lenaflip

Fig.5. Comparisons of the colors distributions in the 3D RGB space, for the Lena image, and its partially-mirrored version.

Let me show you some other examples, to illustrate that visualizing 3D colors distributions is a nice additional way to get more global information about how the colors of an image are arranged:

coldis_vegas

Fig.6. Las Vegas image (left) and its colors distribution in the 3D RGB space (right).

The image above has been shot by Patrick David in Las Vegas. We can see in the 3D colors distribution view which colors are present in the input image (which is already quite colorful), and which are not (purples are almost non-existent). Below is another image (a portrait of Mairi) shot by Patrick:

coldis_mary

Fig.7. Portrait of Mairi (left) and its colors distribution in the 3D RGB space (right).

Here, the 3D colors distribution reflects the two dominant colors of the image (greys and skin/hair colors) and the continuity of their shading into the black. Note that this is not something we can see from the RGB histogram of the same input image, so this kind of visualization really says something new about the global statistics of the image colors.

mary_histo

Fig.8. Portrait of Mairi (left) and its RGB histogram (right).

Let me show you some more extreme cases:

coldis_bottles

Fig.9. Bottles image (left) and its colors distribution in the 3D RGB space (right).

We can clearly see three main color branches, corresponding to each of the three main colors of the bottles (blue, pink and yellow-green), with all the shades (to white) the corresponding pixels take. Now, let us take the same image which has been desaturated:

coldis_bottlesbw

Fig.10. Desaturated bottles image (left) and its colors distribution in 3D RGB space (right).

No surprise, the only 3D color points you get belong to the diagonal of the RGB cube, i.e. the points where R = G = B. This last example shows the opposite case, with a (synthetic) image containing a lot of different colors:

coldis_plasma

Fig.11. Synthetic plasma image (left) and its colors distribution in 3D RGB space (right).

I think all these examples illustrate quite well that we can often learn a lot about the color dispersion of an image by looking at its 3D colors distribution, in addition to the usual histograms. This is somehow something I miss in many image editors and image retouching software.

So, my next step will be to show how I’ve created those views, and how you can do the same with your own images. And of course, only with open-source imaging tools 😉  I chose to use the command-line interface of G’MIC for this particular task, because it is quite flexible for managing 3D vector objects (and also because I know it quite well!).


Generating 3D colors distribution views with G’MIC:

I assume you already have the command-line interface of G’MIC (which is called gmic) installed on your system. The commands I show below should be typed in a shell (personally, I use bash on a machine running Linux). The first thing to do after installing gmic is to ensure we get the latest update of the commands proposed in the G’MIC framework:

$ gmic -update

[gmic]-0./ Start G'MIC interpreter.
[gmic]-0./ Update commands from the latest definition file on the G'MIC server.
[gmic]-0./ End G'MIC interpreter.

Now, the main trick consists in using the specific G’MIC command -distribution3d, which converts an input color image into a 3D vector object corresponding to the desired colors distribution we want to visualize. I will also add two things to improve it a little bit:

  1. If the image is too big, I will reduce it to a reasonable size, because the number of points in the generated 3D object is always equal to the number of pixels in the image, and we don’t want it to be unnecessarily large, for performance reasons.
  2. I will also merge the generated 3D point cloud with the boundaries of the RGB color cube, in wireframe mode, as in the examples you’ve seen in the previous section.

This long (but comprehensive) command line below does the job:

$ gmic input_image.jpg -resize2dy "{min(h,512)}" -distribution3d -colorcube3d -primitives3d[-1] 1 -+3d

[gmic]-0./ Start G'MIC interpreter.
[gmic]-0./ Input file 'input_image.jpg' at position 0 (1 image 3283x4861x1x3).
[gmic]-1./ Resize 2d image [0] to 512 pixels along the y-axis, preserving 2d ratio.
[gmic]-1./ Get 3d color distribution of image [0].
[gmic]-1./ Input 3d RGB-color cube.
[gmic]-2./ Convert primitives of 3d object [1] to segments.
[gmic]-2./ Merge 3d objects [0,1].
[gmic]-1./ Display 3d object [0] = '[3d distribution]*' (177160 vertices, 177176 primitives).

It opens an interactive visualization window with the 3D colors distribution that you can rotate with your mouse:

gmic_view

Nice! But let me go one step further.

Now, I want to generate an animation (a video file, basically) showing the 3D colors distribution object rotating, just beside the input image. For this, I can use the command -snapshot3d inside a -do…-while loop in order to get all the frames of my animation. I won’t go into much technical detail because this post is not about the G’MIC programming language (for this, I refer the interested reader to these nicely written tutorial pages), but basically we can do this task easily by defining our own custom command for G’MIC. Just copy/paste the code snippet below into your $HOME/.gmic configuration file (or %APPDATA%/gmic for Windows users) in order to create a new command named -animate_color_distribution3d, that will be recognized by the G’MIC interpreter the next time you invoke it:

animate_color_distribution3d :
    
  angle=0
  delta_angle=2

  # Create 3d object from input image.
  --local
    -resize2dy {min(h,512)} -distribution3d
    -colorcube3d -primitives3d[-1] 1
    -+3d -center3d -normalize3d -*3d 180
    -rotate3d 1,1,0,60
  -endl

  # Resize input image so that its height is exactly 300.
  -resize2dy[0] 300

  # Render animated frames (with height 300).
  -do
    --rotate3d[1] 0,1,0,$angle
    -snapshot3d[-1] 300,0,0,0,0,0 --a[0,-1] x
    -w[-1] -rm[-2]
    angle={$angle+$delta_angle}
  -while {$angle<360}
  -rm[0,1] -resize2dx 640

Once you’ve done that, the command is ready for use. It takes no arguments, so you can simply call it like this:

$ gmic input_image.jpg -animate_color_distribution3d -output output_video.avi

And after a while, your video file should be saved in your current path. That’s it !

Below, I show some examples which have been generated by this custom G’MIC command. The first picture is a beautiful (and colorful) drawing from David Revoy. I find its 3D color distribution particularly interesting.

The landscape picture below is also interesting, because of the almost bi-tonal nature of the image:

With the Joconde painting, we can also see that the number of effective color tones is quite reduced:

And if you apply the command -animate_color_distribution3d on a synthetic plasma image, as done below, you will get a really nice 3D pointcloud, just like the ones we can see in some sci-fi movies !


And now ?

Well, that’s it ! Nothing more to say for this post :). Surprisingly, it is already much longer than I expected when I started writing it. Next time you see a color image, maybe you could try to visualize its color distribution in the 3D RGB space along with the classical histograms. Maybe it will help you to better understand the dynamics of the image colors, who knows ? At least we know that this is something we can do easily with FLOSS imaging tools. See you next time !