Image processing made easier with a powerful math expression evaluator.

Warning: This post contains personal thoughts about my research work in image processing. I’ll discuss some of the issues I’m facing as an active developer of two open-source image processing frameworks (namely CImg and G’MIC). So keep in mind this will be a bit self-centered. There’s a high chance you’ll find all this really boring if you are not a developer of image processing software yourself (and maybe even if you are). Anyhow, feel free to share your impressions after reading!

1. Context and issues

In imaging science, image processing is processing of images using mathematical operations by using any form of signal processing for which the input is an image, such as a photograph or video frame.

That’s what Wikipedia says about image processing. Selecting and ordering those mathematical operations is what actually defines algorithms, and implementing ready-to-use and interesting image processing algorithms is one of my goals, as well as making them available to interested users afterwards.

After all those years (>10) as a researcher in the image processing field (and like most of my colleagues), I can say I’ve already implemented a lot of these different algorithms, mostly in C++ as far as I’m concerned. To be more precise, an important part of my work is even to design (and hopefully publish) my own image processing methods. Most of the time of course, my trials end up with clunky, ineffective and slow operators which teach me nothing more than that the approach is not good enough to be followed. Someone who says everything he tries works right the first time is a liar. Step by step, I try to refine/optimize my prototypes, or sometimes even take a completely different direction. You quickly realize that it is crucial in this job not to waste time when prototyping algorithms, because the rate of success is in fact very low.


Don’t waste your time, on any occasion! (photo by (OvO), under CC-by-nc-sa.)

That’s actually one of the reasons why I’ve started the G’MIC project. It was primarily designed as a helper to create and run custom image processing pipelines quickly (from the shell, basically). It saves me time, every day. But the world of image processing algorithms is broad, and sometimes you need to experiment with very low-level routines working at the pixel scale, trying stuff so weird and unexpected that none of the “usual” image processing algorithms you already have in your toolbox can be used as-is. Or they would have to be used in such a diverted way that it gets hard to even think about using them adequately. In a word, your pixel-level algorithm won’t be expressed as a simple pipeline (or graph, if you want to call it so) of macro-scale image processing operators. That’s the case, for instance, with most of the well-known patch-based image processing algorithms (e.g. Non-Local-Means, or PatchMatch and their many variants), where each pixel value of the resulting image is computed from (a lot of) other pixel values whose spatial locations are sometimes not evenly distributed (but not random either!).

Until now, when I was trying to implement this kind of algorithm, I resigned myself to going back to coding it in C++: it is a language I feel comfortable with, and I’m sure it will run fast enough most of the time. Indeed, computation time is often a bottleneck in image processing. Some of my colleagues use scripting languages such as Matlab or Python for algorithm prototyping. But they often need some tricks to avoid writing explicit code loops, or need to write at least some fast C/C++ modules that will be compiled and run from those higher-level interfaces, to ensure they get something fast enough (even for prototyping; I’m definitely not talking about optimized production code here!).


But I’m not really satisfied with my C++ solution: generally, I end up with several small pieces of C++ source code I need to compile and maintain. I can hardly re-use them in a bigger pipeline, or redistribute them as clean packages, without a lot of extra work, because they are just intended to be prototypes: they often have only basic command-line interfaces and thus cannot be directly integrated into bigger and user-friendly image processing frameworks. Making a prototype algorithm really usable by others requires at least wrapping it as a plug-in/module for [..copy the name of your favorite image processing tool or language here..]. This generally represents a lot of boring coding work that may even require more time and effort than writing the algorithm itself! And I don’t even talk about maintenance. If you’ve ever tried to maintain a 10-year-old C++ prototype code, lost in one of the sub-sub-sub-sub-folders of your $HOME, you know what I mean. I’d definitely prefer a simpler solution that lets me spend more time on writing the algorithm itself than on packaging it or making it usable. After all, the primary purpose of my work is to create cool algorithms, not really to code user interfaces for them. On the other hand, I am a scientist and I’m also happy to share my discoveries with users (and possibly get feedback from them!). How to make those prototyped algorithms usable without spending too much time on making them usable? 🙂

Ideally, I’d like something that could nicely integrate into G’MIC (my favorite framework for doing image processing stuff, of course 🙂 ), even if, at the end, those algorithms run a bit slower than they would in C++. One could suggest writing them as Octave or Scilab scripts/modules. But I’m the developer of G’MIC, so of course, I’d prefer a solution that helps to extend my own project.

So finally, how could I code prototypes of new algorithms working at the pixel level and make them readily available in G’MIC? This question has worried me for a long time.

2. Algorithm code viewed as a complex math expression

In G’MIC, the closest thing to what I was looking for is the command

-fill 'expression'

This command fills each pixel of a given image with the value evaluated from a “mathematical expression”. A mathematical expression being a quite vague concept, it appears you can already write some fairly complex formulas. For instance, typing this on the command line:

$ gmic 400,400,1,3 -fill "X=x-w/2; Y=y-h/2; R=sqrt(X^2+Y^2); a=atan2(Y,X); if(R<=180,255*abs(cos(c+200*(x/w-0.5)*(y/h-0.5))),850*(a%(0.1*(c+1))))"

creates this weird-looking 400×400 color image (I advise you to put on sunglasses):

Fig.1. A synthetic color image obtained by a single application of the G’MIC command -fill.

Of course, the specified expression can refer to pixels of an existing input image. And so, it can modify the pixels of an image as well, as in the following example:

$ gmic leno.png -fill "(abs(i(x+1,y)-i(x-1,y)))^0.25"

which computes gamma-corrected differences of neighboring pixels along the X-axis, as shown below:


Fig.2.1. Original image leno.png


Fig.2.2. Result of the -fill command described above.

(As an aside, let me tell you I’ve recently received e-mails and messages from people who claim that using the image of our beloved Lena to illustrate an article or a blog post is “sexist” (someone even used the term “pornographic”…). I invite you to read the Lena story page if you don’t know why we commonly use this image. As I don’t want to hurt the over-sensitivity of these people, I’ll be using a slight variation I’ve made by mixing a photograph of the blessed Jay Leno with the usual image of Lena. Let me call this the Leno image, and everyone will be happy (but seriously, get a life!)).

So, as you can imagine, the command -fill already allows me to do a lot of complex and exotic things on images at the pixel level. Technically speaking, it uses the embedded math parser I’ve written for the CImg Library, a C++ open-source image processing library I’ve been developing since 1999 (and on which G’MIC is based). This math parser is quite small (around 1500 lines of C++ code) and quite fast as well when it is applied on a whole image. That’s mainly because:

1. It uses parallelization (thanks to the use of OpenMP directives) to evaluate expressions on blocks of image pixels in a multi-threaded way.

2. Before being evaluated, the given math expression is pre-compiled on the fly by CImg into a sequence of bytecodes. Then the evaluation procedure (which is done for the whole image, pixel by pixel) only requires interpreting that bytecode sequence, which is way faster than re-parsing the input mathematical expression itself.
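
If you want a rough feel for what this pre-compilation step buys you, you can time a non-trivial fill on a big image yourself (just an illustrative, untested one-liner; the image size is an arbitrary choice and absolute timings will obviously depend on your machine):

$ gmic 4096,4096 -tic -fill "cos(x/10)*sin(y/10)" -toc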

Anyway, I thought the complexity of the pixel-level algorithms I’d like to implement was really higher than just the evaluation of a mathematical formula. But wait… what is missing actually? Not much more than loops and updatable variables… I already had variables (though non-updatable) and conditionals. Only loops were really missing. That looks like something I could try adding during my summer holidays, doesn’t it? 😉 So that is where my efforts were focused during these last weeks: I’ve added new functions to the CImg math parser that allow users to write their own loops in mathematical expressions, namely the functions dowhile(expr,_cond), whiledo(cond,expr) and for(init,cond,expr_end,_expr_body). Of course, it also made me review and re-implement large parts of the math parser code, and I took this opportunity to optimize the whole thing. A new version of the math parser has been made available with the release of G’MIC 1.6.5.2 at the end of August. I’m still working on this expression evaluator in CImg, and new improvements and optimizations are ready for the upcoming version 1.6.6.0 of G’MIC (soon to be released).
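
Just to fix ideas, here is a trivial (untested) sketch of the new for() function in action: it fills each pixel with the cumulative sum 0+1+…+x, computed by an explicit per-pixel loop, then normalizes the result for display:

$ gmic 400,400 -fill "s = 0; for (k = 0, k<=x, ++k, s += k); s" -normalize 0,255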

3. A toy example: Julia fractals

So, what can we do now with these new math features in G’MIC? Let me illustrate this with a toy example. The following custom G’MIC command renders a Julia fractal. To test it, just copy/paste the following lines into a regular text file user.gmic:

julia_expr :
  1024,1024
  -fill "
    zr = -1.2 + 2.4*x/w;
    zi = -1.2 + 2.4*y/h;
    for (iter = 0, zr^2+zi^2<=4 && iter<256, ++iter,
      t = zr^2 - zi^2 + 0.4;
      (zi *= 2*zr) += 0.2;
      zr = t
    );
    iter"
  -map 7

and invoke the new command -julia_expr it defines by typing this in the terminal:

$ gmic user.gmic -julia_expr

Then, you’ll get this 1024×1024 color image:


Fig.3. Rendering of a Julia fractal only by filling an image with a complex math expression, containing an iteration loop.

As you can see, this custom user command -julia_expr is very short and is mainly based on the invocation of the G’MIC command -fill. But the coolest thing of all happens when we look at the rendering time of that function. The timing below has been measured on an ASUS laptop with a dual-core HT i7 at 2 GHz. This is what I get:

Edit: This post was edited on 09/20/2015 to reflect the new timings due to math parser optimizations done after the initial publication of this article.

$ gmic user.gmic -tic -julia_expr -toc
[gmic]-0./ Start G'MIC interpreter.
[gmic]-0./ Input custom command file 'user.gmic' (added 1 command, total 6195).
[gmic]-0./ Initialize timer.
[gmic]-0./julia_expr/ Input black image at position 0 (1 image 1024x1024x1x1).
[gmic]-1./julia_expr/ Fill image [0] with expression ' zr = -1.2 + 2.4*x/w; zi = -1.2 + 2.4*(...)( zi *= 2*zr) += 0.2; zr = t ); iter '.
[gmic]-1./julia_expr/ Map cube color LUT on image [0], with dirichlet boundary conditions.
[gmic]-1./ Elapsed time : 0.631 s.

Less than 0.7 seconds to fill a 1024×1024 image where each of the 1,048,576 pixels may require up to 256 iterations of a computation loop? Definitely not bad for prototype code written in 5 minutes which does not require compilation to run! Note that all my CPU cores were active during the computation. Trying the same G’MIC code on my machine at work (a powerful 4x 3-core HT Xeon at 2.6 GHz) renders the same image in only 0.176 seconds!

But of course, one could say:

Why not use the native G’MIC command -mandelbrot instead (here, native means hard-coded as a C++ function)? It is probably way faster!

Let me compare my previous code with the following G’MIC invocation (which renders exactly the same image):

$ gmic 1024,1024 -tic -mandelbrot -1.2,-1.2,1.2,1.2,256,1,0.4,0.2 -map 7 -toc
[gmic]-0./ Start G'MIC interpreter.
[gmic]-0./ Input black image at position 0 (1 image 1024x1024x1x1).
[gmic]-1./ Initialize timer.
[gmic]-1./ Draw julia fractal on image [0], from complex area (-1.2,-1.2)-(1.2,1.2) with c0 = (0.4,0.2) and 256 iterations.
[gmic]-1./ Map cube color LUT on image [0], with dirichlet boundary conditions.
[gmic]-1./ Elapsed time : 0.055 s.

That’s about 12 times faster than the previous command -julia_expr run on my laptop, indeed! A bit reassuring to know that C++ compiled into assembly code is faster than CImg home-made bytecode compiled on the fly 😉

But the point is: suppose now I want to slightly modify the rendering of the fractal, i.e. I no longer want to display the maximum iteration for each pixel (variable iter), but the latest value of the variable zi before the divergence test occurs. Look how simple it is to create a slightly modified command -julia_expr2 that does exactly what I want. I have full control over what the function does at the pixel level:

julia_expr2 :
  1024,1024
  -fill "
    zr = -1.2 + 2.4*x/w;
    zi = -1.2 + 2.4*y/h;
    for (iter = 0, zr^2+zi^2<=4 && iter<256, ++iter,
      t = zr^2 - zi^2 + 0.4;
      (zi *= 2*zr) += 0.2;
      zr = t
    );
    zi"
  -normalize 0,255 -map 7

and this modified algorithm renders the image below (still in about 0.7 seconds, of course):

Fig.4. Slightly modified version of the Julia fractal, displaying another variable (zi) of the rendering algorithm.

Without these new loop features introduced in the math parser, I would have been forced to do one of these two things in G’MIC, to get the same result:

  1. Either add some new options to the native command -mandelbrot to allow this new type of visualization. This basically means: writing new pieces of C++ code, compiling a new version of G’MIC with these added features, packaging it and releasing it to make it available for everyone. Even if I already have some decent release scripts, this implies a lot of extra work and packaging time. The user cannot get the new feature in a few minutes this way (if you’ve already used the filter update mechanism present in the G’MIC plug-in for GIMP, you know what I mean). And I don’t even speak about all the possibilities I couldn’t think of (but that a user will obviously need one day 🙂 ) when adding such new display options to a native command like -mandelbrot.
  2. Or write a new G’MIC custom script able to compute the same kind of result. This would indeed be the best way to make it available to other people quickly. But here, as the algorithm is very specific and works at the pixel level, writing it as a pipeline of macro-operators is quite a pain. It means I would have to use 3 nested -repeat...-done loops (which are basically the loop commands used in G’MIC pipelines), and it would probably take ages to render, as a G’MIC pipeline is always purely interpreted, without any pre-compilation step. Even using multi-threading, it would have been a nightmare to compute here (the rough sketch below should give you an idea of what such pipeline-level pixel loops look like).
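
For the record, here is a rough, untested sketch (with a hypothetical command name of mine) of what option 2 means in practice, for something as trivial as inverting an image through explicit pipeline-level loops; now imagine the same thing with four nested loops and patch sums per pixel:

invert_slow :
  -repeat {h}
    y=$>
    -repeat {w}
      x=$>
      -set {255-i($x,$y)},$x,$y
    -done
  -done

Every {…} substitution above is re-parsed and re-evaluated by the interpreter at each single iteration, which is precisely why such pipelines are hopelessly slow for per-pixel work, compared to a pre-compiled -fill expression.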

In fact, the quite long math expression we are using in the command -julia_expr2 defines one complex algorithm as a whole, and we know it will be pre-compiled into a sequence of bytecodes by CImg before being evaluated for each of the 1024×1024 = 1,048,576 pixels that compose the image. Of course, we are not as fast as a native C++ implementation of the same command, but at the same time we gain so much flexibility and genericity in the stuff we can do now that this disadvantage is easily forgiven. And the processing time stays reasonable. For fast algorithm prototyping, this feature is incredibly nice! I won’t be forced to unsheathe my C++ compiler every time I want to experiment with some very specific image processing algorithm working at the pixel level.

4. A more serious example: the “Non-Local Means”

The Non-Local-Means is a quite famous patch-based denoising/smoothing algorithm in image processing, introduced in 2005 by A. Buades (beware, his home page contains images of Lena, please do not click if you are too sensitive!). I won’t go into all the implementation details, as several different methods have been proposed in the literature just for implementing it. But one of the simplest (and slowest) techniques requires 4 nested loops per image pixel. What a good opportunity to try writing this “slow” algorithm using the G’MIC -fill command! It took me less than 10 minutes, to be honest:

nlmeans_expr : -check "${1=10}>0 && isint(${2=3}) && $2>0 && isint(${3=1}) && $3>0"
  sigma=$1  # Denoising strength.
  hl=$2     # Lookup half-size.
  hp=$3     # Patch half-size.
  -fill "
    value = 0;
    sum_weights = 0;
    for (q = -"$hl", q<="$hl", ++q,
      for (p = -"$hl", p<="$hl", ++p,
        diff = 0;
        for (s = -"$hp", s<="$hp", ++s,
          for (r = -"$hp", r<="$hp", ++r,
            diff += (i(x+p+r,y+q+s) - i(x+r,y+s))^2
          )
        );
        weight = exp(-diff/(2*"$sigma")^2);
        value += weight*i(x+p,y+q);
        sum_weights += weight
      )
    );
    value/(1e-5 + sum_weights)
"

Now, let’s test it on a noisy version of the Leno image:

$ gmic user.gmic leno.png -noise 20 -c 0,255 -tic --nlmeans_expr 35,3,1 -toc
[gmic]-0./ Start G'MIC interpreter.
[gmic]-0./ Input custom command file 'user.gmic' (added 1 command, total 6195).
[gmic]-0./ Input file 'leno.png' at position 0 (1 image 512x512x1x3).
[gmic]-1./ Add gaussian noise to image [0], with standard deviation 20.
[gmic]-1./ Cut image [0] in range [0,255].
[gmic]-1./ Initialize timer.
[gmic]-1./nlmeans_expr/ Set local variable sigma='35'.
[gmic]-1./nlmeans_expr/ Set local variable hl='3'.
[gmic]-1./nlmeans_expr/ Set local variable hp='1'.
[gmic]-1./nlmeans_expr/ Fill image [0] with expression ' value=0; sum_weights=0; for(q = -3,q<(...)ight ) ); value/(1e-5 + sum_weights) '.
[gmic]-2./ Elapsed time : 3.156 s.

which results in these two images displayed on the screen: the noisy version (left) and the denoised one using the Non-Local-Means algorithm (right). Of course, the timing may differ from one machine to another. I guess my 3-second run here is decent (tested on my powerful PC at the lab). It still takes less than 20 seconds on my laptop. Crops of the results are presented below. The initial Leno image is a 512×512 RGB image, and the timing has been measured for processing the whole image, of course.


Fig.5.1. Crop of a noisy version of the Leno image, degraded with gaussian noise, std=20.


Fig.5.2. Denoised version using the NL-means algorithm (custom command -nlmeans_expr).

Here again, you could argue that the native G’MIC command -denoise does the same thing and is faster to run. It is, definitely. Jérôme Boulanger (one very active G’MIC contributor) has written a nice custom command -nlmeans that implements the NL-means with a smarter algorithm (avoiding the need for 4 nested loops per pixel) and runs even faster (it is already available in the plug-in for GIMP). But that’s not the point. What I show here is that I’m now able to do some (relatively fast) prototyping of algorithms working at the pixel level in G’MIC, without having to write and compile C++ code. But the best part is the integration: if the algorithm appears to be interesting/effective enough, I can add it to the G’MIC standard library in a few minutes, and quickly create a filter for the GIMP plug-in as well. Let’s say it right away: probably 5 minutes after I’ve finished writing the first version of the algorithm, plug-in users are able to get it and use it on their own images (and give positive/negative feedback to help with future improvements). That’s what I call a smooth, quick and painless integration! And that is exactly the kind of algorithm I couldn’t implement before as a custom G’MIC command running at a decent speed.

To me, it clearly opens exciting perspectives to quickly prototype and integrate new custom image processing algorithms into G’MIC in the future!

5. The Vector Painting filter

In fact, this happened earlier than expected. I’ve been able to add one of my latest image filters (named Vector painting) to the G’MIC plug-in for GIMP recently. It was somewhat unexpected, because I was just doing some debugging to improve the CImg math expression evaluator. Briefly, suppose you want to determine, for each pixel of an image, the discrete spatial orientation of the maximal value variation, with an angular precision of 45°: for each pixel centered in a 3×3 neighborhood, I want to estimate which pixel of the neighborhood has the highest difference with the center pixel (measured as the squared difference between the two pixel values). To make things simpler, I’ve considered doing this on the image luminance only, instead of using all the RGB color channels. At the end, I transform each pixel value into a label (an integer in range [1,8]) that represents one of the possible 45°-orientations of the plane. That is typically the kind of problem that requires custom loops working at the pixel level, so something I couldn’t do easily before the loop feature was introduced in my math parser (or I would have done the prototype in C++).

The solution to this problem was surprisingly easy to write. Here again, it didn’t take much more than 5 minutes of work:

foo :
  -luminance
  -fill "dmax = -1; nmax = 0;
         for (n = 0, ++n<=8,
           p = arg(n,-1,0,1,-1,1,-1,0,1);
           q = arg(n,-1,-1,-1,0,0,1,1,1);
           d = (j(p,q,0,0,0,1)-i)^2;
           if(d>dmax,dmax = d; nmax = n,nmax)
         )"

And if we apply this new custom command -foo on our Leno image,

$ gmic user.gmic leno.png -foo

we get the result below (after re-normalization of the label image to the range [0,255]). Keep in mind that each pixel of the resulting image is an integer label originally in range [1,8]. And by the way, the computation time is ridiculously low here (178 ms for this 512×512 image).


Fig.6. Each pixel of the Leno image is replaced by a label telling which of its 3×3 neighbors is the most different from the central pixel.

It actually looks a bit ugly. But that’s not surprising, as the original image contains noise, so you get a lot of small random variations in flat regions. As a result, the labels you get in those regions are noisy as well. Now, what happens if we blur the image before computing the labels? That should regularize the resulting image of labels as well. Indeed:

$ gmic user.gmic leno.png --blur 1% -foo

returns this:


Fig.7. Each pixel of the blurred Leno image is replaced by a label telling which of its 3×3 neighbors is the most different.

That’s interesting! Blurring the input image creates larger portions of constant labels, i.e. regions where the orientation of the maximal pixel variation is the same. And the original image contours still appear as natural frontiers of these labelled regions. A natural idea, then, is to replace each connected region by the average color it overlays in the original color image. In G’MIC, this can be done easily with the command -blend shapeaverage.

$ gmic user.gmic leno.png --blur 1% -foo[-1] -blend shapeaverage

And what we get at the end is a nice piecewise-constant abstraction of our initial image. Looks like a “vector painting”, no? 😉


Fig.8. Result of the “shape average” blending between the original Leno image, and its map of labels, as obtained with command -foo.

As you may imagine, changing the amplitude of the blurring makes the result more or less abstract. Having this, it didn’t take much time to create a filter that can be run directly from the G’MIC plug-in interface for GIMP. Below is the exact code I wrote to integrate my initial algorithm prototype into G’MIC and make it usable by everyone. It was done in less than 5 minutes, really:

#@gimp Vector painting : gimp_vector_painting, gimp_vector_painting_preview(1)
#@gimp : Details = float(9,0,10)
#@gimp : sep = separator(), Preview type = choice("Full","Forward horizontal","Forward vertical","Backward horizontal","Backward vertical","Duplicate horizontal","Duplicate vertical")
#@gimp : sep = separator(), note = note("<small>Author: <i>David Tschumperl&#233;</i>.\nLatest update: <i>08/25/2015</i>.</small>")
gimp_vector_painting :
  -repeat $! -l[$>]
    --luminance -b[-1] {10-$1}%,1,1
    -f[-1] "dmax = -1; nmax = 0;
            for (n = 0, ++n<=8,
              p = arg(n,-1,0,1,-1,1,-1,0,1);
              q = arg(n,-1,-1,-1,0,0,1,1,1);
              d = (j(p,q,0,0,0,1)-i)^2;
              if (d>dmax, dmax = d; nmax = n,nmax)
            )"
    -blend shapeaverage
  -endl -done

gimp_vector_painting_preview :
  -gimp_split_preview "-gimp_vector_painting $*",$-1

Here is the resulting filter, as it can be seen in the G’MIC plug-in for GIMP (requires version 1.6.5.2+), just after I’ve pushed it to the G’MIC standard library:


Fig.9. The G’MIC plug-in for GIMP, running the “Vector Painting” filter.

Here again, that is how I conceive things should be done properly: 1. I create a quick algorithm prototype to transform an image into something else. 2. I decide that the algorithm is cool enough to be shared. 3. I add a few lines to make it available immediately in the G’MIC image processing framework. What a gain of time compared to doing all this in C++!

6. Comparison with ImageMagick’s -fx operator

While working on the improvement of my math expression evaluator in CImg, I wondered whether what I was doing didn’t already exist in ImageMagick. Indeed, ImageMagick is one of the most well-established open-source image processing frameworks, and I was almost sure they had already coped with the kind of questions I had for G’MIC. And of course, they had 🙂

So, they have a special operator -fx expression in convert that seems to be equivalent to what the G’MIC command -fill expression does. And yes, they have probably had it for years, long before G’MIC even existed. But I admit I almost completely stopped using the ImageMagick tools when I started developing my own C++ image processing library CImg, years ago. All the information you need to use this -fx operator in convert can be found on this documentation page, and even more examples on this page. Reading these pages was very instructive: I’ve noticed some interesting functions and notations they have in their own expression parser that I didn’t have in mine (so I’ve added some of them to my latest version of CImg!). I was also particularly interested in this quote from their pages:

As people developed new types of image operations, they usually prototype it using a slow “-fx” operator first. When they have it worked out that ‘method’ is then converted into a new fast built-in operator in the ImageMagick Core library. Users are welcome to contribute their own “-fx” expressions (or other defined functions) that they feel would be a useful addition to IM, but which are not yet covered by other image operators, if they can be handled by one of the above generalized operators, it should be reasonably easy to add it.(…). What is really needed at this time is a FX expression compiler, that will pre-interpret the expression into a tighter and faster executable form. Someone was going to look into this but has since disappeared.

So it seems their -fx operator is quite slow, as it re-parses the specified math expression for each image pixel. And when someone writes an interesting operator with -fx, they are willing to convert it into C code and integrate it as a new built-in operator directly in the core ImageMagick library. It seems they don’t really mind adding new native hard-coded operators into IM, maybe even for very specific/unusual operators (at least they don’t mention it). That’s interesting, because this is precisely what I’m trying to avoid in G’MIC. My impression is that it is often acceptable to be less efficient if the code we have to write for adding one feature is smaller, easier to maintain/upgrade, and does not require releasing a new version to make this particular feature available. Personally, I’d always prefer to write a G’MIC custom command (so, a script that I can directly put in the G’MIC standard library) if possible, instead of adding the same feature as a new “native” built-in command (in C++). But maybe their -fx operator was so slow that it was cumbersome to use in practice? I had to try!

And I’m a bit sorry to say this, but yes, it’s quite slow (and I have tested this on my pretty fast machine with 12 HT cores at 2.6 GHz). The ImageMagick -fx operator is able to use multiple cores, which is clearly a good thing, but even with that, it is cumbersome to use on reasonably big images with complex math expressions. In a sense, that reassures me about the usefulness of having developed my own math expression compiler in CImg: the pre-compilation step of the math expression into a shorter bytecode sequence seems to be almost mandatory. I’ve done a quick timing comparison for some simple image effects that can be achieved similarly with both expression evaluators of G’MIC and ImageMagick. Most of the examples below have actually been taken from the -fx documentation pages. I divide and multiply my image values by 255 in the G’MIC examples below because ImageMagick formulas assume that the RGB values of the pixels are defined in range [0,1]. These tests have been done with a high-resolution input image (of a motorbike) of size 3072×2048, in RGB mode. I’ve checked that the ImageMagick and G’MIC invocations render the same images.

#
# Test1: Apply a sigmoid contrast function on the image colors.
#

$ time convert motorbike.jpg -fx "(1.0/(1.0+exp(10.0*(0.5-u)))-0.006693)*1.0092503" im_sigmo.jpg

real	0m9.033s
user	3m18.527s
sys	0m2.604s

$ time gmic -verbose - motorbike.jpg -/ 255 -fill "(1.0/(1.0+exp(10.0*(0.5-i)))-0.006693)*1.0092503" -* 255 -o gmic_sigmo.jpg,75

real    0m0.474s
user    0m3.183s
sys     0m0.111s

#
# Test2: Create a radial gradient from scratch.
#
$ time convert -size 3072x2048 canvas: -fx "Xi=i-w/2; Yj=j-h/2; 1.2*(0.5-hypot(Xi,Yj)/70.0)+0.5" im_radial.jpg

real	0m29.895s
user	8m11.320s
sys	2m59.184s

$ time gmic -verbose - 3072,2048 -fill "Xi=x-w/2; Yj=y-h/2; 1.2*(0.5-hypot(Xi,Yj)/70.0)+0.5" -cut 0,1 -* 255 -o gmic_radial.jpg

real    0m0.234s
user    0m0.990s
sys     0m0.045s

#
# Test3: Create a keftales pattern gradient from scratch.
#
$ time convert -size 3072x2048 xc: -channel G -fx  'sin((i-w/2)*(j-h/2)/w)/2+.5' im_gradient.jpg

real	0m2.951s
user	1m2.310s
sys	0m0.853s

$ time gmic -verbose - 3072,2048 -fill "sin((x-w/2)*(y-h/2)/w)/2+.5" -* 255 -o gmic_gradient.jpg

real    0m0.302s
user    0m1.164s
sys     0m0.061s

#
# Test4: Compute mirrored image along the X-axis.
#
$ time convert motorbike.jpg -fx 'p{w-i-1,j}' im_mirror.jpg 2>&1

real	0m4.409s
user	1m33.702s
sys	0m1.254s

$ time gmic -verbose - motorbike.jpg -fill "i(w-x-1,y)" -o gmic_mirror.jpg

real    0m0.495s
user    0m1.367s
sys     0m0.106s

The pre-compilation of the math expressions clearly makes a difference!

I would be really interested in comparing the expression evaluators on more complex expressions, such as the one I’ve used to compute Julia fractals with G’MIC. I don’t have a deep knowledge of the ImageMagick syntax, so I don’t know what the equivalent command line would be. If you have any idea how to do that, please let me know! I’d also be interested to get an idea of how Matlab performs on the same kind of equations.

7. Conclusion and perspectives

What do I conclude from all of this? Well, I’m actually pretty excited by what the latest version of my expression evaluator integrated in G’MIC / CImg can finally do. It looks like it runs at a decent speed, at least compared to the one used in ImageMagick (which is definitely a reference project for image processing). I also had the idea of comparing it with GraphicsMagick, but I must admit I didn’t find the same -fx operator in it, nor anything similar (maybe you could teach me how this is done in GraphicsMagick?).

I’ve already been able to propose one (simple) artistic filter that I find interesting (Vector painting), and I’m very confident that these improvements to the math expression evaluator will open a lot of new possibilities for G’MIC: for the design of new filters for everyone, of course, but also to make my algorithm prototyping work easier and faster.

Could it be the beginning of a new boost for G’MIC? What do you think?

How to code a nice user-guided foreground extraction algorithm? (Addendum)

After writing my last article on the Easy (user-guided) foreground extraction algorithm, I realized that you might think I was exaggerating when arguing that the whole algorithm can be re-coded very quickly from scratch. After all, I’ve just illustrated how things work by using G’MIC command lines, and as there is already a lot of image processing algorithms implemented in G’MIC, it is somehow a biased demonstration. So let me just give you the corresponding C++ code for the algorithm. Here again, you may find I cheat a little bit, because I use some functions of a C++ image processing library (CImg) that actually does most of the hard implementation work for me.

But:

  1. You can use this library too in your own code. The CImg Library I use here works on multiple platforms and has a very permissive license, so you probably don’t have to re-code the key image processing algorithms by yourself. And even if you want to do so, you can still look at the source code of CImg. It is quite clearly organized and the code of each function is easy to find (the whole library is defined in a single header file CImg.h).
  2. I never said the whole algorithm could be done in a few lines, only that it was easy to implement 🙂

So, dear C++ programmer, here is the simple prototype you need to make the foreground extraction algorithm work:

#include "CImg.h"
using namespace cimg_library;

int main() {

  // Load input color image.
  const CImg<float> image("image.png");

  // Load input label image.
  CImg<float> labels("labels.png");
  labels.resize(image.width(),image.height(),1,1,0);  // Be sure labels has the correct size.

  // Compute gradients.
  const float sigma = 0.002f*cimg::max(image.width(),image.height());
  const CImg<float> blurred = image.get_blur(sigma);
  const CImgList<float> gradient = blurred.get_gradient("xy");
    // gradient[0] and gradient[1] are two CImg images which contain
    // respectively the X and Y derivatives of the blurred RGB image.

  // Compute the potential map P.
  CImg<float> P(labels);
  cimg_forXY(P,x,y)
    P(x,y) = 1/(1 +
                cimg::sqr(gradient(0,x,y,0,0)) + // Rx^2
                cimg::sqr(gradient(0,x,y,0,1)) + // Gx^2
                cimg::sqr(gradient(0,x,y,0,2)) + // Bx^2
                cimg::sqr(gradient(1,x,y,0,0)) + // Ry^2
                cimg::sqr(gradient(1,x,y,0,1)) + // Gy^2
                cimg::sqr(gradient(1,x,y,0,2))); // By^2

  // Run the label propagation algorithm.
  labels.watershed(P,true);

  // Display the result and exit.
  (image,P,labels).display("Image - Potential - Labels");

  return 0;
}

To compile it, using g++ on Linux for instance, you have to type something like this in a shell (I assume you have all the necessary headers and libraries installed):

$ g++ -o foo foo.cpp -Dcimg_use_png -lX11 -lpthread -lpng -lz

Now, execute it:

$ ./foo

and you get this kind of result:

[Image: display window showing the input image, the potential map P and the propagated labels]

Once you have the resulting image of propagated labels, it is up to you to use it the way you want: to split the original image into foreground/background layers, or to keep it as a mask associated with the color image.

The point is that even if you add the code of the important CImg methods I’ve used here, i.e. CImg<T>::get_blur(), CImg<T>::get_gradient() and CImg<T>::watershed(), you probably won’t exceed about 500 lines of C++ code. Yes, instead of a single line of G’MIC code. Now you see why I’m also a big fan of coding image processing stuff directly in G’MIC instead of C++ ;).

A last remark before ending this addendum:

Unfortunately, the label propagation algorithm itself is hardly parallelizable. It is based on a priority queue, and basically the choice of the next label to set depends on how the latest label has been set. This is a sequential algorithm by nature. So, when processing large images, a good idea is to do the computations and preview the foreground extraction result on a smaller version of the original image, and compute the full-resolution mask only once, after all key points have been placed by the user.
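
As a rough (untested) sketch of this idea with the G’MIC command-line tool, reusing the one-liner from the original article (the 50% factor and the interpolation modes are arbitrary choices of mine):

$ gmic image.jpg labels.png -resize[0] 50%,50%,1,3,3 -resize[1] 50%,50%,1,1,1 \
  --blur[0] 0.2% -gradient_norm[-1] -fill[-1] '1/(1+i^2)' -watershed[1] [2]

Nearest-neighbor interpolation (mode 1) matters for the label image, so that the downscaled key points keep their exact 0/1/2 values.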

Well that’s it, happy coding!

How to code a nice user-guided foreground extraction algorithm?


Principle of the algorithm:

About three months ago, I added a new filter to the G’MIC plug-in for GIMP, which allows the user to interactively extract a foreground object from the background of a color image (considering a single-layered bitmap image as the input). But instead of doing this the “usual” way, e.g. by letting the user crop the object manually with any kind of selection tool, this filter relies on a semi-automatic (user-guided) algorithm to do the job. Basically, the only thing the user needs to do is place key points which tell about the nature of the underlying pixels: either they belong to the foreground (symbolized by green dots) or to the background (red dots). The algorithm then tries to propagate those labels all over the image, so that every image pixel gets a binary label (either foreground or background). The needed interaction is quite limited, and the user doesn’t need to be an image processing guru to control what he does: left/right mouse buttons to add a new foreground/background key point (or move an existing one), middle button and mouse wheel to zoom in/out and pan. Nothing really complicated, easy mode.


Fig.1. Principle of the user-guided foreground extraction algorithm: the user puts some foreground/background key points over a color image (left), and the algorithm propagates those labels for all pixels, then splits the initial image into two foreground/background layers (middle and right).

Of course, the algorithm which performs the propagation of these binary labels is quite dumb (as most image processing algorithms actually are), despite the fact that it tries to follow the contours detected in the image as much as possible. So, sometimes it just assigns the wrong label to a pixel. But from the user’s point of view, this is not so annoying, because it is really easy to locally correct the algorithm’s estimation by putting additional key points in the wrongly labelled regions and running a new iteration of the algorithm.

Here is a quick video that shows how foreground extraction can be done with this algorithm. Basically, we start by placing one or two key points, run the algorithm (i.e. press the Space bar), then correct the erroneous areas by adding or moving key points, run the algorithm again, etc., until the contours of the desired object are precise enough. As the label propagation algorithm is reasonably fast (hence dumb), the corresponding workflow runs smoothly.

At this stage, two observations have to be made:

1. As a user, you can easily experiment with this manipulation in GIMP, if you have the G’MIC plug-in installed (and if this is not the case, do not wait one second longer and go install it! 😉 ). This foreground extraction filter is located in Contours / Extract foreground [interactive]. When clicking on the Apply button, it opens a new interactive window where you can start adding your foreground/background key points. Press Space to update the extracted mask (i.e. run a new iteration of the algorithm) until it fits your requirements, then quit the window and your foreground (and optionally background) selection will be transferred back to GIMP. I must say I have been quite surprised by the good feedback I got for this particular filter. I’m aware it can probably be a time-saver in some cases, but I can’t get out of my head the overall simplicity of the underlying algorithm.


Fig.2. Location of the “Extract foreground” filter in the G’MIC plug-in for GIMP.

2. As a developer, you need to convince yourself that this label propagation algorithm is trivial to code. Dear reader, if by chance you are a developer of an open-source software for image processing or photo retouching, and you think this kind of foreground extraction tool could be a cool feature to have, then be aware that it can be re-coded from scratch in a very short amount of time. I’m not lying: the hardest part is actually coding the GUI. I illustrate this fact in what follows, and give some details about how this propagation algorithm is currently implemented in G’MIC.


Details of the implementation:

Let’s assume that the hardest work has already been done: designing the GUI that lets the user place foreground/background key points. We then have two known images as the input of our algorithm.


Fig.3.a. The input color image (who’s that guy?)


Fig.3.b. An image of labels, containing the FG/BG key points

At this stage, the image of user-defined labels (in Fig.3.b.) contains pixels whose value can only be 0 (unlabeled pixels = black), 1 (labeled as background = gray) or 2 (labeled as foreground = white). The goal of the label propagation algorithm is then to set a value of 1 or 2 to the pixels that currently have a label of 0. This label propagation is performed by a so-called watershed algorithm, which is in fact nothing more than the usual Dijkstra’s algorithm applied on a regular grid (the support of the image) instead of a generic graph. The toughest parts of the label propagation algorithm are:

  1. Managing a priority queue (look at the pseudo-code on the Wikipedia page of Dijkstra’s algorithm to see what I mean). This is a very classical algorithmic structure, so nothing terrible for a programmer.
  2. Defining a scalar field P of potentials (or priorities) to drive the label propagation. We have to define a potential/priority value for each image pixel, that tells whether the current pixel should be labeled in priority or not (hence the need for a priority queue). So P has the same size as the input image, but has only one channel (see it as a matrix of floating-point values if you prefer).

Suppose you tell the propagation algorithm that all pixels have the same priority (so you feed it with a constant potential image P(x,y)=1). Then your labels will propagate such that each reconstructed label takes the value of its nearest known (user-defined) label. This does not take the image data into account, so there is no chance to get a satisfactory result. Instead, the key idea is to define the potential image P such that it depends on the contour variations of the color image you want to segment. More precisely, you want pixels with low variations to have a high priority, while pixels on image structures and contours should be labeled last. In G’MIC, I’ve personally defined the potential image P from the input RGB image with the following (heuristic) formula:

$$P(x,y) = \frac{1}{1 + \|\nabla R_\sigma(x,y)\|^2 + \|\nabla G_\sigma(x,y)\|^2 + \|\nabla B_\sigma(x,y)\|^2}$$

where

$$\nabla I_\sigma = \left(\frac{\partial I_\sigma}{\partial x},\ \frac{\partial I_\sigma}{\partial y}\right)$$

is the usual definition of the image gradient, appearing here in the potential map P for each image channel R, G and B. To be more precise, we estimate those gradients from a slightly blurred version of the input image (hence the sigma added to R, G and B, which stands for the standard deviation of the Gaussian blur applied to the RGB color image). Defined like this, the values of the potential map P are low for pixels on image contours, and high (with a maximum of 1) for pixels located on flat regions.

If you have the command-line interface gmic of G’MIC installed on your machine, you can easily compute and visualize such a potential map P for your own images. Just type this simple command line in a shell:

$ gmic image.jpg -blur 0.2% -gradient_norm -fill '1/(1+i^2)'

For our sample image above, the priority map P looks like this (white corresponds to the value 1, black to something close to 0):


Fig.4. Estimated potential/priority map P.

Now you have all the ingredients to extract your foreground: run the label propagation algorithm on your image of user-defined labels, with the priority map P estimated from the input color image. It will output the desired image of propagated labels:


Fig.5.a. Input color image.


Fig.5.b. Result of the labels propagation algorithm.

And you won’t believe it, but you can actually do all these things in a single command line with gmic:

$ gmic image.jpg labels.png --blur[0] 0.2% -gradient_norm[-1] -fill[-1] '1/(1+i^2)' -watershed[1] [2] -reverse[-2,-1]

In Fig.5.b., I let the foreground/background frontier pixels keep their value 0, to better show the estimated background (in grey) and foreground (in white). Those pixels are of course also labeled in the final version of the algorithm. And once you have computed this image of propagated labels, the extraction work is almost done. Indeed, this label image is a binary mask that you can use to easily split the input color image into two foreground/background layers with alpha channels:


Fig.6.a. Estimated foreground layer, after label propagation.


Fig.6.b. Estimated background layer, after label propagation.
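
As an illustration of that final split (again, an untested sketch; the output filename is an arbitrary choice of mine), the foreground layer of Fig.6.a. could be obtained with something like:

$ gmic image.jpg labels.png -eq[1] 2 -mul[1] 255 -append c -output foreground.png

Here, -eq[1] 2 turns the propagated labels into a binary foreground mask, which is then rescaled to [0,255] and appended to the color image as an alpha channel.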

So, now you have a better idea of how all this works… Isn’t that a completely stupid algorithm? Apologies if I broke the magic of the thing! 😀

Of course, you can run the full G’MIC foreground extraction filter, not only from the plug-in, but also from the command line, by typing:

$ gmic image.jpg -x_segment , -output output.png

The input file should be a regular RGB image. This command generates an output file in RGBA mode, where only a binary alpha channel has been added to the image (the RGB colors of each pixel remain unchanged).

And now?

You know the secret of this nice (but simple) foreground extraction algorithm. So you have two options:

  1. If you are not a developer, harass the developers of your favorite graphics software to include this “so-simple-to-code” algorithm in it!
  2. If you are a developer interested in the algorithm, just try to code a prototype of it. You will be surprised by how fast this can actually be done!

In any case, please don’t hate me if none of these suggestions works ;). See you in the next post!

See also the Addendum I wrote, that describes how this can be implemented in C++.