Learning R: Painting with Fire


A few months ago I published a post on recursion: To understand Recursion you have to understand Recursion…. In this post we will see how to use recursion to fill free areas of an image with colour, the caveats of recursion and how to transform a recursive algorithm into a loop-based version using a queue – so read on…

The recursive version of the painting algorithm we want to examine here is very easy to understand, Wikipedia gives the pseudocode of the so called flood-fill algorithm:


Flood-fill (node, target-color, replacement-color):

  • If target-color is equal to replacement-color, return.
  • If the color of node is not equal to target-color, return.
  • Set the color of node to replacement-color.
  • Perform Flood-fill (one step to the south of node, target-color, replacement-color).
    Perform Flood-fill (one step to the north of node, target-color, replacement-color).
    Perform Flood-fill (one step to the west of node, target-color, replacement-color).
    Perform Flood-fill (one step to the east of node, target-color, replacement-color).
  • Return.

The translation into R couldn’t be any easier:

floodfill <- function(row, col, tcol, rcol) {
  if (tcol == rcol) return()
  if (M[row, col] != tcol) return()
  M[row, col] <<- rcol
  floodfill(row - 1, col    , tcol, rcol) # south
  floodfill(row + 1, col    , tcol, rcol) # north
  floodfill(row    , col - 1, tcol, rcol) # west
  floodfill(row    , col + 1, tcol, rcol) # east
  return("filling completed")
}

We take the image from Wikipedia as an example:

M <- matrix(c(1, 1, 1, 1, 1, 1, 1, 1, 1,
              1, 0, 0, 0, 1, 0, 0, 0, 1,
              1, 0, 0, 0, 1, 0, 0, 0, 1,
              1, 0, 0, 1, 0, 0, 0, 0, 1,
              1, 1, 1, 0, 0, 0, 1, 1, 1,
              1, 0, 0, 0, 0, 1, 0, 0, 1,
              1, 0, 0, 0, 1, 0, 0, 0, 1,
              1, 0, 0, 0, 1, 0, 0, 0, 1,
              1, 1, 1, 1, 1, 1, 1, 1, 1), 9, 9)
image(M, col = c(0, 1))

We now fill the three areas with three different colours and then plot the image again:

startrow <- 5; startcol <- 5
floodfill(startrow, startcol, 0, 2)
## [1] "filling completed"

startrow <- 3; startcol <- 3
floodfill(startrow, startcol, 0, 3)
## [1] "filling completed"

startrow <- 7; startcol <- 7
floodfill(startrow, startcol, 0, 4)
## [1] "filling completed"

image(M, col = 1:4)

This seems to work pretty well but the problem is that the more nested the algorithm becomes the bigger the stack has to be – which could lead to overflow errors. One comment on my original post on recursion read:

just keep in mind that recursion is useful in industrial work only if tail optimization is supported. otherwise your code will explode at some indeterminate time in the future. […]

One possibility is to increase the size of the stack with options(expressions = 10000) but even this may not be enough. Therefore we transform our recursive algorithm into a loop-based one and use a queue instead of a stack! The pseudocode from Wikipedia:


Flood-fill (node, target-color, replacement-color):

  • If target-color is equal to replacement-color, return.
  • If color of node is not equal to target-color, return.
  • Set the color of node to replacement-color.
  • Set Q to the empty queue.
  • Add node to the end of Q.
  • While Q is not empty:
    • Set n equal to the first element of Q.
    • Remove first element from Q.
    • If the color of the node to the west of n is target-color,
      set the color of that node to replacement-color and add that node to the end of Q.
    • If the color of the node to the east of n is target-color,
      set the color of that node to replacement-color and add that node to the end of Q.
    • If the color of the node to the north of n is target-color,
      set the color of that node to replacement-color and add that node to the end of Q.
    • If the color of the node to the south of n is target-color,
      set the color of that node to replacement-color and add that node to the end of Q.
  • Continue looping until Q is exhausted.
  • Return.

Because of the way the algorithm fills areas it is also called forest fire. Again, the translation into valid R code is straightforward:

floodfill <- function(row, col, tcol, rcol) {
  if (tcol == rcol) return()
  if (M[row, col] != tcol) return()
  Q <- matrix(c(row, col), 1, 2)
  while (dim(Q)[1] > 0) {
    n <- Q[1, , drop = FALSE]
    west  <- cbind(n[1]    , n[2] - 1)
    east  <- cbind(n[1]    , n[2] + 1)
    north <- cbind(n[1] + 1, n[2]    )
    south <- cbind(n[1] - 1, n[2]    )
    Q <- Q[-1, , drop = FALSE]
    if (M[n] == tcol) {
      M[n] <<- rcol
      if (M[west] == tcol)  Q <- rbind(Q, west)
      if (M[east] == tcol)  Q <- rbind(Q, east)
      if (M[north] == tcol) Q <- rbind(Q, north)
      if (M[south] == tcol) Q <- rbind(Q, south)
    }
  }
  return("filling completed")
}

As an example we will use a much bigger picture (it can be downloaded from here: Unfilledcirc.png):

library(png)
img <- readPNG("data/Unfilledcirc.png")
M <- img[ , , 1]
M <- ifelse(M < 0.5, 0, 1)
M <- rbind(M, 0)
M <- cbind(M, 0)
image(M, col = c(1, 0))

And now for the filling:

startrow <- 100; startcol <- 100
floodfill(startrow, startcol, 0, 2)
## [1] "filling completed"

startrow <- 50; startcol <- 50
floodfill(startrow, startcol, 1, 3)
## [1] "filling completed"

image(M, col = c(1, 0, 2, 3))

As you can see, with this version of the algorithm much bigger areas can be filled!

I also added both R implementations to the respective section of Rosetta Code: Bitmap/Flood fill.

Check Machin-like formulae with arbitrary-precision arithmetic

Happy New Year to all of you! Let us start the year with something for your inner maths nerd 🙂

Rosetta Stone
Source: Wikimedia

For those of you who don’t yet know Rosetta Code: it is a real cool site where you can find lots of interesting code examples in all kinds of different languages for many different tasks. Of course R is also present big time (at the time of writing 426 code examples!): Rosetta Code for R.

The name of the site is inspired by the famous Rosetta Stone of Ancient Egypt which is inscribed with three different versions of the same text: in Ancient Egyptian hieroglyphs, Demotic script, and Ancient Greek script which proved invaluable in deciphering Egyptian hieroglyphs and thereby opening the window into ancient Egyptian history.

Now, a few days a ago I again added an example (for the other tasks I solved I will write more posts in the future, so stay tuned!). The task is to verify the correctness of Machin-like formulae using exact arithmetic.

A little bit of mathematical background is in order, so Wikipedia to the rescue:

Machin-like formulae are a popular technique for computing \pi to a large number of digits. They are generalizations of John Machin]s formula from 1706:

    \[\frac{\pi}{4} = 4 \arctan \frac{1}{5} - \arctan \frac{1}{239}\]

which he used to compute \pi to 100 decimal places.

Machin-like formulae have the form

    \[c_0 \frac{\pi}{4} = \sum_{n=1}^N c_n \arctan \frac{a_n}{b_n}\]

where a_n and b_n are positive integers such that a_n < b_n, c_n is a signed non-zero integer, and c_0 is a positive integer.

The exact task is to verify that the following Machin-like formulae are correct by calculating the value of tan (right hand side) for each equation using exact arithmetic and showing they equal one:

{\pi\over4} = \arctan{1\over2} + \arctan{1\over3}
{\pi\over4} = 2 \arctan{1\over3} + \arctan{1\over7}
{\pi\over4} = 4 \arctan{1\over5} - \arctan{1\over239}
{\pi\over4} = 5 \arctan{1\over7} + 2 \arctan{3\over79}
{\pi\over4} = 5 \arctan{29\over278} + 7 \arctan{3\over79}
{\pi\over4} = \arctan{1\over2} + \arctan{1\over5} + \arctan{1\over8}
{\pi\over4} = 4 \arctan{1\over5} - \arctan{1\over70} + \arctan{1\over99}
{\pi\over4} = 5 \arctan{1\over7} + 4 \arctan{1\over53} + 2 \arctan{1\over4443}
{\pi\over4} = 6 \arctan{1\over8} + 2 \arctan{1\over57} + \arctan{1\over239}
{\pi\over4} = 8 \arctan{1\over10} - \arctan{1\over239} - 4 \arctan{1\over515}
{\pi\over4} = 12 \arctan{1\over18} + 8 \arctan{1\over57} - 5 \arctan{1\over239}
{\pi\over4} = 16 \arctan{1\over21} + 3 \arctan{1\over239} + 4 \arctan{3\over1042}
{\pi\over4} = 22 \arctan{1\over28} + 2 \arctan{1\over443} - 5 \arctan{1\over1393} - 10 \arctan{1\over11018}
{\pi\over4} = 22 \arctan{1\over38} + 17 \arctan{7\over601} + 10 \arctan{7\over8149}
{\pi\over4} = 44 \arctan{1\over57} + 7 \arctan{1\over239} - 12 \arctan{1\over682} + 24 \arctan{1\over12943}

The same should be done for the last and most complicated case…

{\pi\over4} = 88 \arctan{1\over172} + 51 \arctan{1\over239} + 32 \arctan{1\over682} + 44 \arctan{1\over5357} + 68 \arctan{1\over12943}

… but it should be confirmed that the following, slightly changed, formula is incorrect by showing tan (right hand side) is not one:

{\pi\over4} = 88 \arctan{1\over172} + 51 \arctan{1\over239} + 32 \arctan{1\over682} + 44 \arctan{1\over5357} + 68 \arctan{1\over12944}

This is what I contributed to Rosetta Code:

library(Rmpfr)
prec <- 1000 # precision in bits
`%:%` <- function(e1, e2) '/'(mpfr(e1, prec), mpfr(e2, prec)) # operator %:% for high precision division
# function for checking identity of tan of expression and 1, making use of high precision division operator %:%
tanident_1 <- function(x) identical(round(tan(eval(parse(text = gsub("/", "%:%", deparse(substitute(x)))))), (prec/10)), mpfr(1, prec))
 
tanident_1( 1*atan(1/2)    +  1*atan(1/3) )
## [1] TRUE
tanident_1( 2*atan(1/3)    +  1*atan(1/7))
## [1] TRUE
tanident_1( 4*atan(1/5)    + -1*atan(1/239))
## [1] TRUE
tanident_1( 5*atan(1/7)    +  2*atan(3/79))
## [1] TRUE
tanident_1( 5*atan(29/278) +  7*atan(3/79))
## [1] TRUE
tanident_1( 1*atan(1/2)    +  1*atan(1/5)   +   1*atan(1/8) )
## [1] TRUE
tanident_1( 4*atan(1/5)    + -1*atan(1/70)  +   1*atan(1/99) )
## [1] TRUE
tanident_1( 5*atan(1/7)    +  4*atan(1/53)  +   2*atan(1/4443))
## [1] TRUE
tanident_1( 6*atan(1/8)    +  2*atan(1/57)  +   1*atan(1/239))
## [1] TRUE
tanident_1( 8*atan(1/10)   + -1*atan(1/239) +  -4*atan(1/515))
## [1] TRUE
tanident_1(12*atan(1/18)   +  8*atan(1/57)  +  -5*atan(1/239))
## [1] TRUE
tanident_1(16*atan(1/21)   +  3*atan(1/239) +   4*atan(3/1042))
## [1] TRUE
tanident_1(22*atan(1/28)   +  2*atan(1/443) +  -5*atan(1/1393) + -10*atan(1/11018))
## [1] TRUE
tanident_1(22*atan(1/38)   + 17*atan(7/601) +  10*atan(7/8149))
## [1] TRUE
tanident_1(44*atan(1/57)   +  7*atan(1/239) + -12*atan(1/682)  +  24*atan(1/12943))
## [1] TRUE

tanident_1(88*atan(1/172)  + 51*atan(1/239) +  32*atan(1/682)  +  44*atan(1/5357) + 68*atan(1/12943))
## [1] TRUE
tanident_1(88*atan(1/172)  + 51*atan(1/239) +  32*atan(1/682)  +  44*atan(1/5357) + 68*atan(1/12944))
## [1] FALSE

As you can see all statements are TRUE except for the last one!

In the code I make use of the Rmpfr package (from Martin Maechler of ETH Zürich, Switzerland) which is based on the excellent GMP (GNU Multiple Precision) library. I define a new infix operator %:% for high-precision division and after that convert all standard divisions in the formulae to high-precision divisions and calculate the tan. Before I check if the result is identical to one I round it to 100 decimal places which is more than enough given the precision of log_{10}(2^{1000})=301.03, so about 300 decimal places, in the example.

Please let me know in the comments what you think of this approach and whether you see room for improvement for the code – Thank you!