An individual-based model of a flow-through system

Introduction

Note: this is an interactive document. Please use the Hide resp. Code buttons to fold or unfold the code of individual code cunks or of the complete document. The content is best viewed with Chrome or Firefox. The code is written in the R programming language for statistical computing (R Core Team 2018).

General approach

Let’s assume the following system of ordinary differential equations for a flow-through bioreactor (chemostat, cf. Novick and Szilard 1950; Herbert, Elsworth, and Telling 1956), where \(X\) is the concentration of organisms (e.g. cells/volume) in the system growing on a limited resource \(S\) (e.g. nutrient/volume). The instantanaeous growth rate \(r\) (1/time) depends on a maximum growth rate \(r_{max}\) and the resource concentration \(S\) via a Monod equation, with half-saturatio constant \(k_s\). Fresh medium with resource concentration \(S_0\) is imported with a dilution rate \(D\) that controls also export of cells and of remaining medium. A stoichiometric coefficient (\(Y\), yield) converts between concentration units of cells and nutrients:

\[ \frac{dX}{dt} = (r - D) \cdot X\\ \frac{dS}{dt} = D \cdot (S_0-S) - \frac{S}{Y}\\ \text{with:}\\ r = r_{max} \cdot \frac{S}{k_s+S} \] The above system of equations applies for a continuous system, but there may be reasons to adapt (or approximate) such a model to a discrete-time step and a discrete number of individuals. One reason may be a relatively small number of organisms (e.g. a few hundred rotifers instead of 10^8 cells of bacteria) or the nesseccity to describe individual variation (e.g. age, size, cell content) of the organisms.

This can be done with a so-called individual-based model, where the equations of such a system are, in principle, quite similar. Its main differences are:

the population is represented by a countable number (\(N\)) of individuals instead of a state variable \(X\),
the model is solved as a stochastic process where probabilities are used instead of rate parameters (1/time),
an event-like fixed time step scheme is used instead of a continuous integrator.

Plain implementation

The following example implements a very basic individual-based approach. The population of individuals is implemented as a data frame (inds) with, in this example, only one single column (age). Substrate \(S\) is implemented as a non-individual-based pool variable, the parameters are in a vector (parms).

All life-actvities are carried out in one live-function. Mortality and cell division are realized by subsetting randomly selected surviving individuals from the population, or by adding copies of randomly selected “dividing cells” to the population. The age of divided cells is set to zero for both daughters of each parent cell.

Mortality and substrate input depend on \(D\), while cell division depends on the substrate concentration. A safeguard check prevents \(S\) for becoming negative and a control parameter DELTAT allows an adaption of the time step.

The solution of the model is then straightforward by iterating the live-function in a for-loop.

live <- function(time, inds, S, parms, DELTAT){
  inds$age    <- inds$age + DELTAT

  with(parms, {

    ## dilute
    N    <- nrow(inds)
    rnd  <- runif(N)
    inds <- subset(inds, D * DELTAT < rnd)
    S <- S + D * (S0 - S) * DELTAT

    ## cell division
    N    <- nrow(inds)
    r <- rmax * S / (ks + S)
    rnd <- runif(N)
    ndx <- (rnd < r * DELTAT)
    newinds <- subset(inds, ndx)

    # set age of new individuals to zero
    if (nrow(newinds) > 0) newinds$age <- 0

    ## resource consumption
    dN <- nrow(newinds)
    dS <- - 1/Y * dN

    ## safeguard: divide only if resource remains positive
    if (S >= dS) {
      inds$age[ndx] <- 0            # set age of divided cells to zero
      S <- S + dS                   # update resource
      inds <- rbind(inds, newinds)  # add new individuals to population
    }
    list(inds=inds, S=S)
  })
}

inds  <- data.frame(age=rep(0, 1000))
S     <- 1000
parms <- list(rmax = 0.5, ks = 500, D = .1, S0 = 1000,Y = 1)
DELTAT <- 0.1
times <- seq(0, 100, DELTAT)

o <- data.frame(time=times[1], N=nrow(inds), S=S, maxage=max(inds$age))

for (i in 2:length(times)) {
  state <- live(times[i], inds, S, parms, DELTAT)
  inds <- state$inds
  S    <- state$S
  ret <- c(time = times[i], N=nrow(inds), S=S, maxage=max(inds$age))
  o <- rbind(o, ret)
}

The simulation shows then a typical time series. The curves show a stochastic pattern, that can be reduced if the sample size (the number of individuals) is increased to, e.g. 10.000 or more. The yield is set to \(Y=1\) to make comparisons easier.

matplot(o$time, o[c("N", "S", "maxage")], xlab="time", ylab="N, S", type="l")

head(o)

##   time    N         S maxage
## 1  0.0 1000 1000.0000    0.0
## 2  0.1 1025  963.0000    0.1
## 3  0.2 1054  927.3700    0.2
## 4  0.3 1077  896.0963    0.3
## 5  0.4 1100  863.1353    0.4
## 6  0.5 1120  829.5040    0.5

Object orientation with simecol

The above implementation is extremely simple and may already serve as a starting point for extensions. However, almost all variables are held in global work space, which can become cumbersome and confusing when the application grows.

Package simecol (Petzoldt and Rinke 2007) suggests an object oriented programming (OOP) approach for such cases. In contrast to other OOP approaches for IBMs that model individuals as objects in their habitat, in an ecosystem and so forth, simecol “models the components of a simulation model”. This is quite pragmatic and aims to make comparison of different simulations (scenarios, instances) easier. Details of the approach are found in Petzoldt and Rinke (2007).

In the example below, all different processes of the live-loop are subdivided in separated functions in the equations-slot of the object that are then called from the main function. Parameters (parms), the init-values of the state variables (a list with inds the table of individuals and S the substrate pool), and the times are encapsulated in the object. The solver is set to iteration that is built-in in the package.

library(simecol)

ibm_test <- new("indbasedModel",
  main = function(time, obj, parms) {
    #if (parms$D > max(parms$r, parms$DELTAT)) stop("time step to large") # CFL criterion
    obj <- live(obj, parms)
    obj <- dilute(obj, parms)
    obj <- divide(obj, parms)
    obj
  },
  equations = list(
    newbact = function(n) {
      if (n > 0) {
        data.frame(age = rep(0, n))
      } else {
        NULL
      }
    },
    live = function(obj, parms){
      obj$inds$age    <- obj$inds$age + parms$DELTAT
      obj
    },
    dilute  = function(obj, parms) {
      S    <- obj$S
      inds <- obj$inds
      N    <- nrow(inds)

      with(parms, {
        rnd <- runif(N)
        inds <- subset(inds, D * DELTAT < rnd)
        S <- S + D * (S0 - S) * DELTAT
        list(inds=inds, S=S)
      })
    },
    divide = function(obj, parms) {
      inds <- obj$inds
      N    <- nrow(inds)
      S    <- obj$S
      with(parms, {
        r <- rmax * S / (ks + S)

        ## cell division
        rnd <- runif(N)
        ndx <- (rnd < r * DELTAT)
        newinds <- subset(inds, ndx)

        # set age of new individuals to zero
        if (nrow(newinds) > 0) newinds$age <- 0

        ## resource consumption
        dN <- nrow(newinds)
        dS <- - 1/Y * dN

        ## safeguard: divide only if resource remains positive
        if (S >= dS) {
          if(nrow(inds)) {
            inds$age[ndx] <- 0            # set age of divided cells to zero
            inds <- rbind(inds, newinds)  # add new individuals to population
          }
          S <- S + dS                   # update resource
        }

        list(inds=inds, S=S)
      })
    }
  ),
  parms = list(
    rmax = 0.5,
    ks = 500,
    D = .1,
    S0 = 1000,
    Y = 1
  ),
  init = list(inds = data.frame(age=rep(0, 1000)), S = 1000),
  times = c(from=0, to=100, by=.1),
  solver = "iteration"
)

Simulation

The simulation of the model can then, in principle, be triggered by:

ibm_test <- sim(ibm_test)
o <- out(ibm_test)

Observer function

In this case, the complete set of state variables (i.e. S and the complete data frame of inds) ist stored for each time step, so that ibm_test contains both, the original model (with equation and parameters) and all simulation results, that can be extracted with out(). It is then up to the user to analyze the results from out.

The analysis of results can already be made during the simulation by providing an optional observer-function. It gets the full state information and aggregates the required parts, and it can also be used for printing status information or for animated graphics during the simulation.

observer(ibm_test) <- function(state, time, i, out, y) {
  S <- state$S
  N <- nrow(state$inds)
  age <- max(state$inds$age)

  ## for debugging
  #if ((i %% 20) == 0) cat(i, "S=", S, "N=", N, "\n")

  if (N > 1e5) stop("N > 1e6\n")
  if (S <0 ) stop ("S < 0")

  c(time=time, N=N, S=S, age=age)
}

The simulation can then be run with the sim-function:

ibm_test <- sim(ibm_test)
o <- out(ibm_test)
matplot(o$time, o[c("N", "S", "age")], xlab="time", ylab="N, S", type="l")

Repeated simulations

As the IBM is a stochastic model, it would make sense to repeat it several times. The following figure shows four independent simulation runs:

par(mfrow=c(2,2), mar=c(4,3,1,1)+0.1)
ret <- lapply(1:4, FUN=function(x) cbind(run=x, out(sim(ibm_test))))
ret <- lapply(ret, function(o) matplot(o$time, o[c("N", "S")], xlab="time", ylab="N, S", type="l"))

… and the next one shows the bandwidth of 50 simulations.

library(plyr)
library(ggplot2)
ret <- lapply(1:50, FUN=function(x) cbind(run=x, out(sim(ibm_test))))
longtab <- rbind.fill(ret)

ggplot(longtab, aes(time, N)) + geom_bin2d(binwidth=c(1,10))

ggplot(longtab, aes(time, S)) + geom_bin2d(binwidth=c(1,10))

Here, lapplyreturns a list of 100 data frames with the run number in the first column. Function rbind.fill (from package plyr) then concatenates all these data frames together into tidy database format longtab that can then be plotted with one of the ggplot functions.

Equilibrium

Finally, let’s run the IBM over a dilution (\(D\)) rate gradient with each 10 replicates and assume that the state at the final time step (\(t=100\)) is close to the equilibrium of the chemostat. To speed up, function parLapply from package parallel is employed, so that the simulations run simultanaeously on the 4 CPU cores. This is, in principle, straightforward, one has only to make sure that all required functions, data, and packages (here simecol) are transferred or loaded to all respective worker R instances.

library(parallel)

observer(ibm_test) <- function(state, time, i, out, y) {
  S <- state$S
  N <- nrow(state$inds)
  c(time=time, N=N, S=S)
}

D <- rep(seq(0, 0.4, 0.01), each=10)
cl <- makeCluster(getOption("cl.cores", 4))

ret <- parLapply(
  cl,
  1:length(D),
  function(i, D, ibm) {
    require(simecol)
    parms(ibm)["D"] <- D[i]
    cbind(run=i, D=D[i], out(sim(ibm)))
  },
  D=D, ibm=ibm_test
)
stopCluster(cl)

longtab <- rbind.fill(ret)
steady <- longtab[longtab$time == max(longtab$time), ]

ggplot(steady, aes(D, N)) + geom_point()

ggplot(steady, aes(D, S)) + geom_point()

ggplot(steady, aes(D, D*N)) + geom_point()

The resulting pattern looks similar to the equilibrium diagram of the deterministic simulation above.

Outlook

The chemostat IBM so far shows the general approach. The interesting part will start, if the individuals get additional properties (e.g. specific substances) or behaviour (e.g. internal clock, resting stages), or if other populations (e.g. grazers) are introduced.

In case of interest and reasonable validation data, please let me know. Constructive comments are welcome.

References

Herbert, Denis, R Elsworth, and RC Telling. 1956. “The Continuous Culture of Bacteria; a Theoretical and Experimental Study.” Microbiology 14 (3). Microbiology Society: 601–22.

Novick, Aaron, and Leo Szilard. 1950. “Description of the Chemostat.” Science 112 (2920). American Association for the Advancement of Science: 715–16.

Petzoldt, Thomas, and Karsten Rinke. 2007. “simecol: An Object-Oriented Framework for Ecological Modeling in R.” Journal of Statistical Software 22 (9): 1–31.

R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.