Note: this is an interactive document. Please use the Hide
resp. Code
buttons to fold or unfold the code of individual code cunks or of the complete document. The content is best viewed with Chrome or Firefox. The code is written in the R programming language for statistical computing (R Core Team 2018).
Let’s assume the following system of ordinary differential equations for a flow-through bioreactor (chemostat, cf. Novick and Szilard 1950; Herbert, Elsworth, and Telling 1956), where \(X\) is the concentration of organisms (e.g. cells/volume) in the system growing on a limited resource \(S\) (e.g. nutrient/volume). The instantanaeous growth rate \(r\) (1/time) depends on a maximum growth rate \(r_{max}\) and the resource concentration \(S\) via a Monod equation, with half-saturatio constant \(k_s\). Fresh medium with resource concentration \(S_0\) is imported with a dilution rate \(D\) that controls also export of cells and of remaining medium. A stoichiometric coefficient (\(Y\), yield) converts between concentration units of cells and nutrients:
\[ \frac{dX}{dt} = (r - D) \cdot X\\ \frac{dS}{dt} = D \cdot (S_0-S) - \frac{S}{Y}\\ \text{with:}\\ r = r_{max} \cdot \frac{S}{k_s+S} \] The above system of equations applies for a continuous system, but there may be reasons to adapt (or approximate) such a model to a discrete-time step and a discrete number of individuals. One reason may be a relatively small number of organisms (e.g. a few hundred rotifers instead of 10^8 cells of bacteria) or the nesseccity to describe individual variation (e.g. age, size, cell content) of the organisms.
This can be done with a so-called individual-based model, where the equations of such a system are, in principle, quite similar. Its main differences are:
The following example implements a very basic individual-based approach. The population of individuals is implemented as a data frame (inds
) with, in this example, only one single column (age
). Substrate \(S\) is implemented as a non-individual-based pool variable, the parameters are in a vector (parms
).
All life-actvities are carried out in one live
-function. Mortality and cell division are realized by subsetting randomly selected surviving individuals from the population, or by adding copies of randomly selected “dividing cells” to the population. The age
of divided cells is set to zero for both daughters of each parent cell.
Mortality and substrate input depend on \(D\), while cell division depends on the substrate concentration. A safeguard check prevents \(S\) for becoming negative and a control parameter DELTAT
allows an adaption of the time step.
The solution of the model is then straightforward by iterating the live
-function in a for-loop.
live <- function(time, inds, S, parms, DELTAT){
inds$age <- inds$age + DELTAT
with(parms, {
## dilute
N <- nrow(inds)
rnd <- runif(N)
inds <- subset(inds, D * DELTAT < rnd)
S <- S + D * (S0 - S) * DELTAT
## cell division
N <- nrow(inds)
r <- rmax * S / (ks + S)
rnd <- runif(N)
ndx <- (rnd < r * DELTAT)
newinds <- subset(inds, ndx)
# set age of new individuals to zero
if (nrow(newinds) > 0) newinds$age <- 0
## resource consumption
dN <- nrow(newinds)
dS <- - 1/Y * dN
## safeguard: divide only if resource remains positive
if (S >= dS) {
inds$age[ndx] <- 0 # set age of divided cells to zero
S <- S + dS # update resource
inds <- rbind(inds, newinds) # add new individuals to population
}
list(inds=inds, S=S)
})
}
inds <- data.frame(age=rep(0, 1000))
S <- 1000
parms <- list(rmax = 0.5, ks = 500, D = .1, S0 = 1000,Y = 1)
DELTAT <- 0.1
times <- seq(0, 100, DELTAT)
o <- data.frame(time=times[1], N=nrow(inds), S=S, maxage=max(inds$age))
for (i in 2:length(times)) {
state <- live(times[i], inds, S, parms, DELTAT)
inds <- state$inds
S <- state$S
ret <- c(time = times[i], N=nrow(inds), S=S, maxage=max(inds$age))
o <- rbind(o, ret)
}
The simulation shows then a typical time series. The curves show a stochastic pattern, that can be reduced if the sample size (the number of individuals) is increased to, e.g. 10.000 or more. The yield is set to \(Y=1\) to make comparisons easier.
matplot(o$time, o[c("N", "S", "maxage")], xlab="time", ylab="N, S", type="l")
head(o)
## time N S maxage
## 1 0.0 1000 1000.0000 0.0
## 2 0.1 1025 963.0000 0.1
## 3 0.2 1054 927.3700 0.2
## 4 0.3 1077 896.0963 0.3
## 5 0.4 1100 863.1353 0.4
## 6 0.5 1120 829.5040 0.5
The above implementation is extremely simple and may already serve as a starting point for extensions. However, almost all variables are held in global work space, which can become cumbersome and confusing when the application grows.
Package simecol (Petzoldt and Rinke 2007) suggests an object oriented programming (OOP) approach for such cases. In contrast to other OOP approaches for IBMs that model individuals as objects in their habitat, in an ecosystem and so forth, simecol “models the components of a simulation model”. This is quite pragmatic and aims to make comparison of different simulations (scenarios, instances) easier. Details of the approach are found in Petzoldt and Rinke (2007).
In the example below, all different processes of the live-loop are subdivided in separated functions in the equations
-slot of the object that are then called from the main
function. Parameters (parms
), the init
-values of the state variables (a list with inds
the table of individuals and S
the substrate pool), and the times
are encapsulated in the object. The solver is set to iteration
that is built-in in the package.
library(simecol)
ibm_test <- new("indbasedModel",
main = function(time, obj, parms) {
#if (parms$D > max(parms$r, parms$DELTAT)) stop("time step to large") # CFL criterion
obj <- live(obj, parms)
obj <- dilute(obj, parms)
obj <- divide(obj, parms)
obj
},
equations = list(
newbact = function(n) {
if (n > 0) {
data.frame(age = rep(0, n))
} else {
NULL
}
},
live = function(obj, parms){
obj$inds$age <- obj$inds$age + parms$DELTAT
obj
},
dilute = function(obj, parms) {
S <- obj$S
inds <- obj$inds
N <- nrow(inds)
with(parms, {
rnd <- runif(N)
inds <- subset(inds, D * DELTAT < rnd)
S <- S + D * (S0 - S) * DELTAT
list(inds=inds, S=S)
})
},
divide = function(obj, parms) {
inds <- obj$inds
N <- nrow(inds)
S <- obj$S
with(parms, {
r <- rmax * S / (ks + S)
## cell division
rnd <- runif(N)
ndx <- (rnd < r * DELTAT)
newinds <- subset(inds, ndx)
# set age of new individuals to zero
if (nrow(newinds) > 0) newinds$age <- 0
## resource consumption
dN <- nrow(newinds)
dS <- - 1/Y * dN
## safeguard: divide only if resource remains positive
if (S >= dS) {
if(nrow(inds)) {
inds$age[ndx] <- 0 # set age of divided cells to zero
inds <- rbind(inds, newinds) # add new individuals to population
}
S <- S + dS # update resource
}
list(inds=inds, S=S)
})
}
),
parms = list(
rmax = 0.5,
ks = 500,
D = .1,
S0 = 1000,
Y = 1
),
init = list(inds = data.frame(age=rep(0, 1000)), S = 1000),
times = c(from=0, to=100, by=.1),
solver = "iteration"
)
The simulation of the model can then, in principle, be triggered by:
ibm_test <- sim(ibm_test)
o <- out(ibm_test)
In this case, the complete set of state variables (i.e. S and the complete data frame of inds) ist stored for each time step, so that ibm_test contains both, the original model (with equation and parameters) and all simulation results, that can be extracted with out()
. It is then up to the user to analyze the results from out
.
The analysis of results can already be made during the simulation by providing an optional observer
-function. It gets the full state information and aggregates the required parts, and it can also be used for printing status information or for animated graphics during the simulation.
observer(ibm_test) <- function(state, time, i, out, y) {
S <- state$S
N <- nrow(state$inds)
age <- max(state$inds$age)
## for debugging
#if ((i %% 20) == 0) cat(i, "S=", S, "N=", N, "\n")
if (N > 1e5) stop("N > 1e6\n")
if (S <0 ) stop ("S < 0")
c(time=time, N=N, S=S, age=age)
}
The simulation can then be run with the sim
-function:
ibm_test <- sim(ibm_test)
o <- out(ibm_test)
matplot(o$time, o[c("N", "S", "age")], xlab="time", ylab="N, S", type="l")
As the IBM is a stochastic model, it would make sense to repeat it several times. The following figure shows four independent simulation runs:
par(mfrow=c(2,2), mar=c(4,3,1,1)+0.1)
ret <- lapply(1:4, FUN=function(x) cbind(run=x, out(sim(ibm_test))))
ret <- lapply(ret, function(o) matplot(o$time, o[c("N", "S")], xlab="time", ylab="N, S", type="l"))
… and the next one shows the bandwidth of 50 simulations.
library(plyr)
library(ggplot2)
ret <- lapply(1:50, FUN=function(x) cbind(run=x, out(sim(ibm_test))))
longtab <- rbind.fill(ret)
ggplot(longtab, aes(time, N)) + geom_bin2d(binwidth=c(1,10))
ggplot(longtab, aes(time, S)) + geom_bin2d(binwidth=c(1,10))
Here, lapply
returns a list of 100 data frames with the run number in the first column. Function rbind.fill
(from package plyr) then concatenates all these data frames together into tidy database format longtab
that can then be plotted with one of the ggplot functions.
Finally, let’s run the IBM over a dilution (\(D\)) rate gradient with each 10 replicates and assume that the state at the final time step (\(t=100\)) is close to the equilibrium of the chemostat. To speed up, function parLapply
from package parallel is employed, so that the simulations run simultanaeously on the 4 CPU cores. This is, in principle, straightforward, one has only to make sure that all required functions, data, and packages (here simecol) are transferred or loaded to all respective worker R instances.
library(parallel)
observer(ibm_test) <- function(state, time, i, out, y) {
S <- state$S
N <- nrow(state$inds)
c(time=time, N=N, S=S)
}
D <- rep(seq(0, 0.4, 0.01), each=10)
cl <- makeCluster(getOption("cl.cores", 4))
ret <- parLapply(
cl,
1:length(D),
function(i, D, ibm) {
require(simecol)
parms(ibm)["D"] <- D[i]
cbind(run=i, D=D[i], out(sim(ibm)))
},
D=D, ibm=ibm_test
)
stopCluster(cl)
longtab <- rbind.fill(ret)
steady <- longtab[longtab$time == max(longtab$time), ]
ggplot(steady, aes(D, N)) + geom_point()
ggplot(steady, aes(D, S)) + geom_point()
ggplot(steady, aes(D, D*N)) + geom_point()
The resulting pattern looks similar to the equilibrium diagram of the deterministic simulation above.
The chemostat IBM so far shows the general approach. The interesting part will start, if the individuals get additional properties (e.g. specific substances) or behaviour (e.g. internal clock, resting stages), or if other populations (e.g. grazers) are introduced.
In case of interest and reasonable validation data, please let me know. Constructive comments are welcome.
Herbert, Denis, R Elsworth, and RC Telling. 1956. “The Continuous Culture of Bacteria; a Theoretical and Experimental Study.” Microbiology 14 (3). Microbiology Society: 601–22.
Novick, Aaron, and Leo Szilard. 1950. “Description of the Chemostat.” Science 112 (2920). American Association for the Advancement of Science: 715–16.
Petzoldt, Thomas, and Karsten Rinke. 2007. “simecol: An Object-Oriented Framework for Ecological Modeling in R.” Journal of Statistical Software 22 (9): 1–31.
R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Copyright and original author: tpetzoldt, 2019-02-06