08-Nonlinear Regression

Applied Statistics – A Practical Course

Thomas Petzoldt

2025-01-21

Nonlinear regression

Many phenomena in nature are non-linear!

Linear or non-linear?

  • Some non-linear problems can be solved with linear methods
  • e.g., polynomials or periodic (sine and cosine) functions
  • others can be fitted by using transformations

Example

\[y = a \cdot x^b\] can be transformed to:

\[\ln(y) = \ln(a) + b \cdot \ln(x)\]

  • but: linearization also transforms the residuals
  • depending on the error structure, this is sometimes correct and sometimes wrong
  • the decisive criterion is homogeneity of variances (see the sketch below)

Linearization or direct nonlinear fitting?


Linearising transformations

  • log, square root, reciprocals …
  • can be applied if the residual variance remains (or becomes) homogeneous,
  • but in some cases transformations lead to heteroscedasticity
    \(\Rightarrow\) biased regression parameters.

Nonlinear regression

  • very flexible: user-defined functions,
  • no transformation required,
  • but: requires iterative optimization methods,
  • convergence to the global optimum is not guaranteed (only local optima are found),
  • parameters are not identifiable in all cases.

Nonlinear regression


Iterative search for the minimum of the sum of squares

How to find the global minimum?


Goodness of fit: residual sum of squares (\(RSS\)):

\[ RSS = \sum_{i=1}^n\left(y_i- f(\mathbf x_i, \mathbf p)\right)^2 = \min ! \]

with \(y\): dependent variable, \(\mathbf x\): matrix of independent variables, \(\mathbf p\): parameter vector, \(n\): sample size

Use of an iterative solver \(\rightarrow\) see next slides

Nonlinear coefficient of determination \(R^2\)

  • not related to the (linear) correlation coefficient
  • can be calculated from the residual and total variances

\[ R^2 = 1 - \frac{\text{variance of the residuals}}{\text{variance of the y-data}} = 1- \frac{s^2_\varepsilon}{s^2_y} \]

  • the nonlinear \(R^2\) measures the fraction of explained variance
  • … other indices can be used in addition, e.g. MSE, RMSE, NSE, …

General principle of optimization algorithms


The minimum of the squared residuals (RSS) is searched by iteration (a minimal R sketch follows the list):


  1. Start from an initial guess for the parameter set.
  2. Try to find a parameter set with lower RSS.
  3. Is the new parameter set better?
    • No: Reject the new parameters and goto 2
    • Yes: Accept the new parameters and goto 4
  4. Is the new parameter set good enough?
    • No: goto 2
    • Yes: goto 5
  5. Parameter set found.
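This loop can be mimicked with R's general-purpose optimizer optim(), minimizing a user-defined RSS function; a minimal sketch, reusing the example data of the exponential model shown later in these slides:

## minimize RSS of y = a * exp(b * x) with a general-purpose optimizer
x <- 1:10
y <- c(1.6, 1.8, 2.1, 2.8, 3.5, 4.1, 5.1, 5.8, 7.1, 9.0)
rss <- function(p) sum((y - p[1] * exp(p[2] * x))^2)  # p = c(a, b)
res <- optim(c(a = 1, b = 0.1), rss)  # step 1: initial guess; steps 2-4 run internally
res$par    # step 5: parameter set found
res$value  # minimum RSS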

Deterministic Methods


Gradient Descent

  • go one step into the direction of steepest descent
  • \(\rightarrow\) relatively simple
  • \(\rightarrow\) relatively robust for “more complicated” problems

Newton’s Method

  • quadratic approximation of the RSS function
  • try to estimate the minimum
  • \(\rightarrow\) takes curvature into account
  • \(\rightarrow\) faster for “well-behaved” problems
  • several versions:
    quasi-Newton, Gauss-Newton, L-BFGS, …

Levenberg-Marquardt

  • interpolates between Gauss-Newton and gradient descent.

Newton's method (red) uses curvature information to converge faster than gradient descent (green).
Source: Wikipedia.
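In R, these deterministic methods are available in different functions: nls() uses a Gauss-Newton type algorithm by default, optim(..., method = "BFGS") is a quasi-Newton method, and nlsLM() from the add-on package minpack.lm implements Levenberg-Marquardt. A brief sketch, assuming minpack.lm is installed and reusing the exponential example data:

## Levenberg-Marquardt fit with package minpack.lm (assumed to be installed)
library(minpack.lm)
dat <- data.frame(x = 1:10,
                  y = c(1.6, 1.8, 2.1, 2.8, 3.5, 4.1, 5.1, 5.8, 7.1, 9.0))
fit_lm <- nlsLM(y ~ a * exp(b * x), data = dat, start = c(a = 1, b = 0.1))
coef(fit_lm)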

Stochastic Methods


Use statistical principles (random search)

Classical methods

  • Simulated annealing: simulates heating and cooling of matter \(\rightarrow\) “Crystallisation process”.
  • Controlled random search (Price, 1983)

Evolutionary Algorithms

  • analogy to genetics: mutation and selection
  • follows a “population” of several parameter sets in parallel
  • information exchange (“crossover”) between parameter sets
  • \(\rightarrow\) for complicated problems with a large number of parameters
  • nowadays built into Microsoft Excel and LibreOffice Calc

… and many more.
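R's optim() also offers a simple stochastic method, simulated annealing (method = "SANN"); a minimal sketch with the same RSS function as before:

## stochastic search (simulated annealing) for the exponential example
x <- 1:10
y <- c(1.6, 1.8, 2.1, 2.8, 3.5, 4.1, 5.1, 5.8, 7.1, 9.0)
rss <- function(p) sum((y - p[1] * exp(p[2] * x))^2)
set.seed(42)  # stochastic method: results vary between runs
res <- optim(c(a = 1, b = 0.1), rss, method = "SANN")
res$par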

Examples

  • Enzyme kinetics
  • Growth of organisms
  • Calibration of complex models
    • in chemistry, biology, engineering, finance, business and social sciences
    • water: hydrology, hydrophysics, groundwater, wastewater, water quality …

Enzyme Kinetics


… can be described with the well-known Michaelis-Menten function:

\[ v = v_{max} \frac{S}{k_m + S} \]
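To get a feeling for the two parameters, the curve can be plotted for illustrative values (vmax = 0.1 and km = 5 are arbitrary choices here):

## shape of the Michaelis-Menten function for illustrative parameter values
S    <- seq(0, 25, length.out = 100)
vmax <- 0.1
km   <- 5
plot(S, vmax * S / (km + S), type = "l", ylab = "v")
abline(h = vmax, lty = 2)  # v approaches vmax for large S
abline(v = km,  lty = 2)   # at S = km, v equals vmax / 2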

Linearization vs. (true) nonlinear regression


Linearizing transformation

[>] Appropriate if transformation improves homogeneity of variances
[+] Fast, simple and easy.
[+] Analytical solution returns the global optimum.
[-] Only a limited set of functions can be fitted.
[-] Can lead to wrongly transformed error structure and biased results.

Nonlinear Regression

[>] Appropriate if the error structure is already homogeneous and/or an analytical solution does not exist.
[+] Can be used to fit arbitrary functions, given that the parameters are identifiable.
[-] Needs start values and considerable computation time.
[-] Best solution (global optimum) is not guaranteed.

Nonlinear regression in R: simple exponential

Fit model

# example data
x <- 1:10
y <- c(1.6, 1.8, 2.1, 2.8, 3.5, 
       4.1, 5.1, 5.8, 7.1, 9.0)

# initial parameters for the optimizer
pstart <- c(a = 1, b = 1)

# nonlinear least squares
fit <- nls(y ~ a * exp(b * x), start = pstart)
summary(fit)

Formula: y ~ a * exp(b * x)

Parameters:
  Estimate Std. Error t value Pr(>|t|)    
a 1.263586   0.049902   25.32 6.34e-09 ***
b 0.194659   0.004716   41.27 1.31e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1525 on 8 degrees of freedom

Number of iterations to convergence: 13 
Achieved convergence tolerance: 5.956e-08

Plot result

# additional x-values to get a smooth curve
x1 <- seq(1, 10, 0.1)
y1 <- predict(fit, data.frame(x = x1))
plot(x, y)
lines(x1, y1, col = "red")

Fitted parameters



Formula: y ~ a * exp(b * x)

Parameters:
  Estimate Std. Error t value Pr(>|t|)    
a 1.263586   0.049902   25.32 6.34e-09 ***
b 0.194659   0.004716   41.27 1.31e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1525 on 8 degrees of freedom

Number of iterations to convergence: 13 
Achieved convergence tolerance: 5.956e-08
  • “Estimate”: the fitted parameters
  • “Std. Error” \(s_{\bar{x}}\): indicates the reliability of the estimates
  • t- and p-values: do not over-interpret them!
  • in the non-linear world, “non-significant” parameters can be structurally necessary.

Coefficient of determination \(R^2 = 1-\frac{s^2_\varepsilon}{s^2_y}\)

(Rsquared <- 1 - var(residuals(fit))/var(y))
[1] 0.9965644

Michaelis-Menten-Kinetics


Code

S <- c(25, 25, 10, 10, 5, 5, 2.5, 2.5, 1.25, 1.25)
V <- c(0.0998, 0.0948, 0.076, 0.0724, 0.0557,
       0.0575, 0.0399, 0.0381, 0.017, 0.0253)

## Michaelis-Menten kinetics
f <- function(S, Vm, K) { Vm * S/(K + S) }

pstart <- c(Vm = 0.1, K = 5)
model_fit <- nls(V ~ f(S, Vm, K), start = pstart)
summary(model_fit)

plot(S, V, xlim = c(0, max(S)), ylim = c(0, max(V)))
x1 <- seq(0, 25, length = 100)
lines(x1, predict(model_fit, data.frame(S = x1)), col = "red")


Coefficient of determination

(1 - var(residuals(model_fit))/var(V))
[1] 0.9895147

Results


Formula: V ~ f(S, Vm, K)

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
Vm  0.11713    0.00381   30.74 1.36e-09 ***
K   5.38277    0.46780   11.51 2.95e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.003053 on 8 degrees of freedom

Number of iterations to convergence: 3 
Achieved convergence tolerance: 6.678e-06

Michaelis-Menten-Kinetics with selfstart

  • The self-start function SSmicmen determines start parameters automatically.
  • Only a few self-start functions are available in R (see the sketch below).
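The available ones can be listed directly from the search path; a small sketch (the pattern is only a heuristic that matches the usual SS… naming convention):

## list self-start model functions (names starting with "SS")
apropos("^SS", ignore.case = FALSE)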

Code

S <- c(25, 25, 10, 10, 5, 5, 2.5, 2.5, 1.25, 1.25)
V <- c(0.0998, 0.0948, 0.076, 0.0724, 0.0557,
       0.0575, 0.0399, 0.0381, 0.017, 0.0253)

model_fit <- nls(V ~ SSmicmen(S, Vm, K))

plot(S, V, xlim = c(0, max(S)), ylim = c(0, max(V)))
x1 <- seq(0, 25, length = 100)
lines(x1, predict(model_fit, data.frame(S = x1)), col = "red")

Results

summary(model_fit, correlation=TRUE)

Formula: V ~ SSmicmen(S, Vm, K)

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
Vm  0.11713    0.00381   30.74 1.36e-09 ***
K   5.38277    0.46780   11.51 2.95e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.003053 on 8 degrees of freedom

Correlation of Parameter Estimates:
  Vm  
K 0.88

Number of iterations to convergence: 3 
Achieved convergence tolerance: 6.678e-06

Plot

Note: Correlation of parameters

  • high absolute values of correlation indicate non-identifiability of parameters
  • critical value depends on the data
  • sometimes, better start values or another optimization algorithm can help

Practical hints


  1. Plot the data.
  2. Find good starting values by thinking about the problem or by trial and error (see the sketch after this list).
  3. Avoid very small and/or very large numbers
    \(\longrightarrow\) rescale the problem to values between approx. 0.001 and 1000.
  4. Start with a simple function and add terms and parameters sequentially.
  5. Don't take the significance of parameters too seriously: a non-significant parameter may be structurally necessary, and removing it can invalidate the whole model.
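For hint 2, it helps to plot the data together with the model curve computed from the initial guess; a small sketch reusing the exponential example from above:

## judge start values visually before calling nls() (exponential example from above)
x <- 1:10
y <- c(1.6, 1.8, 2.1, 2.8, 3.5, 4.1, 5.1, 5.8, 7.1, 9.0)
pstart <- c(a = 1, b = 0.2)
plot(x, y)
lines(x, pstart["a"] * exp(pstart["b"] * x), lty = 2)  # curve for the initial guess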

Further reading


  • Package growthrates for growth curves: https://cran.r-project.org/package=growthrates

  • Package FME for more complex model fitting tasks (identifiability analysis, constrained optimization, multiple dependent variables and MCMC) (Soetaert & Petzoldt, 2010), https://cran.r-project.org/package=FME

  • More about optimization in R: https://cran.r-project.org/web/views/Optimization.html

Appendix

Lineweaver-Burk transformation vs. nonlinear fit


Code of the method comparison slide

S <- c(25, 25, 10, 10, 5, 5, 2.5, 2.5, 1.25, 1.25)
V <- c(0.0998, 0.0948, 0.076, 0.0724, 0.0557,
       0.0575, 0.0399, 0.0381, 0.017, 0.0253)
model_fit <- nls(V ~ SSmicmen(S, Vm, K))

par(mfrow=c(1,2), las=1, lwd=2)
## Lineweaver-Burk
x <- 1/S; y <- 1/V

plot(x, y, xlab="1/S", ylab="1/V", xlim=c(-0.2,1), ylim=c(0, 80), pch=16,
  main="Linearisation\n(Lineweaver-Burk)")
abline(h=0, lwd=1, col="grey")
abline(v=0, lwd=1, col="grey")
m <- lm(y ~ x)
abline(m, col = "red")
Vm_l <- 1/coef(m)[1]
Km_l <- coef(m)[2] * Vm_l
#legend("topleft", c("vmax = 1/intercept", "km = slope * vmax"), 
#        box.lwd=1, bg="white")
text(0.35, 75, "1/V = 1/vmax + km/vmax * 1/S")

## retransformed, both together
plot(S, V, xlim = c(0, max(S)),ylim=c(0, max(V)), pch=16, main="No Transformation")
x1 <- seq(0, 25, length=100)
lines(x1, Vm_l * x1 / (Km_l + x1), col="red")
lines(x1, predict(model_fit, data.frame(S=x1)), col="blue")
legend("bottomright", legend=c("linearisation", "nonlinear"), 
       box.lwd=1, lwd=2, col=c("red", "blue"))
text(15, 0.04, "V = S * vmax / (km + S)")

See: https://en.wikipedia.org/wiki/Lineweaver-Burk_plot

References


Price, W. L. (1977). A controlled random search procedure for global optimization. The Computer Journal, 20(4), 367–370.
Price, W. L. (1983). Global optimization by controlled random search. Journal of Optimization Theory and Applications, 40(3), 333–348.
Soetaert, K., & Petzoldt, T. (2010). Inverse modelling, sensitivity and Monte Carlo analysis in R using package FME. Journal of Statistical Software, 33(3), 1–28. https://doi.org/10.18637/jss.v033.i03