Applied Statistics – A Practical Course
2025-09-16
Descriptive Research
Experimental Research
Strong inference requires a clear hypothesis and experimental research.
Weak inference is derived from observations and data.
\(\rightarrow\) descriptive research delivers the data for creating the hypotheses.
Attributed to William of Ockham, an English philosopher of the 14th century (“Occam's razor”)
When you have two competing theories that make exactly the same predictions, the simpler one is the better one.
In the context of statistical analysis and modeling:
One of the most important scientific principles
\(\rightarrow\) But nature is complex, so over-simplification has to be avoided.
\(y = a + b \cdot x\)
variables: everything that is measured or experimentally manipulated, e.g. phosphorus concentration in a lake, air temperature, or abundance of animals.
parameters: values that are estimated by a statistical model, e.g. mean, standard deviation, or the slope of a linear model (see the sketch below).
Independent variables (explanatory variables, predictors)
Dependent variables (response variables, target variables, predicted variables)
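As a small illustration of these terms, the sketch below fits the linear model \(y = a + b \cdot x\) to made-up data: the values of x (independent variable) and y (dependent variable) are assumptions, while a and b are the parameters estimated from them.

```python
import numpy as np

# hypothetical measurements (the variables): x is independent, y is dependent
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. phosphorus concentration
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # e.g. a response that roughly follows y = a + b*x

# least-squares fit: a (intercept) and b (slope) are the estimated parameters
b, a = np.polyfit(x, y, deg=1)
print(f"estimated parameters: a = {a:.2f}, b = {b:.2f}")
```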
The “level” of variables increases from binary to ratio scale. It is always possible to convert a higher-level to a lower-level scale:
Transformation to a lower scale results in a certain amount of information loss, but allows the use of additional methods from the lower-level scale.
Explanation: If we apply rank correlation to metric data, we essentially apply a method for the ordinal scale to metric data. In this case, we lose information about the differences between the values, but also decrease the influence of extreme values and outliers, as sketched below.
Transformation from metric to binary can be useful if the metric data are not precise enough. For example, counting animals (e.g. wolves) in a certain area may depend on too many factors (structure of the landscape, experience of the observers, season, etc.), so that the exact numbers (abundances) are questionable. In such cases, transformation to a binary scale (present/absent) and use of an appropriate test (e.g. logistic regression or Fisher’s exact test) will be more reliable.
Other examples are the comparison of floods between different rivers, e.g. a large and a small one, or occurrences of genes in a molecular biological analysis.
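A minimal sketch of the two transformations mentioned above, with made-up counts (all numbers are assumptions): rank correlation treats the metric counts as ordinal and thus reduces the influence of the extreme value, and a simple threshold turns the counts into a binary present/absent variable.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

counts_a = np.array([0, 2, 3, 5, 40])   # counts in area A; 40 is a doubtful extreme value
counts_b = np.array([1, 1, 4, 6, 9])    # counts in area B

print("Pearson (metric)  :", round(pearsonr(counts_a, counts_b)[0], 2))   # influenced by the outlier
print("Spearman (ordinal):", round(spearmanr(counts_a, counts_b)[0], 2))  # uses ranks only

present_a = (counts_a > 0).astype(int)  # metric -> binary (present/absent)
print("presence/absence in A:", present_a)
```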
Classical definition
\[ p = \frac{\text{number of selected cases}}{\text{number of all possible cases}} \]
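A small worked example (a fair six-sided die; not part of the original slide): the event “even number” selects 3 of the 6 equally likely outcomes.

\[ p(\text{even}) = \frac{|\{2, 4, 6\}|}{|\{1, 2, 3, 4, 5, 6\}|} = \frac{3}{6} = 0.5 \]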
Axiomatic definition
Sample
Subjects from which we have measurements or observations.
Population
Set of all subjects that had the same chance to become part of the sample.
\(\Rightarrow\) The population is defined by the way samples are taken.
\(\Rightarrow\) Samples should be representative of the intended observational subject.
Random sampling
Stratified sampling
The population is subdivided into classes of similar subjects (strata).
The strata are analysed separately and then the information is weighted and combined to draw inferences about the population.
Stratified sampling requires information about the size and representativeness of the strata (see the sketch below).
Examples: election forecasts, depth layers in a lake, age classes for animals.
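A hedged sketch of the weighting step, using made-up depth layers of a lake (stratum sizes, means, and standard deviations are all assumptions): each stratum is sampled separately and the stratum means are combined with weights proportional to stratum size.

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical strata: relative size (weight), true mean, standard deviation
strata = {
    "epilimnion":  (0.5, 20.0, 1.0),
    "metalimnion": (0.3, 12.0, 2.0),
    "hypolimnion": (0.2,  6.0, 0.5),
}

weighted_mean = 0.0
for name, (weight, mu, sigma) in strata.items():
    sample = rng.normal(mu, sigma, size=10)   # 10 measurements per stratum
    weighted_mean += weight * sample.mean()   # weight the stratum mean by stratum size

print("stratified estimate of the population mean:", round(weighted_mean, 2))
```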
Random errors
Systematic errors (also called bias)
“True” parameters of the population
“Calculated” parameters from a sample
A single measurement \(x_i\) of a random variable \(X\) can be written as the sum of the expected value \(\mathbf{E}(X)\) of the random variable and a random error \(\varepsilon_i\).
\[\begin{align} x_i &= \mathbf{E}(X) + \varepsilon_i\\ \mathbf{E}(\varepsilon)&=0 \end{align}\]
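A minimal simulation of this decomposition, assuming a normally distributed measurement error (the expected value and the error size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

expected = 3.5                              # E(X), the "true" value
errors = rng.normal(0, 0.5, size=1000)      # random errors epsilon_i with E(epsilon) = 0
x = expected + errors                       # measurements x_i = E(X) + epsilon_i

print("mean of the errors      :", round(errors.mean(), 3))  # close to 0
print("mean of the measurements:", round(x.mean(), 3))       # close to E(X)
```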
Example: 3 people rolling a die 5 times each:
sample 1: 3 3 2 4 1   mean: 2.6
sample 2: 6 1 1 6 1   mean: 3.0
sample 3: 6 5 6 6 5   mean: 5.6
The overall mean \(\bar{x} = 3.73\) is close to the expected value \(\mu = 3.5\) of a fair die.
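The same experiment can be simulated; the sketch below (arbitrary seed, so the numbers differ from the table) draws three samples of five die rolls and then a much larger sample, whose mean approaches \(\mu = 3.5\).

```python
import numpy as np

rng = np.random.default_rng(7)

for i in range(1, 4):
    rolls = rng.integers(1, 7, size=5)       # five rolls of a fair die (values 1..6)
    print(f"sample {i}: {rolls} mean: {rolls.mean():.1f}")

many_rolls = rng.integers(1, 7, size=10_000)
print("mean of 10000 rolls:", round(many_rolls.mean(), 3))   # approaches mu = 3.5
```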