Overview
| Module code | BIO-06.61-035 |
| Instructors | Dr. Saskia Otto, Dr. Monika Eberhard |
| Prerequisites | Data Science 1 and Data Science 2 |
| License | CC-BY-SA 4.0 International |
The third module moves from analysing smaller datasets to the exploratory analysis of larger data collections. You learn advanced statistical modelling methods and get an introduction to machine learning – both supervised approaches (e.g. multiple regression, resampling) and unsupervised methods (e.g. cluster analysis, PCA).
Learning Objectives
After completing this module, students can:
- describe fundamental concepts of exploratory data analysis and machine learning
- apply advanced linear models (multi-factor ANOVA, ANCOVA, mixed models)
- conduct and interpret multiple linear regressions
- use resampling techniques (bootstrapping, permutation tests)
- apply unsupervised learning methods (cluster analysis, PCA)
- choose appropriate methods for biological research questions
- apply the fundamentals of Open Science, and document and communicate scientific results using R Markdown or Quarto
Lecture Slides (Winter Semester 2025/2026)
The interactive HTML lecture slides were created by Saskia Otto with Quarto revealjs. While viewing a presentation, the following keyboard shortcuts switch between display modes:
- o shows the overview mode
- w switches to wide mode
- f switches to full-screen mode
- h enables code highlighting
- ctrl (Windows) or cmd (Mac) together with + / - zooms in and out
- p opens a pop-up window with additional information (this does not work in Safari, however)
- esc returns to normal mode
License of the Lecture Slides
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, with the exception of borrowed figures that are credited to their sources.
Accompanying Learning Materials
- Moodle course: UHH MIN Login
- RStudio Server/Posit Workbench of the Department of Biology: the URL is provided via the Moodle course (login credentials are sent by email)
- RStudio Server via JupyterHub of the MIN Faculty: https://code.min.uni-hamburg.de/hub/ (access via BAN credentials)
- swirl courses: DSBswirl – interactive exercises in R (DSB-05, DSB-06)
- Cheatsheets: Reference cards on statistics with R, LaTeX formulas, and Markdown
- Case studies: Showcases from the course
- Open Science templates: UHHformats, UHHthesis, SCIproj
Book Recommendations
- German:
  - Bärlocher, F. (1999): Biostatistik – Praktische Einführung in Konzepte und Methoden, Thieme Verlag, 206 pp.
  - Dormann, C. (2017): Parametrische Statistik – Verteilungen, Maximum Likelihood und GLM in R, Springer Spektrum, 363 pp.
- English:
  - Crawley, M.J. (2013): The R Book, 2nd edition, Wiley & Sons, West Sussex, UK, 945 pp. → Comprehensive volume (nearly 1,000 pages!). Covers both basic statistics and a wide range of statistical modelling approaches.
  - Quinn, G.P. & Keough, M.J. (2002): Experimental Design and Data Analysis for Biologists, Cambridge University Press, UK, 553 pp.
  - Zuur, A.F., Ieno, E.N., Walker, N.J., Saveliev, A.A. & Smith, G.M. (2009): Mixed Effects Models and Extensions in Ecology with R, Springer, New York, USA, 574 pp. Further information: highstat.com → Covers simple linear regression models and their limitations, and describes alternative approaches. Contains various ecological case studies in which the EDA cycle is well described.
  - James, G., Witten, D., Hastie, T. & Tibshirani, R. (2013): An Introduction to Statistical Learning with Applications in R, Springer, 426 pp.