Data Science Tools for the Life Sciences



General Information:

format: block course

date: Monday 04.09.2023 through Wednesday 20.09.2023

sign up until: 31.05.2023

time: Monday through Friday, 9:00 until 17:00

place: Seminarraum 0.08 (Multimedia-Raum), Rechenzentrum, Felix-Hausdorff-Strasse 18, 17489 Greifswald

teaching language: English

passing requirement: a graded term paper


moodle link:    


signup via HIS:


… or find the course in the “Vorlesungsverzeichnis”:


Mathematisch-Naturwissenschaftliche Fakultät / Institut für Data Science /



Responsible instructor: Dr. Alexander Scheuerlein, research assistant at the Institute of Data Science, University of Greifswald




Participants (max 20):

The course is aimed at students in the life sciences, specifically students of:

-       Zoology

o   Bachelor students of Zoology receive credit points for a “Wahlspezial” module if they obtain written permission from Prof. Dr. Steffen Harzsch before they sign up. See here for details of the process:


-       Botany

-       Human Biology

-       Environmental Natural Sciences

-       Environmental Sciences

-       Psychology




The course aims to give students in the life sciences the expertise and tools to identify data that are relevant to a research question and to analyze them in a meaningful way. To provide a reproducible environment, the course will use the open-source language R on JupyterHub.


The main emphasis of the course will be on data retrieval, data management, and data handling in R.


·       Students will learn how to handle different data formats and how to use the appropriate tools to read these data into the R programming environment.


·       Students will learn how to perform exploratory data analysis, with an emphasis on data visualization based on the “tidyverse” collection of packages in R.


·       Students will be introduced to basic statistics and how to apply them in R. Starting with the basic concepts of statistics (distributions, means, variance, hypothesis testing, regression analysis), students will be introduced to the use of general linear models (glm). Particular emphasis will be placed on the underlying assumptions of the statistical models and on how to check whether these assumptions are met (residual analysis). Further concepts taught will be model selection based on information criteria (AIC), multivariate approaches, generalized models (Poisson, binomial), generalized additive models, and models with random effects. By the end, students will be able to run generalized additive models, interpret them, plot the results in a meaningful way, and understand when random effects are useful in model design.


·       Students will be introduced to the basic concepts of machine learning and will work through an example. Programming in this section will be done in Python. The main purpose of this section is to illustrate the broad applicability of machine learning to day-to-day routine tasks, such as cell counting or image analysis.
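
To give a flavour of the statistics topics listed above (regression, residual checks, and model selection via AIC), here is a minimal sketch. It is written in Python with the standard library only, whereas the course itself covers these steps in R. The data are made up for illustration, and the helper names `fit_line` and `gaussian_aic` are hypothetical; they are not part of the course material.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def gaussian_aic(residuals, n_params):
    """AIC (up to an additive constant) for a least-squares fit
    under the assumption of Gaussian errors."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params

# Made-up illustrative data (an assumption, not course data)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

# Model 1: straight line. Model 0: intercept only (the mean of y).
a, b = fit_line(x, y)
res_line = [yi - (a + b * xi) for xi, yi in zip(x, y)]
mean_y = sum(y) / len(y)
res_null = [yi - mean_y for yi in y]

# Parameter counts include the error variance.
aic_line = gaussian_aic(res_line, 3)  # intercept, slope, variance
aic_null = gaussian_aic(res_null, 2)  # intercept, variance

print(f"slope = {b:.3f}, intercept = {a:.3f}")
print(f"AIC line = {aic_line:.2f}, AIC intercept-only = {aic_null:.2f}")
```

The model with the lower AIC is preferred. Inspecting `res_line` (for example by plotting it against `x`) is the simplest form of the residual analysis mentioned above.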


The course will end with a project: a research question should be conceived (or will be provided), and the relevant data should be retrieved (or will be provided). Students are expected to generate exploratory plots and apply appropriate statistical tools to address the research question properly. To pass the course, students have to produce a term paper (as a Jupyter notebook), which will be graded.