Most state-of-the-art machine-learning models are based on either
- the (self-)attention mechanism underlying transformers (used in large language models such as GPT-3), or
- diffusion models (used in generative image models such as DALL-E and Stable Diffusion).

We investigate their mathematical underpinnings as well as implementation details (two purely illustrative code sketches follow below).
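
As a preview of the first topic, here is a minimal, purely illustrative sketch of scaled dot-product self-attention in NumPy. All names, shapes, and sizes are our own choices for this example, not part of any particular library or model:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for a single sequence.

    X          : (seq_len, d_model) input embeddings
    Wq, Wk, Wv : (d_model, d_k) learned projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V                         # weighted mixture of values

# Toy usage with random data: 5 tokens, model dimension 8, head dimension 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                               # (5, 4)
```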
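
In the same spirit, a sketch of the forward (noising) process behind diffusion models: clean data is gradually corrupted with Gaussian noise over many steps, and the model is trained to reverse this corruption. The linear beta schedule below is one common choice (used in the original DDPM paper) and is assumed here only for illustration:

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).

    x0    : clean data point (any shape)
    t     : timestep index into the noise schedule
    betas : per-step noise variances beta_1 .. beta_T
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]     # cumulative signal retention
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

# Toy usage: linear schedule over 1000 steps, noising a stand-in "image".
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
x0 = rng.normal(size=(8,))
xt = forward_diffusion(x0, t=500, betas=betas, rng=rng)
print(xt)
```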

Prerequisites:
- Linear Algebra 1+2 is a must.
- Probability theory is a must.
- Knowledge of Python is a must (but you can learn it on your own, in parallel).
- Machine Learning is beneficial (that lecture is held by Prof. Stanke this semester).
- Knowledge of stochastic differential equations (SDEs) is beneficial (the lecture 'Stochastische Differentialgleichungen: Theorie und Numerik' by Prof. Pulch is held this semester).

The course and the exercise session are held in English.