Title: Topic modelling

Organizers: Organized within COMPSTAT by the COST Action HiTEc.

Date: 30th of August (5 hours).

Requirements: Programming skills in R and/or Python are desirable.

Trainer: Dr. Ivan Savin, Associate Professor at ESCP Business School (https://www.ivanvsavin.org/). Ivan completed his PhD in computational econometrics at the Justus Liebig University in Giessen in 2011 and his habilitation at the Karlsruhe Institute of Technology in 2017. He uses methods from applied statistics, machine learning and agent-based modelling in fields such as the economics of innovation and climate policy analysis.

Summary: The tutorial offers both a theoretical understanding of and practical skills in a method called "topic modelling". Using this method, developed at the intersection of machine learning and natural language processing, participants will learn how to: 1) cluster textual data into meaningful topics, 2) relate these topics to other characteristics of the texts and 3) build a storyline around this type of analysis.

This method can be applied, for example, to responses to open-ended survey questions to elicit preferences, to research articles for large-scale literature reviews, to posts on social networks such as Twitter, and to many other kinds of text. A few examples of applying the method in R will be discussed in detail with the participants.


  • Motivation
  • What Topic Modelling (TM) is and how it works
  • What the advantages of Structural TM are
  • Hands-on with a few examples (literature review, survey, social network data)
  • Time for questions