
Open Theses

  • Master Thesis

    Current code models are pretrained in an unsupervised setting with text-based signals (such as next-token prediction or masked language modelling) and then fine-tuned on a labelled dataset. While this has produced good performance, it has its limitations. CodeRL, on the other hand, uses reinforcement learning with feedback from unit tests to train an agent to generate code from a prompt. However, unit tests do not capture all the information about code and are not always available.

    The thesis will focus on developing an environment that gives feedback through program analysis and within which an agent can learn to perform different tasks. Prior experience with program analysis is required to work on this thesis.
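
    To make the idea of analysis-based feedback concrete, the following is a minimal sketch and not the environment the thesis would develop. It assumes Python as the target language, a one-step episode, and a deliberately crude, flow-insensitive check (the share of names that are read but never bound). The names `analysis_feedback` and `AnalysisEnv` are hypothetical.

```python
import ast
import builtins


def analysis_feedback(source: str) -> float:
    """Return a reward in [-1, 1] computed from a purely static check."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return -1.0  # code that does not parse gets the worst reward

    # Collect every name that is bound anywhere in the module.
    # This is flow-insensitive: use-before-assignment is not detected.
    defined = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])

    # Every name that is read somewhere in the module.
    loads = [n.id for n in ast.walk(tree)
             if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)]
    if not loads:
        return 1.0
    undefined = sum(1 for name in loads if name not in defined)
    return 1.0 - 2.0 * undefined / len(loads)


class AnalysisEnv:
    """Hypothetical one-step environment: the action is a complete program."""

    def __init__(self, prompt: str):
        self.prompt = prompt

    def reset(self) -> str:
        return self.prompt

    def step(self, action: str):
        reward = analysis_feedback(action)
        return self.prompt, reward, True  # observation, reward, episode done
```

    Unlike a unit-test reward, this signal needs neither test cases nor execution of the generated code. A realistic environment would presumably replace the name check with richer analyses such as type or data-flow checks.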

    Supervisor: Abhinav Ananad, M.Sc.

  • Bachelor Thesis, Master Thesis

    State-of-the-art models for code generation and understanding, such as CodeBERT and CodeT5, are essentially LLMs designed for natural language (BERT, T5) but trained on a large corpus of code. However, LLMs are known to hallucinate, i.e., they often produce sentences and information that are factually incorrect. One way to prevent hallucination in LLMs, or at least reduce its impact, is to augment them with information retrieval.

    It is currently not known whether these models also hallucinate when used for code intelligence tasks. The thesis would focus on developing methods to understand hallucinations in these models and, further, on exploring retrieval-based augmentation to reduce their effect.
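
    As a rough illustration of retrieval-based augmentation, the sketch below ranks a small corpus of code snippets by token overlap with the task and prepends the best match to the prompt. It is only a toy under stated assumptions: Jaccard similarity stands in for a real retriever, and the function names and the prompt layout are hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lower-cased identifier-like tokens of a text."""
    return set(re.findall(r"[A-Za-z_]\w*", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, corpus: list, k: int = 1) -> list:
    """Return the k corpus entries whose tokens overlap most with the query."""
    q = tokens(query)
    ranked = sorted(corpus, key=lambda doc: jaccard(q, tokens(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: list, k: int = 1) -> str:
    """Prepend the retrieved snippets to the task description."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"# Retrieved context:\n{context}\n# Task:\n{query}\n"
```

    The augmented prompt would then be passed to the code model in place of the bare task, so that the generated code can be grounded in snippets that actually exist.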

    Supervisor: Abhinav Ananad, M.Sc.

  • Master Thesis

    In probabilistic programming, dedicated programming languages provide the means to describe the structure of a stochastic process and then to use machine learning to learn the parameters of that process from data. Usually, these languages use approximate methods to ensure tractability. However, exact tractable models can also be used for this task.

    To this end, the set of legal programs needs to be restricted, preferably at compile time. This restriction can be seen as a type inference problem, where types represent the permitted operations and algorithms of an expression.

    The topic of this thesis is to find a way to express these restrictions as a type inference problem, for example as a Hindley-Milner type system.
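
    The following sketch shows how such a restriction could surface as a type error. It implements only the unification core of Hindley-Milner inference (no let-polymorphism) for a tiny expression language. The environment `ENV`, the type names `Tractable`, `Intractable` and `Real`, and the operation `exact_marginal` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TCon:
    name: str
    args: tuple = ()


def fn(a, b):
    """Function type a -> b."""
    return TCon("->", (a, b))


def resolve(t, subst):
    """Apply the substitution until no bound type variable remains."""
    if isinstance(t, TVar):
        return resolve(subst[t], subst) if t in subst else t
    return TCon(t.name, tuple(resolve(a, subst) for a in t.args))


def occurs(v, t):
    """True if type variable v appears inside the (resolved) type t."""
    return t == v if isinstance(t, TVar) else any(occurs(v, a) for a in t.args)


def unify(a, b, subst):
    """Extend subst so that a and b become equal, or raise TypeError."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, TVar):
        if occurs(a, b):
            raise TypeError(f"infinite type: {a.name}")
        return {**subst, a: b}
    if isinstance(b, TVar):
        return unify(b, a, subst)
    if a.name != b.name or len(a.args) != len(b.args):
        raise TypeError(f"cannot unify {a.name} with {b.name}")
    for x, y in zip(a.args, b.args):
        subst = unify(x, y, subst)
    return subst


_ids = count()


def fresh():
    return TVar(f"t{next(_ids)}")


def infer(expr, env, subst):
    """Infer the type of ("var", x), ("lam", x, body) or ("app", f, arg)."""
    kind = expr[0]
    if kind == "var":
        return env[expr[1]], subst
    if kind == "lam":
        _, param, body = expr
        param_t = fresh()
        body_t, subst = infer(body, {**env, param: param_t}, subst)
        return fn(param_t, body_t), subst
    if kind == "app":
        _, func, arg = expr
        func_t, subst = infer(func, env, subst)
        arg_t, subst = infer(arg, env, subst)
        result_t = fresh()
        subst = unify(func_t, fn(arg_t, result_t), subst)
        return result_t, subst
    raise ValueError(f"unknown expression kind: {kind}")


# Hypothetical environment: exact inference is only typed for tractable models.
ENV = {
    "exact_marginal": fn(TCon("Tractable"), TCon("Real")),
    "circuit_model": TCon("Tractable"),
    "sampled_model": TCon("Intractable"),
}
```

    Under these assumptions, applying `exact_marginal` to `circuit_model` infers the type `Real`, while applying it to `sampled_model` raises a `TypeError` during unification. This is the compile-time rejection of an illegal program in miniature.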

    Examiner: Prof. Dr.-Ing. Mira Mezini

    Supervisor: David Richter, M.Sc.

  • Bachelor Thesis, Master Thesis

    OPAL is a comprehensive library for static analyses that is developed in Scala to facilitate the writing of a wide range of different kinds of analyses. OPAL supports the development of analyses ranging from bug/bug pattern detection up to full-scale data-flow analyses.

    In the context of this project, we are always looking for students who are interested in static analysis and want to implement analyses in Scala. Topics of interest include, e.g., developing needed base analyses such as call graph algorithms, analyses that find security issues, and software visualization.

    If you are interested in OPAL, do not hesitate to contact Dominik Helm. For further information, you can also visit the OPAL Project website.

    Examiner: Prof. Dr.-Ing. Mira Mezini

    Supervisors: Dr.-Ing. Dominik Helm, Tobias Roth, M.Sc.

  • Bachelor Thesis, Master Thesis

    Software-based systems already play a major role in industrial production, and this role will only grow in the context of Industrie 4.0. To solve the new challenges that arise in this context, existing software has to be adapted in different directions, e.g. to enable the addition of new sensors or the creation of a digital twin. For this purpose, we want to uncover Software Product Line features and models that are already present implicitly and make them explicit. This requires the development of corresponding analyses and automatic refactorings.

    Candidates would work on different topics that enable software reuse of industrial controllers written in C.

    These topics include (but are not limited to):

    • automatic identification and localization of features

    • automatic code slicing of identified features

    • adaptation of analyses to the presence of C preprocessor macros

    • automatic module extraction
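
    As a simple starting point for feature localization, the sketch below maps each macro used in a C preprocessor conditional to the source lines it guards. It is a line-based approximation under stated assumptions: it does not expand macros, does not evaluate conditions, and treats every identifier in an `#if` or `#elif` condition as a feature. The function name `locate_features` and the macro names in the usage note are hypothetical.

```python
import re
from collections import defaultdict

# Matches the conditional directives; other directives count as ordinary lines.
DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif|else|endif)\b\s*(.*)$")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def condition_macros(condition: str) -> set:
    """Identifiers in a condition, without the `defined` operator itself."""
    return set(IDENTIFIER.findall(condition)) - {"defined"}


def locate_features(source: str) -> dict:
    """Map each macro name to the 1-based numbers of the lines it guards."""
    features = defaultdict(list)
    stack = []  # one set of macro names per open conditional block
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DIRECTIVE.match(line)
        if match is None:
            # Ordinary line: it depends on every enclosing conditional.
            for names in stack:
                for name in names:
                    features[name].append(lineno)
            continue
        kind, rest = match.group(1), match.group(2)
        if kind in ("ifdef", "ifndef", "if"):
            stack.append(condition_macros(rest))
        elif kind == "elif" and stack:
            stack[-1] |= condition_macros(rest)
        elif kind == "endif" and stack:
            stack.pop()
        # "#else" changes nothing: the branch still depends on the same macros.
    return dict(features)
```

    For a file that guards one declaration with `#ifdef FEATURE_SENSOR` and another with `#if defined(FEATURE_TWIN) && defined(FEATURE_SENSOR)`, the result lists the guarded line numbers per macro. These line sets could then serve as seeds for slicing or module extraction.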

    Examiner: Prof. Dr.-Ing. Mira Mezini

    Supervisor: Patrick Müller, M.Sc.