If you are interested in writing your thesis at the Software Technology Group, you can simply contact either one of the research assistants or Prof. Mezini. Ideally, you have had a prior look at the list of our current research topics, as most offered theses will cover one of these topics. If you are excited about a listed topic, just contact the topic's supervisor. However, if you are excited about another topic in the area of software engineering / technology that does not match our research topics exactly, please contact us regardless – maybe we can come up with something.
Data Consistency for Microservices
Microservice architectures have become a popular solution for developing modern distributed systems, as they ensure proper modularization of the system's functionality. Yet, microservices bring their own set of challenges. Most noticeably, data consistency becomes a primary issue when data span multiple services. A common solution to this problem is to use the “Saga” pattern, introduced in 1987 for long-lived transactions. With this pattern, a transaction is broken down into a sequence of local transactions that communicate via asynchronous messaging. To ensure atomicity, such transactions can be compensated to achieve a complete rollback in case one fails. Compared to traditional ACID transactions, however, Sagas exhibit two main issues: i) they do not guarantee isolation across microservices and ii) they are hard to implement, as not every action is compensable with an inverse action.
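The compensation mechanism described above can be illustrated with a minimal, in-process sketch. It is only an illustration under simplifying assumptions: the names SagaStep, run_saga and the order example are hypothetical, and the steps run synchronously in one process, whereas real Sagas coordinate services through asynchronous messaging.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    """One local transaction plus the action that undoes it (hypothetical type)."""
    name: str
    action: Callable[[dict], None]
    compensation: Callable[[dict], None]


def run_saga(steps: List[SagaStep], state: dict) -> bool:
    """Run the steps in order; if one fails, compensate the completed ones in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action(state)
            completed.append(step)
        except Exception:
            # Roll back by running the compensations of all completed steps.
            for done in reversed(completed):
                done.compensation(state)
            return False
    return True


# Hypothetical local transactions of an order-processing saga.
def reserve_stock(state: dict) -> None:
    state["stock"] -= 1


def release_stock(state: dict) -> None:
    state["stock"] += 1


def charge_card(state: dict) -> None:
    if state["balance"] < state["price"]:
        raise RuntimeError("insufficient funds")
    state["balance"] -= state["price"]


def refund_card(state: dict) -> None:
    state["balance"] += state["price"]


ORDER_SAGA = [
    SagaStep("reserve_stock", reserve_stock, release_stock),
    SagaStep("charge_card", charge_card, refund_card),
]
```

Note that between reserve_stock and its compensation, any other reader of the state would observe the reduced stock. This intermediate visibility is the lack of isolation mentioned in issue i).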
Considering the aforementioned challenges, the candidate will investigate the way the Saga pattern is applied to real-world microservices. This step will be accomplished by mining GitHub repositories. Next, the candidate will work on a solution to guarantee a higher level of isolation for Sagas.
If you are interested in the topic or have any further questions, please contact: firstname.lastname@example.org-… (Nafise Eskandani)
Supervisor: Prof. Guido Salvaneschi
Language-level Support for Distributed Systems
Large-scale distributed systems must embrace the trade-off between consistency and availability, accepting lower levels of consistency to guarantee higher availability. Existing programming languages are, however, agnostic to this compromise, resulting in consistency guarantees being implicitly adopted from the middleware or hardcoded in configuration files.
Recent advances in programming languages propose to integrate consistency levels into the language design, allowing developers to specify different consistency constraints in the same application. We investigate how consistency levels interact with object structure and define a type system that preserves correct program behavior.
The thesis unfolds at the boundary between programming languages and distributed systems. The goal is to investigate a novel programming model that integrates consistency and isolation – which have been traditionally supported at the middleware level – into the programming language.
Supervisor: Prof. Guido Salvaneschi
Tierless Software Development with ScalaLoci
Distributed applications are traditionally developed as separate modules, often in different languages, which react to events, like user input, and in turn produce new events for the other modules. Separation into components requires time-consuming integration. Manual implementation of communication forces programmers to deal with low-level details. The combination of the two results in obscure distributed data flows scattered among multiple modules, hindering reasoning about the system as a whole.
The ScalaLoci distributed programming language addresses these issues with a coherent model based on placement types that enables reasoning about distributed data flows, supporting multiple software architectures via dedicated language features and abstracting over low-level communication details and data conversions. In ScalaLoci, the components of the distributed system are developed in the same compilation unit, and the compiler automatically splits them and generates the deployment units. ScalaLoci simplifies developing distributed systems, reduces error-prone communication code and favors early detection of bugs.
The thesis investigates new features in the ScalaLoci language. For example, we are interested in improving performance, in mechanisms for event-based communication among the peers of the distributed system, in a module system that supports incremental development and in security features that protect the distributed application from malicious adversaries.
Supervisor: Prof. Guido Salvaneschi
Abstract Dependent Classes
Offered in co-supervision with Nada Amin (University of Cambridge, UK).
Dependent classes combine the flexibility of dynamic dispatching à la Smalltalk with safety à la Scala. However, a beautiful core that can support abstract methods has still not been devised. A good starting point is the DC calculus of Vaidas Gasiunas' thesis. The exciting insight from the thesis is that subtyping can be described as constraint entailment. This is not a new idea in itself (see Frank Pfenning's subtyping lecture notes), but it has not been explored much in an object-oriented setting. The time is ripe due to very efficient constraint solvers (SMT, Satisfiability Modulo Theories).
There are two potential projects.
1. One is working on an implementation, as a prototype for experimenting with case studies.
2. The other is a formalization in Coq, Agda or Isabelle to assure the proof sketch already outlined in the thesis. There are some old attempts in Isabelle that could serve as a starting point.
If you are interested in any of the above topics or have any further questions, please contact: bracevac -at- st.informatik.tu-darmstadt.de (Oliver Bracevac)
Supervisor: Dr.-Ing. Oliver Bracevac
Reengineering of Industrial C Code
Software-based systems already play a major role in industrial production, and this role will only grow in the context of Industrie 4.0. In order to solve new challenges that arise in this context, existing software has to be adapted in different directions, e.g., to enable the addition of new sensors or the creation of a digital twin. For this purpose, we want to uncover Software Product Line features and models, which are already present implicitly, and make them explicit. Therefore, the development of corresponding analyses and automatic refactorings is necessary.
Candidates would work on different topics that enable software reuse of industrial controllers written in C.
These topics include (but are not limited to):
• automatic identification and localization of features
• automatic code slicing of identified features
• adaptation of analyses to the presence of C preprocessor macros
• automatic module extraction
If you are interested in any of the above-mentioned topics or have any further questions, please contact: email@example.com (Patrick Müller)
Supervisor: Patrick Müller, M.Sc.
Secure Integration of Cryptographic Software
Today, many applications use cryptographic components to provide a secure implementation. For a secure implementation, it is essential that developers are aware of the correct and secure usage of cryptographic components. Recent studies have shown that developers struggle with this. As a result, applications that are intended to be trustworthy become insecure.
Within our research project “Secure Integration of Cryptographic Software” of the SFB CROSSING, we want to support developers when they integrate cryptographic components in an application. To achieve this aim, we have developed an Eclipse plugin which can generate secure cryptographic code and a static analysis which identifies insecure usages. Currently, we have created all rules checked by the analysis by hand. One of our next steps is to determine how we can automatically generate rules for correct and secure usages.
If you are interested in this research or the research of our project in general, you can contact Anna-Katharina Wickert.
You can find more details like an introduction video and our publications on our project page.
Supervisor: Anna-Katharina Wickert, M.Sc.
Analysis and Testing of Big Data Applications
Over the last few years, a vast amount of data has become available from a variety of heterogeneous sources, including social networks and cyber-physical systems. This state of affairs has pushed recent research in the direction of investigating computing platforms and programming environments that support processing massive quantities of data. Systems like MapReduce, Spark, Flink, Storm, Hive, Pig Latin, Hadoop, and HDFS have emerged to address the technical challenges posed by the nature of these computations, including parallelism, distribution, network communication and fault tolerance.
Despite the popularity of such systems, there has been little attention to aspects in the development process other than programming itself. For example, testing Big Data applications is an area that remains largely unexplored. This is even more surprising considering that testing has a long tradition in Software Engineering from a research standpoint (e.g., concolic testing, mutation testing) as well as for practitioners, with established testing techniques and tools that are widespread in industry (e.g., JUnit).
The goal of this thesis is to develop a testing methodology for Big Data applications focusing on the Apache Spark platform. The candidate will apply testing techniques based on symbolic execution to the setting of Big Data. Ideally, the thesis will include a comparison of different approaches as well as the development of a new methodology specifically tailored for Big Data.
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing (HotCloud'10). USENIX Association, Berkeley, CA, USA, 10-10.
Supervisor: Prof. Dr. Guido Salvaneschi
Programming Languages for Software-Defined Networks
Software-Defined Networks (SDNs) provide a new way to configure computer networks. Special-purpose network devices with tightly coupled data and control planes are replaced by programmable switches managed by a logically centralized controller. The communication between the controller and these programmable switches is carried out using well-defined APIs (e.g., OpenFlow). Instead of configuring devices individually, network policies are implemented on top of the controller and then used by the controller to instruct individual network switches.
However, SDN APIs like OpenFlow closely resemble the features provided by the hardware. OpenFlow uses a set of prioritized match-action rules as abstraction, which makes it difficult to write sophisticated network applications. For example, an application supporting multiple tasks at the same time must merge the switch-level rules required by each of the tasks into a single set of rules that can be installed on the switches. To overcome these limitations, different programming languages for SDNs have been proposed that provide higher-level language abstractions on top of OpenFlow, including abstractions for querying the network state, basic service composition or language support for network verification.
Developing new SDN language features usually requires a comparison with existing languages in order to show that the new features increase the expressivity of the language while providing at least the same performance at runtime.
However, it is currently quite cumbersome to compare different SDN languages, since they are implemented on top of different host languages, like Python, OCaml, and Java, and they usually provide only a small set of simple example applications that are not directly comparable with each other.
The goal of this thesis is to develop an extensible testbed for SDN applications that allows comparing SDN programming languages with respect to expressivity as well as performance, and to provide a set of small- to medium-sized example applications that can be used for benchmarking and comparing the available language abstractions. Based on the results of the experiments, the next step will be to propose new language features that address the limitations of current solutions.
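To illustrate why prioritized match-action rules are hard to compose, the following sketch models a flow table in a strongly simplified way. It is not the OpenFlow API: the Rule type, the lookup function, the field names and the action strings are all hypothetical, and each rule carries a single action for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    """A simplified match-action rule (hypothetical model, not the OpenFlow API)."""
    priority: int
    match: Dict[str, object]  # header field -> required value; absent field = wildcard
    action: str


def matches(rule: Rule, packet: Dict[str, object]) -> bool:
    """A rule matches if every field it constrains has the required value."""
    return all(packet.get(field) == value for field, value in rule.match.items())


def lookup(table: List[Rule], packet: Dict[str, object]) -> Optional[str]:
    """Return the action of the highest-priority matching rule, or None."""
    candidates = [rule for rule in table if matches(rule, packet)]
    if not candidates:
        return None
    return max(candidates, key=lambda rule: rule.priority).action


# Two independent tasks: monitoring web traffic and routing to one host.
monitoring = [Rule(10, {"dst_port": 80}, "count")]
routing = [Rule(5, {"dst_ip": "10.0.0.2"}, "forward:2")]

# Naively concatenating the rule sets loses the forwarding action for web
# traffic to 10.0.0.2, because only the highest-priority rule is applied.
naive = monitoring + routing

# A correct merge needs an extra rule for the overlap of both matches.
merged = [Rule(20, {"dst_ip": "10.0.0.2", "dst_port": 80}, "count+forward:2")] + naive
```

The number of such overlap rules grows with the number of tasks, which is one motivation for the higher-level composition operators offered by SDN languages.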
Virtual Machine Support for Reactive Programming
Just in time compiler – interpreter optimization
Performance remains, however, a major limitation of reactive programming (RP). Most RP implementations are based on libraries where the language runtime is agnostic to reactive abstractions. As a result, a number of aspects like change propagation, dependency tracking and memory management that could be specifically optimized can only benefit from general-purpose optimizations such as those provided by off-the-shelf just-in-time compilers. Optimization at the virtual machine level has the potential to address these issues.
Supervisor: Prof. Dr. Guido Salvaneschi
A debugging system for reactive programming.
Reactive languages provide dedicated language-level abstractions for the development of reactive applications, such as event streams and signals. Thanks to this programming technique, reactive applications can be developed in a declarative way, specifying the relation that exists among reactive entities. In contrast to traditional programming paradigms, which require manual updates, with reactive programming, changes are propagated automatically: dependent components (e.g., the GUI) are updated by the runtime when the entities they depend on change (e.g., the data in the model).
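The automatic propagation described above can be sketched in a few lines. This is a deliberately naive illustration, not the design of any particular reactive language: the classes Var and Signal are hypothetical, dependencies are declared explicitly, and updates are pushed eagerly without the ordering guarantees (e.g., glitch freedom) that real implementations provide.

```python
from typing import Callable, List


class Var:
    """A mutable source value that notifies its dependents (hypothetical name)."""

    def __init__(self, value):
        self._value = value
        self._dependents: List["Signal"] = []

    def get(self):
        return self._value

    def set(self, value) -> None:
        self._value = value
        # Push the change to everything that depends on this value.
        for dependent in list(self._dependents):
            dependent._recompute()

    def _subscribe(self, dependent: "Signal") -> None:
        self._dependents.append(dependent)


class Signal:
    """A derived value that is recomputed whenever one of its sources changes."""

    def __init__(self, compute: Callable[[], object], sources) -> None:
        self._compute = compute
        self._dependents: List["Signal"] = []
        for source in sources:
            source._subscribe(self)
        self._value = compute()

    def get(self):
        return self._value

    def _subscribe(self, dependent: "Signal") -> None:
        self._dependents.append(dependent)

    def _recompute(self) -> None:
        self._value = self._compute()
        for dependent in list(self._dependents):
            dependent._recompute()


# The "model" (price, quantity) and a dependent "GUI" label.
price = Var(10)
quantity = Var(2)
total = Signal(lambda: price.get() * quantity.get(), [price, quantity])
label = Signal(lambda: f"Total: {total.get()}", [total])

before = label.get()   # "Total: 20"
price.set(15)          # no manual update of total or label is needed
after = label.get()    # "Total: 30"
```

The developer only states how label depends on total and how total depends on the model. Keeping them up to date is left to the runtime, which is what makes step-by-step debugging of such programs unintuitive.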
Because of the declarative flavor of reactive programming, traditional debugging tools based on step-by-step execution and memory inspection are hardly effective in this context. For example, in a declarative setting, it is not clear how to “step-over” statements or what “setting a breakpoint” would even mean.
An empirical study on reactive programming.
Reactive programming (RP) supports the development of reactive software through dedicated language-level abstractions, like signals and events. Previous research on RP mostly had a programming-languages focus. Researchers extended the available abstractions, made them more interoperable, improved safety using dedicated type systems, and proposed performance optimizations. Over the years, it has been shown that RP overcomes the well-known drawbacks of the Observer design pattern and leads to better software design.
Unfortunately, when it comes to human factors, the state of affairs is less clear. There is preliminary evidence that RP does improve program comprehension in controlled experiments that compare developers using RP against a control group using traditional techniques. However, little is known about why this is the case and which reasoning processes developers adopt when facing a program in the RP style.
Research challenges of this kind have been tackled before through experiments that apply the “think aloud” approach, where developers are required to describe the activities they are doing while coding. The collected data allow one to gain interesting insights into how developers work and which mental processes guide their choices.
Eclipse plugin for reactive programming.
Reactive programming is a recent programming paradigm that specifically supports the development of reactive applications. It provides dedicated language abstractions, like signals and events, that overcome the disadvantages of the traditional Observer pattern.
Previous research on reactive programming has greatly improved the abstractions available to the developer. Other research areas focused on non-functional properties, like proving safety or time-bound execution of reactive applications.
Interestingly, supporting reactive applications with dedicated tools and programming environments is a mostly unexplored area. However, the field is extremely promising, since reactive applications exhibit regular patterns that can be easily exploited by the IDE.
Language integration of complex event processing
Complex Event Processing (CEP) is about performing queries over time-changing streams of data. This technology is used to correlate low-level events and generate higher-level events when a certain pattern occurs. Applications include trading software, processing of data from environmental sensors, and intrusion detection through network packet analysis. For example, a CEP system can receive a stream of data from the trading market and perform the following query: “For the five most recent trading days starting today, select all stocks that closed higher than the company X on a given day. Keep the query standing for 20 trading days”. CEP engines, however, are usually not integrated into the programming language and simply accept SQL-like queries in the form of strings. In the last few years, a parallel line of research has investigated event-based languages that provide ad hoc support for events. For example, C# allows one to define events alongside fields and methods in class interfaces. However, this class of languages often simply provides syntactic support for the Observer pattern and does not reach the expressivity of CEP. The goal of the thesis is to achieve high expressivity and integration with the existing abstractions in the same language.
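As a rough illustration of what a language-integrated query could look like, the sketch below expresses a simplified version of the trading query as ordinary typed code instead of an SQL-like string. It is not the API of any CEP engine: DayClose and outperformers are hypothetical names, the window is counted in events, the reference symbol is assumed to be present on every day, and the "keep standing for 20 days" part of the query is omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator, List, Tuple


@dataclass
class DayClose:
    """Low-level event: closing prices of all stocks on one trading day."""
    day: int
    closes: Dict[str, float]


def outperformers(
    stream: Iterable[DayClose], reference: str, window: int
) -> Iterator[Tuple[int, List[str]]]:
    """For each incoming day, emit the stocks that closed higher than
    `reference` on at least one of the `window` most recent days."""
    recent: Deque[DayClose] = deque(maxlen=window)  # sliding window of events
    for event in stream:
        recent.append(event)
        winners = sorted({
            symbol
            for past in recent
            for symbol, close in past.closes.items()
            # Assumption: `reference` has a closing price on every day.
            if symbol != reference and close > past.closes[reference]
        })
        yield event.day, winners  # higher-level event derived from the window


days = [
    DayClose(1, {"X": 10.0, "A": 11.0, "B": 9.0}),
    DayClose(2, {"X": 10.0, "A": 9.0, "B": 9.5}),
    DayClose(3, {"X": 10.0, "A": 9.0, "B": 12.0}),
]
results = list(outperformers(days, reference="X", window=2))
```

Because the query is a regular function over typed events, mistakes such as a misspelled field name can be caught by the language tooling rather than at query submission time.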
Open Analyses Library
OPAL is a comprehensive library for static analyses that is developed in Scala to facilitate the writing of a wide range of different kinds of analyses. OPAL supports the development of analyses ranging from bug/bug pattern detection up to full-scale data-flow analyses.
In the context of this project, we are always searching for students who are interested in static analyses and want to implement them using Scala. Topics of interest are, e.g., analyses to detect and validate a software system's architecture, to find security holes, to develop needed base static analyses such as call graph algorithms, or to visualize software.