If you are interested in writing your thesis at the Software Technology Group, you can simply contact either one of the research assistants or Prof. Mezini. Ideally, you have had a prior look at the list of our current research topics, as most offered theses will cover one of these topics. If you are excited about a listed topic, just contact the topic's supervisor. However, If you are excited about another topic in the area of software engineering / technology that does not match our research topics exactly, please contact us regardless – maybe we can come up with something.
17 items found. Show all theses.
Over the last few years, a vast amount of data has become available from a variety of heterogeneous sources, including social networks and cyber-physical systems. This state of the things has pushed recent research in the direction of investigating computing platforms and programming environments that support processing massive quantities of data. Systems like MapReduce, Spark, Flink, Storm, Hive, PigLating, Hadoop, HDFS have emerged to address the technical challenges posed by the nature of these computations, including parallelism, distribution, network communication and fault tolerance.
Despite the popularity of such systems, there has been little attention to aspects in the development process other than programming itself. For example, testing Big Data applications is an area that remains largely unexplored. This is even more surprising considering that testing has a long tradition in Software Engineering from a research standpoint (e.g., concoholic testing, mutation testing) as well as for practitioners, with established testing techniques and tools that are widespread in industry (e.g., JUnit).
The goal of this thesis is to develop a testing methodology for Big Data applications focusing on the Apache Spark platform. The candidate will apply testing techniques based on symbolic execution to the setting of Big Data. Ideally, the thesis will include a comparison of different approaches as well as the development of a new methodology specifically tailored for Big Data.
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing (HotCloud'10). USENIX Association, Berkeley, CA, USA, 10-10. go
Supervisor: Prof. Dr. Guido Salvaneschi
Type systems shall ensure that correctly typed programs do not get stuck during execution. More formally, type systems shall satisfy certain soundness criteria, for example, progress and preservation (see Pierce). However, proving the soundness of a type system by hand is a tedious and error-prone task. Therefore, it is desirable to make use of existing verification tools in order to create fully or at least partially machine-checked soundness proofs.
Candidate tools for a formal verification of progress and preservation are for example Isabelle/HOL, Coq, Twelf, and Dafny. These theorem provers provide some support for automated verification of individual proof steps: For example, Isabelle Sledgehammer is a tactic that integrates automated first-order theorem provers such as the E-prover, SPASS, Vampire etc. into Isabelle. Dafny internally uses the SMT solver Z3 to automate proof steps. Twelf provides basic, but powerful support for automating some common forms of inductions which are typically used to prove properties about programming language specifications.
In this thesis, the student candidate shall evaluate how well these existing automation techniques are suited for facilitating the mechanisation of progress and preservation proofs. The student candidate shall use at least three existing verification tools (Isabelle/HOL and Dafny, and a third tool of her or his own choice) to formally prove progress and preservation of a type system specification for a subset of SQL and evaluate at least the following questions by experimenting with the formalised proofs:
- Which general steps can each theorem prover automatically prove in the progress and preservation proof at hand?
- Which steps cannot be automated, i.e. require manual interaction with the theorem prover?
- How “much” of the proofs can each prover automate, and what are sensible metrics to measure this quantity?
- How does each of the theorem provers help with the discovery of proof steps?
Software-Defined Networks (SDNs) provide a new way to configure computer networks. Special-purpose network devices with tightly coupled data and control planes are replaced by programmable switches managed by a logically centralized controller. The communication between the controller and these programmable switches is carried out using well-defined APIs (e.g., OpenFlow). Instead of configuring devices individually, network policies are implemented on top of the controller and then used by the controller to instruct individual network switches.However, SDN APIs like OpenFlow closely resemble the features provided by the hardware. OpenFlow uses a set of prioritized match-action rules as abstraction, which makes it difficult to write sophisticated network applications. For example, an application supporting multiple tasks at the same time must merge the switch-level rules required by each of the tasks into a single set of rules that can be installed on the switches. To overcome these limitations, different programming languages for SDNs have been proposed, that provide higher language abstractions on top of OpenFlow, including abstractions for querying the network state, basic service composition or language support for network verification.Developing new SDN language features requires a comparison with already existing languages, usually, in order to show that they increase the expressivity of the language while providing at least the same performance at runtime. However, currently it is quite cumbersome to compare different SDN languages, since they are implemented on top of different host languages, like Python, OCaml, Java and they usually provide only a small set of simple example application that are not directly comparable with each other.The goal of this thesis is to develop an extensible testbed for SDN applications that allows to compare SDN programming languages with respect to expressivity as well as performance and to provide a set of small- to medium-sized example applications that can be used for benchmarking and comparing the available language abstractions. Based on the results of the experiments, the next step will be to propose new language features that address the limitations of current solutions. go
Distributed software applications communicate with remote components over the network. In such networked applications state is replicated at all components, which requires to keep state synchronized. To this end consistency is achieved by enforcing a single global order of operations on state. Traditionally, this requirement is achieved by all components to reliably and quickly communicating with each other every time the state is changed. However, in many realistic scenarios, networks can be unreliable. For example, mobile connections are often not available in remote areas, or can be overloaded in highly crowded areas.
As global ordering of updates requires consensus among all participating components, consistent state changes become slow or impossible when the network connection is unreliable. Many applications keep being responsive to the user even when the network is unavailable, allowing local state to diverge from the global state. The applications then tries to fix the divergence when the network becomes available again. Unfortunately, multiple independent changes to the state may be conflicting. Conflicts can not always be automatically handled, and can lead to data loss or incorrect behavior.
One solution proposed in the literature are Commutative Replicated Data Types (CRDTs). A category of data types supporting a useful set of conflict-free operations. Conflict freedom is achieved by using commutative operations, where the order of the operations do not matter for the result. Without the need to rely on the order, updates to the state are just applied whenever they are received, and the whole system will eventually reach a consistent state, when all updates are applied.
This thesis proposes language abstractions allowing developers to systematically design and implement systems that deal with the problem of synchronizing state in unreliable networks, depending on available information, resources and connectivity. The key challenge of this thesis is to find a good trade off between the ability to express complex changes to state, while still allowing for a system which can automatically resolve conflicts. go
Supervisor: Ragnar Mogk, M.Sc.
Just in time compiler – interpreter optimization
Performance remains, however, a major limitation of RP. Most RP implementations are based on libraries where the language runtime is agnostic to reactive abstractions. As a result, a number of aspects like change propagation, dependency tracking and memory management that could be specifically optimized can only benefit from general purpose optimization such as those provided by out of the box just in time compilers. Optimization at the virtual machine level has the potential to address these issues. go
Supervisor: Prof. Dr. Guido Salvaneschi
The STG group at TU Darmstadt has formalised the calculus of structured interactions (CoSI) and the type theory of Essential Multiparty Session Types (EMST) to describe and statically validate safe and deadlock free interactions in concurrent, distributed systems. Multiparty session types is a well-established type theory that is highly attracting the attention of the programming language and software engineering communities.
The CoSI and EMST aim to establish the foundations of session types and their transition to practice.
In this project, the bachelor or master student will participate in the design and prototyping of CoSI and global types constructs, devising a library for a mainstream class-based language. The library must support constructs for structuring interactions: session initiation, external and internal choice, matching, and session channels, and the runtime environment that will deal with various characteristics of initiation.The student will use Scribble as a description language for our global types. Scribble is designed based on the ideas and typing strategy underlying the original MST. The amount of work will be properly partitioned in relation to the degree (bachelor or master).
The ideal student has taken a course or has prior industry or research experience in programming languages and/or compilers. go
Supervisor: Dr. Andi Bejleri
The STG group at TU Darmstadt has formalised the E-calculus and a linear type system to model and statically validate composition of decoupled event-based processes, a.k.a. cooperative decoupled event-based processes. In the last decade, event-driven programming has become a major paradigm for developing distributed systems.
Decoupling makes it possible for new components to get deployed at runtime respectively of existing components that can crash or leave without affecting others and enables modules underlying runtime components to be developed independently, e.g., in publish-subscribe systems, components are not bound to interfaces/modules of other components. However, it is an illusion to believe that one can just run components that were developed independently of each other. At least they need to agree on the content of the messages that they exchange. Yet, the latter alone does not mean that they will accomplish something useful. Hence, there is a need to ensure that they cooperate in a sound manner.
In this project, the bachelor or master student will participate in the design and prototyping of the E-calculus and its type system as a library for a mainstream programming language, preferably Scala or Java to evaluate the practicality of the E-calculus and its type system. A performance evaluation for the library will be developed. The amount of work will be properly partitioned in relation to the degree (bachelor or master).
The ideal student has taken a course or has prior industry or research experience in programming languages, event-driven programming, cooperative and publish-subscribe architectures. go
Supervisor: Dr. Andi Bejleri
The STG group at TU Darmstadt has formalised the calculus of structured interactions (CoSI) and the type theory of Essential Multiparty Session Types (EMST) to describe and statically validate safe and deadlock free interactions. Multiparty session types is a well-established type theory that is highly attracting the attention of the programming language and software engineering communities.
The CoSI and EMST aim to establish the foundations of session types and their transition to practice. The fundamental properties of typed CoSI processes such as subject reduction, communication safety, and progress are established and proved.
In this project, the bachelor or master student will participate in the design and prototyping of a plugin for a well-established theorem prover (preferably Isabelle or Coq) to assist the formal proofs of soundness for EMST. The plugin will consist in modelling the formal definition of CoSI and its type system in the preferable theorem prover. A test of the plugin will be developed, using the current proofs of soundness of Essential Multiparty Session Types.
The ideal student has taken a course or has prior industry or research experience in interactive theorem provers (proof assistants) and formal proofs. go
Supervisor: Dr. Andi Bejleri
This Thesis proposes to unify the programming model described earlier with dataflow languages. Dataflow languages are a programming technique to process event streams in a reactive and declarative way – an approach that has been popularized by the success story for the Netflix backend. The key idea in the Thesis to model in a single code unit complex interaction patterns as dataflow abstractions that seamlessly propagate form the user interface to the server backend. This approach can significantly simplify the development of Web applications and higher the level of abstraction at which programmers reason about client-server interactions. go
Reactive languages provide dedicated language-level abstractions for the development of reactive applications, such as event streams and signals. Thanks to this programming technique, reactive applications can be developed in a declarative way, specifying the relation that exists among reactive entities. In contrast to traditional programming paradigms, which require manual updates, with reactive programming, changes are propagated automatically: dependent components (e.g., the GUI) are updated by the runtime when the entities they depend on change (e.g., the data in the model).
Because of the declarative flavor of reactive programming, traditional debugging tools based on step-by step execution and memory inspection, are hardly effective in this context. For example, in a declarative setting, it is not clear how to “step-over” statements or what “setting a breakpoint” would even mean. go
Reactive programming (RP) supports the development of reactive software through dedicated language-level abstractions, like signals and events. Previous research on RP mostly had a programming-languages focus. Researcher extended the available abstractions, made them more interoperable, improved safety using dedicated type systems, and proposed performance optimizations. Over the years, it has been shown that RP overcomes the well-known drawbacks of the Observer design pattern and leads to better software design.
Unfortunately, when it comes to human factors, the state of the things is less clear. There is preliminary evidence that RP do improve program comprehension in controlled experiments that compare developers using RP against a control group using traditional techniques. However, little is known about why this is the case and which reasoning processes developers adopt when facing a program in the RP style.
This kind of research challenges have been tackled before thorough experiments that apply the “think loudly” approach, where developers are required to describe the activities they are doing while coding. Collected data allow one to get interesting insights about how developers work and which mental processes guide their choices.
Reactive programming is a recent programming paradigm that specifically supports the development of reactive applications. It provides dedicated language abstractions, like signals and events, that overcome the disadvantages of the traditional Observer pattern.
Previous research on reactive programming has greatly improved the abstractions available to the developer. Other research areas focused on non-functional properties, like proving safety or time-bound execution of reactive applications.
Interestingly, supporting reactive applications with dedicated tools and programming environments is a mostly unexplored area. However, the field is extremely promising, since reactive applications exhibit regular patterns that can be easily exploitable by the IDE. go
Complex Event Processing (CEP) is about performing queries over time-changing streams of data. This technology is used to correlate low-level events and generate higher-level events when a certain pattern occurs. Applications include trading software, elaboration from environmental sensors, and intrusion detection through network packets analysis. For example, a CEP system can receive a stream of data from the trading market and perform the following query: “For the five most recent trading days starting today, select all stocks that closed higher than the company X on a given day. Keep the query standing for 20 trading days”. CEP engines, however are usually not integrated in the programming language and simply accept SQL-like queries in the form of strings. In the last few years, parallel line of research investigated event-based languages, that provide ah-hoc support for events. For example C# allows one to define events beside fields and methods in class interfaces. However, this class of languages often simply provides syntactic support for the Observer pattern and does not reach the expressivity of CEP. The goal of the thesis is to achieve high expressivity and integration with the existing abstractions in the same language. go
Build systems are used in all but the smallest software projects to invoke the right build tools on the right files in the right order. A build system must be sound (after a build, generated files consistently reflect the latest source files) and efficient (recheck and rebuild as few build units as
possible). Contemporary build systems provide limited efficiency because they lack support for expressing fine-grained file dependencies. We have defined a novel build system ClearDep that supports sound and optimal incremental builds.
The goal of this thesis is to develop mechanisms for migrating existing build scripts to our build system ClearDep. A possible roadmap might be:
- Identify/establish a benchmark of projects that use build system X
- Implement an abstract builder X in ClearDep
- Manually migrate some of the projects to use ClearDep
- Measure build times
- Tune the ClearDep script to optimize performance
- Investigate support for automated build-script migration
The student can freely choose build system X, from which build scripts are migrated to ClearDep.
Type systems shall ensure that correctly typed programs do not get stuck during execution. More formally, type systems shall satisfy certain soundness criteria, for example, progress and preservation (see Pierce). However, proving the soundness of type systems by hand is a tedious and error-prone task. Therefore, it is desirable to have special tool-support for soundness proofs of type systems such that these proofs, or at least large parts of them, can be conducted automatically.
To this end, we have built a research prototype called Veritas that makes it possible for conducting progress and preservation proofs for type systems for simple programming languages (for example for the simply-typed lambda calculus) in a semi-automatic fashion: Main proof steps of the progress and preservation proofs can be stated within Veritas by hand and can be proven fully automatically by a first-order theorem prover. go
OPAL is a comprehensive library for static analyses that is developed in Scala to facilitate the writing of a wide range of different kinds of analyses. OPAL supports the development of analyses ranging from bug/bug pattern detection up to full-scale data-flow analyses.
In the context of this project we are always searching for students who are interested in static analysis and want to implement them using Scala. Topics of interest are, e.g., analyses to detect and validate a software's architecture, to find security wholes, to develop need base static analyses such as Call Graph Algorithm or to visualize a software.
GA computing is a new way of geometrically intuitive computing. Based on the underlying Geometric Algebra (GA), it is very easy to compute with geometric objects such as spheres, planes and circles or to handle geometric operations and transformations. For example, the intersection of two spheres, which is a circle, can be simply expressed based on the outer product of the algebra. This technology leads to more compact algorithms for many engineering areas such as computer graphics, computer vision and robotics as well as for simulations, e.g. molecular dynamics simulation.
The goal of this thesis is to build support for GA computing into Scala using SugarScala. SugarScala is a syntactically extensible variant of Scala where developers can introduce new language features through libraries. Using this technology, this thesis will design and integrate language constructs specific to GA computing. From here, there are two alternative directions:
- Use lightweight modular staging to compile GA programs to highly-efficient code (the optimizations are well-known), and (time permitting) use Delite to run GA programs in parallel.
- Design and implement a language on top of the SugarScala GA features that more easily allows experts from computer graphics, computer vision, robotics, or molecular dynamics to express GA programs.