Comparing Existing Tool Support for the Verification of a Type System’s Soundness

Comparing Existing Tool Support for the Verification of a Type System’s Soundness

Bachelor Thesis, Master Thesis

Type systems shall ensure that correctly typed programs do not get stuck during execution. More formally, type systems shall satisfy certain soundness criteria, for example, progress and preservation (see Pierce). However, proving the soundness of a type system by hand is a tedious and error-prone task. Therefore, it is desirable to make use of existing verification tools in order to create fully or at least partially machine-checked soundness proofs.

Candidate tools for a formal verification of progress and preservation are for example Isabelle/HOL, Coq, Twelf, and Dafny. These theorem provers provide some support for automated verification of individual proof steps: For example, Isabelle Sledgehammer is a tactic that integrates automated first-order theorem provers such as the E-prover, SPASS, Vampire etc. into Isabelle. Dafny internally uses the SMT solver Z3 to automate proof steps. Twelf provides basic, but powerful support for automating some common forms of inductions which are typically used to prove properties about programming language specifications.

Research Objective
In this thesis, the student candidate shall evaluate how well these existing automation techniques are suited for facilitating the mechanisation of progress and preservation proofs. The student candidate shall use at least three existing verification tools (Isabelle/HOL and Dafny, and a third tool of her or his own choice) to formally prove progress and preservation of a type system specification for a subset of SQL and evaluate at least the following questions by experimenting with the formalised proofs:

  • Which general steps can each theorem prover automatically prove in the progress and preservation proof at hand?
  • Which steps cannot be automated, i.e. require manual interaction with the theorem prover?
  • How “much” of the proofs can each prover automate, and what are sensible metrics to measure this quantity?
  • How does each of the theorem provers help with the discovery of proof steps?