David Clark's MSc projects

2009-10



My research interests include

Below is a list of suggestions but it is not completely rigid and I am willing to negotiate.


DCM1  A trace slicer for AAPL (detecting malware)

Description

AAPL is the Abstract Assembler Programming Language and was developed by Laurence Tratt and my phd student, Khalid Alzarouni, as a target language for reverse engineering compiled code. Khalid has also developed an algorithm for slicing semantic traces for AAPL. In this project you will implement a semantic simulator for AAPL. This initially will require developing a lexical analyser and a parser for AAPL and using them to build Abstract Syntax Trees. Once you implement the trace semantics for AAPL it will be possible to generate semantic traces. At this point you will have produced a semantic simulator for AAPL. The project then has a natural extension to implementing an existing trace slicing algorithm and doing some experiments to discover how effective it is in practice at reducing trace size. This project is related to research on semantic detection of malware.

Objectives


Skills required

Implementation

You will produce a semantic simulator for AAPL with possible extension to a trace slicer.




DCM2  An e-bay like web site implemented in JIF (flow security)

Description

Java with Information Flow (JIF) is an extension to Java which provides security label annotations as part of the program and uses those labels to control information flow through a type system. Using such a type system it is possible to implement confidentiality and other security policies using the construction of the program. The aim of this project is to learn about JIF and writing programs in JIF, then develop a web site that has as much as possible developed using JIF.

Objectives


Skills required

Implementation

You will produce a web site with the basic functionality of e-bay, but written in JIF as much as possible.




DCM3  Flow sensitive typing for a While language (flow security)

Description

Volpano and Smith developed typing which ensures flow security for programs but has the drawback of being flow insensitive. Hunt and Sands have developed flow sensitive typing which can successfully type a larger class of programs than flow insensitive typing. In this project you will aim to implement both approaches and produce some experiments comparing them. This will require you to develop a lexical analyser and parser for a core imperative language. An algorithm exists that implements Volpano and Smith's approach but you will have to develop your own algorithm for Hunt and Sand's approach. You will be given guidance on this.

Objectives

You implement two different security typing approaches and compare them.

Skills required

Implementation

You will produce implementations of two different typing approaches to guaranteeing flow security for a core imperative programming language.


DCM4  Probabilistic semantic simulator (flow security)

Description

The aim of this project is to write a semantic simulator for a While language that works on probability distributions instead of single inputs. This will require you to write a lexical analyser and parser for a While language (a core imperative language) and then implement one of Kozen's probabilistic semantics for a While language. This work is closely connected to the problem of measuring the amount of information that passes between two variables when a batch program executes.

Objectives

Implement a semantic simulator for the While language that  takes probability distributions as inputs rather than single values.

Skills required

Implementation

You will produce a program that  takes as input a while program and a probability distribution on the inputs and produces a probability distribution on the outputs of the while program.



DCM5  Formal Power Series analysis for PAs (flow security)

Description

Unlike for programming languages, there is no automatable analysis for flow quantity for any process algebra (PA). The notions of flow quantity for process algebras have been limited to semantics based definitions. Together with colleagues in Italy, I have developed a syntax based analysis for a very simple process algebra which uses Formal Power Series. The aim of this project is to automate this analysis and use it to calculate the flow quantity between high and low events in the possible traces of a process. The method is very general so there is a possible extension to a different analysis, for example termination or logical properties. It will be necessary to develop a lexical analyser and a parser for the process algebra, produce an algorithm that uses the formal power series definitions to calculate the matrix of conditional probabilities and automate the calculation of mutual information using the matrix.

Objectives

Automate an analysis of a process algebra which calculates an information flow measure from sequences of high events to sequences of low events

Skills required

Implementation

You will produce a program that takes as input a process term in the process algebra and calculates the quantity of information flow from secret events to public events. 


DCM6  GAs for optimal slices in EFSMs (slicing models)

Description

Extended Finite State Machines are often used to model reactive systems. Slicing is a well known source code analysis technique that consists of following dependencies to determine which parts of the program affect a set of variables at a given statement of interest (i.e. the slicing criterion). Recent work has applied slicing to EFSMs for reducing the size of the model for comprehension. A minimisation algorithm (usually defined for finite state automata) is used for slicing, however, it may produce exponentially larger EFSMs. This project proposes that a genetic algorithm be applied for minimising the EFSM slices. The fitness function could be the size of the model given in terms of the number of transitions, and the algorithm should terminate when the smallest model has been found.

Objectives

Learn about EFSMs and slicing of EFSMs. Learn Python. Learn about genetic algorithms and implement a genetic algorithm for minimising an EFSM. 

Skills required

Implementation

You will implement a program in Python that takes as input an EFSM marked with transitions to be kept in the slice and use a genetic algorithm for slicing the EFSM. This program can be integrated in the CREST slicing tool if successful.



David Clark