Current code models are pretrained on text-based signals (such as next-token prediction or masked language modelling) in an unsupervised setting and then fine-tuned on a labelled dataset. While this approach has produced good performance, it has limitations. CodeRL, on the other hand, uses reinforcement learning with feedback from unit tests to train an agent to generate code from a prompt. However, unit tests do not capture all the information about code and are not always available.
The thesis will focus on developing an environment that gives feedback through program analysis, within which an agent can learn to perform different tasks. Prior experience with program analysis is required to work on this thesis.
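
To make the idea of such an environment more concrete, the following is a minimal sketch of what feedback from program analysis could look like. It is not a proposed design. The class name `ProgramAnalysisEnv`, the `reset`/`step` interface, and the reward values are all assumptions made for illustration, and the only analysis performed is a syntax check plus a simple unused-variable check using Python's standard `ast` module.

```python
import ast


class ProgramAnalysisEnv:
    """Hypothetical toy environment: scores a candidate program with static checks only."""

    def __init__(self, prompt: str):
        self.prompt = prompt

    def reset(self) -> str:
        # The observation is simply the task prompt.
        return self.prompt

    def step(self, code: str):
        """Return (reward, info) for one candidate program; no code is executed."""
        try:
            tree = ast.parse(code)
        except SyntaxError as exc:
            # Assumed penalty: programs that do not parse get a fixed negative reward.
            return -1.0, {"syntax_error": str(exc)}

        stored, loaded = set(), set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    stored.add(node.id)
                elif isinstance(node.ctx, ast.Load):
                    loaded.add(node.id)

        # Names that are assigned but never read anywhere in the program.
        unused = sorted(stored - loaded)
        # Assumed reward shaping: start at 1.0, subtract 0.25 per unused name.
        reward = 1.0 - 0.25 * len(unused)
        return reward, {"unused_names": unused}


env = ProgramAnalysisEnv("Print the value 1.")
reward, info = env.step("x = 1\ny = 2\nprint(x)")
print(reward, info)
```

Unlike unit-test feedback, this signal needs neither test cases nor program execution, although a realistic environment would rely on much richer analyses than the one shown here.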