Open Theses
Master Thesis
Current code models are pretrained on text-based signals (such as next-token prediction or masked language modelling) in an unsupervised setting and then fine-tuned on a labelled dataset. While this approach has produced good performance, it has its limitations. CodeRL, in contrast, uses reinforcement learning with feedback from unit tests to train an agent that generates code from a prompt. However, unit tests do not capture all the information about a program, and they are not always available.
The thesis will focus on developing an environment that gives feedback through program analysis and in which an agent can learn to perform different tasks. Prior experience with program analysis is required for this thesis.
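To make the idea of program-analysis feedback more concrete, here is a minimal Python sketch of a single-step environment in which the action is a piece of source code and the reward comes from static checks rather than unit tests. Everything in it (`ProgramAnalysisEnv`, `undefined_names`, the reward formula) is hypothetical and purely illustrative. The analysis is deliberately crude: a syntax check plus a scope-insensitive search for names that are read but never bound. A real environment would likely use richer analyses such as type checking or data-flow analysis.

```python
from __future__ import annotations

import ast
import builtins
from dataclasses import dataclass, field


def undefined_names(tree: ast.AST) -> set[str]:
    """Hypothetical, scope-insensitive check: names read but never bound."""
    defined = set(dir(builtins))  # builtins such as print/len count as bound
    loaded: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # Name nodes are either reads (Load) or bindings (Store/Del).
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                defined.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)  # function parameters
        elif isinstance(node, ast.alias):
            # "import os.path" binds "os"; "import x as y" binds "y".
            defined.add((node.asname or node.name).split(".")[0])
        elif isinstance(node, ast.ExceptHandler) and node.name:
            defined.add(node.name)  # "except E as e" binds "e"
    return loaded - defined


@dataclass
class StepResult:
    reward: float
    done: bool
    info: dict = field(default_factory=dict)


class ProgramAnalysisEnv:
    """Hypothetical environment: reward is computed by static analysis."""

    def __init__(self, prompt: str) -> None:
        self.prompt = prompt

    def reset(self) -> str:
        # The observation is simply the task prompt.
        return self.prompt

    def step(self, code: str) -> StepResult:
        try:
            tree = ast.parse(code)
        except SyntaxError as err:
            # Unparseable code gets the lowest reward and ends the episode.
            return StepResult(reward=-1.0, done=True, info={"error": str(err)})
        missing = undefined_names(tree)
        # Arbitrary shaping: full reward only if every name read is bound.
        reward = 1.0 / (1 + len(missing))
        return StepResult(reward=reward, done=True, info={"undefined": sorted(missing)})


env = ProgramAnalysisEnv("Write a function that increments its argument.")
env.reset()
print(env.step("def inc(x):\n    return x + 1\n").reward)  # 1.0
print(env.step("def inc(x):\n    return y + 1\n").reward)  # 0.5
print(env.step("def inc(:").reward)                        # -1.0
```

The reward shape is an arbitrary choice here; the point is only that the signal is computed from the program itself, so it remains available even when no unit tests exist.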
Supervisor: Abhinav Ananad, M.Sc.
2023/02/28
Bachelor Thesis, Master Thesis
State-of-the-art models for code generation and understanding, such as CodeBERT and CodeT5, are essentially models designed for natural language (BERT, T5) that have been trained on a large corpus of code. However, LLMs are known to hallucinate, i.e., they often produce sentences and information that are factually incorrect. One way to prevent hallucination in LLMs, or at least to reduce its impact, is to augment them with information retrieval.
It is currently not known whether these models also hallucinate when used for code intelligence tasks. The thesis will focus on developing methods to understand hallucinations in these models and will further explore retrieval-based augmentation as a way to reduce their effect.
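As a rough illustration of retrieval augmentation for code, here is a minimal Python sketch. It assumes a toy bag-of-identifiers retriever scored with cosine similarity, standing in for the dense or BM25 retrievers typically used in practice. The function names (`retrieve`, `augment_prompt`) and the prompt layout are hypothetical, and the call to an actual code model is left out: the augmented prompt is what would be passed to such a model.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Identifier-like tokens; a crude stand-in for a real code tokenizer.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokens(text: str) -> Counter:
    return Counter(t.lower() for t in TOKEN.findall(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-token count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # Rank corpus snippets by similarity to the query and keep the top k.
    q = tokens(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, tokens(doc)), reverse=True)
    return ranked[:k]


def augment_prompt(query: str, corpus: list[str], k: int = 1) -> str:
    # Prepend retrieved snippets so the model can ground its output in them.
    context = "\n\n".join(retrieve(query, corpus, k))
    return f"# Retrieved context:\n{context}\n\n# Task:\n{query}"


corpus = [
    "def read_json(path):\n    import json\n    with open(path) as f:\n        return json.load(f)",
    "def mean(values):\n    return sum(values) / len(values)",
]
query = "compute the mean of a list of values"
print(retrieve(query, corpus)[0].splitlines()[0])  # def mean(values):
print(augment_prompt(query, corpus))
```

Whether such augmentation actually reduces hallucination in code models is exactly the kind of question the thesis would investigate; the sketch only shows where retrieved context enters the pipeline.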
Supervisor: Abhinav Ananad, M.Sc.