Identifying hidden intertextuality in 16th-century texts

Semester project, autumn 2023

The goal of this semester project was to explore whether a large language model (LLM) could be used to embed sentences. It laid the foundation for future work and highlighted the strengths and weaknesses of such an approach.

This first iteration compares each sentence of one text against each sentence of another text.
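To make the comparison concrete, here is a minimal sketch of an all-pairs comparison using cosine similarity. It is not the project's code: the LLM embedding is replaced by a hypothetical `toy_embed` bag-of-words function so that the example runs without a model, and `cosine` and `compare_all_pairs` are illustrative names. Any function that maps a sentence to a vector of the same sparse form can be passed as `embed`.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: feature -> weight


def toy_embed(sentence: str) -> Vector:
    """Hypothetical stand-in for the LLM embedding: a bag-of-words count vector."""
    return {word: float(n) for word, n in Counter(sentence.lower().split()).items()}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def compare_all_pairs(
    text_a: List[str],
    text_b: List[str],
    embed: Callable[[str], Vector] = toy_embed,
) -> List[Tuple[int, int, float]]:
    """Score every sentence of text_a against every sentence of text_b.

    Returns (index in text_a, index in text_b, similarity) triples. Both texts are
    embedded during the call, and len(text_a) * len(text_b) similarities are computed.
    """
    emb_a = [embed(s) for s in text_a]
    emb_b = [embed(s) for s in text_b]
    return [(i, j, cosine(ea, eb)) for i, ea in enumerate(emb_a) for j, eb in enumerate(emb_b)]
```

For example, `compare_all_pairs(["the king is dead", "long live the king"], ["the king is dead"])` returns `[(0, 0, 1.0), (1, 0, 0.5)]`. Because every sentence of one text is scored against every sentence of the other, the number of comparisons grows with the product of the two text lengths.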

Semester project, spring 2024

This semester project extends the previous one with a graphical interface through which users can interact with the tool.

The interface allows non-technical users who are not familiar with command-line interfaces to use the tool. It displays an overall similarity score between two texts and highlights each occurrence of intertextuality together with its degree of similarity.
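How the overall similarity is derived from sentence-level scores is not specified here. One plausible scheme, sketched below under that assumption, averages each sentence's best match and flags pairs above a threshold; the name `summarize` and the 0.8 default threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Score = Tuple[int, int, float]  # (sentence index in text A, sentence index in text B, similarity)


def summarize(scores: List[Score], threshold: float = 0.8) -> Tuple[float, List[Score]]:
    """Turn pairwise scores into (global similarity, highlighted pairs).

    Hypothetical aggregation: the global similarity is the mean, over the sentences
    of text A that appear in `scores`, of each sentence's best match in text B.
    Pairs scoring at or above `threshold` are returned, best first, as candidate
    occurrences of intertextuality.
    """
    best: Dict[int, float] = {}
    for i, _, s in scores:
        best[i] = max(best.get(i, s), s)
    global_score = sum(best.values()) / len(best) if best else 0.0
    hits = sorted((p for p in scores if p[2] >= threshold), key=lambda p: p[2], reverse=True)
    return global_score, hits
```

With the pairwise scores `[(0, 0, 1.0), (1, 0, 0.5)]`, `summarize` returns `(0.75, [(0, 0, 1.0)])`: a global score of 0.75 and a single highlighted pair.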

Bachelor project, spring 2024

This bachelor project tackles one of the biggest challenges of applying models trained on modern languages to 16th-century texts: their spelling and grammar differ from those of their modern counterparts.

We used the semantic capabilities of a modern LLM (in this instance ChatGPT 3.5) to "modernize" the texts before comparing them. This produced more accurate embeddings that better preserved the meaning of the texts, and gave better results than the previous approach.
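The modernization step sits in front of the embedding step. The sketch below shows only that shape: the prompt wording is a hypothetical example rather than the one used in the project, and `llm` is any function from prompt to completion, so no particular client library is assumed. `build_prompt` and `modernize_text` are illustrative names.

```python
from typing import Callable, List

# Hypothetical prompt template; the wording used in the project is not documented here.
PROMPT_TEMPLATE = (
    "Rewrite the following 16th-century sentence using modern spelling and grammar. "
    "Keep its meaning unchanged and answer with the rewritten sentence only.\n"
    "{sentence}"
)


def build_prompt(sentence: str) -> str:
    """Fill the template with one original sentence (placed on the last line)."""
    return PROMPT_TEMPLATE.format(sentence=sentence)


def modernize_text(sentences: List[str], llm: Callable[[str], str]) -> List[str]:
    """Modernize each sentence with `llm`, any function mapping a prompt to a completion.

    The returned sentences are meant to be embedded in place of the originals.
    """
    return [llm(build_prompt(s)).strip() for s in sentences]
```

In practice, `llm` would wrap a call to the chat model; a stub such as `lambda p: p.splitlines()[-1]`, which simply echoes the sentence back, is enough to test the pipeline offline.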

Master semester project, spring 2025

In this project, we addressed the main performance bottleneck of the previous approaches. Until then, texts were compared one against another, with embeddings computed on the fly, which accounted for a significant share of the processing time. By storing pre-computed embeddings of the texts, we could handle a full corpus and compare a query text against dozens of other texts in less time than a single one-to-one comparison previously took.
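The gain comes from moving embedding out of the comparison loop: the corpus is embedded once, and each search only embeds the query. A minimal in-memory sketch of that idea follows; `EmbeddingStore` and its methods are hypothetical names, and how the project actually persists its embeddings is not described here. Vectors are normalized when stored, so ranking reduces to a dot product with the query.

```python
import math
from typing import Callable, List, Tuple

Embedder = Callable[[str], List[float]]  # any sentence -> dense vector function


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


class EmbeddingStore:
    """Hypothetical in-memory store: corpus sentences are embedded once, at indexing time."""

    def __init__(self, embed: Embedder) -> None:
        self.embed = embed
        # (text id, sentence index, unit-length embedding)
        self.entries: List[Tuple[str, int, List[float]]] = []

    def add_text(self, text_id: str, sentences: List[str]) -> None:
        """Embed and store every sentence of one corpus text."""
        for i, sentence in enumerate(sentences):
            self.entries.append((text_id, i, normalize(self.embed(sentence))))

    def query(self, sentence: str, top_k: int = 5) -> List[Tuple[str, int, float]]:
        """Return the top_k most similar stored sentences; only the query is embedded here."""
        q = normalize(self.embed(sentence))
        scored = [(tid, i, sum(a * b for a, b in zip(q, v))) for tid, i, v in self.entries]
        return sorted(scored, key=lambda t: t[2], reverse=True)[:top_k]
```

A linear scan like this is adequate for dozens of texts; a much larger corpus could call for an approximate nearest-neighbour index instead.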

We also built a retrieval-augmented generation (RAG) system that presents the results to the user in natural language.
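In a RAG setup, the retrieved sentences are placed in the prompt so that the language model answers from them. The sketch below only assembles such a prompt; the `build_rag_prompt` name, the `[id#n]` citation tags and the instruction wording are assumptions, and the resulting string would be sent to whichever model the system uses.

```python
from typing import List, Tuple

Hit = Tuple[str, int, float, str]  # (text id, sentence index, similarity, sentence)


def build_rag_prompt(query: str, hits: List[Hit]) -> str:
    """Hypothetical prompt: retrieved passages are listed as context before the question."""
    context = "\n".join(
        f"[{tid}#{i}] (similarity {score:.2f}) {sentence}" for tid, i, score, sentence in hits
    )
    return (
        "Answer using only the passages below and cite them by their [id#n] tag.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

Including the similarity score in each passage lets the model mention how close a match is, which mirrors the degree of similarity shown in the graphical interface.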

Master thesis, spring 2025

The goal of this Master thesis was to automatically extract the actors, events and locations present in the texts and link them together. They can then be visualised on a map, on a timeline or as a relation graph.
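One way to hold the extracted data so that it feeds all three views is a list of event records linked to actors and places. The sketch below assumes that layout; `Event`, `timeline`, `by_location` and `relation_graph` are hypothetical names, and the project's actual schema is not described here. Two actors are connected in the graph when they appear in the same event, which is one possible linking rule among others.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Event:
    """Hypothetical record for one extracted event and the entities linked to it."""

    name: str
    year: Optional[int] = None  # None when the text gives no date
    location: Optional[str] = None  # place name; geocoding for the map is not shown
    actors: List[str] = field(default_factory=list)


def timeline(events: List[Event]) -> List[Event]:
    """Dated events in chronological order; undated events are left out."""
    return sorted((e for e in events if e.year is not None), key=lambda e: e.year)


def by_location(events: List[Event]) -> Dict[str, List[str]]:
    """Event names grouped by place, e.g. one marker per place on a map."""
    places: Dict[str, List[str]] = {}
    for e in events:
        if e.location is not None:
            places.setdefault(e.location, []).append(e.name)
    return places


def relation_graph(events: List[Event]) -> Dict[str, Set[str]]:
    """Undirected actor graph: two actors are linked if they share an event."""
    graph: Dict[str, Set[str]] = {}
    for e in events:
        for actor in e.actors:
            graph.setdefault(actor, set()).update(a for a in e.actors if a != actor)
    return graph
```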

In addition, users can interact with a chatbot (MistralAI) to ask about the extracted data in natural language. This enables human-AI collaboration, which can improve on the results that either humans or AI would obtain on their own.