Tuesday, October 28, 2025, 11:45, 4A125

Cristian Santini (University of Macerata)

Entity Linking and Relation Extraction for Historical Italian Texts: Challenges and Potential Solutions

Entity Linking and Relation Extraction enable the automatic identification of named entities mentioned in texts, along with their relationships, by connecting them to external knowledge graphs such as Wikidata. While these techniques work well on modern documents, applying them to historical texts presents significant challenges due to the diachronic evolution of language and the limited resources available for training computational models. This seminar presents recent work on developing methods and datasets for processing historical Italian texts. It will discuss the creation of a new benchmark dataset extracted from digital scholarly editions covering two centuries of Italian literary and political writing. The talk will then present approaches that enhance entity disambiguation by incorporating temporal and contextual information from Wikidata. Finally, it will detail a method for automatically constructing knowledge graphs from historical correspondence that combines multiple language models in sequence, demonstrating how these technologies can facilitate the exploration and understanding of historical archives without requiring extensive manual annotation or model training.