Rajaa El Hamdani
Towards Zero-Shot Knowledge Base Construction with Pretrained Large Language Models
Joint work with Mehwish Alam, Thomas Bonald, and Fragkiskos Malliaros
Knowledge bases are critical tools for structuring and understanding information, yet creating them from scratch is expensive and time-consuming.
This paper presents a methodology for Knowledge Base Construction (KBC) using Pretrained Large Language Models (PLLMs), focusing on the extraction of structured data from natural-language text. Our objective is to evaluate the effectiveness of PLLMs, specifically GPT-4, in a zero-shot setting for KBC in the legal domain, using Wikipedia articles as our primary data source. The approach is domain- and text-agnostic, enabling scalable application across fields simply by extending the taxonomy.
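To make the setup concrete, here is a minimal sketch of zero-shot property extraction with GPT-4 through the OpenAI chat completions API. The property list, prompt wording, and JSON parsing are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of zero-shot KBC: ask GPT-4 to fill taxonomy properties
# from raw article text, with no in-context examples.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical slice of a legal-domain taxonomy for an entity type
# such as "law firm"; the real taxonomy would be larger.
PROPERTIES = ["founding_date", "headquarters", "founders"]


def extract_properties(article_text: str) -> dict:
    """Zero-shot extraction: one prompt listing the target properties."""
    prompt = (
        "Extract the following properties from the article below. "
        "Answer with a JSON object mapping each property to its value, "
        "or null if the article does not state it.\n"
        f"Properties: {', '.join(PROPERTIES)}\n\n"
        f"Article:\n{article_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output eases evaluation against ground truth
    )
    # Assumes the model returns bare JSON; production code would
    # validate and handle malformed responses.
    return json.loads(response.choices[0].message.content)
```

Because the taxonomy is passed in as data rather than baked into the prompt logic, extending coverage to a new domain amounts to swapping the property list, which is what makes the approach scale.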
Our initial findings show that while GPT-4 achieves high F1 scores on some properties, it struggles with those that require deep domain understanding. Interestingly, GPT-4 also surfaced verifiable facts absent from our ground truth, indicating its potential for uncovering novel information.