Seminars
The DIG team holds a seminar about every two weeks, with speakers either from the team or invited from outside.
You can add the seminars to your calendar with this ics file, and receive emails about future seminars by subscribing to our mailing list.
If you would like to present your work at our seminar, please contact Nils.
Upcoming Seminars
Title TBA
Tuesday, March 31, 2026 11:45, 1D23
Duy Nguyen Ho Minh
Abstract TBA
Toward Responsible Natural Language Processing: Ideal, Illusion, or Imperative?
Tuesday, April 14, 2026 11:45, 1A312
Antoine Gourru (Télécom Saint-Etienne)
Large language models have profoundly transformed natural language processing and are increasingly reshaping work, knowledge production, and social organization, yet they remain misaligned with societal values and demand substantial computational resources. In this seminar, I will present my research on responsible NLP, structured around two central pillars: fairness and frugality. Through a scientific overview of selected recent and ongoing works, I will discuss methods to assess and mitigate alignment failures, and to develop resource-efficient approaches that promote more sustainable NLP systems.
Data Integration: Remaining Challenges and Research Paths
Tuesday, May 19, 2026 11:45, 4A301
Robert Wrembel (Poznań University of Technology)
Data integration (DI) has been a cornerstone of computer science research for decades, resulting in a few established reference architectures. They generally fall into three categories: virtual (federated and mediated), physical (data warehouse), and hybrid (data lake, data lakehouse, and data mesh). Regardless of the paradigm, these architectures depend on an integration layer, implemented by means of sophisticated software designed to orchestrate and execute DI processes. The integration layer is responsible for ingesting data from various sources (typically heterogeneous and distributed) and for homogenizing data into formats suitable for future processing and analysis. On the one hand, many domains, e.g., medical systems, smart cities, and smart agriculture, produce large volumes of highly heterogeneous data, which require further advances in data integration technologies. On the other hand, the widespread adoption of artificial intelligence (AI) solutions is now extending towards DI, offering alternative solutions, opening new research paths, and generating new open problems. Emerging paradigms, such as Data Spaces and the Model Context Protocol, further advance DI.
Past Seminars
FLORA: Unsupervised Knowledge Graph Alignment by Fuzzy Logic
Tuesday, October 21, 2025 11:45, 4A301
Yiwen Peng & Fabian Suchanek
Knowledge graph alignment is the task of matching equivalent entities (that is, instances and classes) and relations across two knowledge graphs. Most existing methods focus on pure entity-level alignment, computing the similarity of entities in some embedding space. They lack interpretable reasoning and need training data to work. To solve these issues, we introduce FLORA, a simple yet effective method that (1) is unsupervised, i.e., does not require training data, (2) provides a holistic alignment for entities and relations iteratively, (3) is based on fuzzy logic and thus delivers interpretable results, (4) provably converges, (5) allows dangling entities, i.e., entities without a counterpart in the other KG, and (6) achieves state-of-the-art results on major benchmarks.
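To give a flavor of the fuzzy-logic machinery the abstract mentions, here is a minimal illustrative sketch (not the authors' code) of how fuzzy truth values can be combined when propagating alignment evidence: a product t-norm serves as fuzzy AND and a probabilistic-sum t-conorm as fuzzy OR, so that each new piece of evidence monotonically increases a match score, which is consistent with an iterative, provably converging update.

```python
# Illustrative fuzzy-logic combinators for alignment scores in [0, 1].
# The function names and the update rule are assumptions for exposition,
# not FLORA's actual equations.

def t_norm(a: float, b: float) -> float:
    """Fuzzy AND (product t-norm)."""
    return a * b

def t_conorm(a: float, b: float) -> float:
    """Fuzzy OR (probabilistic sum)."""
    return a + b - a * b

def update_match_score(prior: float, evidence: list[float]) -> float:
    """Accumulate evidence from matched neighbors into a match score.

    Each evidence value is OR-ed in, so the score can only grow,
    giving the monotone behavior needed for convergence."""
    score = prior
    for e in evidence:
        score = t_conorm(score, e)
    return score
```

For example, two independent pieces of evidence of strength 0.5 each combine to a match score of 0.75 rather than 1.0, reflecting the graded, interpretable nature of fuzzy truth values.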
Data- and knowledge-driven approaches for step-by-step guidance to differential diagnosis
Tuesday, October 07, 2025 11:45, 4A301
Adrien Coulet (INRIA)
Diagnosis guidelines provide recommendations based on expert consensus that cover the majority of the population, but often overlook patients with uncommon conditions or multiple morbidities. We will present and compare two alternative approaches that provide step-by-step guidance to the differential diagnosis of anemia and lupus. The first approach relies on reinforcement learning and observational data; the second on large language models and domain knowledge.
Meaning Representations and Reasoning in the Age of Large Language Models
Tuesday, September 30, 2025 11:45, 3A301
Zacchary Sadeddine
This thesis explores how to make large language models (LLMs) more reliable and transparent in their reasoning. It first examines around fifteen societal issues related to these models, such as disinformation or user overreliance, and then investigates symbolic structures from linguistics and how they can be used to improve the performance and transparency of LLMs. It presents VANESSA, a neuro-symbolic reasoning system that combines the power of neural models with the rigor of symbolic reasoning, achieving performance comparable to LLMs while remaining transparent. Finally, it addresses the problem of verifying LLM outputs by introducing a step-by-step verification benchmark, paving the way for more interpretable, controllable, and trustworthy artificial intelligence systems.
Robust Knowledge Graph Cleaning
Tuesday, May 27, 2025 11:45, 4A301
Maximilian Egger
Data quality is a prerequisite for using the information in a dataset properly and reliably, yet the increasing volume of data makes data preparation and cleaning ever more difficult. In addition, more diverse database structures, such as graphs, are coming into use and need to be handled differently. This calls for robust methods to increase data integrity, scalable approaches for finding and fixing errors, and locality-aware algorithms that can direct attention to where it is needed.
Synthesis & Augmentation of Tabular Data In the Age of Foundation Models
Tuesday, May 13, 2025 11:45, 4A301
Nikola Simidjievski
Foundation models - large, high-performance pre-trained models - have shown remarkable success in applications that predominantly focus on vision, language, and sound data. On the other hand, tabular data - one of the most prevalent data modalities in many critical domains of business, science, and healthcare - has seen limited benefit from these advances. Tabular data poses unique challenges related to heterogeneity, dimensionality, and scarcity, as well as a lack of explicit symmetries, implicit structure, and incomplete prior knowledge, all of which limit how we construct, train, and apply or transfer large models for tabular data.
GPTKB: Comprehensively Materializing Factual LLM Knowledge
Tuesday, April 29, 2025 11:45, 4A301
Simon Razniewski (TU Dresden)
LLMs have greatly advanced NLP and AI, and next to their ability to perform a wide range of procedural tasks, a major success factor is their internalized factual knowledge. Since (Petroni et al., 2019), analyzing this knowledge has gained attention. However, most approaches investigate one question at a time via modest-sized pre-defined samples, introducing an “availability bias” (Tversky and Kahneman, 1973) that prevents the discovery of knowledge (or beliefs) of LLMs beyond the experimenter’s predisposition. To address this challenge, we propose a novel methodology to comprehensively materialize an LLM’s factual knowledge through recursive querying and result consolidation. As a prototype, we employ GPT-4o-mini to construct GPTKB, a large-scale knowledge base (KB) comprising 101 million triples for over 2.9 million entities. This work marks a milestone in two areas: For LLM research, for the first time, it provides constructive insights into the scope and structure of LLMs’ knowledge (or beliefs), and its strengths and weaknesses. For KB construction, it pioneers new pathways for the long-standing challenge of general-domain KB construction. GPTKB is accessible at https://gptkb.org.
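The recursive-querying idea in the abstract can be sketched as a breadth-first expansion: query the model about a seed entity, collect the resulting triples, and enqueue every newly discovered entity for its own query. The sketch below is an assumption-laden illustration; `query_llm` is a hypothetical stand-in for the actual GPT-4o-mini prompting and consolidation pipeline.

```python
# Hedged sketch of recursive knowledge materialization.
# `query_llm(subject)` is assumed to return a list of (s, p, o) triples
# about `subject`; the real system's prompting, deduplication, and
# consolidation steps are omitted.
from collections import deque

def materialize(seed, query_llm, max_entities=1000):
    """Breadth-first expansion from a seed entity.

    Each entity is queried once; every object of a returned triple
    that has not been seen yet is enqueued for querying in turn."""
    seen, triples = {seed}, []
    queue = deque([seed])
    while queue and len(seen) < max_entities:
        subject = queue.popleft()
        for (s, p, o) in query_llm(subject):
            triples.append((s, p, o))
            if o not in seen:  # newly discovered entity
                seen.add(o)
                queue.append(o)
    return triples
```

With a toy oracle mapping "A" to one triple about "B" and vice versa, the loop terminates after two queries and returns both triples; the `max_entities` cap is what keeps the expansion bounded in practice.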
ProvSQL: Provenance and Probabilistic Querying in Uncertain Databases
Tuesday, April 08, 2025 11:45, 4A125
Pratik Karmakar
Probabilistic databases provide a powerful framework for managing and querying uncertain data, enabling principled reasoning under uncertainty. ProvSQL extends PostgreSQL to support provenance tracking and probability computation in probabilistic databases, leveraging provenance circuits to efficiently compute probabilities and Shapley-based data valuations. In this talk, we introduce ProvSQL, demonstrate its capabilities, and explore a key use case: content-based image retrieval from the COCO dataset. We show how probabilistic query evaluation and data valuation techniques enhance explainability and trust in AI-driven decision-making.
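To illustrate the provenance-circuit idea in miniature (this is an expository sketch, not ProvSQL's implementation): a query answer's provenance can be represented as a circuit over tuple variables, and its probability evaluated bottom-up, assuming sub-circuits combine independent tuples so that AND multiplies and OR takes the probabilistic sum.

```python
# Toy provenance-circuit probability evaluation.
# Gates: ("var", name), ("and", left, right), ("or", left, right).
# The independence assumption at each gate only holds for decomposable
# circuits; ProvSQL handles the general case with more machinery.

def prob(gate, p):
    """Evaluate the probability of a circuit bottom-up.

    `p` maps tuple names to their marginal probabilities."""
    kind = gate[0]
    if kind == "var":
        return p[gate[1]]
    a, b = prob(gate[1], p), prob(gate[2], p)
    if kind == "and":          # independent conjunction
        return a * b
    return a + b - a * b       # independent disjunction
```

For instance, an answer derivable from either of two independent tuples, each present with probability 0.5, has probability 0.75.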
Tabular foundation models: priors for numbers and strings
Tuesday, March 25, 2025 11:45, 4A301
Gaël Varoquaux (INRIA)
Deep learning typically does not outperform tree-based models on tabular data, which may often be explained by the small size of such datasets. For images, sound, and text, the solution has been pretrained models, leading to foundation models that are adapted and reused for many tasks. I will discuss the challenges of bringing these ideas to tabular learning, and the progress that we have made in building priors for tables, i.e., columns of different natures, with numbers and strings.
Neuro-symbolic approaches for the knowledge graph lifecycle
Tuesday, March 18, 2025 11:45, 4A301
Pierre Monnin (INRIA)
In the Web of Data, an increasing number of knowledge graphs (KGs) are concurrently published, edited, and accessed by human and software agents. Their wide adoption makes essential the tasks of their lifecycle: construction, refinement (e.g., matching, link prediction), mining, and usage to support applications (e.g., explainable AI, recommender systems). However, all these tasks require facing the inherent heterogeneity of KGs, e.g., in terms of granularities, vocabularies, and completeness. Besides, scalability issues arise due to their increasing size and combinatorial nature. In my talk, I will present my research on neuro-symbolic approaches for the KG lifecycle, intertwining domain knowledge from ontologies, deductive reasoning, analogical reasoning, and machine learning models. Throughout my presentation, I will show that such approaches enhance models by improving their semantic awareness, frugality, and the semantic interpretability of their latent representation space.
Tuesday, March 04, 2025 11:45, 4A301
Ken Satoh