Tuesday, February 4, 2025, 11:45, 4A125

Fabian Suchanek

YAGO

In this talk I will present the newest version of YAGO, the knowledge base that we are building with several members of the DIG team. I will show why we build it, how we build it, and how it can be used. This will also be an occasion for me to get your feedback on our work.

Tuesday, January 21, 2025, 11:45, 4A301

Simon Delarue

Learning on graphs: from algorithms to socio-technical analyses on AI

This thesis addresses the dual challenge of advancing Artificial Intelligence (AI) methods while critically assessing their societal impact. With AI technologies now embedded in high-stake decision sectors like healthcare and justice, their growing influence demands thorough examination, reflected in emerging international regulations such as the AI Act in Europe. To address these challenges, this work leverages attributed-graph based methods and advocates for a shift from performance-focused AI models to approaches that also prioritise scalability, simplicity, and explainability.

The first part of this thesis develops a toolkit of attributed graph-based methods and algorithms aimed at enhancing AI learning techniques. It includes a software contribution that leverages the sparsity of complex networks to reduce computational costs. Additionally, it introduces non-neural graph models for node classification and link predictions tasks, showing how these methods can outperform advanced neural networks while being more computationally efficient. Lastly, it presents a novel pattern mining algorithm that generates concise, human-readable summaries of large networks. Together, these contributions highlight the potential of these approaches to provide efficient and interpretable solutions to AI’s technical challenges.

The second part adopts an interdisciplinary approach to study AI as a socio-technical system. By framing AI as an ecosystem influenced by various stakeholders and societal concerns, it uses graph-based models to analyse interactions and tensions related to explainability, ethics, and environmental impact. A user study explores the influence of graph-based explanations on user perceptions of AI recommendations, while the building and analysis of a corpus of AI ethics charters and manifestos quantifies the roles of key actors in AI governance. A final study reveals that environmental concerns in AI are primarily framed technically, highlighting the need for a broader approach to the ecological implications of digitalisation.

Tuesday, December 10, 2024, 11:45, 4A125

Lanfang Kong

Explainable algorithms for anomaly detection and time series forecasting

Artificial intelligence has shown dominant performance across diverse domains, including critical ones such as medicine, finance, justice and so on. As a result, the explainability of black-box models is becoming more and more important. We focus on two specific applications: anomaly detection and time series forecasting, and present XTREK and ADAPATCH, respectively.

XTREK is an unsupervised tree-based approach for explainable anomaly detection, which maximizes Kendall’s tau between the anomaly scores of the source anomaly detector and those of XTREK. The tree produced by our algorithm is relatively small in size, thereby boasting the renowned off-the-shelf transparency and explainability of tree-based approaches. Moreover, its explanations are sample-based. In particular, the anomaly scores are computed to be the inverse of the size of the corresponding leaf, thereby providing meaningful explanations when comparing examples with different anomaly scores. XTREK can also be used as an in-model approach, which is capable of providing concise explanations for its own decisions. Moreover, we propose efficient computation of Kendall’s tau coefficients when determining the best split at each node of the regression tree. We show how this can be computed incrementally, thereby making the running time of our algorithm almost linear (up to a logarithmic factor) in the size of the input.

ADAPATCH is an adaptive patch-based saliency map method for explainable time series forecasting, which provides local, post-hoc visualization explanations. The approach highlights those patches which would result in worse predictions when hidden to the black-box algorithm. With a differential encoding module in the mask of input, the optimization can be done by gradient-based perturbation. ADAPATCH does not need the patch parameters upfront, such as the length or the stride, as all patch-based approaches need. In fact, it learns those parameters from the data, thereby effectively adapting to different settings and application scenarios. By enforcing an upper bound on the maximum number of patches, we make sure that the patch-level explanations provided by our algorithm can be easily interpreted by humans, as opposed to explanations consisting of a large number of single time points. Moreover, ADAPATCH requires a much smaller number of parameters, typically linear in the number of patches as opposed to linear in the number of time steps. This makes our approach more efficient and easy to train.

Both methods are model-agnostic, which means the architecture of the black-box model can be hidden from the users. They provide accurate and simple explanations, as validated by extensive experiments.

Tuesday, December 3, 2024, 11:45, 4A125

Gabriel Damay

Dynamic Decision Trees and Community-based Graph Embeddings: towards Interpretable Machine Learning

Machine learning is the field of computer science that interests in building models and solutions from data without knowing exactly the set of instructions internal to these models and solutions. This field has achieved great results but is now under scrutiny for the inability to understand or audit its models among other concerns. Interpretable Machine Learning addresses these concerns by building models that are inherently interpretable. This thesis contributes to Interpretable Machine Learning in two ways.

First, we study decision trees. This is a very popular group of machine learning methods for classification problems and it is interpretable by design. However, real world data is often dynamic, but few algorithms can maintain a decision tree when data can be both inserted and deleted from the training set. We propose a new algorithm called FuDyADT to solve this problem.

Second, when data are represented as graphs, a very common machine learning technique called “embedding” consists in projecting them onto a vectorial space. This kind of method however is usually not interpretable. We propose a new embedding algorithm called PaRFaITe based on the factorization of the Personalized PageRank matrix. This algorithm is designed to provide interpretable results.

We study both algorithms theoretically and experimentally. We show that FuDyADT is at least comparable to state-of-the-art algorithms in the usual setting, while also being able to handle unusual settings such as deletions of data. PaRFaITe on the other hand produces embedding dimensions that align with the communities of the graph, making the embedding interpretable.

Tuesday, November 12, 2024, 11:45, 4A125

Cyril Chhun

Methodology and Meta-Evaluation Benchmark for Automatic Story Generation

Storytelling is a central component of human culture. Multiple approaches have been proposed to explore computational storytelling, despite the inherent challenges posed by the tasks of generating stories and assessing their quality. In this thesis, we design a meta-evaluation methodology and benchmark for ASG. First, we lay the groundwork for conducting our meta-evaluation: we describe our chosen setting, provide definitions for the ASG and Automatic Story Evaluation (ASE) tasks, and propose an original set of six criteria for story evaluation. Then, we introduce HANNA, our corpus of Human ANnotated NArratives, which contains 1,056 stories annotated w.r.t. our six criteria, and show that those criteria allow for a standardized human evaluation. We use Large Language Models (LLMs) to augment HANNA with 480 new stories and 150k+ rating annotations. We observe that LLMs obtain better grades than humans, as rated by selected LLMs. After that, we perform our meta-evaluation benchmark on HANNA. We mainly observe that specific measures for ASE are needed, and that commonly-used measures (e.g. BLEU) are sub-optimal. We then show our analysis of LLM performance at ASE: we find that LLMs are currently the best proxy for human evaluation of ASG and that, in our specific setting, providing detailed guidelines does not improve correlations between LLM and human ratings. Those results prompt us to study whether the performance displayed by LLMs at ASE and ASG can be explained through different factors. We perform a three-part study on LLM-generated explanations, and an analysis of pretraining data on LLM performance. Notably, we find that LLMs struggle to explain their answers with substantiated claims. Finally, we outline three main research perspectives: designing specific ASE measures, further investigating LLM performance at ASG and ASE, and assessing and mitigating the impact of LLMs on society.

References:

Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation (COLING 2022)

Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation (TACL 2024)

Tuesday, October 29, 2024, 11:45, 4A125

Simon Coumes

Qiana: A First-Order Formalism to Quantify over Contexts and Formulas

Qiana is a logic framework for reasoning on formulas that are true only in specific contexts. In Qiana, it is possible to quantify over both formulas and contexts to express, e.g., that “everyone knows everything Alice says”. Qiana also permits paraconsistent logics within contexts, so that contexts can contain contradictions. Furthermore, Qiana is based on first-order logic, and is finitely axiomatizable, so that Qiana theories are compatible with pre-existing first-order logic theorem provers.

Tuesday, October 15, 2024, 11:45, 4A301

Yael Amsterdamer & Daniel Deutch

Query-Guided Data Cleaning (Yael Amsterdamer)

We take an active approach to the cleaning of uncertain databases, by proposing a set of tools to guide the cleaning process. We start with a database whose tuple correctness is uncertain, and with some means of resolving this uncertainty, e.g., crowdsourcing, experts, a trained ML model or external sources. Guided by a query that defines what part of the data is of importance, our goal is to select tuples whose cleaning would effectively resolve uncertainty in query results. In other words, we develop a query-guided process for the resolution of uncertain data. Our approach combines techniques from different fields, including the use of provenance information to capture the propagation of errors to query results and Boolean interactive evaluation to decide which input tuples to clean based on their role in output derivation or effect on uncertainty.

Yael Amsterdamer is a Professor at the Department of Computer Science, Bar-Ilan University, and the head of the Data Management Lab. She received her Ph.D. in Computer Science from Tel-Aviv University, and has been a visiting Scholar at the University of Pennsylvania, Philadelphia, PA and jointly at Télécom Paris and INRIA institute (Paris, France). Her research is in the field of interactive data management spanning topics such as crowd-powered data management, interactive summarization and data cleaning. Her research was awarded multiple competitive grants including the Israeli Science Foundation (ISF) personal grants, the Israeli Ministry of Science (MOST) grant, and the BIU Center for Research in Applied Cryptography and Cyber Security Personal Grant.

Explanations in Data Science (Daniel Deutch)

Data Science involves complex processing over large-scale data for decision support, and much of this processing is done by black boxes such as Data Cleaning Modules, Database Management Systems, and Machine Learning modules. Decision support should be transparent but the combination of complex computation and large-scale data yields many challenges in this respect. Interpretability has been extensively studied in both the data management and in the machine learning communities, but the problem is far from being solved. I will present an holistic approach to the problem that is based on two facets, namely counterfactual explanations and attribution-based explanations. I will demonstrate the conceptual and computational challenges, as well as some main results we have achieved in this context.

Daniel Deutch is a Full Professor in the Computer Science Department of Tel Aviv University. Daniel has received his Ph.D. degree in Computer Science from Tel Aviv University. He was a postdoctoral fellow at the University of Pennsylvania and INRIA France. His research focuses on advanced database applications and web data management, studying both theoretical and practical aspects of issues such as data provenance, analysis of web applications and data, and dealing with data uncertainty. Daniel’s research has been disseminated by papers in the top conferences and journals on data and web data management (VLDB, SIGMOD/PODS, VLDBJ, TODS, etc.) He has won a number of research awards including the VLDB best paper award, the Krill Prize (awarded by the Wolf Foundation) and the Yahoo! Early Career Award. His research was awarded multiple competitive grants including the European Research Council (ERC) Personal Research Grant and grants by the Israeli Science Foundation (ISF, twice), the US-Israel Binational Science Foundation (BSF), the Broadcom Foundation, the Israeli Ministry of Science (MOST), the Blavatnik Interdisciplinary Cyber Research Institute (ICRC), Intuit and Intel.

Tuesday, October 8, 2024, 11:45, 4A125

Rajaa El Hamdani & Yiwen Peng

Refining Wikidata Taxonomy using Large Language Models (Yiwen Peng)

Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues like the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and the high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version of Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques. Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM. The quality of the refined taxonomy is evaluated from both intrinsic and extrinsic perspectives, on a task of entity typing for the latter, showing the practical interest of WiKC.

The Factuality of Large Language Models in the Legal Domain (Rajaa El Hamdani)

This paper investigates the factuality of large language models (LLMs) as knowledge bases in the legal domain, in a realistic usage scenario: we allow for acceptable variations in the answer, and let the model abstain from answering when uncertain. First, we design a dataset of diverse factual questions about case law and legislation. We then use the dataset to evaluate several LLMs under different evaluation methods, including exact, alias, and fuzzy matching. Our results show that the performance improves significantly under the alias and fuzzy matching methods. Further, we explore the impact of abstaining and in-context examples, finding that both strategies enhance precision. Finally, we demonstrate that additional pretraining on legal documents, as seen with SaulLM, further improves factual precision from 63% to 81%.

Tuesday, September 24, 2024, 11:45, 4A125

Ambroise Odonnat

Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias

Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples. For neural networks, softmax prediction probabilities are often used as a confidence measure, although they are known to be overconfident, even for wrong predictions. This phenomenon is particularly intensified in the presence of sample selection bias, i.e., when data labeling is subject to some constraints. To address this issue, we propose a novel confidence measure, called T-similarity, built upon the prediction diversity of an ensemble of linear classifiers. We provide the theoretical analysis of our approach by studying stationary points and describing the relationship between the diversity of the individual members and their performance. We empirically demonstrate the benefit of our confidence measure for three different pseudo-labeling policies on classification datasets of various data modalities.