Tuesday, September 24, 2024, 11:45, 4A125

Ambroise Odonnat

Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias

Self-training is a well-known approach to semi-supervised learning. It iteratively assigns pseudo-labels to unlabeled data on which the model is confident and treats them as labeled examples. For neural networks, softmax prediction probabilities are often used as a confidence measure, although they are known to be overconfident, even for wrong predictions. This phenomenon is particularly pronounced in the presence of sample selection bias, i.e., when data labeling is subject to some constraints. To address this issue, we propose a novel confidence measure, called T-similarity, built upon the prediction diversity of an ensemble of linear classifiers. We provide a theoretical analysis of our approach by studying stationary points and describing the relationship between the diversity of the individual members and their performance. We empirically demonstrate the benefit of our confidence measure for three different pseudo-labeling policies on classification datasets of various data modalities.
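The abstract describes the core loop: train, score confidence on the unlabeled data, pseudo-label the confident points, and repeat. Below is a minimal sketch of that loop with an ensemble-diversity confidence score in the spirit of T-similarity (here, the mean pairwise inner product of the members' predicted probabilities); the ensemble size, bootstrap resampling, and the 0.9 threshold are illustrative assumptions, not the authors' implementation.

```python
# Minimal self-training sketch with an ensemble-diversity confidence score;
# NOT the authors' implementation of T-similarity.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def ensemble_agreement(prob_list):
    """Mean pairwise inner product of the members' predicted probabilities:
    close to 1 when the linear heads agree, lower when they are diverse."""
    m = len(prob_list)
    sims = [np.sum(prob_list[i] * prob_list[j], axis=1)
            for i in range(m) for j in range(i + 1, m)]
    return np.mean(sims, axis=0)

# Toy data: a few labeled points, many unlabeled ones.
X, y = make_classification(n_samples=500, random_state=0)
X_lab, y_lab, X_unl = X[:50], y[:50], X[50:]
rng = np.random.default_rng(0)

for _ in range(3):  # a few self-training rounds
    # Ensemble of linear classifiers trained on bootstrap resamples.
    heads = []
    for _ in range(5):
        idx = rng.integers(0, len(X_lab), len(X_lab))
        heads.append(LogisticRegression(max_iter=1000).fit(X_lab[idx], y_lab[idx]))
    probs = [h.predict_proba(X_unl) for h in heads]
    conf = ensemble_agreement(probs)
    keep = conf > 0.9  # illustrative threshold
    if not keep.any():
        break
    # Pseudo-label the confident points and move them to the labeled set.
    pseudo = np.mean(probs, axis=0).argmax(axis=1)
    X_lab = np.vstack([X_lab, X_unl[keep]])
    y_lab = np.concatenate([y_lab, pseudo[keep]])
    X_unl = X_unl[~keep]
```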

Tuesday, July 9, 2024, 11:45, 4A125

Peter Fratrič

Mining behavior from a legal simulation environment: where we are and what lies ahead

This talk presents a methodological framework for the use of simulation-based methods to investigate questions of non-compliance in a legal context. Its aim is to generate observed or previously unobserved instances of non-compliance and use them to improve compliance and trust in a given socio-economic infrastructure. The framework consists of three components: a law formalization process resulting in a normative system implemented as an agent-based model, a profit-driven agent generating instances of non-compliance, and a norm extraction process transforming the generated behavior into a formal model. Early results from a practical implementation of this methodology are illustrated on a multinational tax-avoidance case. Towards the end, we focus on open issues related to behavior clustering and data/process mining.
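As a toy illustration of the second component, the sketch below shows a profit-driven agent picking the action with the highest expected payoff in a normative system where violations risk a sanction; the actions, payoffs, detection probability, and fine are all invented for the example and are not part of the talk's framework.

```python
# Toy profit-driven agent in a normative system; actions, payoffs,
# detection probability, and fine are invented for illustration.
import random

ACTIONS = {
    # action: (profit, violates_norm)
    "declare_fully": (100.0, False),
    "shift_profits": (160.0, True),
}
DETECTION_PROB, FINE = 0.3, 120.0

def expected_payoff(action):
    profit, violates = ACTIONS[action]
    return profit - (DETECTION_PROB * FINE if violates else 0.0)

def agent_step():
    # The agent maximizes expected profit; any violation it chooses is an
    # "instance of non-compliance" that norm extraction could learn from.
    action = max(ACTIONS, key=expected_payoff)
    detected = ACTIONS[action][1] and random.random() < DETECTION_PROB
    return action, detected

print(agent_step())  # e.g. ('shift_profits', False)
```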

Tuesday, July 2, 2024, 12:15, 4A301

Chadi Helwe

PhD defense practice talk

This thesis focuses on evaluating and improving the reasoning abilities of Smaller Language Models (SLMs) and Large Language Models (LLMs). It explores SLMs’ performance on complex tasks and their limitations on simpler ones. The thesis introduces LogiTorch, a Python library that facilitates training models on various reasoning tasks with minimal coding. It also presents TINA, a negated data augmentation technique that improves SLMs’ robustness to negation in textual entailment tasks. Further, it explores LLMs’ capabilities through MAFALDA, a new benchmark for identifying and classifying reasoning fallacies, with a new annotation scheme and an evaluation metric that account for subjectivity in reasoning. The findings indicate that humans outperform both SLMs and LLMs on this reasoning task. We propose several directions for further investigation, such as neuro-symbolic AI and improving the reasoning abilities of low-resource LLMs.

Tuesday, June 18, 2024, 11:45, 4A125

Shady Elbassuoni

Data-Centric Fake News Detection During Armed Conflicts

Armed conflicts continue to be a major global issue, causing widespread human suffering, displacement, and economic instability. Fake news can further fuel armed conflicts by manipulating public perception, inciting violence, and undermining efforts towards resolution. In this talk, I will argue why a one-size-fits-all approach for fake news detection is not adequate during armed conflicts. I will then present a data-centric approach for fake news detection, focusing on the Syrian civil war as a case study. The approach utilizes a knowledge graph of conflict casualties to construct a fake news dataset, and then employs meta-learning to automatically detect fake news. I will present experimental results that demonstrate the effectiveness of this approach compared to various baselines, and will conclude with a few potential avenues for future research.
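As a hedged sketch of the dataset-construction idea described in the abstract, the snippet below labels a claim by checking it against a toy knowledge graph of conflict casualties; the graph entries, matching rule, and tolerance are invented for illustration, and the meta-learning stage is not shown.

```python
# Toy knowledge graph of documented casualties; the entries below are
# invented placeholders, not real figures.
KG = {
    ("Aleppo", "2016-12-01"): 42,
    ("Homs", "2014-05-10"): 7,
}

def label_claim(location, date, claimed_count, tolerance=0.2):
    """Label a claim 'real', 'fake', or 'unverifiable' by comparing the
    claimed casualty count with the knowledge graph, up to a relative
    tolerance (an illustrative matching rule)."""
    recorded = KG.get((location, date))
    if recorded is None:
        return "unverifiable"
    if abs(claimed_count - recorded) <= tolerance * max(recorded, 1):
        return "real"
    return "fake"

print(label_claim("Aleppo", "2016-12-01", 40))   # -> real
print(label_claim("Aleppo", "2016-12-01", 400))  # -> fake
```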

Tuesday, June 11, 2024, 12:30, 4A301

Agnieszka Ławrynowicz

Swift Linked Data Miner: Mining OWL 2 EL class expressions directly from online RDF datasets

The talk presents Swift Linked Data Miner, an interruptible algorithm that can directly mine an online Linked Data source (e.g., a SPARQL endpoint) for OWL 2 EL class expressions to extend an ontology with new axioms. The algorithm works by downloading only a small part of the Linked Data source at a time, building a smart index in memory, and swiftly iterating over the index to mine axioms. We propose a transformation function from mined axioms to RDF Data Shapes. We show, by means of a crowdsourcing experiment, that most of the axioms mined by Swift Linked Data Miner are correct and can be added to an ontology. We provide a ready-to-use Protégé plugin implementing the algorithm, to support ontology engineers in their daily modeling work.
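A minimal sketch of the download-index-iterate idea, not the Swift Linked Data Miner itself: it pages through a SPARQL endpoint, indexes the classes of each subject in memory, and emits atomic SubClassOf candidates, whereas the real algorithm mines richer OWL 2 EL class expressions. The endpoint, page size, and support threshold are illustrative assumptions.

```python
# Paged harvesting of rdf:type triples plus a co-occurrence index; emits
# atomic SubClassOf candidates only. NOT the actual Swift Linked Data Miner.
from collections import defaultdict
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")  # example endpoint
types_of = defaultdict(set)  # subject -> set of classes (the in-memory index)

PAGE, MAX_ROWS = 10_000, 50_000
offset = 0
while offset < MAX_ROWS:  # stop early: the real miner is interruptible
    # NB: stable paging would also need an ORDER BY clause.
    sparql.setQuery(f"SELECT ?s ?c WHERE {{ ?s a ?c }} LIMIT {PAGE} OFFSET {offset}")
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    if not rows:
        break
    for r in rows:
        types_of[r["s"]["value"]].add(r["c"]["value"])
    offset += PAGE

# Propose c SubClassOf d when nearly all instances of c also have type d.
support, joint = defaultdict(int), defaultdict(int)
for classes in types_of.values():
    for c in classes:
        support[c] += 1
        for d in classes:
            if d != c:
                joint[(c, d)] += 1
for (c, d), n in joint.items():
    if n / support[c] >= 0.95:  # illustrative support threshold
        print(f"Candidate axiom: <{c}> rdfs:subClassOf <{d}>")
```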

Agnieszka Ławrynowicz is an Associate Professor at the Faculty of Computer Science and Telecommunications, Poznan University of Technology, and head of the Semantics and Knowledge Engineering Group. She is a member of the Scientific Council of the Polish Association for Artificial Intelligence and of ECCAI, serves on the program and organizing committees of leading international conferences in artificial intelligence and knowledge engineering (e.g., ISWC, K-CAP, EKAW, WWW, ECAI), chairs the Knowledge Engineering track at the conference of the Polish Association for Artificial Intelligence, and sits on the editorial committees of the journals Transactions on Graph Data and Knowledge and Semantic Web. She has led or participated in several research projects funded by the European Commission, Norwegian funds, the National Science Center, and the National Center for Research and Development, and is a member of the TAILOR European network of research laboratories on trustworthy artificial intelligence based on the integration of reasoning, learning, and optimization. She was a scholarship holder in the Marie Curie program of the European Commission for a project on web mining at the University of Ulster, won a grant in a program financed by the Foundation for Polish Science for a project in collaboration with Stanford University, received an award for an outstanding monograph in computer science from the Committee on Informatics of the Polish Academy of Sciences as well as a “Scientist of the Future” award, and supervised the most innovative engineering thesis in Poland (a competition under the auspices of the IEEE) along with other award-winning work in artificial intelligence. She is an expert on ethics at the European Commission.

Tuesday, May 28, 2024, 11:45, 4A125

DIG team

Concept.AI

From Wikipedia: “Concept is a deduction party board game released in 2013. The game was designed by Alain Rivollet and Gaëtan Beaujannot and published by Repos Production. It has collected multiple awards and nominations including the Jeu de l’Année prize in Cannes in 2014.”

What Wikipedia does not say is that a team of AI experts has been working on an AI system to solve Concept. This session of the DIG seminar will see the unveiling of their work.

Tuesday, May 21, 2024, 11:45, 4A125

DIG PhD students and emeritus professor

Surprise talks

A series of talks about scientific topics, each containing a single mistake. The goal for the audience is to spot the mistake. Speakers get one point for each audience member who does not spot the mistake, but no points at all if nobody finds it.

Tuesday, March 26, 2024, 11:45, 4A125

Mehwish Alam

Deep Learning for Analyzing On-line Media Discourse

This talk will mainly discuss the results of two related projects that I secured as a senior researcher at the Karlsruhe Institute of Technology, Germany. The first, ITflows (IT Tools for Managing Migration Flows), funded by the European Union under the H2020 program, focused on providing predictions of migration flows to enhance humanitarian support. The second, ReNewRS (Responsible News Recommender Systems), funded by the Baden-Württemberg Stiftung, addressed the question “Do online news recommender systems promote social polarization or even radicalization?” and investigated the influence of algorithmic news selection on shaping public opinion.