Tuesday, June 11, 2024, 12:30, 4A301

Agnieszka Ławrynowicz

Swift Linked Data Miner: Mining OWL 2 EL class expressions directly from online RDF datasets

The talk presents Swift Linked Data Miner, an interruptible algorithm that can directly mine an online Linked Data source (e.g., a SPARQL endpoint) for OWL 2 EL class expressions in order to extend an ontology with new axioms. The algorithm downloads only a small part of the Linked Data source at a time, builds a smart index in memory, and swiftly iterates over the index to mine axioms. We also propose a transformation function from mined axioms to RDF Data Shapes. By means of a crowdsourcing experiment, we show that most of the axioms mined by Swift Linked Data Miner are correct and can be added to an ontology. Finally, we provide a ready-to-use Protégé plugin implementing the algorithm to support ontology engineers in their daily modeling work.
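To make the idea of mining axioms from a small in-memory index concrete, here is a minimal Python sketch. It is not the Swift Linked Data Miner algorithm: it assumes one page of triples has already been fetched, indexes it, and proposes OWL 2 EL axioms of the single shape A SubClassOf (p some B) using support and confidence thresholds, which are our own illustrative choice. The `ex:` names and the functions `build_index` and `mine_existentials` are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_index(triples):
    """Index one downloaded page of (subject, predicate, object) triples in memory."""
    instances = defaultdict(set)  # class -> subjects typed with that class
    types = defaultdict(set)      # resource -> classes it is typed with
    edges = defaultdict(set)      # subject -> {(predicate, object)} for non-type triples
    for s, p, o in triples:
        if p == RDF_TYPE:
            instances[o].add(s)
            types[s].add(o)
        else:
            edges[s].add((p, o))
    return instances, types, edges


def mine_existentials(triples, min_support=2, min_confidence=0.8):
    """Propose axioms  A SubClassOf (p some B)  above support/confidence thresholds.

    Hypothetical sketch, not the Swift Linked Data Miner algorithm.
    """
    instances, types, edges = build_index(triples)
    axioms = []
    for a, members in instances.items():
        counts = defaultdict(int)  # (p, B) -> number of A-instances with a p-successor of type B
        for x in members:
            # Each instance counts at most once per (p, B) pair.
            pairs = {(p, b) for p, o in edges.get(x, ()) for b in types.get(o, ())}
            for pair in pairs:
                counts[pair] += 1
        for (p, b), support in counts.items():
            confidence = support / len(members)
            if support >= min_support and confidence >= min_confidence:
                axioms.append((a, p, b, support, confidence))
    return axioms
```

On a sample where every `ex:Person` has an `ex:livesIn` link to an `ex:City`, the sketch proposes `ex:Person SubClassOf (ex:livesIn some ex:City)`. An interruptible miner could, under this assumption, rerun such a routine after each downloaded page and keep the axioms found so far.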

Agnieszka Ławrynowicz is an Associate Professor at the Faculty of Computer Science and Telecommunications, Poznan University of Technology, and head of the Semantics and Knowledge Engineering Group. She is a member of the Scientific Council of the Polish Association for Artificial Intelligence, of ECCAI, and of the program and organizing committees of leading international conferences in artificial intelligence and knowledge engineering (e.g., ISWC, K-CAP, EKAW, WWW, ECAI). She chairs the Knowledge Engineering track at the conference of the Polish Association for Artificial Intelligence and is a member of the editorial committees of the journals “Transactions on Graph Data and Knowledge” and “Semantic Web”. She has led or participated in several research projects funded by the European Commission, Norwegian funds, the National Science Center, and the National Center for Research and Development, and has also taken part as a member of TAILOR, the European network of research laboratories on trustworthy artificial intelligence based on the integration of reasoning, learning, and optimization. She held a Marie Curie scholarship from the European Commission for a project on web mining at the University of Ulster, won a grant in a program financed by the Foundation for Polish Science for a project in collaboration with Stanford University, and received an award for an outstanding monograph in computer science from the Committee on Informatics of the Polish Academy of Sciences as well as a “Scientist of the Future” award. She supervised the most innovative engineering thesis in Poland (a competition under the auspices of the IEEE) and other award-winning students working in artificial intelligence. She is an ethics expert for the European Commission.

Tuesday, May 28, 2024, 11:45, 4A125

Concept.AI

DIG team

From Wikipedia: “Concept is a deduction party board game released in 2013. The game was designed by Alain Rivollet and Gaëtan Beaujannot and published by Repos Production. It has collected multiple awards and nominations including the Jeu de l’Année prize in Cannes in 2014.”

What Wikipedia does not say is that a team of AI experts has been working on an AI system to solve Concept. This session of the DIG seminar will see the unveiling of their work.

Tuesday, May 21, 2024, 11:45, 4A125

Surprise talks

DIG PhD students and an emeritus professor

A series of talks on scientific topics, each containing a single mistake. The goal for the audience is to spot the mistake. Speakers get one point for each audience member who did not spot the mistake, but no points at all if nobody found it.
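The scoring rule can be written as a small function; `speaker_score` and its parameter names are ours, for illustration only.

```python
def speaker_score(audience_size: int, spotted: int) -> int:
    """Points earned by a speaker in the surprise-talk game (hypothetical helper)."""
    if not 0 <= spotted <= audience_size:
        raise ValueError("spotted must be between 0 and audience_size")
    if spotted == 0:
        return 0  # nobody found the mistake: no points at all
    return audience_size - spotted  # one point per member who missed the mistake


# Example: 20 listeners, 3 of whom spotted the mistake.
print(speaker_score(20, 3))  # 17
```

Under this rule, a speaker's best outcome is that exactly one person spots the mistake, which earns `audience_size - 1` points.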

Tuesday, March 26, 2024, 11:45, 4A125

Mehwish Alam

Deep Learning for Analyzing On-line Media Discourse

This talk will mainly discuss the results of two related projects that I secured as a senior researcher at the Karlsruhe Institute of Technology, Germany. The first, ITflows – IT Tools for Managing Migration Flows, funded by the European Union under the H2020 program, focused on predicting migration flows to enhance humanitarian support. The second, ReNewRS – Responsible News Recommender Systems (funded by the Baden-Württemberg Stiftung), addressed the question “Do online news recommender systems promote social polarization or even radicalization?” by investigating the influence of algorithmic news selection on shaping public opinion.

Tuesday, February 13, 2024, 11:45, 4A125

Fabian Suchanek

Societal questions around large language models

I am trying to collect all societal issues that can come up in the context of large language models — from copyright to security and environmental problems. The talk will present what I found so far, and I will be happy to have your feedback. The talk is based on a lecture that I gave on the topic.

Tuesday, January 30, 2024, 11:45, 4A125

Nils Holzenberger

The AI, Law and Philosophy workshop at JURIX 2023

On December 18, 2023, I attended the AI, Law and Philosophy workshop at the JURIX conference in Maastricht. This seminar will cover the presentations I attended and the people I met, including a summary of the topics and main discussion points of the workshop and a presentation of my own paper. I also discussed a variety of research topics informally with workshop participants and will report on some of these conversations. I will conclude with the main highlights of the workshop.

Tuesday, January 23, 2024, 11:45, 4A125

Mariam Barry

Adaptive Scalable Online Learning for Handling Heterogeneous Streaming Data in Large-Scale Banking Infrastructure

In this thesis, we address the algorithmic and infrastructure challenges of providing online machine learning capabilities over high-volume data streams from heterogeneous sources. The research encompasses big data summarization, the construction of dynamically updated industrial knowledge graphs, online change detection, and the operationalization of streaming models in production. First, we introduced StreamFlow, an incremental algorithm and system for big data summarization that generates feature vectors suitable for both batch and online machine learning tasks. These enriched features significantly improve both the training time and the accuracy of batch and online machine learning models. Second, we proposed Stream2Graph, a stream-based solution for the dynamic and incremental construction and updating of enterprise knowledge graphs. Experimental results indicated that combining graph features with online learning notably improves machine learning performance. Third, we presented StreamChange, an explainable online change detection model for big data streaming with constant space and time complexity. Real-world experiments demonstrated superior performance compared to state-of-the-art models, particularly in detecting both gradual and abrupt changes. Finally, we demonstrated the operationalization of online machine learning in production, enabling horizontal scaling and incremental learning from streaming data in real time. Experiments on feature-evolving datasets with millions of dimensions validated the effectiveness of our MLOps pipelines. Our design ensures model versioning, monitoring, auditability, and reproducibility, and confirms that online learning models are more efficient than batch methods in terms of both time and space complexity.
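The abstract does not describe how StreamChange works, so the sketch below uses a different, classic technique, the Page-Hinkley test, only to illustrate what constant time and space per observation means for online change detection. The class name and default parameters are our assumptions; this is not StreamChange.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream.

    Stores a fixed number of values, so each update is O(1) in time and space.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 50.0):
        self.delta = delta          # magnitude of change tolerated without alarm
        self.threshold = threshold  # alarm threshold (often written lambda)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation m_t
        self.min_cum = 0.0          # running minimum M_t of m_t

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental running mean
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold
```

For a stream of 100 zeros followed by fives, the default parameters raise the first alarm at index 110, the eleventh shifted value; a larger `threshold` trades detection delay for fewer false alarms.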

Tuesday, December 19, 2023, 11:45, 4A125

Rajaa El Hamdani

Towards Zero-Shot Knowledge Base Construction with Pretrained Large Language Models

Joint work with Mehwish Alam, Thomas Bonald, and Fragkiskos Malliaros

Knowledge bases are critical tools for structuring and understanding information, yet creating them from scratch is expensive and time-consuming.

This paper presents a methodology for Knowledge Base Construction (KBC) using Pretrained Large Language Models (PLLMs), focusing in particular on extracting structured data from natural language texts. Our objective is to evaluate the effectiveness of PLLMs, specifically GPT-4, in a zero-shot learning setting for KBC within the legal domain, using Wikipedia articles as our primary data source. A distinguishing feature of the approach is that it is domain- and text-agnostic, so it can be scaled to other fields simply by extending the taxonomy.

Our initial findings show that while GPT-4 exhibits high F1 scores for some properties, it struggles with those requiring deep domain understanding. Interestingly, GPT-4 also surfaced verifiable facts not present in our ground truth, indicating its potential for uncovering novel information.
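Per-property F1, as reported in the findings above, can be computed by comparing predicted and ground-truth triples as sets. This sketch assumes exact string matching of values, which is our simplification, and the function name and example triples are hypothetical. It also shows why correct facts that are missing from the ground truth are penalized by such a metric: they count as false positives.

```python
from collections import defaultdict


def per_property_f1(predicted, gold):
    """Precision, recall and F1 per property for (subject, property, value) triples.

    Values are compared by exact string match (a simplifying assumption).
    """
    pred_by_prop, gold_by_prop = defaultdict(set), defaultdict(set)
    for s, p, v in predicted:
        pred_by_prop[p].add((s, v))
    for s, p, v in gold:
        gold_by_prop[p].add((s, v))
    scores = {}
    for p in set(pred_by_prop) | set(gold_by_prop):
        pred, true = pred_by_prop.get(p, set()), gold_by_prop.get(p, set())
        tp = len(pred & true)  # predicted facts that also appear in the ground truth
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(true) if true else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        scores[p] = (precision, recall, f1)
    return scores
```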

Tuesday, December 12, 2023, 11:45, 4A125

Charbel-Raphael Segerie

https://crsegerie.github.io

An introduction to AI Safety

Artificial intelligence is advancing rapidly. While these technologies are awe-inspiring, models such as ChatGPT or Bing Chat, although specifically developed to be polite and benevolent towards users, can easily be manipulated.

In this presentation, we will address three major technical flaws of these models. First, they remain large black boxes, and we cannot guarantee that their actions will conform to our expectations. Second, they lack robustness: the models are trained on a particular dataset and must therefore generalize to new situations during deployment. The fact that Bing Chat threatened users even though it was trained to help them illustrates this failure of generalization. Third, it is difficult to specify the desired objective precisely to a model, given the complexity and diversity of human values.

Then, we will address different solution paradigms: specification techniques based on reinforcement learning (RLHF and its variants), interpretability (how information is represented in neural networks, robustly editing a language model’s knowledge by modifying its memory, …), and scalable oversight (training and alignment techniques that are likely to keep working even with human-level AIs).
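As one concrete piece of the RLHF specification techniques mentioned above, a reward model is commonly trained on pairs of answers ranked by humans with a Bradley-Terry style loss. The sketch shows only that loss on scalar rewards, not a full RLHF pipeline; the function name is hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise Bradley-Terry loss: -log(sigmoid(r_chosen - r_rejected)).

    Low when the reward model scores the human-preferred answer higher.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)) for both signs of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When both answers receive the same reward the loss is log 2; it shrinks as the preferred answer's reward pulls ahead and grows roughly linearly when the ranking is inverted.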

Tuesday, November 21, 2023, 11:45, 4A301

Simon Delarue and Thomas Bonald

Sparse Graph Neural Networks with Scikit-network (Simon Delarue)

Joint work with Thomas Bonald

In recent years, Graph Neural Networks (GNNs) have developed rapidly and have become an essential tool for building representations of complex relational data. Large real-world graphs, characterised by sparse relations and features, require dedicated tools that existing dense, tensor-centred approaches cannot easily provide. To address this need, we introduce a GNN module in Scikit-network, a Python package for graph analysis, that leverages sparse matrices for both graph structures and features. Our contribution improves GNN efficiency without requiring significant computational resources, unifies graph analysis algorithms and GNNs in the same framework, and prioritises user-friendliness.
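The abstract does not show the Scikit-network API, so the sketch below uses plain SciPy sparse matrices to illustrate the core idea: a graph-convolution layer in which the adjacency matrix and the node features stay sparse. Choosing the symmetric-normalized GCN propagation rule is our assumption about the layer type, and the function names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def normalized_adjacency(adjacency):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, kept sparse throughout."""
    n = adjacency.shape[0]
    a_hat = adjacency + sp.identity(n, format="csr")  # add self-loops
    degrees = np.asarray(a_hat.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(degrees))
    return (d_inv_sqrt @ a_hat @ d_inv_sqrt).tocsr()


def gcn_layer(norm_adj, features, weights):
    """One GCN layer: ReLU(A_norm X W), where X may be a sparse matrix."""
    hidden = features @ weights  # sparse @ dense gives a dense array
    hidden = norm_adj @ hidden   # sparse propagation over the graph
    return np.maximum(np.asarray(hidden), 0.0)
```

Multiplying the sparse features by the dense weight matrix first keeps the only dense intermediate at size (nodes × hidden units), which is what makes this ordering cheap on large sparse graphs.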

A Consistent Diffusion-Based Algorithm for Semi-Supervised Graph Learning (Thomas Bonald)

Joint work with Nathan De Lara

The task of semi-supervised classification is to assign labels to all nodes of a graph based on the labels known for a few nodes, called the seeds. One of the most popular algorithms relies on the principle of heat diffusion: the labels of the seeds are spread by thermo-conductance, and the temperature of each node at equilibrium is used as a score function for each label. In this paper, we prove that this algorithm is not consistent unless the temperatures of the nodes at equilibrium are centered before scoring. This crucial step not only makes the algorithm provably consistent on a block model but also brings significant performance gains on real graphs.
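A minimal NumPy sketch of heat-diffusion classification with the centering step follows. Assumptions: a small dense symmetric adjacency matrix, every connected component containing at least one seed (so the linear system is solvable), and centering by each label's mean temperature over the non-seed nodes; the paper's exact centering and solver may differ, and the function names are hypothetical.

```python
import numpy as np


def diffusion_scores(adjacency, seeds, n_labels):
    """Equilibrium temperatures of a heat diffusion, one column per label.

    seeds: dict node -> label. In column k, seeds of label k are held at 1 and
    other seeds at 0; every free node ends at the mean temperature of its neighbours.
    """
    n = adjacency.shape[0]
    seed_nodes = np.array(sorted(seeds))
    free = np.setdiff1d(np.arange(n), seed_nodes)
    boundary = np.zeros((len(seed_nodes), n_labels))
    boundary[np.arange(len(seed_nodes)), [seeds[int(s)] for s in seed_nodes]] = 1.0
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # Dirichlet problem on the free nodes: L_FF T_F = A_FS T_S
    rhs = adjacency[np.ix_(free, seed_nodes)] @ boundary
    temps = np.zeros((n, n_labels))
    temps[seed_nodes] = boundary
    temps[free] = np.linalg.solve(laplacian[np.ix_(free, free)], rhs)
    return temps, free


def classify(adjacency, seeds, n_labels, center=True):
    """Label each node by its highest (optionally centered) temperature."""
    temps, free = diffusion_scores(adjacency, seeds, n_labels)
    if center:
        # Centering: subtract each label's mean temperature over the free nodes.
        temps = temps - temps[free].mean(axis=0)
    labels = temps.argmax(axis=1)
    labels[list(seeds)] = list(seeds.values())  # seeds keep their known labels
    return labels
```

On two triangles joined by one edge, with one seed per triangle, the free nodes reach temperatures 6/7, 5/7, 2/7 and 1/7 for the first label, and the centered scores recover the two triangles. Without centering, a label with more seeds tends to raise temperatures everywhere, which is the imbalance that centering removes.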