The DIG seminar takes place on a regular basis with both invited speakers and speakers from within the DIG team. Seminars from before September 2023 can be found here.

  • Tuesday, March 26, 2024, 11:45, 4A125

    Mehwish Alam

    Deep Learning for Analyzing On-line Media Discourse

    This talk will mainly discuss the results of two related projects that I secured as a senior researcher at Karlsruhe Institute of Technology, Germany. The first, ITflows (IT Tools for Managing Migration Flows), funded by the European Union under the H2020 program, focused on providing predictions of migration flows to enhance humanitarian support. The second, ReNewRS (Responsible News Recommender Systems), funded by the Baden-Württemberg Stiftung, addresses the question “Do online news recommender systems promote social polarization or even radicalization?” This project investigated the influence of algorithmic news selection on shaping public opinion.

  • Tuesday, February 13, 2024, 11:45, 4A125

    Fabian Suchanek

    Societal questions around large language models

    I am trying to collect all societal issues that can come up in the context of large language models, from copyright to security and environmental problems. The talk will present what I have found so far, and I will be happy to have your feedback. The talk is based on a lecture that I gave on the topic.

  • Tuesday, January 30, 2024, 11:45, 4A125

    Nils Holzenberger

    The AI, Law and Philosophy workshop at JURIX 2023

    On December 18, 2023, I attended the AI, Law and Philosophy workshop at the JURIX conference in Maastricht. This seminar is about the presentations I attended and the people I met. It will include a summary of the topics and main discussion points at the workshop, as well as a presentation of my own paper. I informally discussed a variety of research topics with workshop participants, and will report on some of them. I will conclude with the main highlights from the workshop.

  • Tuesday, January 23, 2024, 11:45, 4A125

    Mariam Barry

    Adaptive Scalable Online Learning for Handling Heterogeneous Streaming Data in Large-Scale Banking Infrastructure

    In this thesis, we address several algorithmic and infrastructure challenges that arise when applying online machine learning to high-volume data streams from heterogeneous sources. The research encompasses big data summarization, the construction of dynamically updated industrial knowledge graphs, online change detection, and the operationalization of streaming models in production. First, we introduced StreamFlow, an incremental algorithm and system for big data summarization that generates feature vectors suitable for both batch and online machine learning tasks. These enriched features significantly improve both the training time and the accuracy of batch and online machine-learning models. Second, we proposed Stream2Graph, a stream-based solution for the dynamic and incremental construction and updating of enterprise knowledge graphs. Experimental results indicated that leveraging graph features in conjunction with online learning notably enhances machine learning outcomes. Third, we presented StreamChange, an explainable online change detection model designed for big data streaming, featuring constant space and time complexity. Real-world experiments demonstrated superior performance compared to state-of-the-art models, particularly in detecting both gradual and abrupt changes. Finally, we demonstrated the operationalization of online machine learning in production, enabling horizontal scaling and incremental learning from streaming data in real time. Experiments on feature-evolving datasets with millions of dimensions validated the effectiveness of our MLOps pipelines. Our design ensures model versioning, monitoring, auditability, and reproducibility, affirming the efficiency of online learning models over batch methods in terms of both time and space complexity.

  • Tuesday, December 19, 2023, 11:45, 4A125

    Rajaa El Hamdani

    Towards Zero-Shot Knowledge Base Construction with Pretrained Large Language Models

    Joint work with Mehwish Alam, Thomas Bonald, Fragkiskos Malliaros

    Knowledge bases are critical tools for structuring and understanding information, yet creating them from scratch is expensive and time-consuming.

    This paper presents a methodology for Knowledge Base Construction (KBC) using Pretrained Large Language Models (PLLMs), focusing on extracting structured data from natural language texts. Our objective is to evaluate the efficiency of PLLMs, specifically GPT-4, in a zero-shot learning setting for KBC within the legal domain, using Wikipedia articles as our primary data source. The approach is domain- and text-agnostic, so it can be applied to other fields simply by extending the taxonomy.

    Our initial findings show that while GPT-4 exhibits high F1 scores for some properties, it struggles with those requiring deep domain understanding. Interestingly, GPT-4 also surfaced verifiable facts not present in our ground truth, indicating its potential for uncovering novel information.