We were happy to host Robert Wrembel from Poznan University of Technology (PUT) for an inspiring talk titled "Data Integration: Remaining Challenges and Research Paths".

Abstract: Data integration (DI) has been a cornerstone of computer science research for decades, resulting in a few established reference architectures. They generally fall into three categories: virtual (federated and mediated), physical (data warehouse), and hybrid (data lake, data lakehouse, and data mesh). Regardless of the paradigm, these architectures depend on an integration layer, implemented by means of sophisticated software designed to orchestrate and execute DI processes. The integration layer is responsible for ingesting data from various sources (typically heterogeneous and distributed) and for homogenizing the data into formats suitable for further processing and analysis. On the one hand, large volumes of highly heterogeneous data are produced in all business domains (e.g., medical systems, smart cities, and smart agriculture), which requires further advances in data integration technologies. On the other hand, the widespread adoption of artificial intelligence (AI) solutions is now extending towards DI, offering alternative solutions, opening new research paths, and generating new open problems. Emerging paradigms, such as Data Spaces and the Model Context Protocol, further advance DI. This talk will (1) give an overview of the research field of DI, (2) highlight remaining challenges, and (3) outline ML/AI solutions for DI. The findings presented in the talk are based on my experience in running research and development DI projects for various business entities.

Short Biography: Robert Wrembel (PhD, Dr. Habil.) is a professor in the Faculty of Computing and Telecommunications at Poznan University of Technology (PUT), Poland. He received his habilitation in 2008, specializing in database systems and data warehouses. His primary research interests include data integration, data quality, databases, data warehouses, and data lakes. He has held several administrative roles at PUT, including two terms as deputy dean: of the Faculty of Computing and Management (2008–2012) and of the Faculty of Computing (2012–2016). Since January 2023, he has chaired the Data Processing Technologies research group at PUT.
