Knowledge Bases
A knowledge base is a computer-processable collection of knowledge about the world. We construct and mine such knowledge bases.
- YAGO: YAGO is a large ontology constructed from WordNet, Wikipedia, and other sources. We develop YAGO together with the Database department of the Max Planck Institute for Informatics in Germany.
- AMIE: AMIE is a project to learn patterns and rules in ontologies. We conduct this project together with the Database department of the Max Planck Institute for Informatics in Germany.
- KB-LM: KB-LM is our new project to marry knowledge bases and large language models.
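To make "computer-processable knowledge" and "rules in ontologies" concrete, here is a minimal sketch in plain Python. It stores facts as subject-relation-object triples and scores one candidate rule by support and standard confidence. The facts, relation names, and helper functions are hypothetical, invented for this illustration; they are not taken from YAGO, and the scoring is a simplified stand-in rather than AMIE's actual implementation.

```python
# Hypothetical toy facts as (subject, relation, object) triples.
FACTS = {
    ("alice", "livesIn", "paris"),
    ("alice", "worksIn", "paris"),
    ("bob", "livesIn", "berlin"),
    ("bob", "worksIn", "berlin"),
    ("carol", "livesIn", "rome"),
    ("carol", "worksIn", "milan"),
    ("dave", "livesIn", "oslo"),
}


def pairs(facts, relation):
    """Return the set of (subject, object) pairs stated for one relation."""
    return {(s, o) for s, r, o in facts if r == relation}


def rule_stats(facts, body, head):
    """Score the candidate rule body(x, y) => head(x, y).

    support    = number of pairs for which both body and head hold
    confidence = support / number of pairs for which the body holds
    """
    body_pairs = pairs(facts, body)
    head_pairs = pairs(facts, head)
    support = len(body_pairs & head_pairs)
    confidence = support / len(body_pairs) if body_pairs else 0.0
    return support, confidence


support, confidence = rule_stats(FACTS, "livesIn", "worksIn")
print(f"livesIn(x, y) => worksIn(x, y): support={support}, confidence={confidence}")
```

Note that this confidence treats every missing fact as false (dave has no stated workplace and counts against the rule), which is a strong assumption for an incomplete knowledge base.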
Graph Mining
Graphs are a near-universal way to represent data. We are concerned with mining graphs for patterns and properties. Our particular focus is on the scalability of such approaches.
- scikit-network: scikit-network is a Python package for the analysis of large graphs (clustering, embedding, classification, ranking).
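As an illustration of one of the tasks listed above (ranking), the following is a minimal sketch of PageRank by power iteration in plain Python. It deliberately does not use the scikit-network API; the function name, parameters, and toy edge list are assumptions made for this example. A scalable implementation would typically rely on sparse matrices rather than dictionaries.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank the nodes of a directed graph by power iteration.

    edges: list of (source, target) pairs.
    Returns a dict mapping each node to its score; scores sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every node passes an equal share of its score along each out-link.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical toy graph: a cycle a -> b -> c -> a, plus d pointing to c.
EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
scores = pagerank(EDGES)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

In this toy graph, node c receives links from both b and d, so it ends up with the highest score, while d, which nothing links to, ends up with the lowest.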
Social Web
The Web has evolved more and more into a social Web: content is produced and shared by users. In the DIG team, we follow and anticipate developments in this area.
- Community detection: We are investigating means to detect and distinguish social communities on the Web.
- Social Relations: We investigate the optimal investment in social relations from a theoretical point of view.
Language and Relevance
Computer science is not just about computers. In this area of research, we investigate how humans reason, and what this implies for machines.
- Simplicity Theory: Simplicity theory seeks to explain the relevance of situations or events to human minds. See http://www.simplicitytheory.science
- Relevance in natural language: We aim to reverse-engineer, from our understanding of human performance, methods for producing meaningful and relevant speech.
- Communication as social signalling: We apply game theory and social simulation to explore the conditions under which providing valuable (i.e., relevant) information is a profitable strategy.
Machine Learning for Data Streams
We investigate how to perform machine learning in real time, and we contribute to new open-source tools:
- River: a Python library for online Machine Learning
- MOA: Massive Online Analytics, a framework for mining data streams (in Java)
- Apache SAMOA: Scalable Advanced Massive Online Analytics, an open-source framework for data stream mining in the Hadoop ecosystem
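To illustrate what learning in real time can look like, here is a minimal sketch of an online classifier evaluated in test-then-train (prequential) fashion: each example is first used to test the model, then to update it, and is never stored. The sketch is written in plain Python and does not use the APIs of the tools above; the class, the method names, and the synthetic stream are assumptions made for this example.

```python
import math
import random


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time by gradient descent."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Return the estimated probability that the label of x is 1."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        """Take a single gradient step on the log loss of one example."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def synthetic_stream(n, seed=0):
    """Yield n hypothetical examples; the label is 1 when x[0] + x[1] > 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, int(x[0] + x[1] > 0)


def prequential_accuracy(model, stream):
    """Test on each example first, then train on it (test-then-train)."""
    correct = total = 0
    for x, y in stream:
        predicted = model.predict_proba(x) >= 0.5
        correct += int(predicted == bool(y))
        model.learn_one(x, y)
        total += 1
    return correct / total


accuracy = prequential_accuracy(OnlineLogisticRegression(2), synthetic_stream(2000))
print(f"prequential accuracy: {accuracy:.3f}")
```

Because the model is updated after every example and keeps only its weights, memory use stays constant regardless of the length of the stream.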