Knowledge Bases
A knowledge base is a computer-processable collection of knowledge about the world. We construct and mine such knowledge bases.
- YAGO: YAGO is a large ontology constructed from WordNet, Wikipedia, and other sources. We develop YAGO together with the Database department of the Max Planck Institute for Informatics in Germany.
- AMIE: AMIE is a project to learn patterns and rules in ontologies. We conduct this project together with the Database department of the Max Planck Institute for Informatics in Germany.
- KB-LM is our new project to marry knowledge bases and large language models.
Graph Mining
Graphs are a near-universal way to represent data. We are concerned with mining graphs for patterns and properties. Our particular focus is on the scalability of such approaches.
- scikit-network: scikit-network is a Python package for the analysis of large graphs (clustering, embedding, classification, ranking).
Data Streams
We investigate how to do machine learning in real time, contributing to new open source tools:
- River: a Python library for online Machine Learning
- MOA: Massive Online Analytics, a framework for mining data streams (in Java)
- Apache SAMOA: Scalable Advanced Massive Online Analytics, an open source framework for data stream mining on the Hadoop Ecosystem
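To illustrate what learning in real time means in practice, here is a minimal sketch in plain Python. It assumes a synthetic two-feature stream and a simple logistic regression updated by stochastic gradient descent, evaluated in test-then-train (prequential) fashion: each example is first used for prediction and only then for training. The class, function, and parameter names are illustrative; this is not the API or implementation of River, MOA, or Apache SAMOA.

```python
import math
import random


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time (illustrative sketch)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba_one(self, x):
        # Probability that the label of x is 1.
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # One stochastic gradient step on the log loss for a single example.
        error = self.predict_proba_one(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def synthetic_stream(n, seed=0):
    # Hypothetical data source: the label is 1 when the two features sum to > 0.
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, 1 if x[0] + x[1] > 0 else 0


def prequential_accuracy(model, stream):
    # Test-then-train: predict on each example before learning from it.
    correct = total = 0
    for x, y in stream:
        prediction = 1 if model.predict_proba_one(x) >= 0.5 else 0
        correct += prediction == y
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


model = OnlineLogisticRegression(n_features=2)
accuracy = prequential_accuracy(model, synthetic_stream(2000))
print(f"prequential accuracy: {accuracy:.3f}")
```

The model never stores the stream: memory use stays constant no matter how many examples arrive, which is the property that distinguishes this setting from batch learning.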
Language and Relevance
Computer science is not just about computers. In this area of research, we investigate how humans reason, and what this implies for machines.
- Simplicity theory seeks to explain the relevance of situations or events to human minds.
- Relevance in natural language: The aim is to reverse-engineer, from our understanding of human performance, methods for producing meaningful and relevant speech.
- We apply game theory and social simulation to explore the conditions under which providing valuable (i.e., relevant) information is a profitable strategy.
