Home

Check our open positions in the team!

The Data, Intelligence and Graphs (DIG) team is a group of researchers at Télécom Paris working on the fundamental issues raised in databases, knowledge management, graph mining and artificial intelligence. Research interests cover theoretical foundations of data intelligence and graph systems, practical solutions and applications, as well as cognitive aspects.

The DIG team has strong industrial collaborations.

The DIG team is a proud signatory of the TCS4F (Theoretical Computer Scientists for Future) pledge for sustainable research in theoretical computer science.

A large majority of DIG members are signatories of the "No free view? No review!" pledge in favor of open access.

Research

Knowledge Bases

A knowledge base is a computer-processable collection of knowledge about the world. We construct and mine such knowledge bases.

Graph Mining

Graphs are a near-universal way to represent data. We are concerned with mining graphs for patterns and properties. Our particular focus is on the scalability of such approaches.

  • scikit-network: a Python package for the analysis of large graphs (clustering, embedding, classification, ranking).
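
To give a flavour of the ranking task listed above, here is a minimal sketch of PageRank computed by power iteration in plain Python. It does not use the scikit-network API; the function name `pagerank`, its parameters, and the toy graph are assumptions made purely for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return a dict node -> score for a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node first receives its teleportation share.
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                # Split the damped score evenly among the out-neighbours.
                share = damping * scores[node] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Dangling node: spread its damped score over all nodes.
                share = damping * scores[node] / n
                for other in nodes:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical toy graph: "c" is pointed to by every other node.
toy_edges = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "c")]
toy_scores = pagerank(toy_edges)
```

On this toy graph the node `c`, which every other node links to, ends up with the highest score. A dedicated package is still needed for large graphs, since this sketch uses dictionaries and makes no attempt at scalability.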

Social Web

The Web has evolved more and more into a social Web: content is produced and shared by users. In the DIG team, we follow and anticipate developments in this area.

  • Community detection: We are investigating means to detect and distinguish social communities on the Web.
  • Social relations: We investigate the optimal investment in social relations from a theoretical point of view.

Language and Relevance

Computer science is not just about computers. In this area of research, we investigate how humans reason, and what this implies for machines.

  • Simplicity Theory: Simplicity theory seeks to explain the relevance of situations or events to human minds. See http://www.simplicitytheory.science
  • Relevance in natural language: The aim is to reverse-engineer, from our understanding of human performance, methods for producing meaningful and relevant speech.
  • Communication as social signalling: We apply game theory and social simulation to explore the conditions under which providing valuable (i.e. relevant) information is a profitable strategy.

Machine Learning for Data Streams

We investigate how to do machine learning in real time, contributing to new open source tools:

  • scikit-multiflow: a machine learning framework for multi-output/multi-label and stream data.
  • MOA: Massive Online Analysis, the most popular framework for mining data streams, implemented in Java.
  • Apache SAMOA: Scalable Advanced Massive Online Analysis, an open-source framework for data stream mining on the Hadoop ecosystem.
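
As a rough illustration of what learning in real time involves, the sketch below runs a test-then-train (prequential) loop: each incoming example is first used to test the current model and only then to update it. It is written in plain Python and does not use the API of scikit-multiflow, MOA, or SAMOA; the perceptron learner, the function names, and the synthetic stream are all assumptions chosen for brevity.

```python
import random


def prequential_accuracy(stream, n_features, learning_rate=0.1):
    """Test-then-train loop: predict each example first, then learn from it."""
    weights = [0.0] * n_features
    bias = 0.0
    correct = 0
    seen = 0
    for features, label in stream:  # label is +1 or -1
        activation = bias + sum(w * x for w, x in zip(weights, features))
        prediction = 1 if activation >= 0 else -1
        correct += prediction == label
        seen += 1
        if prediction != label:
            # Perceptron update, applied only when the prediction was wrong.
            weights = [w + learning_rate * label * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * label
    return correct / seen if seen else 0.0


def toy_stream(n_examples, seed=0):
    """Hypothetical synthetic stream: the label is the sign of x0 - x1."""
    rng = random.Random(seed)
    for _ in range(n_examples):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        yield x, (1 if x[0] - x[1] >= 0 else -1)


accuracy = prequential_accuracy(toy_stream(2000), n_features=2)
```

The model never stores past examples, so memory use stays constant however long the stream runs, which is the property that stream-mining frameworks are built around.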

Big Data & Market Insights

We focus on data management and mining and their applications in digital marketing:

  • Scalability of the algorithms on large sets of real data
  • Context-aware recommender systems and predictive models: hotel booking, travel recommendation, points of interest …
  • Social network analysis and web information extraction: community detection, centrality, engagement rate …

People

 

Talel Abdessalem Antoine Amarilli Albert Bifet Thomas Bonald Jean-Louis Dessalles
Georges Hebrail Louis Jachiet Mauro Sozio Fabian M. Suchanek Tiphaine Viard

Senior

Post-docs

PhD candidates

Interns

  • Éloi Tanguy. Advisors: Thomas Bonald and Tiphaine Viard
  • Julie Dessaint. Advisors: Fabian Suchanek and Thomas Bonald
  • Simon Delarue. Advisors: Thomas Bonald and Tiphaine Viard

Former members

News

Best paper award at ICALP 2021

The paper by Antoine Amarilli and Louis Jachiet (from DIG) and Charles Paperman (U. Lille), Dynamic Membership for Regular Languages, has won the best paper award at track B of the ICALP 2021 conference.

scikit-network

A new version of scikit-network is available! This includes:

  • accelerated code for massive graphs
  • visualization in SVG format
  • soft clustering
  • soft classification
  • fast embedding