I recently attended SIGMOD and chatted with people in the data management and graph processing communities.
These conversations, along with other interactions, gave me a few research ideas.
I lack the time to actively pursue them, but I list them below for future reference.
If you are working on something similar or would like to collaborate, please drop me a line!
- Single-node LDBC Datagen: I have long thought about developing a single-node variant of the LDBC Datagen. This could take inspiration from the SIGMOD 2025 paper “Revisiting Graph Analytics Benchmark”, which includes suggestions for the LDBC data generator.
- Bluesky: While the Bluesky social media site is still struggling to become mainstream, it has already produced datasets that can be extremely useful for modeling the generation of real-life networks. See the “Two million Bluesky posts” dataset and the 20+ papers published in recent years.
- Harnessing data lake architecture for graph databases: I expect the data lake architecture to have a major impact on graph processing, as it enables a multi-engine stack that allows analytical graph databases to function as secondary systems.
- Building graph systems on top of DuckDB: Similar to duckpgq, building a GQL or an RDF/SPARQL engine on top of DuckDB would be an interesting project.