Updated graph databases slide deck
I published an updated version of my Graph Databases slide deck. I originally presented it back in February at FOSDEM 2025, and I have now updated it with new systems such as KùzuDB’s fork, LadybugDB. ...
In the last decade, I had my fair share of bad sleep. Often, I pushed through it helplessly, and the usual tips – aim for a regular bedtime, stop caffeine intake mid-afternoon, reduce screen time in the evening – did not help much. However, over the years I built up a repertoire of techniques. This is obviously not medical advice. All I can say is that if you’re struggling with bad sleep, I encourage you to experiment. ...
I recently discovered the typos command-line tool and created the following workflow around it in Git repositories:
    typos -w    # writes the changes to file
    git add -p  # allows you to review the changes interactively
Some Git clients use line-based diffs, which make the differences difficult to see. In these cases, you may want to switch to the diff-highlight tool, which is shipped with Git. On macOS with Apple Silicon and a Brew-installed git, you can use these commands to make git use it: ...
I recently attended SIGMOD and chatted with people in the data management and graph processing communities. These conversations and other interactions led me to come up with a few research ideas. I lack the time to actively pursue them, but I list them below for future reference. If you are working on something similar or would like to collaborate, please drop me a line!
Single-node LDBC Datagen: I have long thought about developing a single-node variant of the LDBC Datagen. This could take inspiration from the SIGMOD 2025 paper “Revisiting Graph Analytics Benchmark”, which includes suggestions for the LDBC data generator.
Bluesky: While the Bluesky social media site is still struggling to become mainstream, it has already produced datasets that can be extremely useful for modeling the generation of real-life networks. See the “Two million Bluesky posts” dataset and the 20+ papers published in recent years.
Harnessing data lake architecture for graph databases: I expect the data lake architecture to have a major impact on graph processing, as it enables a multi-engine stack that allows analytical graph databases to function as secondary systems.
Building graph systems on top of DuckDB: Similar to duckpgq, building a GQL or an RDF/SPARQL engine on top of DuckDB would be an interesting project. ...
I moved to the Netherlands during the summer of 2020, about five years ago. Over time, I’ve learned about a bunch of useful cards and apps that make everyday life easier. Here’s a brief collection of them.
Cards
Albert Heijn Bonuskaart: When shopping at the Albert Heijn supermarket, you need to scan the bonus card; otherwise, the discounts are not applied. A few years ago, it was quite easy to get an anonymous card (just ask for one and don’t register it; the card still works as intended). There’s also a mobile version.
European Health Insurance Card: My insurance provider did not automatically issue the blue EU-wide insurance card. I had to request it on their website, and they shipped it via post within a few days.
Museumkaart: The Museum Card (a.k.a. museum pass) allows entry to 350+ museums across the country for a fixed yearly fee. It’s worth getting if you visit at least 4–5 museums a year. Since 2025, it also has a mobile version. Fun fact: because Amsterdam has poor public toilet coverage, people sometimes use their Museum Card to briefly enter a museum just to use the toilet. This earned the Museum Card the nickname “WC jaarkaart.”
Apps
You’ll very likely need the DigiD app for dealing with taxes, accessing municipal messages, vaccination records, etc.
The built-in iOS weather app and widget are quite unreliable. Most people use the Buienradar app, which also has its own widget.
You can use Tikkie to split bills. Last time I checked, it was only available in the Dutch App Store, so you may have to change your region.
Use the PostNL app to track or reschedule deliveries.
Travelling
If you get around a lot, get a personal (yellow) OV-chipkaart. This card can be linked to a bank account and receive automatic top-ups. ...
A lot has happened in the graph space this year so far. Here’s a quick summary with a few comments. On the DB-Engines ranking, graph databases have continued their rebound and have been on a growth trajectory for the last 5 months. In 2021, Gartner predicted that “by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision making across the organization”. While this is difficult to verify, my feeling is that this prediction failed and has been quietly swept under the rug. There is no mention of it on Gartner’s site or social media feeds. RDF 1.2 is out. RDF 1.1 was released more than 11 years ago, so this is a big deal in the semantic web community. DuckDB v1.3.0 was released with support for the USING KEY clause in recursive common table expressions. This new clause, implemented by researchers at the University of Tübingen based on their CIDR 2023 paper, can make recursive CTEs much more efficient. Kùzu v0.10.0 is out. Kùzu continues to be a strong player in the open-source analytical graph database category, and it’s good to see it mature. The programme of SIGMOD 2025 has been announced and it’s full of good graph talks. I will attend the event in June. ...
Some regular expressions for catching typical writing errors – spelling issues (US spelling), repeated words, etc.:
    ag --md " a [aeioAEIO]"
    ag --md " an [bcdfghjklmnpqrstvxzwyBCDGHJKPQRTVWXYZ]"
    ag --md -i "\s\b(\w+) +\1\b"
    ag --md -i "\s\b(\w+ +\w+) +\1\b"
    ag --md '\w+isation\b'
    ag --md 'e\.g\. '
    ag --md 'i\.e\. '
...
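The patterns above can be bundled into one shell function so that a single call runs every check. This is only a sketch: the function name `check_writing` and its optional directory argument are made up for illustration, and it assumes `ag` (The Silver Searcher) is installed.

```shell
# Hypothetical helper (the name and the directory argument are assumptions).
# Runs each pattern from the list above over the Markdown files in a directory.
check_writing() {
  local dir="${1:-.}"   # directory to search; defaults to the current one
  local pattern

  # Case-sensitive checks: a/an mix-ups, "-isation" spellings, "e.g. " / "i.e. "
  for pattern in \
    ' a [aeioAEIO]' \
    ' an [bcdfghjklmnpqrstvxzwyBCDGHJKPQRTVWXYZ]' \
    '\w+isation\b' \
    'e\.g\. ' \
    'i\.e\. '
  do
    ag --md "$pattern" "$dir"
  done

  # Case-insensitive checks: repeated words and repeated word pairs
  for pattern in '\s\b(\w+) +\1\b' '\s\b(\w+ +\w+) +\1\b'
  do
    ag --md -i "$pattern" "$dir"
  done

  # ag exits non-zero when a pattern has no match; that is not an error here
  return 0
}
```

Calling `check_writing content/` would run all seven searches over the Markdown files under `content/` (a placeholder path).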
In 2025, both top-tier database conferences will be in Europe: SIGMOD in Berlin (June 22–27) and VLDB in London (September 1–5). There are quite a few papers and satellite events I am looking forward to – I listed them below.
SIGMOD 2025
The papers presented at SIGMOD are listed on the website: research track, industry track. Update: the detailed programme is out!
Keynotes:
    How to Build a Brain by Christos H. Papadimitriou
    The Case for Collaboration (Everything a Database Person really needs to know about Machine Learning) by Margo Seltzer
    Fifty Years of Transaction Processing Research by Phil Bernstein
Graphs
The GRADES-NDA 2025 workshop on Friday ...
TPCx-BB (née BigBench) is a Big Data benchmark. To generate the TPCx-BB data sets, download the TPCX-BB_Tools_vX.Y.Z.zip package from the TPC website. To run the generator, you’ll need a Java 8-compatible Java Virtual Machine. To obtain this, you can, e.g., install the Zulu JVM via SDKMAN!. Then, you can run the generator as follows:
    java -cp pdgf.jar pdgf.Controller
To list the available commands, run:
    help
To start the data generation, run:
    start
...
I am a big fan of Cloudflare R2, an object storage service that provides egress-free downloads. R2 is compatible with the AWS S3 API, so you can use the AWS CLI tool – with a few caveats. These include:
You need to add the --endpoint-url https://<account_id>.r2.cloudflarestorage.com argument for every call.
When copying to R2, you need to pass the --checksum-algorithm CRC32 argument.
I often store multiple AWS configurations, which requires passing an additional argument: --profile <your_r2_profile_name>.
To perform these steps automatically, I wrote a small Bash function. Add this to your ~/.bashrc or ~/.zshrc: ...
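The author's function is cut off in this excerpt, so the following is an independent sketch of how the three caveats could be folded into one wrapper, not the original code. The function name `r2`, the environment variables `R2_ACCOUNT_ID` and `R2_PROFILE`, and the fallback profile name `r2` are all assumptions made for illustration. Only the three flags come from the text above.

```shell
# Hypothetical wrapper, NOT the author's original function.
# R2_ACCOUNT_ID and R2_PROFILE are assumed environment variables.
r2() {
  if [ -z "${R2_ACCOUNT_ID:-}" ]; then
    echo "r2: set R2_ACCOUNT_ID to your Cloudflare account ID" >&2
    return 1
  fi

  local endpoint="https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com"
  local profile="${R2_PROFILE:-r2}"   # assumed default profile name

  if [ "${1:-}" = "s3" ] && [ "${2:-}" = "cp" ]; then
    # Copy case: also pass the checksum flag mentioned in the caveats
    aws "$@" --checksum-algorithm CRC32 --endpoint-url "$endpoint" --profile "$profile"
  else
    # Every other call only needs the endpoint and the profile
    aws "$@" --endpoint-url "$endpoint" --profile "$profile"
  fi
}
```

With this in place, `r2 s3 ls` and `r2 s3 cp file.txt s3://my-bucket/` (placeholder names) would each pick up the extra arguments. The sketch adds the checksum flag only for `s3 cp`, so other copy-like subcommands would need their own handling.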