September: I co-authored the paper “The LDBC Financial Benchmark: Transaction Workload”, which got the best paper runner-up award at VLDB 2025’s benchmark track.
In the last decade, I had my fair share of bad sleep. Often, I pushed through it helplessly, and the usual tips – aim for a regular bedtime, stop caffeine intake mid-afternoon, reduce screen time in the evening – did not help much. However, over the years I built up a repertoire of techniques.
This is obviously not medical advice. All I can say is that if you’re struggling with bad sleep, I encourage you to experiment....
Using the typos CLI tool
I recently discovered the typos command line tool and created the following workflow around it in Git repositories:
typos -w    # writes the changes to file
git add -p  # allows you to review the changes interactively

Some git clients use line-based diffs, which make the differences difficult to see. In these cases, you may want to switch to the diff-highlight tool, which is shipped with Git. On macOS with Apple Silicon and a Brew-installed git, you can use these commands to make git use it:...
Research ideas for graph processing
I recently attended SIGMOD and chatted with people in the data management and graph processing communities.
These conversations and other interactions led me to come up with a few research ideas.
I lack the time to actively pursue them, but I list them below for future reference.
If you are working on something similar or would like to collaborate, please drop me a line!
Single-node LDBC Datagen: I have long thought about developing a single-node variant of the LDBC Datagen....
Cards and apps in the Netherlands
I moved to the Netherlands during the summer of 2020 — about five years ago. Over time, I’ve learned about a bunch of useful cards and apps that make everyday life easier. Here’s a brief collection of them.
Cards

Albert Heijn Bonuskaart: When shopping at the Albert Heijn supermarket, you need to scan the bonus card; otherwise, the discounts are not applied. A few years ago, it was quite easy to get an anonymous card (just ask for one and don’t register it – the card still works as intended)....
Graph news – May 2025
A lot has happened in the graph space this year so far. Here’s a quick summary with a few comments.
In the DB-Engines ranking, graph databases have continued their rebound and have been on a growth trajectory for the last 5 months. In 2021, Gartner predicted that “by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision making across the organization”....
Regular expressions for catching typical writing errors
Some regular expressions for catching typical writing errors – spelling issues (US spelling), repeated words, etc.:
ag --md " a [aeioAEIO]"
ag --md " an [bcdfghjklmnpqrstvxzwyBCDGHJKPQRTVWXYZ]"
ag --md -i "\s\b(\w+) +\1\b"
ag --md -i "\s\b(\w+ +\w+) +\1\b"
ag --md '\w+isation\b'
ag --md 'e\.g\. '
ag --md 'i\.e\. '
...
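To try two of these patterns without ag, here is a minimal Python sketch. It reuses the repeated-word and "-isation" expressions from the list above with Python's `re` module; the `find_issues` helper and the sample text are made up for illustration. Like the ag calls, it works line by line, so a word repeated across a line break is not reported.

```python
import re

# Same pattern as the `ag -i "\s\b(\w+) +\1\b"` call: a word followed by itself.
REPEATED_WORD = re.compile(r"\s\b(\w+) +\1\b", re.IGNORECASE)
# Same pattern as the `ag '\w+isation\b'` call: UK-style "-isation" spellings.
ISATION = re.compile(r"\w+isation\b")


def find_issues(text):
    """Return (line_number, kind, matched_text) tuples for each hit."""
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in REPEATED_WORD.finditer(line):
            issues.append((number, "repeated word", match.group(1)))
        for match in ISATION.finditer(line):
            issues.append((number, "UK spelling", match.group(0)))
    return issues


# Hypothetical sample text, used only to exercise the patterns.
sample = "We ran the the benchmark.\nThe optimisation step is is slow.\nThe theory holds."
for issue in find_issues(sample):
    print(issue)
```

The third sample line checks that "The theory" is not flagged: the trailing `\b` prevents a match when the second occurrence is only a prefix of a longer word.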
Databases conferences in 2025
In 2025, both top-tier database conferences will be in Europe: SIGMOD in Berlin (June 22–27) and VLDB in London (September 1–5). There are quite a few papers and satellite events I am looking forward to – I listed them below.
SIGMOD 2025

The papers presented at SIGMOD are listed on the website: research track, industry track.
Update: the detailed programme is out!
Keynotes:
How to Build a Brain by Christos H....
Generating TPCx-BB data sets
TPCx-BB (née BigBench) is a Big Data benchmark. To generate the TPCx-BB data sets, download the TPCX-BB_Tools_vX.Y.Z.zip package from the TPC website.
To run the generator, you’ll need a Java 8-compatible Java Virtual Machine. To obtain this, you can, e.g., install the Zulu JVM via SDKMAN!. Then, you can run the generator as follows:
java -cp pdgf.jar pdgf.Controller

To list the available commands, run:

help

To start the data generation, run:...
Cloudflare R2 command line snippet
I am a big fan of Cloudflare R2, an object storage that provides egress-free downloads.
R2 is compatible with the AWS S3 API, so you can use the AWS CLI tool – with a few caveats. These include:
- You need to add the --endpoint-url https://<account_id>.r2.cloudflarestorage.com argument for every call.
- When copying to R2, you need to pass the --checksum-algorithm CRC32 argument.
- I often store multiple AWS configurations, which requires passing an additional argument: --profile <your_r2_profile_name>....
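One way to avoid retyping these arguments is to assemble the command in a small helper. The following is a minimal Python sketch rather than an official tool: the `r2_upload_command` function, the bucket name, and the placeholder account ID and profile name are all assumptions made for illustration. It only builds and prints the command line for an upload and does not run it.

```python
import shlex


def r2_upload_command(source, destination, account_id, profile=None):
    """Build the argument list for an `aws s3 cp` upload that targets R2."""
    command = [
        "aws", "s3", "cp", source, destination,
        # R2 needs its own endpoint on every call.
        "--endpoint-url", f"https://{account_id}.r2.cloudflarestorage.com",
        # Needed when copying to R2, as noted in the caveats above.
        "--checksum-algorithm", "CRC32",
    ]
    if profile is not None:
        # Only needed when several AWS configurations are stored.
        command += ["--profile", profile]
    return command


# Placeholder values only; replace them with your own.
args = r2_upload_command("data.csv", "s3://my-bucket/data.csv", "ACCOUNT_ID", profile="r2")
print(shlex.join(args))
```

Returning a list instead of a string means the result can be handed to `subprocess.run` as-is, without any shell quoting concerns.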
DuckDB vs. coreutils
A few months ago, I wrote a post on the DuckDB blog where I explained how DuckDB’s SQL can express operations that developers typically implement with UNIX commands. Then earlier this week, I published a light-hearted social media post about DuckDB beating the UNIX wc -l command for counting the lines in a CSV file by a significant margin (1.2 vs. 2.9 seconds).
This post received a lot of feedback with the criticisms centered around two points:...