I am a developer advocate at DuckDB Labs, the company behind DuckDB. I edit the documentation and the blog, give talks and run interesting benchmark experiments. I also co-organize DuckCons and DuckDB meetups.

Outside of my day job, I am a contributor to the Linked Data Benchmark Council (LDBC), a non-profit organization that promotes the use of graph data management technology and defines TPC-style graph database benchmarks.

Previously, I spent a decade in academia. I obtained my PhD in software engineering at the Critical Systems Research Group of TU Budapest, followed by a 3-year stint as a post-doctoral researcher at the Database Architectures group of CWI Amsterdam. My research focused on graph queries, graph analytics, and benchmarks.

Besides the news on what I’ve been up to, I share some personal notes on this site.

Links

GitHub – Twitter – LinkedIn – Stack Overflow – Google Scholar – DBLP

News

(upcoming) November: I will give a talk about DuckLake at Øredev 2025.
June: I co-organized the DuckDB Berlin Meetup. The event had 100 people on site, making it the largest DuckDB meetup so far!
June: I presented DuckDB and DuckLake at the Budapest Data+ML Forum. See my slide deck.
May: DuckDB Labs released DuckLake, a new lakehouse format. I took a leading role in creating the launch strategy of the new format.
April: I gave the closing keynote of PyCon Lithuania’s Data Day. See the recording.
March: I gave a talk on graph databases at the Simonyi Conference, organized by my alma mater, BME. The English translation of my slide deck is available, see Graph databases: Where theory meets practice.
February: I co-organized the second DuckDB Amsterdam Meetup.
February: I gave a talk in the Data Analytics developer room of the FOSDEM conference in Brussels. See the slides and the recording.
January: I was a co-organizer of DuckCon #6 in Amsterdam. This was the largest DuckCon so far and the first one that we streamed online.

For older items, see the news archive.

Talks

Graph Databases (FOSDEM 2025)

slide deck

DuckDB (GOTO Amsterdam 2024)

slide deck

LDBC (FOSDEM 2023)

slide deck

Notes

Research ideas for graph processing

I recently attended SIGMOD and chatted with people in the data management and graph processing communities. These conversations and other interactions led me to come up with a few research ideas. I lack the time to actively pursue them, but I list them below for future reference. If you are working on something similar or would like to collaborate, please drop me a line! Single-node LDBC Datagen: I have long thought about developing a single-node variant of the LDBC Datagen....

Cards and apps in the Netherlands

I moved to the Netherlands during the summer of 2020 — about five years ago. Over time, I’ve learned about a bunch of useful cards and apps that make everyday life easier. Here’s a brief collection of them. Cards Albert Heijn Bonuskaart: When shopping at the Albert Heijn supermarket, you need to scan the bonus card; otherwise, the discounts are not applied. A few years ago, it was quite easy to get an anonymous card (just ask for one and don’t register it — the card still works as intended)....

Graph news – May 2025

A lot has happened in the graph space so far this year. Here’s a quick summary with a few comments. On the DB-Engines ranking, graph databases have continued their rebound and have been on a growth trajectory for the last 5 months. In 2021, Gartner predicted that “by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision making across the organization”....

Regular expressions for catching typical writing errors

Some regular expressions for catching typical writing errors – spelling issues (US spelling), repeated words, etc.:
ag --md " a [aeioAEIO]"
ag --md " an [bcdfghjklmnpqrstvxzwyBCDGHJKPQRTVWXYZ]"
ag --md -i "\s\b(\w+) +\1\b"
ag --md -i "\s\b(\w+ +\w+) +\1\b"
ag --md '\w+isation\b'
ag --md 'e\.g\. '
ag --md 'i\.e\. '
...
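The same patterns can also be applied outside `ag`. Below is a minimal Python sketch that runs them over a string, assuming Python's `re` module treats these simple patterns the same way as ag's PCRE engine (the `-i` flag maps to `re.IGNORECASE`). The check names and the `find_issues` function are hypothetical, and reading the `e.g.`/`i.e.` patterns as "missing comma" checks is my interpretation of the US-style rule they target.

```python
import re

# Patterns copied from the ag commands above; "-i" corresponds to re.IGNORECASE.
CHECKS = [
    ("'a' before a vowel", re.compile(r" a [aeioAEIO]")),
    ("'an' before a consonant", re.compile(r" an [bcdfghjklmnpqrstvxzwyBCDGHJKPQRTVWXYZ]")),
    ("repeated word", re.compile(r"\s\b(\w+) +\1\b", re.IGNORECASE)),
    ("repeated word pair", re.compile(r"\s\b(\w+ +\w+) +\1\b", re.IGNORECASE)),
    ("British -isation spelling", re.compile(r"\w+isation\b")),
    ("'e.g.' without a comma", re.compile(r"e\.g\. ")),
    ("'i.e.' without a comma", re.compile(r"i\.e\. ")),
]


def find_issues(text):
    """Return (line number, check name, matched text) for every hit, line by line."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in CHECKS:
            for match in pattern.finditer(line):
                issues.append((lineno, name, match.group(0).strip()))
    return issues


if __name__ == "__main__":
    sample = "This is a example.\nWe ran the the benchmark.\nA normalisation step, e.g. sorting."
    for issue in find_issues(sample):
        print(issue)
```

Like the `ag` version, this is a heuristic: the uppercase class after "an" deliberately leaves out letters such as F, L, M, N, and S (as in "an SQL query"), but a phrase like "an hour" will still be flagged.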

Database conferences in 2025

In 2025, both top-tier database conferences will be in Europe: SIGMOD in Berlin (June 22–27) and VLDB in London (September 1–5). There are quite a few papers and satellite events I am looking forward to – I listed them below. SIGMOD 2025 The papers presented at SIGMOD are listed on the website: research track, industry track. Update: the detailed programme is out! Keynotes: How to Build a Brain by Christos H....

Generating TPCx-BB data sets

TPCx-BB (née BigBench) is a Big Data benchmark. To generate the TPCx-BB data sets, download the TPCX-BB_Tools_vX.Y.Z.zip package from the TPC website. To run the generator, you’ll need a Java 8-compatible Java Virtual Machine. To obtain this, you can, e.g., install the Zulu JVM via SDKMAN!. Then, you can run the generator as follows:
java -cp pdgf.jar pdgf.Controller
To list the available commands, run:
help
To start the data generation, run:...

Cloudflare R2 command line snippet

I am a big fan of Cloudflare R2, an object storage service that provides egress-free downloads. R2 is compatible with the AWS S3 API, so you can use the AWS CLI tool – with a few caveats. These include: You need to add the --endpoint-url https://<account_id>.r2.cloudflarestorage.com argument for every call. When copying to R2, you need to pass the --checksum-algorithm CRC32 argument. I often store multiple AWS configurations, which requires passing an additional argument: --profile <your_r2_profile_name>....
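To avoid retyping these arguments on every call, one option is a small wrapper that appends them. The sketch below is a hypothetical Python helper (`r2_aws_args` and `run_r2` are illustrative names, not part of any tool) that only adds the three flags quoted above; it assumes the AWS CLI is installed and that credentials come from the selected profile. The account ID, bucket, and profile names are placeholders.

```python
import shlex
import subprocess


def r2_aws_args(args, account_id, profile=None, upload=False):
    """Build an AWS CLI argument list that targets Cloudflare R2 (hypothetical helper).

    `args` are the usual AWS CLI arguments, e.g. ["s3", "ls", "s3://my-bucket"].
    """
    # Every call needs the account-specific R2 endpoint.
    cmd = ["aws", *args, "--endpoint-url", f"https://{account_id}.r2.cloudflarestorage.com"]
    if upload:
        # Copying to R2 needs an explicit checksum algorithm.
        cmd += ["--checksum-algorithm", "CRC32"]
    if profile is not None:
        # Selects the R2 credentials when several AWS configurations are stored.
        cmd += ["--profile", profile]
    return cmd


def run_r2(args, account_id, profile=None, upload=False):
    """Run the AWS CLI against R2; raises CalledProcessError if the command fails."""
    return subprocess.run(r2_aws_args(args, account_id, profile, upload), check=True)


if __name__ == "__main__":
    # Print the command instead of running it (placeholder account ID, bucket, and profile).
    cmd = r2_aws_args(["s3", "cp", "data.parquet", "s3://my-bucket/"], "ACCOUNT_ID",
                      profile="r2", upload=True)
    print(shlex.join(cmd))
```

The `upload` switch is explicit rather than inferred from the subcommand, which keeps the helper simple at the cost of one more argument for copies to R2.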

DuckDB vs. coreutils

A few months ago, I wrote a post on the DuckDB blog where I explained how DuckDB’s SQL can express operations that developers typically implement with UNIX commands. Then earlier this week, I published a light-hearted social media post about DuckDB beating the UNIX wc -l command for counting the lines in a CSV file by a significant margin (1.2 vs. 2.9 seconds). This post received a lot of feedback with the criticisms centered around two points:...

“Data Science at the Command Line” book in DuckDB

Today I solved the exercises in Chapter 5 of the Data Science at the Command Line book using the DuckDB command line client. This page documents my solutions. Prerequisites Clone the https://github.com/jeroenjanssens/dsutils repository and add it to the PATH. To get the results for the reference solutions, you also need csvkit, which contains the csvlook, csvcut, csvsql, etc. CLI tools. brew install csvkit DuckDB Solutions In the following, I give the DuckDB solutions for each exercise....

Installing tidyverse on macOS

The tidyverse R package cannot be installed on macOS because one of its dependencies, ragg, fails to compile with the following error:
clang++ -std=gnu++17 -I"/opt/homebrew/Cellar/r/4.4.1/lib/R/include" -DNDEBUG -I./agg/include -I/opt/homebrew/opt/freetype/include/freetype2 -I/opt/homebrew/opt/libpng/include/libpng16 -I/opt/homebrew/Cellar/libtiff/4.6.0/include -I/opt/homebrew/opt/zstd/include -I/opt/homebrew/Cellar/xz/5.6.2/include -I/opt/homebrew/Cellar/jpeg-turbo/3.0.3/include -I'/opt/homebrew/lib/R/4.4/site-library/systemfonts/include' -I'/opt/homebrew/lib/R/4.4/site-library/textshaping/include' -I/opt/homebrew/opt/gettext/include -I/opt/homebrew/opt/readline/include -I/opt/homebrew/opt/xz/include -I/opt/homebrew/include -fPIC -g -O2 -Wall -pedantic -fdiagnostics-color=always -c agg/src/agg_vcgen_stroke.cpp -o agg/src/agg_vcgen_stroke.o
agg/src/agg_font_freetype.cpp:116:18: warning: variable 'len' set but not used [-Wunused-but-set-variable] unsigned len = 0; ^
agg/src/agg_font_freetype.cpp:182:35: error: assigning to 'char *' from 'unsigned char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not tags = outline....
