Graph news – May 2025

A lot has happened in the graph space so far. Here’s a quick summary with a few comments. On the DB-Engines ranking, graph databases have continued their rebound and have been on a growth trajectory for the last 5 months. In 2021, Gartner predicted that “by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision making across the organization”....

May 25, 2025

Reading and watching list

Being Glue by Tanya Reilly
How to Write Blog Posts that Developers Read by Michael Lynch
Reality has a surprising amount of detail by John Salvatier
The Bitter Lesson by Rich Sutton
Cottagecore Programmers by Theodore Morley
The lethal trifecta for AI agents: private data, untrusted content, and external communication by Simon Willison
I Regret My $46k Website Redesign by Michael Lynch
...

May 20, 2025

Regular expressions for catching typical writing errors

Some regular expressions for catching typical writing errors, such as spelling issues (targeting US spelling) and repeated words:
ag --md " a [aeioAEIO]"
ag --md " an [bcdfghjklmnpqrstvxzwyBCDGHJKPQRTVWXYZ]"
ag --md -i "\s\b(\w+) +\1\b"
ag --md -i "\s\b(\w+ +\w+) +\1\b"
ag --md '\w+isation\b'
ag --md 'e\.g\. '
ag --md 'i\.e\. '
...
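For running the same checks outside of ag, here is a minimal sketch that applies these patterns with Python's built-in re module. The rule names and the find_writing_errors helper are hypothetical labels for illustration. It scans line by line to report line numbers, which is a simplification of how ag matches. Note that the uppercase class in the "an" rule deliberately leaves out F, L, M, N, and S, so acronyms such as "an FPGA" are not flagged.

```python
from __future__ import annotations

import re

# Patterns copied from the ag commands above; the rule names are hypothetical labels.
RULES = [
    ("'a' before a vowel", re.compile(r" a [aeioAEIO]")),
    ("'an' before a consonant", re.compile(r" an [bcdfghjklmnpqrstvxzwyBCDGHJKPQRTVWXYZ]")),
    # -i in ag corresponds to re.IGNORECASE; \1 is a backreference to the first group
    ("repeated word", re.compile(r"\s\b(\w+) +\1\b", re.IGNORECASE)),
    ("repeated word pair", re.compile(r"\s\b(\w+ +\w+) +\1\b", re.IGNORECASE)),
    ("British -isation spelling", re.compile(r"\w+isation\b")),
    ("'e.g.' without a comma", re.compile(r"e\.g\. ")),
    ("'i.e.' without a comma", re.compile(r"i\.e\. ")),
]


def find_writing_errors(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, rule name, matched text) for every match in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES:
            for match in pattern.finditer(line):
                # strip the leading/trailing space that some patterns consume
                findings.append((lineno, name, match.group(0).strip()))
    return findings


if __name__ == "__main__":
    sample = "This is the the plan.\nWe saw a apple and an banana, e.g. here.\nThe organisation agreed."
    for finding in find_writing_errors(sample):
        print(finding)
```

On the sample text, this reports the repeated "the", "a apple", "an banana", the comma-less "e.g.", and "organisation".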

April 28, 2025

Database conferences in 2025

In 2025, both top-tier database conferences will be in Europe: SIGMOD in Berlin (June 22–27) and VLDB in London (September 1–5). There are quite a few papers and satellite events I am looking forward to; I have listed them below.
SIGMOD 2025
The papers presented at SIGMOD are listed on the website: research track, industry track. Update: the detailed program is out!
Keynotes: How to Build a Brain by Christos H....

April 27, 2025

Generating TPCx-BB data sets

TPCx-BB (née BigBench) is a Big Data benchmark. To generate the TPCx-BB data sets, download the TPCX-BB_Tools_vX.Y.Z.zip package from the TPC website. To run the generator, you’ll need a Java 8-compatible Java Virtual Machine, which you can obtain by, e.g., installing the Zulu JVM via SDKMAN!. Then, run the generator as follows:
java -cp pdgf.jar pdgf.Controller
To list the available commands, run:
help
To start the data generation, run:...

April 7, 2025

Cloudflare R2 command line snippet

I am a big fan of Cloudflare R2, an object storage service that provides egress-free downloads. R2 is compatible with the AWS S3 API, so you can use the AWS CLI tool, with a few caveats. These include:
You need to add the --endpoint-url https://<account_id>.r2.cloudflarestorage.com argument to every call.
When copying to R2, you need to pass the --checksum-algorithm CRC32 argument.
I often store multiple AWS configurations, which requires passing an additional argument: --profile <your_r2_profile_name>....
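Since these arguments have to be repeated on every call, one option is to build the command in a small wrapper. The sketch below assumes that "copying to R2" means the destination is an s3:// URL, i.e., an upload. The helper names, the account ID, the bucket, and the profile name are hypothetical placeholders; only the three arguments listed above come from the caveats.

```python
from __future__ import annotations

import shlex
import subprocess


def r2_cp_command(src: str, dst: str, account_id: str, profile: str | None = None) -> list[str]:
    """Build an `aws s3 cp` argument list that targets Cloudflare R2 instead of AWS S3."""
    cmd = [
        "aws", "s3", "cp", src, dst,
        # every call must point the CLI at the account's R2 endpoint
        "--endpoint-url", f"https://{account_id}.r2.cloudflarestorage.com",
    ]
    if dst.startswith("s3://"):
        # assumption: only uploads (destination in the bucket) need the checksum argument
        cmd += ["--checksum-algorithm", "CRC32"]
    if profile is not None:
        # pick the R2 credentials when several AWS configurations are stored
        cmd += ["--profile", profile]
    return cmd


def r2_cp(src: str, dst: str, account_id: str, profile: str | None = None) -> None:
    """Run the copy; raises CalledProcessError if the AWS CLI fails."""
    subprocess.run(r2_cp_command(src, dst, account_id, profile), check=True)


if __name__ == "__main__":
    # print the command instead of running it (hypothetical account and bucket)
    print(shlex.join(r2_cp_command("data.csv", "s3://my-bucket/data.csv", "abc123", profile="r2")))
```

Returning an argument list rather than a shell string avoids quoting issues, and shlex.join is only used to show the command.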

April 6, 2025

Graph Databases after 15 Years – Where are they headed?

🤖 Below is an AI-generated summary of my FOSDEM 2025 talk (based on the slide deck).
What are graph databases really about?
At their core, graph databases are about joins. Despite marketing claims from some vendors that relational databases “cannot join,” the reality is more nuanced. Relational systems handle joins perfectly well at the execution level. What graph databases actually bring to the table is syntactic sugar and specialized optimizations for join-heavy workloads....
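To make the "graph queries are joins" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The knows edge table and the people in it are hypothetical. A fixed two-hop pattern, which a graph query language would write as something like (a)-[:KNOWS]->()-[:KNOWS]->(fof), becomes a self-join of the edge table. A variable-length path becomes a recursive CTE.

```python
import sqlite3


def build_example() -> sqlite3.Connection:
    """Create a tiny in-memory graph stored as an edge table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE knows (src TEXT, dst TEXT)")
    conn.executemany(
        "INSERT INTO knows VALUES (?, ?)",
        [("Alice", "Bob"), ("Bob", "Carol"), ("Bob", "Dave"), ("Carol", "Alice"), ("Dave", "Erin")],
    )
    return conn


def friends_of_friends(conn: sqlite3.Connection, person: str) -> list:
    """Two-hop traversal expressed as a self-join of the edge table."""
    rows = conn.execute(
        """
        SELECT DISTINCT e2.dst
        FROM knows AS e1
        JOIN knows AS e2 ON e1.dst = e2.src  -- the join is the 'hop'
        WHERE e1.src = ? AND e2.dst <> ?
        ORDER BY e2.dst
        """,
        (person, person),
    ).fetchall()
    return [row[0] for row in rows]


def reachable(conn: sqlite3.Connection, person: str) -> list:
    """Variable-length path: repeated joins via a recursive CTE (UNION removes duplicates, so cycles terminate)."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM knows WHERE src = ?
            UNION
            SELECT k.dst FROM knows AS k JOIN reach AS r ON k.src = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (person,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_example()
    print(friends_of_friends(conn, "Alice"))
    print(reachable(conn, "Alice"))
```

Because of the Carol → Alice edge, Alice is reachable from herself here. A graph database would express the same queries more concisely and may plan them differently, but the underlying operation is still a join.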

February 2, 2025

DuckDB vs. coreutils

A few months ago, I wrote a post on the DuckDB blog where I explained how DuckDB’s SQL can express operations that developers typically implement with UNIX commands. Then, earlier this week, I published a light-hearted social media post about DuckDB beating the UNIX wc -l command at counting the lines in a CSV file by a significant margin (1.2 vs. 2.9 seconds). This post received a lot of feedback, with the criticisms centering on two points:...

December 4, 2024

“Data Science at the Command Line” book in DuckDB

Today I solved the exercises in Chapter 5 of the Data Science at the Command Line book using the DuckDB command line client. This page documents my solutions.
Prerequisites
Clone the https://github.com/jeroenjanssens/dsutils repository and add it to the PATH. To get the results for the reference solutions, you also need csvkit, which contains csvlook, csvcut, csvsql, and other CLI tools:
brew install csvkit
DuckDB Solutions
In the following, I give the DuckDB solutions for each exercise....

November 30, 2024

Installing tidyverse on macOS

The tidyverse R package fails to install on macOS (here, with R 4.4.1 from Homebrew) because one of its dependencies, ragg, fails to compile with the following error:
clang++ -std=gnu++17 -I"/opt/homebrew/Cellar/r/4.4.1/lib/R/include" -DNDEBUG -I./agg/include -I/opt/homebrew/opt/freetype/include/freetype2 -I/opt/homebrew/opt/libpng/include/libpng16 -I/opt/homebrew/Cellar/libtiff/4.6.0/include -I/opt/homebrew/opt/zstd/include -I/opt/homebrew/Cellar/xz/5.6.2/include -I/opt/homebrew/Cellar/jpeg-turbo/3.0.3/include -I'/opt/homebrew/lib/R/4.4/site-library/systemfonts/include' -I'/opt/homebrew/lib/R/4.4/site-library/textshaping/include' -I/opt/homebrew/opt/gettext/include -I/opt/homebrew/opt/readline/include -I/opt/homebrew/opt/xz/include -I/opt/homebrew/include -fPIC -g -O2 -Wall -pedantic -fdiagnostics-color=always -c agg/src/agg_vcgen_stroke.cpp -o agg/src/agg_vcgen_stroke.o
agg/src/agg_font_freetype.cpp:116:18: warning: variable 'len' set but not used [-Wunused-but-set-variable]
unsigned len = 0;
^
agg/src/agg_font_freetype.cpp:182:35: error: assigning to 'char *' from 'unsigned char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not
tags = outline....

September 9, 2024