Reading and watching list

Being Glue by Tanya Reilly
How to Write Blog Posts that Developers Read by Michael Lynch
Reality has a surprising amount of detail by John Salvatier
The Bitter Lesson by Rich Sutton
Cottagecore Programmers by Theodore Morley
The lethal trifecta for AI agents: private data, untrusted content, and external communication by Simon Willison
I Regret My $46k Website Redesign by Michael Lynch
...

May 20, 2025

Regular expressions for catching typical writing errors

Some regular expressions for catching typical writing errors, such as spelling issues (US spelling), repeated words, etc.:
ag --md " a [aeioAEIO]"
ag --md " an [bcdfghjklmnpqrstvxzwyBCDGHJKPQRTVWXYZ]"
ag --md -i "\s\b(\w+) +\1\b"
ag --md -i "\s\b(\w+ +\w+) +\1\b"
ag --md '\w+isation\b'
ag --md 'e\.g\. '
ag --md 'i\.e\. '
...
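The ag commands run each pattern line by line over Markdown files. As a rough Python port, assuming plain-text input in a string, the same patterns can be collected in one table and applied in a single pass. `CHECKS` and `find_issues` are hypothetical names for illustration, and the `-i` flag is mapped to `re.IGNORECASE`; Python's `re` and ag's PCRE should agree on the constructs used here (`\s`, `\b`, `\w`, backreferences), but treat this as a sketch rather than an exact equivalent.

```python
import re

# Patterns copied from the ag commands; the two passed with -i use IGNORECASE.
# The article checks stay case-sensitive, mirroring their explicit
# upper/lower-case character classes.
CHECKS = {
    "'a' before a vowel": re.compile(r" a [aeioAEIO]"),
    "'an' before a consonant": re.compile(r" an [bcdfghjklmnpqrstvxzwyBCDGHJKPQRTVWXYZ]"),
    "repeated word": re.compile(r"\s\b(\w+) +\1\b", re.IGNORECASE),
    "repeated word pair": re.compile(r"\s\b(\w+ +\w+) +\1\b", re.IGNORECASE),
    "British -isation spelling": re.compile(r"\w+isation\b"),
    "'e.g.' followed by a space": re.compile(r"e\.g\. "),
    "'i.e.' followed by a space": re.compile(r"i\.e\. "),
}


def find_issues(text):
    """Return (line_number, check_name, matched_text) tuples, line by line like ag."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in CHECKS.items():
            for match in pattern.finditer(line):
                issues.append((lineno, name, match.group(0)))
    return issues
```

For example, `find_issues(open("post.md").read())` lists each suspicious spot with its line number, similar to what ag prints.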

April 28, 2025

Database conferences in 2025

In 2025, both top-tier database conferences will be in Europe: SIGMOD in Berlin (June 22–27) and VLDB in London (September 1–5). There are quite a few papers and satellite events I am looking forward to – I listed them below.
SIGMOD 2025
The papers presented at SIGMOD are listed on the website: research track, industry track. Update: the detailed programme is out!
Keynotes: How to Build a Brain by Christos H....

April 27, 2025

Generating TPCx-BB data sets

TPCx-BB (née BigBench) is a Big Data benchmark. To generate the TPCx-BB data sets, download the TPCX-BB_Tools_vX.Y.Z.zip package from the TPC website. To run the generator, you’ll need a Java 8-compatible Java Virtual Machine. To obtain this, you can, e.g., install the Zulu JVM via SDKMAN!. Then, you can run the generator as follows:
java -cp pdgf.jar pdgf.Controller
To list the available commands, run:
help
To start the data generation, run:...

April 7, 2025

Cloudflare R2 command line snippet

I am a big fan of Cloudflare R2, an object storage service that provides egress-free downloads. R2 is compatible with the AWS S3 API, so you can use the AWS CLI tool – with a few caveats:
You need to add the --endpoint-url https://<account_id>.r2.cloudflarestorage.com argument to every call.
When copying to R2, you need to pass the --checksum-algorithm CRC32 argument.
I often store multiple AWS configurations, which requires passing an additional argument: --profile <your_r2_profile_name>....
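Because these arguments must be repeated on every call, one option is to assemble them in one place. The sketch below is a hypothetical helper (`r2_command` and `is_copy_to_r2` are illustrative names, and `abc123`/`r2` in the test are placeholder values): it appends the endpoint URL to every call, adds `--checksum-algorithm CRC32` only for `aws s3 cp` whose destination is an `s3://` URL, and adds `--profile` when one is given. It assumes the destination is the last positional argument, as in `aws s3 cp <source> <destination>`.

```python
def is_copy_to_r2(aws_args):
    """True for `s3 cp <source> s3://...`, i.e. a copy whose destination is a bucket.

    Assumes the destination is the last positional argument.
    """
    return aws_args[:2] == ["s3", "cp"] and aws_args[-1].startswith("s3://")


def r2_command(aws_args, account_id, profile=None):
    """Build an AWS CLI argument list that targets Cloudflare R2."""
    cmd = ["aws", *aws_args,
           "--endpoint-url", f"https://{account_id}.r2.cloudflarestorage.com"]
    if is_copy_to_r2(aws_args):
        # Copies to R2 need an explicit checksum algorithm.
        cmd += ["--checksum-algorithm", "CRC32"]
    if profile is not None:
        # Select the R2 credentials when several AWS configurations are stored.
        cmd += ["--profile", profile]
    return cmd
```

The resulting list can be passed directly to `subprocess.run(..., check=True)`, which avoids shell quoting issues with paths.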

April 6, 2025

DuckDB vs. coreutils

A few months ago, I wrote a post on the DuckDB blog where I explained how DuckDB’s SQL can express operations that developers typically implement with UNIX commands. Then earlier this week, I published a light-hearted social media post about DuckDB beating the UNIX wc -l command for counting the lines in a CSV file by a significant margin (1.2 vs. 2.9 seconds). This post received a lot of feedback, with the criticisms centering on two points:...
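As a side note on what such a comparison measures: `wc -l` counts newline characters, which matches the number of CSV rows (plus the header) only when no quoted field contains a line break. The Python sketch below, with the hypothetical helpers `count_newlines` and `count_csv_records`, contrasts the two counts. It illustrates the semantics only and says nothing about the timings above.

```python
import csv


def count_newlines(path, chunk_size=1 << 20):
    """Count newline bytes in the file, which is what `wc -l` reports."""
    count = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            count += chunk.count(b"\n")
    return count


def count_csv_records(path):
    """Count CSV records (header included), treating quoted newlines as data."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))
```

For a file whose only quoted field spans two lines, `count_newlines` reports one more than `count_csv_records`.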

December 4, 2024

“Data Science at the Command Line” book in DuckDB

Today I solved the exercises in Chapter 5 of the Data Science at the Command Line book using the DuckDB command line client. This page documents my solutions.
Prerequisites
Clone the https://github.com/jeroenjanssens/dsutils repository and add it to the PATH. To get the results for the reference solutions, you also need csvkit, which provides the csvlook, csvcut, csvsql, and other CLI tools:
brew install csvkit
DuckDB Solutions
In the following, I give the DuckDB solutions for each exercise....

November 30, 2024

Installing tidyverse on macOS

The tidyverse R package cannot be installed on macOS because one of its dependencies, ragg, fails to compile with the following error:
clang++ -std=gnu++17 -I"/opt/homebrew/Cellar/r/4.4.1/lib/R/include" -DNDEBUG -I./agg/include -I/opt/homebrew/opt/freetype/include/freetype2 -I/opt/homebrew/opt/libpng/include/libpng16 -I/opt/homebrew/Cellar/libtiff/4.6.0/include -I/opt/homebrew/opt/zstd/include -I/opt/homebrew/Cellar/xz/5.6.2/include -I/opt/homebrew/Cellar/jpeg-turbo/3.0.3/include -I'/opt/homebrew/lib/R/4.4/site-library/systemfonts/include' -I'/opt/homebrew/lib/R/4.4/site-library/textshaping/include' -I/opt/homebrew/opt/gettext/include -I/opt/homebrew/opt/readline/include -I/opt/homebrew/opt/xz/include -I/opt/homebrew/include -fPIC -g -O2 -Wall -pedantic -fdiagnostics-color=always -c agg/src/agg_vcgen_stroke.cpp -o agg/src/agg_vcgen_stroke.o
agg/src/agg_font_freetype.cpp:116:18: warning: variable 'len' set but not used [-Wunused-but-set-variable]
        unsigned len = 0;
                 ^
agg/src/agg_font_freetype.cpp:182:35: error: assigning to 'char *' from 'unsigned char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not
tags = outline....

September 9, 2024

Setting up a MacBook for presentations

Overview
A recurring task in my day job is to organize technical conferences (most recently, DuckCon #4 and #5), and to run the event from my laptop. To this end, I configure my laptop to ensure the best experience for both speakers and attendees. Most of the events I organize are free, so there is a limited budget available. Additionally, there is a limited amount of time to prepare. For example, it is not possible to conduct rehearsals with speakers....

August 24, 2024

macOS command line tricks

Make git beep upon failed push
Motivation: When I issue a git push command, I immediately navigate away from the terminal. Therefore, if the command fails because the remote rejects it after a second, I do not see this and assume that the push was successful. To avoid this, we’ll configure the shell so that when git push fails, it gives a small beep sound. To do so, follow these steps:...
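The shell configuration steps are cut off here. As an alternative sketch, and not the approach the post describes, a small Python wrapper can run `git push` and write the terminal bell character (BEL, `\a`) when the command exits with a non-zero status. `run_and_beep` is a hypothetical name, and whether the bell is audible depends on the terminal's bell settings.

```python
import subprocess
import sys


def run_and_beep(cmd, out=None):
    """Run cmd; if it exits with a non-zero status, write the BEL character.

    Returns the command's exit status so callers can propagate it.
    """
    out = sys.stdout if out is None else out
    result = subprocess.run(cmd)
    if result.returncode != 0:
        out.write("\a")  # the terminal decides whether BEL produces a sound
        out.flush()
    return result.returncode
```

Usage would be along the lines of `sys.exit(run_and_beep(["git", "push", *sys.argv[1:]]))` in a small script that is invoked instead of `git push`.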

June 27, 2024