DuckDB vs. coreutils

A few months ago, I wrote a post on the DuckDB blog where I explained how DuckDB’s SQL can express operations that developers typically implement with UNIX commands. Then earlier this week, I published a light-hearted social media post about DuckDB beating the UNIX wc -l command for counting the lines in a CSV file by a significant margin (1.2 vs. 2.9 seconds). This post received a lot of feedback with the criticisms centered around two points: ...

December 4, 2024

“Data Science at the Command Line” book in DuckDB

Today I solved the exercises in Chapter 5 of the Data Science at the Command Line book using the DuckDB command line client. This page documents my solutions. Prerequisites Clone the https://github.com/jeroenjanssens/dsutils repository and add it to the PATH. To get the results for the reference solutions, you also need csvkit, which contains the csvlook, csvcut, csvsql, etc. CLI tools. brew install csvkit DuckDB Solutions In the following, I give the DuckDB solutions for each exercise. If the command is part of a pipeline (e.g., when the input is read from the standard input), DuckDB is called as ... | duckdb -c "⟨SQL query⟩". Other times, the solution is presented simply as a SQL script. ...

November 30, 2024

Installing tidyverse on macOS

The tidyverse R package cannot be installed on macOS because one of its dependencies, ragg fails to compile with the following error: clang++ -std=gnu++17 -I"/opt/homebrew/Cellar/r/4.4.1/lib/R/include" -DNDEBUG -I./agg/include -I/opt/homebrew/opt/freetype/include/freetype2 -I/opt/homebrew/opt/libpng/include/libpng16 -I/opt/homebrew/Cellar/libtiff/4.6.0/include -I/opt/homebrew/opt/zstd/include -I/opt/homebrew/Cellar/xz/5.6.2/include -I/opt/homebrew/Cellar/jpeg-turbo/3.0.3/include -I'/opt/homebrew/lib/R/4.4/site-library/systemfonts/include' -I'/opt/homebrew/lib/R/4.4/site-library/textshaping/include' -I/opt/homebrew/opt/gettext/include -I/opt/homebrew/opt/readline/include -I/opt/homebrew/opt/xz/include -I/opt/homebrew/include -fPIC -g -O2 -Wall -pedantic -fdiagnostics-color=always -c agg/src/agg_vcgen_stroke.cpp -o agg/src/agg_vcgen_stroke.o agg/src/agg_font_freetype.cpp:116:18: warning: variable 'len' set but not used [-Wunused-but-set-variable] unsigned len = 0; ^ agg/src/agg_font_freetype.cpp:182:35: error: assigning to 'char *' from 'unsigned char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not tags = outline.tags + first; ~~~~~~~~~~~~~~^~~~~~~ 1 warning and 1 error generated. make: *** [agg/src/agg_font_freetype.o] Error 1 make: *** Waiting for unfinished jobs.... ERROR: compilation failed for package ‘ragg’ As of today, there is a pull request resolving the issue. You can install it using the remotes package. Then, install the tidyverse package: ...

September 9, 2024

Setting up a MacBook for presentations

Overview A recurring task in my day job is to organize technical conferences (most recently, DuckCon #4 and #5), and to run the event from my laptop. To this end, I configure my laptop to ensure the best experience for both speakers and attendees. Most of the events I organize are free, so there is a limited budget available. Additionally, there is a limited amount of time prepare. For example, it is not possible to conduct rehearsals with speakers. Therefore, the question I am trying to answer in this post is the following: ...

August 24, 2024

macOS command line tricks

Make git beep upon failed push Motivation: When I issue a git push command, I immediately navigate away from the terminal. Therefore, if the command fails due to the remote rejecting it after a second, I do not see this and assume that the push was successful. To avoid this, we’ll configure the shell so when git push fails, it gives a small beep sound. To do so, follow these steps: ...

June 27, 2024

DuckDB workshop

Setup DuckDB installation site duckman: DuckDB Version Manager railway.ipynb Jupyter notebook Weather data set Source: Visual Crossing Weather wget https://blobs.duckdb.org/data/amsterdam-weather.csv Railway data set Source: Rijden de Treinen wget https://blobs.duckdb.org/nl-railway/stations-2022-01.csv wget https://blobs.duckdb.org/nl-railway/tariff-distances-2022-01.csv wget https://blobs.duckdb.org/nl-railway/services-2019.csv.gz wget https://blobs.duckdb.org/nl-railway/services-2020.csv.gz wget https://blobs.duckdb.org/nl-railway/services-2021.csv.gz wget https://blobs.duckdb.org/nl-railway/services-2022.csv.gz wget https://blobs.duckdb.org/nl-railway/services-2023.csv.gz wget https://blobs.duckdb.org/nl-railway/services-2024-01.csv.gz wget https://blobs.duckdb.org/nl-railway/services-2024-02.csv.gz wget https://blobs.duckdb.org/nl-railway/services-2024-03.csv.gz wget https://blobs.duckdb.org/nl-railway/services-2024-04.csv.gz VS Code hotkey Define keyboard shortcut Ctrl + Enter for executing selection in terminal To use the same editor (VS Code) and the same keyboard shortcut (Ctrl + Enter) to run the active line / selected piece of code (for CLI) or the active cell (for Python notebooks), add the CLI hotkey as follows: ...

June 9, 2024

Using the Aztec Code on an iPhone to enter the gates at Dutch train stations

TL;DR: To prevent your iPhone from switching to Apple Pay at train station gates when scanning your ticket’s code, use Guided Access mode. This can be turned on in the accessibility settings and activated by triple-clicking the side button of your phone. Recently, when travelling to FOSDEM from Amsterdam to Brussels via railway, I ran into the following problem. The Aztec Code for my train was saved on my iPhone. When I pulled out the phone to scan it at the entrance gates of Amsterdam Centraal, my phone kept opening the wallet for Apple Pay, making the Aztec Code no longer visible. ...

February 5, 2024

Publications at VLDB 2023

I co-authored the following papers, to be presented at VLDB 2023 and its satellite events. These papers capture the work on the new and updated benchmarks of the Linked Data Benchmark Council (LDBC), as well as LDBC’s organizational restructuring. Additionally, an initial DuckDB-based prototype of the SQL/PGQ language extension was accepted at the demo track. VLDB 2023 main track – G. Szárnyas et al.: The LDBC Social Network Benchmark: Business Intelligence workload [slide deck] TPCTC 2023 – G. Szárnyas et al.: The Linked Data Benchmark Council (LDBC): Driving competition and collaboration in the graph data management space [slide deck] TPCTC 2023 – D. Püroja, J. Waudby, P. Boncz, G. Szárnyas: The LDBC Social Network Benchmark Interactive workload v2: A transactional graph query benchmark with deep delete operations [slide deck] VLDB 2023 demonstration track – D. ten Wolde, G. Szárnyas, P. Boncz: DuckPGQ: Bringing SQL/PGQ to DuckDB ...

July 18, 2023

Working without sudo

I spent many years working in academia, where the typical development environment was an on-premises Linux cluster where I was given a user without sudo rights. In the absence of root access, the recommended way to install packages was to ask the system administrators to do so. However, this process can introduce considerable delays, so I was keen to look for alternative solutions. Version managers For installing programming language-related tooling, it’s recommended to use version managers (also known as “version multiplexers”). Here’s a short list of the version managers I found useful: ...

June 27, 2023

E-bike rental options in Amsterdam

Recently, I had a few family members visiting me in Amsterdam and I wanted to show them around the Dutch countryside. To give us a larger radius – and put towns like Marken, Zaandam, and Naarden comfortably within reach – I decided to rent e-bikes. In this post, I document my findings about the available options and the bikes I rented. Note that my lessons learnt are valid as of summer 2022 and are likely to change in the future. ...

August 20, 2022