Previously, I spent a decade in academia.
I obtained my PhD in software engineering at the Critical Systems Research group of TU Budapest, followed by a 3-year stint as a post-doctoral researcher at the Database Architectures group of CWI Amsterdam. My research focused on graph queries, graph analytics, and benchmarks.
A few months ago, I wrote a post on the DuckDB blog where I explained how DuckDB’s SQL can express operations that developers typically implement with UNIX commands. Then earlier this week, I published a light-hearted social media post about DuckDB beating the UNIX wc -l command for counting the lines in a CSV file by a significant margin (1.2 vs. 2.9 seconds).
This post received a lot of feedback, with the criticisms centered around two points:...
Data Science at the Command Line Book in DuckDB
Today I solved the exercises in Chapter 5 of the Data Science at the Command Line book using the DuckDB command line client. This page documents my solutions.
Prerequisites
Clone the https://github.com/jeroenjanssens/dsutils repository and add it to the PATH.
To get the results for the reference solutions, you also need csvkit, which contains the csvlook, csvcut, csvsql, etc. CLI tools.
brew install csvkit
DuckDB Solutions
In the following, I give the DuckDB solutions for each exercise....
Installing tidyverse on macOS
The tidyverse R package cannot be installed on macOS because one of its dependencies, ragg, fails to compile with the following error:
clang++ -std=gnu++17 -I"/opt/homebrew/Cellar/r/4.4.1/lib/R/include" -DNDEBUG -I./agg/include -I/opt/homebrew/opt/freetype/include/freetype2 -I/opt/homebrew/opt/libpng/include/libpng16 -I/opt/homebrew/Cellar/libtiff/4.6.0/include -I/opt/homebrew/opt/zstd/include -I/opt/homebrew/Cellar/xz/5.6.2/include -I/opt/homebrew/Cellar/jpeg-turbo/3.0.3/include -I'/opt/homebrew/lib/R/4.4/site-library/systemfonts/include' -I'/opt/homebrew/lib/R/4.4/site-library/textshaping/include' -I/opt/homebrew/opt/gettext/include -I/opt/homebrew/opt/readline/include -I/opt/homebrew/opt/xz/include -I/opt/homebrew/include -fPIC -g -O2 -Wall -pedantic -fdiagnostics-color=always -c agg/src/agg_vcgen_stroke.cpp -o agg/src/agg_vcgen_stroke.o
agg/src/agg_font_freetype.cpp:116:18: warning: variable 'len' set but not used [-Wunused-but-set-variable]
  unsigned len = 0;
  ^
agg/src/agg_font_freetype.cpp:182:35: error: assigning to 'char *' from 'unsigned char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not
  tags = outline....
Setting up a MacBook for Presentations
Overview
A recurring task in my day job is to organize technical conferences (most recently, DuckCon #4 and #5), and to run the event from my laptop. To this end, I configure my laptop to ensure the best experience for both speakers and attendees.
Most of the events I organize are free, so there is a limited budget available. Additionally, there is a limited amount of time to prepare. For example, it is not possible to conduct rehearsals with speakers....
macOS command line tricks
Make git beep upon failed push
Motivation: When I issue a git push command, I immediately navigate away from the terminal. Therefore, if the command fails due to the remote rejecting it after a second, I do not see this and assume that the push was successful.
To avoid this, we’ll configure the shell so when git push fails, it gives a small beep sound. To do so, follow these steps:...
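One possible way to get this behavior, which is not necessarily the set of steps from the original post, is to wrap git in a shell function that rings the terminal bell when a push exits with a non-zero status. The sketch below assumes it is placed in a shell startup file such as ~/.zshrc, and that the terminal is configured to play an audible bell; the variable name exit_code is illustrative.

```shell
# Sketch: wrap git so that a failed `git push` rings the terminal bell.
# Assumption: this function is sourced from a shell startup file (e.g., ~/.zshrc).
git() {
    # Run the real git binary, bypassing this function
    command git "$@"
    # Remember git's exit code so it can be returned unchanged
    local exit_code=$?
    # Only beep for the push subcommand, and only when it failed
    if [ "$1" = "push" ] && [ "$exit_code" -ne 0 ]; then
        printf '\a'
    fi
    return "$exit_code"
}
```

Because the wrapper returns git's original exit code, scripts and prompts that inspect the status of git commands keep working as before.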
DuckDB workshop
Setup
DuckDB installation site
duckman: DuckDB Version Manager
railway.ipynb Jupyter notebook
Weather data set
Source: Visual Crossing Weather
wget https://blobs.duckdb.org/data/amsterdam-weather.csv
Railway data set
Source: Rijden de Treinen
wget https://blobs.duckdb.org/nl-railway/stations-2022-01.csv
wget https://blobs.duckdb.org/nl-railway/tariff-distances-2022-01.csv
wget https://blobs.duckdb.org/nl-railway/services-2019.csv.gz
wget https://blobs.duckdb.org/nl-railway/services-2020.csv.gz
wget https://blobs.duckdb.org/nl-railway/services-2021.csv.gz
wget https://blobs.duckdb.org/nl-railway/services-2022.csv.gz
wget https://blobs.duckdb.org/nl-railway/services-2023.csv.gz
wget https://blobs.duckdb.org/nl-railway/services-2024-01.csv.gz
wget https://blobs.duckdb.org/nl-railway/services-2024-02.csv.gz
wget https://blobs.duckdb.org/nl-railway/services-2024-03.csv.gz
wget https://blobs.duckdb.org/nl-railway/services-2024-04.csv.gz
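The railway file names above follow a regular pattern, so the downloads can also be sketched as a small loop. This is only a convenience sketch: the function names railway_files and download_railway_data are hypothetical, while the base URL and file names are the ones listed above.

```shell
# Sketch: generate the railway data set file names listed above and download them.
base_url="https://blobs.duckdb.org/nl-railway"

# Print one file name per line (hypothetical helper)
railway_files() {
    echo "stations-2022-01.csv"
    echo "tariff-distances-2022-01.csv"
    # Full-year service files
    for year in 2019 2020 2021 2022 2023; do
        echo "services-${year}.csv.gz"
    done
    # Monthly service files for 2024
    for month in 01 02 03 04; do
        echo "services-2024-${month}.csv.gz"
    done
}

# Download every file; nothing is fetched until this function is called
download_railway_data() {
    railway_files | while read -r file; do
        # -nc (no-clobber) skips files that already exist locally
        wget -nc "${base_url}/${file}"
    done
}
```

Calling download_railway_data fetches the files into the current directory, and re-running it after an interrupted download skips files that are already present.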
VS Code hotkey
Define keyboard shortcut Ctrl + Enter for executing selection in terminal
To use the same editor (VS Code) and the same keyboard shortcut (Ctrl + Enter) to run the active line / selected piece of code (for CLI) or the active cell (for Python notebooks), add the CLI hotkey as follows:...
Using the Aztec Code on an iPhone to enter the gates at Dutch train stations
TL;DR: To prevent your iPhone from switching to Apple Pay at train station gates when scanning your ticket’s code, use Guided Access mode. This can be turned on in the accessibility settings and activated by triple-clicking the side button of your phone.
Recently, when travelling to FOSDEM from Amsterdam to Brussels via railway, I ran into the following problem. The Aztec Code for my train was saved on my iPhone. When I pulled out the phone to scan it at the entrance gates of Amsterdam Centraal, my phone kept opening the wallet for Apple Pay, making the Aztec Code no longer visible....
Publications at VLDB 2023
I co-authored the following papers, to be presented at VLDB 2023 and its satellite events. These papers capture the work on the new and updated benchmarks of the Linked Data Benchmark Council (LDBC), as well as LDBC’s organizational restructuring. Additionally, an initial DuckDB-based prototype of the SQL/PGQ language extension was accepted at the demo track.
VLDB 2023 main track – G. Szárnyas et al.: The LDBC Social Network Benchmark: Business Intelligence workload [slide deck]
TPCTC 2023 – G....
Working without sudo using version managers
I spent many years working in academia, where the typical development environment was an on-premises Linux cluster on which I had a user account without sudo rights. In the absence of root access, the recommended way to install packages was to ask the system administrators to do so. However, this process can introduce considerable delays, so I was keen to look for alternative solutions.
I found that a better, simpler way to install tools – especially programming language-related tooling – is to use version managers....
E-bike rental options in Amsterdam
Recently, I had a few family members visiting me in Amsterdam and I wanted to show them around the Dutch countryside. To give us a larger radius – and put towns like Marken, Zaandam, and Naarden comfortably within reach – I decided to rent e-bikes. In this post, I document my findings about the available options and the bikes I rented. Note that my lessons learnt are valid as of summer 2022 and are likely to change in the future....