This tutorial is an adaptation of “Public DuckLake on Object Storage” to the object storage of Leafcloud, an Amsterdam-based cloud provider.
Hosting DuckLake on Leafcloud
Setting Up the Object Store
- Navigate to the Leafcloud dashboard at https://create.leaf.cloud/.
- Go to Object Store | Containers and create a new container. We’ll use the name ducklake-storage.
- Tick the checkbox for Public Access and copy the link. It will look similar to the following:
  https://leafcloud.store/swift/v1/AUTH_f84982a3c5d04bd0846197d8e8ce3ddd/ducklake-storage
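The Testing section of this tutorial reaches the DuckLake catalog file by appending its name to this container link. A minimal shell sketch of that URL construction follows; the helper name public_url is hypothetical and only serves as an illustration.

```shell
#!/bin/sh
# Hypothetical helper: join the container's public link and an object name.
public_url() {
    base=${1%/}      # drop one trailing slash from the container link, if any
    object=${2#/}    # drop one leading slash from the object name, if any
    printf '%s/%s\n' "$base" "$object"
}

# Example with the container link shown above and the sf1.ducklake file
# that this tutorial connects to in the Testing section.
CONTAINER_URL="https://leafcloud.store/swift/v1/AUTH_f84982a3c5d04bd0846197d8e8ce3ddd/ducklake-storage"
public_url "$CONTAINER_URL" "sf1.ducklake"
```

Stripping the slashes before joining avoids a double slash when the copied link ends in one.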
Setting Up the OpenStackClient
- Fetch the credentials from Leafcloud. Navigate to your username in the top right corner | Settings | Identity | Application Credentials, or simply visit https://create.leaf.cloud/identity/application_credentials/.
- Select Create Application Credential and download the resulting app-cred-<your_credential_name>-cred-openrc.sh and clouds.yaml files.
- Source the credentials in your shell to configure the environment variables:
  source app-cred-<your_credential_name>-cred-openrc.sh
- Install OpenStackClient in your favorite Python environment:
  pip install python-openstackclient
- Create the credentials for the bucket:
  openstack ec2 credentials create
- This will print something like this:
  +------------+----------------------------------+
  | Field      | Value                            |
  +------------+----------------------------------+
  | access     | <32-character hexadecimal value> |
  | links      | {'self': '...'}                  |
  | project_id | <32-character hexadecimal value> |
  | secret     | <32-character hexadecimal value> |
  | trust_id   | None                             |
  | user_id    | <32-character hexadecimal value> |
  +------------+----------------------------------+
- Save the printed credentials.
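Only the access and secret rows of the printed table are needed for the Rclone setup. If the table was saved to a text file, the two values can be pulled out with a small awk filter. The sketch below is one possible way to do this; the helper name table_value and the stand-in table are hypothetical, and the placeholder values are made up.

```shell
#!/bin/sh
# Hypothetical helper: print the Value cell for a given Field from a table
# in the format shown above, read from standard input.
table_value() {
    awk -F'|' -v key="$1" '
        {
            field = $2; value = $3
            gsub(/^[ \t]+|[ \t]+$/, "", field)   # trim padding around the cell
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (field == key) print value
        }'
}

# Example on a stand-in table with made-up placeholder values.
sample='+--------+-------+
| Field  | Value |
+--------+-------+
| access | abc   |
| secret | xyz   |
+--------+-------+'
printf '%s\n' "$sample" | table_value access
printf '%s\n' "$sample" | table_value secret
```

With the real output saved to a file (say, a hypothetical ec2-credentials.txt), the call would be table_value access < ec2-credentials.txt. Treat that file like a password: it contains the secret key.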
Setting Up Rclone
- Install Rclone.
- Initiate the setup with rclone config and add a new remote. We’ll name it lc.
- Select Amazon S3 Compliant Storage Providers (s3) | Any other S3 compatible provider (other).
- Set the access_key_id to the access field’s value from the table above.
- Set the secret_access_key to the secret field’s value.
- Edit the ~/.config/rclone/rclone.conf file manually and set the endpoint to https://leafcloud.store.
- The entry in rclone.conf will look like this:
  [lc]
  type = s3
  provider = Other
  access_key_id = <32-character hexadecimal value>
  secret_access_key = <32-character hexadecimal value>
  endpoint = https://leafcloud.store
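As an alternative to the interactive setup, the same entry can be generated from the saved credentials. The following is a minimal sketch, not an official Rclone workflow; the helper name rclone_remote_section is hypothetical, and the keys in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print an rclone remote section to standard output.
# Arguments: remote name, access key, secret key, endpoint URL.
rclone_remote_section() {
    printf '[%s]\n' "$1"
    printf 'type = s3\n'
    printf 'provider = Other\n'
    printf 'access_key_id = %s\n' "$2"
    printf 'secret_access_key = %s\n' "$3"
    printf 'endpoint = %s\n' "$4"
}

# Example with placeholder keys; substitute the saved access and secret values.
rclone_remote_section lc "<access>" "<secret>" "https://leafcloud.store"
```

The function only prints the section, so nothing is overwritten by accident. Its output can be appended to ~/.config/rclone/rclone.conf once you have checked that no [lc] section exists there yet. Running rclone lsd lc: afterwards should list the ducklake-storage container if the credentials are valid.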
Creating a DuckLake
- Create a directory called ducklake-storage and navigate to this directory.
- Create a DuckLake following the “Using a Remote Data Path” DuckLake guide.
- When specifying the DATA_PATH, use the previously obtained path https://leafcloud.store/.../ducklake-storage.
- Synchronize the DuckLake to the object storage as follows:
  rclone sync ducklake-storage lc:ducklake-storage
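Because rclone sync makes the destination match the source, it can delete objects in the container that are missing locally. Rclone's --dry-run flag shows the planned changes without applying them. The wrapper below is a small sketch of that preview step; the function name sync_preview and the RCLONE override variable are assumptions of this example, not part of Rclone.

```shell
#!/bin/sh
# Hypothetical wrapper: preview what rclone sync would transfer or delete.
# RCLONE is an assumption of this sketch: it defaults to the rclone binary
# and can be overridden, for example to test the wrapper without Rclone.
sync_preview() {
    "${RCLONE:-rclone}" sync --dry-run "$1" "$2"
}

# Usage (requires the lc remote configured earlier):
#   sync_preview ducklake-storage lc:ducklake-storage
```

If the preview looks right, run the same command without --dry-run to perform the actual synchronization.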
Testing
- Connect to the DuckLake as follows:
  duckdb ducklake:https://leafcloud.store/swift/v1/AUTH_f84982a3c5d04bd0846197d8e8ce3ddd/ducklake-storage/sf1.ducklake
- List the tables with .tables. This will list a mix of data tables and metadata tables:
  D .tables
  Comment              ducklake_column_tag
  Comment_hasTag_Tag   ducklake_data_file
- Run any SQL query you like:
  select firstName from person limit 1;
  ┌───────────┐
  │ firstName │
  │  varchar  │
  ├───────────┤
  │ Jun       │
  └───────────┘
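The same check can be run non-interactively, which is handy for scripts. The DuckDB CLI's -c option runs a single command and exits. The wrapper below is a sketch under that assumption; the function name ducklake_query and the DUCKDB override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run one SQL statement against a DuckLake catalog URL and exit.
# DUCKDB is an assumption of this sketch: it defaults to the duckdb binary
# and can be overridden, for example to test the wrapper without DuckDB.
ducklake_query() {
    "${DUCKDB:-duckdb}" "ducklake:$1" -c "$2"
}

# Usage (requires network access and the duckdb CLI):
#   ducklake_query \
#       "https://leafcloud.store/swift/v1/AUTH_f84982a3c5d04bd0846197d8e8ce3ddd/ducklake-storage/sf1.ducklake" \
#       "select firstName from person limit 1;"
```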
Future Work
Leafcloud’s website states that a managed database service is coming soon. Such a service could host a PostgreSQL catalog database for DuckLake, which would allow multiple DuckDB clients to write to the same DuckLake.