On the Overture Slack, Jake Wasserman shared a cool trick where you can connect to a remote DuckDB database using the ATTACH statement, using an HTTP or S3 URL. Jake’s demo was especially cool because he created a DB containing only views referencing an Overture parquet on S3, so the hosted database file was only a couple hundred kilobytes.

It’s a nice technique that allows us to host some easy to wrangle views at a convenient URLs (Jake suggested https://duckdb.overturemaps.org/), which will be nice for “Getting Started” demos and tutorials or even regional extracts.

But I wondered: what is the body of the request DuckDB is sending? It would be cool if someone could use the root Overture URL as a starting point, rather than having to remember a subdomain.

I spun up a quick Rails app and started ATTACH-ing to localhost and interrogating the request. DuckDB fires a standard GET HTTP request; there’s nothing special about it. However, it does use a fairly unique user agent: “cpp-httplib/0.14.3”. This is the C++ library used by DuckDB. We can check a request’s user agent and respond with a static database file if it matches “cpp-httplib”.

Here’s what a Rails controller might look like:

As a proof of concept, I created a DuckDB database containing a sample of public lists from my app, StepList, and made it available at the root URL. You can test out it out like so:

One gotcha for this trick: don’t forget the trailing / in your URL. Otherwise DuckDB will complain.

This technique would be great for any DaaS company or other data purveyor. Sample datasets in remote databases can be made available at their root URL, simplfying testing and demos. We’ll certainly be checking this out at Overture, at the root URL or otherwise.

Now I’m off to ask DuckDB to customize their user agent string…


Have thoughts? Send me a note