Towards Standardizing Place
Why I’m Excited for Overture’s GERS
Last week, the Overture Maps Foundation announced the General Availability of its global map datasets. Bringing the places, buildings, divisions, and base layers out of beta is a tremendous achievement for all involved. I’ve been lucky enough to participate in Overture for the last year and a half through my work at Precisely, and I’m especially excited to help guide Overture’s Global Entity Reference System, or GERS.
It’s been hard to express my excitement, especially to non-geospatial geeks, so I’ll attempt to explain here.
Overture’s Global Entity Reference System has a real shot at standardizing how datasets and systems deal with place.
Coordinate reference systems (CRS) do a great job defining how we can describe locations as coordinate sets in our databases. Our de facto standard, WGS 84, works with all the major map platforms and databases, is the lingua franca of the GPS network, and has plenty of precision for nearly all GIS use cases1. As a standard for describing where a location is, WGS 84 is excellent and (almost always) interoperable. It has standardized location in our databases, datasets, and data software.
But these systems cannot standardize place.
The difference between locations and places is an important nuance. Humans build out, demarcate, and describe discrete venues and areas in space. The questions we ask only deal in coordinates because they have to; we’d rather ask questions about roads, paths, houses, stores, schools, forests, arenas, cities, and more – the places we define to organize the world.
So what do we need to standardize place? Let’s back up and understand what we need to build a standard. We need:
- A Source of Truth: An agreed-upon reference point to benchmark our measurements. For WGS 84, this source of truth is (0,0), the point where the Prime Meridian intersects with the Equator.
- Mechanisms for Disseminating the Truth: Networks and tools for taking measurements and ascertaining our relationship with the source of truth. For WGS 84, this is a complex system including GPS satellites, antennas, geolocation APIs, and geospatial software for computing and working with coordinates.
- A Format for Expressing Your Position Relative to the Truth: A set system for expressing our positions relative to the benchmark. For WGS 84, this is a coordinate pair (with a third metric for elevation, if you’d like), which is easily stored and shared with minimal hiccups.
These three ingredients are critical and present in every standard.
My favorite example is our current standardization of time: the trio of UTC, NTP, and UNIX time. Each element maps to one of the requirements: the source of truth (UTC), the mechanism for dissemination (NTP), and the expressive format (UNIX time).
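The UNIX-time part of that trio is easy to see in code. Below is a minimal Python sketch that uses only the standard library. A UNIX timestamp is just a count of seconds. It only means something because it is measured from a fixed point defined in UTC: the epoch, 1970-01-01T00:00:00Z. The helpers `to_unix_time` and `from_unix_time` are illustrative names, not part of any library.

```python
from datetime import datetime, timezone

# The fixed reference point UNIX time counts from, defined in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_time(moment: datetime) -> int:
    """Express a timezone-aware datetime as whole seconds since the epoch.
    Like UNIX time itself, this ignores leap seconds."""
    return int((moment - EPOCH).total_seconds())


def from_unix_time(seconds: int) -> datetime:
    """Recover the UTC moment a UNIX timestamp refers to."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


moment = datetime(2024, 1, 1, tzinfo=timezone.utc)
stamp = to_unix_time(moment)
print(stamp)                              # 1704067200
print(from_unix_time(stamp).isoformat())  # 2024-01-01T00:00:00+00:00
```

The integer is compact and easy to share. It only becomes a point in time because everyone agrees on the UTC epoch it counts from.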
This standardization, like all of them, was hard-won over decades. Hundreds of years even, if you follow UTC back to its ancestor Greenwich Mean Time and other standards that arose in response to steam trains2.
To standardize place, our source of truth must be much more complex than WGS 84’s or UTC’s. Any standard must know about our geography – those roads, houses, parks, and more. We can’t use a single location to benchmark everything. We need a dataset that knows about all the places we want to associate with and ask questions of. This dataset must be open and interrogable by all. Otherwise, we have no hope of building our second requirement: the dissemination mechanisms that allow people to situate themselves and their data against the source of truth.
But even “open” isn’t enough. It must be easy to access. If a dataset is open but requires you to download a massive file and stage it in a specialized database – unfamiliar to most – is it really accessible?
This is why OpenStreetMap isn’t sufficient to standardize place. OSM is an amazing project, focused on making the best maps, freely available. But its dissemination mechanism is the map itself; everything is done in service of building that final map view.
As an accessible dataset, OSM is lacking. Planet files and their extracts are unruly and intimidating. Getting one to the point where you can query it is challenging for anyone unfamiliar with the OSGeo stack. Further, OSM’s tagging system – while perfectly suited to a collaborative, global map-building project – isn’t ideal for data exploration or analytics (though some projects are helping).
Thankfully, Overture’s main product is its data, not a map. This data is staged accessibly on AWS and Azure as cloud-native GeoParquet. You can access the entire corpus, a single layer, or a subset of either through AWS, Azure, or Overture’s Python CLI tools. If you use Google Cloud or Snowflake, CARTO has staged the data on both.
Heck, you can use DuckDB to query the data remotely, down to pulling all the places in my home ZIP code.
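Below is a sketch of that kind of query. It is wrapped in a small Python helper that only builds the SQL text and uses only the standard library. Several pieces are assumptions to check against Overture's documentation:

- the `overturemaps-us-west-2` bucket path with its `theme=places/type=place` partitioning;
- the `bbox`, `names.primary`, and `categories.primary` columns;
- the release name, which is a placeholder.

The bounding box is a hypothetical stand-in for a ZIP code's extent. A box only approximates the area a ZIP code covers.

```python
# Assumed S3 layout for Overture's public release bucket; verify against
# Overture's docs before relying on it.
BUCKET = "s3://overturemaps-us-west-2/release"
RELEASE = "YYYY-MM-DD.0"  # hypothetical placeholder: substitute a real release name

# DuckDB statements that enable reading Parquet files directly from S3.
SETUP_STATEMENTS = [
    "INSTALL httpfs",
    "LOAD httpfs",
    "SET s3_region = 'us-west-2'",
]


def places_in_bbox_sql(release: str, xmin: float, ymin: float,
                       xmax: float, ymax: float) -> str:
    """Build a DuckDB query for Overture places whose bbox lies inside a box
    given as WGS 84 longitude (x) and latitude (y)."""
    if not (xmin < xmax and ymin < ymax):
        raise ValueError("expected xmin < xmax and ymin < ymax")
    path = f"{BUCKET}/{release}/theme=places/type=place/*"
    return f"""
SELECT
    id,                                 -- the entity's GERS ID
    names.primary      AS name,
    categories.primary AS category
FROM read_parquet('{path}', hive_partitioning = 1)
WHERE bbox.xmin >= {xmin} AND bbox.xmax <= {xmax}
  AND bbox.ymin >= {ymin} AND bbox.ymax <= {ymax}
"""


# Hypothetical bounding box standing in for a ZIP code's extent.
query = places_in_bbox_sql(RELEASE, -122.28, 37.79, -122.25, 37.82)
print(query)
```

To run it with the `duckdb` Python package installed:

1. Open a connection with `duckdb.connect()`.
2. Run each entry in `SETUP_STATEMENTS` with `con.execute(...)`.
3. Call `con.execute(query).fetchall()`, which should return the matching rows.

Treat this as untested against any particular release. Filtering on `bbox` should let DuckDB skip Parquet row groups that can't match, so only a fraction of the data crosses the network.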
If you’re after a smaller region, use Overture’s Explorer interface and click the “Download Visible” button.
The data is easy to access. It’s early days, but the foundations are laid for robust dissemination mechanisms to emerge: tools that quickly and easily associate your data with the source of truth that is the dataset itself.
The last requirement for a standard is the expressive format, which also happens to be my favorite part of Overture: GERS.
Every entity in Overture’s data products – each road, building, place, address, city, country, and more – has a unique GERS ID. This ID is intended to be stable and traceable, “providing a mechanism to match features across datasets, track data stability, and detect errors in the data.”
GERS IDs are 128-bit identifiers written as 32 hexadecimal characters. They don’t encode an entity’s coordinates; rather, they serve as a pointer to the entity in Overture’s data. GERS, coupled with Overture datasets, is a potential standard for identifying the places we’ve defined in our world.
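To make "128 bits and 32 characters" concrete: each hexadecimal character encodes 4 bits, so 32 of them hold exactly 128 bits. The Python sketch below uses only the standard library. It checks that a string has that shape before the string is used as a join key. It assumes IDs are plain hex strings with no separators, compared in lowercase. The sample ID is made up, and this is an illustration, not Overture's own validation logic.

```python
import string

# Assumption: GERS IDs are 32 hexadecimal characters, compared in lowercase.
GERS_ID_LENGTH = 32
HEX_DIGITS = frozenset(string.hexdigits)  # 0-9, a-f, A-F


def normalize_gers_id(raw: str) -> str:
    """Return a trimmed, lowercase ID, or raise if it isn't 32 hex characters."""
    candidate = raw.strip().lower()
    if len(candidate) != GERS_ID_LENGTH or not set(candidate) <= HEX_DIGITS:
        raise ValueError(f"not a 32-character hex ID: {raw!r}")
    return candidate


def gers_id_as_int(gers_id: str) -> int:
    """The same ID as an integer: 32 hex characters x 4 bits = 128 bits."""
    return int(normalize_gers_id(gers_id), 16)


sample = "08F2A100D2C5A5A70359F7ECB7BA50EE"  # hypothetical ID, not a real entity
print(normalize_gers_id(sample))
print(gers_id_as_int(sample).bit_length() <= 128)
```

Nothing in the check interprets the ID's contents. It is treated purely as an opaque key, consistent with GERS IDs serving as pointers rather than encoded coordinates.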
The Future of GIS isn’t More Maps; It’s Column Joins
The ability to easily connect disparate datasets dramatically increases their value. More data connections create more perspectives, and more answers to more questions.
There is tremendous potential value latent in data that could be connected to a place but isn’t.
Standardizing location with WGS 84 hasn’t proven sufficient. Every organization can benefit from geospatial intelligence, but only a tiny fraction of organizations are capable of GIS. GIS is too complex for most situations.
People wanting information about places need to know about WGS 84 and Web Mercator as much as someone using UNIX timestamps needs to know about the resonant frequency of the cesium powering our atomic clocks. Most of the time, a map isn’t necessary. With a persistent key system like GERS, we can prepare, explore, and analyze geo data with column joins and SQL queries. We can visualize our findings with bar charts and other standard visualizations that are easier to create and consume3.
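Here is a minimal sketch of that workflow using Python's built-in `sqlite3` module. A first-party table keyed by GERS ID is joined to a table of places, aggregated with plain SQL, and printed as a text bar chart. Every ID, name, category, and visitor count is hypothetical. The `places` table is a tiny stand-in for an extract of Overture's places data. The point is that no coordinates or GIS functions appear anywhere.

```python
import sqlite3

# Hypothetical placeholder IDs in the 32-hex-character shape; not real GERS IDs.
CAFE_A, BOOKSTORE, CAFE_B = (f"{n:032x}" for n in (1, 2, 3))

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE places (gers_id TEXT PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE visits (gers_id TEXT, visit_date TEXT, visitors INTEGER);
""")
# Stand-in for an Overture places extract.
con.executemany("INSERT INTO places VALUES (?, ?, ?)", [
    (CAFE_A, "Corner Cafe", "cafe"),
    (BOOKSTORE, "Main Street Books", "bookstore"),
    (CAFE_B, "Bean There", "cafe"),
])
# Invented first-party data: no coordinates, only GERS IDs.
con.executemany("INSERT INTO visits VALUES (?, ?, ?)", [
    (CAFE_A, "2024-07-01", 120),
    (CAFE_A, "2024-07-02", 80),
    (BOOKSTORE, "2024-07-01", 40),
    (CAFE_B, "2024-07-01", 60),
])

# The whole "spatial" step is a column join on the shared key.
rows = con.execute("""
SELECT p.category, SUM(v.visitors) AS total_visitors
FROM visits AS v
JOIN places AS p ON p.gers_id = v.gers_id
GROUP BY p.category
ORDER BY total_visitors DESC
""").fetchall()

# A bar chart instead of a map: one '#' per 20 visitors.
for category, total in rows:
    print(f"{category:<10} {'#' * (total // 20)} {total}")
```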
By delivering a data standard for places, not just locations, we can work towards a future where column joins are all one needs to access geospatial intelligence.
A standard for places makes the geospatial market much, much bigger. It will reduce the cost of data onboarding, lessen the experience needed to load data and make connections, and increase the number of connections we can make by orders of magnitude. More data will be pinned down to places, broadening our perspective and understanding.
With Overture, we have the seeds to standardize place. And it’s getting better with every release. As an ecosystem builds around Overture data, and then GERS, the effects will be massive. Perhaps place will realize its potential as a standard, common join key.
1. However, we can imagine emerging technology use cases, specifically spatial computing, that might require greater precision than WGS 84 allows. But these are still nascent and complex.
2. Don’t get me started on Train Time, one of my favorite rabbit holes. But, in a nutshell: before trains, the fastest means of transport was a horse. And when it took a full day to get from Cincinnati to Cleveland, regional clocks disagreeing by ~30 minutes didn’t really matter. Once you start running trains on shared tracks, time synchronization really matters (if you’d prefer your trains not crash). Over decades, standardized time became established country by country. But that’s another post…
3. Don’t get me wrong: I love maps. But the GIS community too often forgets that most people can’t easily read maps. Map reading has never been intuitive for most people, and turn-by-turn apps have made this worse: rather than improving our relationship with maps, they mediate it. GIS as an industry is too focused on maps as a final product, and that limits the size of our audience.