My first experience of a Postgres day, where people talk postgreSQL all day. My day started very early – I left home at 05:15 and arrived in Brussels at 09:10. Unfortunately, due to the train schedules (where trains seem to wake up later than I do) I arrived a bit late for the opening talk by Magnus Hagander. Magnus is one of the core members of the PostgreSQL infrastructure team, developer and comitter in the Global development group. And then, straight into the sessions in a packed room, fortunately there was a seat available.
Session 1: Hosted environments for PostgreSQL by Christophe Pettus
The talk by Christophe Pettus where he compared the three big cloud providers with respect to hosting #postgresql as a database service. In a fast paced session he took the audience through the good and the bad of AWS RDS, Azure and GCP (Google Cloud Platform). Some surprising features and quirks on each of the providers, and some interesting differences. Although I think I understand a little bit of what it takes to provide a Database as a service, I do believe that there are areas where all three have some work to do, for example, access to the database and errorlogs, or is this the traditional DBA speaking?
Choosing a provider boils down to where your compute is located already – do you want an application scattered across a number of cloud providers, and what your priorities are - if you have PostgreSQL functionality as a high priority , then there is one clear winner.
Why choose one of the cloud solutions? The biggest reasons would be fail-over orchestration which is quite complex do implement yourself, and the ability to describe your infrastructure as code. The latter provides repeatability when installing new environments.
There are some downsides too (applicable to all the providers):
- you lose insights into what's happening under the hood
- limited logging,
- limited log retention,
- limited configuration - some options cannot be adjusted,
- highly limited access/use of extensions
- no PG Superuser
- out-of-date versions (patching takes time…
The speaker compared the three major cloud providers on a number of points, and explained his reasoning for choosing one of the big three as a clear winner, and after having heard the arguments, I agree. I'll keep that for myself for now.
Data structures by Tomas Vondra
This was followed by two technical presentations. One was by Tomas Vondra where he spoke about two interesting probabilistic data structures allowing efficient (distributed, parallel) estimation of rank-based statistics (e.g. percentiles). This was another very informative session with practical examples. I'll have to revisit the slides and do some experimentation with t-digest and hyperloglog. These are statistical methods which helps to speed up aggregate functions with a slight tradeoff in accuracy for speed.
Asynchronous IO for PostgreSQL by Andres Freund
The next technical presentation was by Andres Freund where he discussed asynchronous IO for PostgreSQL. For many workloads PostgreSQL currently cannot take full advantage of modern storage systems, like e.g. good SSD. One of the major reasons for that is that the majority of storage IO postgres performs is done synchronously (see e.g. slide 8fff in https://anarazel.de/talks/2019-10-16-pgconf-milan-io/io.pdf for an illustration as to why that is a problem).
Managing mobility data in PostgreSQL by Esteban Zimanyi
The last session before lucnh was by Esteban Zimanyi from the Univercity of Brussels about managing mobility data in PostgreSQL.
From the conference page: ”MobilityDB is an open source moving object database system (https://github.com/ULB-CoDE-WIT/MobilityDB). Its core function is to efficiently store and query mobility tracks, such as vehicle GPS trajectories. It is engineered up from PostgreSQL and PostGIS, providing spatiotemporal data management via SQL. It integrates with the PostgreSQL eco-system allowing for complex architectures such as cloud deployments using Citus.”
Super interesting when one starts playing with al the "tracking" data out there. Think about your Google location data - which goes back quite some time, AIS data from ships, tracking data from buses and trains and even tracking data from airplanes. It was very cool to see the progress this project has made, and I also liked the way that they also asked the community for support to change and add features to #postgresql. This is a project to watch. I can think of one possible use-case for this. Slides here: https://www.postgresql.eu/events/fosdem2020/sessions/session/2864/slides/268/MobilityDB-PGDay-2020.pdf
Modern B-Tree techniques by Dmitry Dolgov
After a very nice lunch Dmitry Dolgov spoke about Modern B-Tree techniques. This was a fast paced session where Dmitry explained some of the techniques designed and proposed to improve efficiency or add functionality. This talk was inspired by articles by Goetz Graefes, and will require some more reading on my part.
PGSpider by Taiga Katayama
The penultimate session of the day was by Taiga Katayama. He works for Toshiba in Japan in their Open Source Technology departement. On a side note, it is quite interesting on how many large companies are actively busy in the open source world, and giving back to open source.
Again, from the conference page:” In recent years, the number of edge and sensor devices in IoT systems has increased abruptly to the thousands or the tens of thousands. When users store data in high-performance edge and sensor devices, they prefer to search data directly on them rather than collecting the data in the cloud. Users also want to handle this distributed big data as a single virtual table that they can search at high speed. For that reason, we developed PGSpider which can access various data sources at high speed by using PostgreSQL Foreign Data Wrappers (FDW). By using FDW's pushdown function to its full potential and thread parallelization, PGSpider can acquire and process data at lightning speed.”
What impressed me about this work was the ability to query from many different datasources by using the foreign data wrapper – roughly analogous to Polybase in SQL server. This opens up a lot of possibilities when it comes to processing data from different sources. Another project to watch.
PostgreSQL security model by Joe Conway
The final session of the day was by Joe Conway from Crunchydata. Joe spoke about the challenges (and quirks) of the PostgreSQL security model, and the risks associated with some standard settings. The complications arises from a number of areas: Users and groups are different forms of a role, “USER” can have members, a role may be a direct or indirect member of another role, and some less than ideal privileges are “just there”. Joe explained these challenges and the demonstrated how the crunchy data plugins can make managing this a lot easier. Sheets can be found here: http://joeconway.com/presentations/security-FOSDEM2020.pdf
All great sessions, and of particular interest to me was the talk by Christophe Pettus where he presented his view on comparing the big three cloud providers when it comes to hosting postgres. Some interesting titbits and insights in his real-world comparison of AWS, Azure and GCloud. Of personal interest was the session on MobilityDB which adds support for temporal and spatio-temporal objects to the PostgreSQL object-relational database and its spatial extension PostGIS. I'll keep my eye on this one. And very important - I really appreciate the opportunity to attend these events. Kudos to OptimaData for this!
Back to blog overview