Когда Postgres стал крутым?

Коллеги, это вольный перевод - я пытался донести суть, не теряя художественную составляющую языка на котором написан изначальный текст. Если у вас есть корректировки того, что кардинально меняет смысл написанного, или вы заметили неточности, смело пишите мне на talk@shifu.io.

Ingres —> Postgres95 —> PostgreSQL

Изначально Postgres начался с проекта Ingres. Это был проект начатый в Университете Калифорнии Беркли, и назывался Ingres (INteractive Graphics REtrieval System - Интерактивная Графическая Система Повторения) под руководством Михаеля Стоунбрейкера. Они заменили обучающие курса на проект по разработке релляционной базы данных.

Изначально язык запросов был QUEL - тоесть даже не SQL. ANSI (Американский национальный институт стандартов) отдал предпочтение SQL в 1986 и много связанных проектов двинулись в этом направлении. Поддержка SQL была добавлена в Postgres95 (в момент запуска) в 1995.

Разработка вышла за рамки Университета в 1996, когда первая версия PostgreSQL была выпущена под версией 6.0, и была учреждена Глобальная команда разработки PosqtgreSQL (PostgreSQL Global Development Team) - это также группа, которая ведет проект по сей день.

Основопологающая технология MVCC(Multiversion Concurrency Control, Многоверсионное управление конкурентным доступом), которая сделала Postgres тем чем он является сейчас была зарелизена в Postgres 6.5 в 1999.

Postgres начинает расти

Today there are a lot of things you take for granted when working with a reliable database. In the year 2000, Postgres began its journey of growing up. The press release (if they did those things back then) might have read: Postgres JOINS other database ranks with foreign keys and join support.

The rest of the early 2000s brought more key foundational pieces, with a primary focus on being a reliable database and SQL, you got features such as:

Write-Ahead-Log (WAL) Outer joins TOAST More than 4b transactions Drop column Schemas IPv6 Postgres was early on this one 😉 One of my favorite things in observing some of the details from back in 2000 is some of the names that contributed that are still pillars in the Postgres community today. These folks range from my colleague Tom Lane, to Josh Berkus, Bruce Momjian, Hiroshi Inoue and Peter Eisentraut.

But it’s not just the commits, show up at a PgCon or PGConfEU and you’ll see some of the very same people pictured here at the PostgreSQL anniversary summit in 2006 (so almost 20 years ago) including Magnus Hagander, Stephen Frost, Bruce Momjian, Devrim Gunduz, Oleg Bartunov, David Fetter, Robert Treat, Simon Riggs, Joe Conway, Tom Lane, Andrew Dunstan, Dave Page, Jean-Paul Argudo, and so many many more (and to anyone I missed, personal apologies you can point out I missed you in person at the next conference).

Ned Lilly said, “Everyone who is involved with Postgres feels really good about its prospects. It’s not much of a stretch for them to [think] there’s a market opportunity here. Oracle is monolithic, so it’s not a leap to visualize [success]. The open-source database wave is starting to crest.”

The above statement was made back in 2000, in association with the first Postgres company Great Bridge. Ned got it right, but was he 5 years, 10 years, or 15 years too early? We will never know as it was a dotcom casualty, and not long after that statement that the company shutdown in 2001. And if you hadn’t noticed a slight nod in the naming of Crunchy Bridge to some PostgreSQL history, well now you know.

Postgres continuing to mature As we get later into the 2000s and hit 2005, Postgres could be viewed as a fairly reliable database. With richer transaction support, broad SQL support, and pieces like WAL and VACUUM improvements, if you were an early adopter you started to trust it for production workloads. It was trustworthy, but still had some ways to go in terms of ease of use.

At this point in time we start to see a mix of features with multiple different themes that could be pointed to:

Concurrent index creation Warm standby servers Query language improvements All the data types - Arrays, UUID, ENUM, XML Two phase commit A richer role system We still saw the same set of folks involved since early days, but others started to contribute headlining features. Some themes would emerge. As always more SQL support and better performance, embracing Postgres as an extensible database with new data types. Personally, I think the powerful reporting capabilities really made Postgres a game changer. In 2009 with Postgres 8.4 we got window functions and common table expressions (CTEs), with those in place I personally would never look back to another database.

With Postgres as a solid foundation, around this time we’d start to see its imprint on the broader data ecosystem. Because of the solid code base but also its permissible licensing, many companies took Postgres and began to fork it. In the early to late 2000s the first thing that most of these companies would do is add MPP support to Postgres so it could target more OLAP focused workloads. When you couple that with support for things like window functions and CTEs, you have something new and powerful without having to build it from the ground up. This type of product could shave years off the time it takes to mature as a database. Many of those original forks aren’t still around today, but some live on in other products…

Aster Data → Acquired by Teradata Truviso → Acquired by Cisco Netezza → Acquired by IBM Greenplum → Acquired by EMC ParAccel → never acquired but effectively became RedShift Postgres marches on Despite some proliferation of Postgres forks, Postgres kept doing what it had always done – just keep moving forward.

Postgres 9.0 and 9.1 - 2010 If you ask me, this is the start of Postgres becoming cool. Support for things like listen/notify (pub-sub from your database) and hstore (a key/value data type) really didn’t fit the mold of stodgy old relational databases. Upgrades finally became easier with support for pg_upgrade. We started to get more than just your standard B-Tree index with the arrival of GIN and GiST indexes.

Extensions were always part of Postgres but with some refactors in the integration they became much more readily available to users. And we saw the development of the Postgres foreign data wrapper so you could connect disparate Postgres databases.

Not just the old boring data types and columns and relations. Yet, all of it was built on the same ACID-compliant, trustworthy base.

JSON (made you look) - 2012 If the world hadn’t taken notice of Postgres by now you were in for a wake up call. With the big data wave starting to cool off, but the rise of NoSQL databases (Mongo and Couchbase), it was clear developers wanted a different way of working with their data. Postgres listened, but then sort of cheated with its JSON support. In 9.2 it was JSON validation, but thrown into a text field. In reality we’d have to wait two more years before credible JSON support inside Postgres.

But. That didn’t stop people from starting to take notice. With the rise of Heroku for easy app deployment, Heroku Postgres being the default database, the tides from shared hosting of MySQL databases and your VPS for apps to PaaS, and more dedicated database infrastructure were growing.

JSON for real, for real - 2014 Postgres 9.3 was great and all, we got lateral joins, updatable foreign tables, checksums, and more. But 9.4, we got better JSON in the JSONB data type. This was a binary representation of JSON on disk, this meant GIN indexes could allow you to more easily index your data without having to index on very specific JSON functions. JSONB is still a data type and feature that blows minds when people first hear about it.

Postgres isn’t only for app developers wanting shiny features. Logical decoding laid the groundwork for future years of easier change data capture (CDC) in Postgres. Refreshing materialized views allowed for richer reporting use cases. Background workers enabled more functionality and creative use cases in particular for extensions.

It was around that 9.3 timeline that Amazon announced support at Re:Invent for PostgreSQL support on RDS. Having been at most Re:Invents and being in the room, it is the only one I can recall a complete standing ovation from the audience. The speaker was noticeably thrown off by the duration of the applause, this wasn’t Amazon announcing a new thing it was them caving in to constant customer requests.

9.5, 9.6, 10 - 2016 At this point we started to get to the point where headline features were less common, instead we see a theme of steady performance improvements and a continuation to enhance existing features. JSONB got support for inline updates, and we started to see more parallel execution come in. But it wasn’t all small updates, depending on your needs there may have been a major feature in here that you’d been dying for. A few highlights include:

Row level security

Logical replication

Table partitioning

This marching forward over several year was still largely the result of individual contributors. We also saw some concerted efforts of companies investing in particular areas like EnterpriseDB (focusing on Oracle compatibility), 2ndQuadrant (focusing on replication), Postgres Pro (focusing on JSONB), and Crunchy Data (focusing on security and cloud native).

Extensions - There all along, but a new resurgence Postgres extensions are like libraries, they change Postgres behavior without contributing to the main code. Postgres had long shipped with contrib extensions, some of these were heavily used and adopted… hstore was the key/value store that people loved before JSONB arrived in Postgres. pg_stat_statements got some major improvements making it easily one of the most valuable extensions in Postgres.

Outside of contrib, PostGIS moved in parallel to Postgres, transforming Postgres into the world’s most powerful and feature rich geospatial database. It wasn’t just a one trick pony, it introduced new operators, new functions, and data types. For a while PostGIS was the highlight example, first we saw some clever new extensions arrive like HyperLogLog. Then we saw entire companies be built around this idea of extending Postgres, first Citus, then Timescale, and now a plethora of companies.

So when was it? At its core maybe it has always been the same as it is today, a reliable database you can trust – after all isn’t that what you want in a database? And yet, other new shiny things may capture your attention for a short time only for you to come back to the reliable stable workhorse we’ve had for over 20 years.

Was it Heroku making it the default so every Rails developer just started with Postgres? Has to be given some credit.

Even today, people are still shocked to learn of the rich JSON support in Postgres, and it’s been there for over 10 years now.

The richness of its SQL support, the continued performance improvements and flexibility in how your work with your data can’t be dismissed.

My personal belief, had it not been for the combination of Heroku and JSON support there wouldn’t have been pressure for the hyperscalers to add support – had the hyperscalers not supported it you’d simply pick the more available and accessible thing.

But without the continued improvement on performance and features with a steady release cycle and focus on quality it would have never had the chance. To that the credit always goes to the developers and committers that contribute often thankless work for making Postgres the solid piece of software it is.

Enjoy this article?