Determine better indexing and partitioning with hypopg

I ran into the challenge to determine the best partitioning key on an Amazon RDS Postgres database. Normal approach would be to start a project trying to implement this on a development environment. But in most cases, like during a QuickScan, the customer expects an educated guess in as little time as possible.\

It concerns running production environments where you would rather not release all kinds of scripts and analysis tooling to avoid disrupting the primary process. Which then also takes up unnecessary extra time.

Extensions available

PostgreSQL has a lot of extensions available to aid in all sorts of things. Also for determining the best partitioning key. Part of the challenge is that not all extensions are supported on Amazon RDS PostgreSQL. You can find a list here of supported extension by version here:

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts

HypoPG

I decided to have a go with HypoPG and used the following approach:

spinned up a Postgres 11 local Vagrant
downloaded the beta version because that one has partitioning in it https://github.com/HypoPG/hypopg/releases/tag/2.0.0beta
extracted the software and tried to build the extension, but it required some additional packages and variables:

tar xvfz 2.0.0beta.tar.gz cd hypopg-2.0.0beta/ export PATH=$PATH:/usr/pgsql-11/bin/ yum install postgresql11-devel -y yum install postgresql11-libs -y yum install epel-release -y yum install centos-release-scl -y yum install postgresql11-devel -y yum install gcc -y make install

now the extension is ready for use and can be enabled in a user database with:
CREATE EXTENSION hypopg;In my specific case I wanted to test partitioning on a column called user_id. I first made sure this column was part of every unique constraint, as well as the primary key. In most cases I had to add the column to the unique indexes and primary key. Next steps were:select hypopg_partition_table('shop_order','PARTITION BY RANGE(user_id)'); select tablename from hypopg_add_partition('shop_order_1_10000', 'PARTITION OF shop_order FOR VALUES FROM (1) TO (10000)'); select tablename from hypopg_add_partition('shop_order_10000_20000', 'PARTITION OF shop_order FOR VALUES FROM (10000) TO (20000)'); .... SELECT * FROM hypopg_create_index('CREATE INDEX ON shop_order (user_id)');Executing an explain or explain analyze will show you the use of the partitioning.
Example:
SELECT "shop_order"."uuid" FROM "shop_order" WHERE ("shop_order"."user_id" = 39626 AND "shop_order"."processed_at" IS NULL);

This query has the following current execution plan:
Bitmap Heap Scan on shop_order (cost=51.42..4297.60 rows=490 width=16)
Recheck Cond: ((user_id 39626))
Filter: (processed_at IS NULL)
-> Bitmap Index Scan on shop_order_list_idx (cost=0.00..51.29 rows=1073 width=0)
Index Cond: ((user_id = 39626))

With partitioning on user_id the query plan would look like:
Append (cost=0.03..8.06 rows=1 width=16)
-> Index Scan using btree_shop_order_user_id_order_cr on shop_order shop_order_30000_40000 (cost=0.03..8.05 rows=1 width=16)
Index Cond: ((user_id = 39626))
Filter: (processed_at IS NULL)

Simulate in the real environment without directly affecting that environment

In live production environments it is often difficult to make real-time changes or make partitioning analyses. This takes a lot of time and you simply don’t have it. With HypoPG you simulate ‘session-bound’ without directly influencing the real environment, while you do use that real environment. This gives you a realistic picture of what you want to analyze without having to set up a development environment.

Using each other’s strength

If you want to create an index analysis in another situation, it works the same way. HypoPG is a wonderful tool and illustrates what it is like to work in a valuable community with open source. By working together in this way, you make use of both the available tooling and each other’s strengths.

Curious

U can always contact us.