Having covered SSL certificate creation and the use of certificate authorities (CA), I would like to put it all together and show how certificates and certificate authorities work to ensure trusted Postgres communication.
I have created a diagram showing server, client, and certificate authority certificates. None of these certificates is secret; for example, the server sends its SSL certificate to the client, and vice versa. In the diagram, the server and client use the same certificate authority certificate. (Intermediate certificate authorities could also be used.)
When the client connects, the server sends its certificate to the client. The client uses the public key in its certificate authority certificate to verify that the server certificate was signed by its trusted certificate authority (the red line). It then uses the public key in the server certificate to encrypt a secret key that is sent to the server. Only a server with the matching private key can reply to generate a session key. It is not the possession of the server certificate that proves the server's identity but the possession of the private key that matches the public key stored in the server's certificate. The same is true for client certificates used for client host and user authentication (the blue line).
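This is the check libpq performs internally when connecting with sslmode=verify-ca or verify-full. As a rough, non-Postgres-specific illustration of the client side of the handshake above, this is how a TLS client context is configured to demand a CA-signed server certificate using Python's standard library (the CA file name is hypothetical):

```python
import ssl

# Client-side context: require a server certificate signed by a trusted CA,
# and check that the certificate matches the host name we connect to.
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)

# In a setup like the diagram's, the trusted CA certificate would be
# loaded explicitly (file name is illustrative):
# ctx.load_verify_locations(cafile="root.crt")

assert ctx.verify_mode == ssl.CERT_REQUIRED  # no CA-signed certificate, no connection
assert ctx.check_hostname                    # certificate must match the host name
```

As described above, passing this check only proves the server holds a CA-signed certificate; it is the TLS key exchange against the server's private key that proves identity.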
Indexes are an essential tool for optimizing database performance and are becoming ever more important with big data. However, as the volume of data increases, index maintenance often becomes a write bottleneck, especially for advanced index types that use a lot of CPU time for every row that gets written. Index creation may also become prohibitively expensive, taking hours or even days to build a new index on terabytes of data in Postgres. As of Citus 6.0, we’ve made creating and maintaining indexes much faster through parallelization.
Citus can be used to distribute PostgreSQL tables across many machines. One of the many advantages of Citus is that you can keep adding machines with more CPUs, so you can keep increasing your write capacity even if indexes are becoming the bottleneck. As of Citus 6.0, CREATE INDEX can also be performed in a massively parallel fashion, allowing fast index creation on large tables. Moreover, the COPY command can write multiple rows in parallel when used on a distributed table, which greatly improves performance for use cases that involve bulk ingestion (e.g. sensor data, click streams, telemetry).
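Bulk loading from a client follows the same idea regardless of driver: the client just streams rows into COPY ... FROM STDIN. A minimal sketch of building such a stream (the psycopg2 call that would consume it is shown as a comment, since it needs a live connection; the table and column names follow the github_events example below):

```python
import csv
import io

def rows_to_copy_buffer(rows):
    """Serialize rows into a CSV buffer suitable for COPY ... FROM STDIN WITH CSV."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return buf

buf = rows_to_copy_buffer([(1, "PushEvent"), (2, "ForkEvent")])

# With a psycopg2 cursor, the bulk load itself would then be:
# cur.copy_expert("COPY github_events (event_id, event_type) FROM STDIN WITH CSV", buf)
```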
To show the benefits of parallel indexing, we’ll walk through a small example of indexing ~200k rows containing large JSON objects from the GitHub archive. To run the examples, we set up a formation using Citus Cloud consisting of 4 worker nodes with 4 cores each, running PostgreSQL 9.6 with Citus 6.
You can download the sample data by running the following commands:
wget http://examples.citusdata.com/github_archive/github_events-2015-01-01-{0..24}.csv.gz
gzip -d github_events-*.gz
Next, let's create the table for the GitHub events, first as a regular PostgreSQL table, and then distribute it across the 4 nodes:
CREATE TABLE github_events (
event_id bigint,
event_type text,
event_public boolean,
repo_id bigint,
payload jsonb,
repo jsonb,
actor jsonb,
org jsonb,
created_at timestamp
);
-- (distributed table only) Shard the table [...]

For a while now, I've been annoyed with how difficult it is to set up good backups in PostgreSQL. The difficulty of doing this "right" has pushed people to use things like pg_dump for backups, which is not really a great option once your database reaches any non-toy size. And when visiting customers over the years I've seen a large number of home-written scripts to do PITR backups, most of them broken, and most of that breakage happening because the APIs provided were too difficult to use.
Over some time, I've worked on a number of ways to improve this situation, alone or with others. The bigger steps are:
For 10.0, there are a couple of new things that have been done in the past couple of weeks:
While reviewing my notes on some handy Postgres tricks and nasty gotchas to include in an on-site training course, my “current me” again learned some tricks that an older version of “me” had luckily written down. So here are two simple tricks that hopefully even a lot of Postgres power users find surprising. Disabling JOIN re-ordering by […]
The post Two simple Postgres tips to kick-start year 2017 appeared first on Cybertec - The PostgreSQL Database Company.
Here are some news from the front of Postgres 10 development, with the highlight of the following commit:
commit: de41869b64d57160f58852eab20a27f248188135
author: Tom Lane <[email protected]>
date: Mon, 2 Jan 2017 21:37:12 -0500
Allow SSL configuration to be updated at SIGHUP.
It is no longer necessary to restart the server to enable, disable,
or reconfigure SSL. Instead, we just create a new SSL_CTX struct
(by re-reading all relevant files) whenever we get SIGHUP. Testing
shows that this is fast enough that it shouldn't be a problem.
In conjunction with that, downgrade the logic that complains about
pg_hba.conf "hostssl" lines when SSL isn't active: now that's just
a warning condition not an error.
An issue that still needs to be addressed is what shall we do with
passphrase-protected server keys? As this stands, the server would
demand the passphrase again on every SIGHUP, which is certainly
impractical. But the case was only barely supported before, so that
does not seem a sufficient reason to hold up committing this patch.
Andreas Karlsson, reviewed by Michael Banck and Michael Paquier
Discussion: https://postgr.es/m/[email protected]
This has been wanted for a long time. In some environments where Postgres is deployed, there may be CA and/or CRL files installed by default that the user wants to replace with custom entries. Still, in most cases the problem to deal with is the replacement of expired keys. Previously, after replacing anything that feeds the SSL context, a restart of the instance was necessary to rebuild it properly. While paying the cost of an instance restart may be fine for some users, those who care about availability do not want to take down a server, so this new feature will be helpful for many people.
All the SSL parameters are affected by this change, and they are the following:
Note however that there
[...]

There is a smorgasbord of database engines out there. From an outside perspective, Postgres is just another entry in a steadily growing pile of structured data storage mechanisms. As with programming languages like Rust and Go, it’s the new and shiny database systems like MongoDB that tend to garner the most attention. On the other hand, more established engines like Oracle or MySQL have a vast lead that seems insurmountable. In either case, enthusiasm and support are likely to be better represented in exciting or established installations.
So why? Why out of the myriad choices available, use Postgres? I tend to get asked this question by other DBAs or systems engineers that learn I strongly advocate Postgres. It’s actually a pretty fair inquiry, so why not make it the subject of the first PG Phriday for 2017? What distinguishes it from its brethren so strongly that I staked my entire career on it?
Postgres isn’t new. It didn’t enjoy the popularity that practically made MySQL a household name as part of the LAMP stack. It didn’t infiltrate corporations several decades ago as the de facto standard for performance and security like Oracle. It isn’t part of a much larger supported data environment like SQL Server. It isn’t small and easy like SQLite. It’s not a distributed hulk like Hadoop, or enticingly sharded like MongoDB or Cassandra. It’s not in-memory hotness like VoltDB.
It’s just a regular, plain old ACID RDBMS.

It does, after all, have all of the basics many expect in an RDBMS:
Certain… other database platforms weren’t so complete. As a consequence, Postgres was preferred by those who knew the difference and needed that extra functionality without breaking the bank. It’s not much, but it’s a great way to develop a niche. From there, things get more interesting.
For the most part, being boring but reliable was a fact of life until the release of 9.0 when Postgres introduced s
[...]

Having covered the choice of certificate authorities, I want to explain the internals of creating server certificates in Postgres. The instructions are already in the Postgres documentation.
When using these instructions for creating a certificate signing request (CSR), two files are created:
(It is also possible to use an existing key file.) You can view the contents of the CSR using openssl, e.g.:
Using Postgres with Python is easy and provides first-class database capabilities for applications and data processing. When starting a new project, you need to choose which PostgreSQL Python driver to use. The best choice depends on your deployment, Python version, and background. Here’s a quick guide to choosing among three popular Python PostgreSQL drivers:
Psycopg2
This is the most common and widely supported Python PostgreSQL driver. It provides a DB-API compatible wrapper on top of the native libpq PostgreSQL library. It supports automatic mapping of query results to Python dictionaries as well as named tuples, and is commonly used by ORMs and frameworks, including SQLAlchemy. Because psycopg2 uses libpq, it supports the same environment variables as libpq for connection properties (PGDATABASE, PGUSER, PGHOST, etc.) as well as a .pgpass file, and it supports COPY directly for bulk loading. If you use Postgres from multiple languages, this driver will feel the most “native PG.”
But the libpq dependency requires the libpq-dev and python-dev packages on Linux, or installer packages on Windows and macOS for the shared objects/DLLs. That's not an issue for a single developer machine or server, but it is hard to package for a cross-platform Python app (like our BigSQL DevOps management tool) or to use in a PyPy runtime.
pg8000
An actively maintained, pure-Python DB-API compatible driver that works across operating systems and in many different Python environments, such as PyPy and Jython. With no native dependencies, it is easy to package pg8000 for distribution with an application, on Python 2 or Python 3. pg8000 tries to stay focused on just PostgreSQL database access, ignoring some of the convenience functions in libpq. For example, pg8000 does not directly support a .pgpass file or environment variables, but you can use the pgpasslib project in conjunction with it. pg8000 does not include the many extras psycopg2 offers to retrieve dictionaries or named tuples, but it does support most data types.
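To give an idea of what pgpasslib handles for you, a minimal .pgpass lookup could be sketched as follows (illustrative only: the real format also allows backslash-escaped ':' characters inside fields, which this ignores):

```python
def lookup_pgpass(lines, host, port, dbname, user):
    """Find a password in .pgpass-style lines: host:port:db:user:password.
    A '*' in any of the first four fields matches anything."""
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blanks and comments
        h, p, d, u, pw = line.rstrip("\n").split(":", 4)
        if all(pat in ("*", val)
               for pat, val in ((h, host), (p, str(port)), (d, dbname), (u, user))):
            return pw
    return None

pgpass = [
    "# sample entries (hypothetical hosts and users)",
    "db1.example.com:5432:sales:alice:s3cret",
    "*:*:*:bob:hunter2",
]
```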
pg8000 is a good choice for distributing with
[...]

Most people using PostgreSQL database systems are aware of the fact that the database engine has to send changes to the so-called “Write-Ahead Log” (= WAL) to ensure that in case of a crash the database will be able to recover to a consistent state safely and reliably. However, not everybody is aware […]
The post Checkpoint distance and amount of WAL appeared first on Cybertec - The PostgreSQL Database Company.
When setting up SSL in Postgres, you can't just enable SSL. You must also install a signed certificate on the server.
The first step is to create a certificate signing request (CSR) file that contains the host name of the database server. Once created, there are three ways to sign a CSR to create a certificate:
If the certificate is to be self-signed, use the key created along with the certificate signing request to create a certificate. If using a local certificate authority, sign the CSR file with the local certificate authority's key.
Many DBAs agree that one of the most useful extensions in their arsenal is pg_repack. pg_repack addresses a serious shortcoming in Postgres: doing VACUUM FULL online. Due to the way Postgres handles MVCC, tables and indexes become bloated. Dead tuples are addressed by AUTOVACUUM, and their space is marked free. In many situations a VACUUM FULL becomes unavoidable, because AUTOVACUUM just leaves the scattered, fragmented free space in the table as it is. The DBA may have to run VACUUM FULL to release such free space back to disk.
Unfortunately, VACUUM FULL requires an exclusive lock on the table for the duration of the operation, so it can’t be performed while the table is being used. This is where pg_repack comes to the rescue of the DBA: it performs the equivalent of a VACUUM FULL almost fully online (there can be a momentary lock).
The popularity of this extension among DBAs led the Postgres by BigSQL project to add pg_repack as a ready-to-install package. Installing pg_repack is quite easy with the pgc command line utility.
$ ./pgc list

This displays all installable versions of extensions. Let’s install repack13-pg96 (the package version depends on the Postgres version we have installed). Installing this is pretty straightforward:
$ ./pgc install repack13-pg96
Installation of pg_repack doesn’t require either a restart or reload. As is commonly known, the pg_repack installation has 2 components: the actual extension and a client tool to invoke the pg_repack functionality. We can create the extension in the desired database as follows from the psql command interface:
postgres=# \c db1
postgres=# CREATE EXTENSION pg_repack;
CREATE EXTENSION
To create a test environment, run pgbench from the command line:
pgbench -U postgres -i -s 10 db1
This produced a pgbench_accounts table of 128 MB. To create bloat, I ran the following update twice.
db1=# update pgbench_accounts set abalance=abalance;
This caused the table to grow to 384 MB. In a couple of minutes, AUTOVACUUM kicked in and cleaned up all the dead tuples, as expected.
db1=# select n_live_tu
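The counters being queried here (n_live_tup, n_dead_tup) come from pg_stat_user_tables, and turning them into a rough bloat indicator is simple. A small illustrative helper (not part of pg_repack):

```python
def dead_tuple_ratio(n_live_tup, n_dead_tup):
    """Fraction of a table's tuples that are dead, per pg_stat_user_tables.
    A persistently high value suggests bloat worth repacking."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0

# After two full-table updates, roughly two dead versions exist per live
# row (until autovacuum runs), so the ratio approaches 2/3:
ratio = dead_tuple_ratio(1_000_000, 2_000_000)
```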
Version 2.1 of Barman, backup and recovery manager for PostgreSQL, was released Thursday, Jan. 5.
The new release, along with several bugfixes, introduces preliminary support for the upcoming PostgreSQL 10, and adds the --archive option to the switch-xlog command.
The new --archive option is especially useful when setting up a new server.
Until now, the switch-xlog command forced the PostgreSQL server to switch to a different transaction log file. Now Barman also offers the --archive option, which triggers WAL archiving after the xlog switch and makes Barman wait for the archival of the closed WAL file.
By default, Barman expects to receive the WAL within 30 seconds; the number of seconds to wait can be changed using the --archive-timeout option.
If the switch-xlog command returns an error, it means no WAL file has been archived and the Barman server is not able to receive WALs from PostgreSQL.
This option allows the users to test the entire WAL archiving process, identifying configuration issues.
The Barman dev team is very happy about this small release. Containing primarily bug fixes, it increases the robustness of Barman thanks to the feedback received through the Barman mailing list and the GitHub issue tracker.
If you are interested in helping us by sponsoring the development, even partially, drop us a line ([email protected]).
Website
Download
Online Documentation
Man page, section 1
Man page, section 5
Support
It seems each week when I’m reviewing data with someone a feature comes up that they had no idea existed within Postgres. In an effort to continue documenting many of the features and functionality that are useful, here’s a list of just a few that you may find handy the next time you’re working with your data.
This one I’ve covered before, but it’s worth restating. psql is a great tool that already comes with Postgres. If you’re comfortable on the CLI, you should consider giving it a try. You can even set up your own .psqlrc so that it’s well customized to your liking. In particular, turning \timing on is especially useful. But even with all sorts of customization, if you’re not aware that you can use your preferred editor via \e, then you’re missing out. This will open up the last run query, let you edit it, save, and then it’ll run for you. Vim, Emacs, even Sublime Text works: just take your pick by setting your $EDITOR variable.
Ever sit at a terminal running a query over and over to see if something on your system changed? If you’re debugging something, whether locally or live in production, watching data change can be key to figuring it out. Instead of re-running your query yourself, you can simply use the \watch command in psql, which will re-run your query automatically every few seconds.
```sql
SELECT now() - query_start,
       state, query
FROM pg_stat_activity \watch
```
I love JSONB as a datatype. Yes, in some cases it won’t be optimal for performance (though at times it can be perfectly fine). If I’m hitting some API that returns a ton of data, I’m usually not using all of it right away. But you never know when you’ll want to use the rest of it. I use Clearbit this way today, and for safety’s sake I save the full JSON result instead of de-normalizing it. Unfortunately, when you query this in Postgres you get one giant compressed blob of JSON text. Yes, you could pipe out to something like jq, or you could simply use Postgres’ built-in function to make it
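(The built-in function alluded to is presumably jsonb_pretty, available since Postgres 9.5.) The same readability trick on the client side is plain indent-formatting; in Python:

```python
import json

# A compact JSON blob as it might come back from a query (sample data).
raw = '{"type": "PushEvent", "repo": {"id": 42, "name": "postgres/postgres"}}'

# Re-serialize with indentation -- the client-side equivalent of
# pretty-printing in SQL.
pretty = json.dumps(json.loads(raw), indent=4)
print(pretty)
```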
[...]

When: 6-8pm Thursday Jan 19, 2017
Where: iovation
Who: Mark Wong
What: pglogical
Mark Wong will give an overview of pglogical, the latest replication option for Postgres. It’s a lower-impact option than trigger-based replication, and features include the ability to replicate only the databases and tables you choose from a cluster. Part of the talk will cover use cases and future development plans.
Find out more here: https://2ndquadrant.com/en/resources/pglogical/
Mark leads the 2ndQuadrant performance practice as a Performance Consultant for English Speaking Territories, based out of Oregon in the USA.
—
If you have a job posting or event you would like me to announce at
the meeting, please send it along. The deadline for inclusion is
5pm the day before the meeting.
—
Our meeting will be held at iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!
iovation provides us a light dinner (usually sandwiches or pizza).
Elevators open at 5:45 and building security closes access to the floor at 6:30.
See you there!
During normal server shutdown, sessions are disconnected, dirty shared buffers and pending write-ahead log (WAL) records are flushed to durable storage, and a clean shutdown record is written to pg_control. During the next server start, pg_control is checked, and if the previous shutdown was clean, startup can ignore the WAL and start immediately.
Unfortunately, a clean shutdown can take some time, and impatient database administrators might get into the habit of using kill -9 or pg_ctl -m immediate to quicken the shutdown. While this does have the intended effect, and you will not lose any committed transactions, it greatly slows down the next database startup because all WAL generated since the last completed checkpoint must be replayed. You can identify an unclean shutdown by looking at the server logs for these two ominous lines:
LOG: database system was interrupted; last known up at 2016-10-25 12:17:28 EDT
LOG: database system was not properly shut down; automatic recovery in progress
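Scripted checks for this are trivial; a sketch of a log scan one might run from monitoring (the marker string is the one that appears in the log output above):

```python
def was_unclean_shutdown(log_lines):
    """Return True if the server log shows crash recovery after an unclean shutdown."""
    marker = "database system was not properly shut down"
    return any(marker in line for line in log_lines)

# Sample log lines, matching the messages shown above.
log = [
    "LOG:  database system was interrupted; last known up at 2016-10-25 12:17:28 EDT",
    "LOG:  database system was not properly shut down; automatic recovery in progress",
]
```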
Using a managed service is a very attractive proposition. You are offloading a heck of a lot of worry, especially when it comes to something as complicated and, let’s face it, specialized as a database. Someone else will set it up for you, and back it up, and keep it running, without you having to worry overmuch about it. However, there are downsides. You can only get what the manager is offering. Often that’s good enough. But needs change, and I have often seen people start with managed services, only to find that they want more than they can get that way.
Just yesterday I received a complaint that the Redis Foreign Data Wrapper, which I have done a lot of work on, is not available on Amazon RDS. And that’s completely understandable. Amazon only provide a limited number of extensions, and this isn’t one of them. At least one other managed service, Heroku, does offer this extension, but there are others it doesn’t offer.
So the lesson is: choose your managed service, or even whether to use a managed service at all, very carefully, taking into account both your current needs and your likely future needs.
Today’s post, the first one of 2017, is about the following feature of the upcoming Postgres 10:
commit: 3901fd70cc7ccacef1b0549a6835bb7d8dcaae43
author: Fujii Masao <[email protected]>
date: Mon, 19 Dec 2016 21:15:30 +0900
Support quorum-based synchronous replication.
This feature is also known as "quorum commit" especially in discussion
on pgsql-hackers.
This commit adds the following new syntaxes into synchronous_standby_names
GUC. By using FIRST and ANY keywords, users can specify the method to
choose synchronous standbys from the listed servers.
FIRST num_sync (standby_name [, ...])
ANY num_sync (standby_name [, ...])
The keyword FIRST specifies a priority-based synchronous replication
which was available also in 9.6 or before. This method makes transaction
commits wait until their WAL records are replicated to num_sync
synchronous standbys chosen based on their priorities.
The keyword ANY specifies a quorum-based synchronous replication
and makes transaction commits wait until their WAL records are
replicated to *at least* num_sync listed standbys. In this method,
the values of sync_state.pg_stat_replication for the listed standbys
are reported as "quorum". The priority is still assigned to each standby,
but not used in this method.
The existing syntaxes having neither FIRST nor ANY keyword are still
supported. They are the same as new syntax with FIRST keyword, i.e.,
a priority-based synchronous replication.
Author: Masahiko Sawada
Reviewed-By: Michael Paquier, Amit Kapila and me
Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com>
Many thanks to the various individuals who were involved in
discussing and developing this feature.
9.6 introduced the possibility to specify multiple synchronous standbys by extending the syntax of synchronous_standby_names. For example, values like ‘N (standby_1, standby_2, … , standby_M)’ allow a primary server to wait for commit confirmations from N standbys among the set of M nodes defined in the list given by the user, depending on
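Concretely, the two methods described in the commit message would be configured like this (standby names are illustrative):

```
# postgresql.conf
# Priority-based (the pre-10 behavior, now spelled explicitly):
synchronous_standby_names = 'FIRST 2 (node_a, node_b, node_c)'

# Quorum-based: wait until ANY two of the three listed standbys confirm.
# synchronous_standby_names = 'ANY 2 (node_a, node_b, node_c)'
```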
[...]

So, what’s weird about this plan, from a query on a partitioned table? (PostgreSQL 9.3, in this case.)
test=> explain select distinct id from orders where order_timestamp > '2016-05-01';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=347341.56..347343.56 rows=200 width=10)
Group Key: orders.id
-> Append (cost=0.00..337096.10 rows=4098183 width=10)
-> Seq Scan on orders (cost=0.00..0.00 rows=1 width=178)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Index Scan using orders_20160425_order_timestamp_idx on orders_20160425 (cost=0.43..10612.30 rows=120838 width=10)
Index Cond: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Seq Scan on orders_20160502 (cost=0.00..80539.89 rows=979431 width=10)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Seq Scan on orders_20160509 (cost=0.00..74780.41 rows=909873 width=10)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Seq Scan on orders_20160516 (cost=0.00..68982.25 rows=845620 width=10)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Seq Scan on orders_20160523 (cost=0.00..65777.68 rows=796054 width=10)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
-> Seq Scan on orders_20160530 (cost=0.00..36403.57 rows=446366 width=10)
Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
(17 rows)
That estimate on the HashAggregate certainly looks wonky, doesn’t it? Just 200 rows even with a huge number of rows below it?
What if we cut down the number of partitions being hit?
test=> explain[...]

One of the most common questions from new PostgreSQL users is “how do I connect to the database server?” The nuances of pg_hba.conf, and how to correctly enable your Python web app to connect to your db server without opening up connections to all users and all servers, are not simple for someone new to Postgres. And architecting a secure multi-tenant SaaS system requires knowledge of how roles, schemas, databases, and search paths interact. That’s one reason we wrote a Security Whitepaper a while back.
But, after seeing thousands of MongoDB instances taken hostage by ransomware, the “no authorization required” default for MongoDB is looking like a very dumb idea. Just imagine what executives whose developers picked MongoDB are saying today:
“You mean we store our client list in a database without security?”
“Anyone can just delete our NoSQL database from the internet?”
“Were we hacked last year when you said we lost data in MongoDB?”
So, a quick “Thank You” to PostgreSQL for making sure that your data is Secure By Default.
Mirror mirror on the wall,
Who is the fairest Database of all?
A frequently asked question, certainly.
DB-Engines recently announced its DBMS of the Year. Maybe the cool thing is that PostgreSQL is in 3rd place. Yee-ha, an open source project is up there!
Let’s look closely at what this means.
PostgreSQL.org’s agreed response was this…
“It’s great to see the continued success of PostgreSQL being reflected in DB-Engines rankings. It’s clear that the strength of the following for the World’s Most Advanced Open Source Database is enough to outweigh the largest software companies as people continue to choose to move away from commercial databases.”
though because of commercial sensitivity this was changed down to this
“It’s great to see the continued success of PostgreSQL being reflected in DB-Engines rankings. It’s clear that the strength of the following for the World’s Most Advanced Open Source Database is enough to draw people away from commercial databases.”
What were the commercial sensitivities? (What about “open source sensitivities”? Well, blame me, because I agreed to the change.)
Well, the title of the post is that a Microsoft product is actually DBMS of the Year, even though it’s not ranked #1 on the main list; that’s still Oracle. And Postgres is #3 for DBMS of the Year, even though we moved up to #5 again on the main list, competing with MongoDB for position 4 (although mostly level).
My guess is that Microsoft would like to highlight how it gets more press than PostgreSQL, a point I would concede in an instant. Whether that means it is more popular or has better features is a different thing entirely. People are simply leaving commercial databases in droves to come to PostgreSQL, which is clearly reflected in the very public decline of Oracle licencing revenues over the last 10 quarters, and I’m sure it’s just the same for Microsoft revenue.
The purpose of the announcement from PostgreSQL.org was to highlight that “the strength of the following for the World’s Most Advanced Open Source Database is enough to outweigh t
[...]

Unlike other database systems, Postgres makes the cleanup process visible and tunable to users. Autovacuum performs recycling of old rows and updates optimizer statistics. It appears in ps command output, in the pg_stat_activity system view, and optionally in the server logs via log_autovacuum_min_duration.
Postgres also allows fine-grained control over the autovacuum cleanup process. Occasionally users find that cleanup is slowing the system down, and rather than modifying the behavior of autovacuum, they decide to turn it off via the autovacuum setting.
However, turning off autovacuum can cause problems. Initially the system will run faster since there is no cleanup overhead, but after a while old rows will clog up user tables and indexes, leading to increasing slowness. Once that happens, you can turn on autovacuum again, and it will recycle the old rows and free up space, but there will be much unused space that can't be reused quickly, or perhaps ever.
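Rather than setting autovacuum = off, the overhead can usually be throttled instead; a sketch of postgresql.conf settings (values are illustrative, not recommendations):

```
autovacuum = on                       # keep cleanup running
autovacuum_vacuum_cost_delay = 20ms   # pause between chunks of cleanup work
autovacuum_vacuum_cost_limit = 200    # amount of work done before each pause
```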
As we say welcome to a new year, we have a couple of updates to the finances and payment handling in PostgreSQL Europe, that will affect our members and attendees of our events.
First of all, PostgreSQL Europe has unfortunately been forced to VAT register. This means that most of our invoices (details below) will now include VAT.
Second, we have enabled a new payment provider for those of you that can't or prefer not to use credit cards but that still allows for fast payments.
A few weeks back, I noticed the following tweet from Michael Paquier:

And my first thought was "that can't be right" (spoiler: Turns out it wasn't. But almost.)
The second thought was "hmm, I wonder how that has actually changed over time". And of course, with today being a day off and generally "slow pace" (ahem), what better way to spend it than to analyze the data that we have. The PostgreSQL mailing list archives are all stored in a PostgreSQL database, of course, so running the analytics is a quick job.
The one thing that everyone knows about composite indexes is: if you have an index on (A, B, C), it can’t be used for queries on (B) or (B, C) or (C), just (A), (A, B) or (A, B, C), right? I’ve said that multiple times in talks. It’s clearly true, right?
Well, no, it’s not. It’s one of those things that is not technically true, but it is still good advice.
The documentation on multi-column indexes is pretty clear:
A multicolumn B-tree index can be used with query conditions that involve any subset of the index’s columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned.
Let’s try this out!
First, create a table and index:
xof=# CREATE TABLE x (
xof(# i integer,
xof(# f float,
xof(# g float
xof(# );
CREATE TABLE
xof=# CREATE INDEX ON x(i, f, g);
CREATE INDEX
And fill it with some test data:
xof=# INSERT INTO x SELECT 1, random(), random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# INSERT INTO x SELECT 2, random(), random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# INSERT INTO x SELECT 3, random(), random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# ANALYZE x;
ANALYZE
And away we go!
xof=# EXPLAIN ANALYZE SELECT SUM(g) FROM x WHERE f BETWEEN 0.11 AND 0.12;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=599859.50..599859.51 rows=1 width=8) (actual time=91876.057..91876.057 rows=1 loops=1)
-> Index Only Scan using x_i_f_g_idx on x (cost=0.56..599097.71 rows=304716 width=8) (actual time=1820.699..91652.409 rows=300183 loop[...]
auth sufficient pam_google_authenticator.so
On a PostgreSQL primary / secondary pair, it’s very important to monitor replication lag. Increasing replication lag is often the first sign of trouble, such as a network issue, the secondary disconnecting for some reason (or for no reason at all, which does happen rarely), disk space issues, etc.
You can find all kinds of complex scripts that do math on the various WAL positions available from the secondary and from pg_stat_replication on the primary.
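For reference, the "math on WAL positions" those scripts do boils down to interpreting an LSN such as 16/B374D848 as a 64-bit byte position and subtracting; a minimal sketch:

```python
def lsn_to_bytes(lsn):
    """Convert a PostgreSQL LSN string like '16/B374D848' to a byte offset."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def lag_bytes(primary_lsn, standby_lsn):
    """Replication lag in bytes between two LSNs (e.g. sent vs. replayed)."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(standby_lsn)
```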
Or you can do this. It’s very cheap and cheerful, and for many installations, it gets the job done.
First, on the primary (and thus on the secondary), we create a one-column table:
CREATE TABLE replication_monitor (
    last_update TIMESTAMPTZ
);
Then, we insert a single row into the table (you can probably already see where this is going):
INSERT INTO replication_monitor VALUES(now());
Having that, we can start a cron job that runs every minute, updating that value:
* * * * * /usr/bin/psql -U postgres -c "update replication_monitor set last_update=now()" postgres > /dev/null
On the secondary (which is kept in sync with the primary via
NTP, so make sure ntpd is running on both!), we have a script, also
run from cron, that complains if the value in the table has fallen more than
a certain amount behind now(). Here’s a (pretty basic)
Python 2 version:
#!/usr/bin/python
import sys
import psycopg2
conn = psycopg2.connect("dbname=postgres user=postgres")
cur = conn.cursor()
cur.execute("select (now()-last_update)>'5 minutes'::interval from replication_monitor")
problem = cur.fetchone()[0]
if problem:
    print >>sys.stderr, "Replication lag over 5 minutes."
We make sure we get the output from stderr for cron
jobs on the secondary, set it up to run every so often, and we’re
done!
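The comparison the script performs is plain interval arithmetic. Here is a Python 3 sketch of just that check, pulled out as a standalone function (the function name and the five-minute threshold are illustrative; the real script reads last_update from the table):

```python
from datetime import datetime, timedelta, timezone

def replication_lagging(last_update, now=None, threshold=timedelta(minutes=5)):
    """Return True if the monitor row's timestamp has fallen more than
    `threshold` behind the current time -- the same test as
    (now() - last_update) > '5 minutes'::interval in SQL."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last_update) > threshold

# A row updated six minutes ago is flagged; one minute ago is fine.
now = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
print(replication_lagging(now - timedelta(minutes=6), now=now))  # True
print(replication_lagging(now - timedelta(minutes=1), now=now))  # False
```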
Of course, this has its limitations:
It only has ±2 minutes of resolution, based on how often cron runs. For a basic “is replication working?” check, this is probably fine.
It creates traffic in the replication stream and WAL, but if you are really worried about a TIM
It is too late for Christmas stories, so here is a Postgres story about the following commit in Postgres 10:
commit: 6ef2eba3f57f17960b7cd4958e18aa79e357de2f
author: Andres Freund <[email protected]>
date: Thu, 22 Dec 2016 11:31:50 -0800
Skip checkpoints, archiving on idle systems.
Some background activity (like checkpoints, archive timeout, standby
snapshots) is not supposed to happen on an idle system. Unfortunately
so far it was not easy to determine when a system is idle, which
defeated some of the attempts to avoid redundant activity on an idle
system.
To make that easier, allow to make individual WAL insertions as not
being "important". By checking whether any important activity happened
since the last time an activity was performed, it now is easy to check
whether some action needs to be repeated.
Use the new facility for checkpoints, archive timeout and standby
snapshots.
The lack of a facility causes some issues in older releases, but in my
opinion the consequences (superflous checkpoints / archived segments)
aren't grave enough to warrant backpatching.
Author: Michael Paquier, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Amit Kapila, Kyotaro HORIGUCHI
Bug: #13685
Discussion:
https://www.postgresql.org/message-id/[email protected]
https://www.postgresql.org/message-id/CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA@mail.gmail.com
Backpatch:
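The mechanism the commit describes can be modeled in a few lines: only "important" WAL insertions count as activity, and a periodic task skips itself when nothing important has happened since its last run. This is a hypothetical sketch, not the actual C implementation:

```python
class WALActivityTracker:
    """Toy model: count only 'important' WAL insertions, and let a
    periodic task (checkpoint, archive timeout, standby snapshot)
    skip itself when the system has been idle since its last run."""
    def __init__(self):
        self.important_insertions = 0
        self.seen_at_last_checkpoint = 0

    def insert_record(self, important=True):
        if important:
            self.important_insertions += 1

    def checkpoint_needed(self):
        if self.important_insertions == self.seen_at_last_checkpoint:
            return False  # idle since last checkpoint: skip it
        self.seen_at_last_checkpoint = self.important_insertions
        return True

wal = WALActivityTracker()
wal.insert_record()                   # real activity
print(wal.checkpoint_needed())        # True: first checkpoint runs
print(wal.checkpoint_needed())        # False: idle since then, skipped
wal.insert_record(important=False)    # e.g. a standby-snapshot record
print(wal.checkpoint_needed())        # False: still considered idle
```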
Postgres 9.0 was the first release to introduce the parameter wal_level, with three different values: minimal, archive, and hot_standby.
In 9.6, “archive” and “hot_standby” were merged into “replica”, as the two levels had no difference in
I spend a lot of time trying to learn what’s already been implemented; as DBAs, we tend to live in that world. It’s important to have tools that let you quickly get the visual so you can get on with your job. One of the biggest ‘reverse engineering’ tasks that DBAs have to do is in the area of data modeling. There are a lot of tools that do this, but my favorite of late is SchemaSpy (schemaspy.sf.net). I’ve found that the output is complete and the ERDs are easy to read.
java -jar schemaSpy_5.0.0.jar -t pgsql -host localhost:5432 -db postgres \
    -dp ~/software/pg/drivers/jdbc/postgresql-9.4.1208.jre6.jar \
    -o ERD -u postgres -s myschema
SchemaSpy is nice because it can talk to many different types of databases… a side effect of this is a fairly complex set of command-line switches.
-t specifies the type, -t pgsql specifies postgres
-o specifies a directory to create the output in
All you need in order to run this is the SchemaSpy jar and a PostgreSQL JDBC driver. The output is created in the directory ‘ERD’; pulling up ERD/index.html gives you a nice page with a table list and some basics.

As a DBA, I really love the ‘Insertion Order’ and ‘Deletion Order’ links here. SchemaSpy reverse engineers the referential integrity chain for me! Clicking either one gives me a simple page, top to bottom with the right order!

Now for the real reason that I super-love SchemaSpy. The
‘Relationships’ tab. I’ve loaded up a pgbench schema. pgbench
doesn’t actually create any real keys, but column names are
consistent. SchemaSpy notices this and ‘infers’ relationships for
me! This is huge: even without explicit keys, I can begin to infer
what the developer intended (the estimated row count is helpful too).
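The kind of inference SchemaSpy performs can be illustrated with a toy heuristic: treat any non-key column whose name matches another table’s primary-key column as an implied foreign key. SchemaSpy’s real rules are more involved; the schema below is just the usual pgbench layout, and the function name is my own:

```python
def infer_relationships(tables):
    """tables: {name: {'pk': [...], 'cols': [...]}}.
    Return (table, column, referenced_table) triples where a non-key
    column's name matches another table's primary-key column."""
    rels = []
    for tname, t in tables.items():
        for col in t['cols']:
            if col in t['pk']:
                continue  # the table's own key is not a reference
            for other, o in tables.items():
                if other != tname and col in o['pk']:
                    rels.append((tname, col, other))
    return sorted(rels)

pgbench = {
    'pgbench_branches': {'pk': ['bid'], 'cols': ['bid', 'bbalance']},
    'pgbench_tellers':  {'pk': ['tid'], 'cols': ['tid', 'bid', 'tbalance']},
    'pgbench_accounts': {'pk': ['aid'], 'cols': ['aid', 'bid', 'abalance']},
}
print(infer_relationships(pgbench))
# [('pgbench_accounts', 'bid', 'pgbench_branches'),
#  ('pgbench_tellers', 'bid', 'pgbench_branches')]
```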


I won’t force you to follow me through all of the tabs. If you’re looking for a schema visualization tool, give SchemaSpy a try.
Happy PostgreSQL-ing!