It is too late for stories about Christmas, so here is a Postgres story about the following commit of Postgres 10:
commit: 6ef2eba3f57f17960b7cd4958e18aa79e357de2f
author: Andres Freund <[email protected]>
date: Thu, 22 Dec 2016 11:31:50 -0800
Skip checkpoints, archiving on idle systems.
Some background activity (like checkpoints, archive timeout, standby
snapshots) is not supposed to happen on an idle system. Unfortunately
so far it was not easy to determine when a system is idle, which
defeated some of the attempts to avoid redundant activity on an idle
system.
To make that easier, allow to make individual WAL insertions as not
being "important". By checking whether any important activity happened
since the last time an activity was performed, it now is easy to check
whether some action needs to be repeated.
Use the new facility for checkpoints, archive timeout and standby
snapshots.
The lack of a facility causes some issues in older releases, but in my
opinion the consequences (superflous checkpoints / archived segments)
aren't grave enough to warrant backpatching.
Author: Michael Paquier, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Amit Kapila, Kyotaro HORIGUCHI
Bug: #13685
Discussion:
https://www.postgresql.org/message-id/[email protected]
https://www.postgresql.org/message-id/CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA@mail.gmail.com
Backpatch:
Postgres 9.0 was the first release to introduce the parameter wal_level, with three different values: “minimal”, “archive” and “hot_standby”.
In 9.6, “archive” and “hot_standby” have been merged into “replica”, as both levels have no difference in
[...]I spend a lot of time trying to learn what’s already been implemented; as DBAs, we tend to live in that world. It’s important to have tools that let you quickly get the visual so you can get on with your job. One of the biggest ‘reverse engineering’ tasks that DBAs have to do is in the area of data modeling. There are a lot of tools that do this, but my favorite of late is SchemaSpy (schemaspy.sf.net). I’ve found that the output is complete and the ERDs are easy to read.
java -jar schemaSpy_5.0.0.jar -t pgsql -host localhost:5432 -db postgres \
  -dp ~/software/pg/drivers/jdbc/postgresql-9.4.1208.jre6.jar \
  -o ERD -u postgres -s myschema
SchemaSpy is nice because it can talk to many different types of databases… a side effect of this is a fairly complex set of command-line switches.
-t specifies the database type; -t pgsql specifies Postgres
-o specifies a directory to create the output in
-db, -u and -s name the database, user and schema, and -dp points at the JDBC driver jar (all visible in the command above)
All you need in order to run this is the SchemaSpy jar and a PostgreSQL JDBC driver. The output lands in the directory ‘ERD’; pulling up ERD/index.html gives you a nice page with a table list and some basics.

As a DBA, I really love the ‘Insertion Order’ and ‘Deletion Order’ links here. SchemaSpy reverse engineers the referential integrity chain for me! Clicking either one gives me a simple page, top to bottom with the right order!

Now for the real reason that I super-love SchemaSpy: the ‘Relationships’ tab. I’ve loaded up a pgbench schema. pgbench doesn’t actually create any real keys, but column names are consistent. SchemaSpy notices this and ‘infers’ relationships for me! This is huge: even without explicit keys, I can begin to infer what the developer intended (the estimated rowcount is helpful too).


I won’t force you to follow me through all of the tabs. If you’re looking for a schema visualization tool, give SchemaSpy a try.
Happy PostgreSQL-ing!
Relational databases are the first choice of data store for many applications due to their enormous flexibility and reliability. Historically the one knock against relational databases is that they can only run on a single machine, which creates inherent limitations when data storage needs outpace server improvements. The solution to rapidly scaling databases is to distribute them, but this creates a performance problem of its own: relational operations such as joins then need to cross the network boundary. Co-location is the practice of dividing data tactically, where one keeps related information on the same machines to enable efficient relational operations, but takes advantage of the horizontal scalability for the whole dataset.
The principle of data co-location is that all tables in the database have a common distribution column and are sharded across machines in the same way such that rows with the same distribution column value are always on the same machine, even across different tables. As long as the distribution column provides a meaningful grouping of data, relational operations can be performed within the groups.
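To make the principle concrete, here is a minimal sketch using the create_distributed_table function from the Citus extension (table and column names are mine, purely illustrative): two tables distributed by the same tenant_id column are sharded the same way, so rows for a given tenant land on the same node and can be joined there.
CREATE TABLE accounts (tenant_id int, account_id int, name text);
CREATE TABLE orders   (tenant_id int, account_id int, total numeric);

SELECT create_distributed_table('accounts', 'tenant_id');
SELECT create_distributed_table('orders', 'tenant_id');

-- A join on the common distribution column can run locally on each node:
SELECT a.name, sum(o.total)
  FROM accounts a
  JOIN orders o USING (tenant_id, account_id)
 GROUP BY a.name;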
In a previous article we described how to shard a multi-tenant application with co-location as the underlying technique. In this article, we zoom into how data co-location works in Citus and the benefits it provides.
The Citus extension for PostgreSQL is unique in being able to form a distributed database of databases. Every node in a Citus cluster is a fully functional PostgreSQL database and Citus adds the experience of a single homogenous database on top. While it does not provide the full functionality of PostgreSQL in a distributed way, in many cases it can take full advantage of features offered by PostgreSQL on a single machine through co-location, including full SQL support, transactions and foreign keys.
In Citus a row is stored in a shard if the hash of the value in the distribution column falls within the shard’s hash range.
[...]Do you know situations with unstable runtimes? You have a query which is sometimes slow and sometimes just lightning fast. The question is usually: why? Maybe this little post can be inspiring and shed some light on some of your performance issues, which might bug you in your daily life. Reasons for unstable runtimes There […]
The post Detecting unstable runtimes in PostgreSQL appeared first on Cybertec - The PostgreSQL Database Company.
Postgres 10 has support for an additional feature related to replication slots, a facility that retains WAL data in pg_xlog based on what the clients connected to the slots have consumed. The feature spoken about here has been implemented by the following commit:
Add support for temporary replication slots
This allows creating temporary replication slots that are removed
automatically at the end of the session or on error.
From: Petr Jelinek <[email protected]>
As the commit log already mentions, replication slots now have the possibility to be made temporary. When created, they are associated with the connection that created them, and once the connection session finishes, the slots are automatically dropped. Note as well that temporary slots persist no data to disk, so on a crash they are of course dropped as well (it’s not like Postgres crashes a lot anyway, per its reputation for stability).
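At the SQL level this shows up as an optional third argument to the slot-creation functions; a minimal sketch, with a slot name of my own choosing:
SELECT pg_create_physical_replication_slot('tmp_slot', true, true);
-- The third argument marks the slot as temporary: it vanishes when this session ends.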
One direct application of this feature is the case where a client needs WAL segments retained only while a single operation runs, a one-shot need, with the data kept consistent with what has been done so far. In existing Postgres versions, any application can create a slot, be it via SQL or the replication protocol, but any failure leaves cleanup logic to be run afterwards; if that cleanup never happens, WAL is retained indefinitely, and in the end the server goes down once the partition holding pg_xlog gets full.
Once this commit was in, a patch was sent for pg_basebackup to make use of temporary slots. This is a perfect match, as many users already use the stream mode of pg_basebackup to fetch WAL segments over a secondary replication connection, ensuring that all the segments needed for a full, consistent backup are included so the backup can be used as-is when restoring an instance.
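As a sketch (host and user names are illustrative), such a streaming backup looks like this:
$ pg_basebackup -h primary-host -U replicator -D /path/to/backup -X stream
With the patch, the WAL streamed over that second connection is protected by a temporary slot that is dropped automatically when the backup finishes, rather than by a slot someone has to remember to clean up.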
The main risk of using pg_basebackup without replication slots is to not be able to take a consistent backup as WAL segments may have been alr
[...]Postgres is great, but it can’t run itself in all cases. Things come up. Queries go awry. Hardware fails, and users leave transactions open for interminable lengths of time. What happens if one of these things occurs while the DBA themselves has a hardware fault? While they’re down for maintenance, someone still has to keep an eye on things. For the last PG Phriday of the year completely unrelated to my upcoming surgery, let’s talk about what happens when your DBA becomes inoperative due to medical complications.
This is Fake Postgres DBA 101!
When in doubt, SSH is the name of the game. Database accounts
are either locked-down or lack sufficient permissions to do
everything, but usually the postgres user itself has total access.
Invoke your favorite SSH client to connect to the host running the
database in question, and use sudo to become the
postgres user:
ssh my-db-host
sudo su -l postgres
Afterwards, it’s a good idea to add our public key to the
postgres user’s .ssh/authorized_keys file so we can
log in as the postgres user without the intermediate sudo. Here’s a
good
guide for doing that. If there’s configuration management like
salt or Puppet involved, that file is probably part of the stack
and needs to be modified there or it’ll be overwritten.
Either way, we’re in. If we’re lucky enough to be on a Debian
derivative like Ubuntu or Mint, we can use the
pg_lsclusters command to see which database instances
are running on this server. Here’s what that looks like on a sample
VM:
pg_lsclusters

Ver Cluster Port Status Owner    Data directory          Log file
9.5 main    5432 online postgres /data/pgsql/main/data   /var/log/postgresql/postgresql-9.5-main.log
9.6 96test  5440 online postgres /data/pgsql/96test/data /var/log/postgresql/postgresql-9.6-96test.log
The next thing we need is the psql command-line client, the nexus of Postgres interoperability. Postgres instances monitor the listed port for connection attempts, so if we wanted to target a specific instance, we’d point psql at the matching port.
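For example, to reach the 9.6 cluster from the listing above (a sketch assuming local peer authentication while logged in as the postgres user):
psql -p 5440 -d postgres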
If you are using transaction log shipping (or Point-In-Time-Recovery) in PostgreSQL to keep your data safe, you will surely already know about pg_basebackup. The idea behind pg_basebackup is to allow users to create a binary copy of the data, which can serve as the basis for Point-In-Time-Recovery. However, recently we have seen some support cases, […]
The post pg_basebackup: Creating self-sufficient backups appeared first on Cybertec - The PostgreSQL Database Company.
There is a nice lie out there. A lot of people want to believe it. They think that believing this lie will somehow gain them something. In some ways that is true, if you want what you are doing to be about you. But if you are a believer in Open Source, it isn't about you. It is about the community and bettering that community as a whole.
If we provide the videos of our sessions for free, you won't attend the conference.
The PgConf US conference grows every year and guess what, they provide their videos for free.
If you pay for the conference, we will provide the videos for free.
LinuxFest Northwest, which is larger than PgConf US, PgConf EU, Postgres Vision and CHAR(16) combined, also grows every year and provides its videos for free. It is also free to attend.
The next time someone says, "You won't show if we give you videos" or "Maybe if you paid to come", I recommend you attend an Open Source conference that follows Open Source values.
That isn't to say that it isn't a conference's right to not give free videos. Of course it is. It is also your right to only attend conferences about Open Source that embrace what Open Source is about. There is a reason I only speak at non-profit or community run conferences. I want to contribute to the greater good. That is also the reason why Command Prompt only sponsors community run or non-profit conferences.
That is also not to say that for-profit conferences about Open Source are bad. They aren't and more power to you if you can find a way to make money off a conference about Open Source. It is to say that Open Source conferences should follow Open Source values. If you are going to record and you are an Open Source conference, give the recording away.
Merry Christmas & Tally Ho!
Citus is a distributed database that extends (not forks) PostgreSQL. Citus does this by transparently sharding database tables across the cluster and replicating those shards.
After open sourcing Citus, one question that we frequently heard from users related to how Citus replicated data and automated node failovers. In this blog post, we intend to cover the two replication models available in Citus: statement-based and streaming replication. We also plan to describe how these models evolved over time for different use cases.
Going back to Citus’ first version, one of the early use cases that we looked at was analyzing large volumes of event data in real time. This data has the nice property that it is append-only and has a natural time dimension. Users could therefore batch those events together and load them into the distributed cluster.
These properties made parallel loads of event data relatively easy, without sacrificing consistency semantics. A coordinator node would keep metadata related to shards and shard placements (replicas) in the cluster. Clients would then talk to the coordinator node and exchange metadata on the shards to which to append event data. Once a client appended the related event data to the related shards, it would conclude the operation by updating the shard metadata on the coordinator node.

The simplified diagram above shows an example data load. The client tells the coordinator node that it wants to append events data to an append-distributed table. The coordinator node grants the client information about shard 6’s placements. The client then copies these events to the shard’s placements and updates the coordinator with related metadata. If the client fails to copy events to one of the nodes, it can either mark the related shard placement as invalid or abort the copy operation.
There are three important points to highlight in this example.
CHAR(16) Conference was a live conference in NYC held on 6 Dec 2016, focusing on Scalability for Business.
The Speaker Panel was recorded, so you can find out what happened here: https://youtu.be/eLjBoZr4dTc (1 hour)
Eric David Benari hosted an hour long “chat show” format, featuring
Other videos? No, sorry. Obviously, if you have to pay to attend but we post the videos for free, then fewer people will attend. So next time, please come along to the conference and find out what’s happening, ask questions and enjoy chatting with the attendees.
Hope to see you next year.
When I talk to various people about PostgreSQL,
based on where they are in the process of analyzing it, I get asked
many similar questions. One of the most frequently asked questions
is: What is the PostgreSQL licensing model? Here is my attempt to
address this question from the perspective of what the PostgreSQL
license means for your
business.
Let’s be honest with ourselves, we are really not in the habit of reading license agreements – they are best left to the legal department to decipher. And seriously, licenses have become so complex and so detailed that even if I try really really hard to read just a few clauses, my head starts to hurt.
The PostgreSQL License is different though.
I have reproduced the license in full at the end of this blog. Take a look. I guarantee it will take less than 5 minutes and you will understand every word of what it says. Straightforward, transparent, simple to understand – and this is just the beginning of how PostgreSQL makes your life easier.
Yup, you read that right. PostgreSQL is absolutely free to use, forever. All you need to do is download the database, install it, and start using it. You don’t need to get into any contract negotiations, you don’t need to worry about asking for permission from anyone, you don’t need to haggle over pricing.
By now you are probably thinking, “Yeah it’s free but that’s probably the limited edition. The features my organization needs are likely in the enterprise edition, and I would need to sell my kidney in order to get it.” You’d be wrong. There are no ‘editions’ of PostgreSQL. All features are available for you to use in the community version, which is free, forever.
Now you may be thinking – “Ok, so the community probably imposes various different limits to force organizations to pay before they can really scale”. Wrong again!
The PostgreSQL community does not impose any limits on the number of servers you install the database on, the amount of
[...]Since I have a passing interest in partitioning in PostgreSQL, I figured I’d check out a recent commit to the development branch of PostgreSQL 10
Implement table partitioning – https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63
Table partitioning is like table inheritance and reuses much of the existing infrastructure, but there are some important differences. The parent is called a partitioned table and is always empty; it may not have indexes or non-inherited constraints, since those make no sense for a relation with no data of its own. The children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint. Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed. Partitions can't have extra columns and may not allow nulls unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so tuple-routing ON INSERT triggers are not needed. Tuple routing isn't yet supported for partitions which are foreign tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned. List partitioning is limited to a single column, but range partitioning can involve multiple columns. A partitioning "column" can be an expression.

Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations. The tuple routing which this patch does based on the implicit partitioning constraints is an example of this, but it seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat, Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova, Rushabh Lathia, Erik Rijkers, among others.[...]
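To get a feel for the new syntax, here is a minimal sketch of a range-partitioned table (the table and partition names are mine, not from the commit):
CREATE TABLE measurements (
    city_id  int  NOT NULL,
    logdate  date NOT NULL,
    peaktemp int
) PARTITION BY RANGE (logdate);

-- Partitions hold the actual data; tuples inserted into the
-- parent are routed here automatically.
CREATE TABLE measurements_2016_12 PARTITION OF measurements
    FOR VALUES FROM ('2016-12-01') TO ('2017-01-01');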
First, if the word “Bloom” doesn’t yet resonate with you in a Postgres context, I’d recommend reading my previous introductory post on the topic here. This one is a follow-up aiming to set up a simple demo use case to exemplify “fields of application” for Bloom and also to gauge pros/cons/performance […]
The post Trying out Postgres Bloom indexes appeared first on Cybertec - The PostgreSQL Database Company.

While I know full well that Amazon’s marketing department doesn’t need my help, I cannot resist flagging this new development from the elves in Santa’s AWS workshop:
Today we are launching a preview of Amazon Aurora PostgreSQL-Compatible Edition. It offers … high durability, high availability, and the ability to quickly create and deploy read replicas. Here are some of the things you will love about it:
Performance – Aurora delivers up to 2x the performance of PostgreSQL running in traditional environments.
Compatibility – Aurora is fully compatible with the open source version of PostgreSQL (version 9.6.1). On the stored procedure side, we are planning to support Perl, pgSQL, Tcl, and JavaScript (via the V8 JavaScript engine). We are also planning to support all of the PostgreSQL features and extensions that are supported in Amazon RDS for PostgreSQL.
Cloud Native – Aurora takes full advantage of the fact that it is running within AWS.
The language Amazon uses around Aurora is really weird: they talk about “MySQL compatibility” and “PostgreSQL compatibility”. At an extreme, one might interpret that to mean that Aurora is a net-new database providing wire- and function-level compatibility with the target databases. However, in the PostgreSQL case, the fact that they are additionally supporting PostGIS, the server-side languages, really the whole database environment, hints strongly that most of the code is actually PostgreSQL code.
There is not a lot of reference material about what’s going on behind the scenes, but this talk from re:Invent shows that most of the action is in the storage layer. For MySQL, since storage back-ends are pluggable, it’s possible that AWS has added their own back-end. Alternately, they may be running a hacked up version of the InnoDB engine.
For PostgreSQL, with only one storage back-end, it’s pretty much a foregone conclusion that AWS have taken a fork and added some secret sauce to it. However, the fact that they are tracking the community version almost exactly (they curren
[...]I couldn’t find any graph showing all the relations between all
the pg_catalog tables,
so just for fun I wrote a little script to parse the SGML and
generate a graph using GraphViz.
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp qw(slurp);
use Data::Dumper;
open my $fh, "<:encoding(utf8)", './doc/src/sgml/catalogs.sgml' or die "$!";
my $table;
my $column;
my $references;
# Map: catalog table -> { column -> referenced catalog table }.
my $pg_catalog_fk_map = {};

# Walk the SGML documentation line by line, tracking the current table and column.
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m!^\s+<title><structname>([^<>]+)</> Columns</title>$!) {
        # Start of a new catalog table section.
        $table = $1;
    } elsif ($line =~ m!^\s+<entry><structfield>([^<>]+)</structfield></entry>$!) {
        # Column definition within the current table.
        $column = $1;
    } elsif ($line =~ m!^\s+<entry><type>(oid|regproc)</type></entry>$!) {
        # Reference-typed column: keep $column set and wait for the link entry.
    } elsif (defined $column && $line =~ m!^\s+<entry><literal><link\ linkend="[^"]+"><structname>([^<>]+)</structname></link>\.oid</literal></entry>$!) {
        $references = $1;
        if (!defined($pg_catalog_fk_map->{$table}->{$column})) {
            $pg_catalog_fk_map->{$table}->{$column} = $references;
        } elsif ($pg_catalog_fk_map->{$table}->{$column} ne $references) {
            die "Inconsistent data, $table.$column references both $pg_catalog_fk_map->{$table}->{$column} and $references";
        }
        $pg_catalog_fk_map->{$table}->{$column} = $references;
        # Ensure referenced tables show up as graph nodes even without outgoing edges.
        if (!defined($pg_catalog_fk_map->{$references})) {
            $pg_catalog_fk_map->{$references} = {};
        }
    } else {
        undef($column);
        undef($references);
    }
}
my $dot = qq!
digraph g {
graph [
rankdir = "LR"
];
node [
fontsize = "16"
shape = "ellipse"
];
edge [
];
!;
foreach my $table (sort keys %{$pg_catalog_fk_map}) {
    $dot .= qq!
"$table" [
headlabel = "$table"
label = "$table | <oid> oid|!;
    foreach my $column (sort keys %{$pg_catalog_fk_map->{$table}}) {
        my $references = $pg_catalog_fk_map->{$table}->{$column};
        $dot .= "<$colum[...]
Recently a coworker asked me this question:
Should I expect variance between minutes and hours for the same query?
And I was forced to give him this answer:
Potentially, but not commonly. Query planning is an inexact science, and regardless of the query being the “same query,” the data is not the “same data.” This isn’t generally the case, but on occasion, changes in data can affect the query execution path. Usually this is a good thing, as the database accounts for new value distributions.
For example, if there are a million distinct values in one column, but 90% of them are the same, certain values should trigger an index scan instead of a sequential scan. Those values will change over time, and if the stats don’t account for that, queries will have non-dependable performance. Of course, this introduces potential correlation assumptions that aren’t correct in some cases, and that also causes unreliable query performance. I guess the question is: which would you rather have?
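For the record, the distribution information in play here lives in the pg_stats view, and checking it is a quick way to see what the planner believes about a column (table and column names below are illustrative):
SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'my_table'
   AND attname = 'my_column';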
That answer, despite being “right”, didn’t sit well with me. While we did eventually sit down and optimize the query in question so it was less likely to mislead Postgres, there’s no reason to assume an end-user is going to readily accept unreliable query performance. Nor should they.
But we need to perform some analysis to determine how things got to this point. Let’s start with distribution. We have a table that contains about two weeks’ worth of data, represented by 66M rows over 100GB of space (135GB including indexes). We’re not scheduled to upgrade to 9.6 until early 2017, so the instance is running Postgres 9.5. It isn’t bleeding edge, but this is hardly an ancient installation.
Consider the query plan for a standard retrieval of the last day worth of data:
EXPLAIN SELECT * FROM new WHERE create_time >= '2016-12-08';

                                  QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on new  (cost=137498.78..8770565.51 rows=5819512 width=1032)
   Re[...]
libpq is getting some improvements in its connection strings to define some properties that are expected from the backend server thanks to the following commit, that will be part of Postgres 10:
commit: 721f7bd3cbccaf8c07cad2707826b83f84694832
author: Robert Haas <[email protected]>
date: Tue, 29 Nov 2016 12:18:31 -0500
libpq: Add target_session_attrs parameter.
Commit 274bb2b3857cc987cfa21d14775cae9b0dababa5 made it possible to
specify multiple IPs in a connection string, but that's not good
enough for the case where you have a read-write master and a bunch of
read-only standbys and want to connect to whichever server is the
master at the current time. This commit allows that, by making it
possible to specify target_session_attrs=read-write as a connection
parameter.
[...]
Victor Wagner and Mithun Cy. Design review by Álvaro Herrera, Catalin
Iacob, Takayuki Tsunakawa, and Craig Ringer; code review by me. I
changed Mithun's patch to skip all remaining IPs for a host if we
reject a connection based on this new parameter, rewrote the
documentation, and did some other cosmetic cleanup.
In short, a new parameter called target_session_attrs is added, and it can use the following values: “any”, to accept a connection to any kind of server (this is the default), or “read-write”, to require a server that accepts read-write transactions.
The strings using this parameter can have the following format; both normal connection strings and URIs are of course supported:
host=host1 target_session_attrs=any
host=host1,host2 port=port1,port2 target_session_attrs=any
postgresql://host1:port2,host2:port2/?target_session_attrs=read-write
When attempting for example to connect to a standby, libpq would complain as follows:
$ psql -d "postgresql://localhost:5433/?target_session_attrs=read-write"
psql: could not make a writable connection to server "localhost:5433"
This feature finds its strength with for example a cluster in the shape of a primary and one or more standbys. If a failo
[...]CHAR(16) Conference on Scalability for Business went very well in the heart of New York City.
Hotel Pennsylvania was a great venue: convenient for travel, and well set up for a conference, with coffee, food and evening drinks.
Jonathan Katz was great as MC for the conference, while Eric David Benari did a great job of hosting the Speaker Panel and keeping everybody on their toes with surprising and revealing questions. We’ll be posting the video for that sometime soon.
All the presentations were well received and the room was full all day long. Corridor feedback was that the speakers were good and the content was “dense” and very much in line with the specialised nature of the conference. Many good detailed questions after each talk, always a good sign people are tuned in.
Thanks to everybody that contributed, sponsored or attended. See you again soon. Go Postgres!
Anytime you run a query in Postgres, it needs to compile your SQL into a lower-level plan explaining how exactly to retrieve the data. The more it knows about the tables involved, the smarter the planner can be. To get that information, it gathers statistics about the tables and stores them, predictably enough, in the system table known as pg_statistic. The SQL command ANALYZE is responsible for populating that table. It can be done per-cluster, per-database, per-table, or even per-column. One major pain about analyze is that every table *must* be analyzed after a major upgrade. Whether you upgrade via pg_dump, pg_upgrade, Bucardo, or some other means, the pg_statistic table is not copied over and the database starts as a clean slate. Running ANALYZE is thus the first important post-upgrade step.
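For reference, the per-table and per-column forms are plain SQL (names illustrative):
ANALYZE mytable;
ANALYZE mytable (mycolumn);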
Unfortunately, analyze can be painfully slow. Slow enough that the default analyze methods sometimes take longer than the entire rest of the upgrade! Although this article will focus on the pg_upgrade program in its examples, the lessons may be applied to any upgrade method. The short version of the lessons is: run vacuumdb in parallel, control the stages yourself, and make sure you handle any custom per-column statistics.
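As a first approximation, the parallel, staged approach boils down to one command (the job count is illustrative and should match your hardware):
$ vacuumdb --all --analyze-in-stages --jobs=4
The --analyze-in-stages option builds rough statistics quickly so the database is usable almost immediately, then refines them in further passes; --jobs spreads the work over multiple connections.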
Before digging into the solution in more detail, let's see why all of this is needed. Doesn't pg_upgrade allow for super-fast Postgres major version upgrades, including the system catalogs? It does, with the notable exception of the pg_statistic table. The nominal reason for not copying the data is that the table format may change from version to version. The real reason is that nobody has bothered to write the conversion logic yet, for pg_upgrade could certainly copy the pg_statistic information: the table has not changed for many years.
At some point, a DBA will wonder if it is possible to simply copy the pg_statistic table from one database to another manually. Alas, it contains columns of the type "anyarray", which means it cannot be dumped and restored:
$ pg_dump[...]
If you are a DBA and have ever had to manually call pg_terminate_backend() to kill some misbehaving backend process, then this tool might be of interest.
Long-running queries are not a problem, as long as they don’t force other important processes to wait.
Another classic is a human user forgetting to COMMIT.
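The kind of manual intervention being automated here looks roughly like this (the one-hour threshold is my illustration, not the tool’s policy):
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
   AND now() - state_change > interval '1 hour';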
Please see the Github page for more info:
https://github.com/trustly/pgterminator