Posted by Bruce Momjian in EnterpriseDB on 2017-01-18 at 02:45

Having covered SSL certificate creation and the use of certificate authorities (CA), I would like to put it all together and show how certificates and certificate authorities work to ensure trusted Postgres communication.

I have created a diagram showing server, client, and certificate authority certificates. None of these certificates is secret, e.g. the server sends its SSL certificate to the client, and vice versa. In the diagram, the server and client use the same certificate authority certificate. (Intermediate certificate authorities could also be used.)

When the client connects, the server sends its certificate to the client. The client uses the public key in its certificate authority certificate to verify that the server certificate was signed by its trusted certificate authority (the red line). It then uses the public key in the server certificate to encrypt a secret key that is sent to the server. Only a server with the matching private key can reply to generate a session key. It is not the possession of the server certificate that proves the server's identity but the possession of the private key that matches the public key stored in the server's certificate. The same is true for client certificates used for client host and user authentication (the blue line).
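The signature check described above (the red line) can be reproduced outside Postgres with openssl. A minimal sketch with throwaway keys; all file names and the CN values are illustrative:

```shell
# Create a self-signed CA certificate (its public key verifies signatures).
openssl req -new -x509 -days 1 -nodes -subj "/CN=demo-ca" \
    -keyout ca.key -out ca.crt
# Create a server key and certificate signing request.
openssl req -new -nodes -subj "/CN=db.example.com" \
    -keyout server.key -out server.csr
# Sign the server CSR with the CA key, producing the server certificate.
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -days 1 -out server.crt
# The client-side trust check: verify the server certificate against the CA.
openssl verify -CAfile ca.crt server.crt
```

On success the last command reports `server.crt: OK`, which is exactly the property the client relies on before encrypting anything with the server's public key.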

Continue Reading »

Posted by Marco Slot in CitusData on 2017-01-17 at 16:02

Indexes are an essential tool for optimizing database performance and are becoming ever more important with big data. However, as the volume of data increases, index maintenance often becomes a write bottleneck, especially for advanced index types which use a lot of CPU time for every row that gets written. Index creation may also become prohibitively expensive, as it may take hours or even days to build a new index on terabytes of data in Postgres. As of Citus 6.0, we’ve made creating and maintaining indexes that much faster through parallelization.

Citus can be used to distribute PostgreSQL tables across many machines. One of the many advantages of Citus is that you can keep adding more machines with more CPUs such that you can keep increasing your write capacity even if indexes are becoming the bottleneck. As of Citus 6.0 CREATE INDEX can also be performed in a massively parallel fashion, allowing fast index creation on large tables. Moreover, the COPY command can write multiple rows in parallel when used on a distributed table, which greatly improves performance for use-cases which can use bulk ingestion (e.g. sensor data, click streams, telemetry).

To show the benefits of parallel indexing, we’ll walk through a small example of indexing ~200k rows containing large JSON objects from the GitHub archive. To run the examples, we set up a formation using Citus Cloud consisting of 4 worker nodes with 4 cores each, running PostgreSQL 9.6 with Citus 6.

You can download the sample data by running the following commands:

wget http://examples.citusdata.com/github_archive/github_events-2015-01-01-{0..24}.csv.gz
gzip -d github_events-*.gz

Next, let’s create the table for the GitHub events once as a regular PostgreSQL table, and then distribute it across the 4 nodes:

CREATE TABLE github_events (
    event_id bigint,
    event_type text,
    event_public boolean,
    repo_id bigint,
    payload jsonb,
    repo jsonb,
    actor jsonb,
    org jsonb,
    created_at timestamp
);

-- (distributed table only) Shard the table 
[...]
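As a sketch of the kind of index the excerpt goes on to build (the index actually used in the post may differ), an index on the jsonb payload is created with plain DDL; on a Citus distributed table the statement is propagated to every shard and built in parallel across the workers:

```sql
-- Illustrative only: a GIN index on the jsonb payload column.
-- On a distributed table, Citus runs this on all shards in parallel.
CREATE INDEX github_events_payload_idx
    ON github_events USING GIN (payload);
```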

For a while now, I've been annoyed with how difficult it is to set up good backups in PostgreSQL. The difficulty of doing this "right" has pushed people to use things like pg_dump for backups, which is not really a great option once your database reaches any non-toy size. And when visiting customers over the years I've seen a large number of home-written scripts to do PITR backups, most of them broken, and most of that breakage because the APIs provided were too difficult to use.

Over some time, I've worked on a number of ways to improve this situation, alone or with others. The bigger steps are:

  • 9.1 introduced pg_basebackup, making it easier to take base backups using the replication protocol
  • 9.2 introduced transaction log streaming to pg_basebackup
  • 9.6 introduced a new version of the pg_start_backup/pg_stop_backup APIs that are needed to do more advanced base backups, in particular using third party backup tools.
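As an illustration of the first two items, a base backup with streamed transaction log can be taken with a single command. A sketch only: the host, role, and target directory are placeholders, and a reachable server with a replication-enabled role is assumed:

```
# Take a base backup over the replication protocol, streaming the
# transaction log alongside it (-X stream) so the backup is
# self-contained; -P shows progress.
pg_basebackup -h db.example.com -U replicator \
    -D /var/backups/base -X stream -P
```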

For 10.0, there are a couple of new things that have been done in the past couple of weeks:

While reviewing my notes on some handy Postgres tricks and nasty gotchas for an on-site training course, my “current me” again learned some tricks which an older version of “me” had luckily written down. So here are two simple tricks that hopefully even a lot of Postgres power-users will find surprising. Disabling JOIN re-ordering by […]

The post Two simple Postgres tips to kick-start year 2017 appeared first on Cybertec - The PostgreSQL Database Company.

Here are some news from the front of Postgres 10 development, with the highlight of the following commit:

commit: de41869b64d57160f58852eab20a27f248188135
author: Tom Lane <[email protected]>
date: Mon, 2 Jan 2017 21:37:12 -0500
Allow SSL configuration to be updated at SIGHUP.

It is no longer necessary to restart the server to enable, disable,
or reconfigure SSL.  Instead, we just create a new SSL_CTX struct
(by re-reading all relevant files) whenever we get SIGHUP.  Testing
shows that this is fast enough that it shouldn't be a problem.

In conjunction with that, downgrade the logic that complains about
pg_hba.conf "hostssl" lines when SSL isn't active: now that's just
a warning condition not an error.

An issue that still needs to be addressed is what shall we do with
passphrase-protected server keys?  As this stands, the server would
demand the passphrase again on every SIGHUP, which is certainly
impractical.  But the case was only barely supported before, so that
does not seem a sufficient reason to hold up committing this patch.

Andreas Karlsson, reviewed by Michael Banck and Michael Paquier

Discussion: https://postgr.es/m/[email protected]

This has been wanted for a long time. In some environments where Postgres is deployed, there could be CA and/or CRL files installed by default, and the user may want to replace them with custom entries. Still, in most cases the problem to deal with is the replacement of expired keys. Until now, after replacing anything that required a reload of the SSL context, a restart of the instance was necessary to rebuild it properly. While some users may find the cost of an instance restart acceptable, users who care about availability do not want to take down a server, so this new feature will be helpful to many people.

All the SSL parameters are affected by this change, and they are the following ones:

  • ssl
  • ssl_ciphers
  • ssl_prefer_server_ciphers
  • ssl_ecdh_curve
  • ssl_cert_file
  • ssl_key_file
  • ssl_ca_file
  • ssl_crl_file

Note however that there

[...]
Posted by Shaun M. Thomas on 2017-01-13 at 20:59

There is a smorgasbord of database engines out there. From an outside perspective, Postgres is just another on a steadily growing pile of structured data storage mechanisms. Similarly to programming languages like Rust and Go, it’s the new and shiny database systems like MongoDB that tend to garner the most attention. On the other hand, more established engines like Oracle or MySQL have a vastly larger lead that seems insurmountable. In either case, enthusiasm and support is likely to be better represented in exciting or established installations.

So why? Why out of the myriad choices available, use Postgres? I tend to get asked this question by other DBAs or systems engineers that learn I strongly advocate Postgres. It’s actually a pretty fair inquiry, so why not make it the subject of the first PG Phriday for 2017? What distinguishes it from its brethren so strongly that I staked my entire career on it?

Boring!

Postgres isn’t new. It didn’t enjoy the popularity that practically made MySQL a household name as part of the LAMP stack. It didn’t infiltrate corporations several decades ago as the de facto standard for performance and security like Oracle. It isn’t part of a much larger supported data environment like SQL Server. It isn’t small and easy like SQLite. It’s not a distributed hulk like Hadoop, or enticingly sharded like MongoDB or Cassandra. It’s not in-memory hotness like VoltDB.

It’s just a regular, plain old ACID RDBMS.

You can't explain that!

It does, after all, have all of the basics many expect in an RDBMS:

  • Tables, Views, Sequences, etc.
  • Subqueries
  • Functions in various languages
  • Triggers
  • Point In Time Recovery

Certain… other database platforms weren’t so complete. As a consequence, Postgres was preferred by those who knew the difference and needed that extra functionality without breaking the bank. It’s not much, but it’s a great way to develop a niche. From there, things get more interesting.

Durababble

For the most part, being boring but reliable was a fact of life until the release of 9.0 when Postgres introduced s

[...]
Posted by Bruce Momjian in EnterpriseDB on 2017-01-12 at 21:15

Having covered the choice of certificate authorities, I want to explain the internals of creating server certificates in Postgres. The instructions are already in the Postgres documentation.

When using these instructions for creating a certificate signing request (CSR), two files are created:

  • certificate signing request file with extension req
  • key file, containing public and private server keys, with extension pem

(It is also possible to use an existing key file.) You can view the contents of the CSR using openssl, e.g.:

Continue Reading »

Posted by Josh Berkus in pgExperts on 2017-01-12 at 00:26
Those of you in the PostgreSQL community will have noticed that I haven't been very active for the past year.  My new work on Linux containers and Kubernetes has been even more absorbing than I anticipated, and I just haven't had a lot of time for PostgreSQL work.

For that reason, as of today, I am stepping down from the PostgreSQL Core Team.

I joined the PostgreSQL Core Team in 2003.  I decided to take on project advocacy, with the goal of making PostgreSQL one of the top three databases in the world.  Thanks to the many contributions by both advocacy volunteers and developers -- as well as the efforts by companies like EnterpriseDB and Heroku -- we've achieved that goal.  Along the way, we proved that community ownership of an OSS project can compete with, and ultimately outlast, venture-funded startups.

Now we need new leadership who can take PostgreSQL to the next phase of world domination.  So I am joining Vadim, Jan, Thomas, and Marc in clearing the way for others.

I'll still be around and still contributing to PostgreSQL in various ways, mostly around running the database in container clouds.  It'll take a while for me to hand off all of my PR responsibilities for the project (assuming that I ever hand all of them off).

It's been a long, fun ride, and I'm proud of the PostgreSQL we have today: both the database, and the community.  Thank you for sharing it with me.
Posted by David Rader in OpenSCG on 2017-01-10 at 12:01

Using Postgres with python is easy and provides first class database capabilities for applications and data processing. When starting a new project, you need to choose which PostgreSQL python driver to use. The best choice depends on your deployment, python version, and background. Here’s a quick guide to choosing between three popular python PostgreSQL drivers:

Psycopg2

This is the most common and widely supported Python PostgreSQL driver. It provides a DB-API compatible wrapper on top of the native libpq PostgreSQL library. It supports automatic mapping of query results to Python dictionaries as well as named tuples, and is commonly used by ORMs and frameworks, including SQLAlchemy. Because psycopg2 uses libpq, it supports the same environment variables as libpq for connection properties (PGDATABASE, PGUSER, PGHOST, etc.) as well as using a .pgpass file. It also supports COPY directly for bulk loading. If you use Postgres from multiple languages, this driver will feel the most “native PG.”

But the libpq dependency requires the libpq-dev and python-dev packages on Linux, or install packages on Windows and macOS for the shared objects/DLLs. Not an issue for a single developer machine or server, but hard to package for a cross-platform Python app (like our BigSQL DevOps management tool) or to use in a PyPy runtime.

pg8000

An actively maintained, pure Python DB-API compatible driver that works across operating systems and in many different Python environments, such as PyPy and Jython. With no native dependencies, it is easy to package pg8000 for distribution with an application on Python 2 or Python 3. pg8000 tries to stay focused on just PostgreSQL database access, ignoring some of the convenience functions in libpq. For example, pg8000 does not directly support a .pgpass file or environment variables, but you can use the pgpasslib project in conjunction with it. pg8000 does not include the many extras psycopg2 provides to retrieve dictionaries or named tuples, but it does support most data types.
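Since pg8000 leaves .pgpass handling to you (or to pgpasslib), it helps to remember the file's format: one `hostname:port:database:username:password` entry per line. A hand-rolled lookup as a sketch; the entries are made up, and this deliberately ignores the `*` wildcard matching the real format allows:

```shell
# Build a demo password file in .pgpass format.
cat > pgpass.demo <<'EOF'
db.example.com:5432:mydb:appuser:s3cret
*:*:*:backup:hunter2
EOF
# Look up the password for a specific host/database/user triple.
awk -F: '$1=="db.example.com" && $3=="mydb" && $4=="appuser" {print $5}' pgpass.demo
```

This prints `s3cret`; a real implementation (as in pgpasslib) also honours wildcards and the port field.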

pg8000 is a good choice for distributing with

[...]

Most people using PostgreSQL database systems are aware of the fact that the database engine has to send changes to the so called “Write Ahead Log” (= WAL) to ensure that in case of a crash the database will be able to recover to a consistent state safely and reliably. However, not everybody is aware […]

The post Checkpoint distance and amount of WAL appeared first on Cybertec - The PostgreSQL Database Company.

When setting up SSL in Postgres, you can't just enable SSL. You must also install a signed certificate on the server.

The first step is to create a certificate signing request (CSR) file that contains the host name of the database server. Once the CSR is created, there are three ways to sign it to create a certificate:

If the certificate is to be self-signed, use the key created by the certificate signing request to create a certificate. If using a local certificate authority, sign the CSR file with the local certificate authority's key.

Continue Reading »

Posted by Jobin Augustine in OpenSCG on 2017-01-09 at 11:58

Many DBAs agree that one of the most useful extensions in their arsenal is pg_repack. Pg_repack addresses a serious shortcoming in Postgres: doing VACUUM FULL online. Due to the way Postgres handles MVCC, tables and indexes become bloated. Dead tuples are addressed by AUTOVACUUM, and their space is marked as free. But in many situations a VACUUM FULL becomes unavoidable, because AUTOVACUUM just leaves the scattered, fragmented free space in the table as it is. The DBA may have to do a VACUUM FULL to release such free space back to disk.

Unfortunately VACUUM FULL requires an exclusive lock on the table for the duration of the operation, so it can’t be performed while the table is being used. This is where pg_repack comes to the rescue of the DBA, performing a VACUUM FULL almost fully online (there could be a momentary lock).

The popularity of this extension among DBAs led the Postgres by BigSQL project to add pg_repack as a ready to install package.

Installation

Installing pg_repack is quite easy with the pgc command line utility.

$ ./pgc list

displays all installable versions of extensions. Let’s install repack13-pg96 (the package version depends on the Postgres version we have installed). Installing it is pretty straightforward:

$ ./pgc install repack13-pg96

Installation of pg_repack requires neither a restart nor a reload. The pg_repack installation has two components: the actual extension and a client tool that invokes the pg_repack functionality. We can create the extension in the desired database as follows from the psql command interface:

postgres=# \c db1
postgres=# CREATE EXTENSION pg_repack;
CREATE EXTENSION
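For reference, the client-tool side of pg_repack is invoked from the shell. A sketch, using the database and table names from the test environment below; options follow the pg_repack documentation:

```
# Repack a single table online; pg_repack takes the database name as
# its final argument and --table to restrict the run to one table.
pg_repack --table=pgbench_accounts db1
```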

Test Environment

To create a test environment, run pgbench from the command line:

pgbench -U postgres -i -s 10 db1

This produced a pgbench_accounts table of 128 MB. To create bloat, I ran the following update twice:

db1=# update pgbench_accounts set abalance=abalance;

This caused the table to grow to 384 MB. In a couple of minutes, AUTOVACUUM kicked in and cleaned up all the dead tuples as expected.

db1=# select n_live_tu

[...]

Barman 2.1

Version 2.1 of Barman, backup and recovery manager for PostgreSQL, was released Thursday, Jan. 5.

The new release, along with several bugfixes, introduces preliminary support for the upcoming PostgreSQL 10, and adds the --archive option to the switch-xlog command.

switch-xlog --archive

The new --archive option is especially useful when setting up a new server.

Until now, the switch-xlog command simply forced the PostgreSQL server to switch to a different transaction log file. Now, Barman also offers the --archive option, which triggers WAL archiving after the xlog switch and forces Barman to wait for the archival of the closed WAL file.

By default Barman expects to receive the WAL file within 30 seconds; the number of seconds to wait can be changed using the --archive-timeout option.
If the switch-xlog command returns an error, it means no WAL file has been archived and the Barman server is not able to receive WALs from PostgreSQL.

This option allows users to test the entire WAL archiving process and identify configuration issues.
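Put together, the check for a Barman-managed server named pg (an illustrative server name) looks like:

```
# Switch the xlog, trigger archiving, and wait up to 60 seconds for
# the closed WAL file to reach Barman.
barman switch-xlog --archive --archive-timeout 60 pg
```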

Conclusions

The Barman dev team is very happy about this small release. Containing primarily bug fixes, it increases the robustness of Barman thanks to the feedback received through the Barman mailing list and the GitHub issue tracker.

If you are interested in helping us by sponsoring the development, even partially, drop us a line ([email protected]).

Links

Website
Download
Online Documentation
Man page, section 1
Man page, section 5
Support

Posted by Craig Kerstiens on 2017-01-08 at 08:00

It seems each week when I’m reviewing data with someone a feature comes up that they had no idea existed within Postgres. In an effort to continue documenting many of the features and functionality that are useful, here’s a list of just a few that you may find handy the next time you’re working with your data.

Psql, and \e

This one I’ve covered before, but it’s worth restating. Psql is a great client that already comes with Postgres. If you’re comfortable on the CLI you should consider giving it a try. You can even set up your own .psqlrc so that it’s customized to your liking. In particular, turning \timing on is especially useful. But even with all sorts of customization, if you’re not aware that you can use your preferred editor via \e, you’re missing out. This will open the last run query in your editor; edit it, save, and then it’ll run for you. Vim, Emacs, even Sublime Text works; just take your pick by setting your $EDITOR variable.
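A minimal ~/.psqlrc along those lines might contain (a sketch; every line is an ordinary psql meta-command):

```
\set QUIET 1
\timing on
\x auto
\pset null '(null)'
\set QUIET 0
```

The QUIET bracketing just suppresses the confirmation chatter these commands would otherwise print at startup.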

Watch

Ever sit at a terminal running a query over and over to see if something on your system changed? If you’re debugging something, whether locally or even live in production, watching data change can be key to figuring out what’s happening. Instead of re-running your query by hand, you can simply use the \watch command in psql, which will re-run your query automatically every few seconds.

SELECT now() - query_start,
       state,
       query
FROM pg_stat_activity
\watch

JSONB pretty print

I love JSONB as a datatype. Yes, in some cases it won’t be optimal for performance (though at times it can be perfectly fine). If I’m hitting some API that returns a ton of data, I’m usually not using all of it right away. But you never know when you’ll want to use the rest of it. I use Clearbit this way today, and for safety’s sake I save the full JSON result instead of de-normalizing it. Unfortunately, when you query this in Postgres you get one giant compressed block of JSON text. Yes, you could pipe out to something like jq, or you could simply use Postgres’ built in function to make it

[...]
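Picking up the jq suggestion from the truncated excerpt above: outside the database, the same one-line JSON can be pretty-printed with standard tooling. The sample document is made up:

```shell
# Pretty-print a compressed JSON value with Python's stdlib;
# jq does the same with: ... | jq .
echo '{"payload": {"action": "opened", "number": 42}}' | python3 -m json.tool
```

This prints the object indented four spaces per nesting level.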
Posted by gabrielle roth on 2017-01-07 at 23:14

When: 6-8pm Thursday Jan 19, 2017
Where: iovation
Who: Mark Wong
What: pglogical

Mark Wong will give an overview of pglogical, the latest replication option for Postgres. It’s a lower-impact option than trigger-based replication, and features include the ability to replicate only the databases and tables you choose from a cluster. Part of the talk will cover use cases and future development plans.

Find out more here: https://2ndquadrant.com/en/resources/pglogical/

Mark leads the 2ndQuadrant performance practice as a Performance Consultant for English Speaking Territories, based out of Oregon in the USA.


If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

iovation provides us a light dinner (usually sandwiches or pizza).

Elevators open at 5:45 and building security closes access to the floor at 6:30.

See you there!


Posted by Bruce Momjian in EnterpriseDB on 2017-01-06 at 15:45

During normal server shutdown, sessions are disconnected, dirty shared buffers and pending write-ahead log (WAL) records are flushed to durable storage, and a clean shutdown record is written to pg_control. During the next server start, pg_control is checked, and if the previous shutdown was clean, startup can ignore the WAL and start immediately.

Unfortunately, a clean shutdown can take some time, and impatient database administrators might get into the habit of using kill -9 or pg_ctl -m immediate to quicken the shutdown. While this does have the intended effect, and you will not lose any committed transactions, it greatly slows down the next database startup because all WAL generated since the last completed checkpoint must be replayed. You can identify an unclean shutdown by looking at the server logs for these two ominous lines:

LOG:  database system was interrupted; last known up at 2016-10-25 12:17:28 EDT
LOG:  database system was not properly shut down; automatic recovery in progress
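The same information can be read from pg_control directly; as a quick check against a data directory (assuming $PGDATA points at it):

```
# "shut down" indicates a clean shutdown; "in production" or
# "in crash recovery" after a crash means WAL replay is needed.
pg_controldata "$PGDATA" | grep 'Database cluster state'
```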

Continue Reading »

Using a managed service is a very attractive proposition. You are offloading a heck of a lot of worry, especially when it comes to something as complicated and, let’s face it, specialized as a database. Someone else will set it up for you, and back it up, and keep it running, without you having to worry overmuch about it. However, there are downsides. You can only get what the manager is offering. Often that’s good enough. But needs change, and I have often seen people start with managed services, only to find that they want more than they can get that way.

Just yesterday I received a complaint that the Redis Foreign Data Wrapper, which I have done a lot of work on, is not available on Amazon RDS. And that’s completely understandable. Amazon only provide a limited number of extensions, and this isn’t one of them. At least one other managed service, Heroku, does offer this extension, but there are others it doesn’t offer.

So the lesson is: choose your managed service, or even whether to use a managed service at all, very carefully, taking into account both your current needs and your likely future needs.

Today’s post, the first one of 2017, is about the following feature of the upcoming Postgres 10:

commit: 3901fd70cc7ccacef1b0549a6835bb7d8dcaae43
author: Fujii Masao <[email protected]>
date: Mon, 19 Dec 2016 21:15:30 +0900
Support quorum-based synchronous replication.

This feature is also known as "quorum commit" especially in discussion
on pgsql-hackers.

This commit adds the following new syntaxes into synchronous_standby_names
GUC. By using FIRST and ANY keywords, users can specify the method to
choose synchronous standbys from the listed servers.

FIRST num_sync (standby_name [, ...])
ANY num_sync (standby_name [, ...])

The keyword FIRST specifies a priority-based synchronous replication
which was available also in 9.6 or before. This method makes transaction
commits wait until their WAL records are replicated to num_sync
synchronous standbys chosen based on their priorities.

The keyword ANY specifies a quorum-based synchronous replication
and makes transaction commits wait until their WAL records are
replicated to *at least* num_sync listed standbys. In this method,
the values of sync_state.pg_stat_replication for the listed standbys
are reported as "quorum". The priority is still assigned to each standby,
but not used in this method.

The existing syntaxes having neither FIRST nor ANY keyword are still
supported. They are the same as new syntax with FIRST keyword, i.e.,
a priority-based synchronous replication.

Author: Masahiko Sawada
Reviewed-By: Michael Paquier, Amit Kapila and me
Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com>

Many thanks to the various individuals who were involved in
discussing and developing this feature.

9.6 introduced the ability to specify multiple synchronous standbys by extending the syntax of synchronous_standby_names. For example, values like ‘N (standby_1, standby_2, … , standby_M)’ allow a primary server to wait for commit confirmations from N standbys among the set of M nodes defined in the list given by the user, depending on

[...]
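Concretely, the two syntaxes from the commit message translate into postgresql.conf entries such as the following (the standby names are illustrative):

```
# Priority-based: wait for the 2 highest-priority standbys of the three.
synchronous_standby_names = 'FIRST 2 (node_a, node_b, node_c)'

# Quorum-based: wait for any 2 of the three listed standbys.
synchronous_standby_names = 'ANY 2 (node_a, node_b, node_c)'
```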
Posted by Christophe Pettus in pgExperts on 2017-01-04 at 17:00

So, what’s weird about this plan, from a query on a partitioned table? (PostgreSQL 9.3, in this case.)

test=> explain select distinct id from orders where order_timestamp > '2016-05-01';
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=347341.56..347343.56 rows=200 width=10)
   Group Key: orders.id
   ->  Append  (cost=0.00..337096.10 rows=4098183 width=10)
         ->  Seq Scan on orders  (cost=0.00..0.00 rows=1 width=178)
               Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
         ->  Index Scan using orders_20160425_order_timestamp_idx on orders_20160425  (cost=0.43..10612.30 rows=120838 width=10)
               Index Cond: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
         ->  Seq Scan on orders_20160502  (cost=0.00..80539.89 rows=979431 width=10)
               Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
         ->  Seq Scan on orders_20160509  (cost=0.00..74780.41 rows=909873 width=10)
               Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
         ->  Seq Scan on orders_20160516  (cost=0.00..68982.25 rows=845620 width=10)
               Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
         ->  Seq Scan on orders_20160523  (cost=0.00..65777.68 rows=796054 width=10)
               Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
         ->  Seq Scan on orders_20160530  (cost=0.00..36403.57 rows=446366 width=10)
               Filter: (order_timestamp > '2016-05-01 00:00:00'::timestamp without time zone)
(17 rows)

That estimate on the HashAggregate certainly looks wonky, doesn’t it? Just 200 rows even with a huge number of rows below it?

What if we cut down the number of partitions being hit?

test=> explain
[...]
Posted by David Rader in OpenSCG on 2017-01-04 at 15:25

One of the most common questions from new PostgreSQL users is “how do I connect to the database server?” The nuances of pg_hba.conf and how to correctly enable your Python web app to connect to your db server without opening up connections to all users and all servers is not simple for someone new to Postgres. And architecting a secure multi-tenant SaaS system requires knowledge of how roles, schemas, databases, and search paths interact. That’s one reason we wrote a Security Whitepaper a while back.
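For example, allowing one application role into one database from one subnet, with password authentication, is a single pg_hba.conf line (names and addresses are placeholders):

```
# TYPE  DATABASE  USER     ADDRESS        METHOD
host    myapp     webuser  10.0.0.0/24    md5
```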

But, after seeing thousands of MongoDB instances taken hostage by ransomware the “no authorization required” default for MongoDB is looking like a very dumb idea. Just imagine what executives whose developers picked MongoDB are saying today:

“You mean we store our client list in a database without security?”

“Anyone can just delete our NoSQL database from the internet?”

“Were we hacked last year when you said we lost data in MongoDB?”

So, a quick “Thank You” to PostgreSQL for making sure that your data is Secure By Default.

Mirror mirror on the wall,
Who is the fairest Database of all?

A frequently asked question, certainly.

DB-Engines recently announced its DBMS of the Year. Maybe the cool thing is that PostgreSQL is in 3rd place. Yee-ha, an open source project is up there!

Let’s look closely at what this means.

PostgreSQL.org’s agreed response was this…

“It’s great to see the continued success of PostgreSQL being reflected in DB-Engines rankings. It’s clear that the strength of the following for the World’s Most Advanced Open Source Database is enough to outweigh the largest software companies as people continue to choose to move away from commercial databases.”

though because of commercial sensitivity this was toned down to this:

“It’s great to see the continued success of PostgreSQL being reflected in DB-Engines rankings. It’s clear that the strength of the following for the World’s Most Advanced Open Source Database is enough to draw people away from commercial databases.”

What were the commercial sensitivities? (And what about “open source sensitivities”? Well, blame me, because I agreed to the change.)

Well, the title of the post is that a Microsoft product is actually DBMS of the Year, even though it’s not ranked #1 on the main list; that’s still Oracle. And Postgres is #3 for DBMS of the Year, even though we moved up to #5 again on the main list, competing with MongoDB for position 4 (although mostly level).

My guess is that Microsoft would like to highlight how it gets more press than PostgreSQL, a point I would concede in an instant. Whether that means it is more popular or has better features is a different thing entirely. People are simply leaving commercial databases in droves to come to PostgreSQL, which is clearly reflected in the very public decline of Oracle licencing revenues over the last 10 quarters, and I’m sure it’s just the same for Microsoft revenue.

The purpose of the announcement from PostgreSQL.org was to highlight that “the strength of the following for the World’s Most Advanced Open Source Database is enough to outweigh t

[...]
Posted by Bruce Momjian in EnterpriseDB on 2017-01-03 at 17:15

Unlike other database systems, Postgres makes the cleanup process visible and tunable to users. Autovacuum performs recycling of old rows and updates optimizer statistics. It appears in ps command output, the pg_stat_activity system view, and optionally in the server logs via log_autovacuum_min_duration.

Postgres also allows fine-grained control over the autovacuum cleanup process. Occasionally users find that cleanup is slowing the system down, and rather than modifying the behavior of autovacuum, they decide to turn it off via the autovacuum setting.

However, turning off autovacuum can cause problems. Initially the system will run faster since there is no cleanup overhead, but after a while old rows will clog up user tables and indexes, leading to increasing slowness. Once that happens, you can turn on autovacuum again, and it will recycle the old rows and free up space, but there will be much unused space that can't be reused quickly, or perhaps ever.
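Rather than disabling autovacuum outright, a middle ground is to tune it per table with storage parameters, making cleanup more aggressive only on the tables that accumulate dead rows fastest. A minimal sketch (the table name big_table is hypothetical, not from the original post):

```sql
-- Make autovacuum fire earlier on one hot table instead of turning it off:
ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum once ~5% of rows are dead
    autovacuum_vacuum_cost_delay = 10       -- throttle cleanup I/O (milliseconds)
);
```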

Continue Reading »

As we say welcome to a new year, we have a couple of updates to the finances and payment handling in PostgreSQL Europe, that will affect our members and attendees of our events.

First of all, PostgreSQL Europe has unfortunately been forced to VAT register. This means that most of our invoices (details below) will now include VAT.

Second, we have enabled a new payment provider for those of you that can't or prefer not to use credit cards but that still allows for fast payments.

A few weeks back, I noticed the following tweet from Michael Paquier:

[tweet embedded in the original post]

And my first thought was "that can't be right" (spoiler: Turns out it wasn't. But almost.)

The second thought was "hmm, I wonder how that has actually changed over time". And of course, with today being a day off and generally "slow pace" (ahem), what better way to spend it than analyzing the data that we have. The PostgreSQL mailing list archives are all stored in a PostgreSQL database, of course, so running the analytics is a quick job.

The one thing that everyone knows about composite indexes is: if you have an index on (A, B, C), it can’t be used for queries on (B) or (B, C) or (C), just (A), (A, B) or (A, B, C), right? I’ve said that multiple times in talks. It’s clearly true, right?

Well, no, it’s not. It’s one of those things that is not technically true, but it is still good advice.

The documentation on multi-column indexes is pretty clear:

A multicolumn B-tree index can be used with query conditions that involve any subset of the index’s columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned.
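Before trying it in SQL, the rule is easy to picture: a B-tree on (i, f) behaves like a sorted list of tuples, so an equality constraint on the leading column lets the scan binary-search into a narrow slice, while a constraint on f alone means walking the whole index and filtering. A toy sketch in Python (nothing here is PostgreSQL code, just an analogy):

```python
import bisect
import random

random.seed(42)
# A composite "index" on (i, f): a sorted list of tuples, like a two-column B-tree.
index = sorted((i, random.random()) for i in (1, 2, 3) for _ in range(10000))

# Equality on the leading column plus a range on f: binary search narrows the scan.
lo = bisect.bisect_left(index, (2, 0.11))
hi = bisect.bisect_right(index, (2, 0.12))
narrow = index[lo:hi]                      # only the matching slice is touched

# Range on f alone: every entry must be inspected (a full scan with a filter).
full = [t for t in index if 0.11 <= t[1] <= 0.12]

print(len(narrow), len(full))  # the second scan finds rows for all three i values
```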

Let’s try this out!

First, create a table and index:

xof=# CREATE TABLE x ( 
xof(#     i integer,
xof(#     f float,
xof(#     g float
xof(# );
CREATE TABLE
xof=# CREATE INDEX ON x(i, f, g);
CREATE INDEX

And fill it with some test data:

xof=# INSERT INTO x SELECT 1, random(), random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# INSERT INTO x SELECT 2, random(), random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# INSERT INTO x SELECT 3, random(), random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# ANALYZE x;
ANALYZE

And away we go!

xof=# EXPLAIN ANALYZE SELECT SUM(g) FROM x WHERE f BETWEEN 0.11 AND 0.12;
                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=599859.50..599859.51 rows=1 width=8) (actual time=91876.057..91876.057 rows=1 loops=1)
   ->  Index Only Scan using x_i_f_g_idx on x  (cost=0.56..599097.71 rows=304716 width=8) (actual time=1820.699..91652.409 rows=300183 loop
[...]
PostgreSQL allows for more than plain password authentication in pg_hba.conf. One of the most flexible options is authenticating against a PAM (Pluggable Authentication Module).

Let's see how this works with one-time passwords from  Google Authenticator.

1.) Install Google Authenticator on your Android or iOS device.

2.) Install the Google Authenticator PAM on the machine where your PostgreSQL server lives, like in Step 1 - 4 of this guide.

3.) Connect your device with the account on that machine.

4.) Configure a PAM service for PostgreSQL, e.g. create a file named postgresql in the directory where your PAM configs live; on Ubuntu this is /etc/pam.d/. The file should look like this:

auth         sufficient     pam_google_authenticator.so

5.) Configure PostgreSQL to use the PAM. E.g. a line in pg_hba.conf could look like this:

hostssl    all    all    127.0.0.1/32   pam    pamservice=postgresql

And that's basically it. Now, the next time you log in, PostgreSQL will ask you for a password that is generated individually on your device.
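That device-generated password is a TOTP (RFC 6238): an HMAC over the current 30-second time window, keyed by the secret shared with your device in step 3. The sketch below is not Google's code, just a minimal illustration of the algorithm, checked against an RFC 6238 test vector:

```python
import hashlib
import hmac
import struct

def totp(secret, for_time, step=30, digits=6):
    """Time-based one-time password (RFC 6238) from a shared secret."""
    counter = struct.pack(">Q", for_time // step)        # current 30s window
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                           # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 Appendix B test vector: ASCII secret "12345678901234567890", T=59s
print(totp(b"12345678901234567890", 59, digits=8))  # -> 94287082
```

The real PAM module additionally accepts codes from adjacent time windows, to tolerate a bit of clock skew between server and device.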

Of course, you can use all kinds of PAMs with PostgreSQL like this.

Unfortunately, I also found a few caveats along the way. :-(

First, PostgreSQL clients will only ask for one password, regardless of whether you chain n PAMs for n-factor authentication.

So if you e.g. chain a PAM against LDAP with Google Authenticator as the second factor, this won't work. This seems to be a shortcoming of the PAM implementation in PostgreSQL, which does not expect multiple password prompts. It is still possible to enable n-factor authentication, but only one PAM can prompt for a password. If the other factors are hardware devices like a fingerprint scanner that do not prompt for a password, you are fine.

Alternatively, you can provide your own PAM that takes all passwords in one prompt and handles them internally.

Second, PAM requires PostgreSQL clients to send the password in plaintext. So now is the time to switch on TLS and make it mandatory (Noticed the hostssl switch above?).

Third, some clients like pgAdmin3 break with one-time passwords, becau[...]

On a PostgreSQL primary / secondary pair, it’s very important to monitor replication lag. Increasing replication lag is often the first sign of trouble, such as a network issue, the secondary disconnecting for some reason (or for no reason at all, which does happen rarely), disk space issues, etc.

You can find all kinds of complex scripts that do math on the various WAL positions that are available from the secondary and from pg_stat_replication on the primary.

Or you can do this. It’s very cheap and cheerful, and for many installations, it gets the job done.

First, on the primary (and thus on the secondary), we create a one-column table:

CREATE TABLE replication_monitor (
   last_update TIMESTAMPTZ
);

Then, we insert a single row into the table (you can probably already see where this is going):

INSERT INTO replication_monitor VALUES(now());

Having that, we can start a cron job that runs every minute, updating that value:

* * * * * /usr/bin/psql -U postgres -c "update replication_monitor set last_update=now()" postgres > /dev/null

On the secondary (which is kept in sync with the primary via NTP, so make sure ntpd is running on both!), we have a script, also run from cron, that complains if the value has fallen more than a certain amount behind now(). Here’s a (pretty basic) Python 2 version:

#!/usr/bin/python

import sys

import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres")

cur = conn.cursor()

cur.execute("select (now()-last_update)>'5 minutes'::interval from replication_monitor")

problem = cur.fetchone()[0]

if problem:
    print >>sys.stderr, "Replication lag over 5 minutes."

We make sure we get the output from stderr for cron jobs on the secondary, set it up to run every so often, and we’re done!
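The threshold comparison that the SQL performs is worth keeping testable on its own, without a running database. A tiny Python 3 mirror of it (the function name lag_exceeded is mine, not part of the original script):

```python
from datetime import datetime, timedelta, timezone

def lag_exceeded(last_update, now, threshold=timedelta(minutes=5)):
    """True if the heartbeat row has fallen further behind than we tolerate."""
    return (now - last_update) > threshold

now = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
print(lag_exceeded(now - timedelta(minutes=3), now))  # -> False, within tolerance
print(lag_exceeded(now - timedelta(minutes=7), now))  # -> True, raise the alarm
```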

Of course, this has its limitations:

  • It only has ±2 minutes of resolution, based on how often cron runs. For a basic “is replication working?” check, this is probably fine.

  • It creates traffic in the replication stream and WAL, but if you are really worried about a TIM

[...]

It is too late for stories about Christmas, so here is a Postgres story about the following commit of Postgres 10:

commit: 6ef2eba3f57f17960b7cd4958e18aa79e357de2f
author: Andres Freund <[email protected]>
date: Thu, 22 Dec 2016 11:31:50 -0800
Skip checkpoints, archiving on idle systems.

Some background activity (like checkpoints, archive timeout, standby
snapshots) is not supposed to happen on an idle system. Unfortunately
so far it was not easy to determine when a system is idle, which
defeated some of the attempts to avoid redundant activity on an idle
system.

To make that easier, allow to make individual WAL insertions as not
being "important". By checking whether any important activity happened
since the last time an activity was performed, it now is easy to check
whether some action needs to be repeated.

Use the new facility for checkpoints, archive timeout and standby
snapshots.

The lack of a facility causes some issues in older releases, but in my
opinion the consequences (superflous checkpoints / archived segments)
aren't grave enough to warrant backpatching.

Author: Michael Paquier, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Amit Kapila, Kyotaro HORIGUCHI
Bug: #13685
Discussion:
https://www.postgresql.org/message-id/[email protected]
https://www.postgresql.org/message-id/CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA@mail.gmail.com
Backpatch: 

Postgres 9.0 was the first release to introduce the parameter wal_level; it originally offered three different values, with a fourth added later:

  • “minimal”, to get enough information WAL-logged to recover from a crash.
  • “archive”, to be able to recover from a base backup and archives.
  • “hot_standby”, to be able to have a standby node work, resulting in information about exclusive locks and currently running transactions being WAL-logged, called standby snapshots.
  • “logical”, introduced in 9.4, to work with logical decoding.

In 9.6, “archive” and “hot_standby” have been merged into “replica” as both levels have no difference in

[...]
Posted by Pavel Stehule on 2016-12-23 at 19:27
I had a customer with strange issues in OpenERP - the main problem was the long life of OpenERP/PostgreSQL sessions. The fix was not hard - using pgBouncer in transaction mode with a short server lifetime (ten minutes).
Posted by Scott Mead in OpenSCG on 2016-12-22 at 22:17

I spend a lot of time trying to learn what’s already been implemented; as DBAs, we tend to live in that world. It’s important that you have tools that allow you to quickly get the visual so you can get on with your job. One of the biggest ‘reverse engineering’ tasks that DBAs have to do is in the area of data modeling. There are a lot of tools that do this, but my favorite of late is SchemaSpy (schemaspy.sf.net). I’ve found that the output is complete and the ERDs are easy to read.

java -jar schemaSpy_5.0.0.jar -t pgsql -host localhost:5432 -db postgres \
-dp ~/software/pg/drivers/jdbc/postgresql-9.4.1208.jre6.jar \
-o ERD -u postgres -s myschema

SchemaSpy is nice because it can talk to many different types of databases… a side effect of this is a fairly complex set of command-line switches.

-t specifies the type, -t pgsql specifies postgres
-o specifies a directory to create the output in

All you need in order to run this is the SchemaSpy jar and a PostgreSQL JDBC driver. The output lands in the directory ‘ERD’; pulling up ERD/index.html gives you a nice page with a table list and some basics.

SchemaSpyDash

As a DBA, I really love the ‘Insertion Order’ and ‘Deletion Order’ links here. SchemaSpy reverse engineers the referential integrity chain for me! Clicking either one gives me a simple page, top to bottom with the right order!

InsertionOrder

Now for the real reason that I super-love SchemaSpy: the ‘Relationships’ tab. I’ve loaded up a pgbench schema. pgbench doesn’t actually create any real foreign keys, but the column names are consistent. SchemaSpy notices this and ‘infers’ relationships for me! This is huge; even without explicit keys, I can begin to infer what the developer intended (the estimated rowcount is helpful too 🙂).

Relationships

I won’t force you to follow me through all of the tabs. If you’re looking for a schema visualization tool, give SchemaSpy a try.

Happy PostgreSQL-ing!