== PostgreSQL Weekly News - January 20 2013 ==
The 6th annual "Prague PostgreSQL Developers Day" conference,
organized by CSPUG (Czech and Slovak PostgreSQL Users Group), will be
held on May 30, 2013 at Faculty of Mathematics and Physics, Charles
University (Malostranske namesti 25, Prague). The CfP is open until
April 14, 2013 . More information in Czech is at
== PostgreSQL Product News ==
MJSQLView Version 3.41, a Java-based UI which supports PostgreSQL,
Ora2PG 10.1, a migration system for moving from Oracle to PostgreSQL,
pgBadger 2.3, a PostgreSQL log analyzer written in Perl, released:
pgFormatter v1.2, a SQL formatter/beautifier which supports keywords
from SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011 and PostgreSQL
keywords are not in the standard, released.
== PostgreSQL Jobs for January ==
== PostgreSQL Local ==
The FOSDEM PGDay conference that will be held before FOSDEM in
Brussels, Belgium, on Feb 1st, 2013.
The first edition of the Australian PostgreSQL Day (PGDay.AU 2013)
will be held in Melbourne on February 4, 2013. Registration is free of
charge. Schedule and info:
PyPgDay will be held on March 13th at the Santa Clara Convention
Center, the first day of PyCon. Info here:
Submit your proposal here by January 20, 2013:
PGDay NYC 2013 will be held on March 22, 2013 in New York City.
PostgreSQL Session will be held on March 28th, 2013 in Paris,
France. The Call for Papers is open.
PGCon 2013 will be held May 23-24 2013, in Ottawa at the University of
== PostgreSQL in the News ==
Planet PostgreSQL: http://planet.postgresql.org/
PostgreSQL Weekly News is brought to you this week by David Fetter
Submit news and announcements by Sunday at 3:00pm Pacific time.
Please send English language ones to [email protected], German language
to [email protected], Italian language to [email protected] Spanish language
to [email protected]
== Applied Patches ==
Alvaro Herrera pushed:
- Remove spurious space. Andres Freund
- Rework order of checks in ALTER / SET SCHEMA When attempting to move
an object into the schema in which it already was, for most objects
classes we were correctly complaining about exactly that ("object is
already in schema"); but for some other object classes, such as
functions, we were instead complaining of a name collision ("object
already exists in schema"). The latter is wrong and misleading, per
complaint from Robert Haas in
To fix, refactor the way these checks are done. As a bonus, the
resulting code is smaller and can also share some code with Rename
cases. While at it, remove use of getObjectDescriptionOids() in
error messages. These are normally disallowed because of
translatability considerations, but this one had slipped through
since 9.1. (Not sure that this is worth backpatching, though, as it
would create some untranslated messages in back branches.) This is
loosely based on a patch by KaiGai Kohei, heavily reworked by me.
- Split out XLog reading as an independent facility. This new
facility can not only be used by xlog.c to carry out crash recovery,
but also by external programs. By supplying a function to read XLog
pages from somewhere, all the WAL reading can be used for completely
different purposes. For the standard backend use, the behavior
should be pretty much the same as previously. As for non-backend
programs, an hypothetical pg_xlogdump program is now closer to
reality, but some more backend support is still necessary. This
patch was originally submitted by Andres Freund in a different form,
but Heikki Linnakangas opted for and authored another design of the
concept. Andres has advanced the patch since Heikki's initial
version. Review and some (mostly cosmetics) changes by me.
- Accelerate end-of-transaction dropping of relations When relations
are dropped, at end of transaction we need to remove the files and
clean the buffer pool of buffers containing pages of those
relations. Previously we would scan the buffer pool once per
relation to clean up buffers. When there are many relations to
drop, the repeated scans make this process slow; so we now instead
pass a list of relations to drop and scan the pool once, checking
each buffer against the passed list. When the number of relations
is larger than a threshold (which as of this patch is being set to
20 relations) we sort the array before starting, and bsearch the
array; when it's smaller, we simply scan the array linearly each
time, because that's faster. The exact optimal threshold value
depends on many factors, but the difference is not likely to be
significant enough to justify making it user-settable. This has
been measured to be a significant win (a 15x win when dropping
100,000 relations; an extreme case, but reportedly a real one).
Author: Tomas Vondra, some tweaks by me Reviewed by: Robert Haas,
Shigeru Hanada, Andres Freund, Álvaro Herrera
- Fix off-by-one bug in xlog reading logic. Bug reported by Michael
Paquier Author: Andres Freund
Tom Lane pushed:
- Add new timezone abbrevation "FET". This seems to have been
invented in 2011 to represent GMT+3, non daylight savings rules, as
now used in Europe/Kaliningrad and Europe/Minsk. There are no
conflicts so might as well add it to the Default list. Per bug
#7804 from Ruslan Izmaylov.
- Fix obsolete SQL syntax in comment. This was legal back in the days
of add_missing_from, though perhaps never good style. It's not
legal anymore ... Jan Urbański
- Fix hash_update_hash_key() to handle same-bucket case correctly.
Original coding would corrupt the hashtable if the item being
updated was at the end of its bucket chain and the new hash key
hashed to that same bucket. Diagnosis and fix by Heikki
- Reject out-of-range dates in to_date(). Dates outside the supported
range could be entered, but would not print reasonably, and
operations such as conversion to timestamp wouldn't behave sanely
either. Since this has the potential to result in undumpable table
data, it seems worth back-patching. Hitoshi Harada
- Improve memory space management in tuplesort and tuplestore. The
code originally just doubled the size of the tuple-pointer array so
long as that would fit in allowedMem. This could result in failing
to use as much as half of allowedMem, if (as is typical) the last
doubling attempt didn't quite fit. Worse, we might double the array
size but be unable to use most of the added slots, because there was
no room left within the allowedMem limit for tuples the slots should
point to. To fix, double only so long as we've used less than half
of allowedMem in total. Then do one more array enlargement, but
scale it based on total memory consumption so far. This will work
nicely as long as the average tuple size is reasonably stable, and
in any case should be better than the old method. This change will
result in large sort operations consuming a larger fraction of
work_mem than they typically did in the past. The release notes
should mention that users may want to revisit their work_mem
settings, if they'd tuned those settings based on the old behavior
of sorting. Jeff Janes, reviewed by Peter Geoghegan and Robert Haas
- Protect against SnapshotNow race conditions in pg_tablespace scans.
Use of SnapshotNow is known to expose us to race conditions if the
tuple(s) being sought could be updated by concurrently-committing
transactions. CREATE DATABASE and DROP DATABASE are particularly
exposed because they do heavyweight filesystem operations during
their scans of pg_tablespace, so that the scans run for a very long
time compared to most. Furthermore, the potential consequences of a
missed or twice-visited row are nastier than average: 1. createdb()
could fail with a bogus "file already exists" error, or silently
fail to copy one or more tablespace's worth of files into the new
database. 2. remove_dbtablespaces() could miss one or more
tablespaces, thus failing to free filesystem space for the dropped
database. 3. check_db_file_conflict() could likewise miss a
tablespace, leading to an OID conflict that could result in data
loss either immediately or in future operations. (This seems of
very low probability, though, since a duplicate database OID would
be unlikely to start with.) Hence, it seems worth fixing these three
places to use MVCC snapshots, even though this will someday be
superseded by a generic solution to SnapshotNow race conditions.
Back-patch to all active branches. Stephen Frost and Tom Lane
- Modernize string literal syntax in tutorial example. Un-double the
backslashes in the LIKE patterns, since standard_conforming_strings
is now the default. Just to be sure, include a command to set
standard_conforming_strings to ON in the example. Back-patch to
9.1, where standard_conforming_strings became the default. Josh
Kupershmidt, reviewed by Jeff Janes
- Use SET TRANSACTION READ ONLY in pg_dump, if server supports it.
This currently does little except serve as documentation. (The one
case where it has a performance benefit, SERIALIZABLE mode in 9.1
and up, was already using READ ONLY mode.) However, it's possible
that it might have performance benefits in future, and in any case
it seems like good practice since it would catch any accidentally
non-read-only operations. Pavan Deolasee
- Fix an O(N^2) performance issue for sessions modifying many
relations. AtEOXact_RelationCache() scanned the entire relation
cache at the end of any transaction that created a new relation or
assigned a new relfilenode. Thus, clients such as pg_restore had an
O(N^2) performance problem that would start to be noticeable after
creating 10000 or so tables. Since typically only a small number of
relcache entries need any cleanup, we can fix this by keeping a
small list of their OIDs and doing hash_searches for them. We fall
back to the full-table scan if the list overflows. Ideally, the
maximum list length would be set at the point where N hash_searches
would cost just less than the full-table scan. Some quick
experimentation says that point might be around 50-100; I (tgl)
conservatively set MAX_EOXACT_LIST = 32. For the case that we're
worried about here, which is short single-statement transactions,
it's unlikely there would ever be more than about a dozen list
entries anyway; so it's probably not worth being too tense about the
value. We could avoid the hash_searches by instead keeping the
target relcache entries linked into a list, but that would be
noticeably more complicated and bug-prone because of the need to
maintain such a list in the face of relcache entry drops. Since a
relcache entry can only need such cleanup after a
somewhat-heavyweight filesystem operation, trying to save a
hash_search per cleanup doesn't seem very useful anyway --- it's the
scan over all the not-needing-cleanup entries that we wish to avoid
here. Jeff Janes, reviewed and tweaked a bit by Tom Lane
- Fix error-checking typo in check_TSCurrentConfig(). The code failed
to detect an out-of-memory failure. Xi Wang
- Fix one-byte buffer overrun in PQprintTuples(). This bug goes back
to the original Postgres95 sources. Its significance to modern PG
versions is marginal, since we have not used PQprintTuples()
internally in a very long time, and it doesn't seem to have ever
been documented either. Still, it *is* exposed to client apps, so
somebody out there might possibly be using it. Xi Wang
Heikki Linnakangas pushed:
- Return value of lseek() can be negative on failure. Because the
return value of lseek() was assigned to an unsigned size_t variable,
we'd fail to notice an error return code -1. Compiler gave a warning
about this. Andres Freund
- Give a proper error message if connecting to incompatible server.
The WAL streaming message format changed in 9.3, so 9.3
pg_basebackup or pg_receivelog won't work against older servers.
- Don't pass NULL to fprintf, if not currently connected to a
database. Backpatch all the way to 8.3. Fixes bug #7811, per report
and diagnosis by Meng Qingzhong.
- Make \? help message more clear when not connected. On second
thought, "none" could mislead to think that you're connected a
database with that name. Duplicate the whole string, so that it can
be more easily translated. In back-branches, thought, just use an
empty string in place of the database name, to avoid adding a
- Make GiST indexes on-disk compatible with 9.2 again. The patch that
turned XLogRecPtr into a uint64 inadvertently changed the on-disk
format of GiST indexes, because the NSN field in the GiST page
opaque is an XLogRecPtr. That breaks pg_upgrade. Revert the format
of that field back to the two-field struct that XLogRecPtr was
before. This is the same we did to LSNs in the page header to avoid
changing on-disk format. Bump catversion, as this invalidates any
existing GiST indexes built on 9.3devel.
- Fix a couple of error-handling bugs in the xlogreader patch.
XLogReadRecord should reset its state on every error, to make sure
it re-reads the page on next call. It was inconsistent in that some
errors did that, but some did not. In ReadRecord(), don't give up
on an error if we're in standby mode. The loop was set up to retry,
but the checks within the loop broke out of the loop on any error.
Andres Freund, with some tweaking by me.
- Make pg_receivexlog and pg_basebackup -X stream work across timeline
switches. This mirrors the changes done earlier to the server in
standby mode. When receivelog reaches the end of a timeline, as
reported by the server, it fetches the timeline history file of the
next timeline, and restarts streaming from the new timeline by
issuing a new START_STREAMING command. When pg_receivexlog crosses
a timeline, it leaves the .partial suffix on the last segment on the
old timeline. This helps you to tell apart a partial segment left in
the directory because of a timeline switch, and a completed segment.
If you just follow a single server, it won't make a difference, but
it can be significant in more complicated scenarios where new WAL is
still generated on the old timeline. This includes two small
changes to the streaming replication protocol: First, when you reach
the end of timeline while streaming, the server now sends the TLI of
the next timeline in the server's history to the client.
pg_receivexlog uses that as the next timeline, so that it doesn't
need to parse the timeline history file like a standby server does.
Second, when BASE_BACKUP command sends the begin and end WAL
positions, it now also sends the timeline IDs corresponding the
- When xlogreader asks the callback function to read a page, make sure
we get a large enough part of the page to include the beginning of
the next record we're interested in. The XLogPageRead callback uses
the requested length to decide which timeline to stream WAL from,
and if the first call is short, and the page contains a timeline
switch, we'll repeatedly try to stream that page from the old
timeline, and never get across the timeline switch.
- I added a result set to START_STREAMING command, but neglected
walreceiver. The patch to allow pg_receivexlog to switch timeline
added a result set after copy has ended in START_STREAMING command,
to return the next timeline's ID to the client. But walreceived
didn't get the memo, and threw an error on the unexpected result
- Now that START_REPLICATION returns the next timeline's ID after
reaching end of timeline, take advantage of that in walreceiver.
Startup process is still in control of choosign the target timeline,
by scanning the timeline history files present in pg_xlog, but
walreceiver now uses the next timeline's ID to fetch its history
file immediately after it has finished streaming the old timeline.
Before, the standby would first try to restart streaming on the old
timeline, which fetches the missing timeline history file as a
side-effect, and only then restart from the new timeline. This
patch eliminates the extra iteration, which speeds up the timeline
switch and reduces the noise in the log caused by the extra restart
on the old timeline.
- Use the right timeline when beginning to stream from master. The
xlogreader refactoring broke the logic to decide which timeline to
start streaming from. XLogPageRead() uses the timeline history to
check which timeline the requested WAL position falls into. However,
after the refactoring, XLogPageRead() is always first called with
the first page in the segment, to verify the segment header, and
only then with the actual WAL position we're interested in. That
first read of the segment's header made XLogPageRead() to always
start streaming from the old timeline containing the segment header,
not the timeline containing the actual record, if there was a
timeline switch within the segment. I thought I fixed this
yesterday, but that fix was too narrow and only fixed this for the
corner-case that the timeline switch happened in the first page of
the segment. To fix this more robustly, pass explicitly the position
of the record we're actually interested in to XLogPageRead, and use
that to decide which timeline to read from, rather than deduce it
from the page and offset. Per report from Fujii Masao.
Peter Eisentraut pushed:
- Get rid of pg_dump's README. It was largely full of outdated and
incorrect information. Move the few notes which were still relevant
into header comments of pg_backup_tar.c and pg_dumpall.c. Josh
- libpq doc: Clarify what commands return PGRES_TUPLES_OK. The old
text claimed that INSERT and UPDATE always return PGRES_COMMAND_OK,
but INSERT/UPDATE with RETURNING return PGRES_TUPLES_OK. Josh
- doc: Fix syntax of a URL. Leading white space before the "http:" is
apparently treated as a relative link at least by some browsers.
Magnus Hagander pushed:
- Support multiple -t/--table arguments for more commands. On top of
the previous support in pg_dump, add support to specify multiple
tables (by using the -t option multiple times) to pg_restore,
clsuterdb, reindexdb and vacuumdb. Josh Kupershmidt, reviewed by
Karl O. Pinc
- Truncate filenames in the leadning end in pg_basebackup verbose
output. When truncating at the end, like before, the output would
often end up just showing the path instead of the filename. Also
increase the length of the filename by 5, which still keeps us at
less than 80 characters in most outputs.
- Make size-output fixed length in pg_basebackup verbose mode. This
way the line doesn't shift right as the amount of data processed
- Base the default SSL ciphers on DEFAULT instead of ALL. It's better
to start from what the OpenSSL people consider a good default and
then remove insecure things (low encryption, exportable encryption
and md5 at this point) from that, instead of starting from
everything that exists and remove from that. We trust the OpenSSL
people to make good choices about what the default is.
- Silence compiler warnings
- Clarify that streaming replication can be both async and sync. Josh
Bruce Momjian pushed:
- Add a latex-longtable output format to psql. latex longtable is
more powerful than the 'tabular' output format 'latex' uses. Also
add border=3 support to 'latex'.
- Improve pg_upgrade error report. If the cluster alignments don't
match, output this suggestion: Likely one cluster is a 32-bit
install, the other 64-bit
- psql latex fixes. Remove extra line at bottom of table for new
'latex' mode border=3. Also update 'latex'-longtable 'tableattr'
docs to say 'whitespace-separated' instead of 'space'.
- Rename new latex longtable function name, for consistency
Robert Haas pushed:
- Unbreak lock conflict detection for Hot Standby. This got broken in
the original fast-path locking patch, because I failed to account
for the fact that Hot Standby startup process might take a strong
relation lock on a relation in a database to which it is not bound,
and confused MyDatabaseId with the database ID of the relation being
locked. Report and diagnosis by Andres Freund. Final form of patch
Andrew Dunstan pushed:
- Make pgxs build executables with the right suffix. Complaint and
patch from Zoltán Böszörményi. When cross-compiling, the native
make doesn't know about the Windows .exe suffix, so it only builds
with it when explicitly told to do so. The native make will not see
the link between the target name and the built executable, and might
this do unnecesary work, but that's a bigger problem than this one,
if in fact we consider it a problem at all. Back-patch to all live
== Rejected Patches (for now) ==
No one was disappointed this week :-)
== Pending Patches ==
Peter Eisentraut sent in a patch to find libxml2 using pkg-config
where the latter is available.
Magnus Hagander sent in another revision of a patch to add a
Gurjeet Singh sent in a patch to avoid truncating empty "ON COMMIT
DELETE ROWS" temp tables on COMMIT.
Alvaro Herrera sent in a patch to fix an issue where items can get
added inadvertently to the pg_catalog schema.
Jeevan Chalke send in a patch to fix an infelicity in the interaction
of unlogged tables and GiST indexes.
Peter Eisentraut sent in a patch to allow passing diff(1) options to
Andres Freund sent in a patch to allow PostgreSQL to compile without
warning with gcc's -Wtype-limits and -Wempty-body enabled.
Andrew Dunstan sent in three more revisions of a patch to implement a
JSON API inside PostgreSQL.
Peter Eisentraut sent in a patch to add an --idempotent option to
Dean Rasheed sent in a patch implementing WITH CHECK OPTION for
Andres Freund and Alvaro Herrera traded revisions of a patch to
implement logical changeset generation.
Peter Eisentraut sent in a patch to allow changing global
configuration settings from SQL by extending the pg_db_role_setting
machinery and surfacing that extension in SQL.
Peter Eisentraut sent in a patch to replace the plugins directory
setting with a GUC called user_loadable_libraries.
KaiGai Kohei sent in two revisions of a patch to add a name-qualified
creation label in SE-Pgsql.
KaiGai Kohei sent in a patch to add db_schema:search permission checks
KaiGai Kohei sent in a patch to add db_procedure:execute permission
checks to SE-Pgsql.
KaiGai Kohei sent in another revision of a patch to implement
row-level access control.
Josh Hansen sent in a patch to change the log level of "doesn't
support secondary split" from LOG to DEBUG1.
Simon Riggs sent in a patch to implement SeqAm, which allows you to
specify a plugin that alters the behaviour for sequence allocation and
resetting, aimed specifically at clustering systems.
Peter Eisentraut sent in another revision of a patch to implement the
Michael Paquier sent in two more revisions of a patch to implement
Peter Eisentraut sent in a patch implementing system administration
functions with hardcoded superuser checks.
Following on a submission by Greg Smith of a pg_corrupt utility, Jeff
Davis sent in another revision of a patch to add page checksums.
Kevin Grittner sent in two more revisions of a patch to implement
Abhijit Menon-Sen sent in a patch to clarify a fatal error in conflict
resolution in src/backend/tcop/postgres.c.
KaiGai Kohei and Alvaro Herrera traded patches to rework ALTER.
Jeff Davis sent in three more revisions of a patch to remove
Dan Farina sent in two more revisions of a patch to add \watch to
Alvaro Herrera sent in a patch to prevent the FK locks patch from
triggering a condition where an xlog record of length 0, a disallowed
condition there, could appear.
Stephen Frost sent in a patch to ensure that a new snapshot is taken
at the beginning of each CREATE DATABASE statement.
Andres Freund sent in another revision of a patch to centralize the
Assert* macros into c.h so it's common between backend and frontend.
Andres Freund sent in another revision of a patch to create unified
frontend support for pg_malloc et al. and palloc/pfree emulation.
Amit Kapila sent in another revision of a patch to compute the max LSN
of data pages.
Amit Kapila and Zoltan Boszormenyi traded patches to allow
postgresql.conf values to be changed via SQL.
Tomonari Katsumata sent in a patch to subdivide the privileges for a
replication role into two categories: master and cascade.
Phil Sorber sent in another revision of a patch to create a pg_ping
Bruce Momjian sent in a patch to change the system() return value in
pg_upgrade on failure.
Magnus Hagander sent in a patch which checks whether a server is in
recovery mode when deciding whether to attempt to dump unlogged
tables, per bug report from Joe Van Dyk, who ran into a crash
condition when attempting to do so.
Pavel Stehule sent in another revision of a patch to fix a corner case
in the usage of variadic functions.
Sent via pgsql-announce mailing list ([email protected])
To make changes to your subscription: