Features Download
From: David Fetter <david <at> fetter.org>
Subject: == PostgreSQL Weekly News - December 02 2012 ==
Newsgroups: gmane.comp.db.postgresql.announce
Date: Monday 3rd December 2012 05:07:00 UTC (over 5 years ago)
== PostgreSQL Weekly News - December 02 2012 ==

== PostgreSQL Local ==

The FOSDEM PGDay conference that will be held before FOSDEM in
Brussels, Belgium, on Feb 1st, 2013.  The CfP for both this event and
for the PG track of FOSDEM are open.

PGDay NYC 2013 will be held on March 22, 2013 in New York City.  The
CfP submission deadline is January 7th, 2013 at noon eastern time.
papers AT nycpug DOT org.

== PostgreSQL in the News ==

Planet PostgreSQL: http://planet.postgresql.org/

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm Pacific time.
Please send English language ones to [email protected], German language
to [email protected], Italian language to [email protected]  Spanish language
to [email protected]

== Applied Patches ==

Tom Lane pushed:

- Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance
  trees.  In a query such as "SELECT DISTINCT min(x) FROM tab", the
  DISTINCT is pretty useless (there being only one output row), but
  nonetheless it shouldn't fail.  But it could fail if "tab" is an
  inheritance parent, because planagg.c's code for fixing up
  equivalence classes after making the index-optimized MIN/MAX
  transformation wasn't prepared to find child-table versions of the
  aggregate expression.  The least ugly fix seems to be to add an
  option to mutate_eclass_expressions() to skip child-table
  equivalence class members, which aren't used anymore at this stage
  of planning so it's not really necessary to fix them.  Since child
  members are ignored in many cases already, it seems plausible for
  mutate_eclass_expressions() to have an option to ignore them too.
  Per bug #7703 from Maxim Boguk.  Back-patch to 9.1.  Although the
  same code exists before that, it cannot encounter child-table
  aggregates AFAICS, because the index optimization transformation
  cannot succeed on inheritance trees before 9.1 (for lack of

- Revert patch for taking fewer snapshots.  This reverts commit
  d573e239f03506920938bf0be56c868d9c3416da, "Take fewer snapshots".
  While that seemed like a good idea at the time, it caused execution
  to use a snapshot that had been acquired before locking any of the
  tables mentioned in the query.  This created user-visible anomalies
  that were not present in any prior release of Postgres, as reported
  by Tomas Vondra.  While this whole area could do with a redesign
  (since there are related cases that have anomalies anyway), it
  doesn't seem likely that any future patch would be reasonably
  back-patchable; and we don't want 9.2 to exhibit a behavior that's
  subtly unlike either past or future releases.  Hence, revert to
  prior code while we rethink the problem.

- Add explicit casts in ilist.h's inline functions.  Needed to silence
  C++ errors, per report from Peter Eisentraut.  Andres Freund

- Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.  Commit
  8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP
  poor choice of catalog state representation.  The pg_index state for
  an index that's reached the final pre-drop stage was the same as the
  state for an index just created by CREATE INDEX CONCURRENTLY.  This
  meant that the (necessary) change to make RelationGetIndexList
  ignore about-to-die indexes also made it ignore freshly-created
  indexes; which is catastrophic because the latter do need to be
  considered in HOT-safety decisions.  Failure to do so leads to
  incorrect index entries and subsequently wrong results from queries
  depending on the concurrently-created index.  To fix, add an
  additional boolean column "indislive" to pg_index, so that the
  freshly-created and about-to-die states can be distinguished.  (This
  change obviously is only possible in HEAD.  This patch will need to
  be back-patched, but in 9.2 we'll use a kluge consisting of
  overloading the formerly-impossible state of indisvalid = true and
  indisready = false.) In addition, change CREATE/DROP INDEX
  CONCURRENTLY so that the pg_index flag changes they make without
  exclusive lock on the index are made via heap_inplace_update()
  rather than a normal transactional update.  The latter is not very
  safe because moving the pg_index tuple could result in concurrent
  SnapshotNow scans finding it twice or not at all, thus possibly
  resulting in index corruption.  This is a pre-existing bug in CREATE
  INDEX CONCURRENTLY, which was copied into the DROP code.  In
  addition, fix various places in the code that ought to check to make
  sure that the indexes they are manipulating are valid and/or ready
  as appropriate.  These represent bugs that have existed since 8.2,
  since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or
  invalid index behind, and we ought not try to do anything that might
  fail with such an index.  Also fix RelationReloadIndexInfo to ensure
  it copies all the pg_index columns that are allowed to change after
  initial creation.  Previously we could have been left with stale
  values of some fields in an index relcache entry.  It's not clear
  whether this actually had any user-visible consequences, but it's at
  least a bug waiting to happen.  In addition, do some code and docs
  review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but
  mostly addition and revision of comments.  This will need to be
  back-patched, but in a noticeably different form, so I'm committing
  it to HEAD before working on the back-patch.  Problem reported by
  Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and
  Andres Freund.

- Suppress parallel build in interfaces/ecpg/preproc/.  This is to see
  if it will stop intermittent build failures on buildfarm member
  okapi.  We know that gmake 3.82 has some problems with sometimes not
  honoring dependencies in parallel builds, and it seems likely that
  this is more of the same.  Since the vast bulk of the work in the
  preproc directory is associated with creating preproc.c and then
  preproc.o, parallelism buys us hardly anything here anyway.  Also,
  make both this .NOTPARALLEL and the one previously added in
  interfaces/ecpg/Makefile be conditional on "ifeq
  ($(MAKE_VERSION),3.82)".  The known bug in gmake is fixed upstream
  and should not be present in 3.83 and up, and there's no reason to
  think it affects older releases.

- Produce a more useful error message for over-length Unix socket
  paths.  The length of a socket path name is constrained by the size
  of struct sockaddr_un, and there's not a lot we can do about it
  since that is a kernel API.  However, it would be a good thing if we
  produced an intelligible error message when the user specifies a
  socket path that's too long --- and getaddrinfo's standard API is
  too impoverished to do this in the natural way.  So insert explicit
  tests at the places where we construct a socket path name.  Now
  you'll get an error that makes sense and even tells you what the
  limit is, rather than something generic like "Non-recoverable
  failure in name resolution".  Per trouble report from Jeremy Drake
  and a fix idea from Andrew Dunstan.

- Add missing buffer lock acquisition in GetTupleForTrigger().  If we
  had not been holding buffer pin continuously since the tuple was
  initially fetched by the UPDATE or DELETE query, it would be
  possible for VACUUM or a page-prune operation to move the tuple
  while we're trying to copy it.  This would result in a garbage "old"
  tuple value being passed to an AFTER ROW UPDATE or AFTER ROW DELETE
  trigger.  The preconditions for this are somewhat improbable, and
  the timing constraints are very tight; so it's not so surprising
  that this hasn't been reported from the field, even though the bug
  has been there a long time.  Problem found by Andres Freund.
  Back-patch to all active branches.

- Take buffer lock while inspecting btree index pages in
  contrib/pageinspect.  It's not safe to examine a shared buffer
  without any lock.

- Allow adding values to an enum type created in the current
  transaction.  Normally it is unsafe to allow ALTER TYPE ADD VALUE in
  a transaction block, because instances of the value could be added
  to indexes later in the same transaction, and then they would still
  be accessible even if the transaction rolls back.  However, we can
  allow this if the enum type itself was created in the current
  transaction, because then any such indexes would have to go away
  entirely on rollback.  The reason for allowing this is to support
  pg_upgrade's new usage of pg_restore --single-transaction: in
  --binary-upgrade mode, pg_dump emits enum types as a succession of
  ALTER TYPE ADD VALUE commands so that it can preserve the values'
  OIDs.  The support is a bit limited, so we'll leave it undocumented.
  Andres Freund

- Make sure sharedir/extension/ directory is created when needed.  The
  previous coding worked as long as MODULEDIR wasn't set explicitly,
  because we create sharedir/$(datamoduledir) and the default value of
  that is "extension".  But if some other value is specified for
  MODULEDIR then the installation directory needed for the control
  file wasn't made.  Cédric Villemain

- Prevent passing gmake's environment variables down through
  pg_regress.  When we do "make install" to create a temp
  installation, we don't want that instance of make to try to
  communicate with any instance of make that might be calling us.
  This is known to cause problems if the upper make has a -jN flag,
  and in principle could cause problems even without that.  Unset the
  relevant environment variables to prevent such issues.  Andres

- Don't advance checkPoint.nextXid near the end of a checkpoint
  sequence.  This reverts commit
  c11130690d6dca64267201a169cfb38c1adec5ef in favor of actually fixing
  the problem: namely, that we should never have been modifying the
  checkpoint record's nextXid at this point to begin with.  The
  nextXid should match the state as of the checkpoint's logical WAL
  position (ie the redo point), not the state as of its physical
  position.  It's especially bogus to advance it in some wal_levels
  and not others.  In any case there is no need for the checkpoint
  record to carry the same nextXid shown in the XLOG_RUNNING_XACTS
  record just emitted by LogStandbySnapshot, as any replay operation
  will already have adopted that value as current.  This fixes bug
  #7710 from Tarvi Pillessaar, and probably also explains bug #6291
  from Daniel Farina, in that if a checkpoint were in progress at the
  instant of XID wraparound, the epoch bump would be lost as reported.
  (And, of course, these days there's at least a 50-50 chance of a
  checkpoint being in progress at any given instant.) Diagnosed by me
  and independently by Andres Freund.  Back-patch to all branches
  supporting hot standby.

- Recommend triggers, not rules, in the CREATE VIEW reference page.
  We've generally recommended use of INSTEAD triggers over rules since
  that feature was added; but this old text in the CREATE VIEW
  reference page didn't get the memo.  Noted by Thomas Kellerer.

- Update time zone data files to tzdata release 2012j.  DST law
  changes in Cuba, Israel, Jordan, Libya, Palestine, Western Samoa,
  and portions of Brazil.

Heikki Linnakangas pushed:

- Add OpenTransientFile, with automatic cleanup at end-of-xact.  Files
  opened with BasicOpenFile or PathNameOpenFile are not automatically
  cleaned up on error. That puts unnecessary burden on callers that
  only want to keep the file open for a short time. There is
  AllocateFile, but that returns a buffered FILE * stream, which in
  many cases is not the nicest API to work with. So add function
  called OpenTransientFile, which returns a unbuffered fd that's
  cleaned up like the FILE* returned by AllocateFile().  This plugs a
  few rare fd leaks in error cases: 1. copy_file() - fixed by by using
  OpenTransientFile instead of BasicOpenFile 2. XLogFileInit() - fixed
  by adding close() calls to the error cases.  Can't use
  OpenTransientFile here because the fd is supposed to persist over
  transaction boundaries.  3. lo_import/lo_export - fixed by using
  OpenTransientFile instead of PathNameOpenFile.  In addition to
  plugging those leaks, this replaces many BasicOpenFile() calls with
  OpenTransientFile() that were not leaking, because the code
  meticulously closed the file on error. That wasn't strictly
  necessary, but IMHO it's good for robustness.  The same leaks exist
  in older versions, but given the rarity of the issues, I'm not
  backpatching this. Not yet, anyway - it might be good to backpatch
  later, after this mechanism has had some more testing in master

- If we don't have a backup-end-location, don't claim we've reached
  it.  This was apparently a typo, which caused recovery to think that
  it immediately reached the end of backup, and allowed the database
  to start up too early.  Reported by Jeff Janes. Backpatch to 9.2,
  where this code was introduced.

Alvaro Herrera pushed:

- Split out rmgr rm_desc functions into their own files This is
  necessary (but not sufficient) to have them compilable outside of a
  backend environment.

- Change test ExceptionalCondition to return void.  Commit 81107282a
  changed it in assert.c, but overlooked this other file.

Michael Meskes pushed:

- When processing nested structure pointer variables ecpg always
  expected an array datatype which of course is wrong.  Applied patch
  by Muhammad Usama  to fix this.

Robert Haas pushed:

- Basic binary heap implementation.  There are probably other places
  where this can be used, but for now, this just makes MergeAppend use
  it, so that this code will have test coverage.  There is other work
  in the queue that will use this, as well.  Abhijit Menon-Sen,
  reviewed by Andres Freund, Robert Haas, Álvaro Herrera, Tom Lane,
  and others.

Simon Riggs pushed:

- Cleanup VirtualXact at end of Hot Standby.

- Correctly init fast path fields on PGPROC

- COPY FREEZE and mark committed on fresh tables.  When a relfilenode
  is created in this subtransaction or a committed child transaction
  and it cannot otherwise be seen by our own process, mark tuples
  committed ahead of transaction commit for all COPY commands in same
  transaction. If FREEZE specified on COPY and pre-conditions met then
  rows will also be frozen.  Both options designed to avoid revisiting
  rows after commit, increasing performance of subsequent commands
  after data load and upgrade. pg_restore changes later.  Simon Riggs,
  review comments from Heikki Linnakangas, Noah Misch and design input
  from Tom Lane, Robert Haas and Kevin Grittner

- Tweak tests in COPY FREEZE

- Second tweak of COPY FREEZE

- Clarify operation of online checkpoints.  Previous comments left,
  but were too obscure for such an important aspect of the system.

- XidEpoch++ if wraparound during checkpoint.  If wal_level =
  hot_standby we update the checkpoint nextxid, though in the case
  where a wraparound occurred half-way through a checkpoint we would
  neglect updating the epoch also. Updating the nextxid is arguably
  the wrong thing to do, but changing that may introduce subtle bugs
  into hot standby startup, while updating the value doesn't cause any
  known bugs yet. Minimal fix now to HEAD and backbranches, wider fix
  later in HEAD.  Bug reported in #6291 by Daniel Farina and slightly
  differently in Cause analysis and recommended fixes from Tom Lane
  and Andres Freund.  Applied patch is minimal version of Andres
  Freund's work.

- Rearrange storage of data in xl_running_xacts.  Previously we stored
  all xids mixed together.  Now we store top-level xids first,
  followed by all subxids. Also skip logging any subxids if the
  snapshot is suboverflowed, since there are potentially large numbers
  of them and they are not useful in that case anyway. Has value in
  the envisaged design for decoding of WAL.  No planned effect on Hot
  Standby.  Andres Freund, reviewed by me

- Reduce scope of changes for COPY FREEZE.  Allow support only for
  freezing tuples by explicit command. Previous coding mistakenly
  extended slightly beyond what was agreed as correct on -hackers.  So
  essentially a partial revoke of earlier work, leaving just the COPY
  FREEZE command.

Magnus Hagander pushed:

- Add libpq function PQconninfo().  This allows a caller to get back
  the exact conninfo array that was used to create a connection,
  including parameters read from the environment.  In doing this,
  restructure how options are copied from the conninfo to the actual
  connection.  Zoltan Boszormenyi and Magnus Hagander

Andrew Dunstan pushed:

- Clean environment for pg_upgrade test.  This removes exisiting PG
  settings from the environment for pg_upgrade tests, just like
  pg_regress does.

- Add mode where contrib installcheck runs each module in a separately
  named database.  Normally each module is tested in aq database named
  contrib_regression, which is dropped and recreated at the beginhning
  of each pg_regress run.  This mode, enabled by adding
  USE_MODULE_DB=1 to the make command line, runs most modules in a
  database with the module name embedded in it.  This will make
  testing pg_upgrade on clusters with the contrib modules a lot
  easier.  Still to be done: adapt to the MSVC build system.
  Backpatch to 9.0, which is the earliest version it is reasonably
  possible to test upgrading from.

Bruce Momjian pushed:

- Move long_options structures to the top of main() functions, for
  consistency.  Per suggestion from Tom.

- In pg_upgrade, dump each database separately and use
  --single-transaction to restore each database schema.  This yields
  performance improvements for databases with many tables.  Also,
  remove split_old_dump() as it is no longer needed.

- Split initdb.c main() code into multiple functions, for easier

- In pg_upgrade, improve status wording now that we have per-database
  status output for dump/restore.

- Remove pg_restore's --single-transaction option, as it throws errors
  in certain cases.

- Revert: In pg_upgrade, remove pg_restore's --single-transaction
  option, as it throws errors in certain cases.

- In pg_upgrade, remove 'set -x' from test script.

Peter Eisentraut pushed:

- doc: Fix broken links to DocBook wiki

Tatsuo Ishii pushed:

- Fix psql crash while parsing SQL file whose encoding is different
  from client encoding and the client encoding is not *safe* one. Such
  an example is, file encoding is UTF-8 and client encoding SJIS.
  Patch contributed by Jiang Guiqing.

== Rejected Patches (for now) ==

No one was disappointed this week :-)

== Pending Patches ==

Etsuro Fujita sent in another revision of the patch to add PRE- and
POST-processor options to COPY.

Alvaro Herrera and Kevin Grittner traded patches to implement foreign
key locks.

Bruce Momjian sent in three more revisions of a patch to fix an
infelicity in pg_upgrade when there are many tables.

Heikki Linnakangas and Alexander Korotkov traded patches and Erik
Rijkers sent in a series of tests for adding index support to certain
kinds of regular expression searches.

Alvaro Herrera sent in another revision of a patch to create a generic
wal reading facility dubbed XLogReader.

Dimitri Fontaine sent in another revision of a patch to add event
trigger information.

Andres Freund sent in a patch to fix an infelicity in ilist.h with
respect to C++ compilers.

Pavel Stehule sent in two more revisions of a patch to implement

Amit Kapila sent in another revision of a patch to compute the max LSN
of data pages.

Shigeru HANADA sent in another revision of a patch to create a FDW for

Amit Kapila sent in another revision of a patch to enable changing
configuration parameters from SQL.

Tom Lane sent in another revision of a patch to refactor the flex and
bison rules.

Heikki Linnakangas sent in a patch to refactor the standby mode logic.

Andres Freund sent in two revisions of a patch to fix "make -jN".

Zoltan Boszormenyi sent in another revision of a patch to enable
pg_basebackup configure and start a standby.

KaiGai Kohei sent in another revision of a patch to refactor the ALTER

Bruce Momjian sent in a patch to address infelicities in pg_upgrade
for the case of large number of tables.

KaiGai Kohei sent in another revision of a patch to create
OAT_POST_ALTER object access hooks.

Pavel Stehule sent in a patch to fix a corner use case of variadic
functions usage.

Phil Sorber sent in another revision of a patch to create a pg_ping

Alastair Turner sent in a patch to allow checking file parameters to
psql before password prompt.

Jan Wieck sent in another revision of a patch to help fix a
performance issue by using an autovacuum truncate exclusive lock.

Andrew Dunstan sent in a patch to allow using a separate database for
contrib module testing in "make check".

Tomas Vondra sent in another revision of a patch allow people to
reduce the load stats collection puts on the system.

Sent via pgsql-announce mailing list ([email protected])
To make changes to your subscription:
CD: 4ms