Release Date: 2016-11-21
Major enhancements in Pgpool-II 3.6 include:
Improve the behavior of failover. In streaming replication mode, client sessions are no longer disconnected when a failover occurs, provided the session does not use the failed standby server. If the primary server goes down, all sessions are still disconnected. It is also now possible to connect to Pgpool-II while it is retrying health checks. Previously, all attempts to connect to Pgpool-II failed during health check retries.
New PGPOOL SET command has been introduced. Certain configuration parameters now can be changed on the fly in a session.
Watchdog is significantly enhanced. It is more reliable than in previous releases.
Handling of the extended query protocol (e.g. used by Java applications) in streaming replication mode is faster when many rows are returned in a result set.
Import parser of PostgreSQL 9.6.
In some cases pg_terminate_backend() now does not trigger a failover.
Change documentation format from raw HTML to SGML.
The above items are explained in more detail in the sections below.
Improve the behavior of failover. (Tatsuo Ishii)
In streaming replication mode, client sessions are no longer disconnected when a failover occurs, provided the session does not use the failed standby server. If the primary server goes down, all sessions are still disconnected. A health check timeout also disconnects all sessions. Other health check errors, including the case where the retry count is exceeded, do not trigger a full session disconnection.
For user's convenience, "show pool_nodes" command shows the session local load balance node info since this is important for users in case of failover. If the load balance node is not the failed node, the session will not be affected by failover.
It is also now possible to connect to Pgpool-II while it is retrying health checks. Previously, any attempt to connect to Pgpool-II failed while it was health checking a failed node, even if failover_on_backend_error was off, because a Pgpool-II child first tries to connect to all backends, including the failed one, and exits if it fails to connect to any of them (which it inevitably does). This is a temporary situation that is resolved once Pgpool-II executes failover. However, if the health check is retrying, the situation lasts longer, depending on the settings of health_check_max_retries and health_check_retry_delay. This release tries to mitigate the problem as follows:
When an attempt to connect to a backend fails, give up on the failed node and skip to the next node rather than exiting the process, provided Pgpool-II is operating in streaming replication mode and the node is not the primary node.
Mark the local status of the failed node as "down". This lets the primary node be selected as the load balance node, so all queries are sent to the primary node. If there are other healthy standby nodes, one of them is chosen as the load balance node.
After the session is over, the child process exits so that the local status is not retained.
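The steps above can be sketched roughly as follows. This is a minimal Python illustration and not Pgpool-II's C implementation: the function names, the `try_connect` callback and the status strings are all hypothetical, and the real load balance node selection also takes backend weights into account.

```python
import random
from typing import Callable, Dict, List


def connect_backends(node_ids: List[int], primary_id: int,
                     try_connect: Callable[[int], bool]) -> Dict[int, str]:
    """Build a session-local status map, skipping failed standby nodes."""
    local_status: Dict[int, str] = {}
    for node_id in node_ids:
        if try_connect(node_id):
            local_status[node_id] = "up"
        elif node_id == primary_id:
            # A failed primary still ends the session.
            raise ConnectionError(f"cannot connect to primary node {node_id}")
        else:
            # Give up on the failed standby and mark it down locally
            # instead of exiting the child process.
            local_status[node_id] = "down"
    return local_status


def pick_load_balance_node(local_status: Dict[int, str], primary_id: int) -> int:
    """Prefer a healthy standby; fall back to the primary if none is left."""
    healthy_standbys = [n for n, s in local_status.items()
                        if s == "up" and n != primary_id]
    if healthy_standbys:
        return random.choice(healthy_standbys)
    return primary_id
```

Because the status map is local to the session, discarding it when the child process exits corresponds to the last step in the list.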
The PGPOOL SET, PGPOOL SHOW and PGPOOL RESET commands are similar to PostgreSQL's SET and SHOW commands for GUC variables. They add the ability to set and reset the value of Pgpool-II configuration parameters for the current session. The new syntax is the same as PostgreSQL's SET and RESET variable syntax, with the PGPOOL keyword added at the start.
Currently supported configuration parameters by PGPOOL SHOW/SET/RESET are: log_statement, log_per_node_statement, check_temp_table, check_unlogged_table, allow_sql_comments, client_idle_limit, log_error_verbosity, client_min_messages, log_min_messages, client_idle_limit_in_recovery.
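As an illustration, a client-side helper could build these statements and reject parameters that are not in the list above. This is only a sketch: the helper names are hypothetical, and the `TO` form and the quoting of string values are assumed to follow PostgreSQL's SET syntax, as the description suggests.

```python
# Parameters listed in these release notes as supported by PGPOOL SHOW/SET/RESET.
SUPPORTED_PARAMS = frozenset({
    "log_statement", "log_per_node_statement", "check_temp_table",
    "check_unlogged_table", "allow_sql_comments", "client_idle_limit",
    "log_error_verbosity", "client_min_messages", "log_min_messages",
    "client_idle_limit_in_recovery",
})


def _check(name: str) -> None:
    if name not in SUPPORTED_PARAMS:
        raise ValueError(f"{name} cannot be changed with PGPOOL SET")


def pgpool_set(name: str, value) -> str:
    """Build a PGPOOL SET statement (hypothetical helper)."""
    _check(name)
    if isinstance(value, int) and not isinstance(value, bool):
        literal = str(value)
    else:
        # Quote everything else as a string literal, doubling single quotes.
        literal = "'" + str(value).replace("'", "''") + "'"
    return f"PGPOOL SET {name} TO {literal}"


def pgpool_show(name: str) -> str:
    _check(name)
    return f"PGPOOL SHOW {name}"


def pgpool_reset(name: str) -> str:
    _check(name)
    return f"PGPOOL RESET {name}"
```

The resulting strings would be sent as ordinary queries over a session connected to Pgpool-II; they affect only that session.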
Sync inconsistent status of PostgreSQL nodes in Pgpool-II instances after restart. (bug 218) (Muhammad Usama)
Previously, watchdog did not synchronize the node status.
Enhance performance of SELECT when lots of rows involved. (Tatsuo Ishii)
Pgpool-II flushed data to the network (by calling write(2)) every time it sent a row ("Data Row" message) to the frontend. For example, if 10,000 rows needed to be transferred, write() was issued 10,000 times. This is pretty expensive. Since a "Command Complete" message is sent after the rows, it is enough to issue a single write() together with the Command Complete message. There were also unnecessary flushes in the handling of the Command Complete message.
Quick testing showed performance improvements of 47% to 62% in some cases.
Unfortunately, workloads that transfer only a few rows will not benefit, since those rows need to be flushed to the network anyway.
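The effect of deferring the flush can be illustrated with a small simulation that counts write calls. This is a sketch under simplifying assumptions: `FrontendBuffer` and `send_result` are hypothetical names, messages are opaque byte strings, and a real implementation would also flush when its buffer fills up.

```python
class FrontendBuffer:
    """Simulated frontend connection that counts write() calls."""

    def __init__(self, flush_per_row: bool) -> None:
        self.flush_per_row = flush_per_row
        self.pending = bytearray()   # data buffered but not yet written
        self.sent = bytearray()      # data "written to the network"
        self.write_calls = 0

    def _flush(self) -> None:
        if self.pending:
            self.sent += self.pending
            self.pending.clear()
            self.write_calls += 1

    def send_data_row(self, row: bytes) -> None:
        self.pending += row
        if self.flush_per_row:
            self._flush()            # old behavior: one write per row

    def send_command_complete(self, tag: bytes) -> None:
        self.pending += tag
        self._flush()                # always flush at the end of the result


def send_result(rows, flush_per_row: bool) -> FrontendBuffer:
    buf = FrontendBuffer(flush_per_row)
    for row in rows:
        buf.send_data_row(row)
    buf.send_command_complete(b"SELECT")
    return buf
```

With 10,000 rows, the per-row variant issues one write per row plus one for the Command Complete message, while the deferred variant issues a single write and delivers the same bytes.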
Import PostgreSQL 9.6's SQL parser. (Bo Peng)
This allows Pgpool-II to fully understand the newly added SQL syntax such as COPY INSERT RETURNING.
In some cases pg_terminate_backend() now does not trigger a failover. (Muhammad Usama)
Because PostgreSQL returns exactly the same error code in the postmaster down case and the pg_terminate_backend() case, using pg_terminate_backend() raised a failover which the user might not want. To fix this, Pgpool-II now finds the pid of the backend that is the target of pg_terminate_backend() and does not trigger a failover for that backend.
This function is limited to the case where the simple protocol is used and the pid is given to pg_terminate_backend() as a constant. So if you call pg_terminate_backend() via the extended protocol (e.g. from Java), pg_terminate_backend() still triggers a failover.
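The "constant pid" restriction can be illustrated with a simple pattern match on the query text. Note that this regular expression is a stand-in chosen for brevity; it is not how Pgpool-II itself extracts the pid, and `terminated_pid` is a hypothetical name.

```python
import re
from typing import Optional

# Matches pg_terminate_backend(<integer constant>) only.
_TERMINATE_RE = re.compile(r"pg_terminate_backend\s*\(\s*(\d+)\s*\)",
                           re.IGNORECASE)


def terminated_pid(query: str) -> Optional[int]:
    """Return the constant pid passed to pg_terminate_backend(), if any."""
    match = _TERMINATE_RE.search(query)
    return int(match.group(1)) if match else None
```

A query such as `SELECT pg_terminate_backend(pid) FROM pg_stat_activity` or a parameterized call with `$1` yields no pid here, which mirrors why those cases still trigger a failover.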
HTML documents are now generated from SGML documents. (Muhammad Usama, Tatsuo Ishii, Bo Peng)
This is intended to provide better structure, content and maintainability. Man pages are now also generated from SGML. However, there is still tremendous room to enhance the SGML documents. Please help us!
Make authentication error message more user friendly. (Tatsuo Ishii)
When attempting to connect to a backend (including during health checking), Pgpool-II now emits the error message sent by the backend, such as "sorry, too many clients already", instead of "invalid authentication message response type, Expecting 'R' and received '%c'".
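To show where such a message comes from, the following sketch extracts the human readable text from a PostgreSQL ErrorResponse packet: a kind byte 'E', a 4-byte length that counts itself, then a sequence of fields made of a one-byte code and a null-terminated string. The function name is hypothetical and this is not Pgpool-II's own parsing code.

```python
import struct
from typing import Optional


def backend_error_message(packet: bytes) -> Optional[str]:
    """Return the 'M' (message) field of an ErrorResponse, or None."""
    if len(packet) < 5 or packet[0:1] != b"E":
        return None  # not an ErrorResponse
    (length,) = struct.unpack("!I", packet[1:5])
    body = packet[5:1 + length]  # the length field counts its own 4 bytes
    for field in body.split(b"\x00"):
        if field[:1] == b"M":
            return field[1:].decode("utf-8", "replace")
    return None
```

When the first byte received during authentication is 'E' rather than 'R', reporting this field gives the user the backend's actual reason for refusing the connection.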
Tighten up health check timer expired condition in pool_check_fd(). (Muhammad Usama)
Add new script called "watchdog_setup". (Tatsuo Ishii)
watchdog_setup is a command that creates a temporary installation of Pgpool-II clusters with watchdog, mainly for testing.
Add "-pg" option to pgpool_setup. (Tatsuo Ishii)
This is useful when you want to assign specific port numbers to PostgreSQL while using pgpool_setup. Also, pgpool_setup is now installed in the standard bin directory, the same directory as pgpool.
Add "replication delay" column to "show pool_nodes". (Tatsuo Ishii)
This column shows the replication delay value in bytes if operated in streaming replication mode.
Do not update status file if all backend nodes are in down status. (Chris Pacejo, Tatsuo Ishii)
This commit tries to remove the data inconsistency in replication mode found in [pgpool-general: 3918] by not recording the status file when all backend nodes are in down status. This surprisingly simple but smart solution was provided by Chris Pacejo.
Allow to use multiple SSL cipher protocols. (Muhammad Usama)
By replacing TLSv1_method() with SSLv23_method() while initializing the SSL session, we can use more protocols than TLSv1 protocol.
Allow to use arbitrary number of items in the black_function_list/white_function_list. (Muhammad Usama)
Previously there were fixed limits for those.
Properly process empty queries (all comments). (Tatsuo Ishii)
Pgpool-II now recognizes a query consisting entirely of comments (for example "/* DBD::Pg ping test v3.5.3 */"; note that there is no ';') as an empty query.
Previously, such a query was treated as an error.
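A check of this kind can be sketched as a small scanner that skips whitespace, `--` line comments and (possibly nested) `/* */` block comments. The function name is hypothetical, and Pgpool-II itself relies on its imported SQL parser rather than a scanner like this.

```python
def is_comment_only(sql: str) -> bool:
    """True if the text contains nothing but whitespace and SQL comments."""
    i, depth, n = 0, 0, len(sql)
    while i < n:
        two = sql[i:i + 2]
        if depth > 0:
            # Inside a block comment; PostgreSQL allows nesting.
            if two == "/*":
                depth += 1
                i += 2
            elif two == "*/":
                depth -= 1
                i += 2
            else:
                i += 1
        elif two == "/*":
            depth = 1
            i += 2
        elif two == "--":
            newline = sql.find("\n", i)
            i = n if newline == -1 else newline + 1
        elif sql[i].isspace():
            i += 1
        else:
            return False  # found real SQL text (including a bare ';')
    return depth == 0  # an unterminated block comment is not "empty"
```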
Add some warning messages for wd_authkey hash calculation failure. (Yugo Nagata)
Sometimes the wd_authkey calculation fails for a reason other than an authkey mismatch. The additional messages make these cases distinguishable from each other.
Fix the broken log_destination = syslog functionality. (Muhammad Usama)
This fixes logging to the syslog destination, which was broken by the PGPOOL SET/SHOW command commit. It also enhances the log_destination configuration parameter so that it accepts a comma-separated list of destinations for the Pgpool-II log. After this commit, log_destination can be set to any combination of 'syslog' and 'stderr'.
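Parsing a comma-separated value of this kind might look like the sketch below. The function name is hypothetical, and tolerating whitespace and letter case is an assumption of this example rather than documented Pgpool-II behavior.

```python
from typing import Set

VALID_DESTINATIONS = {"stderr", "syslog"}


def parse_log_destination(value: str) -> Set[str]:
    """Split a log_destination setting into a set of known destinations."""
    names = {item.strip().lower() for item in value.split(",") if item.strip()}
    if not names:
        raise ValueError("log_destination must name at least one destination")
    unknown = names - VALID_DESTINATIONS
    if unknown:
        raise ValueError(f"unknown log destination(s): {sorted(unknown)}")
    return names
```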
Change the default value of search_primary_node_timeout from 10 to 300. (Tatsuo Ishii)
The previous default of 10 seconds was sometimes too short for a standby to be promoted.
Change the Makefile under the src/sql/ directory, as proposed in [pgpool-hackers: 1611]. (Bo Peng)
Change the PID length of pcp_proc_count command output to 6 characters long. (Bo Peng)
If a Pgpool-II process ID was longer than 5 characters, its 6th character was dropped from the output. This commit changes the process ID length in the pcp_proc_count command output to 6 characters.
Redirect all user queries to primary server. (Tatsuo Ishii)
Previously, some user queries were sent to servers other than the primary even if load_balance_mode = off. This commit changes the behavior: if load_balance_mode = off in streaming replication mode, all user queries are now sent to the primary server only.
Fixing a potential crash in pool_stream functions. (Muhammad Usama)
POOL_CONNECTION->con_info should be checked for null value before de-referencing when read or write fails on backend socket.
Fixing the design of failover command propagation on watchdog cluster. (Muhammad Usama)
This overhauls the design of how failover, failback and promote node commands are propagated to the watchdog nodes. Previously, the watchdog on the Pgpool-II node that needed to perform a node command (failover, failback or promote node) broadcast the command to all attached Pgpool-II nodes. This sometimes caused synchronization issues, especially when the watchdog cluster contained a large number of nodes, and consequently the failover command was sometimes executed by more than one Pgpool-II.
With this commit, all node commands are forwarded to the master/coordinator watchdog, which in turn propagates them to all standby nodes. The commit also changes the failover command interlocking mechanism: only the master/coordinator node can now become the lock holder, so failover commands are executed only on the master/coordinator node.
Fix the case when all backends are down then 1 node attached. (Tatsuo Ishii)
When all backends are down, no connection is accepted. Suppose one PostgreSQL node then comes up and is attached using pcp_attach_node, which finishes successfully. When a new connection arrives, it is still refused because the Pgpool-II child process looks into its cached status, in which the recovered node is still down, if the mode is streaming replication mode (native replication and other modes are fine). The solution is to force a restart of all pgpool child processes if all nodes are down.
Fix for avoiding downtime when Pgpool-II changes require a restart. (Muhammad Usama)
To fix this, the verification of configuration parameter values is reversed. Previously, standby nodes verified their parameter values against the respective values on the master Pgpool-II node, and a FATAL error was thrown when an inconsistency was found. With this commit, the verification responsibility is delegated to the master Pgpool-II node: the master verifies the configuration parameter values of each joining standby node against its local values and produces a WARNING message instead of an error in case of a difference. This way, nodes with different configurations are also allowed to join the watchdog cluster, and the user has to look for configuration inconsistency warnings in the master Pgpool-II log manually to avoid surprises at the time of a Pgpool-II master switchover.
Fix a problem with the watchdog failover_command locking mechanism. (Muhammad Usama)
Add compiler flag "-fno-strict-aliasing" in configure.ac to fix compiler error. (Tatsuo Ishii)
Do not use random() while generating MD5 salt. (Tatsuo Ishii)
random() should not be used in security related applications. To replace it, PostmasterRandom() is imported from PostgreSQL. Also, the current time is stored at the startup of the Pgpool-II main process for later use.
Don't ignore sync message from frontend when query cache is enabled. (Tatsuo Ishii)
Fix bug that Pgpool-II fails to start if listen_addresses is empty string. (bug 237) (Muhammad Usama)
The socket descriptor array (fds) was not getting the array end marker when TCP listen addresses are not used.
Create regression log directory if it does not exist yet. (Tatsuo Ishii)
Fixing the error messages when the socket operation fails. (Muhammad Usama)
Update regression test 003.failover to reflect the changes made to show pool_nodes. (Tatsuo Ishii)
Fix hang when portal suspend received. (bug 230) (Tatsuo Ishii)
Fix pgpool doesn't de-escalate IP in case network restored. (bug 228) (Muhammad Usama)
The set_state function now de-escalates when it changes the local node's state from the coordinator state to some other state.
SIGUSR1 signal handler should be installed before watchdog initialization. (Muhammad Usama)
A failover request from another watchdog node can arrive just after the watchdog has been initialized. If the SIGUSR1 signal handler is installed any later than that, this can result in a crash.
Fix Pgpool-II doesn't escalate ip in case of another node unavailability. (bug 215) (Muhammad Usama)
The heartbeat receiver fails to identify the heartbeat sender watchdog node when the heartbeat destination is specified in terms of an IP address while wd_hostname is configured as a hostname string or vice versa.
Fixing a coding mistake in watchdog code. (Muhammad Usama)
The wd_issue_failover_lock_command() function is supposed to forward the command type passed in as an argument to the wd_send_failover_sync_command() function; instead, it was passing the NODE_FAILBACK_CMD command type.
The commit also contains some log message enhancements.
Display human readable output for backend node status. (Muhammad Usama)
The pcp_node_info utility and the show commands now display a human readable backend status string instead of the internal status code.
Replace "MAJOR" macro to prevent occasional failure. (Tatsuo Ishii)
The macro calls pool_virtual_master_db_node_id() and then accesses backend->slots[id]->con using the node id returned. In rare cases, it could point to 0 (when the DB node is not connected); accessing con->major then causes a segfault.
Fix "kind mismatch" error message in Pgpool-II. (Muhammad Usama)
Many of "kind mismatch..." errors are caused by notice/warning messages produced by one or more of the DB nodes. In this case now Pgpool-II forwards the messages to frontend, rather than throwing the "kind mismatch..." error. This would reduce the chance of "kind mismatch..." errors.
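The idea can be sketched as follows: notice messages (kind 'N') are drained from each node's message queue and forwarded to the frontend before the message kinds are compared across nodes. The data structures and the `read_kind` name are hypothetical and greatly simplified compared with Pgpool-II's actual protocol handling.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, bytes]  # (kind, payload)


def read_kind(node_queues: Dict[int, List[Message]],
              forward: Callable[[Message], None]) -> str:
    """Return the message kind agreed on by all nodes, forwarding notices."""
    kinds: Dict[int, str] = {}
    for node_id, queue in node_queues.items():
        # Forward leading notice/warning messages instead of comparing them.
        while queue and queue[0][0] == "N":
            forward(queue.pop(0))
        if not queue:
            raise RuntimeError(f"no message available from node {node_id}")
        kinds[node_id] = queue[0][0]
    if len(set(kinds.values())) != 1:
        raise RuntimeError(f"kind mismatch among backends: {kinds}")
    return next(iter(kinds.values()))
```

A genuine disagreement, for example one node returning Command Complete and another returning an error, still raises the mismatch error in this sketch.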
Fix handling of pcp_listen_addresses config parameter. (Muhammad Usama)
Save and restore errno in each signal handler. (Tatsuo Ishii)
Fix usage of wait(2) in pgpool main process. (Tatsuo Ishii)
The usage of wait(2) in the Pgpool-II main process could cause an infinite wait in the system call. The solution is to use waitpid(2) instead of wait(2).
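The difference can be illustrated with a non-blocking reap loop. This sketch uses Python's `os.waitpid`, which wraps waitpid(2); the use of the WNOHANG option and the `reap_children` name are assumptions of this example, not details taken from the Pgpool-II source.

```python
import os
from typing import List


def reap_children() -> List[int]:
    """Collect the pids of all exited children without ever blocking."""
    reaped: List[int] = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately with pid 0
            # if children exist but none of them has exited yet.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # there are no child processes at all
        if pid == 0:
            break
        reaped.append(pid)
    return reaped
```

A plain wait() in the same position would block until some child exits, which is the kind of indefinite wait the fix avoids.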
pool_read() does not emit error messages when read(2) returns -1 if failover_on_backend_error is off. (Tatsuo Ishii)
Fix buffer over run problem in "show pool_nodes". (Tatsuo Ishii)
While processing "show pool_nodes", the buffer for hostname was too short. It should be same size as the buffer used for pgpool.conf. Problem reported by a twitter user who is using pgpool on AWS (which could have very long hostname).
Fix [pgpool-hackers: 1638] pgpool-II does not use default configuration. (Muhammad Usama)
Configuration file not found should just throw a WARNING message instead of ERROR or FATAL.
Fix bug with load balance node id info on shmem. (Tatsuo Ishii)
There were a few places where the load balance node was mistakenly stored in the wrong place. It should be stored in:
ConnectionInfo *con_info[child id, connection pool_id, backend id].load_balancing_node.
In fact it was stored in:
*con_info[child id, connection pool_id, 0].load_balancing_node.
As long as the backend id in question is 0, it is ok. However while testing Pgpool-II 3.6's enhancement regarding failover, if primary node is 1 (which is the load balance node) and standby is 0, a client connecting to node 1 is disconnected when failover happens on node 0. This is unexpected and the bug was revealed.
It seems the bug had been there for a long time, but it had not been found until now for the reason above.
Fix for bug that pgpool hangs connections to database. (bug 197) (Muhammad Usama)
The client connection got stuck when a backend node and a remote Pgpool-II node became unavailable at the same time. The reason was missing command timeout handling in the function that sends IPC commands to the watchdog.
Fix a possible hang during health checking. (bug 204) (Yugo Nagata)
Health checking hung when no data was sent from the backend after connect(2) succeeded. To fix this, pool_check_fd() returns 1 when select(2) exits with EINTR due to SIGALRM while health checking is performed.
Deal with the case when the primary is not node 0 in streaming replication mode. (Tatsuo Ishii)
http://www.pgpool.net/mantisbt/view.php?id=194#c837 reported that if the primary is not node 0, a statement timeout could occur even after bug194-3.3.diff was applied. After some investigation, it appeared that the MASTER macro could return a node other than the primary or load balance node, which was not supposed to happen, and thus do_query() sent queries to the wrong node (this is not clear from the report, but I confirmed it).
pool_virtual_master_db_node_id(), which is called in the MASTER macro, returns query_context->virtual_master_node_id if a query context exists. This could return the wrong node if the variable has not been set yet. To fix this, the function is modified: if the variable is neither the load balance node nor the primary node, the primary node id is returned.
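The corrected rule can be written down in a few lines. This is an illustrative Python sketch of the logic described above, not the C function itself; the parameter names are hypothetical, and `None` stands in for a variable that has not been set yet.

```python
from typing import Optional


def virtual_master_node_id(context_node_id: Optional[int],
                           primary_node_id: int,
                           load_balance_node_id: int) -> int:
    """Return the node id recorded in the query context only if it is valid."""
    if context_node_id in (primary_node_id, load_balance_node_id):
        return context_node_id
    # Unset or stale value: fall back to the primary node.
    return primary_node_id
```

For example, with the primary on node 1 and the load balance node on node 2, a leftover value of 0 now resolves to node 1 instead of sending the query to node 0.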
If statement timeout is enabled on backend and do_query() sends a query to primary node, and all of following user queries are sent to standby, it is possible that the next command, for example END, could cause a statement timeout error on the primary, and a kind mismatch error on pgpool-II is raised. (bug 194) (Tatsuo Ishii)
This fix tries to mitigate the problem by sending a sync message instead of a flush message in do_query(), expecting that the sync message resets the statement timeout timer if we are in an explicit transaction. We cannot use this technique in the implicit transaction case, because the sync message removes the unnamed portal if there is one.
Plus, pg_stat_statement will no longer show the query issued by do_query() as "running".
Fix extended protocol handling in raw mode. (Tatsuo Ishii)
Bug152 reveals that extended protocol handling in raw mode (actually other than in stream mode) was wrong in
Unlike stream mode, they should wait for backend response.
Fix confusing comments in pgpool.conf. (Tatsuo Ishii)
Fix Japanese and Chinese documentation bug about raw mode. (Yugo Nagata, Bo Peng)
Connection pool is available in raw mode.
SET default_transaction_isolation TO 'serializable'. (bug 191) (Bo Peng)
SET default_transaction_isolation TO 'serializable' is sent to not only primary but also to standby server in streaming replication mode, and this causes an error. Fix is, in streaming replication mode, SET default_transaction_isolation TO 'serializable' is sent only to the primary server.
Fix extended protocol hang with empty query. (bug 190) (Tatsuo Ishii)
The fixes related to extended protocol cases in 3.5.1 broke the case of empty query. In this case backend replies with "empty query response" which is same meaning as a command complete message. Problem is, when empty query response is received, pgpool does not reset the query in progress flag thus keeps on waiting for backend. However, backend will not send the ready for query message until it receives a sync message. Fix is, resetting the in progress flag after receiving the empty query response and reads from frontend expecting it sends a sync message.
Fix for [pgpool-general: 4569] segfault during trusted_servers check. (Muhammad Usama)
PostgreSQL's memory and exception manager APIs adopted by the Pgpool-II 3.4 are not thread safe and are causing the segmentation fault in the watchdog lifecheck process, as it uses the threads to ping configured trusted hosts for checking the upstream connections. Fix is to remove threads and use the child process approach instead.
Validating the PCP packet length. (Muhammad Usama)
Without the validation check, a malformed PCP packet can crash the PCP child and/or can run the server out of memory by sending the packet with a very large data size.
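A validation of this kind might look like the sketch below. The header layout (a one-byte packet type followed by a 4-byte big-endian length that counts itself) and the 1 MiB cap are assumptions made for illustration; `read_pcp_header` and `MAX_PCP_PACKET_LEN` are hypothetical names, not Pgpool-II identifiers.

```python
import struct
from typing import Tuple

MAX_PCP_PACKET_LEN = 1024 * 1024  # hypothetical upper bound (1 MiB)


def read_pcp_header(header: bytes) -> Tuple[bytes, int]:
    """Validate a 5-byte header and return (type, payload size)."""
    if len(header) != 5:
        raise ValueError("truncated PCP header")
    kind = header[0:1]
    (length,) = struct.unpack("!i", header[1:5])
    # Reject negative, too-small and absurdly large lengths before
    # allocating a buffer of that size.
    if length < 4 or length > MAX_PCP_PACKET_LEN:
        raise ValueError(f"invalid PCP packet length: {length}")
    return kind, length - 4
```

Checking the length before allocating or reading the payload is what keeps a malformed packet from exhausting memory.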
Fix pgpool_setup to not confuse log output. (Tatsuo Ishii)
Previously, pgpool_setup simply redirected the stdout and stderr of the pgpool process to a log file. This could cause log contents to be garbled or even lost because of a race condition caused by multiple processes writing concurrently. I and Usama found this while investigating the regression failure of 004.watchdog. To fix this, pgpool_setup now generates the startall script so that pgpool sends stdout/stderr to the cat command and cat writes to the log file (it seems the race condition does not occur when writing to a pipe).
Fix for [pgpool-general: 4519] Worker Processes Exit and Are Not Re-spawned. (Muhammad Usama)
The problem was due to a logical mistake in the code for checking the exiting child process type when the watchdog is enabled. I have also changed the severity of the message from FATAL to LOG, emitted for child exits due to max connection reached.
Fix pgpool hung after receiving error state from backend. (bug #169) (Tatsuo Ishii)
This could happen if we execute an extended protocol query and it fails.
Fix query stack problems in extended protocol case. (bug 167, 168) (Tatsuo Ishii)
Fix [pgpool-hackers: 1440] yet another reset query stuck problem. (Tatsuo Ishii)
After receiving X message from frontend, if Pgpool-II detects EOF on the connection before sending reset query, Pgpool-II could wait for backend which had not received the reset query. To fix this, if EOF received, treat this as FRONTEND_ERROR, rather than ERROR.
Fix for [pgpool-general: 4265] another reset query stuck problem. (Muhammad Usama)
The solution is to report FRONTEND_ERROR instead of simple ERROR when pool_flush on front-end socket fails.
Fixing pgpool-recovery module compilation issue with PostgreSQL 9.6. (Muhammad Usama)
Incorporating the change of function signature for GetConfigOption() functions in PostgreSQL 9.6.
Fix compile issue on FreeBSD. (Muhammad Usama)
Add missing include files. The patch is contributed by the bug reporter and enhanced a little by me.
Fix regression test to check timeout of each test. (Yugo Nagata)