View Issue Details

IDProjectCategoryView StatusLast Update
0000155Pgpool-IIBugpublic2022-01-21 07:26
Reporterlmcauwa Assigned To 
PriorityhighSeverityblockReproducibilitysometimes
Status newResolutionopen 
PlatformlinuxOSredhatOS Versionredhat6.6
Summary0000155: failover: set new primary node: -1
DescriptionWe are using pgpool II
# pgpool -version
pgpool-II version 3.4.2 (tataraboshi)

we have the following setup:

one pgpool node, with one master DB and 3 slave nodes

during our failover testing, after server failover happened,

we got the following:
2015-11-07 08:43:52: pid 6378: LOG: failover: set new primary node: -1
2015-11-07 08:43:52: pid 6378: LOG: failover: set new master node: 0

Then it start to direct the update request to the node 0, which is read-only

2015-11-07 08:44:05: pid 1698: LOG: pool_send_and_wait: Error or notice message from backend: : DB node id: 0 backend pid: 8651 statement: "update mdnowner.MDN_COMPONENT_PROCESS set MDN_COMPONENT_ID=$1, CREATED_AT=$2, NAME=$3, RUNNING=$4, STATUS=$5, UPDATED_AT=$6, version=$7 where id=$8 and version=$9" message: "cannot execute UPDATE in a read-only transaction"

At this point, pgpool was completed confused itself. It was not able to detect the proper master and slave node.

After I restart the pgpool, the problem is gone.

TagsNo tags attached.

Activities

pbielawski

2022-01-21 07:26

reporter   ~0003986

Hello,

I know its quite old topic.
I have the same issue with 3 node pgpool 4.2.7 + 3 node postgresql 13.5 sr setup.

The issue take place when one of the postgresql fails (putting VM down).
After setting new primary node to -1 leading pgpool node restart itself after the issue (because of the lack of consensus and possibility of split-brain) and then new master performs failover properly.
However, it makes the whole cluster to take over a minute to perform failover, which is quite unacceptable time.

I have attached log from the issue.

Sincerly,
pbielawski
pgpool-2022-01-20.log (18,804 bytes)   
2022-01-20 22:19:43: pid 5235: LOG:  pool_read_kind: error message from main backend:the database system is shutting down
2022-01-20 22:19:43: pid 5235: ERROR:  unable to read message kind
2022-01-20 22:19:43: pid 5235: DETAIL:  kind does not match between main(45) slot[1] (52)
2022-01-20 22:19:43: pid 5240: LOG:  pool_read_kind: error message from main backend:the database system is shutting down
2022-01-20 22:19:43: pid 5240: ERROR:  unable to read message kind
2022-01-20 22:19:43: pid 5240: DETAIL:  kind does not match between main(45) slot[1] (52)
2022-01-20 22:19:43: pid 5164: LOG:  pool_read_kind: error message from main backend:the database system is shutting down
2022-01-20 22:19:43: pid 5164: ERROR:  unable to read message kind
2022-01-20 22:19:43: pid 5164: DETAIL:  kind does not match between main(45) slot[1] (52)
2022-01-20 22:19:43: pid 5152: LOG:  pool_read_kind: error message from main backend:the database system is shutting down
2022-01-20 22:19:43: pid 5152: ERROR:  unable to read message kind
2022-01-20 22:19:43: pid 5152: DETAIL:  kind does not match between main(45) slot[1] (52)
2022-01-20 22:19:43: pid 5220: LOG:  failed to connect to PostgreSQL server on "pgsql01:5432", getsockopt() failed
2022-01-20 22:19:43: pid 5220: DETAIL:  Operation now in progress
2022-01-20 22:19:43: pid 5220: LOG:  received degenerate backend request for node_id: 0 from pid [5220]
2022-01-20 22:19:43: pid 3552: LOG:  new IPC connection received
2022-01-20 22:19:43: pid 3552: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
2022-01-20 22:19:43: pid 3552: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2022-01-20 22:19:43: pid 3552: LOG:  failover requires the majority vote, waiting for consensus
2022-01-20 22:19:43: pid 3552: DETAIL:  failover request noted
2022-01-20 22:19:43: pid 3552: LOG:  failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II node "pgpl02:5432 Linux pgpl02" is queued, waiting for the confirmation from other nodes
2022-01-20 22:19:43: pid 5220: LOG:  degenerate backend request for node_id: 0 from pid [5220], will be handled by watchdog, which is building consensus for request
2022-01-20 22:19:43: pid 5220: FATAL:  failed to create a backend connection
2022-01-20 22:19:43: pid 5220: DETAIL:  executing failover on backend
2022-01-20 22:19:43: pid 3537: LOG:  child process with pid: 5220 exits with status 256
2022-01-20 22:19:43: pid 3537: LOG:  fork a new child process with pid: 5242
2022-01-20 22:19:43: pid 5222: LOG:  failed to connect to PostgreSQL server on "pgsql01:5432", getsockopt() failed
2022-01-20 22:19:43: pid 5222: DETAIL:  Operation now in progress
2022-01-20 22:19:43: pid 5222: LOG:  received degenerate backend request for node_id: 0 from pid [5222]
2022-01-20 22:19:43: pid 3552: LOG:  new IPC connection received
2022-01-20 22:19:43: pid 3552: LOG:  watchdog received the failover command from local pgpool-II on IPC interface
2022-01-20 22:19:43: pid 3552: LOG:  watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2022-01-20 22:19:43: pid 3552: LOG:  Duplicate failover request from "pgpl02:5432 Linux pgpl02" node
2022-01-20 22:19:43: pid 3552: DETAIL:  request ignored
2022-01-20 22:19:43: pid 3552: LOG:  failover requires the majority vote, waiting for consensus
2022-01-20 22:19:43: pid 3552: DETAIL:  failover request noted
2022-01-20 22:19:43: pid 5222: LOG:  degenerate backend request for 1 node(s) from pid [5222], is changed to quarantine node request by watchdog
2022-01-20 22:19:43: pid 5222: DETAIL:  watchdog is taking time to build consensus
2022-01-20 22:19:43: pid 5222: LOG:  signal_user1_to_parent_with_reason(0)
2022-01-20 22:19:43: pid 5222: FATAL:  failed to create a backend connection
2022-01-20 22:19:43: pid 5222: DETAIL:  executing failover on backend
2022-01-20 22:19:43: pid 3537: LOG:  Pgpool-II parent process received SIGUSR1
2022-01-20 22:19:43: pid 3537: LOG:  Pgpool-II parent process has received failover request
2022-01-20 22:19:43: pid 3552: LOG:  new IPC connection received
2022-01-20 22:19:43: pid 3552: LOG:  received the failover indication from Pgpool-II on IPC interface
2022-01-20 22:19:43: pid 3552: LOG:  watchdog is informed of failover start by the main process
2022-01-20 22:19:43: pid 3537: LOG:  starting quarantine. shutdown host pgsql01(5432)
2022-01-20 22:19:43: pid 3537: LOG:  Restart all children
2022-01-20 22:19:43: pid 3537: LOG:  failover: set new primary node: -1
2022-01-20 22:19:43: pid 3537: LOG:  failover: set new main node: 1
2022-01-20 22:19:43: pid 5166: LOG:  worker process received restart request
2022-01-20 22:19:43: pid 3552: LOG:  new IPC connection received
2022-01-20 22:19:43: pid 3552: LOG:  received the failover indication from Pgpool-II on IPC interface
2022-01-20 22:19:43: pid 3552: LOG:  watchdog is informed of failover end by the main process
quarantine done. shutdown host pgsql01(5432)2022-01-20 22:19:43: pid 3537: LOG:  quarantine done. shutdown host pgsql01(5432)
2022-01-20 22:19:44: pid 5130: LOG:  restart request received in pcp child process
2022-01-20 22:19:44: pid 3537: LOG:  PCP child 5130 exits with status 0 in failover()
2022-01-20 22:19:44: pid 3537: LOG:  fork a new PCP child pid 5363 in failover()
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 4794 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5022 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5056 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5061 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5077 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5094 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5096 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5097 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5115 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5125 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5131 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5132 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5133 exits with status 256
2022-01-20 22:19:44: pid 5363: LOG:  PCP process: 5363 started
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5134 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5135 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5136 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5137 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5138 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5139 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5140 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5141 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5142 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5143 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5144 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5145 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5146 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5147 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5148 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5149 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5150 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5151 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5152 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5153 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5154 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5155 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5156 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5157 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5158 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5159 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5160 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5161 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5162 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5163 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5164 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5165 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  worker child process with pid: 5166 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  fork a new worker child process with pid: 5364
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5167 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5168 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5169 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5170 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5171 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5172 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5173 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5174 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5175 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5176 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5177 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5178 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5179 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5180 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5181 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5182 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5183 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5184 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5185 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5186 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5187 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5188 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5189 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5190 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5191 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5192 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5193 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5194 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5195 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5196 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5197 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5198 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5199 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5200 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5201 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5202 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5203 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5204 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5205 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5206 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5207 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5208 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5209 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5210 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5211 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5212 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5213 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5214 exits with status 256
2022-01-20 22:19:44: pid 5364: LOG:  process started
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5215 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5216 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5217 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5218 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5219 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5221 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5222 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5223 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5224 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5225 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5226 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5227 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5228 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5229 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5230 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5231 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5232 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5233 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5234 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5235 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5236 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5237 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5238 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5239 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5240 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5241 exits with status 256
2022-01-20 22:19:44: pid 3537: LOG:  child process with pid: 5242 exits with status 256
2022-01-20 22:19:44: pid 3552: LOG:  new IPC connection received
2022-01-20 22:19:51: pid 3680: LOG:  failed to connect to PostgreSQL server on "pgsql01:5432", timed out
2022-01-20 22:19:51: pid 3680: ERROR:  failed to make persistent db connection
2022-01-20 22:19:51: pid 3680: DETAIL:  connection to host:"pgsql01:5432" failed
2022-01-20 22:19:51: pid 3680: LOG:  health check retrying on DB node: 0 (round:1)
2022-01-20 22:19:54: pid 3552: LOG:  new IPC connection received
2022-01-20 22:19:57: pid 3680: LOG:  failed to connect to PostgreSQL server on "pgsql01:5432", timed out
2022-01-20 22:19:57: pid 3680: ERROR:  failed to make persistent db connection
2022-01-20 22:19:57: pid 3680: DETAIL:  connection to host:"pgsql01:5432" failed
2022-01-20 22:19:57: pid 3680: LOG:  health check retrying on DB node: 0 (round:2)
2022-01-20 22:19:58: pid 3552: LOG:  We are not able to build consensus for our primary node failover request, got 1 votes only for failover request ID:194
2022-01-20 22:19:58: pid 3552: DETAIL:  resigning from the coordinator
2022-01-20 22:19:58: pid 3552: LOG:  watchdog node state changed from [LEADER] to [JOINING]
2022-01-20 22:19:58: pid 3552: LOG:  removing the local node "pgpl02:5432 Linux pgpl02" from watchdog cluster leader
2022-01-20 22:19:58: pid 5365: LOG:  watchdog: de-escalation started
2022-01-20 22:19:58: pid 3552: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
2022-01-20 22:19:58: pid 5365: LOG:  successfully released the delegate IP:"192.168.23.190"
2022-01-20 22:19:58: pid 5365: DETAIL:  'if_down_cmd' returned with success
2022-01-20 22:19:58: pid 3552: LOG:  watchdog de-escalation process with pid: 5365 exit with SUCCESS.
2022-01-20 22:19:59: pid 3552: LOG:  watchdog node state changed from [INITIALIZING] to [STANDING FOR LEADER]
2022-01-20 22:19:59: pid 3552: LOG:  watchdog node state changed from [STANDING FOR LEADER] to [PARTICIPATING IN ELECTION]
2022-01-20 22:19:59: pid 3552: LOG:  watchdog node state changed from [PARTICIPATING IN ELECTION] to [INITIALIZING]
2022-01-20 22:19:59: pid 3552: LOG:  setting the remote node "pgpl03:5432 Linux pgpl03" as watchdog cluster leader
2022-01-20 22:20:00: pid 3552: LOG:  watchdog node state changed from [INITIALIZING] to [STANDBY]
2022-01-20 22:20:00: pid 3552: LOG:  signal_user1_to_parent_with_reason(1)
2022-01-20 22:20:00: pid 3552: LOG:  successfully joined the watchdog cluster as standby node
2022-01-20 22:20:00: pid 3552: DETAIL:  our join coordinator request is accepted by cluster leader node "pgpl03:5432 Linux pgpl03"
pgpool-2022-01-20.log (18,804 bytes)   

Issue History

Date Modified Username Field Change
2015-11-10 23:08 lmcauwa New Issue
2022-01-21 07:26 pbielawski Note Added: 0003986
2022-01-21 07:26 pbielawski File Added: pgpool-2022-01-20.log