[pgpool-general: 2358] Re: Problems with recovered master node.

Tatsuo Ishii ishii at postgresql.org
Sat Dec 21 08:20:46 JST 2013


Does it? It's not what I expected. Can you please show the pgpool log?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> But then, why  it works when master node is node 0?
> 
> On 20/12/13 01:01, Tatsuo Ishii wrote:
>> I see now the source of the problem:
>>
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: BEGIN; SET TRANSACTION
>>> ISOLATION LEVEL READ COMMITTED
>> pgpool-II cannot handle "multi statement queries" (multiple SQL
>> statement in single line like this). If pgpool-II finds such a query,
>> it sends to the primary node (in your case node 1) regardless the
>> contents of the query. So it sends "BEGIN" to node 1, while it does
>> not send to node 0. Later on "ROLLBACK" is sent to both node 0 and
>> 1. Of course there's no explicit transaction is running on node 0, it
>> fails.
>>
>> If such a query is generated by the flame work, There's not much
>> which pgpool-II can do...
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>>> I actually don't know too  much about the internals of the application.
>>> And the problem is when the application is trying to start a connection.
>>> (The RollBack are probably internals of sqlalchemy, I really don't know)
>>> The only thing I know is that they are using sqlalchemy and flask.
>>>
>>> Here is the pgppol log about the error:
>>> 2013-12-19 20:32:29 LOG:   pid 31014: received smart shutdown request
>>> 2013-12-19 20:32:30 LOG:   pid 4295: pgpool-II successfully started.
>>> version 3.3.2 (tokakiboshi)
>>> 2013-12-19 20:32:30 LOG:   pid 4295: find_primary_node: primary node id is 1
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: SET DATESTYLE TO 'ISO'
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 0 backend pid: 5382
>>> statement: BEGIN
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: BEGIN
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 2 backend pid: 16458
>>> statement: BEGIN
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 0 backend pid: 5382
>>> statement: SET DATESTYLE TO 'ISO'
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: SET DATESTYLE TO 'ISO'
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 2 backend pid: 16458
>>> statement: SET DATESTYLE TO 'ISO'
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: COMMIT
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 2 backend pid: 16458
>>> statement: COMMIT
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 0 backend pid: 5382
>>> statement: COMMIT
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: SHOW client_encoding
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: SHOW client_encoding
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: SHOW
>>> default_transaction_isolation
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: SHOW default_transaction_isolation
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: BEGIN; SET TRANSACTION
>>> ISOLATION LEVEL READ COMMITTED
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: select version()
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: select version()
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: select current_schema()
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: select current_schema()
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: show transaction
>>> isolation level
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: show transaction isolation level
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: SELECT CAST('test plain
>>> returns' AS VARCHAR(60)) AS anon_1
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: SELECT CAST('test
>>> unicode returns' AS VARCHAR(60)) AS anon_1
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
>>> 2013-12-19 20:35:16 LOG:   pid 4328: statement: ROLLBACK
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
>>> statement: ROLLBACK
>>> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 0 backend pid: 5382
>>> statement: ROLLBACK
>>> 2013-12-19 20:35:16 LOG:   pid 4328: pool_send_and_wait: Error or notice
>>> message from backend: : DB node id: 0 backend pid: 5382 statement:
>>> ROLLBACK message: there is no transaction in progress
>>> 2013-12-19 20:35:16 ERROR: pid 4328: read_kind_from_backend: 1 th kind C
>>> does not match with master or majority connection kind N
>>> 2013-12-19 20:35:16 ERROR: pid 4328: kind mismatch among backends.
>>> Possible last query was: "ROLLBACK" kind details are: 0[N: there is no
>>> transaction in progress] 1[C]
>>> 2013-12-19 20:35:16 LOG:   pid 4328: do_child: exits with status 1 due
>>> to error
>>>
>>> And here is the stack trace:
>>> 2013-12-19 20:35:25,080 - application.flaskapp - ERROR - kind mismatch
>>> among backends. Possible last query was: "ROLLBACK" kind details are:
>>> 0[N: there is no transaction in progress] 1[C]
>>> HINT:  check data consistency among db nodes
>>> ERROR:  kind mismatch among backends. Possible last query was:
>>> "ROLLBACK" kind details are: 0[N: there is no transaction in progress] 1[C]
>>> HINT:  check data consistency among db nodes
>>> Traceback (most recent call last):
>>>   File
>>> "/usr/lib/python2.6/site-packages/application/server/flaskapp.py", line
>>> 34, in __init__
>>>     self._init_db()
>>>   File "/usr/lib/python2.6/site-packages/application/server/orgtool.py",
>>> line 47, in _init_db
>>>     self.db = ApplicationMutableDb(self.config)
>>>   File "/usr/lib/python2.6/site-packages/application/server/db.py", line
>>> 1014, in __init__
>>>     ApplicationAppendDb.__init__(self, *args, **kwargs)
>>>   File "/usr/lib/python2.6/site-packages/application/server/db.py", line
>>> 937, in __init__
>>>     ApplicationQueryDb.__init__(self, *args, **kwargs)
>>>   File "/usr/lib/python2.6/site-packages/application/server/db.py", line
>>> 234, in __init__
>>>     BaseDb.__init__(self, *args, **kwargs)
>>>   File "/usr/lib/python2.6/site-packages/application/server/db.py", line
>>> 192, in __init__
>>>     self.db_conn()
>>>   File "/usr/lib/python2.6/site-packages/application/server/db.py", line
>>> 216, in db_conn
>>>     self.conn = self.engine.connect()
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/engine/base.py",
>>> line 2322, in connect
>>>     return self._connection_cls(self, **kwargs)
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/engine/base.py",
>>> line 872, in __init__
>>>     self.__connection = connection or engine.raw_connection()
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/engine/base.py",
>>> line 2408, in raw_connection
>>>     return self.pool.unique_connection()
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/pool.py", line 169,
>>> in unique_connection
>>>     return _ConnectionFairy(self).checkout()
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/pool.py", line 371,
>>> in __init__
>>>     rec = self._connection_record = pool._do_get()
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/pool.py", line 758,
>>> in _do_get
>>>     return self._create_connection()
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/pool.py", line 174,
>>> in _create_connection
>>>     return _ConnectionRecord(self)
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/pool.py", line 259,
>>> in __init__
>>>     pool.dispatch.first_connect.exec_once(self.connection, self)
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/event.py", line 265,
>>> in exec_once
>>>     self(*args, **kw)
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/event.py", line 274,
>>> in __call__
>>>     fn(*args, **kw)
>>>   File
>>> "/usr/lib/python2.6/site-packages/sqlalchemy/engine/strategies.py", line
>>> 166, in first_connect
>>>     dialect.initialize(c)
>>>   File
>>> "/usr/lib/python2.6/site-packages/sqlalchemy/dialects/postgresql/base.py",
>>> line 977, in initialize
>>>     super(PGDialect, self).initialize(connection)
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/engine/default.py",
>>> line 187, in initialize
>>>     self.do_rollback(connection.connection)
>>>   File "/usr/lib/python2.6/site-packages/sqlalchemy/engine/default.py",
>>> line 299, in do_rollback
>>>     connection.rollback()
>>> InternalError: kind mismatch among backends. Possible last query was:
>>> "ROLLBACK" kind details are: 0[N: there is no transaction in progress] 1[C]
>>> HINT:  check data consistency among db nodes
>>> ERROR:  kind mismatch among backends. Possible last query was:
>>> "ROLLBACK" kind details are: 0[N: there is no transaction in progress] 1[C]
>>> HINT:  check data consistency among db nodes
>>>
>>> 2013-12-19 20:35:25,085 - application - ERROR - Action failed: 500:
>>> Internal Server Error
>>> Traceback (most recent call last):
>>>   File "./orgtool", line 65, in main
>>>     org = self._get_orgtool(options.confbase)
>>>   File "./orgtool", line 19, in _get_orgtool
>>>     org = orgtool.OrgTool(confbase)
>>>   File "/usr/lib/python2.6/site-packages/application/server/orgtool.py",
>>> line 41, in __init__
>>>     BaseServer.__init__(self, environ, os.path.join(confbase,
>>> ".application"))
>>>   File
>>> "/usr/lib/python2.6/site-packages/application/server/flaskapp.py", line
>>> 37, in __init__
>>>     raise wzex.InternalServerError("DATABASE CONNECTION FAILED")
>>>
>>> On 19/12/13 17:28, Tatsuo Ishii wrote:
>>>>> Hi.
>>>>>
>>>>> It wasn't installed, but I installed it and had the same result.
>>>> Can you show the PostgreSQL around "ROLLBACK"?
>>>>
>>>>> By the way, to install it I just need to execute this right?
>>>>> CREATE EXTENSION pgpool_regclass;
>>>> Yes.
>>>>
>>>> Best regards,
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>> English: http://www.sraoss.co.jp/index_en.php
>>>> Japanese: http://www.sraoss.co.jp
>>>>
>>>>> On 18/12/13 18:17, Tatsuo Ishii wrote:
>>>>>> Have you installed pgpool_regclass?
>>>>>>
>>>>>> Best regards,
>>>>>> --
>>>>>> Tatsuo Ishii
>>>>>> SRA OSS, Inc. Japan
>>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>>> Japanese: http://www.sraoss.co.jp
>>>>>>
>>>>>>> Hi everyone.
>>>>>>>
>>>>>>> I'm having an issue with pgpool.
>>>>>>> First of all my configuration:
>>>>>>> 1 Server with pgpool-II version 3.3.2 (tokakiboshi)
>>>>>>> 3 Servers with:
>>>>>>> postgresql93-contrib-9.3.2-1PGDG.rhel6.x86_64
>>>>>>> postgresql93-devel-9.3.2-1PGDG.rhel6.x86_64
>>>>>>> postgresql93-libs-9.3.2-1PGDG.rhel6.x86_64
>>>>>>> postgresql93-9.3.2-1PGDG.rhel6.x86_64
>>>>>>> postgresql93-server-9.3.2-1PGDG.rhel6.x86_64
>>>>>>>
>>>>>>> All 4 servers are RHEL 6 machines. (Amazon AMI's actually)
>>>>>>>
>>>>>>> Pgpool is configured to have Master/Slave Nodes with replication
>>>>>>> and Load Balancing.
>>>>>>>
>>>>>>> #------------------------------------------------------------------------------
>>>>>>> # LOAD BALANCING MODE
>>>>>>> #------------------------------------------------------------------------------
>>>>>>>
>>>>>>> load_balance_mode = on
>>>>>>>                                    # Activate load balancing mode
>>>>>>>                                    # (change requires restart)
>>>>>>> ignore_leading_white_space = on
>>>>>>>                                    # Ignore leading white spaces of each
>>>>>>> query
>>>>>>> white_function_list = ''
>>>>>>>                                    # Comma separated list of function names
>>>>>>>                                    # that don't write to database
>>>>>>>                                    # Regexp are accepted
>>>>>>> black_function_list = 'nextval,setval'
>>>>>>>                                    # Comma separated list of function names
>>>>>>>                                    # that write to database
>>>>>>>                                    # Regexp are accepted
>>>>>>>
>>>>>>>
>>>>>>> #------------------------------------------------------------------------------
>>>>>>> # MASTER/SLAVE MODE
>>>>>>> #------------------------------------------------------------------------------
>>>>>>>
>>>>>>> master_slave_mode = on
>>>>>>>                                    # Activate master/slave mode
>>>>>>>                                    # (change requires restart)
>>>>>>> master_slave_sub_mode = 'stream'
>>>>>>>                                    # Master/slave sub mode
>>>>>>>                                    # Valid values are combinations slony or
>>>>>>>                                    # stream. Default is slony.
>>>>>>>                                    # (change requires restart)
>>>>>>>
>>>>>>> # - Streaming -
>>>>>>>
>>>>>>> sr_check_period = 1
>>>>>>>                                    # Streaming replication check period
>>>>>>>                                    # Disabled (0) by default
>>>>>>> sr_check_user = 'postgres'
>>>>>>>                                    # Streaming replication check user
>>>>>>>                                    # This is necessary even if you disable
>>>>>>>                                    # streaming replication delay check with
>>>>>>>                                    # sr_check_period = 0
>>>>>>> sr_check_password = ''
>>>>>>>                                    # Password for streaming replication
>>>>>>> check user
>>>>>>> delay_threshold = 1
>>>>>>>                                    # Threshold before not dispatching
>>>>>>> query to standby node
>>>>>>>                                    # Unit is in bytes
>>>>>>>                                    # Disabled (0) by default
>>>>>>>
>>>>>>> # - Special commands -
>>>>>>>
>>>>>>> follow_master_command = '/bin/follow_master.sh %d %h %D %m %H %R'
>>>>>>>                                    # Executes this command after master
>>>>>>> failover
>>>>>>>                                    # Special values:
>>>>>>>                                    #   %d = node id
>>>>>>>                                    #   %h = host name
>>>>>>>                                    #   %p = port number
>>>>>>>                                    #   %D = database cluster path
>>>>>>>                                    #   %m = new master node id
>>>>>>>                                    #   %H = hostname of the new master node
>>>>>>>                                    #   %M = old master node id
>>>>>>>                                    #   %P = old primary node id
>>>>>>>                                    #   %r = new master port number
>>>>>>>                                    #   %R = new master database cluster path
>>>>>>>                                    #   %% = '%' character
>>>>>>>
>>>>>>>
>>>>>>> #All three nodes are configured like this
>>>>>>> backend_hostnamen = 'ip_node_n'
>>>>>>> backend_portn = 5432
>>>>>>> backend_weightn = 0
>>>>>>> backend_data_directoryn = '/var/lib/pgsql/9.3/data/'
>>>>>>> backend_flagn = 'ALLOW_TO_FAILOVER'
>>>>>>>
>>>>>>> I'm using repmgr to handle the replication in the nodes.
>>>>>>>
>>>>>>>
>>>>>>> My problem occurs when I kill the master node, everything works as
>>>>>>> expected and a slave takes it's place.
>>>>>>> My application continues working with no major issues until I decide to
>>>>>>> recover the dead master node.
>>>>>>>
>>>>>>> After that pgool starts sending this message:
>>>>>>> 2013-12-18 17:19:07 LOG:   pid 30801: do_child: exits with status 1 due
>>>>>>> to error
>>>>>>> 2013-12-18 17:19:10 LOG:   pid 30802: pool_send_and_wait: Error or
>>>>>>> notice message from backend: : DB node id: 0 backend pid: 27585
>>>>>>> statement: ROLLBACK message: there is no transaction in progress
>>>>>>> 2013-12-18 17:19:10 ERROR: pid 30802: read_kind_from_backend: 1 th kind
>>>>>>> C does not match with master or majority connection kind N
>>>>>>> 2013-12-18 17:19:10 ERROR: pid 30802: kind mismatch among backends.
>>>>>>> Possible last query was: "ROLLBACK" kind details are: 0[N: there is no
>>>>>>> transaction in progress] 1[C]
>>>>>>>
>>>>>>> And I identified that the problem is that pgpool is sending a rollback
>>>>>>> to the old master node [node 0] (Something illegal as far as I know) and
>>>>>>> then all the transaction is cancelled and my application fails.
>>>>>>> If I promote the old master again, then the application works with no
>>>>>>> problem.
>>>>>>>
>>>>>>> Anyone has an idea why is this ocurring?
>>>>>>>
>>>>>>> Regards!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>>
>>>>>>>
>>>>>>> Be aware that if you reply directly to this message, your reply may not be 
>>>>>>> secure. Do not send us communications that contain unencrypted confidential 
>>>>>>> information such as passwords, account numbers, or Social Security numbers.
>>>>>>>
>>>>>>> This message, including any attachments, is for the sole use of the 
>>>>>>> intended recipient(s) and may contain confidential and privileged 
>>>>>>> information. If you are not the intended recipient, please destroy all 
>>>>>>> copies of this message and any attachments. In addition, please notify 
>>>>>>> Gazzang immediately by email to info at gazzang.com.
>>>>>>> _______________________________________________
>>>>>>> pgpool-general mailing list
>>>>>>> pgpool-general at pgpool.net
>>>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>>>> -- 
>>>>>
>>>>>
>>>>> Be aware that if you reply directly to this message, your reply may not be 
>>>>> secure. Do not send us communications that contain unencrypted confidential 
>>>>> information such as passwords, account numbers, or Social Security numbers.
>>>>>
>>>>> This message, including any attachments, is for the sole use of the 
>>>>> intended recipient(s) and may contain confidential and privileged 
>>>>> information. If you are not the intended recipient, please destroy all 
>>>>> copies of this message and any attachments. In addition, please notify 
>>>>> Gazzang immediately by email to info at gazzang.com.
>>>
>>> -- 
>>>
>>>
>>> Be aware that if you reply directly to this message, your reply may not be 
>>> secure. Do not send us communications that contain unencrypted confidential 
>>> information such as passwords, account numbers, or Social Security numbers.
>>>
>>> This message, including any attachments, is for the sole use of the 
>>> intended recipient(s) and may contain confidential and privileged 
>>> information. If you are not the intended recipient, please destroy all 
>>> copies of this message and any attachments. In addition, please notify 
>>> Gazzang immediately by email to info at gazzang.com.
> 
> 
> -- 
> 
> 
> Be aware that if you reply directly to this message, your reply may not be 
> secure. Do not send us communications that contain unencrypted confidential 
> information such as passwords, account numbers, or Social Security numbers.
> 
> This message, including any attachments, is for the sole use of the 
> intended recipient(s) and may contain confidential and privileged 
> information. If you are not the intended recipient, please destroy all 
> copies of this message and any attachments. In addition, please notify 
> Gazzang immediately by email to info at gazzang.com.


More information about the pgpool-general mailing list