View Issue Details

IDProjectCategoryView StatusLast Update
0000615Pgpool-IIBugpublic2020-08-24 09:23
Reportersvelic Assigned Tot-ishii  
PrioritynormalSeverityminorReproducibilityalways
Status closedResolutionopen 
OSUbuntuOS Version18.04 
Product Version4.1.2 
Target Version4.1.3Fixed in Version4.1.3 
Summary0000615: Issue with replication type of connection from client side
DescriptionI got an error, when I create logical replication part subscription on other postgres(pgdb1.labpgtest.itdep.org in this case):
---------------[psql session on pgdb1.labpgtest.itdep.org]
testpgpooldb=# SELECT pglogical.create_subscription(
testpgpooldb(# subscription_name := 'subscription_labpgall_testpgpooldb',
testpgpooldb(# provider_dsn := 'host=labpgall.itdep.org port=15432 user=replica_logical password=replica_logical_super_password dbname=testpgpooldb sslmode=require',
testpgpooldb(# synchronize_structure := true,
testpgpooldb(# synchronize_data := true
testpgpooldb(# );
ERROR: could not connect to the postgresql server in replication mode: FATAL: no PostgreSQL user name specified in startup packet

DETAIL: dsn was: host=labpgall.itdep.org port=15432 user=replica_logical password=replica_logical_super_password dbname=testpgpooldb sslmode=require
----------------
-----[pgpool log, debug]
...
May 29 09:37:07 pgp1 pgpool-11-lab_pg_all[662000]: [290-2] [CHILD] [662000-332] subscription_labpgall_testpgpooldb_create replica_logical@testpgpooldb: LOCATION: pool_proto_modules.c:2514
...
May 29 09:37:07 pgp1 pgpool-11-lab_pg_all[661978]: [219-1] [MAIN] [661978-454] [No Connection] [No Connection]@[No Connection]: WARNING: child process with pid: 662000 was terminated by segmentation fault
...
------------------------

I use "pglogical" extention from 2ndquadrant for logical replication .

There are no errors if logical replication configured between postgresses
pgdb1.labpgtest.itdep.org -> pgdb1.labpgall.itdep.org:15431

Some questions:
- Can the pgpool accept replication type of connection from client side?
- Do we need some additional or special configuration in case of replication client connection?
Steps To Reproduce- Create pgpool cluster
- Create postgres master or with postgres slaves and stream replication. With logical replication in postgres config. Create pgplogical extention in some test db.
- Create independent postgres with test db
- Add logical replication nodes configuration on postgres master and independent postgres
- Try to add logical replication subscription on independent postgres side, and got error.
Additional Information## My lab Postgres HA cluster structure:
All postgreses version 11
All pgpool in cluster latest version 4.1.2

## Balancer cluster:
3 NO pgp2.labpgall.itdep.org:15432 Linux pgp2 pgp2.labpgall.itdep.org
pgp1.labpgall.itdep.org:15432 Linux pgp1 pgp1.labpgall.itdep.org 15432 9000 7 STANDBY
pgp2.labpgall.itdep.org:15432 Linux pgp2 pgp2.labpgall.itdep.org 15432 9000 4 MASTER
pgp3.labpgall.itdep.org:15432 Linux pgp3 pgp3.labpgall.itdep.org 15432 9000 7 STANDBY

## Backends:
postgres@pgp1:~$ psql -h /var/run/postgresql -p 15432 -U pgpool -c 'SHOW pool_nodes;'
 node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------------------+-------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0 | pgdb1.labpgall.itdep.org | 15431 | up | 0.333333 | primary | 0 | true | 0 | | | 2020-05-29 09:43:25
 1 | pgdb2.labpgall.itdep.org | 15431 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2020-05-29 09:43:25
 2 | pgdb3.labpgall.itdep.org | 15431 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2020-05-29 09:43:25
(3 rows)

-----[pool_hba.conf: pgp1.labpgall.itdep.org & other pgpools]
hostssl replication replica_logical pgdb1.labpgtest.itdep.org md5 #for logical replication from labpgtest
hostssl all replica_logical pgdb1.labpgtest.itdep.org md5 #for logical replication from labpgte

-----[pg_hba.conf: pgdb1.labpgall.itdep.org]
hostssl all,replication replica_logical pgp1.labpgall.itdep.org md5 #for logical replication from labpgall
hostssl all,replication replica_logical pgp2.labpgall.itdep.org md5 #for logical replication from labpgall
hostssl all,replication replica_logical pgp3.labpgall.itdep.org md5 #for logical replication from labpgall
hostssl all,replication replica_logical pgp1.labpgall.itdep.org md5 #for logical replication from labpgtest
hostssl all,replication replica_logical pgp2.labpgall.itdep.org md5 #for logical replication from labpgtest
hostssl all,replication replica_logical pgp3.labpgall.itdep.org md5 #for logical replication from labpgtest
-----------------------

Feel free to contact me, If You need additional information or tests.
TagsNo tags attached.

Activities

svelic

2020-05-29 19:55

reporter  

pgpool.conf (7,743 bytes)
pgpool_log_info.txt (4,045 bytes)   
pgpool_log_info.txt (4,045 bytes)   
pgpool_log_debug.txt (85,725 bytes)   
pgpool_log_debug.txt (85,725 bytes)   

t-ishii

2020-06-01 10:15

developer   ~0003373

> - Can the pgpool accept replication type of connection from client side?
Not sure. In my understanding basically replication protocol is based on existing frontend/backend protocol. So theoretically Pgpool-II should be able to deal wit the replication protocol.. but in reality software could go wrong... In your case pgpool segfaulted. Can you please share the back trace?

svelic

2020-06-01 19:31

reporter   ~0003374

Thank's for reply!
I think, It was a one of the child process, which accept replication connection.
Can you help me with a commands get back trace?

t-ishii

2020-06-01 21:20

developer   ~0003375

First of all, you need to start pgpool taking core dump. You can do this by using ulimit command.

# ulimit -c unlimited

Start pgpool
# pgpool start

This should produce a file named "core" (maybe with pid added to the file name). Then start gdb:
# gdb pgpool "the core file name"
# bt full

Please share the output of the command.

svelic

2020-06-02 02:17

reporter   ~0003376

Sorry, for my easy english.

Thank you for clarification regarding replication type of client connection.

Please, find requested information in attached files(postgres log, pgpool log, modified pgpool.conf, gdb bt output).
I can attach full core file if it is needed.

I made a smaller installation of my test lab and did additional steps:
- remove 2 postgres slaves from pgpool configuration(pgdb1 & pgdb2);
- turn off watch dog and left only 1 pgpool node(pgpool.conf: use_watchdog = off) ;
- increase sr_check and health_check interval to 60sec.
- turn debug logs both on postgres(debug4) and pgpool(debug5) sides;
- install on pgpool side: gdb + pgpool2-dbgsym packages;

#### Used commands to run pgpool:
# su - postgres
$ ulimit -c unlimited
$ /usr/sbin/pgpool --dont-detach --discard-status --debug
## stop pgpool ##
$ mv core 01062020_core_pglogical
$ gdb pgpool "01062020_core_pglogical" ## out put in attached file
01062020_core_pglogical.txt (7,010 bytes)   
01062020_core_pglogical.txt (7,010 bytes)   
pgpool.conf.txt (7,958 bytes)   
pgpool.conf.txt (7,958 bytes)   
postgresql-11-lab_pg_all.log.txt (105,375 bytes)   
postgresql-11-lab_pg_all.log.txt (105,375 bytes)   

t-ishii

2020-06-02 16:12

developer   ~0003378

Thank you for the info. I need more info regarding the core file. Can yo please share the gdb output of followings?

p Req_info->master_node_id
p session->backend->slots[0]

svelic

2020-06-02 21:34

reporter   ~0003379

Yes, of course.
This is requested information:
(gdb) p Req_info->master_node_id
$1 = 0
(gdb) p session->backend->slots[0]
$2 = (POOL_CONNECTION_POOL_SLOT *) 0x0

t-ishii

2020-06-03 10:03

developer   ~0003380

Last edited: 2020-06-03 11:37

(gdb) p Req_info->master_node_id
$1 = 0
This is expected. master_node_id represents the first available PostgreSQL node number (0 origin).
(gdb) p session->backend->slots[0]
$2 = (POOL_CONNECTION_POOL_SLOT *) 0x0
This is not normal. The slots array keep pointers to memory which contains various information including connection socket to PostgreSQL backend. In this case Pgpool-II tried to extract application_name which is kept in the area too. Unfortunately the pointer was 0, which caused the segmentation fault error. Let me think why this could happen...

t-ishii

2020-06-03 11:37

developer   ~0003381

Ok, I think I found the cause of the problem. It was related to the following settings in pgpool.conf:
- connection_cache = off
- log_line_prefix = '[%P] [%p-%l] %a %u@%d: '
When connection_cache is off, pgpool resets connection slots (session->backend->slots[]) at the time when clients sends termination request to pgpool. Since log_min_messages is DEBUG5, pgpool wanted to emit this log message:
DEBUG: RESET ALL CONFIG VARIABLE
Unfortunately "%a" is included in the log_line_prefix parameter and %a needs to look into session->backend->slots[0], pgpool segfaults. Fix is, if session->backend->slots[] is NULL, do not try to fetch application name from the variable. Patch attached.

BTW, in the mean time can you please try to test pgpool without using %a in the log_line_prefix_parameter? If my theory is correct, you will not see the segfault.
elog.diff (762 bytes)   
elog.diff (762 bytes)   

svelic

2020-06-03 20:00

reporter   ~0003382

Excellent, thank you!
Segmentation fault is gone.
We often use this option(application) in log files and in load balancing.

t-ishii

2020-06-04 11:38

developer   ~0003383

Thank you for the confirmation. I have committed the patch to all necessary branches (4.1 or before).

t-ishii

2020-06-04 11:40

developer   ~0003384

Also I am interested in whether logical replication works with pgpool. Please let me know if you succeed or not.

svelic

2020-06-09 20:28

reporter   ~0003388

No. Replication through pgpool did not work, an error the same as described in description.

t-ishii

2020-06-26 21:12

developer   ~0003425

I found a bug with Pgpool-II which could cause:
ERROR: could not connect to the postgresql server in replication mode: FATAL: no PostgreSQL user name specified in startup packet
Please try attached patch.
user-missing.diff (392 bytes)   
user-missing.diff (392 bytes)   

svelic

2020-06-26 22:44

reporter   ~0003426

Thank you!
I can try patch after 2 weeks.

t-ishii

2020-06-29 09:03

developer   ~0003427

Ok, I'm looking forward to hearing from you 2 weeks after.

svelic

2020-07-21 19:38

reporter   ~0003437

Hi!
pgpool2 recompiled with recommended patch and try new binaries.

Confirmed, the previous error is gone.
I mean: "ERROR: could not connect to the postgresql server in replication mode: FATAL: no PostgreSQL user name specified in startup packet"

Logical replication was started.

Thank you very much!
Issue can be closed.

t-ishii

2020-07-21 20:11

developer   ~0003438

Thank you for the feedback and I am glad to hear that!
I am going to close the issue.

Issue History

Date Modified Username Field Change
2020-05-29 19:55 svelic New Issue
2020-05-29 19:55 svelic File Added: pgpool.conf
2020-05-29 19:55 svelic File Added: pgpool_log_info.txt
2020-05-29 19:55 svelic File Added: pgpool_log_debug.txt
2020-06-01 10:09 t-ishii Assigned To => t-ishii
2020-06-01 10:09 t-ishii Status new => assigned
2020-06-01 10:15 t-ishii Note Added: 0003373
2020-06-01 10:15 t-ishii Status assigned => feedback
2020-06-01 19:31 svelic Note Added: 0003374
2020-06-01 19:31 svelic Status feedback => assigned
2020-06-01 21:20 t-ishii Note Added: 0003375
2020-06-01 21:21 t-ishii Status assigned => feedback
2020-06-02 02:17 svelic File Added: 01062020_core_pglogical.txt
2020-06-02 02:17 svelic File Added: pgpool-11-lab_pg_all.log.txt
2020-06-02 02:17 svelic File Added: pgpool.conf.txt
2020-06-02 02:17 svelic File Added: postgresql-11-lab_pg_all.log.txt
2020-06-02 02:17 svelic Note Added: 0003376
2020-06-02 02:17 svelic Status feedback => assigned
2020-06-02 16:12 t-ishii Note Added: 0003378
2020-06-02 16:12 t-ishii Status assigned => feedback
2020-06-02 21:34 svelic Note Added: 0003379
2020-06-02 21:34 svelic Status feedback => assigned
2020-06-03 10:03 t-ishii Note Added: 0003380
2020-06-03 11:37 t-ishii File Added: elog.diff
2020-06-03 11:37 t-ishii Note Added: 0003381
2020-06-03 11:37 t-ishii Note Edited: 0003380
2020-06-03 11:38 t-ishii Status assigned => feedback
2020-06-03 20:00 svelic Note Added: 0003382
2020-06-03 20:00 svelic Status feedback => assigned
2020-06-04 11:38 t-ishii Note Added: 0003383
2020-06-04 11:39 t-ishii Target Version => 4.1.3
2020-06-04 11:40 t-ishii Note Added: 0003384
2020-06-04 11:40 t-ishii Status assigned => feedback
2020-06-09 20:28 svelic Note Added: 0003388
2020-06-09 20:28 svelic Status feedback => assigned
2020-06-26 21:12 t-ishii File Added: user-missing.diff
2020-06-26 21:12 t-ishii Note Added: 0003425
2020-06-26 21:12 t-ishii Status assigned => feedback
2020-06-26 22:44 svelic Note Added: 0003426
2020-06-26 22:44 svelic Status feedback => assigned
2020-06-29 09:02 t-ishii Status assigned => feedback
2020-06-29 09:03 t-ishii Note Added: 0003427
2020-07-21 19:38 svelic Note Added: 0003437
2020-07-21 19:38 svelic Status feedback => assigned
2020-07-21 20:11 t-ishii Note Added: 0003438
2020-07-21 20:12 t-ishii Status assigned => resolved
2020-08-24 09:23 administrator Status resolved => closed
2020-08-24 09:23 administrator Fixed in Version => 4.1.3