View Issue Details

IDProjectCategoryView StatusLast Update
0000118Pgpool-IIBugpublic2015-02-26 07:55
ReporterndarshanAssigned Tot-ishii 
PriorityimmediateSeverityblockReproducibilityalways
Status assignedResolutionopen 
PlatformLinux OSFedora release OS Version17 Beefy Miracle
Product Version 
Target VersionFixed in Version 
Summary0000118: Pgpool does not connect with Tableau client with postgress driver.
DescriptionI am having the below setup

-> postgres server installed and running fine.

-> pgpool-II-3.3.3 installed and running on the same host as the potgres server.

-> in the pgpool.conf file make use to set “memory_cache_enabled = on” so that we hit the cache in the pgpool when we repeat same select query from client.

->installed the latest postgres odbc driver on windows 7 laptop, from this below pgpool install downlaod page.http://www.postgresql.org/ftp/odbc/versions/msi/

 -> Install the driver above downloaded postgres driver, then go to “controlpanel” -> ”administrative tools” -> “ODBC data sources (32-bit)” -> Then choose “Add” -> scroll all teh way down and choose “Postgre SQL ANSI” -> “Finish” -> IN the next page enter the values for database = “Postgress” “Username”” <user>, “port” = “9999” (default value of pgpool listener port). click “Test”. Will say “Connection Successful” -> “OK” -> “Save” save this setting.

-> Download and install and open Tableau software (on the same a windows 7) machine (you can down load the trial version from below and use it for 1 month).

http://www.tableausoftware.com/products/trial


  choose "Connect to data" -> under "On a server" -> all the way at the bottom click on "Other Databases (OBDC)" -> "DSN" click on the empty box and choose the driver you have installed. In my case I have given the default name as "PostgreSQL30-pure" -> Click on red button "Connect"

->Will get a pop up "select data source" -> chose your driver added "PostreSQL30" in my case -> click "OK"

-> will get another pop up "PostgreSQL connection"
 window, here make sure all the field are properly populated, like "Database"
 "server" , "UserNAme" and "Password". Then select "Ok"

-> Now it will try to connect to the pgpool and come back successful.

-> Now scroll all the way to the down and again choose the "Connect" button and the Tableau hangs forever.
->pg pool got a request as below and is stuck hanging for frontend to respond. But the frontend is hung waiting for pgpool to respond.

The query is

SELECT "COL" FROM (SELECT 1 AS "COL") AS "SUBQUERY"


Pgpool tracing is :

Oct 28 16:31:25 darshan pgpool[5009]: DB node id: 0 backend pid: 5402 statement: DEALLOCATE "_PLAN00A63090"
Oct 28 16:31:25 darshan pgpool[5009]: DB node id: 0 backend pid: 5402 statement: Parse: SELECT "COL"
Oct 28 16:31:25 darshan pgpool[5009]: DB node id: 0 backend pid: 5402 statement: D message
Oct 28 16:31:25 darshan pgpool[5009]: DB node id: 0 backend pid: 5402 statement: B message
Oct 28 16:31:25 darshan pgpool[5009]: query result fetched from cache. statement: SELECT "COL"


-> I have looked at the code in pool_proto_modules.c : Execute ( ) -> The current state of back end is "Bind" and frontend (in this case tableau with postgres driver) issues "Execute". So we are in that function "Execute()"

As part of Execute() pgpool it finds the results in its own cache and decides to server from its mem cache instead of going to backend. So the below code is where it has communicated to the frontend with the results from the cache is a probably buggy place. Because after that the pgpool skips serving the backend due to this pool_set_skip_reading_from_backends()"as in the below code.

Then goes and serves frontend to see the next query as "P" (meaning parse) waits on front end waiting front the data on the socket. But the frontend is also waiting on pgpool for something. So there is a miss match in the protocol when the data is found in cache.



                /* If the query is SELECT from table to cache, try to fetch cached result. */
                status = pool_fetch_from_memory_cache(frontend, backend, search_query, &foundp);

        if (status != POOL_CONTINUE)
                        return status;

                if (foundp)
                {
                        pool_ps_idle_display(backend);
                        pool_set_skip_reading_from_backends();
                        pool_stats_count_up_num_cache_hits();
                        pool_unset_query_in_progress();
                        return POOL_CONTINUE;
                }


How ever this scenario cannot be reproduced in the jdbc + java simple program I write. Where I do a parse-> bind -> execute -> close. First time there is no cache, but future run it gets from cache and return the data but doe snot run into that hang as in the case of tableau + postgres driver, also because the steps are different in the case of tableau frontend.
Steps To ReproduceConnect tableaus to pgpool with postgress server in backend.

I am having the below setup

-> postgres server installed and running fine.

-> pgpool-II-3.3.3 installed and running on the same host as the potgres server.

-> in the pgpool.conf file make use to set “memory_cache_enabled = on” so that we hit the cache in the pgpool when we repeat same select query from client.

->installed the latest postgres odbc driver on windows 7 laptop, from this below pgpool install downlaod page.http://www.postgresql.org/ftp/odbc/versions/msi/

 -> Install the driver above downloaded postgres driver, then go to “controlpanel” -> ”administrative tools” -> “ODBC data sources (32-bit)” -> Then choose “Add” -> scroll all teh way down and choose “Postgre SQL ANSI” -> “Finish” -> IN the next page enter the values for database = “Postgress” “Username”” <user>, “port” = “9999” (default value of pgpool listener port). click “Test”. Will say “Connection Successful” -> “OK” -> “Save” save this setting.

-> Download and install and open Tableau software (on the same a windows 7) machine (you can down load the trial version from below and use it for 1 month).

http://www.tableausoftware.com/products/trial


  choose "Connect to data" -> under "On a server" -> all the way at the bottom click on "Other Databases (OBDC)" -> "DSN" click on the empty box and choose the driver you have installed. In my case I have given the default name as "PostgreSQL30-pure" -> Click on red button "Connect"

->Will get a pop up "select data source" -> chose your driver added "PostreSQL30" in my case -> click "OK"

-> will get another pop up "PostgreSQL connection"
 window, here make sure all the field are properly populated, like "Database"
 "server" , "UserNAme" and "Password". Then select "Ok"

-> Now it will try to connect to the pgpool and come back successful.

-> Now scroll all the way to the down and again choose the "Connect" button and the Tableau hangs forever.
->pg pool got a request as below and is stuck hanging for frontend to respond. But the frontend is hung waiting for pgpool to respond.

The query is

SELECT "COL" FROM (SELECT 1 AS "COL") AS "SUBQUERY"


Pgpool tracing is :

Oct 28 16:31:25 darshan pgpool[5009]: DB node id: 0 backend pid: 5402 statement: DEALLOCATE "_PLAN00A63090"
Oct 28 16:31:25 darshan pgpool[5009]: DB node id: 0 backend pid: 5402 statement: Parse: SELECT "COL"
Oct 28 16:31:25 darshan pgpool[5009]: DB node id: 0 backend pid: 5402 statement: D message
Oct 28 16:31:25 darshan pgpool[5009]: DB node id: 0 backend pid: 5402 statement: B message
Oct 28 16:31:25 darshan pgpool[5009]: query result fetched from cache. statement: SELECT "COL"


-> I have looked at the code in pool_proto_modules.c : Execute ( ) -> The current state of back end is "Bind" and frontend (in this case tableau with postgres driver) issues "Execute". So we are in that function "Execute()"

As part of Execute() pgpool it finds the results in its own cache and decides to server from its mem cache instead of going to backend. So the below code is where it has communicated to the frontend with the results from the cache is a probably buggy place. Because after that the pgpool skips serving the backend due to this pool_set_skip_reading_from_backends()"as in the below code.

Then goes and serves frontend to see the next query as "P" (meaning parse) waits on front end waiting front the data on the socket. But the frontend is also waiting on pgpool for something. So there is a miss match in the protocol when the data is found in cache.



                /* If the query is SELECT from table to cache, try to fetch cached result. */
                status = pool_fetch_from_memory_cache(frontend, backend, search_query, &foundp);

        if (status != POOL_CONTINUE)
                        return status;

                if (foundp)
                {
                        pool_ps_idle_display(backend);
                        pool_set_skip_reading_from_backends();
                        pool_stats_count_up_num_cache_hits();
                        pool_unset_query_in_progress();
                        return POOL_CONTINUE;
                }


How ever this scenario cannot be reproduced in the jdbc + java simple program I write. Where I do a parse-> bind -> execute -> close. First time there is no cache, but future run it gets from cache and return the data but doe snot run into that hang as in the case of tableau + postgres driver, also because the steps are different in the case of tableau frontend.

TagsNo tags attached.

Activities

t-ishii

2014-11-06 08:21

developer   ~0000477

I have no idea why you cannot reproduce it with JDBC. Maybe you can find something special with tableau. I sugges to take packet monitoring by using wire shark or something (for no pgpool-II used case).

ndarshan

2014-11-07 06:02

reporter   ~0000480

Hi, Also I have seen another problem int he pgpool, where it runs into sigsegv. Due to below code in ReadyForQuery() when memory cache is enabled.

In the code below the problem is that "session_context->query_context..."
query_context could be released and hence it is NULL. when trying to dereference it we will run into sigsegv.


                /* Memory cache enabled? */
                        if (cache_commit && pool_config->memory_cache_enabled)
                        {
                                /* If we are doing extended query and the state is after EXECUTE,
                                 * then we can commit cache.
                                 * We check latter condition by looking at query_context->query_w_hex.
                                 * This check is necessary for certain frame work such as PHP PDO.
                                 * It sends Sync message right after PARSE and it produces
                                 * "Ready for query" message from backend.
                                 */
                                if (pool_is_doing_extended_query_message())
                                {
                                        if (session_context->query_context->query_state[MASTER_NODE_ID] == POOL_EXECUTE_COMPLETE)
                                        {
                                                pool_handle_query_cache(backend, session_context->query_context->query_w_hex, node, state)\
;
                                        free(session_context->query_context->query_w_hex);
                                                session_context->query_context->query_w_hex = NULL;
                                        }
                                }
                else

t-ishii

2014-11-07 09:12

developer   ~0000482

Good catch! Will fix.

ndarshan

2014-11-07 09:30

reporter   ~0000483

Hi t-ishii, thansk!

Regarding the original problem, I still feel strongly that there is something in the protocol and the hand shake with the frontend when the pgpool finds the result for the query in the memcache (not shm) and the frontend/client is served form the cache.

I see that after it responds to the front end with the query results found in cache, it goes back into the function ProcessFrontendResponse()->pool_read2() and waits on a date on the socket from the front end which is below code.

 if (MAJOR(backend) == PROTO_MAJOR_V3)
        {
         if (pool_read(frontend, &len, sizeof(len)) < 0)
                    return POOL_END;
            len = ntohl(len) - 4;
        if (len > 0)
                        bufp = pool_read2(frontend, len);
        }



Also in this scenario it said the backend to bind, but in the next step when it found the result in the cache it never responds to backend about to served results and to come out of bind, Which means I see that the postgress backend is hanging on the bind for ever. Then the pgpool process hanging on the read() from the front end as described above and the front end also hanging I believe because it did not get the results it wants. So there is mis communication from pgpool to backend when the query result is found in memcache and served. I strongly suspect the function pool_fetch_from_memory_cache(), specifically when it responds to the frontend with the results.

I would request you to kindly look at this part of logic to see if you can see any bug or anomaly.
 
On the side note:

Another reason why somebody other than me could not have seen this issues is because the sigsegv kills the process serving the query and the system does not progress any more.

I think memcache came into action only after pgpool-II v 3.3 and above and hence we did not have the problem when we were on version 3.1

thanks
Darshan

ndarshan

2014-11-07 09:36

reporter   ~0000484

BTW t-ishii,

Another strong reason I have to eliminate there is any problem in the client/frontend is that, in absence of pgpool the Tableau software can seamlessly connect to the Postgress backend server without the hang which I am facing in presence of the pgpool.

Hope this give you some useful info to see where the problem could be in code.

thanks
Darshan

t-ishii

2014-11-07 09:38

developer   ~0000485

Can I assume that if you connect to PostgreSQL using pgpool-II *without* memory cache, then your Tableau client works?

ndarshan

2014-11-07 09:44

reporter   ~0000486

Absolutely, I have tested that and it works like a charm.

All I had to do is in just turn this below parameter to "off" in the pgpool.conf

memory_cache_enabled = on

thanks
Darshan

t-ishii

2014-11-07 09:56

developer   ~0000487

Another way to track the problem is, restarting pgpool-II with -d (debug) option, grab logs while the Tableau client connects and does some query (probably very simple one like "select 1" is enough) *without* memory cache. With the log I may be able to get an idea how packets the client sending are different from JDBC.

ndarshan

2014-11-07 10:13

reporter   ~0000488

I can try to turn on debugging and show you the log file. Give me some time as I need to set up.

But just FYI the problem is that the initial connection establishment between Tableau and pgool/postgres server itself is running into this hang. Meaning, Tableau issues a set of queries as part of connecting succesfully and there itself it is running into this issues.

I will get back to you soon.

regard
Darshan

ndarshan

2014-11-19 12:34

reporter  

log.txt (294,036 bytes)

ndarshan

2014-11-19 12:47

reporter   ~0000494

Last edited: 2014-11-19 12:56

View 2 revisions

Hi T-ishii, sorry for not getting back earlier. Below I have the tracing and additional information as you asked for.

Below as you can see 11 is the one serving the frontend.


[darshan1@darshan config]$ ps -ef |grep pgpool
darshan1 1163 1 0 19:23 ? 00:00:00 pgpool -f config/pgpool.conf -c -d
darshan1 1165 1163 0 19:23 ? 00:00:00 pgpool: wait for connection request
darshan1 1166 1163 0 19:23 ? 00:00:00 pgpool: wait for connection request
darshan1 1167 1163 0 19:23 ? 00:00:00 pgpool: wait for connection request
darshan1 1168 1163 0 19:23 ? 00:00:00 pgpool: wait for connection request
darshan1 1169 1163 0 19:23 ? 00:00:00 pgpool: wait for connection request
darshan1 1170 1163 0 19:23 ? 00:00:00 pgpool: wait for connection request
darshan1 1171 1163 0 19:23 ? 00:00:00 pgpool: darshan1 kdsdb 10.8.0.18(17821) idle
darshan1 1172 1163 0 19:23 ? 00:00:00 pgpool: PCP: wait for connection request
darshan1 1173 1163 0 19:23 ? 00:00:00 pgpool: worker process


[darshan1@darshan config]$ strace -p 1171
Process 1171 attached
select(7, [6], NULL, [6], NULL


The call stack form GDB for 1171 look like this as I had described you earlier

(gdb) bt
#0 0x0000003512aea9d3 in __select_nocancel () from /lib64/libc.so.6
0000001 0x0000000000416591 in pool_check_fd (cp=cp@entry=0x1008040) at pool_process_query.c:951
0000002 0x0000000000416796 in pool_check_fd (cp=cp@entry=0x1008040) at pool_process_query.c:971
0000003 0x000000000041db62 in pool_read2 (cp=cp@entry=0x1008040, len=1599097870) at pool_stream.c:264
0000004 0x000000000044f1ff in ProcessFrontendResponse (frontend=frontend@entry=0x1008040, backend=backend@entry=0x1006080)
    at pool_proto_modules.c:2615
0000005 0x0000000000419f89 in pool_process_query (frontend=frontend@entry=0x1008040, backend=backend@entry=0x1006080,
    reset_request=reset_request@entry=0) at pool_process_query.c:232
0000006 0x000000000040bef8 in do_child (unix_fd=unix_fd@entry=4, inet_fd=inet_fd@entry=5) at child.c:374
0000007 0x000000000040729f in fork_a_child (unix_fd=4, inet_fd=5, id=6) at main.c:1238
0000008 0x0000000000405719 in main (argc=<optimized out>, argv=<optimized out>) at main.c:648
(gdb)

Also I have uploaded two files
1) "pgpool-log" file has the tracing/debug messages printed from pgpool server

2. "log" which is the log file from the tableau client (kindly look at the last line and match it with the "pgpool-log" file to see which query the client is hung on). As you can see below the last line in tableau "log" file there is"begin-query" but no matching "end-query", implies client is hanging on the query to return from pgpool.

"begin-query","v":{"protocol":"af1e0a8","query":"SELECT 1"}}

ndarshan

2014-11-19 12:47

reporter  

pgpool-log.pages (414,327 bytes)

ndarshan

2014-11-19 12:55

reporter   ~0000495

As I had described earlier a simple query protocol does not have this hang when there was a cache hit and server from memcache. For example if do "select 1;" after the first time the result is fetched form cache and the client is returned the result form cache (which is different code path though).

Also another useful tip is that in the pgpool.conf file if I turn off "memory_cache_enabled = off" then, I do not have this hang issue with the same tableau client, since pgpool does not go look in its memcache and hence goes to the backend to get the result, and returns the result it got from backend to tableau. Which implies the problem is not in the tableau but in pgpool some code path where the protocol of replying to the client is broken.

ndarshan

2015-02-24 09:14

reporter   ~0000523

Hi, I am following to check if there is any update on the fix for issue?

regards
Darshan

t-ishii

2015-02-26 07:55

developer   ~0000524

Can you please provide pgpool log with and without debug option as I said in the previous message? Also log of image file does not help too much. I fail to use standard editor function such as searching.

Issue History

Date Modified Username Field Change
2014-10-31 08:20 ndarshan New Issue
2014-11-06 08:21 t-ishii Note Added: 0000477
2014-11-06 08:22 t-ishii Assigned To => t-ishii
2014-11-06 08:22 t-ishii Status new => assigned
2014-11-07 06:02 ndarshan Note Added: 0000480
2014-11-07 09:12 t-ishii Note Added: 0000482
2014-11-07 09:30 ndarshan Note Added: 0000483
2014-11-07 09:36 ndarshan Note Added: 0000484
2014-11-07 09:38 t-ishii Note Added: 0000485
2014-11-07 09:44 ndarshan Note Added: 0000486
2014-11-07 09:56 t-ishii Note Added: 0000487
2014-11-07 10:13 ndarshan Note Added: 0000488
2014-11-19 12:34 ndarshan File Added: log.txt
2014-11-19 12:47 ndarshan Note Added: 0000494
2014-11-19 12:47 ndarshan File Added: pgpool-log.pages
2014-11-19 12:55 ndarshan Note Added: 0000495
2014-11-19 12:56 ndarshan Note Edited: 0000494 View Revisions
2015-02-24 09:14 ndarshan Note Added: 0000523
2015-02-26 07:55 t-ishii Note Added: 0000524