11/29/2007

Summary:
- Clarification from last week's discussion
- Another SUMS performance test
- Non-blocking SUMS request queue in DRMS
- Reformatting code issue

Clarification
=============

Last week I raised some concerns about SUMS's lack of support for
concurrent users. Jim and I have since discussed this and now understand
each other: Jim was talking about 'throughput', while I was talking
about 'latency'.

Let's say it takes n units of time to serve a request when no other
concurrent request is present. In that case the latency and the total
service time are the same.

When we have 100 sequential requests, i.e., they arrive precisely one
after the other, it takes 100 x n to serve all of them, and the latency
for each request is n.

When we have 100 concurrent requests, it also takes 100 x n to serve all
of them. Jim's test demonstrated that. The throughput is the same as
when the requests arrive sequentially. The latency, however, is far from
n for each request. Instead it is n, 2n, 3n, ... 100 x n. From the
perspective of individual SUMS users (through DRMS), this amounts to a
performance degradation even though SUMS maintains maximum throughput.
This is what I meant by "lack of support for concurrent users". It is
caused by the serialization in the RPC server.

A direct consequence is that any random SUMS request can potentially
delay production usage, since all requests go to the same SUMS server.
Phil has suggested replicating some SUMS tables for web users.

Another SUMS performance test
=============================

I created a 'typical' usage pattern to test SUMS performance: we request
a random chunk of SUs that are not necessarily contiguous but are
related. I simulate that by finding a random sunum, getting a chunk next
to it, and randomly selecting a subset out of that chunk. The truly
random case, and hence the worst case, is a request made up of random
sunums.
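
The 'typical' sampling described above can be sketched in C roughly as
follows. This is an illustration only and is not the code in sum_test.c:
the function name pick_related_sunums, the parameters lo, hi, chunk and
want, and the use of rand() are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, NOT taken from sum_test.c.
 * Picks `want` distinct sunums out of a window of `chunk` consecutive
 * sunums whose start is chosen at random so that the whole window lies
 * inside [lo, hi].
 * Returns `want` on success, or 0 if the scratch allocation fails. */
size_t pick_related_sunums(unsigned long long lo, unsigned long long hi,
                           size_t chunk, size_t want,
                           unsigned long long *out)
{
    assert(chunk > 0 && want <= chunk);
    assert(hi >= lo && hi - lo + 1 >= chunk);

    /* Number of valid starting positions for the window.
     * rand() is used for brevity; RAND_MAX limits how far the base can
     * move, and the modulo introduces a small bias. */
    unsigned long long span = hi - lo + 1 - chunk + 1;
    unsigned long long base = lo + (unsigned long long)rand() % span;

    /* Scratch array holding the contiguous chunk next to the base. */
    unsigned long long *win = malloc(chunk * sizeof *win);
    if (win == NULL)
        return 0;
    for (size_t i = 0; i < chunk; i++)
        win[i] = base + i;

    /* Partial Fisher-Yates shuffle: after step i, slots 0..i hold a
     * random subset of the chunk with no duplicates. */
    for (size_t i = 0; i < want; i++) {
        size_t j = i + (size_t)rand() % (chunk - i);
        unsigned long long tmp = win[i];
        win[i] = win[j];
        win[j] = tmp;
        out[i] = win[i];
    }

    free(win);
    return want;
}
```

For example, after seeding with srand(), a call with lo = 1000,
hi = 2000, chunk = 50 and want = 10 fills out[] with 10 distinct sunums
that all fall inside one 50-wide window. The truly random case would
instead draw every sunum independently from [lo, hi].
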
The test program, sum_test, can be run as follows:

  on phil: /home/karen/cvs/jsoc/bin/linux_ia32/sum_test
  on n02:  /home/karen/cvs/jsoc/bin/linux_x86_64/sum_test

----------------------------------------------------------------------

I also did a follow-up DB timing test for selecting multiple rows in a
single query using the "IN" syntax. No SUMS program is involved in these
tests, and I used the same sunums generated by sum_test.c. Below are the
timing figures for a given number of ds_index values. The queries are of
the form "select * from sum_main where ds_index in (,,,)". This
particular form does not differ from the 'or' syntax in its query plan.

  No. of ds_index    query time
        1             1.152 ms
       10             1.790 ms
      100             6.534 ms
      200             9.942 ms
      300            14.925 ms
      400            19.001 ms
      500            23.033 ms

When more than one condition is used (e.g., ds_index = 1 or
ds_index = 2), the query uses a bitmap index scan, as opposed to the
plain index scan used when there is only one condition.

truly random

  No. of ds_index    query time
        1             1.183 ms
       10             2.206 ms
      100             8.909 ms
      200            13.120 ms
      300            19.408 ms
      400            24.880 ms
      500            28.345 ms

Non-blocking SUMS request queue in DRMS
=======================================

The problem: DRMS currently submits one request to SUMS and waits until
SUMS replies. While waiting, DRMS does not process any pending SUMS
requests. When multiple clients are connected to the same drms_server,
one client requesting off-line data can block another client accessing
on-line data.

The solution: a non-blocking SUMS request queue in DRMS.

I started off with a design in which a second sums thread polls
periodically for replies from SUMS and puts them into the reply queue.
I got stuck because my second sums thread could not get poll results
from SUMS. Then I learned that RPC is thread safe, i.e., it provides
each thread with a different context. No wonder it did not work! I also
tried forking. The child process got a copy of everything the parent
had and could poll successfully.
I am not sure, though, how to clean up the two copies: would it take
two SUM_close() calls? After these two unsuccessful attempts, I settled
on splitting time between waiting on the incoming request queue and
polling SUMS in the current sums thread. The original queue design lets
the sums thread block until there is a SUMS request. To break out of the
block, I changed from pthread_cond_wait() to pthread_cond_timedwait().
Then I discovered that the man page for the latter is wrong. In the end,
it all got sorted out. And it worked!

However, this is only half of the problem. The other half has to do with
locking in the server threads. Because these server threads read and
write shared data structures (the SU cache, in this case), a mutex has
to be employed to ensure correct concurrent access. When we take the
lock before the request even enters the queue, we effectively make the
queue of size 1, i.e., only one request can be processed at any given
time. That is what happens in our current server thread. I moved the
locking to a lower level to allow multiple requests to enter the queue.
I have finished coding and am about to test it.

Reformatting code issue
=======================

I got some input from Art regarding reformatting code. There are two
general types of reformatting:

1. adding whitespace and newlines, removing whitespace, collapsing
   lines, etc.
2. rewriting code, e.g., removing curly brackets from if statement
   blocks.

Type 1 is mostly innocuous, although there is room for accidentally
introducing typos. Type 2 is extremely dangerous. It is VERY easy to
change the semantics accidentally when you change from one style to the
other. Here is an example provided by Art:

"before"

  if (!err) {
    yada affecting err
    if (!err) {
      yada affecting err
      if (!err) {
        do something
      }
    }
  }

"after"

  if (!err) {
    yada affecting err
  }
  if (!err) {
    yada affecting err
  }
  if (!err) {
    yada affecting err
  }

The above reformatting completely breaks the code, and it will be very
hard to debug such a problem in the future.
Nothing can convince me that the rewritten code acts the same way as the
original without runtime tests.

Other less severe but still rather annoying problems with reformatting:

- It generates compilation errors.
- It makes it hard to use cvs history information. I rely heavily on
  cvs diff to compare and merge code.
- It makes it harder to merge code. With useless changes scattered
  throughout the source code, it takes much more time to go through
  them when merging.

I think it's time we stopped reformatting, both type 1 and type 2. I
would like to propose a policy for updating source code. A source update
must be one of the following:

- adding new features
- fixing bugs
- adding comments

Art's suggestion:

"We need to develop a policy for changing others' code. I'm thinking
that we try to keep the style of the person who "owns" the code? I guess
this means the person who first submitted the code into cvs. That means
Rasmus for much of the code - I think we should keep his style in his
original code. Otherwise, we start mixing styles, and then somebody is
going to want to reformat to make it consistent."