11/29/2007

Summary:
- Clarification from last week's discussion
- Another SUMS performance test
- Non-blocking SUMS request queue in DRMS
- Reformatting code issue

Clarification
=============

Last week I raised some concerns about SUMS's lack of support for
concurrent users. Jim and I discussed this and reached an understanding:
Jim was talking about 'throughput', while I was talking about
'latency'.

Let's say it takes n units of time to serve a request when no other
request is being served concurrently. In this case the latency of the
request is n, and the throughput is one request per n units of time.

When we have 100 sequential requests, i.e., they arrive precisely one
after the other, it takes 100 x n to serve all of them, and the latency
of each request is n. When we have 100 concurrent requests, it also
takes 100 x n to serve all of them; Jim's test demonstrated that. The
throughput is the same as in the sequential case. The latency, however,
is far from n for each request: it is n, 2n, 3n, ..., 100 x n. From the
perspective of individual SUMS users (through DRMS), this amounts to a
performance degradation even though SUMS maintains maximum throughput.
This is what I meant by "lack of support for concurrent users". It is
caused by the serialization of requests in the RPC server. A direct
consequence is that any random SUMS request can potentially delay
production usage, since all requests go to the same SUMS server.

Phil has suggested replicating some SUMS tables for web users.

Another SUMS performance test
=============================

I created a 'typical' usage pattern to test SUMS performance: we
request a random chunk of SUs that are not necessarily contiguous but
are related. I simulate this by picking a random sunum, taking a chunk
of sunums next to it, and randomly selecting a subset of that chunk.
The truly random case, and hence the worst case, is a request made up
of random sunums.

You can run this test program as follows:

on phil:
/home/karen/cvs/jsoc/bin/linux_ia32/sum_test

on n02:
/home/karen/cvs/jsoc/bin/linux_x86_64/sum_test

----------------------------------------------------------------------
I also did a follow-up DB timing test for selecting multiple rows in a
single query using the "IN" syntax.

In these tests no SUMS program is involved, and I used the same
sunums generated by sum_test.c. Below are the timing figures for a
given number of ds_index values. The queries are of the form "select *
from sum_main where ds_index in (,,,)". This form produces the same
query plan as the equivalent 'or' syntax.

sum_test pattern (related sunums):
No. of ds_index values  Query time
1                       1.152 ms
10                      1.790 ms
100                     6.534 ms
200                     9.942 ms
300                     14.925 ms
400                     19.001 ms
500                     23.033 ms

When more than one condition is used, e.g., ds_index = 1 or
ds_index = 2, etc., the query plan uses a bitmap index scan instead of
the plain index scan used when there is only one condition.

Truly random sunums:
No. of ds_index values  Query time
1                       1.183 ms
10                      2.206 ms
100                     8.909 ms
200                     13.120 ms
300                     19.408 ms
400                     24.880 ms
500                     28.345 ms


Non-blocking SUMS request queue in DRMS
=======================================

The problem: DRMS currently submits one request to SUMS and waits until
SUMS replies. While waiting, DRMS does not process any other pending
SUMS requests. When multiple clients connect to the same drms_server,
one client requesting off-line data can block another client accessing
on-line data.

The solution: a non-blocking SUMS request queue in DRMS

I started with a design in which a second sums thread periodically
polls SUMS for replies and puts them into the reply queue. I got stuck
because the second thread could not get poll results from SUMS. Then I
learned that RPC is thread-safe, i.e., it gives each thread its own
context, so the polling thread was not working in the context of the
thread that made the requests. No wonder it did not work!

I also tried forking. The child process gets a copy of everything in
the parent and can poll successfully, but I am not sure how to clean up
the two copies. Would that require two SUM_close() calls?

After these two unsuccessful attempts, I settled on splitting time
between waiting on the incoming request queue and polling SUMS in the
current sums thread. In the original queue design, the sums thread
blocks until there is a SUMS request. To break out of the block, I
changed from pthread_cond_wait() to pthread_cond_timedwait(). Then I
discovered that the man page for the latter is wrong. In the end, it
all got sorted out, and it worked!

However, this is only half of the problem. The other half has to do
with locking in the server threads. Because these server threads read
and write shared data structures (in this case, the SU cache), a mutex
has to be used to ensure correct concurrent access. When we take the
lock before a request even enters the queue, we effectively make the
queue of size 1, i.e., only one request can be processed at any given
time. That is what happens in our current server thread. I moved the
locking to a lower level to allow multiple requests to enter the
queue. I have finished coding it and am about to test it.

Reformatting code issue
=======================

I got some input from Art regarding reformatting code.
There are two general types of reformatting:
1. adding or removing whitespace and newlines, collapsing lines, etc.
2. rewriting code, e.g., removing the curly braces from if-statement
   blocks.

Type 1 is mostly innocuous, although there is room for accidentally
introducing typos. Type 2 is extremely dangerous. It is VERY easy to
change the semantics accidentally when you change from one style to
the other. Here is an example provided by Art:

"before"
if (!err)
{
 yada affecting err
 if (!err)
 {
   yada affecting err
   if (!err)
   {
     do something
   }
 }
}

"after"

if (!err)
{
 yada affecting err
}
if (!err)
{
 yada affecting err
}
if (!err)
{
 do something
}

The reformatting above completely breaks the code, and it will be very
hard to debug such a problem in the future. Nothing can convince me
that the rewritten code acts the same way as the original without
runtime tests.

Other less severe but still rather annoying problems with reformatting:
- It can introduce compilation errors.
- It makes it hard to use the cvs history information.
  I rely heavily on cvs diff to compare and merge code.
- It makes it harder to merge code. With cosmetic changes scattered
  throughout the source code, it takes much more time to go through
  them when merging.

I think it's time we stopped reformatting, both type 1 and type 2. I
would like to propose a policy for updating source code. A source
update must be one of the following:
- adding new features
- fixing bugs
- adding comments

Art's suggestion:
"What we need to develop is a policy for changing others' code.  I'm
thinking that we try to keep the style of the person who "owns" the
code? I guess this means the person who first submitted the code into
cvs.  That means Rasmus for much of the code - I think we should keep
his style in his original code. Otherwise, we start mixing styles, and
then somebody is going to want to reformat to make it consistent."