2007-11-15
----------

Summary:
- Bug fix for seriesname
- Learning bits
- Data staging
- Doxygen: a src code documentation generator tool

Bug fix for seriesname
======================

Thanks to Art for fixing the bug for case-insensitive seriesname last week.
I fixed a few more places in drms_create_series() and drms_delete_series().
I also changed all SQL queries on seriesnames from '=' to '~~*'
(PostgreSQL's case-insensitive match operator, equivalent to ILIKE).

Learning bits
=============

Where do things pile up?
- At SUMS: all SUMS requests, from every client, go to a single SUMS
  server. The RPCs are synchronous, so the server needs to multi-task to
  serve multiple clients.
- At DRMS: head-of-line blocking. SUMS requests from all modules connected
  to the same drms_server are queued in a single inbox for SUMS. While the
  request at the head of the queue is being served, the rest of the
  requests in the queue are blocked.

Things I learned about SUMS:
- The SUMS object parameter to SUM_poll() has no effect. In other words,
  SUM_poll() is system-wide; it is not specific to a given SUMS object.
- Once we abandon the strict SUM_get()/SUM_wait() pairing for the case
  where we don't want to wait, we can no longer use SUM_wait() for that
  particular SUMS object. Thankfully, SUM_get() and SUM_wait() are specific
  to a given SUMS object.

Concerns about SUMS's lack of support for concurrent users:
- Previous performance tests showed significant SUMS performance
  degradation with a large number of drms_server processes (~100).
- Learned about multi-user support in RPC servers. Found a promising
  example in John Bloomer's book "Power Programming with RPC".

----------------------------------------------------------------------

Data staging
============

Should not use "show_keys -p" to stage large amounts of data:
- show_keys runs as a module: it starts a transaction and closes it upon
  exit.
- That makes a long-running transaction for no good reason: DRMS is only
  used to gather sunums, so we should exit DRMS as soon as possible.
Request bundling:
- SUM_get() requests are feasible to bundle up: currently there is one SU
  per request.
  . This requires a different function entry point.
- SUM_alloc() is trickier because slots are involved.

----------------------------------------------------------------------

Current "blocking" logic in src code:

drms_storageunit.c
  1. Add the request to the inbox of sums_thread.
  2. Wait to remove the reply from the outbox of sums_thread.
     * "don't wait" translates to skipping this step
     * the sums thread will block if the data is offline
     * potential drms_server exit problem: missing SUM_close()

drms_server.c
  - sums_thread issues SUM_get() and SUM_wait()
    . SUM_wait() blocks if the data is offline
    . SUM_get() and SUM_wait() are used in pairs, so there is no confusion

----------------------------------------------------------------------

Implementation plan for non-blocking logic:

data structure
  - one hash container for the SUMS object pool
  - each SUMS object in the pool contains:
    . sums object handle
    . idle: idle or waiting for a reply from SUMS
    . tag: if the sums object has an outstanding request, the id of the
      drms server thread that issued it

Create a second sums thread that polls SUMS periodically for replies and
puts them into the reply queue. Both sums threads will modify shared data,
such as whether a given sums object is idle. The first (existing) sums
thread will specialize in taking requests out of the inbox and sending
them to SUMS without blocking.

----------------------------------------------------------------------

Doxygen
=======

Source code documentation generator tool
http://www.stack.nl/~dimitri/doxygen/index.html

"It can generate an on-line documentation browser (in HTML) and/or an
off-line reference manual (in LaTeX) from a set of documented source
files. There is also support for generating output in RTF (MS-Word),
PostScript, hyperlinked PDF, compressed HTML, and Unix man pages.
The documentation is extracted directly from the sources, which makes it
much easier to keep the documentation consistent with the source code."

"You can configure doxygen to extract the code structure from
undocumented source files. This is very useful to quickly find your way
in large source distributions. You can also visualize the relations
between the various elements by means of include dependency graphs,
inheritance diagrams, and collaboration diagrams, which are all generated
automatically."