01/24/08

Summary
- Progress on remote SUMS
- DRMS/SUMS garbage collection
- Code cleanup
- Proposal to use bool type
- Eclipse IDE for C/C++

Progress on remote SUMS
-----------------------
Notes from last week's discussion:
- The upper 16 bits of sunum will be used to identify a SUMS site.
- We need to reserve bit patterns for the SUMS-less case. I don't understand
  why we need this.
- A remote SUMS runs with sum_main_sequence starting at a different initial
  value.
  -> Since our sequence has already started, Stanford SUMS has all 16 bits
     set to 0, and that will be our identifier.

Wrote a simple lookup function, drms_su_local_sunum(), to check the alias
table.

The more difficult problem is how to ingest at the remote SUMS. Unless we
are willing to build another translation for slotnum, it must be the same
for the source SUMS and the remote SUMS for records that are replicated.
We have two options for ingesting such data into a remote SUMS:

1. Ingest SU unit: original SU sunum
   - separate ingest route that calls SUM_alloc to get a local SU sunum,
     then copies the ingest contents
   - trivial to maintain the alias table
2. Ingest FITS files (or any other format that can be handled by DRMS):
   It's non-trivial to establish the connection between replicated DRMS
   records and any files to be ingested into a remote DRMS.

Any suggestions?

DRMS/SUMS garbage collection
----------------------------
Based on our discussion last week, I propose the following solutions:

1. For records that have a temporary SU: when it's time for SUMS to
   reclaim the disk space, write the contents of Records.txt along with
   the sunum to a dead SU table, either per series/namespace or global.
   We will leave these segment-missing records as they are for now. If at
   some point we decide to remove them, we'll know which sunums are
   invalid from the dead SU table. In fact, this is redundant
   information; a join between the series table and sum_main will give us
   all the valid sunums.
   The efficiency of such a join, however, can't be good given a
   monster-size series table (we already know sum_main will be of monster
   size).
2. For delete_series: require an additional SUMS RPC call to remove all
   SUs (or information about such SUs) associated with a given seriesname
   before a given time. I moved away from Jim's original suggestion of
   DRMS sending SUMS a list of sunums to remove, because the list may be
   too long to be efficient. It's not hard to imagine a series to be
   deleted containing thousands of records.
   What SUMS needs to do to clean up: find all ds_index values that match
   a given seriesname and were created before a given date (the time when
   delete_series is issued). This way no confusion should arise when a
   future create_series creates a new series with the same seriesname;
   the timestamp will leave the new series alone. The columns creat_date
   and owning_series contain the required information.

Code cleanup
------------
Removed authentication between drms_server and client modules.
-lcrypto is no longer needed. -ldl is needed for DSDS-related code.

Found the cause of the warning messages for functions like getpeername(),
connect(), etc. They all have to do with either the const or the restrict
qualifier.

describe_series.c: rec_cnt changed from int to long long.

drms_segment.c: drms_segment_write_from_file now uses
drms_segment_filename(). In case of an earlier failure (file writing
failure), seg->filename is set to '\0'.

Proposal to use bool type
-------------------------
I'd like to propose using the bool type for many flag variables (0 or 1
value only, e.g., the verbose flag). The C99 standard defines the _Bool
type. stdbool.h (Can someone point me to where this file is?) defines
bool as an alias, as well as true and false. The size of bool is
typically 1 byte (the exact size is implementation-defined), as opposed
to typically 4 bytes for int.

Eclipse IDE for C/C++
---------------------
http://eclipse.org
I installed it on my Linux desktop and only tried it for 10 minutes; I
wish I had more time. It seems quite powerful. It is really popular among
Java developers.
It can do a lot of background tasks: syntax checking, builds, etc. Tools
are available for code refactoring.
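
For the site identifier discussed under "Progress on remote SUMS", here is a minimal sketch of how the upper 16 bits of a sunum could be packed and extracted. It assumes sunum is a 64-bit value, which these notes do not state; the helper names (sunum_site, sunum_sequence, sunum_make, sunum_is_stanford) are hypothetical and are not existing DRMS or SUMS functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: sunum is 64 bits wide and the upper 16 bits hold the site id. */
#define SITE_BITS     16
#define SITE_SHIFT    (64 - SITE_BITS)
#define SEQ_MASK      ((UINT64_C(1) << SITE_SHIFT) - 1)
#define SITE_STANFORD 0u   /* Stanford SUMS: all 16 site bits are 0 */

/* Extract the site identifier from the upper 16 bits. */
static uint16_t sunum_site(uint64_t sunum)
{
    return (uint16_t)(sunum >> SITE_SHIFT);
}

/* Extract the per-site sequence number from the lower 48 bits. */
static uint64_t sunum_sequence(uint64_t sunum)
{
    return sunum & SEQ_MASK;
}

/* Combine a site id and a sequence number into one sunum. */
static uint64_t sunum_make(uint16_t site, uint64_t seq)
{
    assert(seq <= SEQ_MASK);   /* the sequence must fit in the lower 48 bits */
    return ((uint64_t)site << SITE_SHIFT) | seq;
}

/* True when the sunum was issued by the site whose id bits are all 0. */
static int sunum_is_stanford(uint64_t sunum)
{
    return sunum_site(sunum) == SITE_STANFORD;
}
```

With this layout, every sunum already issued by the running Stanford sequence stays valid unchanged, since its upper 16 bits are 0.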
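
The delete_series cleanup rule from the garbage collection section (match on seriesname, created before the time delete_series was issued) can be sketched as a simple filter. This is only an illustration over an in-memory array; the real selection would presumably be a database query inside SUMS. The struct su_row layout and the function select_dead_sus are hypothetical; only the column names ds_index, owning_series and creat_date come from the notes above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical in-memory view of the three relevant sum_main columns. */
struct su_row {
    long long   ds_index;
    const char *owning_series;
    time_t      creat_date;
};

/*
 * Collect the ds_index of every row that belongs to `series` and was
 * created strictly before `cutoff` (the time delete_series was issued).
 * Writes at most `max_out` entries into `out` and returns the count.
 */
static size_t select_dead_sus(const struct su_row *rows, size_t nrows,
                              const char *series, time_t cutoff,
                              long long *out, size_t max_out)
{
    size_t n = 0;

    assert(rows != NULL && series != NULL && out != NULL);

    for (size_t i = 0; i < nrows && n < max_out; i++) {
        int same_series = strcmp(rows[i].owning_series, series) == 0;
        int older       = difftime(rows[i].creat_date, cutoff) < 0;

        if (same_series && older)
            out[n++] = rows[i].ds_index;
    }
    return n;
}
```

Rows created at or after the cutoff are skipped, so a series re-created later under the same seriesname is left alone.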
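
To make the bool proposal concrete, here is a small sketch of flag variables declared with the C99 bool type from stdbool.h. The struct cmd_flags and the helper names are made up for illustration and are not taken from the DRMS code.

```c
#include <stdbool.h>

/* Hypothetical set of command-line flags, each either true or false. */
struct cmd_flags {
    bool verbose;
    bool dry_run;
};

/* Normalize a legacy int flag: any nonzero value becomes true. */
static bool to_flag(int legacy_value)
{
    return legacy_value != 0;
}

/* Build the flag set from old-style int arguments. */
static struct cmd_flags flags_from_ints(int verbose, int dry_run)
{
    struct cmd_flags f;

    f.verbose = to_flag(verbose);
    f.dry_run = to_flag(dry_run);
    return f;
}
```

One benefit beyond size: converting any nonzero value to bool yields exactly 1, so comparisons such as flags.verbose == true behave predictably, which is not the case for an int flag holding, say, 4.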