10/11/2007

Summary:
- drms_server problem: fails to exit properly. Fixes and limitations.
- wiki page for vacuum-related issues:
  http://jsoc.stanford.edu/trac/wiki/DbMaintenanceTasks
- minor fixes in DRMS
  . describe_series.c, for series that have no records
  . jsoc_main.c, to return a status
  . jsoc_main_sock.c, jsoc_main_sock_c.c
- read up on autovacuum, transaction ID wraparound, backup, etc.
- notes on why DRMS is case-sensitive on series names, and a proposal to make it case-insensitive.

Details:

drms_server
===========
Currently a drms_server process does not exit properly when a client module connected to it aborts. As a result, we have lingering open transactions that are not doing anything.

A client module can run in three different modes:
M1. direct-connect, where no drms_server process is started
M2. socket-connect, where a drms_server process is started as a child process
M3. socket-connect, where a drms_server process is running independently

A few exit scenarios:
E1. A client module encounters an error in its processing and sets the abort_flag. drms_server exits in this situation. This is already implemented.
E2. A client module exits because it receives a signal, e.g., SIGINT (^C). The server thread may be in either of these states:
    a. idle, waiting for a command to come across the socket
    b. waiting for SUMS to return
    This is currently under development.

The goal of a proper exit is twofold:
- stop the client module promptly
- stop drms_server as soon as possible

We can't do anything about SUMS, i.e., once an SU read request has been issued to SUMS, we can't recall it.

I worked through all combinations of client module modes and exit scenarios.

M1 E2a: Already exits cleanly; nothing to be done.

M1 E2b: The sums thread is waiting for a result and thus holds the lock on the queue sum_outbox, so this mutex can't be destroyed and drms_free_env() fails. Fix: put an artificial SUMS reply in sum_outbox to satisfy the sums thread.
A mirror-image problem happens when the sums thread is waiting on the queue sum_inbox instead of sum_outbox; the analogous fix suffices. Without logging, a client module can then exit properly. With logging, drms_server_close_session() cannot complete, since the log SU needs to be committed, which can't happen until the current SUMS request is served. So with logging, a client module cannot exit properly until both outstanding SUMS requests are served.

M2 E2a: drms_server, as a child process, receives the same SIGINT as its parent, the client module (^C from the terminal is delivered to every process in the foreground process group). Both the client module and drms_server can exit properly.

M2 E2b: The client module can exit properly. drms_server can't exit until the outstanding SUMS request is fulfilled, because drms_server_abort() can't proceed past a lock on a shared variable. This lock is taken when the command code is one of the following:
    DRMS_ROLLBACK, DRMS_COMMIT, DRMS_NEWSERIES, DRMS_DROPSERIES,
    DRMS_NEWSLOTS, DRMS_SLOT_SETSTATE, DRMS_GETUNIT
The last one, DRMS_GETUNIT, gets SUs from SUMS.

M3 E2a: The client module can exit properly. A possible fix: catch SIGINT, SIGTERM, etc. and hand them to drms_server. Difficulty: the client module doesn't know the pid of drms_server.

M3 E2b: Same as above.

An alternative: catch SIGINT, SIGTERM, etc., but instead of passing the signal to drms_server, call drms_abort(). This proved to be a treacherous road. The blocking in drms_server_abort() remains. In addition, since the server thread is busy waiting for SUMS, it does not read from the socket, while the client module has sent a disconnect command and waits for an echo. Now the client module hangs as well. We can get rid of this by removing the echo; this would let the client exit immediately, while drms_server waits for the outstanding SUMS request to finish.

----------------------------------------------------------------------

DB backup
=========
Found the answer to a long-standing question from Keh-Cheng: do we need a file system snapshot for backup?
Quote from the PostgreSQL manual
(http://www.postgresql.org/docs/8.2/static/continuous-archiving.html):

    We do not need a perfectly consistent backup as the starting point. Any internal inconsistency in the backup will be corrected by log replay (this is not significantly different from what happens during crash recovery).

So we don't need file system snapshot capability, just tar or a similar archiving tool.

----------------------------------------------------------------------

Case-sensitive names
====================
Why is DRMS case-sensitive on series names?

SQL
1. String comparison of keyword values can go either way, e.g., exact match '=' is case-sensitive, while ~* (regular-expression match) is case-insensitive.
2. Unquoted namespace, table, and column names are case-insensitive.

DRMS
series names are used as
  1. values in the drms_* tables
  2. the table name for a series
keywords from the jsd are used as
  1. values in the drms_keyword table
  2. column names of the series table
Links and segments are skipped here because they follow the same rule.

The No. 2 items are case-insensitive because of SQL point 2. You can think of No. 1 as meeting the 'case-preserving' requirement. DRMS uses the drms_* tables to get series names, keyword names, etc. For that string comparison, however, exact match is used, hence it's case-sensitive.

This is what I presented during our meetings. I was told then to keep it the way it is.

To make it case-insensitive, we can store series names and keyword names in all lower case for No. 1, along with their original forms in an additional column for pretty printing. The DRMS front end can then convert all input to lower case for string comparison.

----------------------------------------------------------------------

On SQL identifiers:
http://www.postgresql.org/docs/8.1/interactive/sql-syntax.html#SQL-SYNTAX-IDENTIFIERS

My main objection to quoting an identifier is that it makes it case-sensitive. That's confusing, because SQL table and column names are otherwise not case-sensitive.