04-24-08
--------

Summary
- Curious problem
- Fixes/Updates
- Perl interface to SUMS API
- Cursor memory consumption trial
- Plans for DB migration
- DB general status writeup

Curious problem
---------------
Todd encountered a very curious problem with his .pgpass: there was an
extra white space at the end of a line. psql works with it, but not our
modules, which use libpq. So beware of trailing white space.

Fixes/Updates
-------------
1. Log retention now defaults to 10 days; it can be overridden on the
   command line.
2. drms_query_string() now calls drms_keylist_memsize() for a better
   estimate of the size limit on the SQL query statement.

Perl interface to SUMS API
--------------------------
gcc had trouble with char *dsname and segfaulted on it. Had to add the
following SWIG typemap:

    %typemap(memberin) char *dsname {
        $1 = (char *)malloc(strlen($input) + 1);
        strcpy($1, $input);
    }

Messed up during the commit; had to back out two files:
proj/su/Rules.mk and doxygen_publ.cfg. verify with

Now supports both icc and gcc. All code is checked in. Suggestions are
welcome on where to put SUMSAPI.pm, which depends on
libsumsapi_perl.so.

Cursor memory consumption trial
-------------------------------
Got questions last time about memory consumption with a SQL cursor.
Tried it with psql, using vmstat to observe the amount of free memory
on hmidb. Did not notice any significant hogging of memory with
"select * from hmi_ground.lev0", which returns a few hundred thousand
records.

Plans for DB migration
----------------------
Prepared a plan together with Jennifer for the DB migration. Removed
the rpm installation of 8.3.1 on hmidb2.

HMIDB__81 = our current DB server (production server), postgres v. 8.1
HMIDB2_81 = the new DB server, postgres v. 8.1
HMIDB2_83 = the new DB server, postgres v. 8.3

1. Read the postgres install pages to get more comfortable with all
   installation parameters:
   http://www.postgresql.org/docs/8.3/static/installation.html
2. Read the postgres "migrate" pages to learn what will be required
   for running the migrate tool(s).
   http://www.postgresql.org/docs/8.3/static/migration.html

   For 1) and 2), just skim through these pages first. Some of them
   may not make much sense until you start doing it. The purpose is to
   get a feel for where things are so that you can look them up later.
   Don't get stuck here.

3. Locate the software for 8.1 and 8.3 on hmidb2:
   /usr/local/postgresql-8.3.1
   /usr/local/postgresql-8.1.11
4. Run the correct version of initdb for both 8.1 and 8.3 (in
   different directories!). Where would you like each of those, by the
   way?
5. Establish aliases or links for the correct client versions of
   pg_ctl, createdb, initdb, and any other "pg" software tools I might
   need.
6. Start HMIDB2_81 and HMIDB2_83 with the correct pg_ctl version, with
   logging turned on:
   /usr/local/postgresql-8.3.1/bin/pg_ctl
   /usr/local/postgresql-8.1.11/bin/pg_ctl
7. Locate system variables in psql that tell me which server/version
   combo I am running at any time -- something that can be checked via
   a script when needed.
8. Copy a table from HMIDB__81 into HMIDB2_81: pg_dump a single small
   table on HMIDB__81, pg_restore it on HMIDB2_81.
9. Verify that the table came through correctly (select count(*) and a
   couple of lines of data).
10. Migrate that table to HMIDB2_83 using the tools referenced in
    step 2 above.
11. Verify that the migration came through correctly (select count(*),
    a couple of lines of data).

DB general status
-----------------
Database replication technology survey.

Purpose:
- high availability: warm standby ready to take over should the
  primary fail
- load sharing: web server to off-load traffic from the production
  server

Options:
- synchronous replication and load balancing: pgcluster, pgpool
  . two-phase commit
  . overhead
  . overkill
- asynchronous replication using log shipping
  . warm standby server: whole database
  . Slony replication: selected tables; log shipping option in the
    latest version -> schema change -> adding new node
  . something in between for web service

----------------------------------------------------------------------
1. High availability: warm standby
   http://www.postgresql.org/docs/8.3/interactive/warm-standby.html

Warm standby is basically point-in-time recovery on a different
machine, which has logs automatically shipped to it. Jennifer has done
some preliminary tests and ran into various problems. The most serious
problem, as I understood it, had to do with checkpoints. In my
opinion, it's likely that her test was so small that there was no
checkpoint in the xlogs, and therefore the db does not know how to
recover. At some point I recommended that she take a step back and try
point-in-time recovery on a single server. She ran into the same
problem as above. I thought the problem over and gave her my take on
it, as stated above. This is really one of the most basic operations
of a modern database; I don't see any hard problem.

----------------------------------------------------------------------
2. Remote DRMS
   - Remote DB replicates a subset of HMI/AIA data series
     . subscription based
     . Slony-I log shipping
     . minimum customization of DRMS code
     . Stanford may subscribe to series originating from remote sites

Slony-I log shipping status: I sent out the following status report on
Apr 10. Since then, Jennifer has successfully done tests on series
deletion, as well as tests on a missing table on the log shipping
slave. She has started working on scripts to do each of the tasks. She
also ran into some old problems with special characters which prevent
the ingest script from processing the logs on the log-shipping slave.
This is quite alarming to me; I'd like to be able to isolate the
problem.

My email from Apr 10:

In my email of Mar 19, I listed the following targets for Slony log
shipping:
1. the order of log application: search for the right log file using a
   script similar to the one given in the Slony documentation.
2.
   dynamically add and delete a series to/from a running replication
   cluster.
3. run the slony_logshipper tool.

Jennifer has since done 1 and part of 2 (add only, no delete yet).
Here is what I think should be done:
1. Run the slony_logshipper tool. (Jennifer, we talked about it before
   the group meeting. I forgot to add it to your handwritten notes.
   Please add it.)
2. Run the log ingest script without the target table on the log slave
   site. This is to handle adding series for the log slave. We'll need
   to be able to gracefully bail out of the ingest script, manually
   put in the new table, and resume the ingest script.
3. Write a script for moving logs from the log generating site to the
   remote site. This can be simulated by moving logs between two
   directories using cp, which can be changed to scp/sftp later. When
   to move logs can be either timer based or log volume based; a
   better choice is both, whichever meets the threshold first. We'll
   also need to keep a log of what we have transferred.

After we have all these pieces ready, we should conduct a more
realistic test. I have outlined it to Jennifer; here it is more
formally. Let M be the master DB, S the Slony slave, and LS the log
shipping slave. Let T1 and T2 be two series on M.
0. Run pgbench on M to update T1 and T2 continuously.
1. Configure S to replicate T1 and generate logs (not being used).
2. Configure LS to replicate T1 (this is to test the creation of a new
   remote DRMS replica). This might use slony1_dump.sh to bring LS
   up-to-date, then use log shipping for incremental updates.
3. Add T2 to the replication set on S.
4. Add T2 to the replication set on LS.
5. Drop T2 from the replication set on S.
...

----------------------------------------------------------------------
3. Web db server to off-load traffic from the production server

This is really the area where we have not done much. We discussed
timing requirements at some point. We either use Slony or make use of
the warm standby in one form (frequent snapshot) or another (ZFS
replication).
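As a reminder of what the warm-standby mechanism mentioned above
actually involves, it hinges on two settings: the primary ships each
finished WAL segment via archive_command, and the standby replays them
via restore_command in recovery.conf. The snippet below is only a
hedged sketch that writes demo config files into a scratch directory;
the host name, archive path, and directory are placeholders, not our
actual setup.

```shell
# Sketch of the two settings a warm standby hinges on (8.3-era).
# This only generates demo files in a scratch directory; the "standby"
# host and /var/lib/pgsql/walarchive path are made-up placeholders.
PGDATA_DEMO=./pgdata.demo
mkdir -p "$PGDATA_DEMO"

# Primary side: postgresql.conf ships each finished WAL segment.
cat > "$PGDATA_DEMO/postgresql.conf" <<'EOF'
archive_mode = on
archive_command = 'scp %p standby:/var/lib/pgsql/walarchive/%f'
EOF

# Standby side: recovery.conf keeps replaying shipped segments
# until the server is promoted.
cat > "$PGDATA_DEMO/recovery.conf" <<'EOF'
restore_command = 'cp /var/lib/pgsql/walarchive/%f %p'
EOF

echo "wrote demo configs to $PGDATA_DEMO"
```

Note that a segment is only shipped once it is complete, which is
consistent with the checkpoint hypothesis above: a test too small to
fill a segment or reach a checkpoint gives the standby nothing usable
to recover from.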
Options:
- Slony-I replication
  DB admin intensive: as Slony-I is not designed to replicate the
  database schema (e.g., table definitions), it assumes they are
  already there.
- Frequent snapshot of the warm standby
  Between snapshots we must shut down the web db, which means down
  time for the web server.
- File system replication:
  http://www.lethargy.org/~jesus/archives/92-PostgreSQL-warm-standby-on-ZFS-cr%5Cack.html
  ZFS replicates the warm standby to make it usable. It's quite
  elegant but requires the Solaris OS. I think Keh-Cheng had pretty
  much shot down this option.
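For task 3 of the remote-DRMS section above (move Slony logs from the
generating site to the remote site, simulated with cp between two
directories, keeping a record of what was transferred), a minimal
sketch could look like the following. All paths, the file-name
pattern, and the demo files are made up for illustration; cp would be
swapped for scp/sftp later, and the timer/volume threshold logic is
left out.

```shell
# Simulated log mover: copy Slony log files from SRC to DST, recording
# each shipped file in transferred.log so re-runs skip it (idempotent).
# Paths and the slony1_log_* naming pattern are placeholders.
SRC=./slony_logs_src
DST=./slony_logs_dst
SHIPPED=./transferred.log
mkdir -p "$SRC" "$DST"
: > "$SHIPPED"

# Create a couple of fake log files for the demo.
touch "$SRC/slony1_log_2_00000001.sql" "$SRC/slony1_log_2_00000002.sql"

for f in "$SRC"/slony1_log_2_*.sql; do
    [ -e "$f" ] || continue
    base=$(basename "$f")
    # Skip anything already recorded as transferred.
    grep -qx "$base" "$SHIPPED" && continue
    # cp simulates the transfer; replace with scp/sftp for real use.
    cp "$f" "$DST/$base" && echo "$base" >> "$SHIPPED"
done

echo "shipped files recorded in $SHIPPED"
```

Running it again moves nothing, since every file name is already in
the transfer record; that record is also what a timer- or
volume-triggered wrapper would consult to decide what still needs
shipping.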