05-01-08
--------

- Bug fix
- Precision in DRMS query and insert
- Precision in PostgreSQL pg_dump 
  . Why study pg_dump?
  . Why not binary format?
  . Keh-Cheng's analysis
  . Keh-Cheng's experience with pg_dump -Fc on 8.3
  . My experience with pg_dump -Fc on 8.1

Bug fix
--------

1. fixed problem with drms_keylist_memsize()
It used to only allows keywords, added recnum upon Phil's request. For
now, treat recnum differently from the rest of the keywords.

In general, there are a couple of ways to query column names in a table

SELECT a.attname FROM pg_attribute a, pg_class c, pg_namespace n WHERE c.oid =
a.attrelid and c.relname='tablename' and n.nspname = 'tablenamespace'
and a.attnum > 0 and NOT a.attisdropped

select column_name, table_name, table_schema from
information_schema.columns where table_name = 'tablename' and table_schema
= 'tableschema'; 

2. fixed a memory bug in drms_keylist_memsize()

Precision in DRMS query and insert
----------------------------------

Checked precision for in and out params in DRMS query and insert.

db_bulk_insert_array_insert() uses  copy binary
db_bulk_insertv() is a variant of the above

query:
DB_Binary_Result_t *db_query_bin(DB_Handle_t  *dbin,  char *query_string)
in: always text format?
out: binary
this is what drms_record.c uses drms_retrieve_records() called by drms_open_records()

DB_Binary_Result_t *db_query_bin_array(DB_Handle_t  *dbin, 
                                       char *query, int n_args,
				       DB_Type_t *intype, void **argin
                                       )
DB_Binary_Result_t *db_query_binv(DB_Handle_t  *dbin, char *query, ...);
a variant of the above

in: query string contains ?'s or params, whose values are passed in binary
out: query result in either binary or text format

Precision in PostgreSQL pg_dump 
-------------------------------

Why study pg_dump?

I started looking into the precision issue in PostgreSQL, I chose
pg_dump for the following reasons:
- pg_dump is the primary tool to migrate between major PostgreSQL
  software releases
- pg_dump uses COPY command in text mode, the same as Slony-I
  (slony1-dump.sh for example, as well as others).

----------------------------------------------------------------------

Why not binary format?

pg_dump stopped supporting binary format for portability, i.e., binary
data can't move across computer architecture, one might run into
problems like big/little endianess. Binary is also problematic for
version change, e.g., suppose the internal format for a data type changed
between version. In summary, pg_dump is in text to preserve platform
and version independence.

http://archives.postgresql.org/pgsql-hackers/2007-12/msg00143.php

----------------------------------------------------------------------
Keh-Cheng's analysis

I learned quite a few things from Keh-Cheng who is an expert on
scientific computation.

Here is a quote from his email on Sat Apr 26, 2008

"Having examined the actual on-disk contents of a few
test tables and the corresponding pg_dump output,
I believe the current version (8.3.1) of postgresql
handles double precision correctly albeit very
inefficiently.  Unless there is some undocumented
way of dumping IEEE floating-point  numbers in binary
mode, it is SINGLE precision values that we need to
worry about.  Here is why:

Pg_dump reads double/single-precision IEEE numbers
in the table and converts them to ASCII with a
format string equivalent to "%.17g"/"%.8g".  For
IEEE double with a 53-bit mantissa, one unit in
the last place (ulp) has a worst-case relative
round-off error of 1/2 * 2^(-53) = 5.55e-17 
while an ASCII string with 17 significant figures
has a worst-case round-off error of 1/2 * 10^(-16)
= 5e-17.  Since the decimal representation has
slightly denser "ticmarks" than the binary 

representation, a conversion from binary to
decimal (during dump) followed by a conversion
from decimal back to binary (during restore) 
will always yield the same IEEE double precision
number that was originally in the table.  This
is OK but very inefficient.  (Binary/decimal
conversions typically cost thousands of CPU cycles.)

Doing the same analysis for IEEE single (24-bit
mantissa) and 8-significant-figure decimal numbers,
one finds the latter to have HIGHER round-off error
(5e-8 vs 2.98e-8).  In other words, an IEEE single
converted to decimal with the "%.8g" format CANNOT
be expected to be converted back to the original value.
If you compile and run the attached test program,
you'll see that of the 1 million randomly chosen
numbers between 0 and 1, approximately 0.7% of 
them fail the double conversion test.  (As expected,
all the failure cases have leading significant digits
10, 11, 12, or 13 because they have the highest 
relative round-off errors in the range.)  If you
give the decimal conversion just one more significant
figure by changing the "%.8g" format to "%.9g", then
all of them pass."

Keh-Cheng also located two places in the src code to rectify the
problem with floats.

I went on and browsed two related papers on this subject:

How to read floating point numbers accurately
William D. Clinger

How to print floating-point numbers accurately
Guy L. Steele, Jr., Jon L. White 

Discussion on this subject from PostgreSQL mailing list 
http://qaix.com/postgresql-database-development/203-042-float-output-precision-questions-read.shtml

----------------------------------------------------------------------
Keh-Cheng's experience with pg_dump -Fc on 8.3

Keh-Cheng reported promising results with pg_dump -Fc. This custom
option appearantly gets floats correctly in 8.3

----------------------------------------------------------------------
My experience with pg_dump -Fc on 8.1

I did not have such good luck with 8.1. The precision problem
remains. Unfortunately that's the issue we have to face, i.e., moving
from a 8.1 server to a 8.3 server.

I followed Keh-Cheng's advice to change the src code. I ran pg_dump
and pg_restore from this recompiled src, the precision problem was not
fixed. It could have something to do with I am still running 8.1
server without the src change. Shall try that next.