Meeting: 12Apr2011

* tape_do_arch.pl will write tapes as determined by the cadence
  given in a table like so:

#/home/production/cvs/JSOC/base/sums/apps/data/arch_group.cfg
#Configuration table for tape archive by group number.
#Cadence in days at which tape_do_archive.pl will write archive tapes
#for each group.
#group cadence(days)
0    30
1    30
2    1
3    1
4    1
5    1
6    1
7    30
8    30
9    30
10   1
310  1
311  1

These numbers will be changed as we observe the archiving behaviour.

* Each tape file will contain only one dataset.

* Each tape file will be a minimum of 500MB, up to a maximum of 512 sudirs.

* Each tape will contain only one group.

* tape_do_arch.pl will create a manifest file that is passed to a tapearcX
  instance for each group. The manifest lists all the sudirs that go into
  a tape file, and all the files that go onto a tape. It may contain more
  than one tape's worth of data.

* When tapearcX has completed its manifest file, it deletes the file.
  tape_do_arch.pl will not create a new manifest file for a group if the
  old one still exists.

* The current tape_svc and driven_svc will be used, where the jsoc_sums DB
  is updated for each tape file written. If we see that this prevents tape
  streaming on write, then we will go to a bulk update of the DB after
  all files for the tape have been written.

* The idea of inspecting the creation date of each sudir to see if it
  was past the timeout period went away with the new cadence concept for
  archiving each group.

=========================================================================
My answer to mail from Phil:

Thanks. I will add this to the notes.

> How much tape capacity is consumed as overhead for each file?
I don't know. Maybe K. knows? I won't be surprised if a file mark is
50% or more of a small file.

> I assume the tapearcX instance itself quits when it deletes the manifest file.
Yes.
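As a side note, the arch_group.cfg table above could be consumed along
these lines. This is a hypothetical Python sketch (the real tape_do_arch.pl
is a Perl script, and `parse_arch_group_cfg` / `group_is_due` are invented
names, not part of SUMS):

```python
def parse_arch_group_cfg(text):
    """Parse the cadence table: lines starting with '#' are comments,
    data lines are '<group> <cadence_days>'."""
    cadence = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        group, days = line.split()
        cadence[int(group)] = int(days)
    return cadence

def group_is_due(cadence, group, days_since_last_archive):
    """A group is due for archiving once its cadence has elapsed.
    The 30-day default for unknown groups is an assumption."""
    return days_since_last_archive >= cadence.get(group, 30)

SAMPLE = """\
#group cadence(days)
0   30
2   1
310 1
"""
table = parse_arch_group_cfg(SAMPLE)
print(group_is_due(table, 2, 1))   # group 2 (cadence 1d) is due after 1 day
print(group_is_due(table, 0, 5))   # group 0 (cadence 30d) is not due yet
```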
Phil Scherrer wrote:
>> Meeting: 12Apr2011
>>
>> * tape_do_arch.pl will write tapes as determined by the cadence
>>   given in a table like so:
>>
>> #/home/production/cvs/JSOC/base/sums/apps/data/arch_group.cfg
>> #Configuration table for tape archive by group number.
>> #Cadence in days at which tape_do_archive.pl will write archive tapes
>> #for each group.
>> #group cadence(days)
>> 0   30
>> 1   30
>> 2   1
>> 3   1
>> 4   1
>> 5   1
>> 6   1
>> 7   30
>> 8   30
>> 9   30
>> 10  1
>> 310 1
>> 311 1
>>
>> These numbers will be changed as we observe the archiving behaviour.
> New tape group numbers will be assigned to some series, or series will be
> moved to other groups, as we observe the behavior.
> We will not reduce the hold time for a given group.
> For series that are archived as backup, we commit to remaking products
> that are lost due to disk failure.
>
> Perhaps we should have one catch-all group for series that have long
> online retention times and whose archive is primarily backup, and another
> for series where the retention is the expected near-term usage, e.g. 30d,
> but that are more likely to be retrieved later as needed. The first of
> these could migrate to the "shelves", but the latter should stay in the
> T950. This could be a reason for both a 0 and a 1.
>
> (The system should track the number of times each tape is loaded for read.
> If the number of reads gets large enough, or errors start to occur, we must
> have a way to migrate the SUs to another tape.)
>
>> * Each tape file will contain only one dataset.
> dataseries.
>
>> * Each tape file will be a minimum of 500MB or a maximum of 512 sudirs.
> This restriction is no longer possible once tracking the age is disabled.
> If it is less than 500MB and the 30 days is up, and we are no longer
> tracking the age of each series, we must admit to lots of smaller files.
>
> How much tape capacity is consumed as overhead for each file?
>
>> * Each tape will contain only one group.
>>
>> * tape_do_arch.pl will create a manifest file that is passed to a tapearcX
>>   instance for each group. The manifest has all the sudirs that go into
>>   a tape file, and all the files that go onto a tape. It may contain more
>>   than one tape's worth of data.
>>
>> * When tapearcX has completed its manifest file, it deletes the file.
>>   tape_do_arch.pl will not create a new manifest file for a group if the
>>   old one still exists.
> I assume the tapearcX instance itself quits when it deletes the manifest
> file. Is that correct?
>
>> * The current tape_svc and driven_svc will be used, where the jsoc_sums DB
>>   is updated for each tape file written. If we see that this prevents tape
>>   streaming on write, then we will go to a bulk update of the DB after
>>   all files for the tape have been written.
>
> As soon as possible, the current tape_svc will be modified to request a
> drive to use when it is presented with a manifest to archive. It will
> release the drive when finished. There will be a single queue of drives.
>
>> * The idea of inspecting the creation date of each sudir to see if it
>>   was past the timeout period went away with the new cadence concept for
>>   archiving each group.

On 4/12/2011 1:11 PM, Jim Aloise wrote:
>> http://sun.stanford.edu/~jim/archive/meeting.notes
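For reference, the tape-file packing rule discussed in the thread (a tape
file is closed once it holds at least 500MB of data, or once it contains
512 sudirs) could be sketched as below. This is a hypothetical Python
illustration under that reading of the rule; `pack_tape_files` and its
inputs are invented names, not the actual tape_do_arch.pl logic:

```python
MIN_BYTES = 500 * 1024 * 1024   # close a tape file once it holds >= 500MB...
MAX_SUDIRS = 512                # ...or once it already holds 512 sudirs

def pack_tape_files(sudirs):
    """Group (name, size_in_bytes) sudirs into tape files.

    A file is closed as soon as its accumulated size reaches MIN_BYTES
    or its sudir count reaches MAX_SUDIRS; any leftover sudirs form a
    final, smaller file (as the thread admits must happen)."""
    files, current, size = [], [], 0
    for name, nbytes in sudirs:
        current.append(name)
        size += nbytes
        if size >= MIN_BYTES or len(current) >= MAX_SUDIRS:
            files.append(current)
            current, size = [], 0
    if current:
        files.append(current)
    return files

# 1025 tiny sudirs never reach 500MB, so the 512-sudir cap closes the files:
tiny = [("su%d" % i, 1) for i in range(1025)]
print([len(f) for f in pack_tape_files(tiny)])   # [512, 512, 1]
```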