bos status ronald-ann -long -noauth
bos status rosebud -long -noauth
fs df /afs/sipb/service/partitions/*
If any partition is close to full, move volumes off it before continuing.
vos listvol ronald-ann -noauth
vos listvol rosebud -noauth
Ideally, all the lines that start with "Total volumes" should have "Total volumes offLine 0 ; Total busy 0". Otherwise, find out if someone else is doing volume operations, or if some volumes need to be salvaged (see part D below).
Since the listvol output also shows disk-space usage, it can be scanned for numbers that look far out of line, e.g., 100 MB user volumes.
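The two listvol checks above can be automated roughly as follows. This is a minimal Python sketch, not an existing SIPB tool. It assumes that the per-partition summary lines look like "Total volumes onLine N ; Total volumes offLine N ; Total busy N" and that each volume line gives the size in kilobytes followed by "K". The function name scan_listvol and the 100 MB threshold are illustrative choices.

```python
import re

# Assumed form of a per-partition summary line in vos listvol output.
SUMMARY_RE = re.compile(
    r"Total volumes onLine (\d+) ; Total volumes offLine (\d+) ; Total busy (\d+)"
)
# Assumed form of a per-volume line: name, numeric ID, type, size in K, status.
VOLUME_RE = re.compile(r"^(\S+)\s+(\d+)\s+(RW|RO|BK)\s+(\d+) K\s+(\S+)")


def scan_listvol(text, max_kb=100 * 1024):
    """Return (summary lines with offline/busy volumes, volumes over max_kb)."""
    bad_summaries = []
    big_volumes = []
    for line in text.splitlines():
        m = SUMMARY_RE.search(line)
        if m:
            offline, busy = int(m.group(2)), int(m.group(3))
            if offline or busy:
                bad_summaries.append(line.strip())
            continue
        m = VOLUME_RE.match(line)
        if m and int(m.group(4)) > max_kb:
            big_volumes.append((m.group(1), int(m.group(4))))
    return bad_summaries, big_volumes
```

Feed it the saved output of vos listvol for each server. Anything it reports still needs a human look before a salvage or volume move.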
There will be many messages that are fairly normal occurrences and can usually be ignored. These include:
Break call back failed for host
CB: Call back connect back failed (in break delayed)
CB: RCallBack (zero fid probe in host.c) failed for host
CB: RCallBackConnectBack (host.c) failed for host
Discarded a packet for ########
VAttachVolume: Cannot read volume header
fssync: callbacks broken for volume #########
fssync: volume ######### moved to ########; breaking all call backs
trans ######## is older than 300 seconds
The most common messages of interest are:
Partition /vicepX that contains volume ######### is full
Volume ######### needs to be salvaged
Both of these indicate a possible need to run bos salvage immediately.
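A rough filter for the two messages of interest might look like the following. It is a hypothetical Python sketch. It assumes that the ######### placeholders above stand for numeric volume IDs and that the messages appear verbatim in the server log. This document confirms neither assumption.

```python
import re

# Patterns built from the two messages listed above; the "#" placeholders
# are assumed to be numeric volume IDs.
PARTITION_FULL = re.compile(r"Partition (/vicep\w+) that contains volume (\d+) is full")
NEEDS_SALVAGE = re.compile(r"Volume (\d+) needs to be salvaged")


def salvage_candidates(log_lines):
    """Return {volume_id: reason} for log lines that suggest a bos salvage."""
    found = {}
    for line in log_lines:
        m = PARTITION_FULL.search(line)
        if m:
            found[m.group(2)] = "partition %s full" % m.group(1)
            continue
        m = NEEDS_SALVAGE.search(line)
        if m:
            found[m.group(1)] = "needs salvage"
    return found
```

Routine messages such as "Discarded a packet for ..." match neither pattern and are skipped.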
ls -l /afs/sipb/project/newdump /usr/afs/backup
(/usr/afs/backup is a local directory on the /var partition of hodge.)
Each tape has a label (that is, a label written to the tape media, not the one written on the tape package) that identifies the volume set and dump level the tape is used for. The volume sets are named rann, rbud, and rsqr, corresponding respectively to A, B, and C above. That is, a volume set is the collection of all volumes that share one of these server or server/partition locations.
The four groups of tapes correspond to four dump levels. Dump-level names have the form /volume-set-name_# (e.g., /rann_1, /rann_2, /rann_3, /rann_4, /rbud_1, etc.). There is no particular reason for having separate dump-level names for each volume set; a single shared set such as /foo_1, /foo_2, etc. would also have worked. In other words, the dump-level name is just an arbitrary string that distinguishes the different tapes used for the same volume set. Regardless of the dump level, the full contents of all volumes in the volume set are written to tape. All levels are equivalent; none is higher or lower than any other.
The tape label is formed by joining the volume-set name, the dump-level name (without its leading slash), and a sequence number with periods. For example, the tape used for volume set rbud and dump level /rbud_3 has the label rbud.rbud_3.1. The ".1" means that it is the first tape of the sequence. For sipb-cell backups there is always just one tape in a sequence, so labels always end in ".1".
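The labeling rule can be expressed as a pair of small Python functions. They are illustrative only: the names tape_label and parse_tape_label are hypothetical, and the functions assume that volume-set and dump-level names never contain periods.

```python
def tape_label(volume_set, dump_level, sequence=1):
    """Build a label such as 'rbud.rbud_3.1' from 'rbud' and '/rbud_3'.

    The leading '/' of the dump-level name is dropped; sipb-cell
    backups always use sequence number 1.
    """
    return "%s.%s.%d" % (volume_set, dump_level.lstrip("/"), sequence)


def parse_tape_label(label):
    """Split a label back into (volume_set, dump_level, sequence).

    Raises ValueError if the label does not have exactly three
    period-separated parts.
    """
    volume_set, level, sequence = label.split(".")
    return volume_set, "/" + level, int(sequence)
```

For example, tape_label("rbud", "/rbud_3") gives the label used in the text above.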
(Incidentally, there is no particular reason for having cryptic four-letter volume-set names. Originally, there was only a single volume set for the sipb cell, called sipb. Once the cell had too much data to fit on one tape, two new volume sets replaced it, with names rann and rbud, for compatibility with the original four-letter name. When the R-Squared external drive was connected to rosebud, a third volume set was added, called rsqr.)
The information written on the tape package will indicate what tape is inside. For example, the rbud.rbud_3.1 tape will say "ROSEBUD 3", the rsqr.rsqr_4.1 tape will say "RSQR 4", etc.
The dump levels are used sequentially. That is, if one week the levels /rann_2, /rbud_2, and /rsqr_2 are used, then the next week /rann_3, /rbud_3, and /rsqr_3 are used. If the tapes for the current dump levels are not in the SIPB office, they are with the IS Media Storage service in Building 11. They can be retrieved by bringing our tape receipt (storage number CTG007) to 11-226, Monday through Friday, 8 AM to 4 PM. It takes the staff there about 5 minutes to get the tapes. If Monday is a holiday, remember to get the tapes by Friday. When retrieving tapes, bring another set to swap into storage: the second-most-recent tapes. The most recent tapes should stay in the office in case they are needed to restore data.
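The weekly rotation can be sketched in Python as follows. The sketch assumes that the cycle wraps from level 4 back to level 1. The four-tape scheme implies this, but the text does not state it outright. The function names are hypothetical.

```python
NUM_LEVELS = 4  # four groups of tapes, one per dump level
VOLUME_SETS = ("rann", "rbud", "rsqr")


def previous_level(n):
    """Level used the week before level n (assumes 1 follows 4)."""
    return (n - 2) % NUM_LEVELS + 1


def next_level(n):
    """Level to use the week after level n (assumes 4 wraps back to 1)."""
    return n % NUM_LEVELS + 1


def dump_levels(n):
    """Dump-level names for every volume set at level n, e.g. '/rann_2'."""
    return ["/%s_%d" % (vs, n) for vs in VOLUME_SETS]


def tape_swap(this_week):
    """Return (level to retrieve, level kept in the office, level taken to storage)."""
    most_recent = previous_level(this_week)
    second_most_recent = previous_level(most_recent)
    return this_week, most_recent, second_most_recent
```

For a week that uses level 3, tape_swap(3) says to retrieve the level-3 tapes, keep the level-2 tapes in the office, and take the level-1 tapes to storage.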
If the tapes for the correct dump levels can't be obtained, it is OK to use different dump levels. Using the dump levels sequentially is just a convention (and it makes sense to overwrite older backups, not newer ones); none of the software requires it.
If it's necessary to use a new tape, or one that was previously used for something else, then a labeltape operation should be done prior to the backup. (Note: usually, the only tapes that should be reused are AFS tapes from previous years. The other tapes in the SIPB office generally have some archival value.)
To label a tape, start two separate root shells on hodge. Each should be in a PAG with suitable root-instance tokens and have /afs/sipb/project/newdump as its current directory. In the first one, type
# @sys/backup -cell sipb.mit.edu
and in the second, type
# @sys/butc
In the first shell there will be a "backup> " prompt. To label a tape as rbud.rbud_3.1, type
backup> labeltape rbud.rbud_3.1
There will be a prompt to hit return in the second shell. The second shell will eventually indicate that the labeling finished (it takes a few minutes). At this point, type quit in the first shell, control-C in the second, and (presumably) unlog and kdestroy in one or both of them.
Volume cloning is done once per server, and it must be done before volume dumping. It takes much less time (about one-tenth as long), and it is probably best to do it immediately before dumping. Cloning accesses all of the volumes that will be dumped, so a problem accessing any volume will often be detected during cloning and can be corrected (e.g., with a bos salvage) before dumping. If cloning is done too long before dumping (e.g., a day earlier), a problem could develop with some volume in between, and the backup would then be incomplete, since that volume would not be dumped.
The volume cloning is done with sh scripts. The general idea is to clone all volumes whose names neither end with ".nb" nor begin with "disk.". First, use mkvbscript to create a script containing the series of vos backup commands:
cd /afs/sipb/project/newdump
scripts/mkvbscript servername
Then run this script, recording its output to a file:
script vb-script.server_abbreviation.YYMMDD
/usr/tmp/vb-script.servername.DDMonYYYY
^D
Compress the output file, and save it in the last directory of project.newdump:
compress vb-script.server_abbreviation.YYMMDD
mv vb-script.server_abbreviation.YYMMDD.Z last/
(The server abbreviations are rann and rbud.)
If any of the vos backup commands failed, do a bos salvage on that individual volume, then run that vos backup command again.
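The selection rule for cloning can be sketched in Python. This is not the actual mkvbscript, whose source is not shown here. The sketch assumes the input is a list of read/write volume names and emits one vos backup command line per selected volume, without any extra options the real script may add.

```python
def should_clone(volume_name):
    """Selection rule from the text: skip names ending in '.nb' or starting with 'disk.'."""
    return not volume_name.endswith(".nb") and not volume_name.startswith("disk.")


def vos_backup_commands(volume_names):
    """Emit one 'vos backup <volume>' line per selected volume.

    Hypothetical stand-in for mkvbscript: assumes volume_names holds only
    read/write volume names.
    """
    return ["vos backup %s" % name for name in volume_names if should_clone(name)]
```

The resulting lines correspond to the commands whose failures call for a bos salvage and a re-run of that one command.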
The volume dumping is done with a perl script. The script currently has some problems (mainly, it does not exit cleanly), but it is still somewhat more convenient than other methods. Its main function is to run the backup program with the command "dump volume-set-name dump-level-name", e.g., "dump rann /rann_2". The script also starts the butc program, which controls I/O to the tape drive, and handles interaction between backup and butc using perl's chat2 facility. Before running it, put the correct tape into the tape drive and wait until the green light on the front is lit. Then give the dump arguments on the command line, along with the -noclone switch, e.g.,
/afs/sipb/project/newdump/backup.pl rann /rann_2 -noclone
For volume set rann, expect to wait about 4 hours; for rbud or rsqr, it should take less than 2 hours. There will be on the order of 10-50 lines of output reporting the status of the backup. At the end, there should be a final message of the form "Finished doing dump rann.rann_2 successfully". At this point, hit control-C until the shell prompt comes back (it takes two or three control-C's).
Once the backup of each volume set finishes, find the processes on hodge running "@sys/butc" and "@sys/backup -cell sipb.mit.edu" and kill them manually. There will be files in /usr/tmp named backup and butc followed by the date (e.g., backup.960108). If the backup failed, delete them. If it succeeded, compress them and store them in project.newdump, e.g.,
cd /afs/sipb/project/newdump/last
compress -c /usr/tmp/backup.960108 > backup.rann.960108.Z
compress -c /usr/tmp/butc.960108 > butc.rann.960108.Z
The most common failure mode is:
file_tm: code = -1, errno = 5
file_tm: I/O error Writing data
Error in receiving data
butm: tape I/O error
Dump volume-set-name.dump-level-name encountered an error
In this case, the tape should be marked "bad", a new tape should be
inserted, and backup.pl should be run again with the same arguments.
In general, it's necessary to re-run backup.pl whenever a "Dump ...
encountered an error" message is obtained.
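Since any "Dump ... encountered an error" message means backup.pl must be re-run, the saved output can be checked mechanically. The following is a hypothetical Python sketch. It assumes that the success and failure messages appear exactly in the forms quoted above.

```python
import re

# Message forms quoted in this document; assumed to appear verbatim in
# the output of backup.pl.
SUCCESS_RE = re.compile(r"Finished doing dump (\S+) successfully")
FAILURE_RE = re.compile(r"Dump (\S+) encountered an error")


def dump_outcome(lines):
    """Classify output as ('ok', dump), ('rerun', dump) or ('unknown', None).

    A failure message takes precedence over a success message, since any
    "Dump ... encountered an error" means backup.pl must be run again.
    """
    lines = list(lines)  # allow generators; the lines are scanned twice
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            return "rerun", m.group(1)
    for line in lines:
        m = SUCCESS_RE.search(line)
        if m:
            return "ok", m.group(1)
    return "unknown", None
```

An "unknown" result (for example, output cut off before the final message) is treated here as neither a success nor a failure and needs a manual check.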
Write an entry on the tape package listing the date of the successful backup, the dump-level name, and your username, e.g.,
01-08-96 /rann_2 paco
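The package entry date appears to use MM-DD-YY order: 01-08-96 matches the 960108 file names above for January 8, 1996. A small hypothetical Python helper that produces the same layout:

```python
import datetime


def package_entry(day, dump_level, username):
    """Format a tape-package entry such as '01-08-96 /rann_2 paco'.

    Assumes the MM-DD-YY date order suggested by the example entry.
    """
    return "%s %s %s" % (day.strftime("%m-%d-%y"), dump_level, username)
```

For the current day, pass datetime.date.today() as the first argument.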
rm -rf /afs/sipb/project/newdump/hodge/*
cd /usr/afs/backup; tar cf - . | (cd /afs/sipb/project/newdump/hodge; tar xpf -)
This would usually be the time to unlog.