The size, MD5 hash value and Linux file name for each of three formats:
53G   a33c8ff234f0e106e2e99e219f3b7b45  dart_records_from_229_reels.tar
9.7G  ecc815253d76dffc615e311bfbb4c090  dart_records_from_229_reels.tar.lzma
85G   3adbff17fd7f9f6eb9107755594ae0b9  flat_DART_data8
The 1998 transfer from 229 reels of magnetic tape onto disk, the 2014 consolidation into one large Linux file, and the 2018 conversion into DATA8 format are all described elsewhere.
I borrow the word exegesis from scholars who study ancient texts like the Dead Sea Scrolls. For a digital archive, exegesis is how to convert old formats into current ones. One gold nugget from the Power-Point School of Information Technology is the slide showing a pyramid labeled from bottom to top: Data, Information, Knowledge, and Wisdom. Climbing up the pyramid from fifty-three gigabytes of magnetic tape data records reveals the information to be found at the first Stanford Artificial Intelligence Laboratory.
00112233445566778899aabbccddeeff  sail_remix_large
00112233445566778899aabbccddeeff  sail_remix_medium
00112233445566778899aabbccddeeff  sail_remix_small
00112233445566778899aabbccddeeff  dart_remix_large
00112233445566778899aabbccddeeff  dart_remix_medium
00112233445566778899aabbccddeeff  dart_remix_small
The MD5 hash values before and after compression are:
a33c8ff234f0e106e2e99e219f3b7b45  dart_records_from_229_reels.tar
ecc815253d76dffc615e311bfbb4c090  dart_records_from_229_reels.tar.lzma
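A reader holding a copy of one of these files can check it against the published hash. The following is a minimal sketch in Python using the standard hashlib module; the function names md5_of_file and matches_published_hash are illustrative assumptions and are not part of the SAILDART software.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks.

    Chunked reading keeps memory use small even for a 53G tar file.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's MD5 digest with a published hexadecimal value."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("dart_records_from_229_reels.tar", "a33c8ff234f0e106e2e99e219f3b7b45") should return True for an undamaged copy, assuming the file is in the current directory.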
Data is carried on media. The story of the physical DART tape media is its provenance as recited earlier and depicted as a room with 3000 magnetic tapes. The 3000 reels of tape hang on rails, 48 reels per rail and six rails per rack; each rack is six feet tall, four feet wide and a foot deep. So about ten racks would fit in a 20 by 20 foot room. All of the SAILDART data can be copied from its original room full of media and placed into a single GNU/Linux file. The raw data and the software to unwrap it into current formats fit within sixteen gigabytes, which fits on a (circa 2018) digital camera SD memory card. Here SD stands for Secure Digital or, in this case, Sail Dart.
Attached here, you may have in your hand, a copy of the SAILDART archive along with the GNU/Linux software I have used to exegesize it. The chips are labeled on the outside in black letters on white plastic sticky tape SAIL DART 2015. Most likely there is only a captioned picture of the chips. Nevertheless the determined scholar can search for the keyword MD5 hash codes that are given above, then go on to find the archived data as I received it, then go further to verify my remix, then make new interpretations of this Knowledge and Wisdom. The casual scholar is advised to skip reading about DART format details.
Next, I wish to eliminate the TAR dependency. The Time Capsule version will now be a compressed flat DART byte stream. After that, I wish to eliminate the DART dependency. The time capsule or hand-off baton need only have the original SAIL file system content. DD and TAR are common UNIX utilities we (Baumgart and Frost) used to copy the data from the tapes to disk in the late 1990s. DART is the message carrier — it is not the message. The DART story in the Provenance section is one step in the chain of historical data custody.
Technical versions of this document provide diagrams and software for understanding the conversion from the DART tape format into modern notations. However, it is useful to have a narrative description as well.
When writing to 9-track magnetic tape, the PDP-10 computer words of 36-bits were transferred by the I/O hardware into bytes in big endian order with the final four bits in the high order of the fifth byte. The thirty-six bit SAIL computer words were written to tape, big endian, in five octet bytes. The fifth byte of each word has four low order data bits and four bits of high order zero-bit padding. When the tape drive loses sync, octets can be lost or inserted, and the remaining words are garbled. However, by scanning the serial byte stream for DART record landmarks, the position of the misaligned octets can be adjusted and the remaining words of the record properly aligned.
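The five-octets-per-word packing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of undart: the function names are hypothetical, and because the position of the four data bits inside the fifth octet matters, it is left as a parameter (data_in_low_nibble) rather than asserted.

```python
def unpack_word(five, data_in_low_nibble=True):
    """Assemble one 36-bit word from five tape octets, most significant first.

    ASSUMPTION: the first four octets carry bits 0 to 31 and the fifth
    octet carries bits 32 to 35 in one nibble and zero padding in the other.
    Which nibble holds the data is selected by data_in_low_nibble.
    """
    if len(five) != 5:
        raise ValueError("expected exactly five octets")
    b0, b1, b2, b3, b4 = five
    last4 = (b4 & 0x0F) if data_in_low_nibble else (b4 >> 4)
    return (b0 << 28) | (b1 << 20) | (b2 << 12) | (b3 << 4) | last4


def unpack_words(octets, data_in_low_nibble=True):
    """Unpack a byte string into 36-bit words, ignoring a trailing partial word."""
    usable = len(octets) - len(octets) % 5
    return [
        unpack_word(octets[i:i + 5], data_in_low_nibble)
        for i in range(0, usable, 5)
    ]
```

A lost or inserted octet shifts every later group of five, which is why realignment on record landmarks is needed before the words after a defect can be trusted.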
Text at SAIL was encoded in a non-standard 7-bit ASCII where the thirty-one characters after zero 000 ASCII NUL were mapped into non-Standard non-ASCII glyphs. For example, the SAIL-ASCII code octal 001 is the character named down arrow with the glyph ↓ now known as Unicode hexadecimal u2193. Continuing at less than a pure narrative, we recite the SAIL arrows by reading down the column where each line shows a glyph, the octal code, the unicode hexadecimal and then the character name:
↓ 001 u2193 down arrow
↔ 027 u2194 double arrow horizontal
→ 031 u2192 right arrow
↑ 136 u2191 up arrow
← 137 u2190 left arrow
Five Greek letter glyphs at SAIL were represented as:
α 002 u03B1 alpha
β 003 u03B2 beta
ε 006 u03B5 epsilon
λ 010 u03BB lambda
π 007 u03C0 pi
For Logic and Math the SAIL codes 004, 037, 024, 025, 005, 026, 016 and 017 represent glyphs for boolean AND, boolean OR, for each, there exists, boolean NOT, XOR as a circle X, infinity as the lazy eight symbol and the partial differential operator. These codes respectively become Unicode u2227, u2228, u2200, u2203, u00AC, u2297, u221E and u2202:
∧ 004 u2227 boolean AND
∨ 037 u2228 boolean OR
∀ 024 u2200 Foreach
∃ 025 u2203 Exists
¬ 005 u00AC logical NOT
⊗ 026 u2297 circle X
∞ 016 u221E infinity
∂ 017 u2202 partial
Then mathematical relations and horseshoe symbols were encoded at SAIL as octal 033, 034, 035, 036 and 020, 021, 022, 023 which in order are ≠ ≤ ≥ ≡ and ⊂ ⊃ ∩ ∪ that become Unicode u2260, u2264, u2265, u2261 and, for the horseshoes, u2282, u2283, u2229 and u222A:
≠ 033 u2260 not equal
≤ 034 u2264 less than or equal
≥ 035 u2265 greater than or equal
≡ 036 u2261 equivalence
⊂ 020 u2282 left horseshoe
⊃ 021 u2283 right horseshoe
∩ 022 u2229 intersection
∪ 023 u222A union
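The glyph lists above amount to a lookup table from SAIL character codes to Unicode code points. The sketch below writes that table out in Python. The names SAIL_GLYPHS and sail_to_unicode are hypothetical, and passing every unlisted code through as plain ASCII is a simplifying assumption: the full SAIL code chart treats a few more codes specially.

```python
# SAIL 7-bit code (octal) -> Unicode code point, for the 26 glyphs listed above.
SAIL_GLYPHS = {
    # arrows
    0o001: 0x2193, 0o027: 0x2194, 0o031: 0x2192, 0o136: 0x2191, 0o137: 0x2190,
    # Greek letters
    0o002: 0x03B1, 0o003: 0x03B2, 0o006: 0x03B5, 0o010: 0x03BB, 0o007: 0x03C0,
    # logic and math
    0o004: 0x2227, 0o037: 0x2228, 0o024: 0x2200, 0o025: 0x2203,
    0o005: 0x00AC, 0o026: 0x2297, 0o016: 0x221E, 0o017: 0x2202,
    # relations and horseshoes
    0o033: 0x2260, 0o034: 0x2264, 0o035: 0x2265, 0o036: 0x2261,
    0o020: 0x2282, 0o021: 0x2283, 0o022: 0x2229, 0o023: 0x222A,
}


def sail_to_unicode(codes):
    """Translate an iterable of 7-bit SAIL codes into a Python string.

    ASSUMPTION: codes not in SAIL_GLYPHS are treated as ordinary ASCII.
    """
    return "".join(chr(SAIL_GLYPHS.get(code, code)) for code in codes)
```

Calling sail_to_unicode(bytes([0o101, 0o031, 0o102])) gives the three characters A, right arrow, B; encoding the result with .encode("utf-8") yields bytes suitable for a modern file system.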
Even if this Exegesis were to be lost, the DART gram is not that hard to decode after the idea of 7-bit characters that are nearly ASCII is re-discovered. Then DART metadata is merely irritating hiccups in a stream of text. Reading all of that text provides the future Cyber-Sapien archivist (after Homo-Stupids have disappeared) with the actual software which wrote the DART message. Late in the analysis, the decoding expert will see the glyph shapes in the binary font files or in the TeX metafont files.
The SAIL-WAITS file system is primitive. It was a tool for pioneers at the frontier. Filenames were one to six characters, optionally followed by a dot and a one to three character extension. Filename alphabetic characters were only uppercase. The character codes for A to Z were six bits wide as octal 041 to octal 072. The digits are octal 020 to octal 031. The blank is zero. Filename characters on the DART media, as well as internal to the Operating System, could have any of the 64 character values.
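Six characters of six bits each fill one 36-bit word exactly, and each sixbit code is the ASCII code minus octal 040. A minimal Python sketch of both directions follows; the function names are hypothetical and chosen for illustration only.

```python
def text_to_sixbit(text):
    """Pack up to six characters into one 36-bit word, blank padded on the right."""
    padded = text.upper().ljust(6)[:6]
    word = 0
    for char in padded:
        code = ord(char) - 0o40          # sixbit is ASCII minus octal 040
        if not 0 <= code <= 0o77:
            raise ValueError("character %r has no sixbit code" % char)
        word = (word << 6) | code
    return word


def sixbit_to_text(word, count=6):
    """Unpack a 36-bit word into `count` characters, leftmost character first."""
    chars = []
    for position in range(count):
        shift = 6 * (count - 1 - position)
        chars.append(chr(((word >> shift) & 0o77) + 0o40))
    return "".join(chars)
```

As a check against the record format tables, text_to_sixbit("$PEND$") gives octal 046045564404, the value those tables list for the '$PEND$' marker word.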
Each file belonged to a directory specified by left square bracket, project code, comma, programmer code, right square bracket. The project and the programmer codes were each one to three characters long. Alien to the SAIL file system is the now familiar file system concept of having content blobs separate from directory entries. On GNU/Linux file systems, one or many file path names may be hard linked to one content blob, which was impossible in the SAIL-WAITS file system. At SAIL the early disk hardware was so unreliable that a seek command was not trusted to get to the proper cylinder, head and sector of a disk drive. So the directory entry, which SAIL called the Retrieval Information Block (or RIB), included the file name and was written into each data block (called a Track) that was needed to hold the file’s data. And so too on the DART tapes: within the sequence of FILE tape data blocks, each tape block has a full copy of the Retrieval Information.
There are two original DART record types: Tape-Marker (Head or Tail) and File-Data (Start or Continue). I have added a third record type named Gap, to pass over the 63 segments of bytes which failed to decode as DART records. Previewing the data shows that all the Tape-Marker records are exactly sixty bytes long and each contains the tape reel number and a date-time stamp. The first word of each DART record has its record size. The record lengths segment the whole DART byte stream with only 63 defects; continuing after a defect requires scanning for the next sane record. With the extreme precision that is available to latter day archival software, the DART segmentation goes as follows:
The long byte vector named dart_remix_large has exactly 56_446_334_821 bytes and contains exactly 2_937_291 short segments, of which 5_486 are head-tail records, 1_886_472 are file-start records and 1_045_270 are file-continue records, plus the 63 gaps. So the fifty-six gigabytes of tape data have nearly three million short records which contain the data and the names of about one million old SAIL files.
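The segment counts can be checked with simple arithmetic. The short Python sketch below only re-adds the figures quoted above; the variable names are illustrative.

```python
# Record counts as quoted in the text.
head_tail = 5_486
file_start = 1_886_472
file_continue = 1_045_270
gaps = 63

total_segments = head_tail + file_start + file_continue + gaps
total_bytes = 56_446_334_821

# Average segment length in bytes, and the size in units of 2**30 bytes.
mean_segment_bytes = total_bytes / total_segments
size_in_gib = total_bytes / 2**30
```

The four counts sum to the stated 2_937_291 segments, and the byte count expressed in units of 2**30 rounds to the 53G reported for the tar file.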
Three further mechanisms need to be previewed here. First, it was the intentional DART backup policy to write two copies of each SAIL file that was deemed of permanent value to two different permanent backup reels of tape. A file found by the utility programs named DSKUSE and DART, resident on the SAIL community commons SYS: disk system, would be marked as archived once, then marked as archived for a second time, and thereafter omitted from further archiving. So each SAIL file should appear in two places in the tape records with identical content, name and date-time stamp. Second, a unique SAIL-WAITS filename will appear again (with yet two further copies each time) for each newer date-time stamped revision. Generally human edited files do not change very much between revisions. Third, it was the unintentional result of the unreliable disk seeking mechanism that file retrieval information, including the file name, was stored multiple times within the file “blocks” on the disk media. That meant that the SAIL-WAITS file system would contain multiple copies of exactly the same content of a file when a file was copied from one user directory into another. Other kinds of short files (the professional digital archivist term for these files is “turd”) are generated by common utility programs in many user directories with content of no value to the historical record aside from traffic analysis. The result is that the population of 1_886_472 SAIL files in the DART halves to fewer than 900_000 different content blobs; each content blob has one to many hundreds (and for a couple of blobs even thousands) of directory entry name tag (aka retrieval information) rows in the database table of the SAIL-WAITS file names.

FILE-START and FILE-CONTINUE tape records

File-Start and File-Continue records are identical in format and in the content of their file metadata. The File-Start record is marked type -3 in the left half of word 0 and carries the constant sixbit/*FILE*/ in word 19. The File-Continue record is marked type 0 in the left half of word 0 and carries sixbit/*CONT*/ in word 19. So, describing them both as FILE blocks, they have 36. words of prefix, then up to 10240. words of data payload, then a 23. word postfix which is most often completely zero except when a few bits are set pursuant to observations of error conditions in the reading of the low density tapes. The FILE metadata is sixbit/FILNAM/ sixbit/EXT/ sixbit/PRJPRG/, the length of the file in words, SAIL-WAITS protection bits, the mode in which the file was written, and a date-time stamp. The file data block records seen on the high density tapes are surprisingly fat considering the computer poverty of the prior 18 year period. The explanation is that the DART data format version #3 was a final revision done to handle the massive MCOPY of the 3000 old tapes into the newer higher density ones; the format was over ambitious and had allocated many bytes of space that were never used.

HEAD and TAIL tape records

All the tape HEAD and TAIL records are exactly 60. bytes long. Each contains 12. PDP-10 words. Seven of the twelve words have a fixed constant value, making the HEAD-TAIL records easy to find in a byte string. The other five words carry a date-time stamp, a checksum for 10. words of the HEAD-TAIL record, the tape reel number and the tape position in feet from the tape load point, which is irrelevant to the SAIL-WAITS file system, but it is amusing to know where the low density tape reel images fall within the high density tapes. There are 41_594 tape records from the higher density tapes, which each in turn contain 1 to 100 or so small records from the lower density tapes. In total there are 2_934_700 of the small records plus the 63 gaps.
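The record type lives in the left half of word 0, so a decoder first splits the 36-bit word into two 18-bit halves. The following Python sketch shows that split and the type test for file records. It assumes 18-bit two's complement for the negative type code and that the right half of word 0 holds the size; the function names are hypothetical, and anything other than type -3 or 0 is simply reported as "other".

```python
def halves(word):
    """Split a 36-bit word into (left, right) unsigned 18-bit halves."""
    return (word >> 18) & 0o777777, word & 0o777777


def signed18(value):
    """Interpret an unsigned 18-bit value as two's complement (an assumption)."""
    return value - (1 << 18) if value & (1 << 17) else value


def record_kind(word0):
    """Classify a record from word 0; returns (kind, size from the right half)."""
    left, size = halves(word0)
    kinds = {-3: "file-start", 0: "file-continue"}
    return kinds.get(signed18(left), "other"), size
```

A fuller decoder would also compare word 19 against the sixbit constants before trusting the type, since a garbled word 0 can still look plausible.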
The 229 reels of high density DART tape are labeled P3000 to P3228. As mentioned earlier, the reels still exist and are kept in the Stanford University Digital Archive housed in the Green Library building on the campus in Palo Alto, California. Each tape contains high density (6250 bpi) records. Each high density record is a concatenation of records from the lower density (800 bpi) tapes, which were labeled P1 to P2984. The letter ’P’ indicated a Permanent backup tape as opposed to the incremental ones, which were marked ’T’ for Temporary. The final reel of Permanent Tape was written 16 August 1990 and that reel of tape was copied to disk in March 1998, however the earliest file I have from that reel is time stamped 17 June 1998. The rescue of the high density tapes to disk was not a well documented process: the quantity of old tape in the basement of the CSD building was overwhelming, the speed of the tape drive was slow, the working hours were 2nd and 3rd shift, and the disk drives were nine Gigabytes each and were taken off site to copy into several other systems, since there was not enough disk space available to us on a single system.

The low density reels were written over a period of nearly 18 years. The HEAD of tape #P1 is time stamped 1972-11-05T11:59 and its TAIL is marked 1972-11-05T12:23, which implies that the first tape took 24 minutes to be written on a quiet Sunday in November around lunch time. Richard Nixon won re-election to a second term as president of the United States on the following Tuesday, 7 November 1972.

The high density reels were written over a period of nearly 31 months. The HEAD of the first high-density DART tape #P3000 is time stamped 1988-02-01T17:17. The TAIL record on the final high density tape #P3228 is dated 1990-08-16T22:55, so at nearly 11 PM on a Thursday in mid August the DART record ends. Iraq had annexed Kuwait during the first week of August 1990.
The final lower density tape #P2984 is time stamped 1990-08-17T16:43 which overlaps the time period in which the final high density tape is written.
The data found in the 63 gaps is assigned its MD5 blob serial number, tagged with a unique SAIL file label, and included in the SailDart collection, as allowed by the KISS design authority (Keep It Simple Stupid) principle and the Brewster Kahle archiving principle of keep everything you can but don’t fret the details. Working at the Internet Archive we would boast that we were going for Quantity first, not Quality; the SailDart data of 1998 is a pleasant pastime since its Fixed Quantity becomes a lot easier to manage with each passing year.
Six frames of 7-track tape supply a 36-bit PDP10 word

frame 1      frame 2       frame 3        frame 4        frame 5        frame 6
A A A A A A  B B B B B B   C C C C C C    D D D D D D    E E E E E E    F F F F F F
bits 0 to 5  bits 6 to 11  bits 12 to 17  bits 18 to 23  bits 24 to 29  bits 30 to 35

Five frames of 9-track tape supply a 36-bit PDP10 word

frame 1          frame 2          frame 3          frame 4          frame 5
A A A A A A B B  B B B B C C C C  C C D D D D D D  E E E E E E F F  F F F F 0 0 0 0
bits 0 to 7      bits 8 to 15     bits 16 to 23    bits 24 to 31    bits 32 to 35, zeroes

7-bit SAIL ASCII to Unicode and UTF-8 table

      0     1    2    3    4    5    6    7
000   null  ↓    α    β    ∧    ¬    ε    π
010   λ     \t   \n   \v   \f   \r   ∞    ∂
020   ⊂     ⊃    ∩    ∪    ∀    ∃    ⊗    ↔
030   _     →    ~    ≠    ≤    ≥    ≡    ∨
040   ␣     !    "    #    $    %    &    '
050   (     )    *    +    ,    -    .    /
060   0     1    2    3    4    5    6    7
070   8     9    :    ;    <    =    >    ?
100   @     A    B    C    D    E    F    G
110   H     I    J    K    L    M    N    O
120   P     Q    R    S    T    U    V    W
130   X     Y    Z    [    \    ]    ↑    ←
140   `     a    b    c    d    e    f    g
150   h     i    j    k    l    m    n    o
160   p     q    r    s    t    u    v    w
170   x     y    z    {    |    ALT  }    BS

6-bit ASCII minus 040 code table

DART tape HEAD and TAIL record format.

word  name        value                             description
 0.   Type_Size   000006_000013
 1.   _DART_      sixbit/DART␣␣/
 2.   BOT_EOT     sixbit/*HEAD*/ or sixbit/*TAIL*/
 3.   date_time
 4.   ppn         sixbit/DMPSYS/
 5.   Class2Tape  XWD 2,Tape#
 6.   Rel_Abs
 7.   feet
 8.   word8       0                                 0
 9.   minus1      -1                                -1
10.   word10      0                                 0
11.   checksum    Rotated

DART file START and CONTINUE record format.

The MCOPY version#6 DART file (start and continue) record format has five parts.

part  words                   name      description
I     2.                      TypeSize  DART record Type and Size
II    16.                     RIB       WAITS File System Retrieval-Info-Block
III   18.                     Leader    MCOPY extra baggage
IV    0 ≤ N ≤ 10240. − 61.    Payload   portion of the actual file data
V     23.                     PRMERR    Previous Media Errors

Diagram of parts I and II

octal  decimal  symbolic      value                                      comment
...    0        type_size     (-3 or 0),,size                            size is 2 short of record length
...    1        dsk_or_error  ’DSK’                                      constant
000    2        DDNAM         ’filnam’                                   file name
001    3        DDEXT         XWD ’ext’,Date                             create (c)Date
002    4        DDPRO         prot, mode, time, date                     write (m)Date Time
003    5        DDPPN         XWD ’prj’,’prg’                            project programmer
004    6        DDLOC         track#                                     disk track
005    7        DDLNG         file length                                PDP10 words
006    8        DREFTM        reference date time                        (a)Date Time
007    9        DDMPTM        (T or P)dump date                          (d)Date
010    10       DGRP1R        =1                                         first group
011    11       DNXTGP        =0                                         next group
012    12       DSATID        03164236 then ’RSK’ or ’TSK’ or 0          Storage Allocation Table ID
013    13       DQINFO        =0                                         defective 154 times
014    14       zero14        =0                                         defective 32 times
015    15       wrtool        ’progrm’                                   write program name
016    16       DDWPPN        XWD ’prj’,’prg’                            write project programmer
017    17       DDOFFS        =1

Diagram of parts III, IV and V

octal  decimal  symbolic     comment
022    18       _DART_       sixbit/DART␣␣/
023    19       File_Con     sixbit/*FILE*/ or sixbit/CON␣␣#/
024    20       date-time    when MCOPY reel written
025    21       MC_SYS       sixbit/␣MCSYS/
026    22       two_reel     XWD class=2 and MCOPY reel#
027    23       one_one      XWD 1 and 1
030    24       Feet         MCOPY reel position
031    25                    0
032    26                    -1
033    27                    0
034    28       Words_To_Go  payload words remaining in file
035    29                    0
036    30                    0
037    31                    0
040    32                    0
041    33                    0
042    34                    0
043    35                    0
044    36...                 file blob data payload
000    -23      PRMERR       0
001    -22      "            0
002    -21      "            0
...             "            0
024    -3       "            0
025    -2       ’$PEND$’     046045_564404
026    -1       checksum     XOR

SAIL file system metadata

DART database files
Mount a large disk on /data or logically link to your large disk or /data/2014
Read Time Capsule into /dartrecords
De-compress and De-tar the time capsule files
Setup destination pathnames for undart-2014. All destination pathnames are optional; if undart can not access a path it will not output files of that format. If there are NO destination pathnames found, undart will stderr a brief usage message and exit 0 for success.
Setup destination pathnames for the GNU/Linux relinking command scripts.
Setup destination pathnames for the HTML generation and relinking command scripts.
Compile and run UNDART-2014
Convert DART tape records into octal text and database CSV metadata of the SAIL files.
Explain MD5 hashing of data blobs and serial numbering of the data blobs.
The DART metadata is preserved in database CSV lingua franca.
Create a SAIL database and load the CSV metadata files into tables.
Convert 1-to-1 into current presentation format. SAIL files into current presentation open source formats: Unix like UTF8 file systems, tar files and, for web publication, HTML5, CSS3, SVG, PNG, OGG.
Join fragments into comprehensible objects. Merge pieces that go together. Provide handling for multi version (sequential temporal) documents to show each point in time, or a best final and/or comprehensive copy. Provide handling for multi version (cut-and-paste derivative work) documents.
Remove the ancient redundancy which has served its purpose of transmitting a message to us.
Redact damaged data (single byte drop out to zero, and such)
Static scholarship. Talmudic scholarship and historical commentary about the meaning and significance of SAIL content.
Dynamic re-enactment. Theatrical performance of scripts, re-enactment, historical simulation, running models of the Look-n-Feel of life at the 1st SAIL.
End of preview. Now for a long saga concerning the 1st Exegesis.
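The step in the outline above that hashes data blobs with MD5 and numbers them serially can be sketched as follows. This is an illustration only, not the numbering scheme of undart-2014: the function name assign_blob_serials is hypothetical, and handing out serial numbers in order of first appearance is an assumption.

```python
import hashlib


def assign_blob_serials(files):
    """Give each distinct content blob a serial number keyed by its MD5 hash.

    `files` is an iterable of (name, content_bytes) pairs. Returns the
    mapping from MD5 hex digest to serial number and one metadata row per
    file name. ASSUMPTION: serials count up from 1 in order of first
    appearance.
    """
    serial_by_md5 = {}
    rows = []
    for name, content in files:
        digest = hashlib.md5(content).hexdigest()
        serial = serial_by_md5.setdefault(digest, len(serial_by_md5) + 1)
        rows.append((name, digest, serial))
    return serial_by_md5, rows
```

Files with identical content share one serial number, so the rows play the part of directory entries pointing at a smaller set of content blobs; the rows could then be written out with the standard csv module for loading into a database table.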
9.4 Program text of /home/sail/undart-2014.c
9.5 Program text of DART.FAI[TAP,REG]
9.6 Program text of DSKUSE.FAI[ACT,REG]

Figure 9.1 Innermost Matryoshka Doll

9.7 Remove DART packaging from SAIL file data

The first step in SailDart exegesis is the removal of the DART outer wrappers from the content it carries. The content of SailDart is Signal, the DART format is a Carrier. There is some noise: 63 DART record gaps, XOR checksum failures, and the DART-v6 ’ERROR.ERR[ERR,OR]’ logging of yet earlier tape media defects, which are all here simply bundled into tagged blobs along with all the more seemingly valid SAIL-WAITS file objects. There are no rotating checksum failures of the 60 byte HEAD-TAIL records.

In the Carl Sagan book and movie, titled ’Contact’, a message is received from Extra Terrestrials, with sufficient in-band clues (computer programs and documentation) to help the reader fully decode the whole alien message, which turns out to be instructions to build a machine (which I believe becomes in the story an instance of the extra terrestrial itself, but then the movie ending leaves one free to fill in the blanks as one wishes). So too, to a lesser extent, the message of SailDart decodes itself and requires building a machine to fully instantiate itself.

One of Professor John McCarthy’s A.I. challenge problems was to write a program for a robot to travel from Palo Alto to Timbuktu. John most likely expected something to do with Micro PLANNER or the Advice Taker enumerating in formal logic notation the practical steps, like a travel agent booking a sequence of airplane reservations. John did not like our proposal (Baumgart, Moravec and possibly other A.I. lab Star Trekkies) that the robot transmit its blueprints, software, sufficient wired funds and technical instructions over a modem to Timbuktu to recreate an exact replica of itself at the African site.
Odd to reflect, I still recall John giving a rambling lecture on applying mathematical proof of correctness to travel bookings for an airline with one flight and one seat.

Starting with a naive vague idea about PDP-10 tape encoding and formats (and my experience, half forgotten, of the technology in that time period, and a couple of emails from Marty Frost) it has been possible to resurrect the in-band DART data records, the DART source text, the original DART documentation, the DART executable, and to convert the large DART database files. Here we see a dying empire, a major university computer research system, and its academic community suffering yet further disk failures (and heroic recoveries) and writing journals and messages, like a monastic community during plague years, of what has gone missing and what has survived on each disk crash and each budget reduction. The final messages, system alert notices, and documents include a wake in "hope of the resurrection".

The second major task is converting ancient digital content into modern formats. For some individuals this has often been experienced as a personal crisis, as commercial vendors ruinously lock in their customers to proprietary formats that either disappear without a trace or force the users to buy upgrades of the format without a long term recovery mechanism for old content. It is like digging out and unwrapping (or x-raying through) a Pharaoh’s mummy, inside a sarcophagus, inside a tomb.

The third task is to redact damaged content.

The fourth task is to clean the Augean stable, i.e. reduce redundancy. Redact exact clone copies. Diff / Merge / GIT the incremental changes, with special attention for the append-only journal and log files. Break into time stamped message snippets, dump into CSV (Comma Separated Values) files for database tables, index by date-time, from whom, to whom, about what. Then make a best effort attempt to search, count, join, merge, sort and display results on the web.
Distinguish content which was publicly funded and published at the time from private communications and personal files.

DART magnetic tape defects

The DART records have spans of garbled bytes. All files that DART deemed worthy to write to the permanent tapes were written twice. So most files appear two (or more) times within the DART records with the exact same bitwise content. The first example of a single missed DART tape defective byte that I found was rather obvious, a little to the left of the center (at the base of the neck), in a much studied very early digital image. The good copy is at http://www.saildart.org/N.DAT[XAP,BGB]1 Nearly as good, but with the defective black zero byte at row col, is http://www.saildart.org/N.DAT[XAP,BGB]2 So for a well curated SailDart collection, only the first copy is included.

FAIL symbol table format.
9.8 SAIL text into UNICODE UTF-8.
9.9 Disassembling DMP executable binary into text.
9.10 SAIL software highlighting into cross-referenced HTML.

Seven bit text encoding at SAIL, as tabular illustration.

Arrows
001 down arrow ↓ becomes Unicode u2193 represented in UTF8 as \342\206\223
136 up arrow ↑ u2191 and \342\206\221
137 left arrow ← u2190 and \342\206\220
027 double arrow ↔ u2194 and \342\206\224
031 right arrow → u2192 and \342\206\222

Greek letters
SAIL codes 002 003 006 010 007 are α β ε λ π, which become Unicode u03b1 u03b2 u03b5 u03bb u03c0. In ASCII 007 is the BELL. When I see pi characters π π π on a display terminal I hear TTY bell dings.

Logic and Math
SAIL codes 004 037 024 025 005 026 and 016 017 are ∧ ∨ ∀ ∃ ¬ ⊗ and ∞ ∂, which become Unicode u2227 u2228 u2200 u2203 u00AC u2297 and u221E u2202

Relations and Horseshoes
SAIL codes 033 034 035 036 and 020 021 022 023 are ≠ ≤ ≥ ≡ and ⊂ ⊃ ∩ ∪, which become Unicode u2260 u2264 u2265 u2261 and u2282 u2283 u2229 u222A
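The octal byte strings shown for the arrows are simply the UTF-8 encodings of the Unicode code points, written in octal escapes. A short Python sketch reproduces them; the function name utf8_octal_escapes is hypothetical.

```python
def utf8_octal_escapes(code_point):
    """Return the UTF-8 encoding of a code point as backslash octal escapes."""
    encoded = chr(code_point).encode("utf-8")
    return "".join("\\%03o" % byte for byte in encoded)
```

For example, utf8_octal_escapes(0x2193) returns the text \342\206\223 for the down arrow, and the same call can be used to tabulate the Greek, logic and relation glyphs.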