There are two implementations for working with the SAILDART Archive. The first is a private GNU/Linux file system; the second is the public SAILDART web site, which is built from the first. The SAILDART use of database software is substantial, but it remains auxiliary to file systems. I do miss Jim Gray (lost at sea, 2007), but I have not yet converted to his database-first world view.
The canonical, permanent SAILDART file URL is simply the old SAIL PDP-10 file name, extension, project and programmer with the old punctuation marks, optionally postfixed with a decimal version number. The curly braces in the third form below only mark a placeholder; there are no curly braces around actual version numbers:
FILNAM[PRJ,PRG]
or
FILNAM.EXT[PRJ,PRG]
or
FILNAM.EXT[PRJ,PRG]{version}
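The three forms above can be sketched as a small parser and URL builder. This is only an illustration, not the code behind the web site: the names `SailName`, `parse` and `to_url` are hypothetical, and it assumes the decimal version number follows the closing bracket directly, with no separator.

```python
import re
from typing import NamedTuple, Optional

BASE = "http://www.saildart.org/"  # host as used in the examples in this text


class SailName(NamedTuple):
    filnam: str
    ext: str                 # empty string when the file has no extension
    prj: str
    prg: str
    version: Optional[int]   # None when no version number is given


# FILNAM, optional .EXT, [PRJ,PRG], optional trailing decimal version.
# Assumption: the version digits come right after the closing bracket.
_PATTERN = re.compile(
    r"^(?P<filnam>[^.\[\]]+)"
    r"(?:\.(?P<ext>[^.\[\]]*))?"
    r"\[(?P<prj>[^,\[\]]+),(?P<prg>[^,\[\]]+)\]"
    r"(?P<version>\d+)?$"
)


def parse(name: str) -> SailName:
    """Split a SAIL-style file name into its parts."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"not a SAIL file name: {name!r}")
    version = m.group("version")
    return SailName(
        m.group("filnam"),
        m.group("ext") or "",
        m.group("prj"),
        m.group("prg"),
        int(version) if version else None,
    )


def to_url(n: SailName) -> str:
    """Rebuild the URL; note that no curly braces are emitted."""
    path = n.filnam + (f".{n.ext}" if n.ext else "") + f"[{n.prj},{n.prg}]"
    if n.version is not None:
        path += str(n.version)
    return BASE + path
```

For instance, `to_url(parse("BUCK75.FNT[XGP,SYS]"))` gives back the same name on the SailDart host, brackets and comma included.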
To get a bitwise exact copy of a file, append "_octal" to the URL. For example:
wget -q 'http://www.saildart.org/BUCK75.FNT[XGP,SYS]_octal'
serial numbering the data blob hash codes.
For each PRG code a SailDart home page exists (strictly speaking, for each PRG+1 owner code, which equals the PRG code for almost everyone, except where a code was reused for a different person) at the URL
http://www.saildart.org/BGB
or
http://www.saildart.org/[1,BGB]
I once had the SailDart files accessible by URLs of the form www.saildart.org://{isodate}/FILNAM.EXT[PRJ,PRG], for accessing a revision without having to know its revision number, since a server-side mechanism could select the revision that existed on the given date. Such a URL is not unique or canonical; rather, each file revision gets a large set of URLs, one for each day in the span of its existence. I could be encouraged to re-implement this form of access, and have appended it as a low-priority exercise.
The copyright status of the almost one million items inside the SailDart archive varies and may be looked up per item. Most SailDart items were never published; others are public domain. SailDart is an archival collection with human curators. Compliance with the original ARPA, NSF and other contracts supporting academic research at Stanford University continues on a best-effort basis. Compliance with the Stanford University policy for archiving research data continues.
John McCarthy punted on the privacy issue. He said (paraphrasing) 1. Do not be in a hurry to contact Stanford officials, 2. Get advice from Les Earnest and Marty Frost, and memorably he repeated the cliché: 3. It is easier to ask for forgiveness than it is to ask for permission.
Stanford University has had continuous possession of the DART permanent tapes. The 229 reels of DART tape are now safely housed in the Digital Collection at the Green Library, on my initiative, with assistance from Earnest, Frost and Hartwig.
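The server-side selection described above amounts to finding the latest revision that already existed on the requested date. A minimal sketch follows, assuming each file carries a date-sorted list of (first date, revision number) pairs; that data layout, the function name `revision_on` and the sample dates are all invented for illustration and are not taken from the SailDart implementation.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional, Sequence, Tuple

# (first date on which the revision existed, revision number), sorted by date.
Revision = Tuple[date, int]


def revision_on(revisions: Sequence[Revision], day: date) -> Optional[int]:
    """Return the revision number current on `day`, or None if the file
    did not exist yet. Assumes the newest revision stays current forever."""
    first_dates = [d for d, _ in revisions]
    i = bisect_right(first_dates, day)   # count of revisions dated <= day
    return revisions[i - 1][1] if i else None


# Invented sample data, for illustration only.
history = [
    (date(1974, 1, 5), 1),
    (date(1975, 3, 1), 2),
    (date(1980, 7, 4), 3),
]
```

With this sample history, every date from 1975-03-01 up to 1980-07-03 resolves to revision 2, which is why such date-based URLs form a large set per revision instead of a single canonical name.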
Answer: The guardians must guard each other. Les Earnest has observed that at any computer project, there is an inner circle of system programmers who have access to everything. It is peer pressure from others that preserves privacy.
The URL https://doresearch.stanford.edu/policies/research-policy-handbook/conduct-research/retention-and-access-research-data links to a page concerning Stanford University policy on the retention of and access to research data. I am aware of this policy now, and I was aware of the issues and ambiguities of an unsorted bulk data collection in 1998 when working with John McCarthy and Ted Selker on long term digital preservation for data mining at the IBM Almaden Research Center. From the Stanford policy, I wish to quote four sentences verbatim:
I claim a wide interpretation for sentence #1, starting with my PhD thesis work, on which I indeed hold a 1974 copyright and which arguably is my intellectual property and not that of Stanford University. I am in compliance with policy sentence #2 since the original media is still at Stanford. John McCarthy seemed aware of the Stanford University policy ideas in sentences #3 and #4, and he took it that some folks might assume the three-year retention period was a maximum, after which old data should be destroyed in order to avoid difficulties and to cut off the possibility of belated reviews or whistle-blowing. John McCarthy was of the opinion that A.I. should be like Astronomy, where research records are kept forever.
The SailDart collection that has been on the web for the past decade is too large, too fragmented and too redundant for the search engines to make much sense of it. The search engines downgrade sites that are as large and as illegible as SailDart has been. However, a search for the keyword SailDart followed by a couple of your own special keywords will turn up SailDart stuff. For example, the search “SailDart ZORK” returns a set of SAIL files referring to Don Woods's game Adventure.
The digital curators (such as myself) who have a copy of the SailDart in a file system can navigate the million files using find and grep. I have built and used a full word index (a concordance) from time to time, but I do not have one built at the moment. Frequency histograms of N-glyphs and N-grams are a routine way of finding stuff; however, I do not have a SailDart search tool kit to hand off. Semantic networks of the documents using the same vocabulary (especially names) might be useful.
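For readers who want to try the two techniques named above on their own pile of text files, here is a minimal sketch of a concordance (word to file names) and a word N-gram frequency histogram. It is not the SailDart tool kit, which I do not have to hand off; the function names, the crude definition of a word as a run of ASCII letters and digits, and the folding to upper case are assumptions made for this example.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set

# Assumption: a "word" is a run of ASCII letters and digits.
WORD = re.compile(r"[A-Za-z0-9]+")


def words(text: str) -> List[str]:
    """Split text into upper-cased words."""
    return [w.upper() for w in WORD.findall(text)]


def concordance(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of document names that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in docs.items():
        for w in words(text):
            index[w].add(name)
    return dict(index)


def ngram_histogram(text: str, n: int) -> Counter:
    """Count every run of n consecutive words in the text."""
    ws = words(text)
    return Counter(tuple(ws[i:i + n]) for i in range(len(ws) - n + 1))
```

Calling `concordance` on a dictionary of file name to file contents gives a lookup table for one-word queries, and `ngram_histogram(text, 2).most_common(20)` lists the twenty most frequent word pairs in a text.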