Message-Id: <9206080349.AA00415@pixel.convex.com>
Subject: MIME for global hypertext
To: www-talk@nxoc01.cern.ch, wais-talk@think.com
Date: Sun, 07 Jun 92 22:49:51 CDT
From: Dan Connolly <connolly@pixel.convex.com>

[This was posted to several newsgroups, but someone from wais-talk
suggested I forward it there also.]

The WAIS, gopher, and world-wide-web projects are all client/server
information retrieval systems. All three deliver plain text
information quite well, and they each have evolving mechanisms for
delivering other forms of information.

The MIME RFC defines a system for processing multi-part, multimedia
messages on the internet. I would like to see these systems, along
with USENET news and internet mail, interoperate with MIME as the
substrate.

The clients for these systems go something like this:

    0 user invokes client (and chooses a starting point)
    1 client displays the page the user requested
    2 user reads page, chooses a reference to more info
    3 user informs client of choice
      (e.g. "show me item #1," or "search for googoo")
    4 go to step 1

These systems often consist of a hierarchy of menus with text files
at the leaf nodes. The system allows the user to interactively
navigate the menus and browse leaf nodes. But 1) the format of the
menus is particular to the system (USENET newsgroups/articles, unix
directories/files, WAIS source/database/document), and 2) once a user
is at a leaf node, the system can no longer interactively follow
references.

The novel aspect of hypertext is that the distinction between the
menu pages and the text pages disappears. In the world-wide-web, text
documents have machine-readable links inside them, and all menus are
represented as hypertext documents.

The WWW format works well, but it would benefit from use of MIME's
features. For a common hypertext document format, I propose we define
a subtype of the MIME multipart message: X-HYPERTEXT.

The first part of a multipart/X-HYPERTEXT message is the content of
the document, and the remaining parts are multimedia attachments and
links to other documents. The content part contains references (by
Content-ID) to the attachments and links. The client software allows
the user to interactively choose references to display/follow.

The remaining parts may be attached image/audio/video using MIME's
various types and transfer encodings (text attachments would work
too) or they may be references to information accessible elsewhere
using MIME's message/external-body type. The parameters to the
external-body content-type provide the same information as WWW's
Universal Document Identifier. (MIME only defines ANON-FTP, FTP,
TFTP, LOCAL-FILE and AFS. The remaining access-types (WAIS, gopher,
etc) would be experimental (X-WAIS, X-GOPHER) until standardized.)

The emerging standard for structured, platform-independent text is
SGML. The WWW project defines an SGML document type with traditional
elements (title, heading, paragraph, list) and new hypertext elements
(anchor). Soon it will have multimedia elements (image, audio).

The current design places external document references (to files, WWW
servers, WAIS documents, gophers, etc.) inside the SGML as
attributes. There are lexical incompatibilities, and the design is
under strain. I suggest that we implement references as SGML entities
that identify message/external-body parts by content-id.

Representing document content in SGML allows the same information to
be accessed using different user interface paradigms (e.g. dumb
terminals vs. curses style vs. X windows point-and-click).

Short of full SGML parsing, we could adopt the MIME text/richtext
format, with the addition of a <REF ID="xxx">...</REF> tag. In fact,
any representation that allows the user to interactively indicate one
of the attached body parts by content-id will do. For example, plain
text with one-line descriptions would do.
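
To make the proposed layout concrete, here is a minimal Python sketch
of how a multipart/X-HYPERTEXT message might be assembled with the
standard email package. It is only an illustration, not a defined
format: the helper names (external_ref, build_hypertext), the
Content-ID value, and the bracketed "[ref1]" convention in the body
are assumptions made for this example. The access parameters reuse
values from the MIME RFC's own external-body example.

```python
from email import message_from_string
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def external_ref(content_id, content_type, **access_params):
    """Build a message/external-body part pointing at a remote document.

    Hypothetical helper: keyword names use underscores, which add_header
    turns into hyphens (access_type becomes access-type).
    """
    ref = Message()
    ref.add_header("Content-Type", "message/external-body", **access_params)
    ref["Content-ID"] = content_id
    # The encapsulated header block describes the data that would be fetched.
    target = Message()
    target["Content-Type"] = content_type
    ref.attach(target)
    return ref


def build_hypertext(body_text, attachments):
    """First part is the document content; the rest are attachments/links."""
    doc = MIMEMultipart("X-HYPERTEXT")
    doc.attach(MIMEText(body_text, "plain"))
    for part in attachments:
        doc.attach(part)
    return doc


# Plain text with one-line descriptions; "[ref1]" is an assumed convention
# that a client would map to the part with Content-ID <ref1@example.invalid>.
BODY = (
    "Documents about MIME body formats:\n"
    "[ref1] The MIME RFC, PostScript version\n"
)

doc = build_hypertext(
    BODY,
    [
        external_ref(
            "<ref1@example.invalid>",
            "application/postscript",
            access_type="ANON-FTP",
            site="thumper.bellcore.com",
            directory="pub",
            name="BodyFormats.ps",
            mode="image",
        )
    ],
)

wire_text = doc.as_string()            # what would travel over the wire
parsed = message_from_string(wire_text)  # what a client would see
links = parsed.get_payload()[1:]       # everything after the content part
```

A client built along these lines would display the first part and,
when the user picks "[ref1]", look up the part with the matching
Content-ID and fetch it using its access-type parameters.
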
The Andrew ez data stream would also work, but only Andrew sites
could parse it. This brings up the issue of format negotiation.

No one format is optimal for all information. Clients are likely to
be able to process information in several formats, and servers are
likely to be able to provide different representations. The various
formats can be enclosed in a MIME multipart/alternative message. And
rather than including the data for all formats in the message, the
data could be in message/external-body parts. The client chooses the
type of data it likes and retrieves the corresponding external-body.
This (modified) example from the MIME RFC may help explain:

MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=42

--42
Content-Type: message/external-body;
    name="BodyFormats.ps";
    site="thumper.bellcore.com";
    access-type=ANON-FTP;
    directory="pub";
    mode="image"

Content-type: application/postscript

--42
Content-Type: message/external-body;
    name="/u/nsb/writing/rfcs/RFC-XXXX.ez";
    site="thumper.bellcore.com";
    access-type=AFS

Content-type: application/x-ez

--42
Content-Type: message/external-body;
    name="BodyFormats.txt";
    site="thumper.bellcore.com";
    access-type=ANON-FTP;
    directory="pub"

Content-type: text/plain

--42--

The client can choose between postscript, ez, and plain text, and
retrieve the corresponding message body.

The question then becomes: how do these systems interoperate? By
making information available as multipart/X-HYPERTEXT MIME messages.

The WWW client interfaces to the other systems by defining
"addressing schemes", implementing the various protocols, and
translating the data into HTML. Gopher has a similar typing scheme --
one character is reserved to indicate the access type and the data
type. WAIS clients have yet another method of resolving types, though
they only support one protocol. The NewsGrazer application has its
own encapsulation mechanism. This is becoming a mess.

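
Going back to the multipart/alternative example from the MIME RFC,
the client-side choice could be sketched roughly as follows in
Python, using the standard email parser. This is an assumption-laden
illustration rather than how any existing client works: the function
name choose_alternative and the idea of passing an ordered list of
preferred types are invented for the example, and no retrieval is
actually performed.

```python
from email import message_from_string

# The (modified) example from the MIME RFC, as quoted in this message.
EXAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=42

--42
Content-Type: message/external-body;
    name="BodyFormats.ps";
    site="thumper.bellcore.com";
    access-type=ANON-FTP;
    directory="pub";
    mode="image"

Content-type: application/postscript

--42
Content-Type: message/external-body;
    name="/u/nsb/writing/rfcs/RFC-XXXX.ez";
    site="thumper.bellcore.com";
    access-type=AFS

Content-type: application/x-ez

--42
Content-Type: message/external-body;
    name="BodyFormats.txt";
    site="thumper.bellcore.com";
    access-type=ANON-FTP;
    directory="pub"

Content-type: text/plain

--42--
"""


def choose_alternative(raw_message, preferred_types):
    """Pick the first preferred content type the server offers.

    Hypothetical helper: returns a dict describing where to fetch the
    chosen representation, or None if nothing acceptable is offered.
    """
    msg = message_from_string(raw_message)
    offers = {}
    for part in msg.get_payload():
        if part.get_content_type() != "message/external-body":
            continue
        # The encapsulated headers say what type the remote data has.
        inner = part.get_payload(0)
        offers[inner.get_content_type()] = {
            "content_type": inner.get_content_type(),
            "access_type": part.get_param("access-type"),
            "site": part.get_param("site"),
            "directory": part.get_param("directory"),
            "name": part.get_param("name"),
        }
    for wanted in preferred_types:
        if wanted in offers:
            return offers[wanted]
    return None


# A client that can render ez but would settle for plain text:
choice = choose_alternative(EXAMPLE, ["application/x-ez", "text/plain"])
```

The point of the sketch is that the client never downloads the
representations it cannot use; it reads only the small header blocks
and then fetches one body by whatever access-type that part names.
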
In the short term, global hypertext viewers will have to support the
access-type and content-type of each system with which they
interoperate (so we have X-WAIS, X-HTTP, X-GOPHER, X-NNTP, as well as
X-WAIS-SRC, X-HTML, X-GOPHER-1 thru X-GOPHER-9). Some of the access
types will become standard, and some will die out.

But all the data types should be encapsulated in MIME messages. Any
data that has machine-readable pointers to other data should be made
into a multipart/X-HYPERTEXT message. For example, a WAIS question
should have attachments for each of the result documents (the content
part can stay application/x-wais-question, or it could be converted
to a text type, or both), at least in the case where those documents
are available by some standard access method.

[I wrote a perl script that will change an HTML document into a MIME
message with attachments.]

Leaf documents, i.e. documents with no external links, can stay in
single part types. E.g., plain text files become MIME messages by
simply adding a blank line at the beginning (to separate the headers
(none) from the body).

Under this model, a mail message can point to a news article which
references a WAIS document which contains several drawings and
pointers to several more available by FTP, and a user could just
point-and-click between them.

The only need for protocols like gopher and HTTP is to encapsulate
data that's not already MIME compliant.

This is clearly a pipe dream, but it's the kind of thing we can work
towards today.

Dan