Re: The spec evolves...

Dan Connolly (connolly@pixel.convex.com)
Fri, 04 Dec 92 18:07:49 CST


Message-Id: <9212050007.AA23595@pixel.convex.com>
To: Edward Vielmetti <emv@msen.com>
Cc: www-talk@nxoc01.cern.ch
Subject: Re: The spec evolves... 
In-Reply-To: Your message of "Fri, 04 Dec 92 17:46:23 EST."
             <m0mxlnB-00009qC@garnet.msen.com> 
Date: Fri, 04 Dec 92 18:07:49 CST
From: Dan Connolly <connolly@pixel.convex.com>


>Is there an SGML reason (apart from a W3 reason) not to also recommend
>that we do a
>   <A HREF="ftp://wuarchive.wustl.edu:/graphics/gif/f/fishies"
>      CONTENTTYPE="image/gif"> 
>   This is a link to a picture of some fishies.</A>
>where the CONTENTTYPE matches the MIME/IANA registry of same?  This
>would be a simple enough way to stick in links to graphics.  

There's no SGML reason. The reason I didn't generalize to arbitrary
MIME entities is that the A tag has never had those semantics, and
it would be problematic to introduce them now.

Imagine what would happen if you fed that sample to the current linemode
browser: it would gladly ftp to wuarchive and barf gif data on
your screen.

This is not so much of a problem as long as the referent entity
is some subtype of text/* -- that's the reason for the two-level
hierarchy of MIME types in the first place.
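
A minimal sketch of that check in Python, assuming a client that knows
the content type before it renders anything. The function names are
hypothetical and not taken from any W3 code; the only rule borrowed from
the text above is that a text/* entity is safe to show on a terminal.

```python
def split_content_type(content_type):
    """Split a MIME Content-Type value into (type, subtype), lowercased.

    Parameters after ';' (charset and the like) are ignored.
    """
    essence = content_type.split(";", 1)[0].strip().lower()
    top, _, sub = essence.partition("/")
    return top, sub


def safe_for_text_display(content_type):
    """True if the top-level MIME type is 'text', whatever the subtype."""
    top, _ = split_content_type(content_type)
    return top == "text"
```

A linemode client could call safe_for_text_display("image/gif") before
following the link and decline to dump the data on the screen.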

I'm trying to keep up with all sorts of HTML ideas.  Some things can be
added to html.dtd without significant changes to W3 code (like adding a
BLOCKQUOTE tag for a new paragraph style). But for things that will
require changes to the architecture, I'm developing a separate DTD from
the descriptive html.dtd.

First, I'm suggesting a change in terminology. The representation
of a node, which used to be called a document, and is sometimes
now called a resource (e.g. Universal Resource Locator), should
be called an Entity. This coincides with the SGML and MIME
usage of the term for "a unit of retrieval."

Then the term "document" is not used for a unit of retrieval.
The WAIS protocol, for example, allows you to retrieve individual
"chunks" -- paragraphs, lines, etc. The term "entity" is well
suited to these chunks.

Instead, a "document" is a collection of entities that share
some context. This context is what the client uses to translate
relative URLs into absolute URLs. So the document that a node
belongs to consists of all the nodes you can reach from that node
by following only local links (i.e. a maximally-connected subgraph
of the web).
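
To make the definition concrete, here is a rough Python sketch. It
assumes that a "local" link is one written as a relative URL (no scheme
and no host), which is my reading of the paragraph above rather than a
settled rule. The links table and the example.org addresses are
hypothetical.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit


def is_local(href):
    """Assumption: a link is local when it is a relative URL."""
    parts = urlsplit(href)
    return not parts.scheme and not parts.netloc


def document_of(start, links):
    """Return the set of nodes reachable from start via local links only.

    links maps an absolute node URL to the list of hrefs found in it.
    Relative hrefs are resolved against the node that contains them.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for href in links.get(node, []):
            if not is_local(href):
                continue  # a link out to some other work
            # Resolve against the containing node and drop any #fragment.
            target = urldefrag(urljoin(node, href)).url
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


links = {
    "http://example.org/doc/intro.html": [
        "ch1.html",
        "ftp://wuarchive.wustl.edu:/graphics/gif/f/fishies",
    ],
    "http://example.org/doc/ch1.html": ["intro.html", "ch2.html#top"],
}
corpus = document_of("http://example.org/doc/intro.html", links)
```

Here corpus holds intro.html, ch1.html and ch2.html; the ftp link is
treated as a reference to another work and is not followed.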

This allows the author to distinguish links among the nodes of
the corpus s/he's writing from links out to other works.

From my new DTD...

<!-- I think the A tag is overloaded. I'd like to deprecate
     it in favor of the XREF and SEE elements.
  --> 

<!ELEMENT XREF - - (#PCDATA)
 -- This element is for links within an HTML document. (a document
    is a collection of entities, or a web of nodes that share context).
 -->

<!ATTLIST XREF
        CONTEXT CDATA #IMPLIED -- defaults to the entity containing the XREF --
        -- SGML purists would make this attribute an ENTITY reference,
           and put the URL in the SYSTEM identifier in the prologue.
           For expediency, we put the URL right in the attribute.
        --
        ORIGIN CDATA #IMPLIED
        -- another URL, used as an identifier, rather than a locator.
           A la the WAIS original-server,database,local-id triple.
        --
        REF IDREF #REQUIRED  -- ID of referent element --
        >
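
As a sketch of how a client might resolve an XREF, assuming the
attributes arrive as a Python dict and that CONTEXT may be a relative
URL (the DTD above does not say either way). The function name and the
example.org addresses are hypothetical.

```python
from urllib.parse import urljoin


def resolve_xref(attrs, containing_entity_url):
    """Return (entity URL, element ID) for an XREF.

    CONTEXT defaults to the entity containing the XREF; REF is required,
    so a missing REF raises KeyError.
    """
    context = attrs.get("CONTEXT", containing_entity_url)
    # Assumption: a relative CONTEXT is resolved against the containing entity.
    return urljoin(containing_entity_url, context), attrs["REF"]
```

For example, resolve_xref({"REF": "intro"}, url) points back into the
same entity, while adding CONTEXT="ch2.html" points at an element in a
sibling entity of the same document.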

<!ELEMENT SEE - - (#PCDATA)
 -- This element is for links from an HTML document to any entity
    in the global web. The address and content-type of the entity
    are sufficient to resolve the reference.

    The other attributes could be specified in the text of the
    SEE content, but by making them attributes, the client software
    can process them, for example, to display a table of references
    sorted by date.
 -->
<!ATTLIST SEE
        ADDRESS CDATA #REQUIRED -- URL of referent entity --
        CONTENT-TYPE CDATA #REQUIRED -- MIME Content-Type for the entity --
        TARGET CDATA #IMPLIED
        -- This is the analogue of the #anchor mechanism.
           If the referent entity is an SGML entity, this could be an ID,
           though it won't be validated.
           However, if the referent is a text file, this could be a line number
           to scroll to.
           The meaning depends on the content-type.
        --
        ORIGIN CDATA #IMPLIED
        -- another URL, used as an identifier, rather than a locator.
           A la the WAIS original-server,database,local-id triple.
        --
        FROM CDATA #IMPLIED -- email address or name of author/provider --
        DATE NUMBER #IMPLIED -- in ISO format: YYYYMMDDHHMMSSZ --
        BYTES NUMBER #IMPLIED -- useful in many cases --
        MD5 CDATA #IMPLIED -- data signature --
        >
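
Here is a small Python sketch of the kind of client-side processing
these attributes are meant to allow: sorting a table of references by
DATE, and checking a retrieved entity against BYTES and MD5. It assumes
each SEE element has been read into a dict keyed by attribute name, that
DATE carries the trailing Z shown in the comment, and that MD5 is written
as a hex digest. The DTD does not specify the MD5 encoding, so that part
is a guess, and the function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def parse_see_date(value):
    """Parse a DATE attribute of the form YYYYMMDDHHMMSSZ as UTC."""
    parsed = datetime.strptime(value, "%Y%m%d%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def sorted_by_date(refs):
    """Oldest first; references without a DATE go last, in original order."""
    dated = [r for r in refs if "DATE" in r]
    undated = [r for r in refs if "DATE" not in r]
    return sorted(dated, key=lambda r: parse_see_date(r["DATE"])) + undated


def matches_signature(data, see):
    """Check retrieved bytes against BYTES and MD5 when they are present."""
    if "BYTES" in see and int(see["BYTES"]) != len(data):
        return False
    # Assumption: MD5 is a hex digest of the entity body.
    if "MD5" in see and hashlib.md5(data).hexdigest() != see["MD5"].lower():
        return False
    return True
```

Since BYTES and MD5 are both #IMPLIED, a reference that carries neither
passes the check trivially; a stricter client might want to treat that
case differently.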

Comments are solicited...


Dan