- From: W. Eliot Kimber <eliot@isogen.com>
- Date: Thu, 02 Jan 1997 09:49:21 -0900
- To: w3c-sgml-wg@www10.w3.org
- Message-Id: <3.0.32.19970102094913.00b1829c@uu10.psi.com>
All,

Attached to this note is my initial draft of a "house of sticks" (as opposed to straw man) XML linking proposal. It is drafted using the DTD used for the original XML spec. It's not complete enough for publication (and I never intended that it would just be picked up and used), but I tried to make it as formal and complete as I could in the time I had. I think it meets the requirements for which a consensus seems to have emerged from the discussion so far. I do expect that this proposal is larger than the minimum limits suggested by Tim: I would expect it to be trimmed down as necessary. I have not addressed issues specifying hyperlink behavior or presentation style.

The basic points of the proposal are:

1. Three linking forms (or "templates", if you prefer): hlink, for general independent or "anchored" hyperlinks with arbitrary link end roles; alink, "anchored link", for links that are always one of their own link ends (and are thus always "anchored" in the sense that we have defined anchor here), which provides default link end roles; and llink, "list link", which simply links to a list of objects.

2. Four addressing forms: name-space loc, treeloc, dataloc, and queryloc. URL is required, TEI extended pointers recommended (but not required). HyTime "refloc" facility included so that location address mechanisms can be associated with any attributes that are semantically references (this lets you explicitly associate query notations with referential attributes in a general way). Explicit refloc to URLs is not required for any XML-defined referential attribute that is declared with a value prescription of "CDATA" (in other words, an XML processor assumes XML-defined referential attributes are URLs unless declared to be IDREF(S), ENTITY(IES), or refloced to something else). I have probably provided more addressing functions than absolutely necessary--I expect them to be trimmed down.

Cheers, E.
<!DOCTYPE Body SYSTEM 'spec.dtd' [ <!ENTITY lt CDATA "<" > <!ENTITY gt CDATA ">" > ]> <body> <div1 id="archs" type="section"><head>Architectural Definition</head> <p> An <term>architecture</term> is a general set of rules to which an application must conform. For XML, architectures are used to define "meta document types" to which individual documents conform. These meta document types define the XML-specific rules and semantics for things like hyperlinks without completely constraining how specific documents are constructed. In particular, individual documents can use whatever element types and content models they want as long as they meet the minimal requirements of the XML architecture. </p> <p> Architectures are formally defined using the same element and attribute declaration syntax used for document types. These declarations form the meta-document type for the architecture. An element type declared in a meta-DTD is called an <term>architectural form</term>, meaning that it defines the architectural form of an element type, from which element types in individual documents are derived. (You can think of the term "form" in the sense of creating things with molds, where the form is a template or pattern from which new objects are made.) </p> <p> Documents are said to be <term>derived</term> from architectures. The derivation of a document from a particular architecture is indicated by the use of a special processing instruction, <code><?ArcBase ?></code>, which lists the architectures from which a document is derived, e.g.: <code> <!DOCTYPE MyDoc > <?ArcBase XML-Link ?> </code> </p> <p> You can think of the element and attribute declarations in the meta-DTD as templates for the elements in your own document. You can create new element types from meta element types by literally copying the declarations from the meta-DTD and then modifying them as needed to reflect your own documents. 
</p> <div2 type="section"><head>Deriving Element Types from Architectural Types</head> <p> When you derive element types from architectural types, you can make any of the following changes: <list type="bullets"> <item><p>Change the element type name to be different from the architectural type. </p></item> <item><p>Declare multiple element types derived from the same architectural type. </p></item> <item><p>Make content models that are more restrictive than the architectural content model. </p></item> <item><p>Omit elements derived from architectural types that are optional. </p></item> <item><p>Omit any attributes for which there is a default value in the architectural DTD. This means you need only include those attributes that are either required or have constant values in the meta-DTD. </p></item> <item><p>Change the default values for any attributes. </p></item> <item><p>Restrict the keywords allowed for keyword attributes. </p></item> <item><p>Add additional attributes not defined in the meta-DTD. </p></item> </list> </p> <p>You may not do any of the following: <list type="bullets"> <item><p>Omit required attributes or attributes that have constant values in the meta-DTD. </p></item> <item><p>Omit required elements. </p></item> <item><p>Define content models that are less restrictive than the meta content model if doing so would allow instances that do not conform to the architectural restrictions. <!-- NOTE: this restriction is added to avoid having to talk about the distinction between validation of instances and validation of DTDs. Strictly speaking, you can define any content model you want - it's up to authors to avoid breaking the architectural rules. --> </p></item> </list> </p> </div2> </div1> <div1 type="section" id="linking-and-addressing"><head>XML Linking and Addressing</head> <p> <term>Linking</term> is the process of connecting two or more objects in order to relate them. <term>Addressing</term> is the process of specifying the locations of objects. 
Linking uses addressing, because you must address objects in order to link them together. However, the relationships represented by links are insensitive to the methods by which the objects linked are addressed. Thus, linking and addressing, while related, are separate tasks. </p> <p> Keeping linking and addressing distinct is important for at least two reasons: <list type="bullets"> <item><p>It allows the addressing method to be changed without affecting the link. </p></item> <item><p>It allows addressing to be used in situations that are not considered linking. </p></item> </list> </p> <div2 type="section" id="linking"><head>XML Linking</head> <p> While all possible relationships that might exist among objects can be considered "links," XML limits the term "link" to those relationships that are not inherent in the structure of SGML markup (hierarchy and property-of relationships). In addition, XML assumes that most hyperlinks are created to enable navigation of information at reader preference, rather than defining a fixed presentation order of the information (although it is possible to define static presentations of XML documents that depend, in part, on hyperlinks, e.g. "transclusion" relationships or presentation styles). </p> <p> Links connect <term>link end</term>s. For example, in the typical HTML "A" link, the A element is one end of the link and the page or named A element addressed by the HREF attribute is the other link end. Link ends may, potentially, consist of multiple objects. For example, a link that relates a glossary entry to all of the mentions of the term defined would be a two-end link where one end is the glossary entry and the other end is a multi-object link end consisting of all the mentions. </p> <p> XML reserves the term <term>anchor</term> for elements or other objects that are primarily intended or otherwise enabled for addressing as link ends, usually by putting a unique identifier on the anchor element. 
Anchors may be elements that have no purpose other than to serve as an anchor or may be elements with other purposes that also serve as anchors (such as a paragraph that also has an ID). </p> <p> Each end of a link has an associated "link end role", which describes the role the link end plays in the relationship represented by the hyperlink, much as an element type describes the role the element plays in the document type. Link end roles may be more or less general depending on the generality of the link type. Link end role names must be unique within a given link type. </p> <p> Movement from one end of a link to another end of the same link is called <term>traversal</term>. Link ends accessed by traversal are said to be <term>traversed to</term>. Traversal is typically enabled by making a link end selectable (say by clicking). However, traversal need not be interactive but could be done automatically, for example to provide a "transclusion" behavior for a particular link type. If initial traversal is allowed from a particular link end it is called an <term>initiation</term> link end, meaning that traversal may be initiated from it. For example, for the HTML A link type, the A element is an initiation link end, but the other link end is not. Whether or not an end of a link is an initiation link end may be defined either through the link's built-in <term>traversal rules</term> or by a processing application or style sheet. </p> <p> In XML, hyperlinks are first-class objects represented by XML elements. The element type of the linking element is the "link type" of the link. Link types may be more or less general as needed. </p> <p> A link may use itself as one of its own link ends. The HTML A element is an example of such a link, where the link is between the A element itself and whatever the HREF attribute addresses. 
Alternatively, a link need not name itself as a link end, in which case the relationship is among the link ends addressed, but <emph>not</emph> between the link element and its link ends. In other words, in the first case, the link element itself is an initiation link end but in the second case the link is not a link end and therefore would not normally be selectable for traversal. </p> <p> Links that are not one of their own link ends are <term>independent links</term>, meaning that they are independent of any of their link ends. Links that are one of their own link ends are <term>anchored links</term>, meaning that the link itself is anchored in the document in which it occurs (specifically, the link acts as an anchor in the sense of an element intended to be used as a link end). </p> <p> Note that for an independent link, the link element, as a normal XML element, would still be potentially accessible as part of the document in which it occurs, irrespective of any links it defines. In addition, XML processors are free to provide whatever services for examining independent links they want, including allowing movement from a link to its link ends (although such movement would not be considered link traversal in the strict sense). </p> <p> Not all links need to represent a formal semantic relationship; sometimes it is enough to simply connect things together in a list to enable traversal among the members of the list. For this purpose, XML defines a special form of anchored link, the <term>list link</term>. A list link simply relates the link element (the "list") to one or more objects to be listed together (the "members"). Traversal among the members is enabled by specifying the appropriate <term>list traversal rules</term>. The members of the list are treated as a linked list of objects. The order of the members of the list is determined by the order in which they are addressed. 
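As a rough illustration only, a list link instance might look like the following sketch. It assumes the document declares an element type named <code>llink</code> derived from the llink form defined later in this proposal, with that form's <code>members</code> attribute; the IDs and the content are hypothetical. <code> <![CDATA[
<!-- Hypothetical instance: the IDs intro, setup, and usage are assumed
     to identify three elements elsewhere in the same document -->
<llink members="intro setup usage">Guided tour</llink>
]]> </code> Under that assumption the members would be ordered intro, setup, usage, because that is the order in which they are addressed. 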
</p> <p> Traversal is either between the list of all the members and any member or from one member to either of its adjacent members. The list traversal rules may allow "wrapping" from one end of the list to the other. List traversal can be disallowed entirely or be forward, backward, or bi-directional. When list traversal is disallowed, traversal is always from the aggregate to a member or from a member to the list (in other words, when selecting a member as an initiation link end, you are always presented with the list of the other members). </p> <p> An end of any link, not just list links, may be a list, in which case list traversal rules can be defined as for list links. </p> <div3 type="section" id="hlink"><head>Hyperlink (hlink) Element Form</head> <p> The hlink element form is the most general form of hyperlink and is the model from which anchored links and list links are derived. The attributes of the hlink form are: <list type="gloss"> <label>-xml-roles</label> <item><p> Names the link end roles for the link type. Each link end role name must be unique within the link type. Each link end role name can be followed by one of the keywords <code>#SELF</code> or <code>#LIST</code>. The keyword <code>#SELF</code> indicates that the link end role is satisfied by the link element itself. The keyword <code>#LIST</code> indicates that the link end may be a list of objects. </p> <p> The -xml-roles attribute is required and must be the same for all instances of a link type. Each link end role name is used as the name of an attribute that will specify the address of the members of the link end. </p> </item> <label>-xml-init</label> <item> <p>Specifies which link end or link ends are traversal initiation link ends. One of the keywords <code>FIRST</code>, <code>LAST</code>, or <code>ALL</code>. The default is <code>ALL</code>. </p> <p>The -xml-init attribute is optional. 
</p> </item> <label>listtrav</label> <item> <p> For link end roles that allow lists, specifies the list traversal rules. The attribute value is either one keyword, which applies to all #LIST link ends, or one for each #LIST link end, in the order specified on the -xml-roles attribute. </p> <p>The list traversal keywords are: <list type="gloss"> <label>N</label> <item><p>No list traversal. Movement is always between the list of members and a single member.</p> </item> <label>L</label> <item><p>Movement is from a member to the preceding member in the list.</p> </item> <label>R</label> <item><p>Movement is from a member to the next member in the list.</p> </item> <label>A</label> <item><p>Movement is from a member to either adjacent member in the list.</p> </item> <label>LW</label> <item><p>Movement is from a member to the preceding member in the list, wrapping from first to last at the beginning of the list.</p> </item> <label>RW</label> <item><p>Movement is from a member to the following member in the list, wrapping from last to first at the end of the list.</p> </item> <label>AW</label> <item><p>Movement is from a member to either adjacent member in the list, wrapping from first to last or last to first.</p> </item> </list> </p> </item> </list> </p> <p> For each link end role named in the -xml-roles attribute, there must be an attribute declared with the same name (or a name mapped to a link end role name using the general -xml-names renaming attribute). The attributes so declared may be declared as IDREF, IDREFS, ENTITY, ENTITIES, or as CDATA attributes whose value must be a URL. Other addressing methods can also be used by using a refloc attribute to associate an attribute with the addressing method (see <ref target="refloc">Refloc Attribute Form</ref>). 
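To make the role-to-attribute rule concrete, here is a hedged sketch of a glossary link type like the one described earlier, with one single-object link end and one list link end. The element type name <code>glosslink</code>, the role names, and the IDs are all hypothetical, and the sketch does not show how the element type is tied to the hlink form, since this draft does not spell that out. <code> <![CDATA[
<!-- Hypothetical element type derived from hlink; all names are illustrative -->
<!element glosslink - O (#PCDATA) >
<!attlist glosslink
    -xml-roles CDATA  #FIXED "entry mentions #LIST"
    entry      IDREF  #REQUIRED  -- addresses the glossary entry --
    mentions   IDREFS #REQUIRED  -- addresses the mentions of the term --
    -xml-init  (first|last|all) all
    listtrav   NAMES  "N"
>
<!-- Instance: each role name is used as an attribute name -->
<glosslink entry="gl-anchor" mentions="m1 m2 m3">
]]> </code> With <code>listtrav</code> set to "N", movement would be between the list of mentions and a single mention, not from one mention to the next. 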
</p> <p>The XML meta-DTD declaration for the hlink form is: <code> <![CDATA[ <!element hlink - O (%ArcCFC;)* -- Semantic of link content defined by application -- > <!attlist hlink -xml-roles CDATA #REQUIRED -- constant -- -xml-init (first|last|all) all listtrav NAMES #IMPLIED -- attributes for link end role names go here -- > ]]> </code> </p> </div3> <div3 type="section" id="alink"><head>Anchored link (alink) Element Form</head> <p> The alink (anchored link) element form is a special case of hyperlink where the link element is always a self link end and the link end roles are defaulted by XML. It is useful for representing cross-reference and simple link elements in typical document types. </p> <p> The alink form has the same attributes as hlink except that the -xml-roles attribute has the architectural default value "refmark #SELF refsub #LIST", where "refmark" is the "reference mark" (i.e., the cross-reference element) and "refsub" is the "reference subject", the thing or things being referred to. </p> <p> The default -xml-init value for alink is "FIRST". </p> <p> The default listtrav value for alink is "AW", adjacent with wrap. </p> <p>The architectural meta-declaration for alink-form elements is: <code> <![CDATA[ <!element alink - O (%ArcCFC;)* -- Semantic of link content defined by application -- > <!attlist alink -xml-roles CDATA "refmark #SELF refsub #LIST" refsub IDREF #REQUIRED -- May also be declared as IDREFS, ENTITY, ENTITIES, or CDATA. If CDATA, value must be either a URL or a value conforming to the addressing method specified with the common refloc attribute. -- -xml-init (first|last|all) first listtrav NAMES "AW" > ]]> </code> </p> </div3> <div3 type="section" id="llink"><head>List Link (llink) Element Form</head> <p> The list link element form represents simple aggregation of objects purely for the purpose of enabling list traversal among the members of the list. 
It can be combined with other link elements to form more complex selection or navigation structures. </p> <p> The list link form has the same attributes as hlink except that the -xml-roles attribute cannot be specified and is taken to have the fixed value "list #SELF members #LIST", where "list" is the link end representing the entire list of members of the members link end. </p> <p> The default -xml-init value for llink is "FIRST". If the value is set to "LAST" then the list itself cannot be selected for traversal (but may be traversed to via list traversal). </p> <p> The default listtrav value for llink is "AW", adjacent with wrap. </p> <p>The architectural meta-declaration for llink is: <code> <![CDATA[ <!element llink - O (%ArcCFC;)* -- Semantic of link content defined by application -- > <!attlist llink members IDREFS #REQUIRED -- May also be declared as IDREF, ENTITY, ENTITIES, or CDATA. If CDATA, value must be either a URL or a value conforming to the addressing method specified with the common refloc attribute. -- -xml-init (first|last|all) first listtrav NAMES "AW" > ]]> </code> </p> </div3> </div2> <div2 type="section" id="addressing"><head>Addressing</head> <p> Addressing is the process of specifying the location in some address space of an object or objects. Objects in XML may be addressed by name (element ID or entity name), by position in a tree or list (node addressing), or by a query of their defined properties (query addressing). Objects may be addressed directly (say by normal ID reference), or indirectly. Indirect addresses make addresses more flexible and easier to maintain. </p> <div3 type="section" id="loc-addrs"><head>Location Address Element Forms</head> <p> Any of the addressing methods defined by XML may be used as direct addresses or as indirect addresses. 
The use of addressing methods other than ID references or URLs requires the use of the refloc attribute to associate a specific addressing method with an attribute (see <ref target="refloc">Refloc Attribute Form</ref>). </p> <note> The indirect addressing elements are "resources" in the sense that they only have meaning when used by reference; they have no meaningful existence in isolation. In particular, addressing elements are <emph>not</emph> hyperlinks and are never used for traversal or navigation. </note> <note> Each of the indirect addressing forms has an equivalent SDQL specification, meaning that anything that can be done with indirect addressing elements can be done with an equivalent SDQL query. </note> <p> Every location address element has a <term>location source</term>, which is the object or objects against which the address is applied. In most cases the location source can be omitted and is implied to be the document element of the document that contains the location address element. However, the implied location source can also be specified as being the non-location address element that refers to the location address element (the <term>referrer</term>). </p> <p> The location source is specified using the <code>location-source</code> attribute. The location-source attribute addresses the object or objects that are the location source. If the location-source attribute is omitted, the location source is implied according to the rule indicated by the <code>implied-location-source</code> attribute. </p> <p> The implied-location-source attribute takes one of the values <code>REFERRER</code> or <code>DOCELEM</code>. The value REFERRER indicates that the implied location source is the referrer element. The value DOCELEM indicates that the location source is the <term>document element</term>, the document's root element. </p> <p> The default value for implied-location-source is "DOCELEM". 
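As a hedged sketch of the two cases, using the attribute names given in the prose above and the treeloc form defined under Tree Addressing later in this proposal (the IDs are hypothetical): <code> <![CDATA[
<!-- Explicit location source: "steps" is assumed to be the ID of an
     element elsewhere in the document. "1 2" would select the second
     child of that element. -->
<treeloc id="second-step" location-source="steps">1 2</treeloc>

<!-- No explicit location source: the element that refers to this
     treeloc is the implied location source. "1 1" would select the
     first child of the referring element. -->
<treeloc id="first-child" implied-location-source="referrer">1 1</treeloc>
]]> </code> 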
</p> <p> The location source for a location address element may be another location address. The first location address thus selects objects from the list of objects addressed by the second location address. Using one location address as the location source for another location address creates a <term>location ladder</term>. </p> <p>The meta-declaration for location source attributes is: <code> <![CDATA[ <!attlist (nmsploc|treeloc|dataloc|queryloc) location-source IDREF #IMPLIED implied-location-source (referrer|docelem) docelem > ]]> </code> </p> </div3> <div3 type="section" id="name-addressing"><head>Name Addressing</head> <p> Objects can be addressed by name directly or indirectly. Direct name addressing is done with attributes declared as IDREF, IDREFS, ENTITY, or ENTITIES. </p> <p> Objects may be addressed indirectly by name using the name-space location address element form (nmsploc). The content of the nmsploc element is a blank-delimited list of zero or more names (names that contain blanks must be enclosed in LIT or LITA delimiters). </p> <p> The name space to which the names apply is indicated by the <code>namespace</code> attribute, one of the keywords <code>ELEMENTS</code> or <code>ENTITIES</code>. The default name space is ELEMENTS. When the name space is "elements", the names are interpreted as element unique identifiers in the document that is the location source for the name-space location (by default, the document it occurs in). When the name space is "entities", the names are interpreted as the names of data entities declared in the document that is the location source for the name-space location. </p> <p> If a location source is specified for nmsploc, it must be either a document element or an XML document entity. 
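For illustration, indirect name addressing might look like the following sketch. The IDs and entity names are hypothetical, and the last line assumes an element type named <code>alink</code> derived from the alink form, whose <code>refsub</code> attribute addresses the nmsploc by its ID. <code> <![CDATA[
<!-- Default name space (elements): the names are element IDs -->
<nmsploc id="key-terms">term-anchor term-link term-traversal</nmsploc>

<!-- Entity name space: the names are declared data entities -->
<nmsploc id="figures" namespace="entities">fig1 fig2</nmsploc>

<!-- A reference that reaches the three elements through the nmsploc -->
<alink refsub="key-terms">see the key terms</alink>
]]> </code> 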
</p> <p>The meta-declaration for the nmsploc element form is: <code> <![CDATA[ <!element nmsploc - O (#PCDATA) -- Content is a blank-delimited list of names or literals -- > <!attlist nmsploc namespace (elements|entities) elements -- location source attributes -- > ]]> </code> </p> </div3> <div3 type="section" id="query-addressing"><head>Query Addressing</head> <p> Objects can be addressed by queries. Any query notation may be used with query location addresses. However, XML requires support for URLs and recommends support for TEI extended pointers. Reference attributes that use URLs are declared as CDATA attributes and do not need to be indicated as referential attributes using the common refloc attribute (in other words, XML processors assume that any referential attribute that is not an ID or entity reference and that is not otherwise mapped with refloc is a URL). </p> <p> Objects may be addressed indirectly by query using the query location address element form (queryloc). The content of the queryloc element is a query conforming to the query notation specified by the notation attribute. The query notation may or may not allow subelements depending on the definition of the notation. In any case, the entire content of the queryloc element is provided to the query notation processor for interpretation following parsing of the document (in other words, the content of the queryloc must be valid XML regardless of any additional semantics or constraints the notation specification may impose on the data). Query specifications that cannot be parsed as valid XML may be contained in CDATA marked sections. </p> <p>A query location address must indicate the notation to which the query conforms using the <code>notation</code> attribute. XML pre-defines the notation "URL", representing HTTP URL addresses. Other values must be notations declared in the document. 
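A minimal sketch of a URL query location address follows; the IDs and the URL are hypothetical, and the second element relies on "URL" being the default notation value given in the meta-declaration for queryloc. <code> <![CDATA[
<!-- Notation stated explicitly; the URL is a made-up example -->
<queryloc id="intro-sec" notation="URL">http://www.example.org/spec.xml#intro</queryloc>

<!-- Notation omitted: assumed to default to "URL" -->
<queryloc id="home-page">http://www.example.org/</queryloc>
]]> </code> 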
</p> <note> When the TEI notation (or its equivalent) is supported, query location addresses may be used in place of combinations of other location address forms. </note> <p>The meta-declaration for the queryloc element form is: <code> <![CDATA[ <!element queryloc - O (#PCDATA) -- Content is a query conforming to the governing notation -- > <!attlist queryloc notation NAME "URL" -- Must be declared as a NOTATION attribute if a notation other than URL is used. -- -- location source attributes -- > ]]> </code> </p> </div3> <div3 type="section" id="tree-addressing"><head>Tree Addressing</head> <p> Objects can be addressed by position in a tree. Objects may be addressed directly by tree position by using the common refloc attribute to map the referential attribute to the addressing method "treeloc". </p> <p> Objects may be addressed indirectly by tree position using the tree location address element form (treeloc). The content of the treeloc element is a list of integers, one for each level of the tree, starting with the tree root (the location source of the tree location address), and specifying the child numbers of the ancestor of the object addressed. For example, the third child of the fourth child of the root of the tree would be addressed as "1 4 3". The child number of the tree root is normally "1" (but may be other than one if the location source for the treeloc is a list of nodes). </p> <p> The objects addressable by a tree location address are determined by the tree rooted at the location source. When the location source is an XML element, the objects addressed are either elements or <term>pseudo-elements</term>. Pseudo-elements are contiguous sequences of character data in the content of elements, spanning from a tag close to a tag open (start tag close to end tag open, start tag close to start tag open, end tag close to start tag open, or end tag close to end tag open). 
The children of pseudo-elements are the individual characters making up the pseudo-elements. </p> <p> When the location source of a tree location consists of multiple objects, the tree location is applied to each object in turn. The result is a list of nodes, one from each tree. </p> <p>The meta-declaration for the treeloc element form is: <code> <![CDATA[ <!element treeloc - O (#PCDATA) -- Content is the tree position of the node addressed -- > <!attlist treeloc -- location source attributes -- > ]]> </code> </p> </div3> <div3 type="section" id="data-addressing"><head>Data Addressing</head> <p> The character data content of elements is normally only addressable as complete pseudo-elements or as individual characters. However, it is often useful to address character data as a list of tokens, derived through some tokenization process. For example, the content of an element might be tokenized into blank-delimited words, a list of numbers, or some other token type. Addressing data in this way requires first tokenizing it and then addressing the resulting list of tokens. The data location address (dataloc) element form combines both steps into a single element. </p> <p> The data to be addressed is the location source of the dataloc element. The tokenization process is indicated by the value of the <code>filter</code> attribute. XML defines the following filter types: <list type="gloss"> <label>STR</label> <item> <p>Unnormalized string. The "tokens" are the individual data characters of the location source. </p> </item> <label>NORM</label> <item> <p>Normalized text. Tokens are contiguous strings separated by white space. NORM is the default filter. </p> </item> <label>WORD</label> <item> <p>Tokens that consist only of name characters. </p> </item> <label>NAME</label> <item> <p>Tokens that are SGML names (start with a name start character and consist only of name characters). </p> </item> <label>SINT</label> <item> <p>Signed integer. 
</p> </item> <label>DATE</label> <item> <p>UTC date. </p> </item> <label>TIME</label> <item> <p>UTC time. </p> </item> <label>UTC</label> <item> <p>UTC date and time pairs. </p> </item> <label>LINE</label> <item> <p>Each token is a "line" of data as determined by the record end or line break rules for the location source. </p> </item> </list> </p> <p>The meta-declaration for the dataloc element form is: <code> <![CDATA[ <!element dataloc - O (#PCDATA) -- Content is the position of the token or tokens addressed -- > <!attlist dataloc filter (str|norm|word|name|sint|date|time|utc|line) norm -- location source attributes -- > ]]> </code> </p> </div3> <div3 type="section" id="refloc"><head>Reference Location Attribute Form</head> <p> For referential attributes that are not ID references, entity references, or URLs, an XML processor must be told what form of address is being used for a particular attribute. This is done with the common reference location address (refloc) attribute form. The refloc attribute may be used for any element type that has attributes that are, semantically, referential. </p> <p> The value of the refloc attribute is a list of name/value pairs where the name is the name of an attribute declared for the element type and the value is one of the location type keywords defined below. If the location type is a query, the value is followed by the name of the notation to which the query conforms. </p> <p>The location type keywords are: <list type="gloss"> <label>IDLOC</label> <item> <p>The value of the attribute is one or more ID references, resulting in a single list of elements. Equivalent to referring to a nmsploc with a namespace value of "elements" where the nmsploc contains the IDs in the attribute value. </p> </item> <label>ENTLOC</label> <item> <p>The value of the attribute is one or more entity names, resulting in a single list of entities. 
Equivalent to referring to a nmsploc with a namespace value of "entities" where the nmsploc contains the entity names in the attribute value. </p> </item> <label>TREELOC</label> <item> <p>The value of the attribute is a tree location specification. Equivalent to referring to a treeloc whose implied location source is "referrer". </p> </item> <label>QUERYLOC notation-name</label> <item> <p>The value of the attribute is a query specification. The keyword QUERYLOC must be followed by either the name "URL" or the name of a query notation declared in the document. Equivalent to referring to a queryloc. </p> </item> </list> </p> <note> Using refloc with "QUERYLOC URL" is necessary only when the attribute named is not already defined as being referential by the XML-defined semantics of an XML-defined element form. </note> </div3> </div2> </div1> </body>
-- W. Eliot Kimber (eliot@isogen.com) Senior SGML Consulting Engineer, Highland Consulting 2200 North Lamar Street, Suite 230, Dallas, Texas 75202 +1-214-953-0004 +1-214-953-3152 fax http://www.isogen.com (work) http://www.drmacro.com (home) "Rats in the morning, rats in the afternoon...if they don't go away, I'll be re-educated soon..." --Austin Lounge Lizards, "1984 Blues"
Received on Thursday, 2 January 1997 11:51:18 UTC