- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Fri, 24 Jun 2005 17:28:20 +0100
- To: www-tag <www-tag@w3.org>
I've written up some preliminary thoughts about this, in rather more detail, but still very much a work-in-progress, something I would have blogged except I don't have a blog. Please bear this in mind when responding -- there's very little here, particularly in the more speculative sections towards the end, which I'm firmly convinced of. So feedback is very much in order.

ht

<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta HTTP-EQUIV="Content-type" CONTENT="text/html; charset=UTF-8"/><title>Names, Namespaces and Languages</title><style type="text/css"> PRE.code {font-family: monospace} PRE {MARGIN-LEFT: 0em} OL OL {list-style-type: lower-alpha} </style></head><body STYLE="font-family: times">
<div xmlns="" style="text-align: center"> <h1>Names, Namespaces and Languages</h1> <div>Henry S. Thompson</div> <div>24 June 2005</div> </div>
<h2 xmlns="">1.&#160; <a name="intro">Introduction</a></h2>
<p xmlns="">This is very much a work-in-progress, something I would have blogged except I don't have a blog. Please bear this in mind when responding -- there's very little here, particularly in the more speculative sections towards the end, which I'm firmly convinced of.
So feedback is very much in order.</p>
<h2 xmlns="">2.&#160; <a name="background">Background</a></h2>
<p xmlns="">TAG issues <a href="http://www.w3.org/2001/tag/issues.html?type=1#namespaceDocument-8">namespaceDocument-8</a> and <a href="http://www.w3.org/2001/tag/issues.html?type=1#abstractComponentRefs-37">abstractComponentRefs-37</a> were the topic of <a href="http://www.w3.org/2001/tag/2005/06/14-16-minutes.html">extended discussion</a> at the last TAG f2f. There is considerable overlap between these two issues, and both are related to <a href="http://lists.w3.org/Archives/Public/www-xml-schema-comments/2005JanMar/0080.html">Dan Connolly's comment</a> on the recently published Last Call Working Draft of <a href="http://www.w3.org/TR/2005/WD-xmlschema-ref-20050329/">XML Schema: Component Designators</a>. Although a number of prior misunderstandings were identified and overcome in the discussion, more work is needed to make explicit our background assumptions about what problems we're trying to solve and what the space of possible solutions is. This note is an attempt to begin that work.</p>
<h2 xmlns="">3.&#160; <a name="namespaces">XML Namespaces: An evolving understanding</a></h2>
<p xmlns="">The <a href="http://lists.w3.org/Archives/Public/www-tag/2005Feb/0017.html">recent discussion</a> about whether the <a href="http://www.w3.org/TR/2005/CR-xml-id-20050208/">xml:id</a> spec. 'changes' the XML namespace by 'adding' a new name to it helped clarify that the minimalist reading of the <a href="http://www.w3.org/TR/xml-names11/">XML Namespaces</a> REC has achieved dominance in the intellectual marketplace. By "the minimalist reading" I mean the reading on which an XML namespace is primarily a syntactic mechanism for distinguishing one class of uses of a particular simple name from all other uses thereof.
This means a namespace is <i>not</i> a finite set of names, nor a more complex structured object as suggested by the (in)famous now-deleted non-normative <a href="http://www.w3.org/TR/REC-xml-names/#Philosophy">Appendix A: The Internal Structure of XML Namespaces</a> of version 1.0.</p>
<p xmlns="">The minimalist reading is the only one consistent with actual usage -- people mint new namespaces simply by <i>using</i> them in an expanded name or namespace declaration, without thereby incurring any obligation to define the boundaries of some set. You could say that a namespace springs into life the first time anyone uses a URI as a namespace name, but on balance I prefer an understanding which doesn't reify a namespace as such at all. I don't object to using phrases such as "[some name] in the [some URI] namespace", but that's just another way of saying "the expanded name <code>&lt; some_URI, some_name &gt;</code>".</p>
<p xmlns="">On this account it makes sense to ask questions about namespace names, e.g. "What namespace name will XSLT 2.0 use?", and about expanded names, e.g. "Does XSLT 2.0 change the definition of the element named <code>&lt; http://www.w3.org/Style/1998/Transform, output &gt;</code>?", but questions about namespaces as such are rarely if ever useful (unless of course they're understood as questions about namespace <i>names</i> or about some otherwise-defined set of expanded names with a namespace name in common).</p>
<h2 xmlns="">4.&#160; <a name="languages">From namespaces to languages</a></h2>
<p xmlns="">Taking the argument one step further, it is a necessary consequence of the position outlined above that it is incoherent to understand e.g. "Such-and-such a type is defined in the XML Schema namespace" to mean that the XML Schema namespace contains types (or type definitions).
Considering things carefully, we must understand this sentence as meaning that the XML Schema language assigns the expanded name <code>&lt; http://www.w3.org/2001/XMLSchema, such-and-such &gt;</code> to some type definition. This perspective actually works well with our overall understanding of XML Schema: a schema document for a particular target namespace corresponds to a schema which assigns element declarations, type definitions, etc. to expanded names, all of which have that target namespace as their namespace name.</p>
<p xmlns="">So it's <i>languages</i> (or as we used to say, <i>applications</i>, in the SGML sense) which assign expanded names <i>to</i> things. That assignment may be unique and unequivocal, but evidently it is often one-to-many. And of course it's the language which determines what there is to be named: its own little (or large) ontology.</p>
<p xmlns="">Many languages of course <i>do</i> provide only one thing to be named using a particular namespace name (e.g. <a href="http://www.w3.org/TR/xpath-functions/">XQuery Functions and Operators</a>), and others, although naming more than one sort of thing, constrain their use of names to be unambiguous (e.g. <a href="http://www.w3.org/TR/SVG/">SVG</a>, <a href="http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/">RDF</a>). In both these cases an expanded name alone is sufficient to identify something, and constructing a URI for something is therefore straightforward.</p>
<p xmlns="">On the other hand there are many examples of languages where the mapping is one-to-many. The most immediate example is XML itself. The low-level syntax of XML distinguishes two sorts of things which are identified by expanded name: elements and attributes.
Since there is no prohibition on using the same expanded name for both an element and an attribute, an expanded name is not sufficient to uniquely identify a named aspect of an XML document (or document type, in the ordinary language sense) -- you need to know what I've been calling the <i>sort</i> as well, i.e. <b>element</b> or <b>attribute</b>. For example, all of the following names:</p>
<ul xmlns="">
<li><code>abbr</code></li>
<li><code>cite</code></li>
<li><code>code</code></li>
<li><code>dir</code></li>
<li><code>label</code></li>
<li><code>link</code></li>
<li><code>object</code></li>
<li><code>span</code></li>
<li><code>style</code></li>
<li><code>title</code></li>
</ul>
<p xmlns="">can be used for either elements or attributes in XHTML 1.0 (transitional) documents, and at least three of these (<code>abbr</code>, <code>cite</code> and <code>title</code>) survive as ambiguous in XHTML Basic 1.0.</p>
<p xmlns="">When we expand our scope to XML validation, we suddenly get a <i>much</i> more complex situation, in which there are in principle an unbounded number of things which share a name and can be told apart only by context: we have element declarations (max. one per expanded name), and attribute declarations (max. as many as there are element declarations). For example, there are four distinct attribute definitions called <b>align</b> and five distinct attribute definitions called <b>type</b> in the <a href="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">XHTML transitional DTD</a>.
W3C XML Schema not only has a richer set of what it calls "symbol spaces", so that there are seven sorts of thing whose definitions can be named (it adds types, attribute and element groups, notations and identity-constraints alongside elements and attributes), it also allows elements as well as attributes to be defined in context.</p>
<p xmlns="">Finally we should note that a language may encompass quite a range of variation in terms of the things it assigns a particular expanded name to. There can be variation over time, as new versions of a language are released, and even alternative variants released at the same time. The HTML <code>P</code> element has a long and complex history, and even the XHTML <code>p</code> element has three distinct variants in version 1.0 (strict, transitional and basic), none of which is exactly the same as the one in version 1.1.</p>
<p xmlns="">None of this should come as a surprise. Ordinary language uses names in ways which are both ambiguous and context-determined, and whose use changes over time. But the consequences for the Web are more serious, particularly as we consider the use of names for things on the Web intended for automatic processing, where appeal to context for disambiguation may not be straightforward at all.
At the very least it is clear that it is no longer trivial to specify an approach to constructing URIs for things which will cover all the cases just discussed.</p>
<h2 xmlns="">5.&#160; <a name="abstractions">What abstractions to choose</a></h2>
<p xmlns="">Broadly speaking there are three ways one could respond to the situation outlined above:</p>
<ol xmlns="">
<li>Only expect to have a systematic approach to naming things with URIs when the language or application involved has a single flat story about naming (e.g. <a href="http://www.w3.org/TR/SVG/">SVG</a>, <a href="http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/">RDF</a>). Abstract over variations. We might call this the <a name="simple"><b>simple</b></a> (or <b>simplistic</b>) view.</li>
<li>Demand a systematic approach in all cases, and over all variations, but acknowledge that this means that in complex cases (e.g. WSDL, XML Schema) the resulting URIs will themselves be complex, requiring new media types and/or new XPointer schemes. We might call this the <a name="rich"><b>rich</b></a> (or <b>overkill</b>) view, exemplified by <a href="http://www.w3.org/TR/2005/WD-xmlschema-ref-20050329/">XML Schema: Component Designators</a>.</li>
<li>Look for a middle ground, which adopts the <a href="#simple">simple</a> view wherever possible and otherwise an approximation to it which abstracts over all variation and as much application-specific detail as possible, with the option to fall back to the <a href="#rich">rich</a> view as and when this is necessary. We might call this the <a name="middle"><b>middle</b></a> (or <b>80/20</b>) view.</li>
</ol>
<p xmlns="">It's important to note that there's an unspoken assumption common to all three of the above views: we're going to construct the URI for some named thing by adding some variety of fragment identifier to the namespace name of its expanded name.
There is no space here for the possibility that two distinct languages might use the <i>same</i> expanded name for two evidently distinct things. This is intimately bound up with another assumption with respect to variation, namely that it's possible to tell reliably when a change in something counts as a variation, as opposed to a fundamental change of identity. If I change the named definition of a type by nudging its min or max a bit, that pretty clearly just produces a variant of the same type. But if I change the definition assigned to a name from being an integer to being a date, it's equally clear that that's no longer the same type at all. Those are the easy cases; there will be many which are much harder to call.</p>
<p xmlns="">I expect that both of these assumptions will want to be recast as Good Practice notes going forward (e.g. "Don't use the same expanded name for two different things of the same sort in different languages under your control"; "As a language evolves, use new expanded names for new things, don't recycle old ones").</p>
<h2 xmlns="">6.&#160; <a name="details">More details on the <a href="#middle">middle</a> ground</a></h2>
<p xmlns="">Without more detailed examination of real usage scenarios, it's hard to be sure of what general principles to establish here, but on the basis of my limited experience to date it seems likely that something along the following lines is a reasonable starting point.</p>
<p xmlns="">It's up to the owner of a language, for each of the namespaces involved in that language, to provide a constructive definition of the way in which things which have expanded names can also be named with URIs.
I've identified the following guidelines for such definitions:</p>
<ul xmlns="">
<li>Use the namespace URI as the basis of the constructed name;</li>
<li>Where part of the complexity of a language's name structure comes from giving expanded names to more than one sort of thing, include the sort in the URI;</li>
<li>Where evolution over time and/or simultaneous language variants are a possibility, be clear that simple URIs are <i>not</i> capable of capturing this;</li>
<li>Try to provide retrievable representations, so that the namespace URI(s) you construct a) have a widely used media type and b) yield a useful result when the fragment identifier is resolved.</li>
</ul>
<h2 xmlns="">7.&#160; <a name="example">The W3C XML Schema example</a></h2>
<p xmlns="">The <a href="http://www.w3.org/2001/tag/2005/06/14-16-minutes.html#item031">position</a> that emerged at the end of the recent TAG f2f is consistent with the above guidelines, but obviously lacking in detail. On balance my preferred approach would look something like this:</p>
<blockquote xmlns=""><div>URI names are provided for everything defined or declared by name at the top level which has some conceptual identity independent of the details of W3C XML Schema, i.e. elements, attributes and simple and complex types.</div></blockquote>
<blockquote xmlns=""><div>The URI name for something of one of the above four sorts is constructed by concatenating the namespace name of its expanded name, a <code>/</code> if that does not already end with one, its sort (i.e. <b>attribute</b>, <b>complexType</b>, <b>element</b> or <b>simpleType</b>), a <code>/#</code> and the local name of its expanded name.</div></blockquote>
<blockquote xmlns=""><div>URI names for languages which don't use namespaces are based on a URI designated for the purpose in the language specification, e.g.
<a href="http://www.w3.org/2002/xmlspec/">http://www.w3.org/2002/xmlspec/</a> for the W3C's 'specprod' language.</div></blockquote>
<p xmlns="">It would be the responsibility of language owners to provide retrievable representations of resources at each sort-determined sub-URI of the namespace URI to make this work (but see httpRange-14 below under <a href="#issues">Outstanding issues</a>).</p>
<p xmlns="">So, for example, the URI for W3C XML Schema's own <code>dateTime</code> type would be</p>
<blockquote xmlns=""><div><pre class="code">http://www.w3.org/2001/XMLSchema/simpleType/#dateTime</pre></div></blockquote>
<p xmlns="">and perhaps, for the DAML+OIL example cited in <a href="http://lists.w3.org/Archives/Public/www-xml-schema-comments/2005JanMar/0080.html">Dan Connolly's feedback</a>, we would get the following ('perhaps' because there's no namespace involved in the example as published):</p>
<blockquote xmlns=""><div><pre class="code">http://www.w3.org/TR/2001/NOTE-daml+oil-walkthru-20011218/simpleType/#over12</pre></div></blockquote>
<p xmlns="">(My inspiration for this approach is at least in part the IANA structuring of their <a href="http://www.iana.org/assignments/media-types/">registry of media types</a>, which gives us e.g.</p>
<blockquote xmlns=""><div><pre class="code">http://www.iana.org/assignments/media-types/application/mathematica</pre></div></blockquote>
<p xmlns="">for <code>application/mathematica</code>, although irritatingly it gives us nothing for e.g. <code>text/html</code>.)</p>
<h2 xmlns="">8.&#160; <a name="issues">Outstanding issues</a></h2>
<p xmlns="">This is by no means a fully-baked story. Some things I <i>know</i> to be shaky are:</p>
<dl xmlns="">
<dt><b>httpRange-14</b></dt><dd>The TAG's recent resolution of this issue leaves the question of what sort of resource a namespace URI identifies, and whether you should be able to retrieve any representation of it at all, very much up in the air.
The knock-on implications of this with respect to fragment identifiers, sub-URIs, etc. are even more unclear.</dd>
<dt><b>Schema Component Designators</b></dt><dd>As presented there is a complete disconnect between this story and SCDs. Maybe that's the best that we can do, but it would certainly be better if we could get a solution which shared more.</dd>
<dt><b>Languages vs. namespaces</b></dt><dd>This notion of a language as distinct from a namespace is only just (at least for me) in the process of being worked out. It may yet be the case that we would do better to use some kind of 'language URIs' as the base, rather than namespace URIs. The continued widespread use of languages such as DocBook which don't use namespaces shouldn't be ignored.</dd>
</dl>
</body></html>

-- 
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
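The concatenation rule proposed in section 7 ("The W3C XML Schema example") is mechanical enough to sketch in a few lines. The Python below is only an illustration under stated assumptions: the function name `component_uri`, its `language_base` parameter (standing in for the URI a namespace-less language's specification would designate) and the namespace name `http://example.org/ns` are all hypothetical, and nothing here belongs to any W3C specification.

```python
from typing import Optional

# The four sorts which, under the proposal in section 7, get URI names.
SORTS = ("attribute", "complexType", "element", "simpleType")


def component_uri(namespace_name: Optional[str], sort: str, local_name: str,
                  language_base: Optional[str] = None) -> str:
    """Construct a URI name by the concatenation rule sketched in section 7.

    namespace_name -- namespace name of the expanded name, or None when the
                      language does not use namespaces.
    language_base  -- hypothetical parameter: the URI a namespace-less
                      language's specification designates for the purpose.
    """
    if sort not in SORTS:
        raise ValueError(f"no URI name is proposed for sort {sort!r}")
    base = namespace_name if namespace_name is not None else language_base
    if base is None:
        raise ValueError("need a namespace name or a designated language URI")
    # Add a '/' only if the base does not already end with one.
    if not base.endswith("/"):
        base += "/"
    # base + sort + '/#' + local name
    return base + sort + "/#" + local_name


if __name__ == "__main__":
    # The two URIs given as examples in section 7.
    print(component_uri("http://www.w3.org/2001/XMLSchema", "simpleType", "dateTime"))
    # -> http://www.w3.org/2001/XMLSchema/simpleType/#dateTime
    print(component_uri("http://www.w3.org/TR/2001/NOTE-daml+oil-walkthru-20011218",
                        "simpleType", "over12"))
    # -> http://www.w3.org/TR/2001/NOTE-daml+oil-walkthru-20011218/simpleType/#over12

    # Same expanded name, two sorts (http://example.org/ns is made up).
    print(component_uri("http://example.org/ns", "element", "title"))
    # -> http://example.org/ns/element/#title
    print(component_uri("http://example.org/ns", "attribute", "title"))
    # -> http://example.org/ns/attribute/#title
```

The last two calls show why the sort has to appear in the URI: the expanded name alone would map to one URI for two distinct things, which is the one-to-many situation described in section 4.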
Received on Friday, 24 June 2005 16:28:30 UTC