
Names, namespaces and languages

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Fri, 24 Jun 2005 17:28:20 +0100
To: www-tag <www-tag@w3.org>
Message-ID: <f5b8y0z1zyz.fsf@erasmus.inf.ed.ac.uk>
>I've written up some preliminary thoughts about this, in rather more
>detail, but still very much a work-in-progress, something I would have
>blogged except I don't have a blog.  Please bear this in mind when
>responding -- there's very little here, particularly in the more
>speculative sections towards the end, which I'm firmly convinced of.
>So feedback is very much in order.
><?xml version="1.0" encoding="utf-8"?>
><html xmlns="http://www.w3.org/1999/xhtml"><head>
><meta HTTP-EQUIV="Content-type" CONTENT="text/html; charset=UTF-8"/>
><title>Names, Namespaces and Languages</title>
><style type="text/css">
>       PRE.code {font-family: monospace}
>       PRE {MARGIN-LEFT: 0em}
>       OL OL {list-style-type: lower-alpha}
>    </style></head><body STYLE="font-family: times">
> <div xmlns="" style="text-align: center">
>  <h1>Names, Namespaces and Languages</h1>
>  <div>Henry S. Thompson</div>
>  <div>24 June 2005</div>
> </div>
>   <h2 xmlns="">1. &#160;<a name="intro">Introduction</a></h2>
>   <p xmlns="">This is very much a work-in-progress, something I would have
>blogged except I don't have a blog.  Please bear this in mind when responding --
>there's very little here, particularly in the more speculative sections towards
>the end, which I'm firmly convinced of.  So feedback is very much in order.</p>
>   <h2 xmlns="">2. &#160;<a name="background">Background</a></h2>
>   <p xmlns="">TAG issues <a href="http://www.w3.org/2001/tag/issues.html?type=1#namespaceDocument-8">namespaceDocument-8</a> and <a href="htt
>ractComponentRefs-37</a> were the topic of
><a href="http://www.w3.org/2001/tag/2005/06/14-16-minutes.html">extended discussion</a>
>at the last TAG f2f.  There is considerable overlap between these two issues,
>and both are related to
><a href="http://lists.w3.org/Archives/Public/www-xml-schema-comments/2005JanMar/0080.html">Dan Connolly's comment</a>
>on the recently published Last Call Working Draft
>of <a href="http://www.w3.org/TR/2005/WD-xmlschema-ref-20050329/">XML
>Schema: Component Designators</a>.  Although a number of prior
>misunderstandings were identified and overcome in the discussion, more work
>is needed to make explicit the background assumptions about what problems
>we're trying to solve and what the space of possible solutions is.  This note
>is an attempt to begin that work.</p>
>   <h2 xmlns="">3. &#160;<a name="namespaces">XML Namespaces: An evolving understanding</a></h2>
>   <p xmlns="">The <a href="http://lists.w3.org/Archives/Public/www-tag/2005Feb/0017.html">recent discussion</a>
>about whether the <a href="http://www.w3.org/TR/2005/CR-xml-id-20050208/">xml:id</a>
>spec. 'changes' the XML namespace by 'adding' a new name to it helped clarify
>that the minimalist reading of the <a href="http://www.w3.org/TR/xml-names11/">XML Namespaces</a>
>REC has achieved dominance in the intellectual marketplace.  By "the
>minimalist reading" I mean the reading on which an XML namespace is
>primarily a syntactic mechanism for distinguishing one class of uses of a
>particular simple name from all other uses thereof.  This means a namespace
>is <i>not</i> a finite set of names, nor a more complex structured object
>as suggested by the (in)famous now-deleted non-normative
><a href="http://www.w3.org/TR/REC-xml-names/#Philosophy">Appendix A: The Internal Structure of XML Namespaces</a>
>of version 1.0.</p>
>   <p xmlns="">The minimalist reading is the only one consistent with actual usage --
>people mint new namespaces by simply <i>using</i> them in an expanded name
>or namespace declaration, without thereby incurring any obligation to define
>the boundaries of some set.  You could say that a namespace springs into life
>the first time anyone uses a URI as a namespace name, but on balance I prefer
>an understanding which doesn't reify a namespace as such at all.  I don't
>object to using phrases such as "[some name] in the [some URI] namespace", but
>that's just another way of saying "the expanded name <code>&lt; some_URI,
>some_name &gt;</code>".</p>
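>   <p xmlns="">A minimal sketch of the expanded-name view, assuming Python's
>standard <code>xml.etree.ElementTree</code> parser (my choice of tool for
>illustration, not something the discussion relied on).  The
><code>expanded_name</code> helper and the sample document are hypothetical;
>the point is only that a parser hands back pairs of namespace name and local
>name, never a namespace object with members.</p>

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample: the local name "title" is used twice,
# once qualified by the XHTML namespace name and once unqualified.
doc = ET.fromstring(
    '<root xmlns:h="http://www.w3.org/1999/xhtml">'
    '<h:title>In the XHTML namespace</h:title>'
    '<title>In no namespace</title>'
    '</root>'
)


def expanded_name(tag):
    """Split ElementTree's '{uri}local' notation into (namespace name, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (uri, local)
    return (None, tag)


names = [expanded_name(child.tag) for child in doc]
for name in names:
    print(name)
# ('http://www.w3.org/1999/xhtml', 'title')
# (None, 'title')
```

>   <p xmlns="">The two <code>title</code> uses are told apart purely by the
>pair; nothing in the processing asks what else the namespace might
>"contain".</p>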
>   <p xmlns="">On this account it makes sense to ask questions about namespace names, e.g. "What
>namespace name will XSLT 2.0 use?" and about expanded names, e.g. "Does XSLT
>2.0 change the definition of the element named <code>&lt;
>http://www.w3.org/Style/1998/Transform, output &gt;</code>?", but
>questions about namespaces as such are rarely if ever useful (unless of course
>they're understood as questions about namespace <i>names</i> or about
>some otherwise-defined set of expanded names with a namespace name in common).</p>
>   <h2 xmlns="">4. &#160;<a name="languages">From namespaces to languages</a></h2>
>   <p xmlns="">Taking the argument one step further, it is a necessary consequence of the
>position outlined above that it is incoherent to understand e.g.
>"Such-and-such a type is defined in the XML Schema namespace" to mean that the
>XML Schema namespace contains types (or type definitions).  Considering this
>carefully, we must understand this sentence as meaning that the XML Schema
>language assigns the expanded name <code>&lt; http://www.w3.org/2001/XMLSchema,
>such-and-such &gt;</code> to some type definition.  This perspective actually
>works well with our overall understanding of XML Schema:  a schema document
>for a particular target namespace corresponds to a schema which assigns
>element declarations, type definitions, etc. to expanded names, all
>of which have that target namespace as their namespace name.</p>
>   <p xmlns="">So it's <i>languages</i> (or as we used to say,
><i>applications</i>, in the SGML sense) which assign expanded names
><i>to</i> things.  That assignment may be unique and unequivocal, but
>evidently it is often one-to-many.  And of course it's the language which
>determines what there is to be named, its own little (or large) ontology.</p>
>   <p xmlns="">Many languages of course <i>do</i> provide only one sort of thing to be
>named using a particular namespace name (e.g. <a href="http://www.w3.org/TR/xpath-functions/">XQuery Functions and Operators</a>),
>and others, although naming more than one sort of thing, constrain their use
>of names to be unambiguous (e.g. <a href="http://www.w3.org/TR/SVG/">SVG</a>,
><a href="http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/">RDF</a>).
>In both these cases, just an expanded name is sufficient to identify something,
>and constructing a URI for something is therefore straightforward.</p>
>   <p xmlns="">On the other hand there are many examples of languages where the mapping
>is one-to-many.  The most immediate example is XML itself.  The low-level
>syntax of XML distinguishes two sorts of things which are identified by
>expanded name: elements and attributes.
>Since there is no prohibition on using the same expanded name for both an
>element and an attribute, an expanded name is not sufficient to uniquely
>identify a named aspect of an XML document (or document type, in the ordinary
>language sense) --
>you need to know what I've been calling the <i>sort</i> as well, i.e.
><b>element</b> or <b>attribute</b>.  For example, all of the
>following names:</p>
>   <ul xmlns="">
>    <li><code>abbr</code></li>
>    <li><code>cite</code></li>
>    <li><code>code</code></li>
>    <li><code>dir</code></li>
>    <li><code>label</code></li>
>    <li><code>link</code></li>
>    <li><code>object</code></li>
>    <li><code>span</code></li>
>    <li><code>style</code></li>
>    <li><code>title</code></li>
>   </ul>
>   <p xmlns="">can be used for either elements or attributes in XHTML 1.0 (transitional)
>documents, and at least three of these (<code>abbr</code>, <code>cite</code>
>and <code>title</code>) survive as ambiguous in XHTML Basic 1.0.</p>
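>   <p xmlns="">To make the role of the sort concrete, here is a small sketch,
>again assuming Python's standard <code>xml.etree.ElementTree</code>.  The
>namespace name <code>http://example.org/ns</code>, the sample document and the
><code>named_things</code> helper are all hypothetical.  The attribute is
>deliberately prefixed so that it carries exactly the same expanded name as the
>element; only the sort tells the two apart.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one expanded name, used for an element AND an attribute.
doc = ET.fromstring(
    '<x:label xmlns:x="http://example.org/ns" x:label="same name">text</x:label>'
)


def named_things(root):
    """Return (sort, expanded name) pairs for every element and attribute."""
    found = []
    for elem in root.iter():
        found.append(("element", elem.tag))
        for attr_name in elem.attrib:
            found.append(("attribute", attr_name))
    return found


things = named_things(doc)
expanded_names = {name for _sort, name in things}

print(things)
# [('element', '{http://example.org/ns}label'), ('attribute', '{http://example.org/ns}label')]
print(len(expanded_names))  # 1: the expanded name alone does not distinguish them
print(len(set(things)))     # 2: sort plus expanded name does
```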
>   <p xmlns="">When we expand our scope to XML validation, we suddenly get a
><i>much</i> more complex situation, in which there are in principle an
>unbounded number of things which share a name and can only be disambiguated by
>context:  we have element declarations (max. one per expanded
>name), and attribute declarations (max. as many as there are
>element declarations).  For example, there are four distinct attribute
>definitions called <b>align</b> and five distinct attribute definitions
>called <b>type</b> in the <a href="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">XHTML transitional DTD</a>.
>W3C XML Schema not only has a richer set of what it calls "symbol spaces", so
>that there are seven things whose definitions can be named (it adds types,
>attribute and element groups, notations and identity-constraints alongside
>elements and attributes), it also allows elements as well as attributes to be
>defined in context.</p>
>   <p xmlns="">Finally we should note that a language may encompass quite a
>range of variation in terms of the things it assigns a particular expanded name
>to.  There can be variation over time, as new versions of a language are released,
>and even alternative variants released at the same time.  The HTML
><code>P</code> element has a long and complex history, and even the XHTML
><code>p</code> element has three distinct variants in version 1.0 (strict,
>transitional and basic), none of which is exactly the same as the one in
>version 1.1.</p>
>   <p xmlns="">None of this should come as a surprise.  Ordinary language uses names in
>ways which are both ambiguous and context-determined, and whose use changes
>over time.  But its consequences for the Web are more serious, particularly when
>we consider the use of names for things on the Web intended for automatic
>processing, where appeal to context for disambiguation may not be
>straightforward at all.  At the very least it is clear that it is no longer
>trivial to specify an approach to
>constructing URIs for things which will cover all the cases just discussed.</p>
>   <h2 xmlns="">5. &#160;<a name="abstractions">What abstractions to choose</a></h2>
>   <p xmlns="">Broadly speaking there are three ways one could respond to the situation
>outlined above:</p>
>   <ol xmlns="">
>    <li>Only expect to have a systematic approach to naming things with URIs when the
>language or application involved has a single flat story about naming (e.g. <a href="http://www.w3.org/TR/SVG/">SVG</a>, <a href="http://www.w3.or
>Abstract over variations.  We might call this the <a name="simple"><b>simple</b></a> (or
><b>simplistic</b>) view.</li>
>    <li>Demand a systematic approach in all cases, and over all variations,
>but acknowledge that this means that in complex cases (e.g. WSDL, XML Schema)
>the resulting URIs will themselves be complex, requiring new media types and/or using new XPointer
>schemes.  We might call this the <a name="rich"><b>rich</b></a> (or
><b>overkill</b>) view, exemplified by <a href="http://www.w3.org/TR/2005/WD-xmlschema-ref-20050329/">XML
>Schema: Component Designators</a>.</li>
>    <li>Look for a middle ground, which adopts the <a href="#simple">simple</a>
>view wherever possible, otherwise an approximation to it which abstracts
>over all variation and as much application-specific detail as possible, with
>the option to fall back to the <a href="#rich">rich</a> view as and when
>this is necessary.  We might call this the <a name="middle"><b>middle</b></a>
>(or <b>80/20</b>) view.</li>
>   </ol>
>   <p xmlns="">It's important to note that there's an unspoken common assumption to all
>three of the above views:  We're going to construct the URI for some named
>thing by adding
>some variety of fragment identifier to the namespace name of its expanded name.
>There is no space here for the possibility that two distinct languages might
>use the <i>same</i> expanded name for two evidently distinct things.
>This is intimately bound up with another assumption with respect to variation,
>namely that it's possible to tell reliably when a change in something counts as
>a variation, as opposed to a fundamental change of identity.  If I change the
>named definition of a type by nudging its min or max a bit, that pretty
>clearly just produces a variant of the same type.  But if I change the
>definition assigned to a name from being an integer to being a date, it's
>equally pretty clear that that's no longer the same type at all.  Those are the
>easy cases; there will be many which are much harder to call.</p>
>   <p xmlns="">I expect that
>both of these assumptions will want to be recast as Good Practice notes going
>forward (e.g. "Don't use the same expanded name for two different things of the
>same sort in different languages under your control"; "As a language evolves,
>use new expanded names for new things, don't recycle old ones").</p>
>   <h2 xmlns="">6. &#160;<a name="details">More details on the <a href="#middle">middle</a> ground</a></h2>
>   <p xmlns="">Without more detailed examination of real usage scenarios, it's hard to
>be sure of what general principles to establish here, but on the basis of my
>limited experience to date it seems likely that something along the following
>lines is a reasonable starting point.</p>
>   <p xmlns="">It's up to the owner of a language, for each of the namespaces involved
>in that language, to provide a constructive definition of the way in which
>things which have expanded names can also be named with URIs.  I've identified
>the following guidelines for such definitions:</p>
>   <ul xmlns="">
>    <li>Use the namespace URI as the basis of the constructed name;</li>
>    <li>Where part of the complexity of a language's name structure comes
>from giving expanded names to more than one sort of thing, include the sort in
>the URI;</li>
>    <li>Where evolution over time and/or simultaneous language variants are a
>possibility, be clear that simple URIs are <i>not</i> capable of
>capturing this;</li>
>    <li>Try to provide retrievable representations so that the namespace
>URI(s) you construct a) have a widely used media type and b) yield a useful
>result when the fragment identifier is resolved.</li>
>   </ul>
>   <h2 xmlns="">7. &#160;<a name="example">The W3C XML Schema example</a></h2>
>   <p xmlns="">The <a href="http://www.w3.org/2001/tag/2005/06/14-16-minutes.html#item031">position</a>
>that emerged at the end of the recent TAG f2f is consistent with the above
>guidelines, but obviously lacking in detail.  On balance my preferred approach
>would look something like this:</p>
>   <blockquote xmlns=""><div>URI names are provided for everything defined or declared by name
>at the top level which has some conceptual identity independent of the details
>of W3C XML Schema, i.e. elements, attributes and simple and complex types.</div></blockquote>
>   <blockquote xmlns=""><div>The URI name for something of one of the above four sorts is
>constructed by concatenating the namespace name of its expanded name, a
><code>/</code> if that does not already end with one, its sort
>(i.e. <b>attribute</b>, <b>complexType</b>, <b>element</b>
>or <b>simpleType</b>), a <code>/#</code> and the local name of its
>expanded name.</div></blockquote>
>   <blockquote xmlns=""><div>URI names for languages which don't use namespaces are
>based on a URI designated for the purpose in the language specification, e.g.
><a href="http://www.w3.org/2002/xmlspec/">http://www.w3.org/2002/xmlspec/</a>
>for the W3C's 'specprod' language.</div></blockquote>
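>   <p xmlns="">The construction rule above can be sketched as follows.  This is
>only an illustration of the concatenation as worded in this note, not an agreed
>algorithm; the function name <code>uri_name</code> and the <code>SORTS</code>
>tuple are my own, and the local name <code>spec</code> used with the xmlspec
>base URI is chosen purely for illustration.</p>

```python
# The four sorts named in the proposal.
SORTS = ("attribute", "complexType", "element", "simpleType")


def uri_name(base_uri, sort, local_name):
    """Build a URI name: base URI, '/' if missing, the sort, '/#', the local name.

    base_uri is the namespace name of the expanded name or, for a language
    without namespaces, the URI its specification designates for the purpose.
    """
    if sort not in SORTS:
        raise ValueError("unknown sort: %r" % (sort,))
    if not base_uri.endswith("/"):
        base_uri += "/"
    return base_uri + sort + "/#" + local_name


# A namespace name that does not end in '/': one is added.
print(uri_name("http://www.w3.org/2001/XMLSchema", "simpleType", "dateTime"))
# http://www.w3.org/2001/XMLSchema/simpleType/#dateTime

# A designated base URI that already ends in '/': nothing is added.
print(uri_name("http://www.w3.org/2002/xmlspec/", "element", "spec"))
# http://www.w3.org/2002/xmlspec/element/#spec
```

>   <p xmlns="">Rejecting unknown sorts keeps the sketch within the four sorts
>the proposal covers; anything else would fall back to the
><a href="#rich">rich</a> view.</p>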
>   <p xmlns="">It would be the responsibility of language owners to provide retrievable
>representations of resources at each sort-determined sub-URI of the namespace
>URI to make this work (but see httpRange-14 below under <a href="#issues">Outstanding issues</a>).</p>
>   <p xmlns="">So for example the URI for the W3C XML Schema's own dateTime type would be</p>
><blockquote xmlns=""><div><pre class="code">http://www.w3.org/2001/XMLS
>   <p xmlns="">and perhaps, for the DAML+OIL example cited in <link>Dan Connolly's
>feedback</link>, we would get the following ('perhaps' because there's no namespace involved in the example as published):</p>
>   <blockquote xmlns=""><div><pre class="code">http://www.w3.org/TR/200
>   <p xmlns="">(My inspiration for this approach is at least in part the IANA's
>structuring of their <a href="http://www.iana.org/assignments/media-types/">registry of media types</a>, which gives us e.g.</p>
>   <blockquote xmlns=""><div><pre class="code">http://www.iana.org/assi
>   <p xmlns="">for <code>application/mathematica</code> (although irritatingly they give us
>nothing for e.g. <code>text/html</code>).</p>
>   <h2 xmlns="">8. &#160;<a name="issues">Outstanding issues</a></h2>
>   <p xmlns=3D"">This is by no means a fully-baked story.  Some things I <i=
>are shaky are</p>
>   <dl xmlns="">
>    <dt><b>httpRange-14</b></dt><dd>The TAG's recent resolution of this issue leaves
>the question of what sort of resource a namespace URI identifies, and whether you should be able to
>retrieve any representation of it at all, very much up in the air.  The
>knock-on implications of this wrt fragment identifiers, sub-URIs, etc. are even
>more unclear.</dd>
>    <dt><b>Schema Component Designators</b></dt><dd>As presented there is a complete
>disconnect between this story and SCDs.  Maybe that's the best that we can do,
>but it would certainly be better if we could get a solution which shared more.</dd>
>    <dt><b>Languages vs. namespaces</b></dt><dd>This notion of a language as distinct
>from a namespace is only just (at least for me) in the process of being worked
>out.  It may yet be the case that we would do better to use some kind of
>'language URIs' as the base, rather than namespace URIs.  The continued
>widespread use of languages such as Docbook which don't use namespaces
>shouldn't be ignored.</dd>
>   </dl>
>-- 
> Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
>                     Half-time member of W3C Team
>    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
>            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
>                   URL: http://www.ltg.ed.ac.uk/~ht/
>[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Friday, 24 June 2005 16:28:30 GMT
