Names, namespaces and languages

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- --=-=-=

I've written up some preliminary thoughts about this, in rather more
detail, but still very much a work-in-progress, something I would have
blogged except I don't have a blog.  Please bear this in mind when
responding -- there's very little here, particularly in the more
speculative sections towards the end, which I'm firmly convinced of.
So feedback is very much in order.

ht


- --=-=-=
Content-Type: text/html; charset=iso-8859-1
Content-Disposition: attachment; filename=names.html
Content-Transfer-Encoding: quoted-printable
Content-Description: Names, namespaces and languages

<?xml version=3D"1.0" encoding=3D"utf-8"?><html xmlns=3D"http://www.w3.org/=
1999/xhtml"><head><meta HTTP-EQUIV=3D"Content-type" CONTENT=3D"text/html; c=
harset=3DUTF-8"/><title>Names, Namespaces and Languages</title><style type=
=3D"text/css">
       PRE.code {font-family: monospace}
       PRE {MARGIN-LEFT: 0em}
       OL OL {list-style-type: lower-alpha}
    </style></head><body STYLE=3D"font-family: times">
 <div xmlns=3D"" style=3D"text-align: center">
  <h1>Names, Namespaces and Languages</h1>
  <div>Henry S. Thompson</div>
  <div>24 June 2005</div>
 </div>
=20
=20=20
   <h2 xmlns=3D"">1.=20
   =C2=A0
   <a name=3D"intro">Introduction</a></h2>
   <p xmlns=3D"">This is very much a work-in-progress, something I would ha=
ve blogged
except I don't have a blog.  Please bear this in mind when responding --
there's very little here, particularly in the more speculative sections tow=
ards
the end, which I'm firmly convinced of.  So feedback is very much in order.=
</p>
=20=20
=20=20
   <h2 xmlns=3D"">2.=20
   =C2=A0
   <a name=3D"background">Background</a></h2>
   <p xmlns=3D"">TAG issues <a href=3D"http://www.w3.org/2001/tag/issues.ht=
ml?type=3D1#namespaceDocument-8">namespaceDocument-8</a> and <a href=3D"htt=
p://www.w3.org/2001/tag/issues.html?type=3D1#abstractComponentRefs-37">abst=
ractComponentRefs-37</a> were the topic of=20
<a href=3D"http://www.w3.org/2001/tag/2005/06/14-16-minutes.html">extended =
discussion</a>
 at the last TAG f2f.  There is considerable overlap between these two issu=
es, and both are related to
<a href=3D"http://lists.w3.org/Archives/Public/www-xml-schema-comments/2005=
JanMar/0080.html">Dan Connolly's comment</a> on the recently published Last=
 Call Working Draft
of <a href=3D"http://www.w3.org/TR/2005/WD-xmlschema-ref-20050329/">XML
Schema: Component Designators</a>.  Although a number of prior
misunderstandings were identified and overcome in the discussion, more work
is needed to make explicit the background assumptions about what problems
we're trying to solve and what the space of possible solutions is.  This
note is an attempt to begin that work.</p>
=20=20
=20=20
   <h2 xmlns=3D"">3.=20
   =C2=A0
   <a name=3D"namespaces">XML Namespaces: An evolving understanding</a></h2>
   <p xmlns=3D"">The <a href=3D"http://lists.w3.org/Archives/Public/www-tag=
/2005Feb/0017.html">recent discussion</a> about whether the <a href=3D"http=
://www.w3.org/TR/2005/CR-xml-id-20050208/">xml:id</a> spec. 'changes' the X=
ML namespace by 'adding' a new name to it helped clarify that the minimalis=
t reading of the <a href=3D"http://www.w3.org/TR/xml-names11/">XML Namespac=
es</a> REC has achieved dominance in the intellectual marketplace.  By =
"the minimalist reading" I mean the reading on which an XML namespace is=
 primarily a syntactic mechanism for distinguishing one class of uses of a =
particular simple name from all other uses thereof.  This means a namespace=
 is <i>not</i> a finite set of names, nor a more complex structured object =
as suggested by the (in)famous now-deleted non-normative <a href=3D"http://=
www.w3.org/TR/REC-xml-names/#Philosophy">Appendix A: The Internal Structure=
 of XML Namespaces</a> of version 1.0.</p>
   <p xmlns=3D"">The minimalist reading is the only one consistent with act=
ual usage --
people mint new namespaces by simply <i>using</i> them in an expanded name
or namespace declaration, without thereby incurring any obligation to define
the boundaries of some set.  You could say that a namespace springs into li=
fe
the first time anyone uses a URI as a namespace name, but on balance I pref=
er
an understanding which doesn't reify a namespace as such at all.  I don't
object to using phrases such as "[some name] in the [some URI] namespace", =
but
that's just another way of saying "the expanded name <code>&lt; some_URI,
some_name &gt;</code>".</p>
   <p xmlns=3D"">On this account it makes sense to ask questions about name=
space names, e.g. "What
namespace name will XSLT 2.0 use?" and about expanded names, e.g. "Does XSLT
2.0 change the definition of the element named <code>&lt;
http://www.w3.org/Style/1998/Transform, output &gt;</code>?", but
questions about namespaces as such are rarely if ever useful (unless of cou=
rse
they're understood as questions about namespace <i>names</i> or about
some otherwise-defined set of expanded names with a namespace name in commo=
n).</p>
=20=20
=20=20
   <h2 xmlns=3D"">4.=20
   =C2=A0
   <a name=3D"languages">From namespaces to languages</a></h2>=20=20=20
   <p xmlns=3D"">Taking the argument one step further, it is a necessary co=
nsequence of the
position outlined above that it is incoherent to understand e.g.
"Such-and-such a type is defined in the XML Schema namespace" to mean that =
the
XML Schema namespace contains types (or type definitions).  Considering thi=
ngs
carefully, we must understand this sentence as meaning that the XML Schema
language assigns the expanded name <code>&lt; http://www.w3.org/2001/XMLSch=
ema,
such-and-such &gt;</code> to some type definition.  This perspective actual=
ly
works well with our overall understanding of XML Schema:  a schema document
for a particular target namespace corresponds to a schema which assigns ele=
ment declarations, type definitions, etc. to expanded names all
of which have that target namespace as their namespace name.</p>
   <p xmlns=3D"">So it's <i>languages</i> (or as we used to say,
<i>applications</i>, in the SGML sense) which assign expanded names
<i>to</i> things.  That assignment may be unique and unequivocal, but
evidently it is often one-to-many.  And of course it's the language which
determines what there is to be named, its own little (or large) ontology.</=
p>
   <p xmlns=3D"">Many languages of course <i>do</i> provide only one thing =
to be
named using a particular namespace name (e.g. <a href=3D"http://www.w3.org/=
TR/xpath-functions/">XQuery Functions and Operators</a>), and others, altho=
ugh naming more than one sort of thing, constrain their use of names to be =
unambiguous (e.g. <a href=3D"http://www.w3.org/TR/SVG/">SVG</a>, <a href=3D=
"http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/">RDF</a>).  In =
both these cases, just an expanded name is sufficient to identify something=
, and constructing a URI for something is therefore straightforward.</p>
   <p xmlns=3D"">On the other hand there are many examples of languages whe=
re the mapping
is one-to-many.  The
most immediate example is XML
itself.  The low-level syntax of XML distinguishes two
sorts of things which are identified by expanded name: elements and attribu=
tes.
Since there is no prohibition on using the same expanded name for both an
element and an attribute, an expanded name is not sufficient to uniquely id=
entify a named
aspect of an XML document (or document type, in the ordinary language sense=
) --
you need to know what I've been calling the <i>sort</i> as well, i.e.
<b>element</b> or <b>attribute</b>.  For example, all of the
following names:</p>
   <ul xmlns=3D"">
    <li><code>abbr</code></li>
    <li><code>cite</code></li>
    <li><code>code</code></li>
    <li><code>dir</code></li>
    <li><code>label</code></li>
    <li><code>link</code></li>
    <li><code>object</code></li>
    <li><code>span</code></li>
    <li><code>style</code></li>
    <li><code>title</code></li>
   </ul>
   <p xmlns=3D"">can be used for either elements or attributes in XHTML 1.0=
 (transitional)
documents, and at least three of these (<code>abbr</code>, <code>cite</code>
and <code>title</code>) survive as ambiguous in XHTML Basic 1.0.</p>
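   <p xmlns=3D"">A minimal Python sketch of this point, not taken from any
specification.  The namespace name <code>http://example.org/ns</code>, the
class names and the stored descriptions are all hypothetical, and the
language is assumed to give an element and an attribute the same expanded
name.  A table keyed on the expanded name alone loses one of the two
entries; adding the sort to the key keeps them apart.</p>

```python
from typing import NamedTuple

# Hypothetical namespace name, used only for illustration.
EXAMPLE_NS = "http://example.org/ns"


class ExpandedName(NamedTuple):
    """A <namespace name, local name> pair."""
    namespace_name: str
    local_name: str


class SortedName(NamedTuple):
    """An expanded name together with the sort of thing it names."""
    sort: str  # e.g. "element" or "attribute"
    name: ExpandedName


title = ExpandedName(EXAMPLE_NS, "title")

# Keyed on the expanded name alone: the second assignment replaces the first.
by_name = {}
by_name[title] = "the title element"
by_name[title] = "the title attribute"

# Keyed on sort plus expanded name: both entries survive.
by_sorted_name = {
    SortedName("element", title): "the title element",
    SortedName("attribute", title): "the title attribute",
}

print(len(by_name), len(by_sorted_name))
```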
   <p xmlns=3D"">When we expand our scope to XML validation, we suddenly ge=
t a
<i>much</i> more complex situation, in which there are in principle an
unbounded number of things which share a name, disambiguable only by
context:  we have element declarations (max. one per expanded
name), and attribute declarations (max. as many as there are
element declarations).  For example, there are four distinct attribute
definitions called <b>align</b> and five distinct attribute definitions
called <b>type</b> in the <a href=3D"http://www.w3.org/TR/xhtml1/DTD/xhtml1=
- -transitional.dtd">XHTML transitional DTD</a>.  W3C XML Schema not only has=
 a richer set of what it calls "symbol spaces", so that there are seven thi=
ngs whose definitions can be named (it adds types, attribute and element gr=
oups, notations and identity-constraints alongside elements and attributes=
), it also allows elements as well as attributes to be defined in context.<=
/p>
   <p xmlns=3D"">Finally we should note that a language may encompass quite=
 a
range of variation in terms of the things it assigns a particular expanded =
name
to.  There can be variation over time, as new versions of a language are re=
leased,
and even alternative variants released at the same time.  The HTML
<code>P</code> element has a long and complex history, and even the XHTML
<code>p</code> element has three distinct variants in version 1.0 (strict,
transitional and basic), none of which is exactly the same as the one in ve=
rsion 1.1.</p>
   <p xmlns=3D"">None of this should come as a surprise.  Ordinary language=
 uses names in
ways which are both ambiguous and context-determined, and whose use changes
over time.  But its consequences for the Web are more serious,
particularly as
we consider the use of names for things on the Web intended for automatic
processing, where appeal to context for disambiguation may not be
straightforward at all.  At the very least it is clear that it is no longer
trivial to specify an approach to
constructing URIs for things which will cover all the cases just discussed.=
</p>
=20=20
=20=20
   <h2 xmlns=3D"">5.=20
   =C2=A0
   <a name=3D"abstractions">What abstractions to choose</a></h2>
   <p xmlns=3D"">Broadly speaking there are three ways one could respond to=
 the situation
outlined above:</p>
   <ol xmlns=3D"">
    <li>Only expect to have a systematic approach to naming things with URI=
s when the
language or application involved has a single flat story about naming (e.g.=
 <a href=3D"http://www.w3.org/TR/SVG/">SVG</a>, <a href=3D"http://www.w3.or=
g/TR/2004/REC-rdf-syntax-grammar-20040210/">RDF</a>).=20
Abstract over variations.  We might call this the <a name=3D"simple"><b>sim=
ple</b></a> (or
<b>simplistic</b>) view.</li>
    <li>Demand a systematic approach in all cases, and over all variations,
but acknowledge that this means that in complex cases (e.g. WSDL, XML Schem=
a)
the resulting URIs will themselves be complex, requiring new media types an=
d/or using new XPointer
schemes.  We might call this the <a name=3D"rich"><b>rich</b></a> (or
<b>overkill</b>) view, exemplified by <a href=3D"http://www.w3.org/TR/2005/=
WD-xmlschema-ref-20050329/">XML
Schema: Component Designators</a>.</li>
    <li>Look for a middle ground, which adopts the <a href=3D"#simple">simp=
le</a>
view wherever possible, otherwise an approximation to it which abstracts
over all variation and as much application-specific detail as possible, with
the option to fall back to the <a href=3D"#rich">rich</a> view as and when
this is necessary.  We might call this the <a name=3D"middle"><b>middle</b>=
</a> (or <b>80/20</b>) view.</li>
   </ol>
   <p xmlns=3D"">It's important to note that there's an unspoken assumption
common to all
three of the above views:  We're going to construct the URI for some named =
thing by adding
some variety of fragment identifier to the namespace name of its expanded n=
ame.
There is no space here for the possibility that two distinct languages might
use the <i>same</i> expanded name for two evidently distinct things.=20
This is intimately bound up with another assumption with respect to variati=
on,
namely that it's possible to tell reliably when a change in something count=
s as
a variation, as opposed to a fundamental change of identity.  If I change t=
he
named definition of a type by nudging its min or max a bit, that pretty
clearly just produces a variant of the same type.  But if I change the
definition assigned to a name from being an integer to being a date, it's
equally pretty clear that that's no longer the same type at all.  Those are=
 the
easy cases; there will be many which are much harder to call.</p>
   <p xmlns=3D"">I expect that
both of these assumptions will want to be recast as Good Practice notes goi=
ng
forward (e.g. "Don't use the same expanded name for two different things of=
 the
same sort in different languages under your control"; "As a language evolve=
s,
use new expanded names for new things, don't recycle old ones").</p>
=20=20
=20=20
   <h2 xmlns=3D"">6.=20
   =C2=A0
   <a name=3D"details">More details on the <a href=3D"#middle">middle</a> g=
round</a></h2>
   <p xmlns=3D"">Without more detailed examination of real usage scenarios,=
 it's hard to
be sure of what general principles to establish here, but on the basis of my
limited experience to date it seems likely that something along the followi=
ng
lines is a reasonable starting point.</p>
   <p xmlns=3D"">It's up to the owner of a language, for each of the namesp=
aces involved
in that language, to provide a constructive definition of the way in which
things which have expanded names can also be named with URIs.  I've identif=
ied
the following guidelines for such definitions:</p>
   <ul xmlns=3D"">
    <li>Use the namespace URI as the basis of the constructed name;</li>
    <li>Where part of the complexity of a language's name structure comes
from giving expanded names to more than one sort of thing, include the so=
rt in
the URI;</li>
    <li>Where evolution over time and/or simultaneous language variants are=
 a
possibility, be clear that simple URIs are <i>not</i> capable of
capturing this;</li>
    <li>Try to provide retrievable representations so that the namespace
URI(s) you construct a) have a widely used media type and b) yield a useful
result when the fragment identifier is resolved.</li>
   </ul>
=20=20
=20=20
   <h2 xmlns=3D"">7.=20
   =C2=A0
   <a name=3D"example">The W3C XML Schema example</a></h2>
   <p xmlns=3D"">The <a href=3D"http://www.w3.org/2001/tag/2005/06/14-16-mi=
nutes.html#item031">position</a> that emerged at the end of the recent TAG =
f2f is consistent with the above guidelines, but obviously lacking in detai=
l.  On balance my preferred approach would look something like this:</p>
   <blockquote xmlns=3D""><div>URI names are provided for everything define=
d or declared by name
at the top level which has some conceptual identity independent of the det=
ails
of W3C XML Schema, i.e. elements, attributes and simple and complex types.<=
/div></blockquote>
   <blockquote xmlns=3D""><div>The URI name for something of one of the abo=
ve four sorts is
constructed by concatenating the namespace name of its expanded name, a
<code>/</code> if that does not already end with one, its sort
(i.e. <b>attribute</b>, <b>complexType</b>, <b>element</b>
or <b>simpleType</b>), a <code>/#</code> and the local name of its
expanded name.</div></blockquote>
   <blockquote xmlns=3D""><div>URI names for languages which don't use name=
spaces are
based on a URI designated for the purpose in the language specification, e.=
g.
<a href=3D"http://www.w3.org/2002/xmlspec/">http://www.w3.org/2002/xmlspec/=
</a> for the W3C's 'specprod' language.</div></blockquote>
   <p xmlns=3D"">It would be the responsibility of language owners to provi=
de retrievable
representations of resources at each sort-determined sub-URI of the namespa=
ce
URI to make this work (but see httpRange-14 below under <a href=3D"#issues"=
>Outstanding issues</a>).</p>
   <p xmlns=3D"">So for example the URI for the W3C XML Schema's own dateTi=
me type would be</p>
<blockquote xmlns=3D""><div><pre class=3D"code">http://www.w3.org/2001/XMLS=
chema/simpleType/#dateTime</pre></div></blockquote>
   <p xmlns=3D"">and perhaps, for the DAML+OIL example cited in
<a href=3D"http://lists.w3.org/Archives/Public/www-xml-schema-comments/2005=
JanMar/0080.html">Dan Connolly's feedback</a>, we would get the following
('perhaps' because there's no namespace involved in the example as
published):</p>
   <blockquote xmlns=3D""><div><pre class=3D"code">http://www.w3.org/TR/200=
1/NOTE-daml+oil-walkthru-20011218/simpleType/#over12</pre></div></blockquot=
e>
   <p xmlns=3D"">(My inspiration for this approach is at least in part the =
IANA
structuring of their <a href=3D"http://www.iana.org/assignments/media-types=
/">registry of media types</a>, which gives us e.g.</p>
   <blockquote xmlns=3D""><div><pre class=3D"code">http://www.iana.org/assi=
gnments/media-types/application/mathematica</pre></div></blockquote>
   <p xmlns=3D"">for <code>application/mathematica</code> (although irritat=
ingly gives us
nothing for e.g. <code>text/html</code>).</p>
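   <p xmlns=3D"">The construction rule quoted above can be sketched in
Python as follows.  This is only an illustration of the concatenation
described, assuming that no part needs escaping; the function name
<code>uri_name</code> and the <code>SORTS</code> tuple are hypothetical and
not part of any W3C specification.</p>

```python
# The four sorts named in the proposal above.
SORTS = ("attribute", "complexType", "element", "simpleType")


def uri_name(namespace_name: str, sort: str, local_name: str) -> str:
    """Build a URI name from an expanded name and a sort (hypothetical helper)."""
    if sort not in SORTS:
        raise ValueError(f"unknown sort: {sort!r}")
    # Add a '/' only if the namespace name does not already end with one.
    if namespace_name.endswith("/"):
        base = namespace_name
    else:
        base = namespace_name + "/"
    return base + sort + "/#" + local_name


print(uri_name("http://www.w3.org/2001/XMLSchema", "simpleType", "dateTime"))
```

   <p xmlns=3D"">Called as shown, the sketch reproduces the
<code>dateTime</code> URI given above.</p>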
=20=20
=20=20
   <h2 xmlns=3D"">8.=20
   =C2=A0
   <a name=3D"issues">Outstanding issues</a></h2>
   <p xmlns=3D"">This is by no means a fully-baked story.  Some things I <i=
>know</i>
are shaky are</p>
   <dl xmlns=3D"">
    <dt><b>httpRange-14</b></dt><dd>The TAG's recent resolution of this iss=
ue leaves
the question of what sort of resource a namespace URI identifies, and wheth=
er you should be able to
retrieve any representation of it at all, very much up in the air.  The
knock-on implications of this wrt fragment identifiers, sub-URIs, etc. are =
even
more unclear.</dd>
    <dt><b>Schema Component Designators</b></dt><dd>As presented there is a=
 complete
disconnect between this story and SCDs.  Maybe that's the best that we can =
do,
but it would certainly be better if we could get a solution which shared mo=
re.</dd>
    <dt><b>Languages vs. namespaces</b></dt><dd>This notion of a language a=
s distinct
from a namespace is only just (at least for me) in the process of being w=
orked
out.  It may yet be the case that we would do better to use some kind of
'language URIs' as the base, rather than namespace URIs.  The continued
widespread use of languages such as Docbook which don't use namespaces
shouldn't be ignored.</dd>
   </dl>
=20=20
=20
</body></html>
- --=-=-=


- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

- --=-=-=--
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFCvDSkkjnJixAXWBoRAif2AJ4kn61mYZdu/9uaGZqbSP693gQxlgCfeCN3
VyR0Ki0Hv81rraWEn5WaPro=
=2jFt
-----END PGP SIGNATURE-----

Received on Friday, 24 June 2005 16:28:30 UTC