What is a namespace, anyway?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Seems like there was a problem in the background of our discussion of
namespaceState-48 which is worth foregrounding: we don't have a
consensus understanding of what a namespace is, independently of or
prior to our understanding of _XML_ namespaces (please work hard to
understand the word 'namespace' in the rest of this message in that
general, not-restricted-to-XML, sense).

Wikipedia [1] says (of *Namespace (computer science)*):

 "A namespace is a context for identifiers."

What that means, I'm pretty confident, is that any discussion of an
identifier is incomplete/underspecified unless it specifies a
context, that is, a namespace.

Turning for a moment to identifiers, we can take either a top-down or
a bottom-up view: Bottom-up, the computer I'm typing on has an
identifier, 'erasmus'.  And I myself have a number of identifiers,
such as 'Henry Swift Thompson', my UK National Insurance number, my US
SSN, etc.  The function which I'll run when I'm ready to send
this message has the identifier 'message-send-and-exit'.  In each
case, the context for the above identifiers is pretty clear.  The
first of those contexts itself has a clear-cut nsid
(i.e. 'uk.ac.ed.inf'), but the others don't.

[I'm using the name 'nsid' to denote the identifier of a namespace, in
order to avoid confusion with the identifiers for which a namespace is
the context.]

Top-down, my previous postings about computer languages and XML [2] [3]
provide examples -- it is a property of Python as designed that each
class is a namespace, that is, provides a context for a set of names,
and that each method _within_ a class corresponds to a further
namespace.
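
A minimal Python sketch of that top-down picture.  The class and
attribute names (Mailer, Crawler, retries) are hypothetical; only the
language behaviour is the point.  It shows the same identifier meaning
different things in two class contexts and in one method context.

```python
# Hypothetical names throughout; only the language behaviour is being shown.

class Mailer:
    retries = 3                  # 'retries' in the context of class Mailer

    def send(self):
        retries = 0              # a different 'retries', local to this method
        return retries


class Crawler:
    retries = 10                 # same identifier, different class context


# The class namespace is inspectable as a mapping from names to objects.
print(sorted(n for n in vars(Mailer) if not n.startswith("__")))
# -> ['retries', 'send']

# The method's own local names are recorded on its code object.
print(Mailer.send.__code__.co_varnames)
# -> ('self', 'retries')

# Same identifier, three contexts, three answers.
print(Mailer.retries, Crawler.retries, Mailer().send())
# -> 3 10 0
```

Python also happens to expose a dotted name for the method context
(Mailer.send.__qualname__ is 'Mailer.send'), which is the closest
thing to an nsid in this example.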

Some tentative observations:

 1) Identifiers need not identify anything -- if we consider the
    namespace of SSNs, it's clear that it includes both numbers which
    once did, or still do today, identify individuals, and also
    numbers which don't, either because they were issued in error or
    because they haven't (yet) been used.

 2) It doesn't follow from anything I've said _yet_ that identifiers
    are unique in their context.  There are three people named 'John'
    within a few tens of metres of me as I type, and 'John Brown'
    identifies at least three distinct members of staff at the
    University of Edinburgh.

    Most systematic namespaces, that is, ones defined top-down, do
    eventually narrow things down to a point where uniqueness is
    guaranteed, but they _don't_ always provide nsids all the way
    down.

    For example, we could start out by observing that the Java language
    spec. defines three kinds of namespace, each of which has a
    well-defined nsid, as follows:

     Context     Things identified by name therein

     Package     Class
     Class       Class, Method, Variable
     Method      Variable

    As noted in the earlier email [2], the context established by a
    Java class is not itself a namespace within which identifiers are
    necessarily unique.  We can _describe_ three as-it-were
    sub-namespaces of that namespace (the middle row above): "the
    namespace for classes within a class", "the namespace for methods
    within a class" and "the namespace for variables within a class".
    Within _those_ namespaces identifiers _are_ unique, but
    interestingly Java doesn't give us a well-defined way to
    _assign nsids_ to those namespaces.
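
The Java observation can be modelled in a few lines of Python.  This
is a sketch under stated assumptions, not a description of how any
Java compiler represents scopes: the Context class, its method names
and the 'org.example.Account' nsid are all hypothetical.  It also
ignores method overloading, where it is the signature rather than the
bare name that is unique.

```python
from collections import defaultdict


class Context:
    """A context for identifiers, split into per-kind sub-namespaces.

    Hypothetical model for illustration only.
    """

    def __init__(self, nsid):
        self.nsid = nsid                    # the context itself has an nsid
        self._by_kind = defaultdict(dict)   # kind -> {identifier: thing}

    def declare(self, kind, identifier, thing):
        sub = self._by_kind[kind]
        if identifier in sub:               # unique within a sub-namespace
            raise ValueError(f"duplicate {kind} {identifier!r} in {self.nsid}")
        sub[identifier] = thing

    def lookup(self, identifier):
        """Everything the bare identifier names in this context."""
        return {kind: sub[identifier]
                for kind, sub in self._by_kind.items() if identifier in sub}

    def lookup_in(self, kind, identifier):
        """Unambiguous lookup: the kind selects the sub-namespace."""
        return self._by_kind[kind][identifier]


cls = Context("org.example.Account")
cls.declare("variable", "balance", "int field")
cls.declare("method", "balance", "int balance()")

print(cls.lookup("balance"))              # two things share one identifier
print(cls.lookup_in("method", "balance"))  # unique once the kind is given
```

Note that the sub-namespaces are only reachable by passing a kind
alongside the nsid.  Nothing in the model, as in Java, gives "the
methods of org.example.Account" an nsid of its own.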

What does this have to do with the architecture of the Web, and the
namespaceState-48 issue?

First of all, we can observe that _XML_ namespaces as defined fit with
the story given above, as does the draft finding [4].  XML namespace
names are nsids, and XML namespace local names are identifiers.

It's worth noting that, with respect to point (2) above, the XML
namespaces REC as it stands does _not_ require identifiers in an XML
namespace to be unique.  We can furthermore see that within the XML
namespace _identified_ by an XML Namespace name there may be some
number of unidentified sub-namespaces within which identifiers _are_
unique, parallel to the Java case discussed above.
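
One way to see the non-uniqueness concretely, using Python's standard
xml.etree.ElementTree.  The namespace name 'urn:example:ns' and the
vocabulary are hypothetical, and taking "element names" and "global
attribute names" as the sub-namespaces is one reading of the point
above, not something the REC normatively defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name and vocabulary, chosen only for illustration.
doc = ('<x:title xmlns:x="urn:example:ns" x:title="short">'
       'What is a namespace?</x:title>')
root = ET.fromstring(doc)

# ElementTree reports qualified names as '{namespace name}local name'.
element_name = root.tag
attribute_names = list(root.attrib)

print(element_name)        # -> {urn:example:ns}title
print(attribute_names)     # -> ['{urn:example:ns}title']

# The same (nsid, identifier) pair names an element and an attribute.
print(element_name == attribute_names[0])   # -> True
```

The document is well-formed and namespace-valid, yet the pair
('urn:example:ns', 'title') alone does not say whether the element or
the attribute is meant.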

Second of all, the question does naturally arise as to how the above
analysis fits with the WebArch imperative to name things with URIs.
Our analysis gives us two problems:

 1) Some namespaces don't automatically come with nsids;
 2) Not all namespaces guarantee uniqueness for their identifiers.

Even when we have a namespace with a well-defined nsid and a
uniqueness guarantee therein, there are at least three further things
in the way of mapping to URIs by the most transparent means, i.e. the
mapping which looks like this:

  URI(identifier in context of a namespace) ==
           URI(nsid) #? identifier

 3) The nsid itself may not map directly to a valid URI;

 4) The identifier may not be a valid fragment id per RFC 3986, or for
    the media type associated with the information resource identified
    by URI(nsid);

 5) URI(nsid) may not match the following production, which I wish
    were available in RFC 3986 [5]:

    core-URI = scheme ":" hier-part [ "#" ]

    That is, there's no straightforward way of gluing the two parts
    together to get a valid URI.
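
The transparent mapping and obstacles (3) to (5) can be sketched in
Python.  The function name glue and the example URIs are hypothetical.
The core-URI test is only an approximation (it checks for a scheme
and for the absence of a query or non-trailing '#', not the full
hier-part grammar), and the media-type half of (4) is not checked.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")
# fragment = *( pchar / "/" / "?" ), from the RFC 3986 collected ABNF
FRAGMENT = re.compile(r"^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*$")


def glue(nsid_uri, identifier):
    """Return URI(nsid) #? identifier, or raise ValueError naming the obstacle."""
    # The '#?' in the mapping: reuse a trailing '#' if the nsid URI has one.
    base = nsid_uri[:-1] if nsid_uri.endswith("#") else nsid_uri
    # Obstacles (3) and (5), approximately: a scheme is required, and there
    # must be no query and no fragment beyond an optional bare trailing '#'.
    if not SCHEME.match(base) or "#" in base or "?" in base:
        raise ValueError(f"nsid URI does not match core-URI: {nsid_uri!r}")
    # Obstacle (4), syntactic half only.
    if not FRAGMENT.match(identifier):
        raise ValueError(f"not a valid RFC 3986 fragment: {identifier!r}")
    return base + "#" + identifier


print(glue("http://example.org/ns", "title"))
print(glue("http://example.org/ns#", "title"))
```

Both calls yield 'http://example.org/ns#title'.  An nsid such as
'uk.ac.ed.inf' fails the scheme test, 'http://example.org/ns?v=1'
fails the core-URI test, and an identifier such as 'my name' fails the
fragment test.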

It's worth noting that, as specified, XML namespaces cannot suffer
from problems (1), (3) or (4), but they are vulnerable to (2) and (5).

That's enough for one posting -- I'll return to the still-open
question as to where RDF's notion of namespace fits in all this in a
subsequent message.

ht

[1] http://en.wikipedia.org/wiki/Namespace_%28programming%29
[2] http://lists.w3.org/Archives/Public/www-tag/2005Dec/0065.html
[3] http://lists.w3.org/Archives/Public/www-tag/2005Dec/0070.html
[4] http://www.w3.org/2001/tag/doc/namespaceState-2005-12-16.html
[5] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#collected-abnf
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFDqaDDkjnJixAXWBoRAm/7AJ90Y23mpMVSzEvFy/luxDDel4M6TACeJHfZ
PREICi7Y/Rp2kdT0Nfcf2AM=
=AlKS
-----END PGP SIGNATURE-----

Received on Wednesday, 21 December 2005 18:36:58 UTC