Re: How namespace names might be used from Simon St.Laurent on 2000-06-16 (xml-uri@w3.org from June 2000)

From: Simon St.Laurent <simonstl@simonstl.com>
Date: Fri, 16 Jun 2000 14:11:48 -0400
To: xml-uri@w3.org
Message-Id: <200006161809.OAA26389@hesketh.net>
At 10:42 AM 6/16/00 -0500, Al Gilman wrote:
>The problem is both sides are assuming there is a 1:1 relationship and are
>arguing over how to define it.  There is no answer in that space.  There is
>no 1:1 relationship between namespaces and languages, between Qnames and
>element types.

I'm afraid it's a little more complex than that - not everyone is aiming
for a 1:1 relationship.  Many-to-one and one-to-many relationships are
important possibilities that I think have been somewhat lost in previous
discussions of architectural planning.

>The actual operational requirement is for the lower layers to distinguish
>by namespace within a document and for the upper layers to associate by
>language across documents and between documents and processors.  In
>particular note that the namespace does not necessarily uniquely identify
>the language nor definitively identify the types and attributes so named.

Where does this 'operational requirement' come from?

I think it's reasonable to assert that 'distinguish' is the only part we're
arguing over in the 'absolutize' | 'forbid' | 'literal' discussion.
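To make the three options concrete, here is a minimal Python sketch of how a
lower layer might decide whether two namespace declarations name the same
namespace under each policy.  The function name same_namespace and the policy
labels are hypothetical, and urllib.parse.urljoin stands in for resolving a
relative URI reference against a document's base URI; this is an
illustration, not code from any of the proposals.

```python
from urllib.parse import urljoin, urlparse


def is_relative(ns_name: str) -> bool:
    # A URI reference with no scheme (e.g. "foo", "../x/foo") is relative.
    return urlparse(ns_name).scheme == ""


def same_namespace(ns_a: str, base_a: str, ns_b: str, base_b: str, policy: str) -> bool:
    """Hypothetical check: do two declarations name the same namespace?

    ns_a/ns_b are xmlns attribute values; base_a/base_b are the base URIs
    of the documents they appear in.  policy is "literal", "absolutize"
    or "forbid" (labels assumed for illustration).
    """
    if policy == "literal":
        # Character-by-character comparison; the base URI plays no part.
        return ns_a == ns_b
    if policy == "absolutize":
        # Resolve each reference against its own document's base first.
        return urljoin(base_a, ns_a) == urljoin(base_b, ns_b)
    if policy == "forbid":
        # Relative namespace names are an error; absolute ones compare literally.
        for ns in (ns_a, ns_b):
            if is_relative(ns):
                raise ValueError(f"relative namespace name not allowed: {ns!r}")
        return ns_a == ns_b
    raise ValueError(f"unknown policy: {policy!r}")
```

Under "literal", the string foo names one namespace everywhere; under
"absolutize", foo in http://a.example/x/doc1.xml and foo in
http://b.example/y/doc2.xml become different namespaces, while foo and
../x/foo can coincide; under "forbid", relative references are simply
rejected, so the question never arises.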

>A Qname does not completely identify an element type or attribute for all
>XML processing.  Qnames suffice to keep lower level processing from
>identifying types occurring in one document which should be distinguished.
>But they do not suffice to identify the element type or attribute for all
>purposes, i.e. across documents and in the matching of documents to
>processors.  For this, the full language definition is in general required.
> The Qname is sufficient when used as an index into the language
>definition, but not by itself because it is legal (and widely done) to
>reuse Qnames in related dialects, viz: HTML.

I'm not sure the Qname should be taken as 'an index into the language
definition' unless you take a very very very broad understanding of
languages, one that allows for synonyms, loan words, and overlaps between
languages.

>Would you consider asking if a language is a namespace?

I'm not sure if that's even a relevant question when it's clear that the
parties involved have different understandings of 'language' in addition to
'namespace'.

>The issue of whether the leaf-level element types and attributes in this
>document are the same as those in another document is not a question of
>syntax, but of usage.  It is a question "is the language in use here and
>there the same?"  To compare across documents, you have to compare
>languages, not namespaces.

In my understanding of language, that doesn't make sense.  To compare
across documents, I'd compare context and understanding, not necessarily
'language'.  There's too much contingency involved here for me to accept
any overarching 'language' concept.

>Element and attribute names, uttered within markup, are not atoms, but
>indices into some language schema.  This schema may or may not be
>represented in a document, but the case where there is such a document and
>there are constraints as well as tokens associated with the nodes in the
>InfoSet for that language has to be included a priori.

Great.  So we're completely and utterly on different tracks.  To me,
element and attribute names are atoms, and the 'language schemas' are
occasionally useful tools.

>Namespaces are OK for sorting things out locally, but namespace processing
>does not yield a conclusive answer to the cross-document comparison of the
>markup.  

All it should do is provide a full name for the element or attribute.
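As an illustration of that "full name" (assuming Python's standard
xml.etree.ElementTree, which reports each name as "{namespace}local"), two
different prefixes bound to the same namespace name yield the same full name,
and the prefix itself disappears.  The document and namespace URIs below are
made up for the example.

```python
import xml.etree.ElementTree as ET

# Two prefixes ("h" and "x") bound to the same namespace name,
# plus a third prefix ("o") bound to a different one.
DOC = """<root xmlns:h="http://www.w3.org/1999/xhtml"
               xmlns:x="http://www.w3.org/1999/xhtml"
               xmlns:o="http://example.org/other">
  <h:p/>
  <x:p/>
  <o:p/>
</root>"""

root = ET.fromstring(DOC)

# Each child's tag is its expanded name; the prefix used in the markup is gone.
names = [child.tag for child in root]
print(names)
# ['{http://www.w3.org/1999/xhtml}p', '{http://www.w3.org/1999/xhtml}p', '{http://example.org/other}p']
```

The name is just a (namespace name, local name) pair; nothing in it says
which "language" the element belongs to.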

>The upper layers need to know and care about what language is being used
>where those names are being used.  The lower layers just need to build an
>compliant infoset structure.

Agreed on the lower layers, but I'm not sure the upper layer dream is
viable, for reasons stated above.

>Assuming that an element type Qname, or a namespace of them, is an
>ontological atom, in a space with a discrete topology, breaks the orderly
>allocation of functions between these two layers.  The type-name of an
>element, even when qualified as to namespace, does not fully identify its
>type.  It merely indexes _which type in the language_ is indicated.
>Without knowing the language context, the type is undefined.

Er... why are you so concerned with defining types on that broad a scale?
I'm not sure it's necessary, and suspect the underlying approach is
severely compromised by this dream of 1:1 mapping you described above.

>In the upper-layer processing, the same set of InfoSet nodes that has been
>segregated "by namespace" in the lower layers needs to be handled as bound
>to a particular language definition, a distinction finer than the
>namespacing done by the lower layers.  It's the same filter of the InfoSet,
>only the identification is refined.

At this point, I think we've got almost nothing in common in our upper
layers, so maybe we'd better focus strictly on the lower layers.
(Something like respecting the 'signifier/signified' distinction, and
talking only about signifiers.)

>The upper layers refine the identification of what that filter of nodes is
>associated with.  "A namespace" is just the starting point.

I don't see a namespace - I see lots of elements and attributes marked with
namespace identifiers.  Seems to me like a much more practical starting point.

>The lower layers should not need nor presume to recognize the namespace.
>Only distinguish the different namespaces appearing in one parse or one
>document.

We agree, for once!  (There's still the ugly question of how to
distinguish...)

>Match patterns in stylesheets refer to names in the space of the document
>that is being style-processed.  They are name acceptors, not name creators.

Okay.

>Common processing of "the same names in different documents" should not be
>automatic.  Only common processing of "the same language in different
>documents."  That is to say common processing above the layer that builds
>the InfoSet.

I'm not sure "common processing of 'the same language in different
documents'" is coherent.  I'm much happier with common processing of the
same names in different document contexts.

>There is no reason why an identifier of the language could not be used as
>the discriminator in lower-layer processing of a namespaced filter of
>markup within a document.  Conversely, there is also no reason why the
>language in that filter should not be identified incrementally by separate
>namespace and schema location indications.

I'd argue that the reuse of the identifier by the upper layers shouldn't
bind the lower layer to additional processing that goes well beyond its
needs or the needs of alternative upper layers.

>[Wave DRUMS flag - strict in what you transmit, loose in what you accept]
>
>In the "how many namespaces for XHTML" debate we realized that it was
>useful to have two characterizations of the language in a document: a
>general characterization and a precise characterization.  The analogy to
>MIME type/subtype nomenclature is strong.  The casual processor only needs
>to know that the document is HTML; a validating parser needs to know what
>technical definition down to the jot and tittle you are using as a
>reference for this HTML.

I think the way XHTML/HTML are actually being processed in XML contexts in
browsers is on an element-by-element basis, without any strong attachment
to 'this is the HTML language'.  Even Microsoft's heavy-duty binding of the
html: prefix to HTML elements, and its apparent preference for HTML as
documents and XML as data, retains enough flexibility to let XML developers
get a lot of work done using HTML atoms, without grave concerns for the
integrity of the HTML 'language'.

>Different processors need to know different levels of precision in
>identifying the language that they are processing.

But those levels of precision aren't bound to particular layers of
processing.  My personal vision of XML processing doesn't talk about
'language'.  It talks about atoms and their occasional relations.  I know
I'm not alone in that.

>Language identification is not atomic.  It is at least as rich as Boolean.
>
>Given the rich lattice of sublanguages it is impractical to assume that the
>coarse and fine descriptions of the language in use in a particular
>namespaced filter of markup in a particular document are the same.  So the
>atomic solution where the language identification is atomic and is used as
>the discriminant for the namespace (a.k.a. namespace name) is not
>practical.  It is a bad fit to the actual need, as the HTML example
>demonstrates.
>
>The ns-attr and schema location attribute give us a mechanism to indicate
>both a coarse and fine description of the language in use within a local
>namespaced filter of markup.   [not necessarily canonical, but workable]

Give whom a mechanism?  Lots of people aren't looking for language
descriptions per se.  A cluster of atomic descriptions might be handy, but
there's no reason for it to be limited to descriptions of atoms sharing a
single namespace.

>The actual operational requirement is for the lower layers to distinguish
>by namespace within a document and for the upper layers to associate by
>language across documents and between documents and processors.  In
>particular note that the namespace does not necessarily uniquely identify
>the language.

As noted above, I don't think 'requirement' is necessary.

>The layering of processing needs to provide for this progressive refinement
>in the identification of the types used in the markup.

There is no 'progressive refinement' except in certain systems which like
to think of themselves as 'progressive' and/or 'refined'.  I'd suggest we
stop applying such value judgments to different processing models, and get
over this obsession with 'language'.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
http://www.simonstl.com - XML essays and books
Received on Friday, 16 June 2000 14:09:35 UTC