Re: essential test cases? from keshlam@us.ibm.com on 2000-06-14 (xml-uri@w3.org from June 2000)

From: <keshlam@us.ibm.com>
Date: Wed, 14 Jun 2000 14:22:42 -0400
To: Dan Connolly <connolly@w3.org>
cc: xml-uri@w3.org
Message-ID: <852568FE.0064F0C9.00@D51MTA03.pok.ibm.com>
Pure brainstorming on how the Namespace decision might impact the DOM:

DOM Level 2 currently assumes what amounts to the Literal interpretation --
the namespace name is just stored and retrieved as a string.

Forbid would not require any redesign, since absolute URIRefs can also be
stored and compared literally. We _could_ add a syntax-check to make sure
that attempts to use relative syntax were caught and rejected, or we could
leave that to the caller, or we could leave that as a
quality-of-implementation issue.
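
For what it's worth, the optional syntax check could be as small as the sketch below. This is Python rather than any DOM binding, and both function names are made up for illustration. It assumes that "has a scheme" is a good enough test for "absolute", which is cruder than a full URI grammar check.

```python
from urllib.parse import urlsplit


def is_absolute_uri_ref(name: str) -> bool:
    """Return True if the name begins with a URI scheme such as "http:" or "urn:"."""
    return bool(urlsplit(name).scheme)


def check_namespace_name(name: str) -> str:
    """Hypothetical Forbid-style guard: reject relative namespace names."""
    if not is_absolute_uri_ref(name):
        raise ValueError(f"relative namespace name rejected: {name!r}")
    # Accepted names are stored and compared literally, exactly as today.
    return name
```

Whether a check like this runs inside the DOM or is left to the caller is exactly the open question above; the code is the same either way.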



Absolutize opens a few cans of worms and seems to require some real redesign
work.  Note that this all has storage/computation implications which may
impact the suitability of the DOM as a model for some tasks.

1) What are we absolutizing in terms of? DOM L2 doesn't guarantee that the
base URI can be retrieved. If you have Entity Reference nodes, you can
search upward until you find one; if you hit the top of tree, you can then
assume the base URI of the Document. But folks insisted that the DOM allow
"flattening" of Entity References; if that's done, these wrapper nodes are
discarded and there's no good place to hang the context-change information.
This also runs into the annoying question of what the base URI is of an
entirely synthetic document and how to absolutize in that case.

2) Who's absolutizing? Arguably, since the absolute URI is "the real
namespace identity" in this scenario, nobody should be asking the DOM to
deal with anything else. Among other things, asking us to recheck a name
that's been previously checked is a waste of cycles. On the other hand, if
someone went to the trouble of specifying a relative name they might expect
to be able to use it even though it _ISN'T_ the real name.

3) What's stored? For round-tripping purposes, a namespaced node would have
to know both the absolute URI (because that would be the "real" namespace
identity that lookups and attribute-conflict detection would want to use)
and the "as typed" form (because when we serialize the DOM to XML syntax,
we can _NOT_ lose the fact that it was relative, since that would nail down
an interpretation that the document's author explicitly decided to leave
fuzzy). Nodes probably do NOT want to carry more pointers than they have
to, and we don't want to risk these strings getting out of synch (last
thing you want is to let someone claim that http://foo/bar/baz has the
serialized form "../somethingElse"!)... which seems to suggest that
namespace declarations now become objects in their own right, carrying both
strings. That means an additional layer of indirection each time you want
to check the namespace, either in your own code or inside the DOM. And it
seems to be more complexity than the Infoset is tracking....

4) Inconsistent serialization can result. What happens if you have a node
whose as-written namespace is "../foo", and DOM editing moves/copies it to
a portion of the tree where a different base URI is in effect, followed by
writing this out as XML syntax? As far as I can tell, the answer is that we
output the relative form and simply accept the fact that when the document
is read back in the node will be in a different namespace -- just as if the
whole document had been relocated in the meantime. That's ugly but
apparently inherent in relative namespaces.
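
A rough sketch of how points 1, 3 and 4 might fit together, in Python rather than any DOM binding. Everything here is hypothetical: the Node class is a toy and not the DOM Level 2 Node interface, and NamespaceDecl, effective_base_uri and declare are names invented for illustration. The example.org URIs are placeholders, and the relative name is written "../foo" with a forward slash. It assumes base URIs are recorded only on the document and on entity-reference wrappers.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass
class Node:
    """Toy tree node (hypothetical, not the DOM Level 2 interface)."""
    name: str
    parent: Optional["Node"] = None
    base_uri: Optional[str] = None     # only on the document / entity-reference wrappers
    ns_as_typed: Optional[str] = None  # namespace name exactly as written


def effective_base_uri(node: Node) -> Optional[str]:
    """Point 1: search upward for the nearest node that records a base URI."""
    current: Optional[Node] = node
    while current is not None:
        if current.base_uri is not None:
            return current.base_uri
        current = current.parent
    return None  # entirely synthetic document: nothing to absolutize against


@dataclass(frozen=True)
class NamespaceDecl:
    """Point 3: one object carrying both strings so they cannot get out of synch."""
    as_typed: str  # what serialization must write back out
    absolute: str  # what lookups and conflict detection compare


def declare(node: Node) -> Optional[NamespaceDecl]:
    """Absolutize the node's as-typed namespace against its current context."""
    base = effective_base_uri(node)
    if node.ns_as_typed is None or base is None:
        return None
    return NamespaceDecl(node.ns_as_typed, urljoin(base, node.ns_as_typed))


# Point 4: the same as-typed name lands in a different namespace after a move.
doc = Node("#document", base_uri="http://example.org/a/b/doc.xml")
entity = Node("&chapter;", parent=doc, base_uri="http://example.org/x/y/z/part.xml")
elem = Node("item", parent=doc, ns_as_typed="../foo")

before = declare(elem)   # absolute: http://example.org/a/foo
elem.parent = entity     # DOM editing moves the node under another base URI
after = declare(elem)    # absolute: http://example.org/x/y/foo
```

Note that if the entity wrapper were flattened away, elem would silently fall back to the document's base URI, which is the lost-context problem from point 1. The extra hop through NamespaceDecl is the indirection cost from point 3.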



Undefined opens a somewhat different can of worms: It isn't clear that a
definition could be installed later without breaking significant numbers
of documents/code when an interpretation does become agreed upon... and
we've just seen how much pushback there can be over a behavioral change.


______________________________________
Joe Kesselman  / IBM Research
Received on Wednesday, 14 June 2000 14:23:27 UTC