Re: A new proposal (was: Re: which layer for URI processing?)

From: David G. Durand <david@dynamicdiagrams.com>
Date: Wednesday, May 24, 2000 5:39 PM
[...]
>>Suppose I use XSLT to filter a document to ensure it doesn't have
>>any of an http://example.com/detonator namespace in it, because processing
>>this would allow the document to destroy the chemical plant.
>>The XSLT sees "/detonator" in an incoming document
>>http://example.com/doc.xml
>>but it does not notice it, as it does not absolutize it. The checked result
>>is passed to the main control system. However, when
>>this "upper layer" runs it absolutizes it to find out what in upper layer
>>terms it really means, and
>>instantiates a chemical plant handler to handle
>>http://example.com/detonator.
>>Bang.
>
>This is an excellent example of why software should be able to avoid
>absolutizing URIs, because that process depends on knowledge of a
>base URI, and this is demonstrably fragile information, especially
>with regard to the unique identification of singular objects. The
>use cases for relative URIs for namespaces all revolve around the
>rare applications that can deal with namespaces that are not globally
>unique.


No, this is not such an example.  The chemical plant did not
blow up because of a "fragile" base URI.  It blew up because the
base URI which was clear to all parties was not used to absolutize the
relative URI. It blew up because the definitions of identity used by the "upper
layers" and the "lower layers" were different.  It refutes the argument that
the comparisons can be done differently by different layers.
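The failure can be sketched in a few lines of Python, assuming the "lower layer" filter compares namespace names as literal strings while the "upper layer" resolves them with the standard library's `urllib.parse.urljoin`. The function names are hypothetical; the URIs are the ones from the example.

```python
from urllib.parse import urljoin

DETONATOR_NS = "http://example.com/detonator"
BASE_URI = "http://example.com/doc.xml"

def filter_allows(ns_name: str) -> bool:
    # Hypothetical lower-layer filter: rejects the detonator namespace
    # by literal string comparison, without absolutizing.
    return ns_name != DETONATOR_NS

def upper_layer_identity(ns_name: str, base_uri: str) -> str:
    # Hypothetical upper layer: resolves the reference against the
    # document's base URI to find out what it "really means".
    return urljoin(base_uri, ns_name)

ns_in_document = "/detonator"
allowed = filter_allows(ns_in_document)                    # the filter lets it through
resolved = upper_layer_identity(ns_in_document, BASE_URI)  # ...but it is the detonator
print(allowed, resolved == DETONATOR_NS)  # True True
```

Each layer is self-consistent on its own; the explosion comes from the two layers disagreeing about identity.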

>The situation you describe can more readily be used to argue that
>allowing relative URIs _at all_ was the real problem.

Let us reason carefully.  The argument demonstrates that if you
use relative URIs AND compare them literally then
the situation is untenable.  It therefore argues that we must
EITHER forbid relative URIs OR forbid string comparison.
You can't use it to select between those two.

For an argument that relative URIs in fact are useful sometimes,
see my "Database..." example.  In the first half of this message
I show how it is quite reasonable for every single web-available
relational database to define its own namespace, and for
engineering reasons the only pragmatic choice is to make the URI
for that namespace (for which an online virtual schema is
indeed available) very close to that of the rest of the data
from the database.


>This is especially true, since the stated goal of namespaces was to supply
>globally-unique names to enable definition management in a very
>loosely coupled distributed system.


In the database example, the namespace is as globally unique as
the data, and the persistence properties are the same as that of the
database.  It makes sense to use the same URI scheme.

[..]
>>You can do different consistent things between different layers but you
>>cannot mess with identity of things common to both layers.
>
>There is a claim that we can't forbid relative URIs, and the bug you
>describe, because documents exist that use them. We have no way of
>determining the extent of this problem.


Not me, AFAIK it was Microsoft who made that argument, but when
pressed for a use case they produced one whose schema URI
turned out to be absolute.  (Weird, but absolute  ;-).

I make two independent arguments:

(1) Thought experiments such as the database example, and
(2) The technical absurdity of making an exception for namespaces.
*[JPP-flag]

>There are also documents in existence that depend on the current
>definition of identity in the namespaces specification. We can't
>quantify the number of those documents either.


I have pressed for a real live example but none has been forthcoming.

>Requiring absolutization will fix this problem, but will break an
>unknown number of documents that were made in the belief that the
>namespaces spec. means what it says.


It will not break them all.  It will only break where
- relative URIs are used,
- XML processing is done which involves more than one document with more
than one base URI, and
- two documents happen to use the same relative URI but it absolutizes
to different URIs, or they use different relative URIs that refer to the
same absolute URI even though the instance author didn't intend it.
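The two cases in the last bullet can be illustrated with `urllib.parse.urljoin`. The `db.example` URIs and the `compare` helper are hypothetical, chosen only to exhibit each case.

```python
from urllib.parse import urljoin

def compare(ref_a: str, base_a: str, ref_b: str, base_b: str) -> tuple[bool, bool]:
    # Returns (literal_match, absolute_match) for two namespace references,
    # each taken together with the base URI of the document it appears in.
    literal = ref_a == ref_b
    absolute = urljoin(base_a, ref_a) == urljoin(base_b, ref_b)
    return literal, absolute

# Case 1: same relative URI, different bases -> different absolute URIs.
case1 = compare("schema", "http://db.example/sales/orders.xml",
                "schema", "http://db.example/hr/staff.xml")

# Case 2: different relative URIs that resolve to the same absolute URI.
case2 = compare("schema", "http://db.example/sales/orders.xml",
                "../sales/schema", "http://db.example/hr/staff.xml")

print(case1, case2)  # (True, False) (False, True)
```

Only where the two answers differ does the choice between literal comparison and absolutization change a document's meaning.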

I have seen no rush of people pointing out examples, so I am not convinced
of the damage.


>Eliminating URI reference syntax and requiring absolute URIs for
>namespaces also fixes the problem.
yes
>It does not, under any reasonable
>interpretation, conflict with the relevant RFCs. There is at least one
>extant example in the HTML BASE tag, as well as in the http protocol
>itself. However, this solution also breaks documents, in this case,
>those that contain relative URI references.


Yes, if there are any.
Forbidding their use also forbids the database example and the
self-referential schema. I have argued this before.
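A processor enforcing an "absolute URIs only" rule needs little more than a scheme check. The sketch below is a simplification under that assumption: it treats any name with a URI scheme as absolute and does not otherwise validate the syntax against the URI RFCs. The function name is hypothetical.

```python
from urllib.parse import urlsplit

def is_absolute_namespace_name(name: str) -> bool:
    # Simplified test: an absolute URI starts with a scheme such as "http:"
    # or "urn:"; relative references like "/detonator" or "schema" do not.
    return bool(urlsplit(name).scheme)

for name in ["http://example.com/detonator", "urn:example:ns", "/detonator", "schema"]:
    print(name, is_absolute_namespace_name(name))
```

Under this rule a relative name such as the database example's would be rejected along with the dangerous ones, which is exactly the objection raised above.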

>Keeping things as they are but explicitly deprecating relative URIs
>and documenting the possible problems, the solution chosen earlier
>inside the W3C, avoids the evil of turning conformant data
>non-conformant,

As Larry Masinter points out, that does not in fact hurt any systems. It is
just words.
What hurts is systems which fail, as in the chemical works explosion.
It would not be evil to make documents non-conforming because they are
in fact theoretically liable to fail, and this would protect the users.

>and recognizes the potential for erroneous behavior
>where relative namespace IDs are used.
>
>There are no perfect solutions to the situation we are in.


There are asymmetrically perfect solutions if you make the clean decision now,
and take the cost before it rises (absolutize).
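If the decision is to absolutize, the change amounts to one function call at the point where namespace declarations are read. A minimal sketch, assuming a hypothetical `resolve_declarations` hook that receives a document's `xmlns` declarations as a prefix-to-URI mapping together with the document's base URI:

```python
from urllib.parse import urljoin

def resolve_declarations(declarations: dict[str, str], base_uri: str) -> dict[str, str]:
    # Hypothetical hook, run once per document when its namespace
    # declarations are read. Every layer above then sees the same
    # absolute namespace names and can compare them as plain strings.
    return {prefix: urljoin(base_uri, uri) for prefix, uri in declarations.items()}

decls = {"d": "/detonator", "html": "http://www.w3.org/1999/xhtml"}
resolved = resolve_declarations(decls, "http://example.com/doc.xml")
print(resolved)
```

Because absolute names pass through `urljoin` unchanged, documents that already use absolute namespace URIs are unaffected.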

>The use of absolute URIs only is the best fit to the goals of the
>namespace specification, but breaks documents.
>
>The requirement of full relative URI processing as a part of
>namespaces is not that close a fit to the design goals of namespaces,

No - it is just a function call in the right place.

>and breaks a different set of documents in a different way. It does
>leave open an attractive evolutionary path to a more dynamic
>namespace mechanism, one based on retrieval of content. This
>evolution depends critically on other
>infrastructure being in place, e.g. a standard format for namespace
>definitions,

No, the power of defining this as a URI is that you can define it now
and leave the open question of cool namespace description languages
for the future.  Think of the URI of a photo which started as a GIF
and later became a PNG, and then a video.  The face was the same,
the representation changed - or multiple ones became available.

> and metadata about them, a way to support multiple kinds
>of definition for a single namespace, etc.


Content negotiation already exists, for example.
Also, you can put RDF inside an xml-schema to tell an RDF-aware
processor more. We already have schema validators...this is
all here waiting for a URI hook to be attached to.

>I think the best way to get to the "fully evolved" version of
>namespaces is to do the following:
>
>+ instantly deprecate relative URIs in namespaces. (but preserve the
>status quo in all its yuckiness). We thus preserve a version of
>namespaces that does not break existing documents.
>
>+ instantly create a new revision, namespaces 1.1, that allows only
>absolute URIs. We instantly provide a way for people to avoid getting
>burned by the possible unintended consequences of using relative URIs
>carelessly.
>
>+ publicly declare that namespaces 2.0 will re-introduce relative URI
>references, in a consistent way,

That could work.  Certainly.  We just don't need much technology for:

>coordinated with the creation of a
>standard resource body format for referencing multiple namespace
>definitions, that can be stored at a namespace URI. We now provide a
>clean path to a new meaning for the relative URI syntax.


"New" is only the standard meaning a relative URI has had for the
last 10 years.  The question of the file formats for describing namespaces
is (fortunately) orthogonal.  That is the importance of flexibility points
such as the URI.

>+ declare that the next revision of XML will include a normative
>reference to a specific version of the namespace specification. In
>this way, the XML version can be used to determine what sort of
>namespace processing is appropriate.

Sounds cool to me.

> This won't eliminate all
>confusion, but will make it easier for software to check for errors
>and report them. By linking XML and namespaces, we recognize the fact
>that most XML projects are using namespaces, and we introduce a way
>for an application to tell what sort of namespace processing is
>appropriate. We also provide an incentive for people to move to the
>most recent namespace specification over time.


I definitely think that folding NS into XML is a good idea. I know there
have been a lot of discussions about it elsewhere, and this would be the
wrong list on which to discuss it fully. But the thing to do now
would be to commit to the reintroduction of relative names with proper
processing in XML 2.0.
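The idea that the XML version should select the kind of namespace processing could look like the dispatch below. The version-to-policy table is hypothetical: the "1.0" entry reflects today's literal comparison, while "2.0" stands for the future revision discussed above in which relative names are absolutized; it does not describe any published specification.

```python
from urllib.parse import urljoin

# Hypothetical table: XML version -> namespace-name comparison policy.
POLICY_BY_XML_VERSION = {
    "1.0": "literal",     # status quo: compare namespace names as strings
    "2.0": "absolutize",  # hypothetical future revision: resolve first
}

def namespace_key(ns_name: str, base_uri: str, xml_version: str) -> str:
    # Returns the string to use when comparing namespace names.
    policy = POLICY_BY_XML_VERSION.get(xml_version)
    if policy is None:
        raise ValueError(f"no namespace policy for XML version {xml_version!r}")
    if policy == "literal":
        return ns_name
    return urljoin(base_uri, ns_name)

print(namespace_key("/detonator", "http://example.com/doc.xml", "1.0"))
print(namespace_key("/detonator", "http://example.com/doc.xml", "2.0"))
```

An unknown version is reported as an error rather than guessed, which is what makes it easier for software to check for errors and report them.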

>I'm _way_ behind on this list, so I apologize if this option has been
>raised before, but I don't think it was explicitly raised in the
>internal discussion. This is another kind of compromise: it doesn't
>ease the confusion right away, but it sets a clear direction, and
>gives software authors a way to deal with a complex transition
>strategy.
>
>It is conceivable that one could go right from deprecating relative
>URIs to absolutizing them, but to do that responsibly, the relevant
>namespace retrieval infrastructure and policies would have to be
>created almost immediately. It would clearly be better if time is
>taken to make those decisions carefully.


I don't think you have to wait for a namespace retrieval mechanism,
as I have said.  The only cause for delay would be to allow any
documents which would break under XML 2.0 to be phased out.
(XML 1.2?)

This suggestion looks pretty good to me. (Maybe I will be more critical
in the morning)

>   -- David
>
>>Tim BL
__________________________


*[JPP-flag]
<story optional>
Once when Kevin Rogers and I were programming some bit
of the CERN Proton Synchrotron Booster control systems in 1980,
we were charged with making a program to make a snapshot
of the state of the controls. Each piece of equipment had
a driver called an Equipment Module.  All the Equipment Modules
could be read out in a loop, except for one.  The magnet power
module wouldn't work, as two of its parameters were reversed.
Thinking this was a bug, we asked its author, one Jean-Pierre Poitier,
if my memory serves me,
to switch them back. He refused, on the grounds that it was
a deliberate move to make his module difficult to access.
The power supply was an important, dangerous, thing,
and he didn't want anyone to access it without being aware
of how special it was. We argued that there were lots
of dangerous parts of the system, and security didn't come
from breaking standards, but in vain.  In the end, we added a
column to the database of equipment modules. It was a binary
"JPP" flag.  Each equipment module call had two alternative forms
conditional on the JPP flag. The flag was 0 for all modules
except one. We had generalized around it - but
at the cost of a mess.  I hope M. Poitier, if he reads this, is happy to laugh
about it after 20 years. But ever since, I have always called an
excursion to get around an arbitrary departure from the norm,
brought on by an exaggerated sense of specialness, a JPP flag.
</story>  -TimBL

Received on Wednesday, 24 May 2000 19:10:11 UTC