- From: Joshua Allen <joshuaa@microsoft.com>
- Date: Wed, 30 Apr 2003 08:45:03 -0700
- To: <robin.berjon@expway.fr>
- Cc: <www-tag@w3.org>
> > This dance of "strcmp for namespaces and canonicalization for anyUri"
> > has led to a status quo that is buggy and ambiguous, particularly for
> > any non-trivial scenario.
>
> Could you exemplify more clearly?

OK. Based on what I've seen of our XML APIs, there are a surprising variety of places where URI equivalence testing is encountered. Some of these places include (I am sure this is not comprehensive, but good enough for illustration):

1) checking identity constraints
2) testing namespace equivalence; for example for purposes of duplicate attribute detection
3) checking to see if a particular schema or entity has already been cached/compiled in a collection
4) testing for circular references

And each of these has the facet of relative vs. absolute:

a) Some scenarios allow URIs to be absolutized before comparing
b) Some scenarios assume that the URI is absolute already
c) Some URIs are impossible to evaluate to determine if they are relative or not

Now start mixing these factors. One of the examples I have seen posted on this list is something like:

<e xmlns:ns1="http://bar" ns1:a="1" xmlns:ns2="http://bar" ns2:a="2" />

This illustrates one case in the very common scenario of "compound document processing". When people construct XML documents from multiple sources, you see variations on this (namespaces redundantly declared with different prefixes, elements or attributes jammed inline, etc.). And I would guess that about 90% of existing processors would do the right thing here (throw an error).

However, interop gets put to the test if the namespaces include any URL-encoded characters or, worse yet, Unicode, or, even worse yet, MBCS. Since the parts of the document are coming from multiple sources, it is impossible to guarantee that they all choose to "normalize" their namespace URIs the same way. If you built a test matrix including these sorts of combinatorial cases, I expect you would find consistency of behavior across implementations drop very quickly.
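
To make the duplicate-attribute case concrete, here is a rough Python sketch of how the two comparison conventions can disagree. It is only an illustration under stated assumptions, not how any particular processor works: the helper names (`normalize`, `duplicate_attributes`), the `http://bar/%7Euser` spellings, and the normalization rules themselves (lowercase the authority, decode percent-escapes of unreserved characters) are all made up for the example.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters per RFC 3986; escapes of these can be decoded safely.
_UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _decode_unreserved(text):
    """Decode %XX only when XX is an unreserved character; uppercase the rest."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in _UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def normalize(uri):
    """Hypothetical normalization policy (an assumption, not a standard API).

    Lowercases the scheme and the whole authority (a simplification) and
    decodes percent-escapes of unreserved characters.
    """
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        _decode_unreserved(parts.path),
        _decode_unreserved(parts.query),
        _decode_unreserved(parts.fragment),
    ))


def duplicate_attributes(attrs, key=lambda uri: uri):
    """Return expanded names seen more than once when namespaces are compared
    via `key`. The default key is the identity, i.e. strcmp-style comparison.

    attrs is a list of (namespace_uri, local_name) pairs.
    """
    seen = set()
    duplicates = []
    for ns, local in attrs:
        expanded = (key(ns), local)
        if expanded in seen:
            duplicates.append(expanded)
        seen.add(expanded)
    return duplicates


# Identical spelling: the strcmp-style check flags the duplicate.
same = [("http://bar", "a"), ("http://bar", "a")]
# Two spellings that a human would probably call "the same namespace".
mixed = [("http://bar/%7Euser", "a"), ("http://BAR/~user", "a")]

strcmp_same = duplicate_attributes(same)                 # duplicate reported
strcmp_mixed = duplicate_attributes(mixed)               # nothing reported
norm_mixed = duplicate_attributes(mixed, key=normalize)  # duplicate reported
```

With strcmp the `mixed` element is accepted as having two distinct attributes; with the normalizing key it is rejected as a duplicate. Two processors that each pick one of these conventions will disagree on whether the same document is well-formed with respect to namespaces.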

In fact, this particular scenario gets more difficult, IMO, as people are swayed by the RDDL folks into using HTTP identifiers as namespace names. Two different divisions in the same company may use what they *think* is the same namespace for their documents (and when they click on the namespace name it sure enough connects them to the right place, so how are they to know differently?). Furthermore, all of their XML documents work fine within their own department. It is only two years later when corporate IT tries to combine those documents that things break. In fact, depending on how much of the data is affected, it's quite possible that IT won't even notice, and will just blindly throw away bits of data.

And in this RDDL scenario, it is rather difficult for a vendor to argue with the burnt customer and say "screw you, you should have realized that we were going to do strcmp" as if the guy even knows what that means.

You *could* argue that the above scenario is adequately specified, but that's little consolation. We have seen "bugs" like this with respect to identity constraints as well. When you hit scenarios that mix the different URI comparison conventions, the customer expectations can become confused and the consistency across implementations becomes less trustworthy (exacerbating customer confusion about expectations - "my Perl regex parser handles this just fine, and so does Xalan; why the heck does MSFT have so much money and can't even write a simple XML processor?").

Thanks,
Joshua

Received on Wednesday, 30 April 2003 11:45:21 UTC