Re: 2. Effect of normalisation step on the DOM/Infoset from Chris Lilley on 2005-01-20 (public-xml-id@w3.org from January 2005)

From: Chris Lilley <chris@w3.org>
Date: Thu, 20 Jan 2005 21:35:27 +0100
To: Ian Hickson <ian@hixie.ch>
Cc: Norman Walsh <Norman.Walsh@Sun.COM>, public-xml-id@w3.org
Message-ID: <1101901087.20050120213527@w3.org>

On Thursday, January 20, 2005, 5:52:52 PM, Ian wrote:

IH> On Thu, 20 Jan 2005, Norman Walsh wrote:
>> |> 
>> |> On the contrary, I think the purpose of attribute value normalization is
>> |> so that down-stream processes will see the normalized value.
>> |
>> | This causes a backwards-compatibility issue. A document processed by a
>> | DOM-aware XML processor will create a different DOM than one processed by
>> | a DOM-aware XML processor with XML ID support.
>> 
>> This issue already exists. Consider:
>> 
>> <!DOCTYPE test SYSTEM "test.dtd">
>> <test id=" test "/>
>> 
>> Assuming that test.dtd defines the 'id' attribute as an ID, then some
>> parsers will see that attribute value as " test " and some will see it
>> as "test" depending on whether or not they process the external 
>> declaration.

IH> However, in practice, Web browsers don't validate,

Validation is orthogonal to ID assignment (and fixed attribute
processing and entity expansion, etc). In practice, some browsers
will fetch test.dtd and some will not (and some will pretend to but use
an internal copy that may or may not be the same).
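
To make the test.dtd example concrete, here is a rough Python sketch of
the normalisation rule in question (XML 1.0, section 3.3.3): for an
attribute whose declared type is not CDATA, leading and trailing spaces
are dropped and runs of spaces collapse to one. The function name and
the dtd_was_read flag are invented for illustration; this models the
behaviour and is not any real parser's API.

```python
import re

def attribute_value_seen(raw, declared_type, dtd_was_read):
    """Return the value a downstream process would see (illustrative model).

    raw: the value after reference expansion and the whitespace-to-space
         step that applies to every attribute.
    declared_type: the type given in the DTD, e.g. "ID" or "CDATA".
    dtd_was_read: whether the processor actually read that declaration.
    """
    # A processor that never read the declaration treats the value as CDATA.
    effective_type = declared_type if dtd_was_read else "CDATA"
    if effective_type == "CDATA":
        return raw
    # Non-CDATA types: strip leading/trailing spaces, collapse runs of spaces.
    return re.sub(" +", " ", raw.strip(" "))

print(repr(attribute_value_seen(" test ", "ID", dtd_was_read=True)))
print(repr(attribute_value_seen(" test ", "ID", dtd_was_read=False)))
```

The same document yields "test" in the first case and " test " in the
second, with no validation involved in either.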

IH> so the problem is only theoretical.

No, it's a very practical and real problem that causes a significant lack
of interop.

IH> xml:id makes the problem much more relevant by requiring Web
IH> browsers to do normalisation on an attribute where legacy
IH> implementations do not.

Yes, but at the same time it also requires them to change other aspects
of that attribute too, such as its type.

>> The xml:id specification improves the situation by encouraging uniform
>> behavior (irrespective of validation or processing of the external 
>> subset) for attributes named "xml:id".

IH> This assumes that all implementations, including the current installed
IH> base, implement xml:id at the same time. This is obviously not going to
IH> happen. During the transition period, and even after the transition period
IH> if not all UAs support xml:id, differences will be visible to script.

Yes - either xml:id will be a (reserved) random attribute, or it will be
an ID.
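
As a sketch of what script would observe during such a transition, the
Python below models the two cases. The function name, the
supports_xml_id flag and the returned dictionary are hypothetical, and
the model deliberately ignores error handling for values that are not
valid names.

```python
import re

def xml_id_as_seen_by_script(raw, supports_xml_id):
    """Model what a script sees for an xml:id attribute (illustrative only)."""
    if supports_xml_id:
        # xml:id-aware processor: normalise as an ID and mark the attribute.
        value = re.sub(" +", " ", raw.strip(" "))
        return {"value": value, "is_id": True}
    # Legacy processor: just another attribute, value left untouched.
    return {"value": raw, "is_id": False}

legacy = xml_id_as_seen_by_script(" test ", supports_xml_id=False)
aware = xml_id_as_seen_by_script(" test ", supports_xml_id=True)
print(legacy)
print(aware)
```

Both the value and the ID-ness differ between the two results, which is
the point: the type changes along with the normalisation.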

IH> Previous situations of a similar nature have proved to cause _huge_ 
IH> problems to authors. (For example, CSS selectors being case insensitive in
IH> HTML but case sensitive in XHTML is a massive source of confusion.)

>> Adopting the resolution that I believe you would prefer, namely that
>> xml:id processing would use the value presented in the infoset without
>> any additional normalization, perpetuates the existing interoperability
>> problems.

IH> What existing interoperability problems?

Web HTML browsers are not the only XML implementations. Even in that
space, there are variations - for example, the XML parser in Win/IE
fetches the external DTD subset while that used in Mozilla and Opera
does not. I'm not sure what happens in Konqueror or Safari.

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group

Received on Thursday, 20 January 2005 20:35:27 UTC