Re: Status of Bugzilla Bug 10, Round-tripping various information in properties

Wilfredo Sánchez Vega wrote:
> ...
>> - there's no W3C spec that would license an XML processor to throw 
>> away prefixes
> 
>   I showed you an example of python code that parses XML and spits it 
> right back out and the namespace prefixes were rewritten.  You said that 
> wasn't a bug in PyXML.

*Parsers* are not allowed to throw away prefixes. *Serializers* are 
separate specs, and have been neglected by the W3C for quite some time. 
I think DOM Level 3 talks about serialization.
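
To make the parser/serializer distinction concrete, here is a minimal sketch using the Python standard library as a stand-in (not PyXML itself, so this says nothing about what PyXML does). The input document and the "urn:example:ns" namespace are made up for illustration. Both libraries parse the same input without complaint; they differ only in what they write back out:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical input; the namespace URI is an assumption for illustration.
SOURCE = '<x:note xmlns:x="urn:example:ns">hello</x:note>'

# ElementTree stores names as {uri}local, so the original prefix is no
# longer available at output time and a generated prefix is written.
et_output = ET.tostring(ET.fromstring(SOURCE), encoding="unicode")

# minidom keeps the prefix and the xmlns attribute on the node, so the
# serialized element comes back with its original prefix.
dom_output = minidom.parseString(SOURCE).documentElement.toxml()

print(et_output)
print(dom_output)
```

In both cases the element name still resolves to the same namespace; only the prefix chosen on output differs.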

>   If PyXML is not a compliant XML library because it rewrites the 
> prefixes, then I agree completely.  PyXML is broken and should be fixed, 
> and we should move on.  But if that's not true, I don't think a server 
> writer should have to switch to another library or do some haquery in 
> order to preserve them.

That's hard to tell unless PyXML claims to conform to some particular 
specification that actually defines serialization.

>   Geoff asserts that namespace preserving parsers are easy to come by.  
> That may be true for Geoff, and maybe he has a lot of flexibility in 
> parser choices because he has no other such constraints.  I don't think 
> that's true for everyone.

Didn't we find out last time that it's not the parser (input), but the 
serializer (output)?

>   Furthermore, in the case where a namespace is defined outside of the 
> property container, some rewriting is required in order to maintain the 
> association with the target URI.  So now we have language in 4.4 saying 
> that changing the property container's prefix, unlike it's contents, are 
> not significant, and suggest in 4.4.1 suggesting that we should rewrite 
> that tag to put all of the namespaces in there (seems like that should 
> be explicitly stated in 4.4 as a SHOULD as well, for consistency, unless 
> we think tossing the namespace declarations out is OK, which is the 
> opposite of what I'd expect).

The spec doesn't say where the namespace declarations need to sit, as 
long as they're there. And no, I wouldn't want to mandate this, because 
that really seems to be irrelevant.
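
As a rough illustration of why the position of the declaration should not matter, the sketch below parses two variants of the same property, one declaring the namespace on the container and one on the child. The x:author element and the "urn:example:ns" namespace are hypothetical. After parsing, both yield identical expanded names:

```python
import xml.etree.ElementTree as ET

# Declaration sits on the property container.
ON_CONTAINER = (
    '<D:prop xmlns:D="DAV:" xmlns:x="urn:example:ns">'
    '<x:author>Jane</x:author>'
    '</D:prop>'
)

# Same content, declaration moved onto the child element.
ON_CHILD = (
    '<D:prop xmlns:D="DAV:">'
    '<x:author xmlns:x="urn:example:ns">Jane</x:author>'
    '</D:prop>'
)

def expanded_names(source):
    """Return the {uri}local name of every element, in document order."""
    return [element.tag for element in ET.fromstring(source).iter()]

names_a = expanded_names(ON_CONTAINER)
names_b = expanded_names(ON_CHILD)
print(names_a == names_b)
```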

>   Aside from the mild weirdness of "it's significant, except for over 
> here"... now it's not good enough just to use a parser that can 
> preserve prefixes, but also I need it to let me specify that I can't 
> edit them, "except in this one element, where I need you to do the 
> rewriting that you may otherwise have done elsewhere."  What DOM methods 
> exist for that sort of thing?

What does this have to do with editing?

>> - there are multiple W3C specs that use prefixes in attribute values 
>> (or even text content), including XSLT and XML Schema
> 
>   And I'm all for servers that care about enabling the use of these 
> specs bending over to accommodate them, but I don't think that other 
> servers need to care.

So you'd be happy if the "SHOULD" were a "MAY"?

>   Additionally:
> 
>     http://www.w3.org/TR/REC-xml-names/
> 
>   Section 3: "Note that the prefix functions only as a placeholder for a 
> namespace name." (Note "only" is emphasized in the text...)

Too bad that other W3C specs have started to use the prefix anyway, and 
thus XML Infoset also considers it significant.
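
Here is a small sketch of what goes wrong when a prefix appears in an attribute value, in the way XSLT and XML Schema use it. It uses Python's ElementTree only because its serializer is known to rewrite prefixes. The D:item element, the "kind" attribute and the "urn:example:types" namespace are invented for the example:

```python
import xml.etree.ElementTree as ET

# The "t" prefix is used only inside an attribute value, never in an
# element or attribute name. All names here are hypothetical.
SOURCE = (
    '<D:prop xmlns:D="DAV:" xmlns:t="urn:example:types">'
    '<D:item kind="t:widget"/>'
    '</D:prop>'
)

# Parse and immediately serialize again.
round_tripped = ET.tostring(ET.fromstring(SOURCE), encoding="unicode")
print(round_tripped)

# The attribute value is copied through as an opaque string ...
value_survives = "t:widget" in round_tripped
# ... but the declaration that gave "t" its meaning is not written out.
declaration_survives = "urn:example:types" in round_tripped
print(value_survives, declaration_survives)
```

The value "t:widget" is still there after the round trip, but nothing in the output says which namespace "t" refers to, so a recipient can no longer interpret it.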

>   It's not unreasonable to think that having parsed that data, when 
> rendering it back out, the placeholder might be relabeled without 
> negative consequences.  Regardless of that assumption, that sentence 
> makes it pretty clear that using the prefix for any other purposes is 
> inconsistent with REC-xml-names.

And that means that the world of W3C XML specs is internally 
inconsistent. Yes. I know. Everybody knows. There's nothing I can do 
about it.

The whole discussion started because the draft made the incorrect 
statement that prefixes do not need to be preserved because they are 
insignificant. Well, for certain applications they are significant, so 
this needed to be fixed.

>> A property value serialized as #PCDATA (thus as escaped XML) is 
>> something else than a property value serialized as XML. If you control 
>> the format, such as when you define the property in a spec, you sure 
>> have the freedom to say it's text, instead of XML. But this requires 
>> that senders and recipients agree on that. But in general, a client 
>> doesn't have that choice.
> 
>   If you define the property, senders and receivers always have to agree 
> to honor your definition.  That's not unique here.  To me that's an 
> argument for saying that such definitions SHOULD use #PCDATA.
> 
>   Geoff, I'm not following your argument that some clients might not 
> de-serialize the data.  If you have XML there, you have structured 
> data.  Either the client understands the property and handles it 
> properly or it doesn't.

There are generic clients out there that display *every* property. How 
are they supposed to know that some text indeed consists of escaped XML?

>> Furthermore, putting escaped XML into property values *will* have 
>> negative effects on generic clients (which will display angle brackets 
>> to the user) and some protocol extensions (such as DASL).
> 
>   What's a generic client going to show the user for structured XML data 
> that it doesn't understand?  Generic client tends to refer to a tool for 
> l33t haxx0rz.

In general, I'd expect them to only display the text content.
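
A minimal sketch of what I mean, assuming a generic client that simply concatenates the text nodes of whatever property value it receives. The property markup and the text_content helper are hypothetical:

```python
import xml.etree.ElementTree as ET

def text_content(xml_source):
    """Concatenate all text nodes below the root, as a generic client might."""
    return "".join(ET.fromstring(xml_source).itertext())

# Property value sent as real XML.
AS_XML = '<prop><author><name>Jane</name></author></prop>'

# The same value sent as escaped XML, that is, as #PCDATA.
AS_ESCAPED = (
    '<prop>&lt;author&gt;&lt;name&gt;Jane&lt;/name&gt;'
    '&lt;/author&gt;</prop>'
)

shown_for_xml = text_content(AS_XML)
shown_for_escaped = text_content(AS_ESCAPED)
print(shown_for_xml)
print(shown_for_escaped)
```

With the XML form the user sees just "Jane"; with the escaped form the same client shows the raw markup, angle brackets included.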

> 
>   Perhaps I shouldn't get all bent out of shape over a SHOULD.

:-)

Best regards, Julian

Received on Friday, 23 December 2005 09:33:42 UTC