Re: [widgets] white space handling

On Dec 18, 2009, at 16:36, Cyril Concolato wrote:
> Le 18/12/2009 15:58, Robin Berjon a écrit :
>> I don't think that looking at XHTML is the best idea if you want a normative definition for XML :)
> I agree but the XML spec is so indigestible sometimes that it's hard to find the proper info. It was a bit digested in XHTML :)

Heh, fair enough. I think ed5 is decently readable, but maybe that's due to years of reading the previous versions...

>> P+C doesn't tie processors to a particular version of XML, and lists its white space characters accordingly (and defensively). If you're certain that you will only ever get content that comes from a conforming XML 1.0 implementation, then you probably don't need to check for this.
> I don't read it like that. P&C explicitly references XML 1.0 and never mentions 1.1, so I thought the behavior was conformant to 1.0. It's fine if the spec also handles 1.1, but that should be mentioned. The rationale for the choice of space characters should also be given, along with the differences between XML 1.0 and XML 1.1.

I beg to differ. I think that we should build specifications that can handle future changes to the stack without listing all the versions that are supported. P+C is built for XML 1.0, and it's great that it has the resilience to handle changes to 1.1 without a hitch — but who knows what XML 4.2 might add? We can't guarantee that it'll work, but we can try (and if it does work, I don't think that we should list it either). I certainly don't think that it's the right place to document potential differences between versions of XML — as your XHTML example shows, that kind of information goes stale.

Furthermore, I didn't say that the differences between XML 1.0 and 1.1 are the rationale for this choice — I was merely indicating that using 1.1 you could get such characters and that P+C's robustness against that was a plus. I wasn't in Marcos's brain when that part was written but my specification exegesis antennae suspect that the listed class of characters corresponds to the Unicode white space character class (and therefore to what Unicode-aware processors would consider white space, notably \s in regular expressions).
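
To make the gap concrete, here is a rough Python sketch (not anything taken from P+C) that compares the four characters of XML 1.0's S production with what a Unicode-aware \s matches. The helper names and the sample characters are made up for illustration, and Python's \s is only an approximation of the Unicode white space class, so treat the result as indicative rather than normative.

```python
import re

# The S production in XML 1.0: space, tab, carriage return, line feed.
XML10_S = frozenset("\x20\x09\x0D\x0A")

# Unicode-aware \s (the default for str patterns in Python 3).
# Assumption: this stands in for "what Unicode-aware processors consider
# white space"; it is not guaranteed to equal the Unicode White_Space set.
UNICODE_WS = re.compile(r"\s")


def is_xml10_space(ch: str) -> bool:
    """True if ch is one of the four characters in XML 1.0's S production."""
    return ch in XML10_S


def is_unicode_space(ch: str) -> bool:
    """True if Python's Unicode-aware \\s matches the single character ch."""
    return UNICODE_WS.fullmatch(ch) is not None


# Hypothetical sample, picked for illustration only (not P+C's list).
SAMPLES = {
    "SPACE": "\u0020",
    "TAB": "\u0009",
    "NEXT LINE": "\u0085",
    "NO-BREAK SPACE": "\u00A0",
    "LINE SEPARATOR": "\u2028",
    "IDEOGRAPHIC SPACE": "\u3000",
}

for name, ch in SAMPLES.items():
    print(
        f"U+{ord(ch):04X} {name}: "
        f"xml10={is_xml10_space(ch)} unicode={is_unicode_space(ch)}"
    )
```

A processor that only checks the four XML 1.0 characters would treat the last four samples as ordinary text, which is the case a defensive list guards against.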

-- 
Robin Berjon - http://berjon.com/

Received on Friday, 18 December 2009 17:07:59 UTC