- From: Geoffrey Sneddon <foolistbar@googlemail.com>
- Date: Wed, 11 Feb 2009 16:07:22 +0000
- To: elharo@metalab.unc.edu
- Cc: Henri Sivonen <hsivonen@iki.fi>, "Henry S.Thompson" <ht@inf.ed.ac.uk>, Anne van Kesteren <annevk@opera.com>, David Orchard <orchard@pacificspirit.com>, www-tag@w3.org
On 11 Feb 2009, at 14:53, Elliotte Harold wrote:

> I do agree that the state of XML serialization is rather pathetic,
> though. XML is more complex than it appears and the amount of bad
> XML generating and escaping code out there is a problem. I tend to
> think the response is better libraries, and perhaps integrating some
> checks into static analysis tools.

But has this not been the response for the past eleven years? It remains true, eleven years (and one day!) after XML 1.0 was published, that serializers by and large make it possible to output a byte stream that does not match the XML document production. What is to say this will improve over the next eleven years?

I know that at least one major issue in PHP (which, to my knowledge, has no fully working serializer) is with the DOM extension, which simply implements DOM Level 3 Load and Save, a specification that goes as far as to state:

> For nodes of type Document or Entity, well-formed XML will be
> created when possible (well-formedness is guaranteed if the document
> or entity comes from a parse operation and is unchanged since it was
> created).

When we have W3C-specified serializers that do not guarantee well-formedness, what hope is there?

I would guess that the majority of XML produced dynamically online is generated by PHP. PHP 5 has no working serializer, let alone the PHP 4 that the majority of PHP software still supports (without non-standard, cannot-be-relied-upon extensions, the closest that gets to XML serialization is string concatenation!), which leaves XML output on the web in a far from brilliant state.

In PHP's case at least, there is no native Unicode support, so implementing many of the character restrictions would be a fair amount of work (even if only UTF-8 were supported, all the work of handling it would still be needed), as well as carrying a fair computational overhead.

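To make the character restrictions concrete, here is a minimal sketch of the check a serializer would need before writing character data. It is in Python rather than PHP, purely because Python strings are already sequences of code points; the names `is_xml_char` and `escape_text` are hypothetical and are not taken from any library mentioned in this thread. The ranges are those of the XML 1.0 Char production.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_text(text: str) -> str:
    """Escape character data, refusing code points XML 1.0 cannot contain."""
    for index, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            raise ValueError(
                f"U+{ord(ch):04X} at offset {index} is not an XML 1.0 Char"
            )
    # '&' must be replaced first so the entities added below are not re-escaped.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
```

A serializer built on string concatenation skips exactly this step. In a language without native Unicode strings, the loop would first have to decode the bytes into code points, which is where the extra work and computational overhead come from.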
With PHP 6 (which will add native Unicode support) still a fair way off and likely to have fairly slow uptake, and with the majority of PHP software supporting six-year-old versions of the interpreter, there is little likelihood of this changing any time soon; maybe it'll be possible in eleven years' time…

-- 
Geoffrey Sneddon
<http://gsnedders.com/>
Received on Wednesday, 11 February 2009 16:08:09 UTC