- From: Rick Jelliffe <ricko@topologi.com>
- Date: Wed, 9 Apr 2003 20:26:51 +1000
- To: "Chris Lilley" <chris@w3.org>
- Cc: <www-tag@w3.org>
From: "Chris Lilley" <chris@w3.org>

> I really, really want to avoid the situation where an XML file is well
> formed over the wire but ceases to be well formed when the server or
> other backend, filesystem-based processor manipulates it because the
> charset parameter is not present and the encoding declaration is
> wrong.

Yes, it is increasingly important. It would be great to deprecate text/xml. I don't know whether it will be possible to "stop" people or not, but worth a try.

> Transcoding proxies do exactly that - make XML documents not well
> formed. the solution is to stop the dumb proxies breaking documents
> and if you can't stop them, then just don't use text/xml..

For a robust system, every layer needs to have 1) formats accurately labelled for the next layer to dispatch and dissect and 2) mechanisms to test that the label is feasible for the data found.

For example, in Internet protocols not only does a packet say "I am UDP" it also provides a checksum that can be used to verify. Above the XML level, not only does an element information set say "I am element x in namespace y" but we also can have a schema to validate it. For robustness, labelling needs to be paired with verification even if the verification is statistical or optional.

There are a handful of methods typically available for verification (error-detection): notably checksums, parsing and redundant codes.[1]

XML 1.0 advanced textual formats by providing a workable labelling mechanism for encoding. But we need a verification mechanism too: when we go up the protocol stacks XML is somewhat of a weak link.

For encoding error-detection, XML 1.1 takes one small step backwards (by opening up the characters used in names) but then takes a very large step forwards (by not allowing most C1 control characters directly).
(The C1 controls are roughly U+0080-U+009F: reserving these is enough to detect many common encoding errors, in particular the mislabelling of character sets such as Big 5 or Win 1252 "ANSI" as ISO 8859-1.)

It is not enough to huff and puff...oops...deprecate text/xml! In concert with deprecation, XML needs to reserve enough redundant Unicode code points in critical unused areas so that XML processors can detect as many character-encoding-labelling errors as they can. This is also true with application/xml*.

I hope the TAG will encourage the XML Core WG to improve and not dump the C1 restrictions proposed in XML 1.1.

Cheers
Rick Jelliffe

[1] For the specific meaning of redundant code see for example
http://www.fb9dv.uni-duisburg.de/education/fce1/material/codes.pdf
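
A minimal sketch of the kind of check described above, in Python. The function names are hypothetical and this is not text from any spec: it decodes bytes with the declared label and treats any C1 control in the result as evidence that the label is wrong. It assumes the range U+0080-U+009F given in the message, and additionally assumes that NEL (U+0085) stays permitted as a line end, which is why only "most" C1 controls are flagged.

```python
# Range taken from the message: "roughly U+0080-U+009F".
C1_FIRST, C1_LAST = 0x80, 0x9F
# Assumption for this sketch: NEL (U+0085) remains allowed as a line end.
NEL = 0x85


def find_c1_controls(text):
    """Return (offset, code point) pairs for restricted C1 controls in decoded text."""
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if C1_FIRST <= ord(ch) <= C1_LAST and ord(ch) != NEL
    ]


def label_is_feasible(raw, declared_encoding):
    """Decode raw bytes with the declared label; reject undecodable data or C1 controls."""
    try:
        text = raw.decode(declared_encoding)
    except (UnicodeDecodeError, LookupError):
        return False
    return not find_c1_controls(text)


# Windows-1252 curly quotes are the bytes 0x93 and 0x94.
sample = "\u201cquoted\u201d".encode("cp1252")

# Mislabelled as ISO 8859-1, those bytes decode to the C1 controls U+0093 and U+0094.
print(find_c1_controls(sample.decode("iso-8859-1")))
print(label_is_feasible(sample, "iso-8859-1"))
print(label_is_feasible(sample, "cp1252"))
```

The check is statistical rather than exhaustive: a mislabelled document that happens to use no bytes in the 0x80-0x9F range passes unnoticed, which fits the point that verification is still worth having even when it only catches some errors.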
Received on Wednesday, 9 April 2003 06:23:08 UTC