- From: Chris Lilley <chris@w3.org>
- Date: Fri, 11 Apr 2003 18:45:49 +0200
- To: Paul Grosso <pgrosso@arbortext.com>
- CC: www-tag@w3.org
On Friday, April 11, 2003, 4:28:29 PM, Paul wrote:

PG> At 19:20 2003 04 11 +1000, Rick Jelliffe wrote:
>>XML can, nothing else can, we need it, it is possible, therefore XML should.
>>Have any users requested to the XML Core WG that XML should be
>>made less reliable?

PG> No, but we're not talking about making it less reliable, we're talking
PG> about leaving it as reliable (in this area) as it is currently in XML 1.0.
PG> And many users have requested backward compatibility with XML 1.0.

PG> paul

PG> p.s. Despite my arguments, I'm still not sure what the right answer is.
PG> But personally, I'd like to hear cost/benefit analyses from folks on both
PG> sides, or this will likely be decided merely by intensity of discussion.

Well, see my comments about what that area actually contains.
http://lists.w3.org/Archives/Public/www-tag/2003Apr/0074.html

Backwards compatibility is all very well in general; but backwards
compatibility with stuff that was either
a) errors, or
b) stuff people had no business doing
does not rate very highly. It's not so much decreased backwards
compatibility as removing an inadvertent loophole.

Unlike Rick, I am not making this argument on the basis of the ease of
detecting encoding labelling or conversion errors; rather, on the basis
of those non-printing characters having no business being in a marked-up
document. I mean, start of string? end of guarded area?

I think that Unicode Technical Report #20 agrees with me:

Unicode in XML and other Markup Languages
Unicode Technical Report #20
W3C Note 18 February 2002
http://www.w3.org/TR/unicode-xml/

see in particular
2.2 Overlap of Control Code and Markup Semantics
http://www.w3.org/TR/unicode-xml/#Overlap

> When markup is not available, plain text may require control
> characters. This is usually the case where plain text must contain
> some scoping or attribute information in order to be legible, i.e.
> to be able to transmit the same content between originator and
> receiver.
> Many of these control characters have direct equivalents
> in particular markup languages, since markup handles these concerns
> efficiently. If both characters and their markup equivalents may be
> present in the same text, the question of priority is raised.
> Therefore it is important to identify and resolve these ambiguities
> at the time markup is first applied.

PG> [1] http://lists.w3.org/Archives/Member/chairs/2002JulSep/0128

PG> [2] To quote from [1], it said:

PG> The removal of direct representation of control characters in the range
PG> #x7F-#x9F represents a change in well-formedness. That is, well-formed
PG> XML 1.0 documents which contain these characters do not become
PG> well-formed XML 1.1 documents simply by changing their version number.
PG> Occurrences of control characters must also be converted to numeric
PG> character references.

Yes. And as you say, it's an evaluation of the cost/benefit ratio. The
number of such documents is not very large; of that number, the vast
majority are erroneous, with an incorrectly labelled encoding, and will
be *helped* by being made not well formed. They will be noticed, and
fixed, and the correct codepoints used for euro and typographic quotes
and so forth, rather than some software ignoring the control codes and
some silently fixing it up in a 'we know you really meant windows code
page 1252' manner.

The rest, a very small number, have no business using those control
codes and are a security risk in terms of setting terminals into odd
configurations. And such bogus use is still permitted, as long as
people really do want them, by escaping the odious control characters.
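To make the two cases above concrete, here is a minimal sketch in Python. It is only an illustration: the function names are hypothetical, it works on an already-decoded string rather than parsing XML, and it uses the #x7F-#x9F range exactly as quoted above. The first function performs the conversion to numeric character references; the second shows how a windows-1252 byte stream labelled as ISO-8859-1 ends up containing C1 controls where a euro sign or typographic quote was intended.

```python
import re

# Range quoted in the message: #x7F-#x9F (DEL plus the C1 controls).
_CONTROL_RANGE = re.compile("[\u007f-\u009f]")


def escape_c1_controls(text: str) -> str:
    """Replace each character in U+007F..U+009F with a numeric character reference."""
    return _CONTROL_RANGE.sub(lambda m: f"&#x{ord(m.group()):X};", text)


def has_c1_controls(text: str) -> bool:
    """Report whether the decoded text contains any character in U+007F..U+009F."""
    return _CONTROL_RANGE.search(text) is not None


# Case 1: deliberate use of a control character, kept by escaping it.
# escape_c1_controls("a\x80b") returns "a&#x80;b"

# Case 2: a mislabelled encoding. Byte 0x80 is the euro sign in
# windows-1252, but decodes to the control U+0080 under ISO-8859-1.
raw = b"price: \x80 10"
as_labelled = raw.decode("iso-8859-1")   # contains U+0080
as_intended = raw.decode("cp1252")       # contains U+20AC (euro sign)
```

With a check like has_c1_controls, the mislabelled document is flagged instead of being silently repaired, which is the behaviour argued for above.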

PG> As a criterion for exiting CR, the XML Core WG will collect evidence
PG> substantiating (or contradicting) our opinion that:

PG>     1) converting characters in the #x7F-#x9F range to numeric
PG> character references while updating XML 1.0 documents to XML 1.1 does
PG> not represent a significant obstacle to adoption of XML 1.1;

I concur with this observation

PG>     2) there are no significant scenarios where converting characters
PG> in the #x7F-#x9F range to numeric character references is impractical or
PG> impossible;

Yes

PG>     3) that the benefits of this change to the proper detection of
PG> character encoding represent a significant improvement in
PG> interoperability.

Yes.

--
Chris mailto:chris@w3.org
Received on Friday, 11 April 2003 12:45:54 UTC