- From: Elliotte Harold <elharo@metalab.unc.edu>
- Date: Wed, 27 Oct 2004 05:58:22 -0400
- To: Chris Lilley <chris@w3.org>
- CC: Norman Walsh <Norman.Walsh@Sun.COM>, www-tag@w3.org
Chris Lilley wrote:

> EH> The XML specification does *not* require that the value of an xml:lang
> EH> attribute be ASCII. Well-formed XML documents with meaningful infosets
> EH> can have xml:lang="Français".
>
> You will need to demonstrate that, with reference to the productions of
> XML, before I can accept it. Currently I consider it an erroneous
> assertion. I understand that you would like XML to be that way, but you
> need to demonstrate that it is.

Easy, it's production 10:

[10] AttValue ::= '"' ([^<&"] | Reference)* '"'
               |  "'" ([^<&'] | Reference)* "'"

The xml:lang attribute is not treated specially by XML processors (aside from the pre-mapping of the xml prefix). It is just like any other attribute, to which some processes may choose to assign particular meaning; but to the XML parser it's just another attribute.

> EH> There is no reason to restrict chunk
> EH> equality to documents that use RFC 3066 language tags.
>
> You deleted my quotation from the XML spec that said exactly that -
> xml:lang takes an RFC 3066 language tag, or "". I don't find ignoring
> the quote to be convincing argument.

OK, I really hoped I wasn't going to have to say this, but I will climb down into the muck. In Clintonesque fashion, it depends on the meaning of the word "are". The relevant quote in the spec is, "The values of the attribute are language identifiers as defined by [IETF RFC 3066], Tags for the Identification of Languages, or its successor; in addition, the empty string MAY be specified." Note what this is not:

1. It is not a BNF production.
2. It is not a well-formedness constraint.
3. It is not a validity constraint.
4. It is not a compatibility constraint.
5. It is not any other sort of error.

The XML spec is very careful to explain exactly what is and is not required of XML documents.
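
Points 1 and 2 can be checked empirically. The following is only a minimal sketch: it assumes Python's standard-library xml.etree.ElementTree (a non-validating parser, chosen here purely for illustration), and the helper names xml_lang and is_well_formed are hypothetical. It shows one parser accepting xml:lang="Français" while rejecting a value that really does violate production 10.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace name to which the "xml" prefix is bound by definition.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def is_well_formed(document: str) -> bool:
    """Return True if this (non-validating) parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def xml_lang(document: str) -> Optional[str]:
    """Return the root element's xml:lang value, or None if it has none."""
    root = ET.fromstring(document)
    return root.get("{%s}lang" % XML_NS)


# Not an RFC 3066 language tag, but it matches production 10 (AttValue).
print(is_well_formed('<greeting xml:lang="Français">Bonjour</greeting>'))

# The parser reports the value like that of any other attribute.
print(xml_lang('<greeting xml:lang="Français">Bonjour</greeting>'))

# A literal "<" is excluded by production 10, so this one is rejected.
print(is_well_formed('<greeting xml:lang="a<b">Bonjour</greeting>'))
```

One parser's behavior is not a substitute for the spec, of course; it only illustrates that the check on xml:lang values, if any, happens above the level of well-formedness.
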
The spec carefully defines and uses terms like MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL and indicates that these, "when EMPHASIZED, are to be interpreted as described in [IETF RFC 2119]". The word "are" is not in this list. There are no grounds in the spec for interpreting this as any sort of constraint on the content of legal XML documents. It's simply mildly sloppy writing.

Historically, there were BNF productions in the first edition of the XML 1.0 spec that seemed to suggest that this was a well-formedness issue. However, those BNF productions were not actually reachable from any other productions, so they had no effect. Furthermore, they were deliberately removed from the 2nd edition of the XML 1.0 specification to make it really clear that this was not a well-formedness issue.

Bottom line: any well-formed attribute value is a legal value for an xml:lang attribute. It's not wise to use such a value, but a finding such as this must cover all legal XML infosets, not merely the non-perverse ones.

-- 
Elliotte Rusty Harold elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
Received on Wednesday, 27 October 2004 09:58:27 UTC