- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 12 May 2003 13:34:46 -0400
- To: "Martin Gudgin" <mgudgin@microsoft.com>
- Cc: "Aman Singh" <haramansingh@hotmail.com>, xml-dist-app@w3.org, xml-dist-app-request@w3.org
Exactly. It may also be helpful to point out some information on UTF-8,
such as [1] and in particular [2].

[1] http://www.cl.cam.ac.uk/~mgk25/unicode.html
[2] http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

"Martin Gudgin" <mgudgin@microsoft.com>
Sent by: xml-dist-app-request@w3.org
05/12/2003 04:59 AM

To: "Aman Singh" <haramansingh@hotmail.com>, <xml-dist-app@w3.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: RE: encoding missing in xml declaration

Moving to xml-dist-app for discussion

> -----Original Message-----
> From: xmlp-comments-request@w3.org
> [mailto:xmlp-comments-request@w3.org] On Behalf Of Aman Singh
> Sent: 09 May 2003 18:33
> To: xmlp-comments@w3.org
> Subject: encoding missing in xml declaration
>
> Dear Sir/Madame:
>
> In the document SOAP Version 1.2 Part 0: Primer with status
> of Proposed Recommendation, I noted the following issue.
>
> In Example 4, the xml declaration is <?xml version='1.0' ?>
> without any encoding attribute, therefore the value of
> encoding defaults to utf-8.

Correct.

> Within the same soap message, an element is found with french
> characters.
>
> <n:name xmlns:n="http://mycompany.example.com/employees">
> Åke Jógvan Øyvind
> </n:name>

UTF-8 is able to encode all Unicode characters. According to The
Unicode Standard Version 3.0, the character codes are as follows:

-----------------------------------------------
| Glyph | Hex Code | Bit pattern (UTF-8)      |
-----------------------------------------------
| Å     | 00C5     | 11000011 10000101        |
-----------------------------------------------
| ó     | 00F3     | 11000011 10110011        |
-----------------------------------------------
| Ø     | 00D8     | 11000011 10011000        |
-----------------------------------------------

> This is incorrect according to the XML 1.0 Recommendation
> unless the characters are escaped with the values.

I do not understand how you draw this conclusion. XML 1.0 only
requires that characters be escaped if they cannot be encoded natively
in the given encoding. As UTF-8 can encode all of Unicode, no escaping
is needed.

> According
> to my knowledge, two things could be done at this point by
> modifying Example 4's text:
>
> 1.) Add an encoding attribute to the xml declaration
>     <?xml version='1.0' encoding='ISO-8859-1' ?>
> 2.) Change the element to
>     <n:name xmlns:n="http://mycompany.example.com/employees">
>     Åke Jógvan Øyvind </n:name>
>
> making it a well formed xml document (due to assumption of
> encoding="utf-8")

I think the example is fine as is.

Gudge
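
For anyone who wants to check the bit patterns in the table for themselves, here is a minimal sketch in Python. It is not part of the Primer; the helper names (utf8_bits, parse_name) are made up for illustration, and the document it parses is only the n:name element from Example 4 wrapped in an XML declaration with no encoding attribute.

```python
import xml.etree.ElementTree as ET

# The element content from Example 4 of the Primer.
NAME = "Åke Jógvan Øyvind"


def utf8_bits(ch: str) -> str:
    """Return the UTF-8 encoding of one character as space-separated bit octets."""
    return " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))


def parse_name(text: str) -> str:
    """Serialize the element as UTF-8 with no encoding declaration, then parse it back."""
    doc = (
        "<?xml version='1.0' ?>"
        "<n:name xmlns:n='http://mycompany.example.com/employees'>"
        f"{text}"
        "</n:name>"
    ).encode("utf-8")
    # With no encoding declaration and no byte order mark,
    # the parser treats the bytes as UTF-8.
    return ET.fromstring(doc).text


if __name__ == "__main__":
    for ch in "ÅóØ":
        print(f"{ch}  {ord(ch):04X}  {utf8_bits(ch)}")
    print(parse_name(NAME) == NAME)
```

The parse round-trips the unescaped name, which is consistent with the point that no character references are needed as long as the bytes on the wire really are UTF-8.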
Received on Monday, 12 May 2003 13:44:11 UTC