- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 12 May 2003 13:34:46 -0400
- To: "Martin Gudgin" <mgudgin@microsoft.com>
- Cc: "Aman Singh" <haramansingh@hotmail.com>, xml-dist-app@w3.org, xml-dist-app-request@w3.org
Exactly. It may also be helpful to point out some information on UTF-8,
such as [1] and in particular [2].
[1] http://www.cl.cam.ac.uk/~mgk25/unicode.html
[2] http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
------------------------------------------------------------------
Noah Mendelsohn Voice: 1-617-693-4036
IBM Corporation Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
"Martin Gudgin" <mgudgin@microsoft.com>
Sent by: xml-dist-app-request@w3.org
05/12/2003 04:59 AM
To: "Aman Singh" <haramansingh@hotmail.com>, <xml-dist-app@w3.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: RE: encoding missing in xml declaration
Moving to xml-dist-app for discussion
> -----Original Message-----
> From: xmlp-comments-request@w3.org
> [mailto:xmlp-comments-request@w3.org] On Behalf Of Aman Singh
> Sent: 09 May 2003 18:33
> To: xmlp-comments@w3.org
> Subject: encoding missing in xml declaration
>
> Dear Sir/Madame:
>
> In the document SOAP Version 1.2 Part 0: Primer with status
> of Proposed Recommendation, I noted the following issue.
>
> In Example 4, the xml declaration is <?xml version='1.0' ?>
> without any encoding attribute, therefore the value of
> encoding defaults to utf-8.
Correct.
> Within the same soap message, an element is found with french
> characters.
>
> <n:name xmlns:n="http://mycompany.example.com/employees">
> Åke Jógvan Øyvind
> </n:name>
UTF-8 is able to encode all Unicode characters.
According to The Unicode Standard Version 3.0, the character codes are as
follows:
------------------------------------------------
| Glyph | Code point (hex) | UTF-8 bit pattern |
------------------------------------------------
| Å     | 00C5             | 11000011 10000101 |
------------------------------------------------
| ó     | 00F3             | 11000011 10110011 |
------------------------------------------------
| Ø     | 00D8             | 11000011 10011000 |
------------------------------------------------
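For anyone who wants to check these values, here is a minimal sketch in Python 3. The helper name utf8_bits is made up for illustration, and the characters are written as \u escapes so the snippet does not depend on the encoding of this mail.

```python
def utf8_bits(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))


# The three non-ASCII characters from the name in Example 4:
# U+00C5 (A with ring), U+00F3 (o with acute), U+00D8 (O with stroke).
for ch in "\u00c5\u00f3\u00d8":
    print(f"U+{ord(ch):04X}  {utf8_bits(ch)}")
```

Each of the three code points lies between U+0080 and U+07FF, so each is encoded as two bytes of the form 110xxxxx 10xxxxxx.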
>
> This is incorrect according to the XML 1.0 Recommendation
> unless the characters are escaped with the values.
I do not understand how you draw this conclusion. XML 1.0 only requires
that characters be escaped if they cannot be encoded natively in the given
encoding. As UTF-8 can encode all of Unicode, no escaping is needed.
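As a rough illustration, the sketch below parses the element three ways with Python's standard xml.etree.ElementTree: as in Example 4 (no encoding declaration, raw UTF-8 bytes), with an ISO-8859-1 declaration and Latin-1 bytes, and with numeric character references. It assumes that "escaped with the values" means numeric character references. The helper parse_name and the constants are invented for this example and are not taken from the Primer.

```python
import xml.etree.ElementTree as ET

# The name from Example 4, written with \u escapes to stay encoding-neutral.
NAME = "\u00c5ke J\u00f3gvan \u00d8yvind"
ELEMENT = '<n:name xmlns:n="http://mycompany.example.com/employees">{}</n:name>'


def parse_name(declaration: str, body: str, encoding: str) -> str:
    """Serialize declaration + element in `encoding`, parse it, return the text."""
    data = (declaration + ELEMENT.format(body)).encode(encoding)
    return ET.fromstring(data).text


# 1. As in Example 4: no encoding declaration, raw UTF-8 bytes.
as_is = parse_name("<?xml version='1.0' ?>", NAME, "utf-8")

# 2. Declare ISO-8859-1 and send Latin-1 bytes.
latin1 = parse_name(
    "<?xml version='1.0' encoding='ISO-8859-1' ?>", NAME, "iso-8859-1"
)

# 3. Numeric character references (assumed form), so the bytes are pure ASCII.
escaped = parse_name(
    "<?xml version='1.0' ?>", "&#197;ke J&#243;gvan &#216;yvind", "ascii"
)

print(as_is == latin1 == escaped == NAME)
```

All three serializations should yield the same character data after parsing, which is why the unescaped UTF-8 form needs no change.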
> According
> to my knowledge, two things could be done at this point by
> modifying Example 4's text:
>
> 1.) Add an encoding attribute to the xml declaration <?xml
> version='1.0'
> encoding='ISO-8859-1' ?>
> 2.) Change the element to
> <n:name xmlns:n="http://mycompany.example.com/employees">
> Åke Jógvan Øyvind </n:name>
>
> making it a well formed xml document (due to assumption of
> encoding="utf-8")
I think the example is fine as is.
Gudge
Received on Monday, 12 May 2003 13:44:11 UTC