
RE: encoding missing in xml declaration

From: <noah_mendelsohn@us.ibm.com>
Date: Mon, 12 May 2003 13:34:46 -0400
To: "Martin Gudgin" <mgudgin@microsoft.com>
Cc: "Aman Singh" <haramansingh@hotmail.com>, xml-dist-app@w3.org, xml-dist-app-request@w3.org
Message-ID: <OF8BD4EAAC.6AC2F229-ON85256D24.0058B23F@lotus.com>

Exactly.  It may also be helpful to point out some information on UTF-8, 
such as [1] and in particular [2].

[1] http://www.cl.cam.ac.uk/~mgk25/unicode.html
[2] http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------







"Martin Gudgin" <mgudgin@microsoft.com>
Sent by: xml-dist-app-request@w3.org
05/12/2003 04:59 AM

 
        To:     "Aman Singh" <haramansingh@hotmail.com>, <xml-dist-app@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        RE: encoding missing in xml declaration



Moving to xml-dist-app for discussion 

> -----Original Message-----
> From: xmlp-comments-request@w3.org 
> [mailto:xmlp-comments-request@w3.org] On Behalf Of Aman Singh
> Sent: 09 May 2003 18:33
> To: xmlp-comments@w3.org
> Subject: encoding missing in xml declaration
> 
> 
> 
> 
> 
> Dear Sir/Madame:
> 
> In the document SOAP Version 1.2 Part 0: Primer with status 
> of Proposed Recommendation, I noted the following issue.
> 
> In Example 4, the xml declaration is <?xml version='1.0' ?> 
> without any encoding attribute, therefore the value of 
> encoding defaults to utf-8. 

Correct.

> Within the same soap message, an element is found with french 
> characters.
> 
> <n:name xmlns:n="http://mycompany.example.com/employees">
>            Åke Jógvan Øyvind
> </n:name>

UTF-8 is able to encode all Unicode characters. 

According to The Unicode Standard Version 3.0, the character codes are as 
follows:

----------------------------------------
| Glyph | Hex Code | Bit pattern       |
----------------------------------------
|   Å   |  00C5    | 11000011 10000101 |
----------------------------------------
|   ó   |  00F3    | 11000011 10110011 |
----------------------------------------
|   Ø   |  00D8    | 11000011 10011000 |
----------------------------------------
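
The bit patterns in the table are the UTF-8 encodings of those code points, and they can be reproduced with a few lines of Python. This is only a sketch for checking the table; the names `code_points`, `utf8_bits` and `rows` are made up for illustration.

```python
# Code points taken from the "Hex Code" column of the table.
code_points = [0x00C5, 0x00F3, 0x00D8]


def utf8_bits(cp):
    """Return the UTF-8 encoding of code point cp as space-separated bit strings."""
    return " ".join(format(byte, "08b") for byte in chr(cp).encode("utf-8"))


# Map each hex code (as printed in the table) to its UTF-8 bit pattern.
rows = {format(cp, "04X"): utf8_bits(cp) for cp in code_points}

for hex_code, bits in rows.items():
    print(chr(int(hex_code, 16)), hex_code, bits)
```

Each of the three characters lies in the range U+0080 to U+07FF, so each takes exactly two bytes in UTF-8.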

> 
> This is incorrect according to the XML 1.0 Recommendation 
> unless the characters are escaped with the values. 

I do not understand how you draw this conclusion. XML 1.0 only requires 
that characters be escaped if they cannot be encoded natively in the given 
encoding. As UTF-8 can encode all of Unicode, no escaping is needed.
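
A minimal sketch of this point, assuming Python's standard `xml.etree.ElementTree` parser as a stand-in for any XML processor: a document with no encoding declaration, serialized as UTF-8 with the characters written literally, parses to the same text as one that uses numeric character references. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

NS = "http://mycompany.example.com/employees"

# Characters written literally; the document bytes are UTF-8 and the
# XML declaration carries no encoding attribute.
literal = (
    "<?xml version='1.0' ?>"
    f'<n:name xmlns:n="{NS}">Åke Jógvan Øyvind</n:name>'
).encode("utf-8")

# Same element with the non-ASCII characters as numeric character references.
escaped = (
    "<?xml version='1.0' ?>"
    f'<n:name xmlns:n="{NS}">&#197;ke J&#243;gvan &#216;yvind</n:name>'
).encode("ascii")

literal_text = ET.fromstring(literal).text
escaped_text = ET.fromstring(escaped).text

print(literal_text == escaped_text)
```

Both forms are well-formed; the character references are just an alternative spelling of the same characters.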

> According 
> to my knowledge, two things could be done at this point by 
> modifying Example 4's text:
> 
> 1.) Add an encoding attribute to the xml declaration <?xml 
> version='1.0' 
> encoding='ISO-8859-1' ?>
> 2.) Change the element to
> <n:name xmlns:n="http://mycompany.example.com/employees">
>           &#197;ke J&#243;gvan &#216;yvind </n:name>
> 
> making it a well formed xml document (due to assumption of 
> encoding="utf-8")

I think the example is fine as is.
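
For comparison, a sketch of the case where option 1 would matter: if the bytes were actually serialized as ISO-8859-1 rather than UTF-8, the encoding declaration would be needed for the parser to read them. Again this assumes Python's `xml.etree.ElementTree` purely for illustration, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

body = (
    '<n:name xmlns:n="http://mycompany.example.com/employees">'
    "Åke Jógvan Øyvind</n:name>"
)

# The same characters serialized as ISO-8859-1, one byte per character.
undeclared = ("<?xml version='1.0' ?>" + body).encode("iso-8859-1")
declared = ("<?xml version='1.0' encoding='ISO-8859-1' ?>" + body).encode("iso-8859-1")

# Without a declaration the parser assumes UTF-8, and the byte 0xC5
# followed by "k" is not a valid UTF-8 sequence.
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

declared_text = ET.fromstring(declared).text
print(undeclared_ok, declared_text)
```

So the declaration is tied to how the document is serialized, not to which characters it contains; a UTF-8 serialization of the example needs neither the declaration nor the escapes.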

Gudge
Received on Monday, 12 May 2003 13:44:11 GMT
