RE: encoding missing in xml declaration from Aman Singh on 2003-05-12 (xml-dist-app@w3.org from May 2003)

From: Aman Singh <haramansingh@hotmail.com>
Date: Mon, 12 May 2003 15:34:00 -0400 (EDT)
To: noah_mendelsohn@us.ibm.com, mgudgin@microsoft.com
Cc: xml-dist-app@w3.org, xml-dist-app-request@w3.org
Message-ID: <Sea1-F148qzAoa2JB1n0000d436@hotmail.com>
Thank you for replying, and many thanks for the clarification; however, 
I am still confused.

According to the XML 1.0 W3C Recommendation in Appendix F, the following is 
stated:

-------------------------------------------------------------------------------------------------------------------------------------
F.2 Priorities in the Presence of External Encoding Information
The second possible case occurs when the XML entity is accompanied by 
encoding information, as in some file systems and some network protocols. 
When multiple sources of information are available, their relative priority 
and the preferred method of handling conflict should be specified as part of 
the higher-level protocol used to deliver XML. In particular, please refer 
to [IETF RFC 2376] or its successor, which defines the text/xml and 
application/xml MIME types and provides some useful guidance. In the 
interests of interoperability, however, the following rule is recommended.

If an XML entity is in a file, the Byte-Order Mark and encoding declaration 
are used (if present) to determine the character encoding.
-------------------------------------------------------------------------------------------------------------------------------------

I conducted an XML experiment of my own.

Using Notepad (OS: Windows XP) I created two xml files with the following 
content:
<?xml version='1.0' ?>
<root>&#197;�</root>

I saved the first file with ANSI encoding and the other as Unicode.
Then I opened them in Internet Explorer 6 (MSXML 4 on the OS).

(Case 1) The first file produced an error on opening.
(Case 2) The second file opened fine.

Another experiment (Case 3)
When I add the following encoding attribute to the xml declaration in Case 1 
and save the file as ANSI, I get a positive result.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<root>&#197;�</root>

What I am trying to get at is: what is really used to determine the 
character encoding for SOAP? In Case 1, it was the way the file was saved 
and not the XML declaration, but the encoding attribute did take precedence 
once it was added to the XML declaration (Case 3).
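
The three cases can be approximated outside the browser. The following is a
minimal Python sketch, not a reproduction of the MSXML behaviour: it assumes
that Notepad's "ANSI" corresponds to Windows-1252 and that "Unicode"
corresponds to UTF-16 with a Byte-Order Mark, and it uses Python's bundled
expat-based parser in place of MSXML 4. The helper name try_parse is
hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT = "\u00c5"  # LATIN CAPITAL LETTER A WITH RING ABOVE

def try_parse(data: bytes) -> str:
    """Return the root element's text, or 'error' if parsing fails."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return "error"

no_decl = "<?xml version='1.0' ?>\n<root>" + TEXT + "</root>"
with_decl = ('<?xml version="1.0" encoding="ISO-8859-1" ?>\n<root>'
             + TEXT + "</root>")

# Case 1: single-byte encoding, no encoding declaration, no BOM.
# The parser falls back to UTF-8, and the lone byte 0xC5 is not valid UTF-8.
case1 = try_parse(no_decl.encode("cp1252"))

# Case 2: UTF-16 with a BOM (assumed equivalent of Notepad's "Unicode").
# The BOM tells the parser which encoding to use.
case2 = try_parse(no_decl.encode("utf-16"))

# Case 3: single-byte encoding plus an explicit encoding declaration.
case3 = try_parse(with_decl.encode("cp1252"))

print(case1, case2, case3)
```

Under these assumptions, Case 1 should fail while Cases 2 and 3 should
return the character, which matches what Internet Explorer showed.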

However, for SOAP, will it be the transport-level information (i.e., HTTP 
headers) that determines the encoding for the document, or the XML 
declaration?
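
To make the two candidate sources concrete, here is a small Python sketch
that extracts each one separately. It does not decide which source should
win, since that is exactly the open question. The function names and the
sample Content-Type value are illustrative assumptions, and the regular
expression only handles ASCII-compatible encodings.

```python
import re
from email.message import Message

def transport_charset(content_type: str):
    """Return the charset parameter of a Content-Type header, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the entity (ASCII-compatible encodings only).
DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

def declared_encoding(body: bytes):
    """Return the encoding named in the XML declaration, or None."""
    m = DECL_ENCODING.match(body)
    return m.group(1).decode("ascii").lower() if m else None

# Hypothetical SOAP-over-HTTP message: header and body may disagree.
header = "application/soap+xml; charset=utf-8"
body = b'<?xml version="1.0" encoding="ISO-8859-1" ?><root/>'
print(transport_charset(header), declared_encoding(body))
```

When the two values differ, as in this sample, a receiver needs a rule for
choosing between them.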

Is the specification ambiguous, such that it is left to the XML parser?

In what context is the xml encoding to be used according to the XML 1.0 
Recommendation?

I am sorry that I am still confused.  Given the experiments, my confusion is 
justified ;)


Best Regards,

Aman Singh



>From: noah_mendelsohn@us.ibm.com
>To: "Martin Gudgin" <mgudgin@microsoft.com>
>CC: "Aman Singh" <haramansingh@hotmail.com>, xml-dist-app@w3.org,   
>xml-dist-app-request@w3.org
>Subject: RE: encoding missing in xml declaration
>Date: Mon, 12 May 2003 13:34:46 -0400
>
>Exactly.  It may also be helpful to point out some information on UTF-8,
>such as [1] and in particular [2].
>
>[1] http://www.cl.cam.ac.uk/~mgk25/unicode.html
>[2] http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
>
>------------------------------------------------------------------
>Noah Mendelsohn                              Voice: 1-617-693-4036
>IBM Corporation                                Fax: 1-617-693-8676
>One Rogers Street
>Cambridge, MA 02142
>------------------------------------------------------------------
>
>
>
>
>
>
>
>"Martin Gudgin" <mgudgin@microsoft.com>
>Sent by: xml-dist-app-request@w3.org
>05/12/2003 04:59 AM
>
>
>         To:     "Aman Singh" <haramansingh@hotmail.com>, 
><xml-dist-app@w3.org>
>         cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
>         Subject:        RE: encoding missing in xml declaration
>
>
>
>Moving to xml-dist-app for discussion
>
> > -----Original Message-----
> > From: xmlp-comments-request@w3.org
> > [mailto:xmlp-comments-request@w3.org] On Behalf Of Aman Singh
> > Sent: 09 May 2003 18:33
> > To: xmlp-comments@w3.org
> > Subject: encoding missing in xml declaration
> >
> >
> >
> >
> >
> > Dear Sir/Madame:
> >
> > In the document SOAP Version 1.2 Part 0: Primer with status
> > of Proposed Recommendation, I noted the following issue.
> >
> > In Example 4, the xml declaration is <?xml version='1.0' ?>
> > without any encoding attribute, therefore the value of
> > encoding defaults to utf-8.
>
>Correct.
>
> > Within the same soap message, an element is found with french
> > characters.
> >
> > <n:name xmlns:n="http://mycompany.example.com/employees">
> >            Åke Jógvan Øyvind
> > </n:name>
>
>UTF-8 is able to encode all Unicode characters.
>
>According to The Unicode Standard Version 3.0, the character codes are as
>follows:
>
>----------------------------------------
>| Glyph | Hex Code | Bit pattern       |
>----------------------------------------
>|   Å   |  00C5    | 11000011 10000101 |
>----------------------------------------
>|   ó   |  00F3    | 11000011 10110011 |
>----------------------------------------
>|   Ø   |  00D8    | 11000011 10011000 |
>----------------------------------------
>
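
The bit patterns in the table are the two-byte UTF-8 sequences for the
listed code points. A short Python check, using only the standard library
(the helper name utf8_bits is hypothetical):

```python
def utf8_bits(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated 8-bit groups."""
    return " ".join(f"{b:08b}" for b in ch.encode("utf-8"))

for ch in "\u00c5\u00f3\u00d8":
    print(f"U+{ord(ch):04X}", utf8_bits(ch))
```
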
> >
> > This is incorrect according to the XML 1.0 Recommendation
> > unless the characters are escaped with the values.
>
>I do not understand how you draw this conclusion. XML 1.0 only requires
>that characters be escaped if they cannot be encoded natively in the given
>encoding. As UTF-8 can encode all of Unicode, no escaping is needed.
>
> > According
> > to my knowledge, two things could be done at this point by
> > modifying Example 4's text:
> >
> > 1.) Add an encoding attribute to the xml declaration <?xml
> > version='1.0'
> > encoding='ISO-8859-1' ?>
> > 2.) Change the element to
> > <n:name xmlns:n="http://mycompany.example.com/employees">
> >           &#197;ke J&#243;gvan &#216;yvind </n:name>
> >
> > making it a well formed xml document (due to assumption of
> > encoding="utf-8")
>
>I think the example is fine as is.
>
>Gudge
>
>
>
>

Received on Monday, 12 May 2003 15:39:08 UTC