- From: Aman Singh <haramansingh@hotmail.com>
- Date: Mon, 12 May 2003 15:34:00 -0400 (EDT)
- To: noah_mendelsohn@us.ibm.com, mgudgin@microsoft.com
- Cc: xml-dist-app@w3.org, xml-dist-app-request@w3.org
Thank you for replying, and many thanks for the clarification. However, I am still confused.

Appendix F of the XML 1.0 W3C Recommendation states the following:

----------------------------------------------------------------------
F.2 Priorities in the Presence of External Encoding Information

The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. In particular, please refer to [IETF RFC 2376] or its successor, which defines the text/xml and application/xml MIME types and provides some useful guidance. In the interests of interoperability, however, the following rule is recommended.

If an XML entity is in a file, the Byte-Order Mark and encoding declaration are used (if present) to determine the character encoding.
----------------------------------------------------------------------

I conducted an XML experiment of my own. Using Notepad (OS: Windows XP), I created two XML files with the following content:

<?xml version='1.0' ?>
<root>ÅÅ</root>

I saved the first file with ANSI encoding and the other as Unicode. Then I opened both in Internet Explorer 6 (MSXML 4 on the OS).

(Case 1) The first file produced an error when opened.
(Case 2) The second file opened fine.

Another experiment:

(Case 3) When I add the following encoding attribute to the XML declaration in Case 1 and save the file as ANSI, I get a positive result.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<root>ÅÅ</root>

What I am trying to get at is what is really used to determine the character encoding for SOAP. In Case 1, it was the way the file was saved and not the XML declaration, but the encoding attribute did take precedence when it was added to the XML declaration (Case 3).

For SOAP, however, will it be the transport-level information (i.e. HTTP headers) that determines the encoding of the document, or the XML declaration? Is the specification ambiguous, such that this is left to the XML parser? In what context is the XML encoding declaration to be used according to the XML 1.0 Recommendation?

I am sorry that I am still confused. Given the experiments, my confusion is justified ;)

Best Regards,
Aman Singh

>From: noah_mendelsohn@us.ibm.com
>To: "Martin Gudgin" <mgudgin@microsoft.com>
>CC: "Aman Singh" <haramansingh@hotmail.com>, xml-dist-app@w3.org,
>xml-dist-app-request@w3.org
>Subject: RE: encoding missing in xml declaration
>Date: Mon, 12 May 2003 13:34:46 -0400
>
>Exactly. It may also be helpful to point out some information on UTF-8,
>such as [1] and in particular [2].
>
>[1] http://www.cl.cam.ac.uk/~mgk25/unicode.html
>[2] http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
>
>------------------------------------------------------------------
>Noah Mendelsohn                              Voice: 1-617-693-4036
>IBM Corporation                                Fax: 1-617-693-8676
>One Rogers Street
>Cambridge, MA 02142
>------------------------------------------------------------------
>
>"Martin Gudgin" <mgudgin@microsoft.com>
>Sent by: xml-dist-app-request@w3.org
>05/12/2003 04:59 AM
>
>        To:      "Aman Singh" <haramansingh@hotmail.com>,
><xml-dist-app@w3.org>
>        cc:      (bcc: Noah Mendelsohn/Cambridge/IBM)
>        Subject: RE: encoding missing in xml declaration
>
>Moving to xml-dist-app for discussion
>
> > -----Original Message-----
> > From: xmlp-comments-request@w3.org
> > [mailto:xmlp-comments-request@w3.org] On Behalf Of Aman Singh
> > Sent: 09 May 2003 18:33
> > To: xmlp-comments@w3.org
> > Subject: encoding missing in xml declaration
> >
> > Dear Sir/Madame:
> >
> > In the document SOAP Version 1.2 Part 0: Primer with status
> > of Proposed Recommendation, I noted the following issue.
> >
> > In Example 4, the xml declaration is <?xml version='1.0' ?>
> > without any encoding attribute, therefore the value of
> > encoding defaults to utf-8.
>
>Correct.
>
> > Within the same soap message, an element is found with french
> > characters.
> >
> > <n:name xmlns:n="http://mycompany.example.com/employees">
> > Åke Jógvan Øyvind
> > </n:name>
>
>UTF-8 is able to encode all Unicode characters.
>
>According to The Unicode Standard Version 3.0, the character codes are as
>follows:
>
>--------------------------------------------------
>| Glyph | Code point (hex) | UTF-8 bit pattern |
>--------------------------------------------------
>| Å     | 00C5             | 11000011 10000101 |
>--------------------------------------------------
>| ó     | 00F3             | 11000011 10110011 |
>--------------------------------------------------
>| Ø     | 00D8             | 11000011 10011000 |
>--------------------------------------------------
>
> > This is incorrect according to the XML 1.0 Recommendation
> > unless the characters are escaped with the values.
>
>I do not understand how you draw this conclusion. XML 1.0 only requires
>that characters be escaped if they cannot be encoded natively in the given
>encoding. As UTF-8 can encode all of Unicode, no escaping is needed.
>
> > According
> > to my knowledge, two things could be done at this point by
> > modifying Example 4's text:
> >
> > 1.) Add an encoding attribute to the xml declaration <?xml
> > version='1.0'
> > encoding='ISO-8859-1' ?>
> > 2.) Change the element to
> > <n:name xmlns:n="http://mycompany.example.com/employees">
> > Åke Jógvan Øyvind </n:name>
> >
> > making it a well formed xml document (due to assumption of
> > encoding="utf-8")
>
>I think the example is fine as is.
>
>Gudge
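
The byte-level behaviour discussed in this thread can be checked with a short Python sketch. It is only an illustration under stated assumptions, not anything posted by the participants: it assumes that Notepad's "ANSI" option on a Western-European Windows XP system writes Windows-1252 bytes, in which Å is the single byte 0xC5 (the same value as in ISO-8859-1). The helper names utf8_bits and decodes_as are hypothetical.

```python
# Illustrative sketch; the helper names are made up for this example.

# The three characters from the table, written as escapes so the
# snippet does not depend on the encoding of its own source file.
A_RING = "\u00c5"    # Å
O_ACUTE = "\u00f3"   # ó
O_STROKE = "\u00d8"  # Ø


def utf8_bits(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))


def decodes_as(data: bytes, encoding: str) -> bool:
    """Report whether data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Assumption: an "ANSI" save stores the element content as Windows-1252.
ansi_bytes = (A_RING * 2).encode("cp1252")   # two bytes: C5 C5
utf8_bytes = (A_RING * 2).encode("utf-8")    # four bytes: C3 85 C3 85

# Reproduce the bit patterns from the table.
for ch in (A_RING, O_ACUTE, O_STROKE):
    print(f"U+{ord(ch):04X}  {utf8_bits(ch)}")

# Case 1: no encoding declaration, so a parser assumes UTF-8; the ANSI
# bytes are not a valid UTF-8 sequence, which is consistent with an error.
print("ANSI bytes valid as UTF-8:     ", decodes_as(ansi_bytes, "utf-8"))

# Case 3: with encoding="ISO-8859-1" declared, the same bytes decode.
print("ANSI bytes valid as ISO-8859-1:", decodes_as(ansi_bytes, "iso-8859-1"))

# The unescaped characters are fine when the bytes really are UTF-8.
print("UTF-8 bytes valid as UTF-8:    ", decodes_as(utf8_bytes, "utf-8"))
```

If the assumption about the ANSI save holds, the Case 1 error comes from bytes that do not match the default UTF-8 encoding, not from the unescaped characters themselves.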
Received on Monday, 12 May 2003 15:39:08 UTC