- From: Aman Singh <haramansingh@hotmail.com>
- Date: Tue, 13 May 2003 09:25:26 -0400 (EDT)
- To: mgudgin@microsoft.com, noah_mendelsohn@us.ibm.com
- Cc: xml-dist-app@w3.org, xml-dist-app-request@w3.org
It all makes sense to me now: it has to do with byte order marks.

>>Case 1 was neither UTF-8 nor UTF-16, therefore an encoding attribute was
>>required.

One can easily be misled by the verbosity of an XML fragment into thinking
that, for case 1, the default encoding would be UTF-8.

<?xml version='1.0' ?>
<root>ÅÅ</root>

By looking at the contents of the file, I would never think that an
encoding attribute is required.

Thanks again for your time.

Best Regards,
aman singh

>From: "Martin Gudgin" <mgudgin@microsoft.com>
>To: "Aman Singh" <haramansingh@hotmail.com>,<noah_mendelsohn@us.ibm.com>
>CC: <xml-dist-app@w3.org>,<xml-dist-app-request@w3.org>
>Subject: RE: encoding missing in xml declaration
>Date: Mon, 12 May 2003 14:26:41 -0700
>
> > -----Original Message-----
> > From: Aman Singh [mailto:haramansingh@hotmail.com]
> > Sent: 12 May 2003 20:34
> > To: noah_mendelsohn@us.ibm.com; Martin Gudgin
> > Cc: xml-dist-app@w3.org; xml-dist-app-request@w3.org
> > Subject: RE: encoding missing in xml declaration
> >
> > Thank you for replying, and many thanks for the clarification; however,
> > I am still confused.
> >
> > According to Appendix F of the XML 1.0 W3C Recommendation, the
> > following is stated:
> >
> > --------------------------------------------------------------
> > F.2 Priorities in the Presence of External Encoding Information
> >
> > The second possible case occurs when the XML entity is accompanied by
> > encoding information, as in some file systems and some network
> > protocols. When multiple sources of information are available, their
> > relative priority and the preferred method of handling conflict should
> > be specified as part of the higher-level protocol used to deliver XML.
> > In particular, please refer to [IETF RFC 2376] or its successor, which
> > defines the text/xml and application/xml MIME types and provides some
> > useful guidance.
> > In the interests of interoperability, however, the following rule is
> > recommended.
> >
> > If an XML entity is in a file, the Byte-Order Mark and encoding
> > declaration are used (if present) to determine the character encoding.
> > --------------------------------------------------------------
> >
> > I conducted an XML experiment of my own.
> >
> > Using Notepad (OS: Windows XP), I created two XML files with the
> > following content:
> >
> > <?xml version='1.0' ?>
> > <root>ÅÅ</root>
> >
> > I saved the first file with ANSI encoding and the other as Unicode.
> > Then I opened them in Internet Explorer 6 (msxml 4 on the OS).
> >
> > (Case 1) An error was received while opening the first file.
>
>The encoding of the first file is ANSI, which, I *think* is ISO-8859-1,
>hence in order for an XML parser to correctly interpret it, it MUST have an
>xml declaration with an encoding attribute whose value is ISO-8859-1 ( or
>some recapitalization thereof ).
>
> > (Case 2) The second file opened fine.
>
>Right, because it was encoded using UTF-16, began with a BOM and was
>interpretable automatically by the XML parser.
>
> > Another experiment (Case 3):
> > When I add the following encoding attribute to the xml declaration in
> > Case 1 and save the file as ANSI, I get a positive result.
> >
> > <?xml version="1.0" encoding="ISO-8859-1" ?>
> > <root>ÅÅ</root>
>
>Right. Case 1 was neither UTF-8 nor UTF-16, therefore an encoding attribute
>was required.
>
> > What I am trying to get at is what is really used to determine the
> > character encoding for SOAP. In Case 1, it was the way the file was
> > saved and not the xml declaration,
>
>No, the way an XML parser figures out the encoding is to use the BOM and/or
>encoding attribute. If the XML resource was supplied over HTTP, then the
>charset parameter to Content-Type could be used in the absence of a BOM and
>encoding attribute.
>In both case 1 and 2 your XML declaration said 'Hey XML
>parser, figure out the encoding for yourself, but it's either UTF-8 or
>UTF-16'
>
> > But the encoding attribute did take precedence when it was added to
> > the xml declaration (Case 3).
>
>Yup.
>
> > However, for SOAP, will it be the transport-level information (i.e. HTTP
> > headers) that determines the encoding for the document, or the xml
> > declaration?
>
>[1] seems to indicate that HTTP headers MAY take precedence over the XML
>declaration.
>
> > Is the specification ambiguous, so that it is left to the XML parser?
>
>I don't think so. Everything else in the XML world works this way AFAIK.
>
> > In what context is the xml encoding to be used according to the XML 1.0
> > Recommendation?
>
>If the encoding is NOT UTF-8 or UTF-16 then an XML declaration MUST be
>present and the encoding attribute MUST appear. The relevant text from [1]
>is
>
>"It is also a fatal error if an XML entity contains no encoding declaration
>and its content is not legal UTF-8 or UTF-16."
>
> > I am sorry that I am still confused. Given the experiments, my
> > confusion is justified ;)
>
>I hope the above helps.
>
>Gudge
>
>[1] http://www.w3.org/TR/REC-xml#charencoding
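
For anyone who wants to repeat the three cases without Notepad and Internet Explorer, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree parser stands in for msxml, and that Notepad's "ANSI" bytes for these two characters match ISO-8859-1 (Gudge's guess above; the two characters are single bytes 0xC5 in that encoding). The helper name try_parse and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The two A-ring characters used in the test files.
TEXT = "\u00c5\u00c5"


def try_parse(data: bytes):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


doc_no_encoding = "<?xml version='1.0' ?>\n<root>" + TEXT + "</root>"
doc_latin1_decl = (
    "<?xml version='1.0' encoding='ISO-8859-1' ?>\n<root>" + TEXT + "</root>"
)

# Case 1: single-byte "ANSI" bytes, no BOM, no encoding attribute.
# The parser falls back to UTF-8, and 0xC5 0xC5 is not legal UTF-8.
case1 = try_parse(doc_no_encoding.encode("iso-8859-1"))

# Case 2: UTF-16 bytes. Python's "utf-16" codec writes a BOM first,
# which lets the parser detect the encoding on its own.
case2 = try_parse(doc_no_encoding.encode("utf-16"))

# Case 3: same single-byte bytes as case 1, but the declaration names the encoding.
case3 = try_parse(doc_latin1_decl.encode("iso-8859-1"))

for name, result in (("case 1", case1), ("case 2", case2), ("case 3", case3)):
    print(name, "rejected" if result is None else "parsed: " + result)
```

The point of the sketch is only that the parser never looks at how the file was "saved"; it sees bytes, and decides from the BOM and the encoding attribute, as described in the replies above.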
Received on Tuesday, 13 May 2003 09:54:34 UTC