- From: Martin Gudgin <mgudgin@microsoft.com>
- Date: Mon, 12 May 2003 14:26:41 -0700
- To: "Aman Singh" <haramansingh@hotmail.com>, <noah_mendelsohn@us.ibm.com>
- Cc: <xml-dist-app@w3.org>, <xml-dist-app-request@w3.org>
> -----Original Message-----
> From: Aman Singh [mailto:haramansingh@hotmail.com]
> Sent: 12 May 2003 20:34
> To: noah_mendelsohn@us.ibm.com; Martin Gudgin
> Cc: xml-dist-app@w3.org; xml-dist-app-request@w3.org
> Subject: RE: encoding missing in xml declaration
>
> Thank you for replying back and many more thanks for the
> clarification, however, I am still confused.
>
> According to the XML 1.0 W3C Recommendation in Appendix F,
> the following is stated:
>
> --------------------------------------------------------------
> F.2 Priorities in the Presence of External Encoding Information
>
> The second possible case occurs when the XML entity is
> accompanied by encoding information, as in some file systems
> and some network protocols. When multiple sources of
> information are available, their relative priority and the
> preferred method of handling conflict should be specified as
> part of the higher-level protocol used to deliver XML. In
> particular, please refer to [IETF RFC 2376] or its successor,
> which defines the text/xml and application/xml MIME types and
> provides some useful guidance. In the interests of
> interoperability, however, the following rule is recommended.
>
> If an XML entity is in a file, the Byte-Order Mark and
> encoding declaration are used (if present) to determine the
> character encoding.
> --------------------------------------------------------------
>
> I conducted an XML experiment of my own.
>
> Using Notepad (OS: Windows XP) I created two xml files with
> the following content:
>
> <?xml version='1.0' ?>
> <root>ÅÅ</root>
>
> I saved the first file as of type ANSI encoding and the other
> as Unicode. Then I opened them up in Internet Explorer 6
> (msxml 4 on the OS).
>
> (Case 1) An error was received while opening the first file.

The encoding of the first file is ANSI, which, I *think*, is ISO-8859-1,
hence in order for an XML parser to correctly interpret it, it MUST have an
XML declaration with the encoding value ISO-8859-1 (or some recapitalization
thereof).

> (Case 2) The second file opened fine.

Right, because it was encoded using UTF-16, began with a BOM and was
interpretable automatically by the XML parser.

>
> Another experiment (Case 3)
> When I add the following encoding attribute to the xml
> declaration in Case 1 and save the file as ANSI, I get a
> positive result.
> <?xml version="1.0" encoding="ISO-8859-1" ?>
> <root>ÅÅ</root>

Right. Case 1 was neither UTF-8 nor UTF-16, therefore an encoding attribute
was required.

>
> What I am trying to get at is what is really used to
> determine the character encoding for SOAP. In Case 1, it was
> the way the file was saved and not the xml declaration,

No, the way an XML parser figures out the encoding is to use the BOM and/or
encoding attribute. If the XML resource was supplied over HTTP, then the
charset parameter to Content-Type could be used in the absence of a BOM and
encoding attribute. In both Case 1 and Case 2 your XML declaration said 'Hey
XML parser, figure out the encoding for yourself, but it's either UTF-8 or
UTF-16'.

> But the
> encoding attribute did take precedence when it was added to
> the xml declaration (Case 3).

Yup.

>
> However for SOAP, will it be the transport level information (i.e. HTTP
> Headers) that determine the encoding for the document, or the
> xml declaration?

[1] seems to indicate that HTTP headers MAY take precedence over the XML
declaration.

>
> Is the specification ambiguous, such that it is left to the XML parser?

I don't think so. Everything else in the XML world works this way AFAIK.

>
> In what context is the xml encoding to be used according to
> the XML 1.0 Recommendation?

If the encoding is NOT UTF-8 or UTF-16 then an XML declaration MUST be
present and the encoding attribute MUST appear.
The relevant text from [1] is "It is also a fatal error if an XML entity
contains no encoding declaration and its content is not legal UTF-8 or
UTF-16."

>
> I am sorry that I am still confused. Given the experiments,
> my confusion is justified ;)

I hope the above helps.

Gudge

[1] http://www.w3.org/TR/REC-xml#charencoding
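
The three cases discussed in this message can be reproduced with a short
script. What follows is only a minimal sketch in Python, assuming the standard
library's expat-based xml.etree.ElementTree parser rather than MSXML, and
assuming ISO-8859-1 as a stand-in for Notepad's "ANSI" code page. The helper
names try_parse and parse_with_external_charset are hypothetical. The last
helper imitates a higher-level protocol (for example an HTTP Content-Type
charset parameter) supplying the encoding when the entity itself carries
neither a BOM nor an encoding declaration.

```python
import xml.etree.ElementTree as ET

# Document body: a root element containing A-with-ring (U+00C5) twice.
BODY = "<root>\u00c5\u00c5</root>"
DECL_PLAIN = "<?xml version='1.0' ?>\n"
DECL_LATIN1 = '<?xml version="1.0" encoding="ISO-8859-1" ?>\n'


def try_parse(data: bytes) -> str:
    """Parse raw bytes; return the root text, or 'error' if not well-formed."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return "error"


def parse_with_external_charset(data: bytes, charset: str) -> str:
    """Assumed model of external encoding information: decode the bytes with
    the externally supplied charset first, then parse the resulting text."""
    return ET.fromstring(data.decode(charset)).text


# Case 1: ISO-8859-1 bytes, no BOM, no encoding declaration.
# The parser must assume UTF-8, and the bytes are not legal UTF-8.
case1 = (DECL_PLAIN + BODY).encode("iso-8859-1")

# Case 2: UTF-16 bytes with a BOM (Python's "utf-16" codec writes one),
# no encoding declaration. The BOM is enough for auto-detection.
case2 = (DECL_PLAIN + BODY).encode("utf-16")

# Case 3: ISO-8859-1 bytes with a matching encoding declaration.
case3 = (DECL_LATIN1 + BODY).encode("iso-8859-1")

if __name__ == "__main__":
    # ascii() keeps the output printable on any console encoding.
    print("case 1:", ascii(try_parse(case1)))
    print("case 2:", ascii(try_parse(case2)))
    print("case 3:", ascii(try_parse(case3)))
    print("case 1 + external charset:",
          ascii(parse_with_external_charset(case1, "iso-8859-1")))
```

Under these assumptions, Case 1 is rejected while Cases 2 and 3 parse, which
matches the behaviour reported for IE6. Case 1 only becomes readable once the
encoding is supplied from outside the entity.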
Received on Monday, 12 May 2003 17:26:51 UTC