FW: Encoding detection again ...

From: Miles Sabin <msabin@cromwellmedia.co.uk>
Date: Tue, 2 Mar 1999 15:51:43 -0000
Message-ID: <c=US%a=_%p=Cromwell_Media%l=ODIN-990302155143Z-15625@odin.cromwellmedia.co.uk>
To: "'xml-editor@w3.org'" <xml-editor@w3.org>
Hi,

John Cowan suggested that I forward the following
to you for consideration as an erratum for XML 1.0.

Cheers,


Miles

Miles Sabin                          Cromwell Media
Internet Systems Architect           5/6 Glenthorne Mews
+44 (0)181 410 2230                  London, W6 0LJ
msabin@cromwellmedia.co.uk           England


-----Original Message-----
From: Miles Sabin 
Sent: 02 March 1999 11:59 am
To: 'xml-dev@ic.ac.uk'
Subject: Encoding detection again ...


I've been browsing through the archives for an
answer to this question, but I haven't been able
to find anything that seems to give a completely
unambiguous answer ...

Appendix F of the spec says that given a document
starting with the 4 octet sequence,

  00 3C 00 3F

I'm to infer BOM-less big-endian UTF-16, and 
given a document starting with,

  3C 00 3F 00

I'm to infer BOM-less little-endian UTF-16.
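
To make the inference concrete, here is a minimal Python sketch of the
check described above. The function name guess_bomless_utf16 is made up
for illustration and is not taken from the spec or from any parser; it
only looks at the first four octets and does nothing about BOMs or
other encodings.

```python
from typing import Optional


def guess_bomless_utf16(prefix: bytes) -> Optional[str]:
    """Guess the byte order from the first four octets of a document.

    Hypothetical helper: returns a Python codec name, or None when the
    octets match neither of the two sequences discussed above.
    """
    head = bytes(prefix[:4])
    if head == b"\x00\x3c\x00\x3f":   # 00 3C 00 3F: "<?" with the high octet first
        return "utf-16-be"
    if head == b"\x3c\x00\x3f\x00":   # 3C 00 3F 00: "<?" with the low octet first
        return "utf-16-le"
    return None


if __name__ == "__main__":
    sample = '<?xml version="1.0"?>'.encode("utf-16-be")
    print(guess_bomless_utf16(sample))
```

Note that the return value only names a byte order; nothing in these
four octets says whether the 16-bit units are UTF-16 or UCS-2.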

What I want to know is: why could these
sequences not equally represent (respectively)
big-endian UCS-2 or little-endian UCS-2? In
other words, surely these octet sequences are
ambiguous, and hence the encoding should be
resolved definitively with either,

  <?xml version="1.0" encoding="UTF-16"?>

or,

  <?xml version="1.0" encoding="ISO-10646-UCS-2"?>

or an appropriate MIME header, i.e.,

  Content-type: text/xml; charset="utf-16"

or,

  Content-type: text/xml; charset="ISO-10646-UCS-2"
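
For the MIME case, a rough sketch of reading the charset parameter
using Python's standard email package. The helper name
charset_from_content_type is hypothetical, and how a parser should
weigh this value against the encoding declaration is outside what the
sketch covers.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the charset parameter of a Content-type value, lowercased.

    Hypothetical helper; returns None when no charset parameter is present.
    """
    msg = Message()
    msg["Content-type"] = value
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type('text/xml; charset="utf-16"'))
    print(charset_from_content_type('text/xml; charset="ISO-10646-UCS-2"'))
```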

Just so there's no confusion ... I'm assuming:

1. Unicode == UTF-16
2. UCS-2 != UTF-16 (because UCS-2 lacks UTF-16's
   support for characters outside the BMP).
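
A small Python sketch of assumption 2, and of why the octets alone
cannot settle the question. Python has no built-in UCS-2 codec, so
encode_ucs2_be below is a made-up helper that writes one 16-bit unit
per character and rejects anything outside the BMP.

```python
import struct


def encode_ucs2_be(text: str) -> bytes:
    """Encode text as big-endian UCS-2 (hypothetical helper).

    Each character becomes exactly one 16-bit unit; characters above
    U+FFFF cannot be represented and raise ValueError.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError("U+%04X is outside the BMP" % cp)
        out += struct.pack(">H", cp)
    return bytes(out)


if __name__ == "__main__":
    # BMP-only text: the UCS-2 and UTF-16 octets are identical.
    print(encode_ucs2_be("<?xml") == "<?xml".encode("utf-16-be"))
    # U+10000 needs a surrogate pair in UTF-16 (four octets).
    print("\U00010000".encode("utf-16-be").hex())
```

For a document containing only BMP characters the two encodings
produce the same octets, which is the ambiguity I am asking about.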



Received on Tuesday, 2 March 1999 10:59:19 GMT
