
XML erratum -- Bytes vs Octets

From: Misha Wolf <misha.wolf@reuters.com>
Date: Wed, 08 Nov 2000 18:37:50 +0000 (GMT)
Message-Id: <B0008710340@euvig1.dtc.lon.ime.reuters.com>
To: xml-editor@w3.org
Cc: w3c-i18n-ig@w3.org
Extensible Markup Language (XML) 1.0 (Second Edition), in:

   4.2.2 External Entities
   http://www.w3.org/TR/REC-xml#sec-external-ent

states:

|  URI references require encoding and escaping of certain characters. The
|  disallowed characters include all non-ASCII characters, plus the
|  excluded characters listed in Section 2.4 of [IETF RFC 2396], except for
|  the number sign (#) and percent sign (%) characters and the square
|  bracket characters re-allowed in [IETF RFC 2732]. Disallowed characters
|  must be escaped as follows: 
|  
|  Each disallowed character is converted to UTF-8 [IETF RFC 2279] as one
|  or more bytes. 
|  
|  Any octets corresponding to a disallowed character are escaped with the
|  URI escaping mechanism (that is, converted to %HH, where HH is the
|  hexadecimal notation of the byte value). 
|  
|  The original character is replaced by the resulting character sequence. 

The passage uses "bytes" twice and "octets" once. Please can we
standardise on one term or the other?
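
For illustration only, the escaping procedure quoted above can be sketched
in Python. This is a minimal sketch, not text from the Recommendation: the
names EXCLUDED_ASCII, is_disallowed and escape_uri_reference are
hypothetical, the set of excluded ASCII characters is my reading of RFC 2396
Section 2.4.3 (control characters, space, delimiters and "unwise"
characters, minus the four that XML re-allows), and the choice of uppercase
hex digits is an assumption, since the quoted text only says %HH.

```python
# Assumption: ASCII characters excluded by RFC 2396 section 2.4.3,
# minus '#', '%', '[' and ']', which the quoted XML text re-allows.
EXCLUDED_ASCII = (
    set('<>"{}|\\^` ')
    | {chr(c) for c in range(0x20)}  # control characters 0x00-0x1F
    | {"\x7f"}                       # DEL
)


def is_disallowed(ch: str) -> bool:
    """Return True for non-ASCII characters and excluded ASCII characters."""
    return ord(ch) > 0x7F or ch in EXCLUDED_ASCII


def escape_uri_reference(uri_ref: str) -> str:
    """Escape disallowed characters following the three quoted steps."""
    out = []
    for ch in uri_ref:
        if is_disallowed(ch):
            # Step 1: convert the character to UTF-8, one or more octets.
            octets = ch.encode("utf-8")
            # Step 2: escape each octet as %HH (uppercase hex assumed).
            # Step 3: the escaped sequence replaces the original character.
            out.append("".join(f"%{b:02X}" for b in octets))
        else:
            out.append(ch)
    return "".join(out)


print(escape_uri_reference("résumé.xml#a"))  # prints r%C3%A9sum%C3%A9.xml#a
```

Whichever term the text settles on, the unit being escaped is the same: one
8-bit value of the UTF-8 encoding per %HH triplet.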

Misha

[This mail was written using voice recognition software]


Received on Wednesday, 8 November 2000 13:38:17 GMT
