Re: internet media types and encoding

 From: "Chris Lilley" <chris@w3.org>
> On Tuesday, April 15, 2003, 5:11:44 PM, Rick wrote:

> RJ> Now that characters have been allocated to those code points,
> 
> OK and I said that, too, but luckily I thought to check with Mark
> Davis, the chair of the Unicode consortium about that. And I was
> wrong. He said:
> 
> MD> Ah. The character *names* are actually undefined, and simply
> MD> marked by "<control>". What you are thinking of as the names are
> MD> simply aliases pointing to the ISO 6429 usage. See
> MD> http://www.unicode.org/charts/PDF/U0080.pdf.
> 
> MD> So the Unicode Standard does not define U+0082 to mean "BREAK
> MD> PERMITTED HERE". It just says that there is a control code, one
> MD> which in ISO 6429 has that name and meaning. But implementers of
> MD> the Unicode Standard are not required to interpret the U+0082 in
> MD> the ISO 6429 way.

Doh. The wording in the Unicode Standard
(http://www.unicode.org/unicode/uni2book/ch13.pdf) is:

    "The Unicode Standard makes no specific use of these control codes,
    but it provides for the passage of the numeric code values intact,
    neither adding nor subtracting from their semantics. The semantics
    of the [C0, C1] Controls and delete are generally determined by the
    application with which they are used. However, in the absence of
    specific application uses, they may be interpreted according to the
    semantics specified in ISO/IEC 6429"

I was interpreting "may" too strongly. 

Nevertheless, my point stands: transcoders may delete control
characters willy-nilly.
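To make the difference concrete, here is a rough Python sketch (the
function names are made up for illustration, not any real transcoder's
API). One function passes the code value of U+0082 through intact, as
the Standard provides for. The other is a hypothetical lossy transcoder
that drops every control character. It also shows Mark's point: U+0082
has no character name of its own, only the ISO 6429 alias, which
Python's unicodedata exposes through lookup() (Python 3.3 and later).

```python
import unicodedata

def transcode_intact(text: str, target: str) -> bytes:
    """Encode text, passing C0/C1 control code values through unchanged."""
    return text.encode(target)

def transcode_stripping(text: str, target: str) -> bytes:
    """Hypothetical lossy transcoder: drops control characters (category Cc)
    before encoding. Keeping TAB, LF and CR is an arbitrary illustrative choice."""
    kept = "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cc" or ch in "\t\n\r"
    )
    return kept.encode(target)

sample = "a\u0082b"  # U+0082, which ISO 6429 calls BREAK PERMITTED HERE

# The character has no name in the Standard ("<control>"), only an alias.
print(unicodedata.name("\u0082", None))           # None
print(unicodedata.lookup("BREAK PERMITTED HERE") == "\u0082")  # True

print(transcode_intact(sample, "utf-8"))      # b'a\xc2\x82b'
print(transcode_intact(sample, "latin-1"))    # b'a\x82b'
print(transcode_stripping(sample, "utf-8"))   # b'ab'
```

Both are "conforming" in the sense that nothing in the Standard forces
the ISO 6429 reading, which is why the second one worries me.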


Cheers
Rick Jelliffe


Received on Tuesday, 15 April 2003 16:55:29 UTC