RE: character encoding assumptions and approaches

Let's change the question slightly.  Why should the application know what
kind of data it is returning?  Why should it behave differently for one kind
of data than another?  Did you know that there is text embedded in JPEG
files?  Should I return that portion as UTF-8?  Why should my application be
able to change latin-1 text in a JPEG file to UTF-8?  Why should my
application be able to change ANSEL text in a MARC-21 file to UTF-8?  I
think you assume too much knowledge about MARC records and should treat them
like any other record format.
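
To make the point concrete, here is a minimal sketch in Python of what can go
wrong when an application re-encodes a record it does not understand.  The
record layout (a 4-byte length prefix, a Latin-1 comment, a NUL, then binary
data) is made up for illustration; it is neither JPEG nor MARC, and the
function names are hypothetical.  Formats that carry byte lengths or offsets,
as a MARC leader and directory do, are exposed to the same kind of problem.

```python
import struct

def build_record(comment: bytes, payload: bytes) -> bytes:
    """Build a hypothetical record: big-endian 4-byte body length, then the body."""
    body = comment + b"\x00" + payload
    return struct.pack(">I", len(body)) + body

def naive_transcode(record: bytes) -> bytes:
    """Re-encode the body from Latin-1 to UTF-8 without understanding the format."""
    header, body = record[:4], record[4:]
    return header + body.decode("latin-1").encode("utf-8")

# A Latin-1 comment followed by four bytes of binary data.
record = build_record("café".encode("latin-1"), bytes([0xFF, 0xD8, 0xFF, 0xE0]))

declared_length = struct.unpack(">I", record[:4])[0]
transcoded = naive_transcode(record)
actual_length = len(transcoded) - 4

# The length prefix no longer matches the body, and the binary bytes were
# rewritten along with the text.
print(declared_length, actual_length)
```

The server that passes the record through as opaque bytes cannot cause this
kind of damage; the server that transcodes has to know the format well enough
to find the text and fix up every length that depends on it.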

Ralph

> -----Original Message-----
> From: Ray Denenberg [mailto:rden@loc.gov]
> Sent: Wednesday, March 06, 2002 1:57 PM
> To: www-zig@w3.org
> Cc: www-zig@w3.org
> Subject: Re: character encoding assumptions and approaches
> 
> 
> "LeVan,Ralph" wrote:
> 
> > Someone pointed out that JPEG records would not be affected by the UTF-8
> > negotiation.  I further add that a JPEG record contained in a UTF-8 GRS-1
> > record would not be affected either.  So, how do we make the jump that a
> > MARC record would be affected?
> 
> Can you give an example of data where it isn't clear whether or not we have
> text characters?  (jpeg, jpeg in grs, marc --  those aren't examples.)
> 
> --Ray
> 

Received on Wednesday, 6 March 2002 17:11:15 UTC