Re: character encoding assumptions and approaches

> > of data than another?  Did you know that there is text embedded in JPEG
> > files?
> 
> Actually no, my format experts here tell me that jpeg represents text as bits,
> but they might be mistaken. In any case, certainly we wouldn't expect conversion
> to utf-8 in mixed-content or print-format (e.g. pdf, postscript)  files.


To be pedantic, there is plain text embedded in some JPEG files.
For example:

'Created with The GIMP'

The same kind of comment can be found in GIF files, and the GIF version
information is also stored as plain text. For example:
GIF89a

This is text and it is in graphic files, along with a Whole Lot of other 
information.
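
As a rough sketch of how such text might be pulled out (Python; the
function names gif_version and jpeg_comments are made up for illustration,
and the parsing is deliberately minimal, not a full JPEG or GIF reader):

```python
import struct


def gif_version(data: bytes):
    """Return the plain-text GIF signature (e.g. 'GIF89a'), or None."""
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return data[:6].decode("ascii")
    return None


def jpeg_comments(data: bytes):
    """Return the raw payloads of any JPEG comment (COM, 0xFFFE) segments.

    Only the header segments before the compressed image data are scanned.
    The payloads are returned as bytes because the file itself does not
    say which character set the comment is in.
    """
    comments = []
    if data[:2] != b"\xff\xd8":  # no Start Of Image marker: not a JPEG
        return comments
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync with the marker structure, give up
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):  # End Of Image or Start Of Scan
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2  # standalone markers carry no length field
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            break  # the length field counts its own two bytes
        if marker == 0xFE:
            comments.append(data[pos + 4:pos + 2 + length])
        pos += 2 + length
    return comments
```

Reading a file with open(path, "rb").read() and passing the bytes to
either function is enough to try it. Note that what comes back from
jpeg_comments is still just bytes; any charset is a guess.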

In theory these comments could be searched.  If it were mandated that the
utf-8 option bit means that text in all records, including external ones,
should be returned in utf-8, then there would be serious problems.
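
One way to see the problem, assuming (hypothetically) a server that
applies the conversion to the whole record and guesses Latin-1 as the
source charset. The helper naive_to_utf8 is invented for this sketch and
is not anything from the protocol:

```python
def naive_to_utf8(record: bytes, assumed_charset: str = "latin-1") -> bytes:
    """Blindly treat a record as text and re-encode it as UTF-8."""
    return record.decode(assumed_charset).encode("utf-8")


# The first four bytes of a JPEG that starts with a comment segment.
jpeg_start = b"\xff\xd8\xff\xfe"
converted = naive_to_utf8(jpeg_start)

# Every byte above 0x7F becomes two bytes, so the markers are destroyed.
print(jpeg_start.hex(), "->", converted.hex())
```

The converted bytes no longer begin with the JPEG start marker, so the
image is unreadable even though the embedded comment, being plain ASCII,
would have come through unchanged.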

This is the point Ralph was trying to make, I think.

Rob

-- 
      ,'/:.          Rob Sanderson (azaroth@liverpool.ac.uk)
    ,'-/::::.        http://www.o-r-g.org/~azaroth/
  ,'--/::(@)::.      Special Collections and Archives, extension 3142
,'---/::::::::::.    Twin Cathedrals:  telnet: liverpool.o-r-g.org 7777
____/:::::::::::::.              WWW:  http://liverpool.o-r-g.org:8000/
I L L U M I N A T I

Received on Thursday, 7 March 2002 11:48:31 UTC