
Re: character encoding assumptions and approaches

From: Robert Sanderson <azaroth@liverpool.ac.uk>
Date: Thu, 7 Mar 2002 16:44:31 +0000 (GMT)
To: Ray Denenberg <rden@loc.gov>
cc: <www-zig@w3.org>
Message-ID: <Pine.LNX.4.33.0203071639070.11953-100000@gondolin.hist.liv.ac.uk>

> > of data than another?  Did you know that there is text embedded in JPEG
> > files?
> 
> Actually no, my format experts here tell me that jpeg represents text as bits,
> but they might be mistaken. In any case, certainly we wouldn't expect conversion
> to utf-8 in mixed-content or print-format (e.g. pdf, postscript)  files.


To be pedantic, there is plain text embedded in some jpeg files.
For example:

'Created with The GIMP'

The same kind of comment can be found in GIF files, and the GIF version
information is also stored as plain text. For example:
GIF89a

This is text and it is in graphic files, along with a Whole Lot of other 
information.
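
As a rough illustration of the above, here is a small sketch that pulls
runs of printable ASCII out of a byte string, much as the Unix "strings"
tool does. The function name printable_runs, the min_len threshold, and
the sample bytes are all made up for this example; the sample only
mimics a GIF signature and comment block and is not a valid image.

```python
import re


def printable_runs(data: bytes, min_len: int = 6) -> list:
    """Return every run of at least min_len printable ASCII bytes in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Synthetic bytes standing in for an image file (assumption: not a real GIF).
# Layout: signature, a few binary bytes, a comment block, a trailer byte.
sample = (
    b"GIF89a"
    + bytes([0x01, 0x00, 0x01, 0x00, 0x80, 0x00, 0x00])
    + b"\x21\xfe\x15"              # comment extension introducer, label, size 21
    + b"Created with The GIMP"
    + b"\x00\x3b"                  # block terminator and trailer
)

print(printable_runs(sample))
```

Run on a real file, something like this would also turn up short runs
of binary data that merely happen to look like text, which is part of
why a server cannot easily tell which bytes are "text" to convert.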

In theory these comments could be searched.  If it were mandated that the
utf-8 option bit means that text in all records, including external ones,
should be returned in utf-8, then there would be serious problems.
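
One way such problems could show up, sketched under an assumption: a
server that naively treats a whole record as Latin-1 text and re-encodes
it as utf-8. The naive_transcode function is hypothetical, not anything
Z39.50 specifies, and the record is a synthetic JPEG-like byte string
(start marker, comment segment, end marker) rather than a real image.

```python
def naive_transcode(record: bytes) -> bytes:
    """Hypothetical server step: read every byte as Latin-1, write UTF-8."""
    return record.decode("latin-1").encode("utf-8")


comment = b"Created with The GIMP"

# Synthetic JPEG-like record (assumption: illustrative only, no image data).
record = (
    b"\xff\xd8"                                # start-of-image marker
    + b"\xff\xfe"                              # comment segment marker
    + (len(comment) + 2).to_bytes(2, "big")    # segment length, including itself
    + comment
    + b"\xff\xd9"                              # end-of-image marker
)

converted = naive_transcode(record)

print(len(record), len(converted))   # the byte counts differ
print(converted == record)           # the binary structure has changed
print(comment in converted)          # the ASCII comment itself is untouched
```

The embedded comment survives, but every byte above 0x7F in the
surrounding binary structure gets expanded to two bytes, so the markers
no longer match and the file is no longer a usable image.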

This is the point Ralph was trying to make, I think.

Rob

-- 
      ,'/:.          Rob Sanderson (azaroth@liverpool.ac.uk)
    ,'-/::::.        http://www.o-r-g.org/~azaroth/
  ,'--/::(@)::.      Special Collections and Archives, extension 3142
,'---/::::::::::.    Twin Cathedrals:  telnet: liverpool.o-r-g.org 7777
____/:::::::::::::.              WWW:  http://liverpool.o-r-g.org:8000/
I L L U M I N A T I
Received on Thursday, 7 March 2002 11:48:31 GMT
