RE: character encoding assumptions and approaches

It should apply to both the general and the characterString Terms in
AttributesPlusTerm.

The general Term is a special case and needs to be recognized as such in the
description.  General is an OctetString and could contain arbitrary binary
data.  We must agree that when the utf-8 bit is on, general will be used only
for character data.  If that isn't acceptable, then we're stuck with just
characterString.
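Here is a minimal Python sketch of how a server might enforce that agreement. It assumes the utf-8 option bit exists as proposed in this thread, and it models only two arms of the Term CHOICE. The Term class and the term_text function are hypothetical stand-ins, not the Z39.50 ASN.1 or any toolkit's API.

```python
# Hypothetical sketch: Term and term_text are illustrative names, not part of
# the Z39.50 ASN.1 or any toolkit. The utf-8 option bit is still a proposal.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    """Simplified stand-in for the Term CHOICE (only two arms modelled)."""
    general: Optional[bytes] = None          # OctetString: arbitrary bytes
    characterString: Optional[str] = None    # already character data


def term_text(term: Term, utf8_negotiated: bool) -> Optional[str]:
    """Return the term as text, applying the proposed rule for `general`.

    Returns None when `general` must be treated as opaque octets (bit off).
    """
    if term.characterString is not None:
        return term.characterString
    if term.general is None:
        raise ValueError("unsupported Term arm")
    if not utf8_negotiated:
        # Without the bit, general may hold any binary data; don't guess.
        return None
    # With the bit on, general is assumed to carry only character data, so
    # strict decoding raises UnicodeDecodeError on arbitrary binary data.
    return term.general.decode("utf-8")
```

For example, `term_text(Term(general="café".encode("utf-8")), True)` yields `"café"`, while `b"\xff\xfe"` in general is rejected once the bit is on. This is the failure case that an agreement limited to characterString would avoid.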

The same rule for the general and characterString Terms applies to
AttributesPlusTerm in the Scan request and to TermInfo in the Scan response.
It should also apply to displayTerm and alternativeTerm in TermInfo.

I'm open to other suggestions, but I believe this is sufficient.

Ralph

> -----Original Message-----
> From: Ray Denenberg [mailto:rden@loc.gov]
> Sent: Thursday, March 07, 2002 11:16 AM
> To: www-zig@w3.org
> Subject: Re: character encoding assumptions and approaches
> 
> 
> "LeVan,Ralph" wrote:
> 
> > Let's change the question slightly.  Why should the application know what
> > kind of data it is returning?  Why should it behave differently for one kind
> > of data than another?  Did you know that there is text embedded in JPEG
> > files?
> 
> Actually no, my format experts here tell me that jpeg represents text as bits,
> but they might be mistaken. In any case, certainly we wouldn't expect conversion
> to utf-8 in mixed-content or print-format (e.g. pdf, postscript) files.
> 
> 
> > I think you assume too much knowledge about MARC records and should treat
> > them like any other record format.
> 
> If we define a utf-8 option bit, what do you think it should apply to then?
> 
> --Ray
> 

Received on Thursday, 7 March 2002 11:28:20 UTC