RE: character encoding assumptions and approaches from Pieter Van Lierop on 2002-03-07 (www-zig@w3.org from March 2002)

From: Pieter Van Lierop <pvanlierop@geac.fr>
Date: Thu, 7 Mar 2002 17:41:08 +0100
To: "'LeVan,Ralph'" <levan@oclc.org>, www-zig@w3.org
Message-ID: <00DE8F985709D6119F6B00805F851D8504F783@parisexchange.fr.geac.com>

I agree with you that these are the most important, but why treat
other strings differently?
Examples: 
ImplementationName in the Init = "Bibliothèque Française"
DatabaseName = "Périodiques" (this is French for Serials)
How should I interpret such a string when the option bit is on? Is it utf-8
or not? If not, why not? And what is it then?
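
To make the ambiguity concrete, here is a minimal Python sketch. It is only an illustration: the function name decode_name, the utf8_negotiated flag, and the choice of Latin-1 as the fallback encoding are all assumptions made for the example, not anything defined by the protocol. It shows that the same DatabaseName produces different octets under the two encodings, so a receiver that guesses wrong gets a garbled or undecodable string.

```python
# Hypothetical illustration; names and the Latin-1 fallback are assumptions.
name = "P\u00e9riodiques"  # "Périodiques"

# The same string yields different octets depending on the encoding used.
latin1_octets = name.encode("latin-1")  # 'é' becomes the single octet E9
utf8_octets = name.encode("utf-8")      # 'é' becomes the two octets C3 A9


def decode_name(octets: bytes, utf8_negotiated: bool) -> str:
    """Decode a string field, assuming the option bit covers every string.

    Latin-1 is used as the fallback purely for the sake of the example.
    """
    return octets.decode("utf-8" if utf8_negotiated else "latin-1")


# A receiver that knows the bit applies recovers the original name.
round_trip = decode_name(utf8_octets, utf8_negotiated=True)

# A receiver that assumes the bit does not apply gets mojibake instead.
misread = decode_name(utf8_octets, utf8_negotiated=False)
```

If the sender used Latin-1 but the receiver decodes as utf-8, the call raises UnicodeDecodeError rather than returning a string, which is why both sides need to agree on whether the option bit covers strings such as DatabaseName.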

Pieter

> -----Original Message-----
> From: LeVan,Ralph [mailto:levan@oclc.org]
> Sent: Thursday, March 7, 2002 17:28
> To: www-zig@w3.org
> Subject: RE: character encoding assumptions and approaches
> 
> 
> It should apply to the general Term in AttributesPlusTerm and the
> characterString Term in AttributesPlusTerm.
> 
> The general Term is a special case and needs to be recognized as such
> in the description.  General is an OctetString and could contain any
> random binary data.  We must agree that when the utf-8 bit is on, that
> general will only be used for character data.  If that isn't
> acceptable, then we're stuck with just characterString.
> 
> The same use of general and characterString Terms applies in
> AttributesPlusTerm in the Scan request and TermInfo in the Scan
> response.  It should also apply to the displayTerm and alternativeTerm
> in TermInfo.
> 
> I'm open to other suggestions, but I believe this is sufficient.
> 
> Ralph
> 
> > -----Original Message-----
> > From: Ray Denenberg [mailto:rden@loc.gov]
> > Sent: Thursday, March 07, 2002 11:16 AM
> > To: www-zig@w3.org
> > Subject: Re: character encoding assumptions and approaches
> > 
> > 
> > "LeVan,Ralph" wrote:
> > 
> > > Let's change the question slightly.  Why should the application
> > > know what kind of data it is returning?  Why should it behave
> > > differently for one kind of data than another?  Did you know that
> > > there is text embedded in JPEG files?
> > 
> > Actually no, my format experts here tell me that jpeg represents
> > text as bits, but they might be mistaken. In any case, certainly we
> > wouldn't expect conversion to utf-8 in mixed-content or print-format
> > (e.g. pdf, postscript) files.
> > 
> > 
> > > I think you assume too much knowledge about MARC records and should
> > > treat them like any other record format.
> > 
> > If we define a utf-8 option bit, what do you think it should apply
> > to then?
> > 
> > --Ray
> > 
>

Received on Thursday, 7 March 2002 11:42:42 UTC