RE: character encoding assumptions and approaches

The difference is that those other strings aren't important.  They are
icing, not cake.  They can be dumbed down to 7-bit ASCII with no loss of
functionality.
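
As a rough illustration of that kind of dumbing down (a sketch only, not
something the Z39.50 text prescribes): assuming Python and its standard
unicodedata module, a hypothetical helper to_ascii can decompose accented
letters and drop the combining marks. The sample strings are the ones from
the message quoted below.

```python
import unicodedata

def to_ascii(text: str) -> str:
    """Fold text to 7-bit ASCII (hypothetical helper, lossy by design)."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops everything outside ASCII,
    # which removes the combining marks and keeps the base letters.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Bibliothèque Française"))  # Bibliotheque Francaise
print(to_ascii("Périodiques"))             # Periodiques
```

Characters with no ASCII decomposition are dropped outright, so this only
suits strings where losing them costs nothing.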

Ralph

> -----Original Message-----
> From: Pieter Van Lierop [mailto:pvanlierop@geac.fr]
> Sent: Thursday, March 07, 2002 11:41 AM
> To: 'LeVan,Ralph'; www-zig@w3.org
> Subject: RE: character encoding assumptions and approaches
> 
> 
> I agree with you that these are the most important, but why treat other
> strings differently?
> Examples: 
> ImplementationName in the Init = "Bibliothèque Française"
> DatabaseName = "Périodiques" (this is French for Serials)
> How should I interpret that string when the option bit is on? Is this utf-8
> or not? If not, why not? And what is it then?
> 
> Pieter
> 
> > -----Original Message-----
> > From: LeVan,Ralph [mailto:levan@oclc.org]
> > Sent: Thursday, 7 March 2002 17:28
> > To: www-zig@w3.org
> > Subject: RE: character encoding assumptions and approaches
> > 
> > 
> > It should apply to the general Term in AttributesPlusTerm and the
> > characterString Term in AttributesPlusTerm.
> > 
> > The general Term is a special case and needs to be recognized as such in
> > the description.  General is an OctetString and could contain any random
> > binary data.  We must agree that, when the utf-8 bit is on, general will
> > only be used for character data.  If that isn't acceptable, then we're
> > stuck with just characterString.
> > 
> > The same use of general and characterString Terms applies in
> > AttributesPlusTerm in the Scan request and TermInfo in the Scan response.
> > It should also apply to the displayTerm and alternativeTerm in TermInfo.
> > 
> > I'm open to other suggestions, but I believe this is sufficient.
> > 
> > Ralph
> > 
> > > -----Original Message-----
> > > From: Ray Denenberg [mailto:rden@loc.gov]
> > > Sent: Thursday, March 07, 2002 11:16 AM
> > > To: www-zig@w3.org
> > > Subject: Re: character encoding assumptions and approaches
> > > 
> > > 
> > > "LeVan,Ralph" wrote:
> > > 
> > > > Let's change the question slightly.  Why should the application know
> > > > what kind of data it is returning?  Why should it behave differently
> > > > for one kind of data than another?  Did you know that there is text
> > > > embedded in JPEG files?
> > > 
> > > Actually no, my format experts here tell me that jpeg represents text
> > > as bits, but they might be mistaken. In any case, certainly we wouldn't
> > > expect conversion to utf-8 in mixed-content or print-format (e.g. pdf,
> > > postscript) files.
> > > 
> > > 
> > > > I think you assume too much knowledge about MARC records and should
> > > > treat them like any other record format.
> > > 
> > > If we define a utf-8 option bit, what do you think it should apply to
> > > then?
> > > 
> > > --Ray
> > > 
> > 
> 

Received on Thursday, 7 March 2002 13:42:49 UTC