RE: Z39.50 character encoding

Let's assume we are talking only about the InternationalString PDU and its
character set, i.e. not anything within the scope of the records returned in
response to a PresentRequest. What do the rest of you think of the idea of
simply embedding an XML document in the value, e.g.:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<InternationalString>
Finally the discussion on charactersets is over as the solution to the
problem is handled by ordinary means in scope of XML
</InternationalString>


In this way, all the character sets supported by XML may be used, and the
discussion on how to handle character sets is over, as it is just a matter of
the standard XML facilities.
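
As a minimal sketch of the idea (not a proposal for actual protocol text), the
Python below wraps a string in a small XML document with a declared encoding
and unwraps it again on the receiving side. The function names
wrap_international_string and unwrap_international_string are hypothetical,
and the sketch assumes the resulting bytes would be carried as the
InternationalString value; how that maps onto the ASN.1 definition is left
open here.

```python
import xml.etree.ElementTree as ET


def wrap_international_string(text, encoding="ISO-8859-1"):
    """Serialize text as an XML document encoded in the given character set.

    Hypothetical helper: the returned bytes start with an XML declaration
    naming the encoding, so the receiver needs no out-of-band negotiation.
    """
    root = ET.Element("InternationalString")
    root.text = text
    # xml_declaration=True (Python 3.8+) forces the <?xml ...?> header.
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


def unwrap_international_string(payload):
    """Parse the bytes; the XML parser honours the declared encoding."""
    root = ET.fromstring(payload)
    if root.tag != "InternationalString":
        raise ValueError("unexpected root element: %r" % root.tag)
    return root.text or ""


if __name__ == "__main__":
    payload = wrap_international_string("Blåbærgrød på Æbleø")
    print(payload)
    print(unwrap_international_string(payload))
```

The receiver never has to guess: the parser reads the declaration and decodes
accordingly. The cost is that every string carries the declaration and the
element tags as overhead.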


Best regards,

Henrik Dahl


-----Original Message-----
From: www-zig-request@w3.org [mailto:www-zig-request@w3.org] On Behalf Of
LeVan,Ralph
Sent: Friday, March 01, 2002 2:49 PM
To: www-zig@w3.org
Subject: RE: Z39.50 character encoding


UTF-8 is the default character set for XML.  It is possible to specify a
different character set.
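
A small illustration of that behaviour, using Python's standard
xml.etree.ElementTree parser as a stand-in for any conforming XML processor.
The helper name decode_xml_text and the element name s are made up for the
example.

```python
import xml.etree.ElementTree as ET


def decode_xml_text(payload):
    """Return the text of the root element of an XML byte string."""
    return ET.fromstring(payload).text


# No declaration: the parser assumes UTF-8.
no_declaration = "<s>Æble</s>".encode("utf-8")

# An explicit declaration selects a different character set.
declared_latin1 = (
    "<?xml version='1.0' encoding='ISO-8859-1'?><s>Æble</s>"
).encode("iso-8859-1")

# ISO-8859-1 bytes WITHOUT a declaration are not valid UTF-8 here.
undeclared_latin1 = "<s>Æble</s>".encode("iso-8859-1")

if __name__ == "__main__":
    print(decode_xml_text(no_declaration))
    print(decode_xml_text(declared_latin1))
    try:
        decode_xml_text(undeclared_latin1)
    except ET.ParseError as err:
        print("rejected:", err)
```

The last case shows why the declaration matters: without it, the parser falls
back to UTF-8 and rejects the byte sequence.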

Ralph

> -----Original Message-----
> From: Alan Kent [mailto:ajk@mds.rmit.edu.au]
> Sent: Thursday, February 28, 2002 7:35 PM
> To: www-zig@w3.org
> Subject: Re: Z39.50 character encoding
>
>
> On Thu, Feb 28, 2002 at 09:13:20AM -0500, Johan Zeeman wrote:
> > DC by itself is not a record syntax; it is a list of data elements.
> > To be a record syntax, the data elements need to be encoded using
> > some scheme.  The one I know about is XML.  And XML explicitly
> > uses UTF-8.
> >
> > j.
>
> Just to clarify, do you mean the XML record syntax in Z39.50
> explicitly uses UTF-8? XML itself certainly *does not* explicitly
> use UTF-8. That is simply what is common. People do use other
> encodings with XML (UTF-16, for example, is completely valid and in
> use: for Chinese or other scripts, UTF-16 encoded files are much
> smaller than the same UTF-8 encoded files).
>
> I was just curious (without re-reading the XML record syntax) whether
> it was a Z39.50 decree that the XML record syntax mandates
> UTF-8 encoding.
>
> Thanks,
> Alan
>
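
As a rough check of the size comparison in the quoted message, the sketch
below encodes a short Chinese sample and an ASCII tag name both ways. The
sample strings and the helper name encoded_sizes are chosen only for
illustration, and UTF-16 is measured without a byte order mark.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    chinese = "中文字符编码"            # 6 characters, all 3 bytes in UTF-8
    markup = "<InternationalString>"   # 21 ASCII characters

    print(encoded_sizes(chinese))  # {'utf-8': 18, 'utf-16': 12}
    print(encoded_sizes(markup))   # {'utf-8': 21, 'utf-16': 42}
```

For these characters UTF-16 is a third smaller, while ASCII markup doubles in
size, so the net effect on an XML document depends on the ratio of tags to
text.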

Received on Friday, 1 March 2002 10:09:41 UTC