Re: Re: Re: Re: Z39.50 character encoding from Henrik Dahl on 2002-03-05 (www-zig@w3.org from March 2002)

From: Henrik Dahl <hdahl@inet.uni2.dk>
Date: Tue, 5 Mar 2002 12:25:41 +0100
To: <www-zig@w3.org>
Message-ID: <001601c1c438$7cb1e310$0301a8c0@hdthinkpada22p>
Hello Mike!

I think, of course, that this sounds like well-founded criticism, and
given the kind of values you express, I agree that embedding an XML
document isn't very convenient; I obviously agree with you completely. I
think, however, that your concrete suggestion would require a custom
(very simple, of course) parser rather than the standard XML parser, so
you might as well just use the UTF-8 approach instead.

There are, as far as I know, still some development environments around
that do not cope easily with the Unicode character set, and possibly its
UTF-8 encoding, but if nobody thinks that is a problem, everything is
fine, of course. We, for instance, do not have such legacy problems.

To summarize: given the values you express, I obviously agree with
you completely!

Best regards,

Henrik Dahl

-----Original Message-----
From: www-zig-request@w3.org [mailto:www-zig-request@w3.org] On behalf of
Mike Taylor
Sent: Tuesday, March 05, 2002 12:07 PM
To: hdahl@inet.uni2.dk
Cc: www-zig@w3.org
Subject: Re: Re: Re: Re: Z39.50 character encoding


> Date: Fri, 1 Mar 2002 20:21:31 +0100
> From: "Henrik Dahl" <hdahl@inet.uni2.dk>
>
> Why don't you just explain why you think it's madness?

OK Henrik, sorry if that was a bit close to the bone.  (Though for the
record a prominent ZIG member told me off-list, "I was hoping that
ignoring it would make it go away" :-)

> I think it has the benefit that basically any character set may be
> supported without any development, as the solution is already
> provided by the XML services. That brings considerable benefits without
> any investment.

Part of the problem with your proposal is the "without any investment"
part.  Adding an XML parser into a lightweight client is _not_ a
low-investment solution to the relatively straightforward problem of
specifying the character set of a string.
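To make the cost concrete, here is a minimal sketch of what the receiving side of the XML-wrapped approach involves, assuming Python's standard-library `xml.etree.ElementTree` stands in for "the XML services". The function name `decode_xml_wrapped` is hypothetical. The work of honouring the `encoding="..."` declaration is done entirely by a general-purpose XML parser, which is exactly the component a lightweight client would otherwise not need.

```python
import xml.etree.ElementTree as ET


def decode_xml_wrapped(raw: bytes) -> str:
    """Extract the text of a hypothetical XML-wrapped InternationalString.

    The raw bytes go to a full XML parser, which reads the encoding
    named in the XML declaration and decodes the document with it.
    """
    root = ET.fromstring(raw)  # full XML parse, declaration included
    if root.tag != "InternationalString":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    # Drop the surrounding newlines/indentation, as in the example layout.
    return (root.text or "").strip()


# Sample string declared (and actually encoded) as ISO-8859-1.
sample = (
    '<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>\n'
    "<InternationalString>Caf\u00e9</InternationalString>"
).encode("iso-8859-1")
```

Calling `decode_xml_wrapped(sample)` yields `"Café"`; the same function also accepts a UTF-8 document, because the parser switches decoders based on the declaration.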

If we really wanted Z39.50 protocol strings to carry their own
character-set information with them, it would be much simpler to use
something like an RFC 822 header (as HTTP responses do): so instead of
saying:

	<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
	<InternationalString>
	Finally the discussion on charactersets is over as the
	solution to the problem is handled by ordinary means in scope
	of XML
	</InternationalString>

the strings would say something like:

	Content-type: text/plain; charset=ISO-8859-1

	Finally the discussion on charactersets is over as the
	solution to the problem is handled by ordinary means in scope
	of RFC 822-like headers
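As Henrik points out in his reply, the header variant needs its own small parser instead of an XML library. The following is a minimal sketch of such a parser, assuming the standard MIME form `Content-type: text/plain; charset=...` (with a semicolon before `charset`), a blank line separating header from body, and a hypothetical UTF-8 default when no charset is given. The function name is hypothetical, and this is not a complete RFC 822 implementation (no folded header lines, no comments).

```python
import re


def decode_header_wrapped(raw: bytes, default_charset: str = "utf-8") -> str:
    """Decode a string carrying an RFC 822-like header (hypothetical format)."""
    # Header and body are separated by the first blank line (LF or CRLF).
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    if len(parts) != 2:
        raise ValueError("no blank line separating header from body")
    header_block, body = parts

    charset = default_charset
    for line in header_block.decode("ascii").splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() != "content-type":
            continue
        # value looks like " text/plain; charset=ISO-8859-1"
        for param in value.split(";")[1:]:
            key, _, val = param.strip().partition("=")
            if key.lower() == "charset":
                charset = val.strip().strip('"')
    return body.decode(charset)
```

For example, `decode_header_wrapped(b"Content-type: text/plain; charset=ISO-8859-1\n\nCaf\xe9")` returns `"Café"`. It is smaller than an XML parser, but it is still per-string machinery that every client and server would have to carry.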

But really -- even that smells horribly wrong to me.  If we did
something like this, we'd still need an option bit or something
equivalent to indicate that that's what we're doing.  So why introduce
all the extra mechanism rather than just having the option bit (or
whatever) say "all strings are UTF-8" and have done with it?
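For comparison, the "option bit says all strings are UTF-8" approach reduces decoding to a single branch. The following is a minimal sketch, assuming a hypothetical `SessionOptions.all_strings_utf8` flag agreed once during the Z39.50 Init exchange. It does not model the actual Init option bits or any real negotiation facility, and the ISO-8859-1 fallback for sessions without the flag is only a placeholder assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionOptions:
    # Hypothetical flag: set during Init only if client and server both
    # agreed that every InternationalString on this connection is UTF-8.
    all_strings_utf8: bool


def decode_international_string(opts: SessionOptions, raw: bytes,
                                 legacy_charset: str = "iso-8859-1") -> str:
    if opts.all_strings_utf8:
        # One fixed encoding: no per-string header or XML to parse.
        # Strict decoding, so malformed UTF-8 raises UnicodeDecodeError.
        return raw.decode("utf-8")
    # Without the agreement, the charset is whatever both sides assume;
    # ISO-8859-1 here is a placeholder assumption, not protocol behaviour.
    return raw.decode(legacy_charset)
```

With `SessionOptions(True)`, the bytes `b"Caf\xc3\xa9"` decode to `"Café"`. The per-string metadata in the other two sketches has moved into one session-level decision.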

 _/|_	 _______________________________________________________________
/o ) \/  Mike Taylor   <mike@miketaylor.org.uk>   www.miketaylor.org.uk
)_v__/\  "Design and programming are human activities; forget that
	 and all is lost" -- Bjarne Stroustrup.
Received on Tuesday, 5 March 2002 06:25:23 UTC