Re: Request OID for "XML 1.0" Record Syntax from Ian Ibbotson on 2004-11-12 (www-zig@w3.org from November 2004)

From: Ian Ibbotson <ian.ibbotson@k-int.com>
Date: Fri, 12 Nov 2004 09:57:52 +0000
To: www-zig@w3.org
Message-ID: <41948920.7040203@k-int.com>
I like this solution better, actually. I hadn't responded before because
Mike's pragmatic reasoning had outweighed my slight discomfort at the
original proposal. This looks ideal to me, though, as it preserves the
existing expected behavior and lets us be more specific as needed.

Ian.

Ray Denenberg, Library of Congress wrote:
> Mike, how about
> 1.2.840.10003.5.109.10      means xml (no specific version)
> 1.2.840.10003.5.109.10.1.0  means xml 1.0
> 1.2.840.10003.5.109.10.1.1  means xml 1.1
> 
> Since you're the only one who seems to be interested, if this sounds good to
> you consider it done (after a brief comment period).
> 
> --Ray
> 
> 
> ----- Original Message ----- 
> From: "Mike Taylor" <mike@indexdata.com>
> To: <www-zig@w3.org>
> Sent: Wednesday, November 10, 2004 7:26 AM
> Subject: Request OID for "XML 1.0" Record Syntax
> 
> 
> 
>>I originally sent this three weeks ago, but I don't think anyone
>>responded.  I am inclined to take this as tacit acceptance of my
>>proposal.  If anyone objects, please can they say so?  And if no-one
>>does, Ray, please can I have my OID?
>>
>>--
>>
>>Date: Thu Oct 21 16:57:38 +0100 2004
>>From: Mike Taylor <mike@minmi.miketaylor.org.uk>
>>To: www-zig@w3.org
>>Subject: Request OID for "XML 1.0" Record Syntax
>>
>>I would like to request that the Z39.50 Maintenance Agency issue a new
>>record syntax OID that is explicitly XML version 1.0, as opposed to
>>the existing XML OID 1.2.840.10003.5.109.10 which is just XML, and
>>could be XML 1.0 or XML 1.1.
>>
>>Here comes the rationale.  Hold on.
>>
>>Consider a text-and-structured-data repository such as our very own
>>Zebra.  In principle, it is a storage, indexing and retrieval facility
>>for any structured data, including binary data and text containing
>>control characters.  In practice, it needs to use some kind of
>>structured file format for getting the structured data in and out, and
>>the overwhelmingly most popular choice for that is XML.
>>
>>Now XML as we know it (XML 1.0) is actually a pretty poor choice,
>>because it can't represent certain characters: nothing with a code
>>below 32, except for the three special cases of tab, linefeed and
>>carriage return.  See:
>>http://www.w3.org/TR/2004/REC-xml-20040204/#charsets
>>and note that you can't get around this problem by using character
>>references instead: the reference "&#1;", for example, is ILLEGAL in
>>XML 1.0.  If you don't believe me (and I wouldn't blame you, it took a
>>lot to persuade me that this brain-damage is real), just ask your
>>favourite conformant XML 1.0 parser:
>>
>>$ echo "<x>&#1;</x>" | xmllint -
>>-:1: error: xmlParseCharRef: invalid xmlChar value 1
>><x>&#1;</x>
>>$
>>
>>Now, consider what a system such as Zebra should do when person A
>>wants to add to it a record containing a field with a control
>>character in, and person B wants to retrieve it as XML.  What should
>>it do?
>>
>>* It could refuse point blank to add the record, because the record is
>>  not good XML 1.0.  But (A) that's rude, (B) the record may be
>>  perfectly good XML 1.1, (C) in practice people do have records like
>>  that, and should be able to store them in a general purpose
>>  structured data engine, without being limited by an arbitrary
>>  prohibition in what amounts to a transfer syntax.  Finally, (D) a
>>  legitimate MARC record may be added that contains control
>>  characters, so the problem will still arise when the record is
>>  retrieved as XML.
>>
>>* It could accept the record, but silently discard or transform the
>>  control character, either at the point where it stores and indexes
>>  it, or just before it returns it as XML.  This is pragmatically
>>  appealing in an It Just Works way, but ethically horrifying, since a
>>  data repository has no business messing with the content of
>>  someone's record.
>>
>>* It could simply accept the record and give it out again, without
>>  even looking at the content.  This is clearly The Right Thing, but
>>  causes people's XML 1.0 parsers to blow up, so it's no good in
>>  practice.  And it's no use telling people to use XML 1.1, since that
>>  is by no means universally implemented (nor, for that matter,
>>  universally liked.)
>>
>>So what we think we should do is this: we will continue to have our
>>repository do The Right Thing, which is accept XML containing control
>>characters, and return it verbatim when XML records are requested
>>(using the established record-syntax OID 1.2.840.10003.5.109.10).  But
>>if a client asks for the new "XML 1.0" record syntax -- the one we're
>>requesting an OID for -- then we'll return the record stripped of its
>>XML-unsafe control characters.  Then client programs that need to work
>>with fussy XML 1.0 parsers can request the new record syntax and know
>>that they'll get back a record which, though it may not be a perfect
>>byte-for-byte representation of the data, is legal XML 1.0.
>>
>>So that's why we need an "XML 1.0" record-syntax OID.
>>
>>If you consider any of this text helpful, you are very welcome to use
>>it on the Maintenance Agency site as a rationale for the new OID.
>>
>>Thanks for listening.
>>
>> _/|_ _______________________________________________________________
>>/o ) \/  Mike Taylor  <mike@indexdata.com>  http://www.miketaylor.org.uk
>>)_v__/\  "Looks like it's time to over-technicalize this previously
>>tame post" -- Mickey Mortimer on the dinosaur mailing list
>>
>>--
>>Listen to free demos of soundtrack music for film, TV and radio
>>http://www.pipedreaming.org.uk/soundtrack/
>>
>>
>>
> 
> 
> 
>
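
The behaviour Mike describes (return the record verbatim under the
existing XML OID, and strip XML-unsafe control characters under the new
"XML 1.0" OID) could be sketched roughly as below. This is only an
illustration, not Zebra's actual code: the function names are
hypothetical, the ".1.0" OID is Ray's proposed value rather than a
confirmed assignment, and the record is assumed to be held as text with
raw control characters in it (a character reference such as "&#1;"
would need separate handling).

```python
import re

# OIDs as quoted in this thread. The ".1.0" arc is the value proposed by
# Ray; treat it as an assumption, not a confirmed assignment.
OID_XML_GENERIC = "1.2.840.10003.5.109.10"
OID_XML_1_0 = "1.2.840.10003.5.109.10.1.0"

# Code points below 32, except tab (9), linefeed (10) and carriage
# return (13), which XML 1.0 does allow.
_XML10_UNSAFE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_xml10_unsafe(text: str) -> str:
    """Remove the raw control characters that XML 1.0 cannot represent."""
    return _XML10_UNSAFE.sub("", text)


def render_record(xml_text: str, record_syntax_oid: str) -> str:
    """Hypothetical dispatch on the record syntax the client asked for."""
    if record_syntax_oid == OID_XML_1_0:
        # Lossy, but the result contains no XML 1.0-illegal control chars.
        return strip_xml10_unsafe(xml_text)
    if record_syntax_oid == OID_XML_GENERIC:
        # Verbatim: the repository does not touch the stored content.
        return xml_text
    raise ValueError(f"unsupported record syntax OID: {record_syntax_oid}")
```

A client with a fussy XML 1.0 parser would call
render_record(record, OID_XML_1_0); any other client keeps getting the
stored record unchanged under the existing OID.
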
Received on Friday, 12 November 2004 14:05:49 UTC