Re: Media Type registration for XSLT 2, XQuery 1.0 and XQueryX 1.0 from 'Liam R. E. Quin' on 2007-01-19 (public-qt-comments@w3.org from January 2007)

From: 'Liam R. E. Quin' <liam@w3.org>
Date: Fri, 19 Jan 2007 13:56:51 -0500
To: Larry Masinter <LMM@acm.org>
Cc: public-qt-comments@w3.org
Message-ID: <20070119185651.GA25940@w3.org>

(personal reply, not from the Working Group)

Thanks for taking the time to comment!

On Thu, Jan 18, 2007 at 08:52:39PM -0800, Larry Masinter wrote:
> 
> Your HTML to text converter introduced several
> places where the media type registration is
> hard to decipher. Since IANA currently doesn't
> post HTML registration forms, could you fix these
> up before they go into the IANA registry?
Yes.

> For
> example,
>      http://www.w3.org/TR/xqueryx/">http://www.w3.org/TR/xqueryx/

oops, noted

> , or the run-on
>   An XQuery file may have the string xquery version "V.V" near
>   the beginning of the document, where "V.V" is a version number.
>   Currently the version number, if present, must be "1.0". 
> 
> without being clear that the string is 
>       xquery version "V.V"

agreed (I'm sure I remember fixing this once, sigh)

> The 'charset' issues with this registration are
> troublesome, since you never really address the
> issue of whether non-UTF8 or UTF16 encodings
> in something labelled application/xquery are
> allowed, and, if so, by what method the actual
> character encoding is supposed to be determined.

The specification itself is clear (I hope) that UTF-8 and UTF-16 are
both allowed.  If the file (OK, octet stream) contains an
encoding declaration, it's an error if the input octet stream is not
compatible with that encoding declaration.  This is true regardless
of whether the query was transmitted over a network using HTTP,
or was read from a local file, or passed in memory, for example,
and falls into the domain of the language specification.

The case of a query with no encoding declaration that is in neither
UTF-8 nor UTF-16 with BOM is an error case, so the charset
parameter would serve no useful purpose.  Proxies are forbidden
(of course) to transcode data in application/*.

We will consider clarifying this issue as a potential erratum.

> It sort of sounds like you expect to guess, which
> has lots of problems.

Just to be clear, no, there's no guesswork -- if it's in UTF-16 it
has to start with a byte order mark and we can tell.  If it's
in neither UTF-8 nor UTF-16 it must say what it's in.
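
To illustrate the "no guesswork" rule described above, here is a minimal
Python sketch. It is not taken from the specification or from any real
implementation. The names sniff_query_encoding and decode_query are
hypothetical, and the sketch assumes the version declaration is the very
first thing in the octet stream (it does not skip leading XQuery comments)
and that any declared encoding is ASCII-compatible.

```python
import codecs
import re

# Hypothetical pattern for: xquery version "V.V" encoding "NAME";
# Assumption: the declaration appears at the very start of the stream.
_VERSION_DECL = re.compile(
    rb'^\s*xquery\s+version\s+(["\'])(?P<version>[^"\']*)\1'
    rb'(?:\s+encoding\s+(["\'])(?P<encoding>[^"\']*)\3)?\s*;'
)


def sniff_query_encoding(data: bytes) -> str:
    """Return a codec name for the query octets without guessing."""
    # UTF-16 must start with a byte order mark, so it is detectable.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # Otherwise an explicit encoding declaration says what the stream is in.
    match = _VERSION_DECL.match(data)
    if match and match.group("encoding"):
        return match.group("encoding").decode("ascii").lower()
    # No BOM and no declaration: the only remaining valid case is UTF-8.
    return "utf-8"


def decode_query(data: bytes) -> str:
    """Decode the octets; an incompatible stream raises an error."""
    # bytes.decode raises UnicodeDecodeError for octets that do not match
    # the encoding, and LookupError for an unknown encoding name.
    return data.decode(sniff_query_encoding(data))
```

In this sketch a stream that is neither UTF-8 nor BOM-prefixed UTF-16 and
carries no encoding declaration simply fails to decode as UTF-8, which
mirrors the error case mentioned above.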

Thank you again for some useful and constructive comments --
much appreciated.

Liam

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/

Received on Friday, 19 January 2007 18:57:04 UTC