RE: [Serial] I18N WG last call comments from Michael Kay on 2004-10-26 (public-qt-comments@w3.org from October 2004)

From: Michael Kay <mhk@mhk.me.uk>
Date: Tue, 26 Oct 2004 23:05:28 +0100
To: "'Jonathan Robie'" <jonathan.robie@datadirect.com>
Cc: "'Martin Duerst'" <duerst@w3.org>, "'Henry Zongaro'" <zongaro@ca.ibm.com>, <public-qt-comments@w3.org>, <w3c-i18n-ig@w3.org>
Message-ID: <E1CMZS7-00036P-Cp@frink.w3.org>
I would like to add some further comments to this response.

Firstly, this isn't just about serialization. The serialization spec is
merely copying the way element and attribute construction work in XSLT and
XQuery.

Element and attribute construction take a typed value as input and produce a
lexical representation of that typed value as output. For atomic values,
this generally produces the canonical lexical representation of the value.
Certainly it always produces one of the possible lexical representations of
that value. For example, the number 1.5 is output as 1.5 (and not as 1.50 or
1,5). This ensures that when the resulting document is validated, the typed
value can be reconstituted. (This is true whether or not the document is
serialized before being validated.)
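
As a rough illustration of that round trip (a Python sketch, not taken from
either spec): parse_decimal and construct_text are hypothetical names standing
in for validation and node construction, and Python's Decimal only
approximates xs:decimal, since it also accepts forms such as "1E2" that
xs:decimal does not.

```python
from decimal import Decimal, InvalidOperation

def parse_decimal(lexical: str) -> Decimal:
    """Hypothetical stand-in for validation: lexical form -> typed value."""
    return Decimal(lexical)

def construct_text(value: Decimal) -> str:
    """Hypothetical stand-in for node construction: typed value -> one lexical form."""
    text = format(value, "f")                  # plain notation, no exponent
    if "." in text:
        text = text.rstrip("0").rstrip(".")    # drop insignificant trailing zeros
    return text

value = parse_decimal("1.50")
output = construct_text(value)                 # "1.5"
assert parse_decimal(output) == value          # the typed value is reconstituted

try:
    parse_decimal("1,5")                       # a localized form does not parse
    localized_ok = True
except InvalidOperation:
    localized_ok = False
```

The point of the sketch is only that the output is always something the
parser accepts, and that the value survives the trip.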

A sequence of atomic values in the XSLT/XQuery type system corresponds to a
list data type in XML Schema, and the lexical representation of a list data
type in XML Schema is a whitespace-separated list of tokens. 
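
A minimal Python sketch of that correspondence, under the assumption that
str.split() is close enough to XML Schema list tokenization (it splits on any
Unicode whitespace, which is slightly broader). The function names are
invented for the example; the road names come from Jonathan's message quoted
below.

```python
def serialize_sequence(items):
    """Typed sequence -> lexical form of a list type: tokens joined by one space."""
    return " ".join(str(item) for item in items)

def parse_list(lexical):
    """Lexical form of a list type -> tokens, split on any run of whitespace."""
    return lexical.split()

numbers = [1, 2, 3]
text = serialize_sequence(numbers)                     # "1 2 3"
round_trip = [int(token) for token in parse_list(text)]

# Items that themselves contain spaces do not survive the trip.
roads = ["Gibson Road", "Main Street"]
road_tokens = parse_list(serialize_sequence(roads))    # four tokens, not two
```

The second case is why a list of strings containing spaces is better
expressed with markup than with a list type.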

We don't have a free choice here. Our processing model relies on having a
reversible mapping between typed values and string values. The mapping from
string values to typed values has already been defined for us by XML Schema,
so we are totally constrained to use the inverse of this mapping in the
opposite direction. We can no more choose the serialized form of our
sequences than we can choose the lexical representation of dates or numbers.


Of course users can format sequences, dates, or numbers in a different way
if they wish; but if they do so, the mapping won't be reversible. We've
provided plenty of functions in the function library to allow formatting
into localized representations (at least in XSLT). But the default
representation has to be the one that validates - otherwise an identity
transform would produce an invalid document.
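
To make the reversibility point concrete, here is a hedged Python sketch
using a date. It assumes date.fromisoformat() as a stand-in for validation
against xs:date (ignoring the optional timezone), and the day/month/year
pattern is just one arbitrary localized format chosen for the example.

```python
from datetime import date

typed = date(2004, 10, 26)

default_form = typed.isoformat()               # "2004-10-26", the xs:date shape
localized_form = typed.strftime("%d/%m/%Y")    # "26/10/2004", a localized shape

def is_reversible(lexical: str) -> bool:
    """True if the lexical form parses back to the original typed value."""
    try:
        return date.fromisoformat(lexical) == typed
    except ValueError:
        return False

default_round_trips = is_reversible(default_form)
localized_round_trips = is_reversible(localized_form)
```

Only the default form parses back to the original value; the localized form
is fine for display but would fail validation.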

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Jonathan Robie [mailto:jonathan.robie@datadirect.com] 
> Sent: 26 October 2004 19:48
> To: Jonathan Robie
> Cc: Martin Duerst; Henry Zongaro; Michael Kay; 
> public-qt-comments@w3.org; w3c-i18n-ig@w3.org
> Subject: Re: [Serial] I18N WG last call comments
> 
> Hi Martin,
> 
> On today's telcon, the XSL and XQuery WGs endorsed the text of the 
> personal comments I made this morning in the following email. I now 
> submit this again, as an official response on behalf of the XSL and 
> XQuery Working Groups.
> 
> Thanks!
> 
> Jonathan
> 
> Jonathan Robie wrote:
> > 
> > Martin Duerst wrote:
> > 
> >> Overall, I think that the convention of using a space between
> >> strings, inherited from SGML NMTOKENS and IDREFS, should not be the
> >> default in XQuery and XSLT to concatenate strings.
> > 
> > 
> > Hi Martin,
> > 
> > For concatenating strings, which is what the concat() function does, we
> > do not insert anything. I think Henry has shown [1] that our string
> > manipulation library is pretty good at allowing other delimiters to be
> > inserted if needed.
> > 
> > Serializing a sequence of atomic values is not the same thing as
> > "concatenating strings". The lexical representation of these atomic
> > values is given by XML Schema, and the delimiters used are the
> > delimiters used by XML Schema. The default for serializing a sequence of
> > tokens defined by XML Schema pretty much has to be the format defined by
> > XML Schema, or else XML processors won't be able to read serialized
> > documents. So for serialization, I think your beef is with XML Schema.
> > 
> > Linguistic tokens and delimiters are not the same as computer language
> > tokens and delimiters. In my opinion, the biggest problem occurs not
> > when they differ, but when they are the same. That's why we have to
> > invent conventions like camelCase or hyphenated-names to allow ourselves
> > to create computer language tokens that consist of multiple linguistic
> > tokens. XML Schema could have allowed users to create a sequence of
> > string values that contain spaces, as in:
> > 
> > <sequenceOfRoads>Gibson Road, Main Street</sequenceOfRoads>
> > 
> > That would require XML Schema to allow an alternate delimiter to be 
> > specified. It doesn't. And it shouldn't - in XML, the best way to 
> > delimit individual items is to use markup:
> > 
> > <roads>
> >   <road>Gibson Road</road>
> >   <road>Main Street</road>
> > </roads>
> > 
> > As a markup language, XML exists for the sole purpose of clearly
> > identifying data. Let's use it! The alternative is to use microparsing.
> > But that's not how XML works, and XQuery is based on XML.
> > 
> > We support XML Schema, and that's what our serialization does by
> > default. If you want a different serialization, you can use string
> > manipulation to create whatever you want, but an XML Schema processor
> > won't be able to recognize the tokens.
> > 
> > Jonathan
> > My opinion only. Not on behalf of anyone.
> > 
> 
>
Received on Tuesday, 26 October 2004 22:06:09 UTC