Re: [Serial] I18N WG last call comments

Martin Duerst wrote:

> Overall, I think that the convention of using a space between
> strings, inherited from SGML NMTOKENS and IDREFS, should not be the
> default in XQuery and XSLT to concatenate strings.

Hi Martin,

For concatenating strings, which is what the concat() function does, we 
do not insert anything. I think Henry has shown [1] that our string 
manipulation library is pretty good at allowing other delimiters to be 
inserted if needed.

Serializing a sequence of atomic values is not the same thing as 
"concatenating strings". The lexical representation of these atomic 
values is given by XML Schema, and the delimiters used are the 
delimiters used by XML Schema. The default for serializing a sequence of 
tokens defined by XML Schema pretty much has to be the format defined by 
XML Schema, or else XML processors won't be able to read serialized 
documents. So for serialization, I think your beef is with XML Schema.
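
To make the round trip concrete, here is a rough sketch in Python rather than XQuery. The helper names are hypothetical, not part of any XQuery or XML Schema API, and the tokenizer only approximates how an XML Schema list type is read, by splitting on XML whitespace (space, tab, LF, CR):

```python
import re

# Hypothetical helpers for illustration only.

def serialize_atomic_sequence(values, separator=" "):
    """Join the lexical forms of the values; a single space is the default."""
    return separator.join(str(v) for v in values)

def read_schema_list(text):
    """Tokenize a list value on runs of XML whitespace, dropping empty tokens."""
    return [token for token in re.split(r"[ \t\n\r]+", text) if token]

values = [1, 2, 3]
default_form = serialize_atomic_sequence(values)                # "1 2 3"
custom_form = serialize_atomic_sequence(values, separator=",")  # "1,2,3"

print(read_schema_list(default_form))  # ['1', '2', '3']
print(read_schema_list(custom_form))   # ['1,2,3']
```

With the space delimiter the reader recovers three tokens; with a comma it sees one token, which is the sense in which a schema-aware processor could not read the document back.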

Linguistic tokens and delimiters are not the same as computer language 
tokens and delimiters. In my opinion, the biggest problem occurs not 
when they differ, but when they are the same. That's why we have to 
invent conventions like camelCase or hyphenated-names to allow ourselves 
to create computer language tokens that consist of multiple linguistic 
tokens. XML Schema could have allowed users to create a sequence of 
string values that contain spaces, as in:

<sequenceOfRoads>Gibson Road, Main Street</sequenceOfRoads>

That would require XML Schema to allow an alternate delimiter to be 
specified. It doesn't. And it shouldn't - in XML, the best way to 
delimit individual items is to use markup:

<roads>
   <road>Gibson Road</road>
   <road>Main Street</road>
</roads>
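
As a small illustration of the difference, here is a Python sketch using the standard library's xml.etree.ElementTree. It reads both forms above; the whitespace split stands in for how a list-typed value would be tokenized, which is an approximation on my part and not an actual schema validator:

```python
import xml.etree.ElementTree as ET

list_style = "<sequenceOfRoads>Gibson Road, Main Street</sequenceOfRoads>"
markup_style = """<roads>
   <road>Gibson Road</road>
   <road>Main Street</road>
</roads>"""

# Treating the content as a whitespace-delimited list breaks the road names apart.
list_items = ET.fromstring(list_style).text.split()
print(list_items)  # ['Gibson', 'Road,', 'Main', 'Street']

# With markup, each item is an element, so spaces inside a name are harmless.
road_items = [road.text for road in ET.fromstring(markup_style).findall("road")]
print(road_items)  # ['Gibson Road', 'Main Street']
```

The first result is what microparsing has to clean up afterwards; the second needs no extra convention at all.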

As a markup language, XML exists for the sole purpose of clearly 
identifying data. Let's use it! The alternative is to use microparsing. 
But that's not how XML works, and XQuery is based on XML.

We support XML Schema, and that's what our serialization does by 
default. If you want a different serialization, you can use string 
manipulation to create whatever you want, but an XML Schema processor 
won't be able to recognize the tokens.

Jonathan
My opinion only. Not on behalf of anyone.

Received on Tuesday, 26 October 2004 13:54:37 UTC