Re: NULL strings... necessary? from Arnt Gulbrandsen on 2005-06-01 (public-ietf-collation@w3.org from June 2005)

From: Arnt Gulbrandsen <arnt@gulbrandsen.priv.no>
Date: Wed, 1 Jun 2005 19:51:56 +0200
To: public-ietf-collation@w3.org
Cc: Martin Duerst <duerst@it.aoyama.ac.jp>, Jim Melton <jim.melton@acm.org>
Message-Id: <S+C7nSObjsytpsELe5Al5g.md5@prosecco.oryx.com>

I thought about this.

Jim Melton writes:
> Indeed, XQuery and XSLT make a distinction here.  However, the term 
> "null value" is not used by those languages nor by the data model on 
> which they depend (the XPath 2.0 and XQuery 1.0 Data Model).  The 
> closest analog to "null value" in that Data Model is "empty 
> sequence", which is a sequence with nothing at all in it.  That is 
> very clearly different from a value of xs:string type that contains 
> exactly zero characters.  The term used by the Data Model for such a 
> string value is "zero-length string".  Note that the Data Model does 
> *not* use the term "empty value" nor "empty string" because of the 
> proven potential for confusion with "empty sequence".

On the face of it, this distinction seems to match the one I'm looking 
at, but it doesn't really. See below.

> Because of the specific choice of vocabulary ("empty sequence" and 
> "zero-length string") used by the Data Model, versus the choice of 
> vocabulary used by Arnt below ("NULL/NIL string" and "empty string"), 
> and the absence of any statement by Arnt of what his terms mean, I 
> cannot address his actual question.  Sorry!

Part of the problem is that the term "NULL string" is used, but not 
defined, in the comparator draft. So I don't have a real definition for 
you. (The lack of a real definition was the reason I spotted this, 
btw.)

Anyway. By now it's clear that every (potential) user of collations 
agrees: the null concept as described doesn't fit well with existing 
code.

> I should also mention that XQuery's order by clause (part of the FLWOR 
> expression) allows query authors to specify whether empty sequences 
> sort less than or greater than all non-empty-sequence values.

One of the specified properties of null strings is that they sort after 
all non-null strings, but before error strings. (Error strings are 
strings outside the collator's domain, for example non-numbers in the 
case of a numeric collator.) To support XQuery's empty sequences, null 
strings would need user-definable behaviour... which I think is going a 
little bit too far.
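
To make that ordering concrete, here is a minimal Python sketch. It is 
not taken from the comparator draft: it assumes None stands in for the 
(undefined) "null string", and numeric_sort_key is a hypothetical 
numeric collator in which anything that does not parse as an integer 
counts as an error string.

```python
def numeric_sort_key(s):
    """Hypothetical sort key: valid numbers, then null, then error strings."""
    if s is None:
        # Assumption: None plays the role of the draft's "null string".
        return (1, 0)
    try:
        # Strings inside the collator's domain sort by numeric value.
        return (0, int(s))
    except ValueError:
        # Strings outside the domain ("error strings") sort last.
        return (2, 0)


values = ["10", None, "abc", "2"]
ordered = sorted(values, key=numeric_sort_key)
```

The position of None is fixed by the key function, so a caller who 
wants nulls first (as XQuery allows for empty sequences) cannot get 
that without the collator exposing another option.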

I lean towards saying that collations sort strings, and if you want a 
null concept, you have to handle it yourself and implement the 
behaviour that suits you. That way no new requirements are imposed 
upon either Sieve or IMAP SORT.
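
A rough sketch of what handling it on the caller's side could look 
like, in Python. Everything here is an assumption made for 
illustration: compare_strings is a stand-in for a real collation (it 
just compares code points), None represents whatever the caller 
considers null, and with_nulls is a hypothetical wrapper name.

```python
import functools


def compare_strings(a, b):
    """Stand-in for a collation: compares two strings by code point."""
    return (a > b) - (a < b)


def with_nulls(compare, nulls_first):
    """Wrap a string-only comparison with caller-chosen placement of None."""
    def wrapped(a, b):
        if a is None and b is None:
            return 0
        if a is None:
            return -1 if nulls_first else 1
        if b is None:
            return 1 if nulls_first else -1
        # Only real strings ever reach the underlying collation.
        return compare(a, b)
    return wrapped


data = ["b", None, "a"]
nulls_low = sorted(
    data, key=functools.cmp_to_key(with_nulls(compare_strings, True)))
nulls_high = sorted(
    data, key=functools.cmp_to_key(with_nulls(compare_strings, False)))
```

The collation itself never sees a null, so a caller such as an XQuery 
processor can choose either placement without the collation needing to 
know about it.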

Arnt

Received on Wednesday, 1 June 2005 17:52:27 UTC