Re: fixing regex collations [Was: Re: [Fwd: Comments on SPARQL from the XML Query and the XSL WGs]]

On Sat, Nov 19, 2005 at 11:44:58AM -0500, Eric Prud'hommeaux wrote:
> On Fri, Nov 18, 2005 at 12:22:19PM -0600, Dan Connolly wrote:
> > 
> > On Fri, 2005-11-18 at 12:45 -0500, Eric Prud'hommeaux wrote:
> > > On Fri, Nov 18, 2005 at 06:37:52AM -0500, Eric Prud'hommeaux wrote:
> > > > On Thu, Nov 17, 2005 at 01:10:44PM +0000, Seaborne, Andy wrote:
> > > > > -------- Original Message --------
> > > > >   > From: Ashok Malhotra <>
> > > > >   > Date: 13 September 2005 16:28
> > > > >   >
> > > > >   > Notes on SPARQL Query Language for RDF
> > > > >   > Last Call Draft July 21, 2005
> > > > >   > ...
> > > > >   > 6. String comparison is defined only using the code point collation.
> > > > >   > Other collations are not supported.  This may be a significant
> > > > >   > limitation.
> > > > > 
> > > > > Code point collation is always required.  Access to other collations can be
> > > > > done through a custom function.
> > > > 
> > > > @@needs work here -- we say nothing about default vs user-supplied
> > > > collations.
> > > 
> > > XPath's fn:matches
> > >   http://www.w3.org/TR/xpath-functions/#func-matches
> > > now has this exciting thing to say about collations:
> > > [[
> > > Note:
> > > 
> > > Regular expression matching is defined on the basis of Unicode code
> > > points; it takes no account of collations.
> > > ]]
> > > 
> > > which means we have no functions that require collations. The sentence
> > > [[
> > > The collation is defined in section 7.3.1 Collations.
> > > ]]
> > > needs to go away. I don't think we need to repeat the note.
> > > 
> > > Do I need a vote on this?
> > 
> > If so, we can do it after publication.
> > 
> > I haven't studied the details.
> > 
> > >  or can I strike it before the publication?
> 
> I think it never made sense. Going back at least as far as
>   http://www.w3.org/TR/2003/WD-xpath-functions-20031112/#func-matches
> we can see that note.
> [[
> Note:
> 
> Regular expression matching is defined on the basis of Unicode
> code-points; it takes no account of collations.
> ]]
> 
> I thought I saw it listed in a table with "A collation may
> be specified", but I can't find that text again so I have
> no confidence that I ever saw it. It seems unlikely to me
> that anyone voted for regex with some expectation that it
> would use collations. Not impossible, just unlikely. A test
> case that distinguishes them:
> 
>   _:a foaf:givenName "Björn".
> 
>   ASK { _:a foaf:givenName "Bjoern" }

make that: ASK { _:a foaf:givenName ?gn
		 FILTER regex(?gn, "^Bjoern$")}

> would fail now, but could have been conceived to pass before.
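To make the difference concrete, here is a minimal Python sketch. Python's re module matches on code points, so it stands in for fn:matches here. The umlaut folding table is a hypothetical tailoring invented for illustration. It is not a real collation. It only shows how a collation-aware regex could have let the ASK succeed.

```python
import re
import unicodedata

# Code point matching: "ö" (U+00F6) is never equal to the two characters "oe".
# NFC normalization makes sure "ö" is the single precomposed code point.
given_name = unicodedata.normalize("NFC", "Björn")
pattern = "^Bjoern$"

codepoint_match = re.search(pattern, given_name) is not None
print(codepoint_match)  # False

# HYPOTHETICAL tailoring, for illustration only: a German-phonebook-style
# folding that expands umlauts before matching. This is not a real collation
# implementation.
HYPOTHETICAL_FOLDS = {"ä": "ae", "ö": "oe", "ü": "ue"}

def fold(s: str) -> str:
    """Replace each umlauted letter with its two-letter expansion."""
    return "".join(HYPOTHETICAL_FOLDS.get(ch, ch) for ch in s)

folded_match = re.search(pattern, fold(given_name)) is not None
print(folded_match)  # True
```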

For fn:collation, I've added the text
[[
The collation for fn:compare is defined by XPath and identified by
http://www.w3.org/2005/xpath-functions/collation/codepoint. This
collation allows for string comparison based on code point
values. Codepoint string equivalence can be tested with RDF term
equivalence.
]]
above the operator table and removed these rows:
  xsd:boolean   xsd:string = xsd:string
  xsd:boolean   xsd:string != xsd:string
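Here is a rough sketch of what fn:compare does under the codepoint collation. It returns -1, 0 or 1 by comparing the strings code point by code point. The sketch also checks the claim that equivalence under this collation coincides with plain string equality, which is why the removed = and != rows add nothing beyond term equality. It only looks at the lexical forms. Datatypes and language tags are ignored, and codepoint_compare is a hypothetical name.

```python
def codepoint_compare(a: str, b: str) -> int:
    """Sketch of fn:compare with the codepoint collation: -1, 0 or 1."""
    ca = [ord(ch) for ch in a]
    cb = [ord(ch) for ch in b]
    # List comparison is lexicographic, so a proper prefix sorts first.
    return (ca > cb) - (ca < cb)

# Codepoint equivalence is the same as plain string equality.
for x, y in [("abc", "abc"), ("abc", "abd"), ("Björn", "Bjoern"), ("B", "a")]:
    assert (codepoint_compare(x, y) == 0) == (x == y)

print(codepoint_compare("B", "a"))           # -1  (U+0042 < U+0061)
print(codepoint_compare("Björn", "Bjoern"))  # 1   (U+00F6 > U+006F)
```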

Justification:
  Implementing collations is lots of work and I haven't seen anyone
crying out for it.

Alternative1 -- mirror XPath: 
XPath's fn:compare function takes an optional collation argument.
"A = B" doesn't leave room for a collation, so we'd have to provide
some sort of system collation in the protocol
  &collation=http://...
or the query language
  COLLATION <http://...>
and say that implementations MUST support the codepoint collation.
This would leave us a reason to keep:
  xsd:boolean   xsd:string = xsd:string
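Here is a sketch of how Alternative1 might look to an implementer. The collation registry, string_equals, UnsupportedCollation, and the rule that an unspecified collation falls back to codepoint are all hypothetical. Only the codepoint collation URI comes from XPath. A query-wide collation, whether it comes from a COLLATION <...> declaration or a &collation=... parameter, picks the comparison function used for A = B. The codepoint collation must always be present, and any other URI is optional and may be rejected.

```python
from typing import Callable, Dict, Optional

CODEPOINT = "http://www.w3.org/2005/xpath-functions/collation/codepoint"

def _codepoint_compare(a: str, b: str) -> int:
    # Python orders str values lexicographically by code point.
    return (a > b) - (a < b)

# HYPOTHETICAL registry: the codepoint collation is mandatory,
# and implementations could register others.
COLLATIONS: Dict[str, Callable[[str, str], int]] = {CODEPOINT: _codepoint_compare}

class UnsupportedCollation(Exception):
    """Raised when the query names a collation the engine does not have."""

def string_equals(a: str, b: str, collation: Optional[str] = None) -> bool:
    """Evaluate xsd:string = xsd:string under the query-wide collation.

    Assumption: no collation given means the codepoint collation.
    """
    uri = collation or CODEPOINT
    try:
        compare = COLLATIONS[uri]
    except KeyError:
        raise UnsupportedCollation(uri) from None
    return compare(a, b) == 0
```

With only the codepoint collation registered, string_equals("Björn", "Bjoern") is False. Naming any unregistered collation URI raises UnsupportedCollation rather than silently falling back. Any choice between failing and falling back would need to be specified for interop.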

Alternative2 -- extensible collations:
Say that there IS a collation, keep
  xsd:boolean   xsd:string = xsd:string
but not say how the collation is used.
I have a hard time imagining how we'll test this or how the world will
get any interop on xsd:string = xsd:string.
-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Sunday, 27 November 2005 21:37:19 UTC