Concerning LET or AS

Thank you for your time and attention at the WG meeting today.

TopQuadrant would like Holger's earlier comment [1] to be treated as a 
formal comment (i.e. with an official WG response on this mailing list).

My understanding from today's meeting is that the response is likely to 
be that the WG has already considered the LET design and believes the AS 
design to be adequate (LET is merely an abbreviated form for certain AS 
constructs). I also do not believe that TopQuadrant is bringing any new 
information that was not considered at your f2f meeting [2].

We do, however, feel strongly about this, and are likely to raise a 
formal objection (in the sense that we believe it would be better for the 
WG to take a few weeks longer over SPARQL 1.1 and get this right than to 
deliver SPARQL 1.1 on schedule without this feature).

Thinking through Steve's comments in particular, I tried to come up with 
an example illustrating how the ordering of operations that is sometimes 
required is better articulated with LET than with AS.
This example is not as polished as I would like, since I believe it is 
more helpful to contribute it during your F2F meeting than to polish it 
further.

First I wish to clarify that this is not about whether or not assignment 
should be in SPARQL 1.1. Assignment is in already, with the AS construct 
that was discussed under item 39. This issue is purely about the syntax 
and scoping rules for the single assignment capability.

Many of the processing tasks that we and our customers have involve 
mapping several legacy sources together, merging them into one RDF 
graph, and then doing some processing.
A frequent problem is that different legacy sources represent the same 
data in different ways, e.g. with different case conventions, in 
different units, or whatever. In these cases, data laundry of one sort 
or another is necessary. One option for laundry is using functions and 
assignment within SPARQL.

So for my example, I am taking information about alumni at a college and 
trying to find the appropriate year photo for them.
I will simplify the name problem by assuming that a name consists of a 
first name and a last name (no middle initial), but that people change 
their last name from time to time.

The data sources that I have include:
- a current mailing database, with full-names, e-mail addresses, and 
addresses
  a:fullName a:email a:address
  _:w a:fullName "John Smith" .
  _:w a:email <mailto:john.smith@example.org>.

- a database with students' first names, last names and former last names;
   to simplify processing I just use two properties
   b:firstName
   b:lastName

  for example:
   _:x  b:firstName "John" .
   _:x b:lastName "Doe" .
   _:x b:lastName "Smith".

shows that the person known as John Doe and the person known as John 
Smith are one and the same, without clarifying the chronology of the 
name change.

- a database with date of matriculation, and years of study, by full 
name at time of matriculation
    c:matriculationDate c:studyYears c:fullName 
  _:y c:fullName "John Doe" .
  _:y c:studyYears "P1Y"^^xs:yearMonthDuration .
  _:y c:matriculationDate "1988-09-01"^^xsd:date.  

- and a list of graduation photo names by year.
    d:year  d:fileName
  _:z d:year "1988"^^xsd:date .
  _:z d:fileName "classOf88" .

- I have arranged these photos as jpg files on the web at 
http://www.example.org/photos
   http://www.example.org/photos/classOf88.jpg


SELECT ?eMail ?image
WHERE 
{ ?a a:email ?eMail .
  ?a a:fullName ?fullName .
  LET ( ?fullNameSpaceNormalized=normalize-space(?fullName) )        [A]
  LET ( ?firstName=substring-before(?fullNameSpaceNormalized," ")    [B]
        ?lastName=substring-after(?fullNameSpaceNormalized," ") )
  ?b b:firstName ?firstName .
  ?b b:lastName ?lastName .
  ?b b:lastName ?altLastName .                                       [C]
  LET ( ?altName=concat(?firstName, " ", ?altLastName ) )
  ?c c:fullName ?altName .
  ?c c:studyYears ?lengthOfCourse .
  ?c c:matriculationDate ?matriculate .
  LET ( ?endDate=year-from-date(add-yearMonthDuration-to-date(?matriculate,?lengthOfCourse)) )
  ?d d:year ?endDate .
  ?d d:fileName ?imageFile .
  LET ( ?image = xs:anyURI(concat("http://www.example.org/photos/", ?imageFile, ".jpg" ) ) )
}

Notes:
[A] for robustness against leading/trailing space and/or double space in 
the middle
[B] cannot be combined with [A] because of rules discussed under issue 39
[C] ?altLastName can be the same as ?lastName
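
To make notes [A], [B] and [C] concrete, here is a rough Python sketch of 
what those steps compute on a messy name. The helper names and the 
"students" list are mine, written only for illustration: they imitate the 
XPath string functions used in the query and stand in for source b, and 
are not taken from any SPARQL implementation. Python's whitespace 
handling is also slightly broader than XPath's, so treat this as an 
approximation.

```python
# Illustrative only: these helpers imitate the XPath string functions used in
# the query above; they are not taken from any SPARQL engine.

def normalize_space(s):
    """Trim and collapse whitespace runs (approximates XPath normalize-space)."""
    return " ".join(s.split())


def substring_before(s, sep):
    """Text before the first sep; "" when sep is empty or absent."""
    if sep == "":
        return ""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """Text after the first sep; s itself when sep is empty, "" when absent."""
    if sep == "":
        return s
    _, found, tail = s.partition(sep)
    return tail if found else ""


def split_full_name(full_name):
    normalized = normalize_space(full_name)      # step [A]
    first = substring_before(normalized, " ")    # step [B], needs the result of [A]
    last = substring_after(normalized, " ")
    return first, last


def alternative_names(first, last, students):
    """Step [C]: every full name recorded for a student with this first and last name.

    `students` is a hypothetical stand-in for source b: a list of
    (first_name, set_of_last_names) pairs.
    """
    names = set()
    for student_first, last_names in students:
        if student_first == first and last in last_names:
            for alt_last in last_names:  # includes `last` itself, as note [C] says
                names.add(student_first + " " + alt_last)
    return sorted(names)


students = [("John", {"Doe", "Smith"})]
first, last = split_full_name("  John   Smith ")
print(first, last)                               # John Smith
print(alternative_names(first, last, students))  # ['John Doe', 'John Smith']
```

The point of the sketch is the ordering: [B] can only run once [A] has 
produced its value, and [C] can only run once [B] has.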

I believe the WG is considering recommending that this query should be 
written as follows.

SELECT ?eMail,
       xs:anyURI(concat("http://www.example.org/photos/", ?imageFile, ".jpg" ) ) AS ?image
WHERE {
  SELECT ( * year-from-date(add-yearMonthDuration-to-date(?matriculate,?lengthOfCourse))
              AS ?endDate )
  WHERE {
    SELECT ( * concat(?firstName, " ", ?altLastName ) AS ?altName )
    WHERE {
      SELECT ( * substring-before(?fullNameSpaceNormalized," ") AS ?firstName,
                 substring-after(?fullNameSpaceNormalized," ") AS ?lastName )
      WHERE {
        SELECT ( * normalize-space(?fullName) AS ?fullNameSpaceNormalized )
        WHERE {
          ?a a:email ?eMail .
          ?a a:fullName ?fullName .
        }
      }
      ?b b:firstName ?firstName .
      ?b b:lastName ?lastName .
      ?b b:lastName ?altLastName .
    }
    ?c c:fullName ?altName .
    ?c c:studyYears ?lengthOfCourse .
    ?c c:matriculationDate ?matriculate .
  }
  ?d d:year ?endDate .
  ?d d:fileName ?imageFile .
}

(Using the equivalence from [3].)
We believe that this is inferior: harder to write, harder to read, and 
harder to understand. We also believe that the cost of complicating the 
language by having two ways to say the same thing is well worth paying.


Jeremy Carroll
AC Rep, TopQuadrant.



[1]
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2009Oct/0003
[2]
http://www.w3.org/2009/sparql/meeting/2009-05-06#ProjectExpressions___26___20_Assignment 

[3]
http://www.w3.org/2009/sparql/wiki/Feature:Assignment#Equivalence_with_SubSelects_and_ProjectExpressions

Received on Tuesday, 3 November 2009 07:30:55 UTC