
Re: Problems with WITH and FROM

From: Bob MacGregor <bmacgregor@siderean.com>
Date: Wed, 23 Feb 2005 09:43:48 -0800
Message-ID: <421CC0D4.2010603@siderean.com>
To: andy.seaborne@hp.com
CC: public-rdf-dawg-comments@w3.org
Andy,

Seaborne, Andy wrote:

> Bob MacGregor wrote:
>
>> The WITH and FROM constructs embody some unfortunate choices.
>>
>> Background graphs represent the most common case, while named
>> graphs are going to see less use.  The FROM keyword in SQL will
>> seem most familiar to programmers, and it's natural that it would
>> refer to background graphs.  The current language spec reverses this
>> commonsense idea, teaming the less used keyword WITH with the
>> more used type of graphs.
>
>
> The working group has had a long debate about these keywords - in a
> strawpoll, people found it marginally clearer if FROM kept the name and
> WITH didn't.  I agree that it can confuse and better suggestions are
> welcome.

I think this provides good evidence that having both forms is a mistake.
Given that only one of them (WITH) is essential, the other should be dropped.

>> Suppose an unsuspecting programmer writes the following
>>
>> SELECT ?a
>> FROM my:g
>> WHERE (?a my:foo my:b)
>>
>> The FROM clause will either be ignored or an error message might
>> be thrown, but it's not going to give the "expected" results.
>> Substituting WITH for FROM fixes the problem, but it would be much
>> better if we didn't design a propensity for human error right into
>> the language.
>>
>> The FROM clause inherently defines a disjunction (the poorly-
>> named UNION construct).  However, by masking the fact that
>> it defines a disjunction, it also masks a lack of generality.  Consider
>> the following query, which either is legal or should be (modulo my
>> ignorance of SPARQL syntax):
>>
>> SELECT ...
>> WHERE
>>    Graph ?g1 {...}
>>    Graph ?g2 {...}
>>    AND
>>      {{?g1 = my:a} UNION {?g1 = my:b}}
>>    AND
>>     {{?g2 = my:c} UNION {?g2 = my:d}}
>>
>> This query is not expressible using the FROM clause.  This example shows
>> that the FROM clause is superfluous (or should be), and non-general.
>
>
>
> I'm not completely clear as to what RDF dataset you're trying to build
> here - could you describe it some more?  What is the "="?  Assignment,
> equality test or owl:sameAs (N3 style)?

Of course I mean mathematical equality (e.g., the KIF equality operator).
I take it for granted that any reasonable statement-based logic language
would include it.  Apparently, SPARQL does not, and instead introduces
bizarreness like distinguishing WITH and FROM.  The current language
appears to be very liberal in introducing specialized operators, and
leery of introducing generic, broadly applicable ones.

>
> Is it trying to form the RDF merge of two graphs into a new, named
> one?  If so, then the expectation is that the union is formed outside
> the query environment (c.f. creating the inference closure of a graph).

If SPARQL had a clear semantics, then the meaning of a language form
would be pretty much self-evident, independent of the various use cases.
Instead, statement-like notions (e.g., conjunction) are being mixed up
with set-like notions (UNION).

The quad-like SOURCE operator that SPARQL had a while back had a
well-defined interaction with the rest of the language.  If the GRAPH
operator is just a semantic variant, then it would also have a
well-defined interaction, e.g.,

    Graph ?g {(?a p1 ?b) (?c p2 ?d)}

would be equivalent to

     {Graph ?g (?a p1 ?b)} AND
     {Graph ?g (?c p2 ?d) }

However, I can imagine that these two statements are not equivalent
(and they might not be legal SPARQL, but should be).  The SPARQL language
is becoming increasingly less comprehensible.  Adding more use cases
isn't going to fix it.  Streamlining the syntax is.
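
To make the equivalence I have in mind concrete, here is a minimal
sketch in Python of how a quad-based reading would behave.  It is not
SPARQL and not any engine's actual implementation: the quad data, the
helper names (match_quad, solve, graph_block, canonical) and the tuple
encoding of patterns are all assumptions made up for illustration.
Under that reading, a Graph block over two triple patterns and the
conjunction of two single-pattern Graph blocks sharing ?g produce the
same bindings.

```python
# Hypothetical quad store: (graph, subject, predicate, object).
QUADS = {
    ("my:g1", "s1", "p1", "o1"),
    ("my:g1", "s2", "p2", "o2"),
    ("my:g2", "s3", "p1", "o3"),
}


def is_var(term):
    """Terms starting with '?' are variables; everything else is a constant."""
    return term.startswith("?")


def match_quad(pattern, quad, binding):
    """Extend binding so that pattern matches quad, or return None."""
    result = dict(binding)
    for p, v in zip(pattern, quad):
        if is_var(p):
            # Bind the variable, or fail if it is already bound differently.
            if result.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return result


def solve(patterns, quads):
    """Return every binding that satisfies all quad patterns (a conjunction)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for q in quads
            if (extended := match_quad(pattern, q, b)) is not None
        ]
    return bindings


def graph_block(g, triples):
    """Graph g {t1 t2 ...} read as quads: prefix each triple with the graph term."""
    return [(g, *t) for t in triples]


def canonical(bindings):
    """Order-independent form of a list of bindings, for comparison."""
    return sorted(tuple(sorted(b.items())) for b in bindings)


# Form 1:  Graph ?g {(?a p1 ?b) (?c p2 ?d)}
one_block = solve(
    graph_block("?g", [("?a", "p1", "?b"), ("?c", "p2", "?d")]), QUADS
)

# Form 2:  {Graph ?g (?a p1 ?b)} AND {Graph ?g (?c p2 ?d)}
left = solve(graph_block("?g", [("?a", "p1", "?b")]), QUADS)
right = solve(graph_block("?g", [("?c", "p2", "?d")]), QUADS)
joined = [
    {**l, **r}
    for l in left
    for r in right
    if all(l[k] == r[k] for k in l.keys() & r.keys())  # join on shared variables
]

print(canonical(one_block) == canonical(joined))
```

In this toy model the two forms agree because the only thing linking
the two patterns is the shared variable ?g.  Whether the actual GRAPH
operator behaves this way is exactly the question.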

> The WITH/FROM combination isn't the only way to assign a dataset to a
> query - indeed, I don't expect it to be the usual way for systems of
> any size where I expect the dataset to be passed to the query engine.
>
> We did discuss the fact that association of the dataset to the query
> execution is a protocol issue, but sometimes there is no "protocol" to
> perform this (e.g. local query).  This will work out as we publish the
> protocol.
>
> WITH/FROM are most useful in small scale use (including the testing)
> where creation of graphs just for the life of the query is not an
> undue burden.
>
> See also:
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JanMar/0070.html
> for some examples of datasets
>
>>
>> Best would be to eliminate it entirely, and rename 'WITH' to 'FROM'.
>>
>> If you really want convenience, which is what FROM seems to be all
>> about, try adding an 'IN' operator, e.g.,  something like
>>
>>      ?g1 IN {my:a, my:b}
>> AND
>>      ?g2 IN {my:c, my:d}
>
>
> Could you explain that example a bit further?  I find myself guessing
> as to its meaning but I think it is creating the RDF merge of my:a and
> my:b and assigning it to ?g1 (etc).  What name does it get?  Some
> system defined one?

SQL defines the "IN" operator -- it's equivalent to

     {?g1 = my:a} OR {?g1 = my:b}

But then, SPARQL doesn't have equality.  Defining a language has the
side-effect of circumscribing what you can and can't imagine.  The
current SPARQL is overly circumscribed, which makes it hard to imagine
better alternatives while still expressing examples in SPARQL.  That is
why I recommend looking at SQL -- it is very well defined, very
expressive, and allows one to easily express things (such as the examples
I've thrown in above).  I can correctly interpret all of the normal SQL
expressions without having to build little database models.
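
For what it's worth, the SQL-style reading of IN can be sketched in a
few lines of Python.  This is only an illustration under assumptions:
the helper names (in_set, or_of_equalities) and the list of graph names
are invented here, and nothing below is SPARQL.  It shows IN as a
disjunction of equality tests, used as a plain filter on the values a
graph variable may take.  No merged graph is created and no new name
is assigned.

```python
def in_set(value, candidates):
    """SQL-style IN: true when value equals at least one candidate."""
    return any(value == c for c in candidates)


def or_of_equalities(value, candidates):
    """The same test written out as an explicit chain of '=' joined by OR."""
    result = False
    for c in candidates:
        result = result or (value == c)
    return result


# Hypothetical graph names known to the store.
GRAPH_NAMES = ["my:a", "my:b", "my:c", "my:d", "my:e"]

# ?g1 IN {my:a, my:b} AND ?g2 IN {my:c, my:d}
pairs = [
    (g1, g2)
    for g1 in GRAPH_NAMES
    for g2 in GRAPH_NAMES
    if in_set(g1, ["my:a", "my:b"]) and in_set(g2, ["my:c", "my:d"])
]

for g1, g2 in pairs:
    print(g1, g2)
```

Each surviving value of ?g1 is simply one of the existing graph names,
my:a or my:b.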

>
> If the query says: "SELECT ?g1" what gets returned for ?g1 ?
>
I assume/hope the expansion using 'OR' and '=' answers your question.

>>
>> Cheers, Bob
>>
>>
>     Thanks for the comments
>     Andy
>

-- 

Bob MacGregor
Chief Scientist
Siderean Software Inc
390 North Sepulveda Blvd., Suite 2070
El Segundo, CA 90245
bmacgregor@siderean.com
tel: +1-310 647-4266
fax: +1-310-647-3470

Received on Wednesday, 23 February 2005 17:44:36 GMT
