Re: [TF-PP] Document now in CVS from Andy Seaborne on 2009-11-24 (public-rdf-dawg@w3.org from October to December 2009)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Tue, 24 Nov 2009 09:44:28 +0000
To: Ivan Herman <ivan@w3.org>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4B0BAAFC.9050009@talis.com>
On 24/11/2009 08:26, Ivan Herman wrote:
>
>
> Andy Seaborne wrote:
>>
>>
>> On 23/11/2009 11:01, Ivan Herman wrote:
>>> Hi Andy,
>>>
>>> nobody reacted until now... But we should thank you in the first
>>> place!:-)
>>
>> Thanks!
>>
>>> There is still a pending issue in my mind on whether property paths can
>>> handle list elements properly or not. The issue I see is to be able to
>>> query list elements in their 'natural' order.
>>
>> When we had the strawpoll[3], the results [4] didn't suggest much
>> support for considering paths with lengths and much against.  I suppose
>> the subcase of length of * matching might be easier (not sure) but we
>> still have the case of multiple paths to the same RDF term if a list
>> contains the same term twice which is a big difference in approaches.
>>
>> [3]
>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JulSep/0439.html
>> [4]
>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009OctDec/0055.html
>>
>
> I think the issues in the poll went a bit beyond that... The subcase of
> getting access to the length of the match is probably more manageable.

Maybe it is - that's one of the things that needs working out.  I hope 
I'm wrong but it looks to me like there is nothing different about this 
subcase but I haven't had the time to investigate in depth (hint, hint).

For example, what happens about multiple identical matches?  A BGP 
doesn't generate duplicates - if

   rdf:rest*/rdf:first

and

   rdf:rest*{?len}/rdf:first

are the same, just the first is a projection of the second to remove 
?len, then there are duplicates.  Do we care?

I don't have any experience with a matcher that does not terminate early 
when looking for, say:

 ?list rdf:rest*/rdf:first "SomeFixedItem" .

but "SomeFixedItem" may be in the list twice.
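
To make the duplicates question concrete, here is a minimal sketch in 
Python over a hand-written set of triples.  The data, the blank node 
labels and the helper names are all made up for illustration; this is 
not the API of any SPARQL engine, and the {?len} form is only the 
syntax proposed in this thread.  It assumes that the path without ?len 
yields each distinct member once.

```python
# Hypothetical data and helpers, for illustration only.
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# The list ("a" "SomeFixedItem" "b" "SomeFixedItem") encoded as triples.
TRIPLES = {
    ("_:c0", RDF_FIRST, "a"), ("_:c0", RDF_REST, "_:c1"),
    ("_:c1", RDF_FIRST, "SomeFixedItem"), ("_:c1", RDF_REST, "_:c2"),
    ("_:c2", RDF_FIRST, "b"), ("_:c2", RDF_REST, "_:c3"),
    ("_:c3", RDF_FIRST, "SomeFixedItem"), ("_:c3", RDF_REST, RDF_NIL),
}


def objects(triples, subject, predicate):
    """All objects of (subject, predicate, ?o), sorted for determinism."""
    return sorted(o for (s, p, o) in triples if s == subject and p == predicate)


def rest_star_with_len(triples, start):
    """Yield (node, len) for each node reachable from start by rdf:rest*."""
    seen = set()
    frontier = [(start, 0)]
    while frontier:
        node, length = frontier.pop(0)
        if node in seen:  # guard against malformed, cyclic lists
            continue
        seen.add(node)
        yield node, length
        for nxt in objects(triples, node, RDF_REST):
            frontier.append((nxt, length + 1))


def match_with_len(triples, start):
    """Rows (len, member) for the proposed rdf:rest*{?len}/rdf:first."""
    return [(length, member)
            for node, length in rest_star_with_len(triples, start)
            for member in objects(triples, node, RDF_FIRST)]


def project_away_len(rows):
    """Drop ?len from each row; duplicates are kept."""
    return [member for _, member in rows]


def match_set(triples, start):
    """rdf:rest*/rdf:first assuming each distinct member appears once."""
    return set(project_away_len(match_with_len(triples, start)))
```

With this data, match_with_len(TRIPLES, "_:c0") gives four rows, 
project_away_len of those rows contains "SomeFixedItem" twice, and 
match_set contains it once.  So the two readings of rdf:rest*/rdf:first 
only agree if we decide that duplicates do not matter.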

> I am not too much worried about the multiple paths case: we can simply
> declare that this case is undetermined, or we can say that it is
> minimum/maximum of the paths. I believe for the use cases when there is
> a need for the length (like the authors' list I referred to) the path is
> unique.

What are the technical characteristics for this special case?  What is 
the defining feature that makes it a special situation and not an 
application of the general one? If we can articulate that, we may be 
able to define a solution for it.

Suggestion: generate use cases to get concrete requirements.  The 
discussion at the moment is a bit too abstract, based on only a few small 
examples, to know definitely one way or the other.

I am nervous of leaving something undefined and just solving some 
particular cases without an understood framework.  We may place 
compatibility barriers for future standardization work.

>>
>> <Digression>
>>
>> The underlying issues are, as I see it:
>>
>> 1/ A significant amount of list use is for closed sets (c.f. OWL) where
>> order does not matter.  Indeed, order gets in the way.
>>
>
> I am not sure I agree. It is true for OWL but, in many cases, I have the
> impression of a self-fulfilling prophecy here. I guess I have made this
> comment before: in many cases vocabularies are defined as to avoid using
> lists, though it would be the proper modelling option exactly because
> the authors know that SPARQL cannot properly query the order. See for
> example the chapter on "RDF Features Best Avoided in the Linked Data
> Context" in the "How to Publish Linked Data on the Web" tutorial of
> Chris and friends[1]:
>
> [[[
> You should think twice before using RDF collections or RDF containers as
> they do not work well together with SPARQL. Does your application really
> need a collection or a container or can the information also be
> expressed using multiple triples having the same predicate? The second
> option makes SPARQL queries straight forward.
> ]]]

I'd say that was good advice about lists generally, not SPARQL specific 
- they don't work with inference because they are encoded in triples, 
mixing syntax and structure.

> From a modelling point of view I think this advice is wrong, though I
> understand why it ended in the tutorial. And this is only one example.

Other examples?  Can you gather a comprehensive set of examples? That 
would scope the space so we know what implications the design decision 
will have.

> For containers, the RDFS entailment possibility gives a way to express
> what we want. The result may be that people will begin to use those over
> collections which is again not optimal...

You mean an inferred list:member or extended rdfs:member?  Doesn't that 
lose ordering and so is exactly rdf:rest*/rdf:first ?

>
> [1] http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#datamodel
>
>
>> 2/ To do it properly, structures (list, set, bag, map?, multimap - map
>> less so as we have multiple property values) need to be 1st class
>> objects, in SPARQL at least, if not RDF, and treated as
>> structures, not encoded in triples.  The triple encoding can be done
>> wrong and also obscures the real intent, although that is partially
>> because we only have lists (containers not being very popular).
>>
>> Using property functions:
>>
>> http://www.openjena.org/ARQ/rdf_lists.html
>>
>> ?list list:member ?member
>> ?list list:index (?index ?member)
>> ?list list:length ?length
>>
>> puts lists closer to being 1st class.
>>
>
> I think this was one of the features that were rejected, and did not
> even make it to the time-permitting features. Ie, let us consider that
> as water under the bridge...:-(
>
>> </Digression>
>>
>>> I have dug up some mails from early discussions[1][2]. You
>>> proposed[1] a scheme whereby
>>>
>>> SELECT ?paper ?author
>>> WHERE {
>>>       ?paper ex:authors ?l .
>>>       ?l rdf:rest*{?len}/rdf:first ?author .
>>> }
>>> SORTED BY ?len
>>>
>>> (or something similar) would work to give back the author's list in
>>> list order. Orri had a more complex scheme[2] in his mail...
>>
>> If someone wants to work on that ...
>>
>
> I am not really good with the grammar nor with the implementation of
> paths. But, well, from a query point of view isn't it enough to allow
> for a {variable} to appear right after '*' or '+' in a path and having
> it referred as an integer?

I can do the mechanics (the grammar and implementation are easy when you 
have a design target) but there is a lot more to be done, such as 
discussion and consensus building as to the design, and handling multiple 
identical results (currently not possible with a BGP).
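
As a rough illustration of what the quoted author-list query is asking 
for, here is a small Python sketch that binds a length to each list 
cell and then sorts on it.  The ex: names, the data and the function 
names are invented for the example, and it assumes a well-formed list 
(one rdf:first and one rdf:rest per cell), which is exactly the 
"path is unique" case.  It says nothing about what should happen when 
that assumption fails.

```python
# Hypothetical data: one paper with an ordered author list.
TRIPLES = {
    ("ex:paper1", "ex:authors", "_:l0"),
    ("_:l0", "rdf:first", "ex:alice"), ("_:l0", "rdf:rest", "_:l1"),
    ("_:l1", "rdf:first", "ex:bob"), ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "ex:carol"), ("_:l2", "rdf:rest", "rdf:nil"),
}


def objects(triples, subject, predicate):
    """All objects of (subject, predicate, ?o), sorted for determinism."""
    return sorted(o for (s, p, o) in triples if s == subject and p == predicate)


def author_rows(triples, paper):
    """Unordered rows (len, author), as a matcher might produce them."""
    rows = set()
    for head in objects(triples, paper, "ex:authors"):
        node, length, seen = head, 0, set()
        while node != "rdf:nil" and node not in seen:
            seen.add(node)  # stop on a malformed, cyclic list
            for author in objects(triples, node, "rdf:first"):
                rows.add((length, author))
            rests = objects(triples, node, "rdf:rest")
            if not rests:
                break
            node = rests[0]  # assumption: a single rdf:rest per cell
            length += 1
    return rows


def authors_in_list_order(triples, paper):
    """Sort the rows on the length, then project it away."""
    ordered = sorted(author_rows(triples, paper), key=lambda row: row[0])
    return [(paper, author) for _, author in ordered]
```

authors_in_list_order(TRIPLES, "ex:paper1") returns alice, bob, carol 
in that order.  The sort key is the only thing carrying the order, 
which is why the meaning of the length needs to be pinned down first.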

>>
>>> I believe something like that is really necessary for '*' and '+';
>>> handling lists is one of the big outstanding features in my view and not
>>> handling them via paths would be a real shame...
>>
>> We can handle their use as closed sets which is a step forward.
>>
>
> True but, well, in my opinion it does not solve the bigger issue...

It would be great if you would articulate the bigger issue and then we 
can see the amount of work needed in adding to this time permitting 
feature and also generate some consensus on requirements and approach. 
I don't think we are near that point yet.

 Andy

>
> Ivan
>
>>      Andy
>>
>>>
>>> Ivan
>>>
>>>
>>> [1]
>>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JanMar/0102.html
>>> [2]
>>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JanMar/0116.html
>>>
>>>
>>> Andy Seaborne wrote:
>>>> I've moved the document from the wiki into CVS, and used xmlspec.
>>>>
>>>> http://www.w3.org/2009/sparql/docs/property-paths/Overview.xml
>>>>
>>>> It does not look very nice - we need to settle on a "house style"
>>>> (tables, examples, grammars) although I see something emerging in
>>>> sparql.xsl.
>>>>
>>>>       Andy
>>>>
>>>
>
Received on Tuesday, 24 November 2009 09:44:55 UTC