RE: XSRQL proposal from Howard Katz on 2004-06-29 (public-rdf-dawg@w3.org from April to June 2004)

From: Howard Katz <howardk@fatdog.com>
Date: Tue, 29 Jun 2004 06:51:07 -0700
To: "Seaborne, Andy" <andy.seaborne@hp.com>, <public-rdf-dawg@w3.org>
Message-ID: <IKEOLCDFPBBPPAHGNKKOGEPJEMAA.howardk@fatdog.com>
> [mailto:public-rdf-dawg-request@w3.org]On Behalf Of Seaborne, Andy
> Sent: Tuesday, June 29, 2004 2:18 AM
> To: Howard Katz; public-rdf-dawg@w3.org
> Subject: RE: XSRQL proposal

> > >
> > > 3/ The data model has subjects, predicates and objects and the syntax
> > > uses cues to indicate what is a predicate using @.  How do I write a
> > > query that does
> > >
> > > (?x ?x anything)
> >
> > I'm assuming that your "anything" might also be called "something" --
> > ie, it's *not* a variable but an actual specific object. If that's
> > correct (is it?), then you'd say:
> >
> >    { *, @*, anything }
> >
> > and would get back all triples that terminate in an "anything" object.
>
> I am interested in both cases - anything as a variable or anything as a
> preset value.  I'm trying to understand how the graph labels (on nodes
> and on arcs) interact in XsRQL and also the type system of
> subject/predicates/objects rather than graph labels.
>
> >    { *, @*, anything }
>
> Does this ensure that the subject and the predicate slot have the same
> URIref?  In (?x ?x ?z) both the ?x's must be bound to the same graph label
> so it does match:
>
>   rdf:type rdf:type rdf:Property .

Hmm... You ask good questions.

The way wildcards work in the path language might be illustrative, since
it's the same. If you say

    *[ @* ]

in the path language, you're asking for *any* subject that's immediately
upstream of *any* predicate. Since by definition all subjects are
immediately upstream of a predicate, the filter here is a no-op, and this
simply means "any subject". This notation doesn't say anything about the
respective names of the subjects and predicates being matched, and
particularly isn't intended to indicate that they have to be the same. The
names vary independently as it were.

It's the same with my wildcarded triple ctor:

{ *, @anyPred, * }

This is going to generate a triple or triples containing a predicate named
anyPred. Other than that, there are no constraints on what the two end nodes
are, and in particular, they *don't* have to be the same node (but can be if
that's legally possible in RDF; I don't know).
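
A rough Python sketch of the point above, not XsRQL itself. The toy store, the
ex: names and the match() helper are assumptions made up for illustration; the
only thing it tries to show is that each wildcard slot is tested on its own, so
making two slots equal takes an explicit comparison.

```python
# Toy triple store (hypothetical data): (subject, predicate, object) strings.
TRIPLES = [
    ("rdf:type", "rdf:type", "rdf:Property"),
    ("ex:a", "ex:anyPred", "ex:b"),
    ("ex:c", "ex:anyPred", "ex:c"),
]

ANY = None  # plays the role of "*" and "@*"


def match(pattern, triples):
    """Keep triples whose non-wildcard slots equal the pattern's slots.

    Each ANY slot is checked on its own, so two wildcards never
    constrain each other.
    """
    return [
        t for t in triples
        if all(want is ANY or want == got for want, got in zip(pattern, t))
    ]


# { *, @anyPred, * }: the end nodes are unconstrained, equal or not.
any_pred = match((ANY, "ex:anyPred", ANY), TRIPLES)

# (?x ?x ?z) needs an explicit equality test on top of the wildcards.
same_subject_and_predicate = [
    t for t in match((ANY, ANY, ANY), TRIPLES) if t[0] == t[1]
]
```

Here any_pred keeps both ex:anyPred triples, including the one whose subject
and object happen to coincide, while only the rdf:type triple survives the
explicit subject = predicate test.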

> Based on the example with
> { $afghanistan, $afghanistan/@*, $afghanistan/@*/* }

> I think I need a named variable to make them the same:

Yes, these are named variables, so it does work somewhat differently. The
original example for the above was:

let $afghanistan = *[ @ciafb:Name = "Afghanistan" ]
return
    { $afghanistan, $afghanistan/@*, $afghanistan/@*/* }

Because the variable is being bound in a LET expression, it could be a whole
sequence of nodes, and the constructor internally will be doing something
like:

    for each subject <subject> in the sequence of bound $afghanistan nodes
       for each predicate <predicate> that's downstream of that node
           for each object <object> that's downstream of that predicate
               generate the triple { <subject>, <predicate>, <object> }
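
The same expansion as a small Python sketch. This is only an illustration under
assumptions: the toy data, the ex: URIs and the helper names subjects_where()
and construct() are invented here and are not part of XsRQL or of any real
implementation.

```python
# Hypothetical toy data; the ex: URIs and values are illustrative only.
TRIPLES = [
    ("ex:af", "ciafb:Name", "Afghanistan"),
    ("ex:af", "ciafb:Capital", "Kabul"),
    ("ex:fr", "ciafb:Name", "France"),
]


def subjects_where(pred, value, triples):
    """Subjects that have a (pred, value) arc: the role of *[ @pred = value ]."""
    return sorted({s for s, p, o in triples if p == pred and o == value})


def construct(bound_subjects, triples):
    """Expand a sequence of bound subjects into all of their triples."""
    out = []
    for subject in bound_subjects:
        # Predicates downstream of this subject.
        predicates = sorted({p for s, p, _ in triples if s == subject})
        for predicate in predicates:
            # Objects downstream of this subject/predicate pair.
            for s, p, o in triples:
                if s == subject and p == predicate:
                    out.append((subject, predicate, o))
    return out


# "let $afghanistan = ..." binds a whole sequence, here of length one.
afghanistan = subjects_where("ciafb:Name", "Afghanistan", TRIPLES)
result = construct(afghanistan, TRIPLES)
```

With this data, result holds the two ex:af triples and nothing for ex:fr.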

In fact, all that work the constructor is doing under the hood for a
LET-bound node sequence is something you can do explicitly using FOR:

   for $afghanistan in *[ @ciafb:Name = "Afghanistan" ]
        for $predicate in $afghanistan/@*
            for $object in $predicate/*
                return { $afghanistan, $predicate, $object }

and it looks like that's almost exactly what you're doing below ...

> Would I have to do (sorry for murdering the syntax):
> If the object is a fixed value:
>
> for $s in subject()
>   for $p in predicate()
>    where $s = $p
>    for $o in object() [someValue]
>      { $s , $p , $o }

I believe so, if I understand what you're after. The

    where $s = $p

filter is simply comparing the names of the two comparees. If they have the
same name, it doesn't matter whether they're both nodes or one's a node and
one's a predicate: the comparison-by-name is going to succeed.
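
A minimal Python sketch of comparison-by-name, assuming a made-up Item type
with a kind and a name (neither is XsRQL vocabulary). Only the name takes part
in the comparison:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    kind: str  # "node" or "predicate" (hypothetical tags)
    name: str  # the URI label


def same_name(a, b):
    """Compare two items by name only, ignoring what kind of thing they are."""
    return a.name == b.name


s = Item("node", "rdf:type")
p = Item("predicate", "rdf:type")
other = Item("predicate", "rdfs:range")
```

Here same_name(s, p) holds even though s and p differ in kind, and
same_name(s, other) does not.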

[I'll come back to it below, but there's a more efficient way of doing this
double-for comparison ...]

What gets generated iteratively by your triple constructor in this case is a
sequence of triples, all of which have the same label (is that the term? I
mean they're the same URI) for both subject and predicate, ie

    <I'mUri_1>  <I'mUri_1>  <ICouldBeAnythng>
    <I'mUri_1>  <I'mUri_1>  <I'mSomethingElse>
    <I'mUri_2>  <I'mUri_2>  <WhateverYouLike>
    <I'mUri_3>  <I'mUri_3>  "I'm getting bored"

> or for a variable object, and a different way to write the $p condition:
>
> for $s in subject()
>   for $p in */@$s
>    for $o in object()
>     { $p , $p , $o }       Not sure about $p in the subject slot.

The "@" notation isn't needed on variables (since they can hold
heterogeneous sequences of nodes and/or predicates and/or integers and/or
strings etc), so you'd say

    for $s in subject()
        for $p in */$s

but I don't think that's what you want (I think!). In fact, here's the most
efficient way of doing this double loop comparison:

    for $s in subject()
        for $p in $s/@*
        where $p = $s
        for $o in object()
        return
            ...

In the second for loop in this case, the $p is dereferenced off the $s node
that's already been isolated in the preceding step, which is much more
efficient than

        for $p in predicate()

which is looping through every predicate in the store over and over again
for every subject in the store.
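
To make the cost difference concrete, here is a small Python sketch that counts
comparisons for both loop shapes. Everything in it is an assumption for
illustration: the toy data, the function names naive() and dereferenced(), and
the per-subject index standing in for $s/@*. Note also that the two shapes are
not strictly equivalent: the second only sees predicates that hang off the same
subject, which is what (?x ?x ?z) asks for.

```python
from collections import defaultdict

# Hypothetical toy store of (subject, predicate, object) strings.
TRIPLES = [
    ("rdf:type", "rdf:type", "rdf:Property"),
    ("ex:a", "ex:p", "ex:b"),
    ("ex:a", "ex:q", "ex:c"),
    ("ex:b", "ex:p", "ex:c"),
]


def naive(triples):
    """for $s in subject(), for $p in predicate(): every pair is compared."""
    subjects = sorted({s for s, _, _ in triples})
    predicates = sorted({p for _, p, _ in triples})
    hits, comparisons = [], 0
    for s in subjects:
        for p in predicates:
            comparisons += 1
            if s == p:
                hits.append(s)
    return hits, comparisons


def dereferenced(triples):
    """for $s in subject(), for $p in $s/@*: only that subject's own arcs."""
    by_subject = defaultdict(set)
    for s, p, _ in triples:
        by_subject[s].add(p)
    hits, comparisons = [], 0
    for s in sorted(by_subject):
        for p in sorted(by_subject[s]):
            comparisons += 1
            if s == p:
                hits.append(s)
    return hits, comparisons
```

On this data both functions find rdf:type, but naive() makes 9 comparisons
(3 subjects times 3 predicates) while dereferenced() makes 4.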

> Another example query would be:
>
> (<x> ?p ?v)
> (?p rdfs:range ?w)
>
> ?p must be the same in both triple patterns in any one solution of the
> query.

I need to go grok that one offline for the moment. Let's continue after the
telecon ...

Howard
>
> 	Andy
>
Received on Tuesday, 29 June 2004 09:50:13 UTC