Re: Removing triples from a store?

Hi Ivan,

On Fri, Jun 4, 2010 at 2:56 PM, Ivan Herman <ivan@w3.org> wrote:
> On Jun 4, 2010, at 14:10 , Mark Birbeck wrote:
>
>> Hi Ivan,
>>
>> 'Triple adding' will also be important for programmers.
>>
>> A typical use-case is to parse a page for details of a Twitter
>> account, and then use those details to retrieve the latest tweets by a
>> person. You /could/ simply retrieve the tweets and display them --
>> essentially an 'old-fashioned' JavaScripty technique. But a better
>> approach would be to retrieve the tweets and then just drop them
>> straight into the store. Once in the store they can be queried for,
>> and manipulated in whatever way we want, including being displayed.
>>
>
> Which is in line with what I said about parsing. I.e., that adding a triple is important for parsers.
> I am not sure whether it is important to have the possibility to programmatically create a triple
> and then add that triple to a store.

The scenarios I'm describing are about programmers adding the triples
'programmatically', *not* the parser doing it.

In addition to the Twitter example I gave, consider geocoding (the
following example is from my draft but I hadn't realised it didn't
make it into the spec...I think it's worth putting in to illustrate
these points).

When geocoding, we take an address and ask a service (like Google's,
for example) to give us back the latitude and longitude. The
'standard' JavaScript would look something like this:

  var geocoder = new google.maps.Geocoder();

  function geocode( address, callback ) {
    geocoder.geocode( { 'address': address }, function(results, status) {
      if (status == google.maps.GeocoderStatus.OK) {
        // Call the programmer-provided function with the lat and long:
        //
        callback(results[0].geometry.location.lat(),
          results[0].geometry.location.lng());
      }
    });
  }

Note that the moment we get the data, we do something with it, by
calling the author-provided callback function.
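
For example, an author might call it along these lines (just a sketch --
the address string and the logging are placeholders):

  // Hypothetical usage of the function above: geocode a fixed address
  // and do something simple with the coordinates.
  geocode( "69/85 Tabernacle Street, London, EC2A 4RR", function (lat, lng) {
    console.log( "Latitude: " + lat + ", longitude: " + lng );
  } );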

It's pretty straightforward to see how we might use the RDFa API to
geocode addresses that were marked up with RDFa:

  // Select all addresses and create a property group for each one:
  //
  ar = document.data.query.select( {
    a: "http://rdf.data-vocabulary.org/#Address",
    "http://rdf.data-vocabulary.org/#street-address": "?street",
    "http://rdf.data-vocabulary.org/#locality": "?locality",
    "http://rdf.data-vocabulary.org/#region": "?region",
    "http://rdf.data-vocabulary.org/#postal-code": "?zip",
    "http://rdf.data-vocabulary.org/#country-name": "?country"
  } );

  // Iterate through the property groups:
  //
  for (i = 0; i < ar.length; i++) {
    // Get the property group:
    pg = ar[ i ];

    // Geocode the address and pass the result to the callback:
    geocode( pg.street + ", " + pg.locality + ", " + pg.region
      + ", " + pg.zip
      + ", " + pg.country,
      function (lat, long) {
        // Do something, like add to a map.
        //
      }
    );
  }

What we've done is use the RDFa API to find a bunch of addresses, but
we've then processed them just as we would in a normal JavaScript
application.

However, what I'm suggesting is that it is far more powerful to add
the lat/long to the store, and then work with /that/. So our geocode
function changes to this:

  // Set up a function to geocode an address and add the lat/long to a store
  function geocode( store, subj, address ) {
    geocoder.geocode( { 'address': address }, function(results, status) {
      if (status == google.maps.GeocoderStatus.OK) {
        store.add("default", subj, "a", "http://rdf.data-vocabulary.org/#Geo");
        store.add("default", subj, "http://rdf.data-vocabulary.org/#latitude",
          results[0].geometry.location.lat());
        store.add("default", subj, "http://rdf.data-vocabulary.org/#longitude",
          results[0].geometry.location.lng());
      }
    });
    return;
  }

Now our triple-store contains the addresses that appeared in the
original document, plus the lat/long for each successfully geocoded
address. The programmer can now write JavaScript applications that use
all of these triples in whatever way they want.
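
As a rough illustration of what that enables -- assuming the address node
itself is passed as subj, and that the triples were added to the same
store the query runs against -- a programmer could then pull the markup
data and the geocoded data back out together:

  // Sketch only: query the combined data -- locality from the original
  // markup, lat/long added programmatically -- in a single select.
  located = document.data.query.select( {
    "http://rdf.data-vocabulary.org/#locality": "?locality",
    "http://rdf.data-vocabulary.org/#latitude": "?lat",
    "http://rdf.data-vocabulary.org/#longitude": "?lng"
  } );

  for (i = 0; i < located.length; i++) {
    pg = located[ i ];
    // e.g. add a map marker at (pg.lat, pg.lng), labelled with pg.locality.
  }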

So, the add() method is most definitely of use to programmers.


> But I think this is a bit a side issue, because the fact of being used for a parser
> means that it is necessary to have an 'add' method, and that is fine. We do not have
> an issue there.

It's not an 'issue' as such -- I'm just trying to convey that adding
triples to the store directly will be an everyday thing, not just
something going on under the hood.


>> Separating the retrieval of additional data from actions performed on
>> that data makes for a very powerful programming model.
>>
>> Which brings us back to your point about /removing/ triples...
>>
>> My instinct tells me that there must be a scenario for deleting
>> triples that is comparable to the one I've just described for adding
>> them -- but I can't think of one at the moment. :)
>>
>
> That is exactly my feeling:-)

:)


>> The use-case itself would probably relate to removing data from the
>> store that is difficult to ignore when writing a query. However,
>> although I think we probably should support this feature, I have to
>> say that in all the applications I've developed using my RDFa library,
>> it's much more common to want to delete entire graphs.
>>
>
> Which we have; there is the possibility to clean an entire store.

I think you need to be able to manage multiple graphs per store.

Anyway, we can have that debate in its proper place -- I have plenty of
use-cases to bring when we get there. :)


>> For example, in the Twitter scenario I gave earlier, I actually insert
>> the tweets into separate named graphs within a store. This means that
>> if I then go back to retrieve tweets for the same person again I can
>> simply delete their 'tweet graph' (one per person), rather than having
>> to find the individual tweets that are now buried in one monolithic
>> graph. Once the graph has been deleted it's a simple case of creating
>> it again and then adding in the latest tweets.
>>
>> Obviously we haven't added graph support to the API yet, but my guess
>> is that any scenarios we think of for deleting triples will more than
>> likely be better served by providing named graph support so that we
>> can delete an entire collection in one go.
>>
>
> I am not sure what you mean by graph support to the API. In my mind, a store is
> a graph, the only reason we have not called it a graph is because we do not want
> to scare people away with unknown concepts. The application can have as many
> stores^H^H^H^H^H^Hgraphs as he/she wants, so I do not think we need anything
> more...

Well, first, I think you do our audience a disservice if you think
they won't be able to grasp the idea of a graph. :)

But anyway, the terminology is irrelevant; the main thing that we need
to establish is the 'unit of query'; i.e., what is the thing that I
will query against, and what are the boundaries for finding triples?

In my implementation I can query across many graphs in a store. So
after parsing a document, I have a store which contains all of the
triples provided by the publisher. But then I might do some geocoding
and retrieve tweets, which means I will end up with a bunch more
graphs, each containing a handful of triples. I can query the *store*
to get tweets and locations together, despite the fact that they are
in separate graphs. But by being in separate graphs I maintain their
distinct provenance, and I can control them individually.
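
Purely to illustrate what I have in mind (none of this is in the draft,
and the method names are invented), per-graph control might look
something like this:

  // Hypothetical sketch -- removeGraph() is invented for illustration.
  // Tweets for one person go into their own named graph...
  store.add( "tweets-markb", subj, "http://example.org/vocab#text", tweetText );

  // ...queries still run against the whole store, across all graphs...
  results = store.query.select( { "http://example.org/vocab#text": "?text" } );

  // ...but the graph can be dropped as a unit before re-fetching the tweets.
  store.removeGraph( "tweets-markb" );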

The problem with saying that everything I have just described could
just as easily apply to individual stores is that whilst graphs differ
by provenance, stores may differ by implementation. For example, I may
create a store that connects directly to HTML5 Local Storage, and I
might save some triples that can be used within different pages of the
same web-site.
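
Just as a rough sketch of that idea (the triples() and addAll() methods
here are invented for illustration), such a store might persist itself
like this:

  // Rough sketch only: save a store's triples to HTML5 Local Storage so
  // that other pages on the same site can load them back in.
  function saveStore( store ) {
    localStorage.setItem( "site-triples", JSON.stringify( store.triples() ) );
  }

  function loadStore( store ) {
    var data = localStorage.getItem( "site-triples" );
    if (data) {
      store.addAll( JSON.parse( data ) );
    }
  }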

But anyway...as I said, that's a discussion we're still to have, so we
should either hold off on it for a little while, or move it to a
separate thread and start talking about use-cases.

In the meantime, I don't think we disagree that we want to be able to
remove triples; the question will be whether we remove triples
individually, remove them via a query, or allow both.
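
Just so we have something concrete to look at, the two options might end
up looking roughly like this (neither signature exists in the draft):

  // Hypothetical only -- neither form is in the current draft.

  // Option 1: remove a single, fully-specified triple.
  store.remove( "default", subj,
    "http://rdf.data-vocabulary.org/#latitude", lat );

  // Option 2: remove everything matching a query-style pattern, e.g.
  // every latitude triple for this subject, whatever its value.
  store.remove( { subject: subj,
    property: "http://rdf.data-vocabulary.org/#latitude" } );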

Regards,

Mark

--
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)

Received on Friday, 4 June 2010 15:20:53 UTC