Re: [] or <> as root ? from Henry Story on 2006-03-13 (semantic-web@w3.org from March 2006)

From: Henry Story <henry.story@bblfish.net>
Date: Mon, 13 Mar 2006 12:29:34 +0100
To: atom-owl@googlegroups.com, Semantic Web <semantic-web@w3.org>
Message-Id: <E43789AC-82C3-46E8-A055-941DCDAD7037@bblfish.net>
Here is a little puzzle regarding Atom on which it would be interesting
to have some feedback from the larger Semantic Web community. We are
wondering whether there are best-practice guidelines for updating
Semantic Web data found on the web. We have an ontology for the Atom
(RFC 4287) spec called AtomOwl [1], which would allow us to GRDDL Atom
documents into graphs.

This thread started off with the question of whether one should map

  <entry>
        <title>Atom-Powered Robots Run Amok</title>
        <link href="http://example.org/2003/12/13/entry"/>
        <id>tag:example.com,2003/blog/entry1</id>
        <updated>2003-12-13T18:30:02Z</updated>
        <summary>Some text.</summary>
</entry>

to

           <> a :Entry;
              :title [ :value "Atom-Powered Robots Run Amok";
                       :type "text/plain" ];
              iana:alternate <http://example.org/blog/entry.html>;
              :id <tag:example.com,2003/blog/entry1>;
              :updated "2003-12-13T18:30:02Z"^^xsd:dateTime;
              :summary [  :value "some text";
                          :type "text/plain" ] .

or to

           [] a :Entry;
              :title [ :value "Atom-Powered Robots Run Amok";
                       :type "text/plain" ];
              iana:alternate <http://example.org/blog/entry.html>;
              :id <tag:example.com,2003/blog/entry1>;
              :updated "2003-12-13T18:30:02Z"^^xsd:dateTime;
              :summary [  :value "some text";
                          :type "text/plain" ] .
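To make the choice concrete, here is a minimal Python sketch of the GRDDL step. It uses only the standard library, and it is not the actual AtomOwl transform. The function name `entry_to_triples` and the string encoding of terms are hypothetical. The mapping is simplified: every text construct is assumed to be `text/plain`, `link` is mapped directly to `iana:alternate` using its `href`, and the sample declares the Atom namespace, which the snippet above omits. If you pass a document URI, the entry gets the `<>` form. If you pass nothing, a blank node is minted, giving the `[]` form.

```python
import itertools
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
_fresh = itertools.count(1)  # counter for minting blank-node labels


def entry_to_triples(xml_text, doc_uri=None):
    """Map an Atom entry to (subject, predicate, object) string tuples.

    Hypothetical, simplified mapping. With doc_uri the entry is named by the
    document it came from (the <> form); without it a fresh blank node is used
    (the [] form).
    """
    entry = ET.fromstring(xml_text)
    subj = f"<{doc_uri}>" if doc_uri else f"_:e{next(_fresh)}"
    triples = [(subj, "a", ":Entry")]
    for child in entry:
        tag = child.tag.replace(ATOM, "")
        if tag in ("title", "summary"):
            # Text construct -> [ :value "..."; :type "text/plain" ] (type assumed)
            node = f"_:t{next(_fresh)}"
            triples += [(subj, f":{tag}", node),
                        (node, ":value", child.text),
                        (node, ":type", "text/plain")]
        elif tag == "link":
            triples.append((subj, "iana:alternate", f"<{child.get('href')}>"))
        elif tag == "id":
            triples.append((subj, ":id", f"<{child.text}>"))
        elif tag == "updated":
            triples.append((subj, ":updated", f'"{child.text}"^^xsd:dateTime'))
    return triples


ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Atom-Powered Robots Run Amok</title>
  <link href="http://example.org/2003/12/13/entry"/>
  <id>tag:example.com,2003/blog/entry1</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <summary>Some text.</summary>
</entry>"""

named = entry_to_triples(ENTRY, "http://example.org/coll/entry1.atom")  # <> form
anonymous = entry_to_triples(ENTRY)                                     # [] form
```

With the `[]` form, running the transform twice on the same document mints two different blank nodes. Nothing in the output itself says that they describe the same entry.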



On 12 Mar 2006, at 21:03, Reto Bachmann-Gmür wrote:
> Aren't id and updated together a cifp, so that in your examples we are
> unambiguously talking about the same resource whether it is named  
> or not?

Yes. Though David Powell had some good arguments against using the CIFP

   @prefix cifp: <http://eulersharp.sourceforge.net/2004/04test/rogier#>.
   [] cifp:productProperty ( :updated :id );
          a owl:InverseFunctionalProperty .

in some earlier emails to the atom-owl list, in a thread I
(misleadingly) entitled "Feed or Document" [2]. But perhaps the
following reasoning can help resolve that issue...
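In operational terms, the CIFP says that any two entries sharing both an `:updated` value and an `:id` value are the same resource. Below is a rough Python sketch of that grouping. The function name and the tuple encoding of triples are hypothetical. It also assumes that each property has a single value per subject.

```python
from collections import defaultdict


def cifp_key_groups(triples, props=(":updated", ":id")):
    """Group subjects that share a value for every property in `props`.

    Under a CIFP over (:updated :id), each group with more than one member
    names a single resource, so its subjects could be smushed into one node.
    Assumes each property has at most one value per subject.
    """
    values = defaultdict(dict)  # subject -> {property: value}
    for s, p, o in triples:
        if p in props:
            values[s][p] = o
    groups = defaultdict(set)   # tuple of property values -> subjects
    for s, vals in values.items():
        if all(p in vals for p in props):  # the CIFP only applies when every property is present
            groups[tuple(vals[p] for p in props)].add(s)
    return {key: subs for key, subs in groups.items() if len(subs) > 1}


UPDATED = '"2003-12-13T18:30:02Z"^^xsd:dateTime'
ENTRY_ID = "<tag:example.com,2003/blog/entry1>"
triples = [
    ("_:e1", ":id", ENTRY_ID), ("_:e1", ":updated", UPDATED),  # first GRDDL run
    ("_:e2", ":id", ENTRY_ID), ("_:e2", ":updated", UPDATED),  # second GRDDL run
]
groups = cifp_key_groups(triples)  # one group: {"_:e1", "_:e2"}
```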

> I do, however, agree on the fundamental question of whether atom-owl
> should be able to describe things over time or just at a specific
> moment in time. If the second design goal is chosen, then aggregators
> may rely on some more generic graph versioning systems, of which, as
> you mention, a possible implementation would be quad-based.

Clearly AtomOwl has to be able to describe entries (as identified by
their id) evolving over time, since there can be more than one entry
with the same id in a feed. This is a great feature of Atom, as it
allows the description of the history of certain types of resources
over time. But in the discussion with David Powell it came up that
people may want to update an entry without modifying the time stamp.
So perhaps the publisher will decide that

  <entry>
        <title>Atom-Powered Robots Run Amok</title>
        <link href="http://example.org/2003/12/13/entry"/>
        <id>tag:example.com,2003/blog/entry1</id>
        <updated>2003-12-13T18:30:02Z</updated>
        <summary>Some text.</summary>
</entry>

which was published by POSTing it to <http://example.org/coll/> as
specified by the Atom Publishing Protocol [3], resulting in an entry
being placed at <http://example.org/coll/entry1.atom>, needs a change
that he considers insignificant. So he will PUT the following XML at
that location:

<entry>
        <title>Atom-Powered Robots Run Amok in France</title>
        <link href="http://example.org/2003/12/13/entry"/>
        <id>tag:example.com,2003/blog/entry1</id>
        <updated>2003-12-13T18:30:02Z</updated>
        <summary>Some text.</summary>
</entry>

Let us assume that this is acceptable behavior.
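For concreteness, the PUT could be built with Python's standard library roughly as follows. This is a sketch: the `application/atom+xml` media type is an assumption about what the server expects, and the request is only constructed, not sent, because example.org is a placeholder. The helper name `build_put` is hypothetical.

```python
import urllib.request

UPDATED_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Atom-Powered Robots Run Amok in France</title>
  <link href="http://example.org/2003/12/13/entry"/>
  <id>tag:example.com,2003/blog/entry1</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <summary>Some text.</summary>
</entry>"""


def build_put(member_uri, entry_xml):
    """Build (but do not send) a PUT that overwrites a member entry."""
    return urllib.request.Request(
        member_uri,
        data=entry_xml.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/atom+xml"},  # assumed media type
    )


req = build_put("http://example.org/coll/entry1.atom", UPDATED_ENTRY)
# urllib.request.urlopen(req) would send it; note that <updated> is unchanged.
```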

After that PUT operation, the feed representing the collection will
be updated too. There will of course be only one entry with this id
and the 2003-12-13T18:30:02Z time stamp, as required by the spec. This
entry will have the new title "Atom-Powered Robots Run Amok in France".
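RFC 4287 (section 4.1.1) requires that entries in a feed that share an `atom:id` have different `atom:updated` values. Below is a small standard-library check of that constraint, as a sketch. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"atom": "http://www.w3.org/2005/Atom"}


def duplicate_id_updated(feed_xml):
    """Return (id, updated) pairs that occur on more than one entry in a feed."""
    feed = ET.fromstring(feed_xml)
    pairs = Counter(
        (e.findtext("atom:id", namespaces=NS), e.findtext("atom:updated", namespaces=NS))
        for e in feed.findall("atom:entry", NS)
    )
    return [pair for pair, n in pairs.items() if n > 1]


FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Atom-Powered Robots Run Amok in France</title>
    <id>tag:example.com,2003/blog/entry1</id>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
</feed>"""

print(duplicate_id_updated(FEED))  # → []
```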

An AtomOwl-based GRDDL tool that refetched the entry1.atom document
would create a new set of triples. If we were simply to add these to
our triple store (an approach the [] notation favors), we would end
up with two anonymous entries in our triple store with the same time
stamp. With the CIFP rule we would end up with a contradiction. We
could of course, as suggested by David Powell, add an extra
"fetched-at" relation on each blank-node entry (and remove the CIFP
rule) and then base our idea of the actual state of the feed on that
relation.
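The two options can be contrasted in a short Python sketch. It assumes that the contradiction shows up as two different titles on what the CIFP declares to be one entry, read as a single-valued property. The `EntrySnapshot` record, its fields and the fetch times are hypothetical. `conflicting_keys` flags the clash under the CIFP reading. `current_state` drops the CIFP and instead keeps the most recently fetched snapshot per (id, updated), which is the "fetched-at" approach.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EntrySnapshot:
    """One anonymous entry produced by one GRDDL run (hypothetical structure)."""
    entry_id: str
    updated: str
    title: str
    fetched_at: datetime  # the extra "fetched-at" relation


def conflicting_keys(snapshots):
    """(id, updated) pairs whose snapshots disagree on the title (CIFP reading)."""
    titles = {}
    for s in snapshots:
        titles.setdefault((s.entry_id, s.updated), set()).add(s.title)
    return {key for key, seen in titles.items() if len(seen) > 1}


def current_state(snapshots):
    """Without the CIFP: keep the latest-fetched snapshot per (id, updated)."""
    latest = {}
    for s in snapshots:
        key = (s.entry_id, s.updated)
        if key not in latest or s.fetched_at > latest[key].fetched_at:
            latest[key] = s
    return latest


KEY = ("tag:example.com,2003/blog/entry1", "2003-12-13T18:30:02Z")
old = EntrySnapshot(*KEY, "Atom-Powered Robots Run Amok", datetime(2006, 3, 1))
new = EntrySnapshot(*KEY, "Atom-Powered Robots Run Amok in France", datetime(2006, 3, 2))
```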

But from what I understand of the way Tim Berners-Lee is thinking
about the Semantic Web, the correct thing to do might in fact be to
remove the triples generated by the initial GET from your working
graph (you can always relegate them to an archive graph, of course),
and replace them with the new triples.

So you would replace graph G1

<http://example.org/coll/entry1.atom> a :Entry;
              :title [ :value "Atom-Powered Robots Run Amok";
                       :type "text/plain" ];
              iana:alternate <http://example.org/blog/entry.html>;
              :id <tag:example.com,2003/blog/entry1>;
              :updated "2003-12-13T18:30:02Z"^^xsd:dateTime;
              :summary [  :value "some text";
                          :type "text/plain" ] .

with graph G2

<http://example.org/coll/entry1.atom> a :Entry;
              :title [ :value "Atom-Powered Robots Run Amok in France";
                       :type "text/plain" ];
              iana:alternate <http://example.org/2003/12/13/entry.html>;
              :id <tag:example.com,2003/blog/entry1>;
              :updated "2003-12-13T18:30:02Z"^^xsd:dateTime;
              :summary [  :value "some text";
                          :type "text/plain" ] .

One should of course write ontologies that are monotonic, but we also
have to allow people to fix errors they make when publishing
statements to the Semantic Web, and a PUT overwriting a document does
just that. So replacing the old graph makes sense.
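The replace-and-archive policy can be sketched as a store that keeps one working graph per source document. The class below is hypothetical and treats a graph as a plain set of triples. The G1/G2 triples in the usage are cut down to the title alone for brevity.

```python
class SourceGraphStore:
    """Hypothetical store holding one working graph per fetched document."""

    def __init__(self):
        self.working = {}   # source URI -> frozenset of triples from the latest GET
        self.archive = []   # (source URI, triples) pairs that were superseded

    def load(self, source, triples):
        """Replace the graph from `source`, archiving the previous one if any."""
        previous = self.working.get(source)
        if previous is not None:
            self.archive.append((source, previous))
        self.working[source] = frozenset(triples)

    def merged(self):
        """Union of all current working graphs: what queries should see."""
        return frozenset().union(*self.working.values())


DOC = "http://example.org/coll/entry1.atom"
store = SourceGraphStore()
store.load(DOC, {(f"<{DOC}>", ":title", "Atom-Powered Robots Run Amok")})            # G1, simplified
store.load(DOC, {(f"<{DOC}>", ":title", "Atom-Powered Robots Run Amok in France")})  # G2, simplified
```

Because the graph is keyed by source document, a later GET of the same document replaces the earlier triples, and the superseded triples are moved to the archive rather than discarded.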

Now this leaves us with the problem of asynchronous graph updates. A
client may, for example, update the graph at
<http://example.org/coll/entry1.atom>, resulting in graph G2, without
yet having had time to update the feed at <http://example.org/coll/>,
which (after the GRDDL transform) contains graph G3:

           [] a :Entry;
              :title [ :value "Atom-Powered Robots Run Amok";
                       :type "text/plain" ];
              iana:alternate <http://example.org/2003/12/13/atom03.html>;
              :id <tag:example.com,2003/blog/entry1>;
              :updated "2003-12-13T18:30:02Z"^^xsd:dateTime;
              :summary [  :value "some text";
                          :type "text/plain" ] .


which is compatible with G1 but not with G2 (given our CIFP). So what
should an aggregator do?
   - a client using APP would presumably know that it will need to
     update the feed too, so it could refetch the feed and replace its
     graph. That's feasible.
   - an aggregator that was not involved in the process, and so did
     not know about the PUT operation that had just happened, could
     notice the contradiction and try to resolve it by refetching the
     feed, having noticed that its copy of the feed was older than its
     copy of the entry.
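The second strategy might look something like this: when the feed-derived and entry-derived graphs disagree about a shared entry, refetch whichever source was fetched longer ago. The `Source` record, its `fetched_at` bookkeeping and the policy itself are assumptions made for illustration. HTTP validators such as `Last-Modified` or `ETag` could serve the same purpose.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Source:
    """Hypothetical record of one fetched document and what it said."""
    uri: str
    fetched_at: datetime  # when our copy was last retrieved
    entry_titles: dict    # (id, updated) -> title seen in this source


def sources_to_refetch(feed, entry_doc):
    """If the two copies disagree on a shared entry, return the staler source's URI."""
    shared = feed.entry_titles.keys() & entry_doc.entry_titles.keys()
    disagree = any(feed.entry_titles[k] != entry_doc.entry_titles[k] for k in shared)
    if not disagree:
        return []
    stale = min((feed, entry_doc), key=lambda s: s.fetched_at)
    return [stale.uri]


KEY = ("tag:example.com,2003/blog/entry1", "2003-12-13T18:30:02Z")
feed = Source("http://example.org/coll/", datetime(2006, 3, 13, 10, 0),
              {KEY: "Atom-Powered Robots Run Amok"})
entry = Source("http://example.org/coll/entry1.atom", datetime(2006, 3, 13, 11, 0),
               {KEY: "Atom-Powered Robots Run Amok in France"})
to_refetch = sources_to_refetch(feed, entry)  # ["http://example.org/coll/"]
```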


Does anyone have experience dealing with updates across RDF
documents on the web, and with resolving the contradictions they can
cause?

    Henry Story



> reto
>
> Henry Story wrote:
>> I have been reading the "Reaching out onto the Web" document at
>> <http://www.w3.org/2000/10/swap/doc/Reach> a little and am trying to
>> see how keeping this in mind would affect the ontology.
[snip]


[1] http://bblfish.net/work/atom-owl/2005-10-23/
[2] http://groups.google.com/group/atom-owl/browse_frm/thread/357e36c4ee9cd31b
[3] http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-08.html
Received on Monday, 13 March 2006 11:29:52 UTC