Re: PROV-ISSUE-89 (what-entity-attributes): How do we find the attributes of an entity? [Formal Model] from Stian Soiland-Reyes on 2011-09-09 (public-prov-wg@w3.org from September 2011)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Fri, 9 Sep 2011 14:32:04 +0100
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Cc: Satya Sahoo <satya.sahoo@case.edu>, public-prov-wg@w3.org
Message-ID: <CAPRnXtkK76WERthSQsXtSEf-XAvUCeM34PH+1-dhZLnM4V8kxw@mail.gmail.com>
On Thu, Sep 8, 2011 at 17:31, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:


> But how does it work in an open world context, when there may be other
> assertions in your triple
> store, e.g. e1 hasColor blue.
>
> But the color property is not one of the attributes  used in any of e1, e2,
> e3.

*Warning* - another long Stian email.


RDF-wise, I think we would need to express these attributes outside
the regular graph.


For instance with anonymous nodes and prov:wasCharacterisedBy :

:e1 a prov:Entity ;
  prov:wasCharacterisedBy [
     car:company "Toyota" ;
     car:model "Corolla" ;
     car:identification "1a"
  ] .

:e2 a prov:Entity ;
  prov:wasCharacterisedBy [
     car:company "Toyota" ;
     car:model "Corolla" ;
     car:identification "1a" ;
     car:owner "tom"
  ] .

:e3 a prov:Entity ;
  prov:wasCharacterisedBy [
     car:company "Toyota" ;
     car:model "Corolla" ;
     car:identification "1a" ;
     car:owner "luc"
  ] .

# Other statements, not part of prov:wasCharacterisedBy
:e1 car:hasColor "blue" .
:e2 car:hasColor "blue" .
:e3 car:hasColor "blue" .
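
A minimal sketch of how a consumer could read the statements above, in plain Python rather than against a real triple store. The tuple representation, the blank node labels (_:c1, _:c3) and the function name are assumptions made for illustration. It shows that car:hasColor stays out of the characterisation simply because it is not attached to the node that prov:wasCharacterisedBy points at.

```python
# Triples as (subject, predicate, object) tuples. The blank node labels
# "_:c1" and "_:c3" are hypothetical; Turtle's [ ... ] leaves them unnamed.
TRIPLES = {
    (":e1", "prov:wasCharacterisedBy", "_:c1"),
    ("_:c1", "car:company", "Toyota"),
    ("_:c1", "car:model", "Corolla"),
    ("_:c1", "car:identification", "1a"),
    (":e3", "prov:wasCharacterisedBy", "_:c3"),
    ("_:c3", "car:company", "Toyota"),
    ("_:c3", "car:model", "Corolla"),
    ("_:c3", "car:identification", "1a"),
    ("_:c3", "car:owner", "luc"),
    # Other statements, not part of prov:wasCharacterisedBy
    (":e1", "car:hasColor", "blue"),
    (":e3", "car:hasColor", "blue"),
}


def characterising_attributes(entity, triples=TRIPLES):
    """Return {predicate: object} for the nodes the entity wasCharacterisedBy."""
    nodes = {o for s, p, o in triples
             if s == entity and p == "prov:wasCharacterisedBy"}
    return {p: o for s, p, o in triples if s in nodes}
```

Here characterising_attributes(":e3") yields the four car: attributes inside the brackets, and car:hasColor is absent even though it is asserted about :e3 in the same store.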



If prov:wasCharacterisedBy were a subproperty of owl:sameAs, you could
find, with just a tiny bit of reasoning,
  :e3 car:owner "luc" .
However, this raises lots of nasty questions about which attributes are
part of the characterisation: with reasoning enabled you would just
find
  :e3 prov:wasCharacterisedBy :e3
and every other attribute of :e3.
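
To make the sameAs problem concrete, here is a rough sketch in plain Python. It assumes a deliberately naive reasoner that treats prov:wasCharacterisedBy like owl:sameAs and copies every statement between linked nodes until nothing new appears. The function name and the blank node label _:c3 are hypothetical, and a real OWL reasoner does considerably more than this.

```python
def naive_same_as_closure(triples,
                          same_props=("owl:sameAs", "prov:wasCharacterisedBy")):
    """Copy statements across sameAs-like links until a fixpoint is reached."""
    closed = set(triples)
    while True:
        # Collect linked pairs in both directions (sameAs is symmetric).
        links = {(s, o) for s, p, o in closed if p in same_props}
        links |= {(o, s) for s, o in links}
        inferred = set()
        for a, b in links:
            for s, p, o in closed:
                if s == a:
                    inferred.add((b, p, o))  # substitute in subject position
                if o == a:
                    inferred.add((s, p, b))  # substitute in object position
        if inferred <= closed:
            return closed
        closed |= inferred


asserted = {
    (":e3", "prov:wasCharacterisedBy", "_:c3"),
    ("_:c3", "car:owner", "luc"),
    (":e3", "car:hasColor", "blue"),
}
closed = naive_same_as_closure(asserted)
```

After the closure, :e3 does gain car:owner "luc", but the characterising node also gains car:hasColor "blue" and :e3 ends up characterised by itself, so the boundary between characterising and other attributes is lost.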

Probably more sensible is to make prov:wasCharacterisedBy a
subproperty of prov:wasComplementOf as that would be true anyway.
(right?)


But what if you are characterised by something with richer attributes?
The Abstract Provenance notation allows only flat attributes:

entity := entity ( identifier , [ attribute-values ] )
attribute-values := attribute-value | attribute-value , attribute-values
attribute-value := attribute : Literal

so:

  entity(e3, [ owner: [ name: "Luc", address: "Southampton" ] ])

would not be allowed. It says Literal, so not even URIs?

  entity(e3, [ owner: <http://id.ecs.soton.ac.uk/person/391> ])
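
A small sketch of what the flat restriction amounts to, assuming the strict reading in which only literals are accepted. The Uri wrapper, the choice of Python types standing in for Literal, and the function name are all assumptions for illustration and are not part of the notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uri:
    """Hypothetical wrapper so a URI can be told apart from a string literal."""
    value: str


# Assumption: these Python types stand in for "Literal" in the grammar.
LITERAL_TYPES = (str, int, float, bool)


def non_literal_attributes(attributes):
    """Return the names of attributes whose value is not a plain literal."""
    return [name for name, value in attributes.items()
            if not isinstance(value, LITERAL_TYPES)]
```

Under that reading both examples above are rejected: non_literal_attributes reports owner for the nested record and for the Uri value, while a plain owner: "luc" passes.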




In RDF this would seem like an arbitrary restriction; in theory you
could simply do:


:e3 a prov:Entity ;
  prov:wasCharacterisedBy [
     car:company "Toyota" ;
     car:model "Corolla" ;
     car:identification "1a" ;
     car:owner [
       foaf:name "Luc Moreau" ;
       foaf:based_near "Southampton"
     ]
  ] .

However this causes problems the moment you want to use URIs. If you
said simply
  car:owner <http://id.ecs.soton.ac.uk/person/391>
then we can't say anything more about
<http://id.ecs.soton.ac.uk/person/391> within this 'was characterised
by'.

If I did

  car:owner [
       owl:sameAs <http://id.ecs.soton.ac.uk/person/391> ;
       foaf:name "Luc Moreau" ;
       foaf:based_near "Southampton"
  ]

then that raises the sameAs issues as above.


Named graphs would allow you to declare multiple resources within the
characterisation. In TriG format:

{
  :e1 a prov:Entity ;
    prov:characterisedBy :e1Attrs ;
    car:hasColor "blue" .

  :e2 a prov:Entity ;
    prov:characterisedBy :e1Attrs, :e2Attrs ;
    car:hasColor "blue" .

  :e3 a prov:Entity ;
    prov:characterisedBy :e1Attrs, :e3Attrs ;
    car:hasColor "blue" .
}

:e1Attrs {
  :e1 car:company "Toyota" ;
      car:model "Corolla" ;
      car:identification "1a" .
}

:e2Attrs {
  :e1 car:owner "tom" .
}
:e3Attrs {
  :e1 car:owner <http://id.ecs.soton.ac.uk/person/391> .
  <http://id.ecs.soton.ac.uk/person/391> foaf:name "Luc Moreau" ;
       foaf:based_near "Southampton" .
}

One advantage of named graphs is that it would be fairly easy to
query across all graphs to find both characterised and 'other'
properties of the car. Also above I allowed multiple graphs in
prov:characterisedBy to reuse common attributes.
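
As a rough illustration of querying across graphs, here is a plain-Python sketch over quads (graph, subject, predicate, object). It copies only the :e3 part of the TriG above; using None for the default graph and the two function names are assumptions, and a real store would use SPARQL with GRAPH patterns instead.

```python
PERSON = "<http://id.ecs.soton.ac.uk/person/391>"
DEFAULT = None  # assumption: None marks the default (unnamed) graph

QUADS = {
    (DEFAULT, ":e3", "rdf:type", "prov:Entity"),
    (DEFAULT, ":e3", "prov:characterisedBy", ":e1Attrs"),
    (DEFAULT, ":e3", "prov:characterisedBy", ":e3Attrs"),
    (DEFAULT, ":e3", "car:hasColor", "blue"),
    (":e1Attrs", ":e1", "car:company", "Toyota"),
    (":e1Attrs", ":e1", "car:model", "Corolla"),
    (":e1Attrs", ":e1", "car:identification", "1a"),
    (":e3Attrs", ":e1", "car:owner", PERSON),
    (":e3Attrs", PERSON, "foaf:name", "Luc Moreau"),
    (":e3Attrs", PERSON, "foaf:based_near", "Southampton"),
}


def characterisation(entity, quads=QUADS):
    """All triples in the graphs the entity is prov:characterisedBy."""
    graphs = {o for g, s, p, o in quads
              if g is DEFAULT and s == entity and p == "prov:characterisedBy"}
    return {(s, p, o) for g, s, p, o in quads if g in graphs}


def statements_about(subject, quads=QUADS):
    """Every (graph, predicate, object) for a subject, across all graphs."""
    return {(g, p, o) for g, s, p, o in quads if s == subject}
```

Note that characterisation(":e3") also returns the foaf: statements about the person, since they sit in the :e3Attrs graph; everything placed in a characterisation graph becomes part of the characterisation.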


I'm not sure if I like this named graph approach, because it seems to
say that everything in the characterisation graph must be true to be
able to identify :e1, and it's easy to include too much information
within the graph which is just auxiliary information.


Above, the car is owned by <http://id.ecs.soton.ac.uk/person/391>
called "Luc Moreau", based in Southampton - so if elsewhere
<http://id.ecs.soton.ac.uk/person/391> had the name "Professor Luc
Moreau" and was based in Boston, then that person would not
necessarily be the owner of the car, even if he has the same URI.
Perhaps this is good? After all the asserter did include those details
inside the prov:characterisedBy graph instead of in the default graph,
so they must be important.


(As always) I've argued with myself here, and I think that although
nested descriptions would be nice, they open up a can of worms. We
should keep the attributes flat, and instead say that you should
describe nested attributes by introducing a new prov:Entity with its
own characterisation:


:e3 a prov:Entity ;
  prov:wasCharacterisedBy [
     car:company "Toyota" ;
     car:model "Corolla" ;
     car:identification "1a" ;
     car:owner :luc
  ] .

:luc a prov:Entity ;
  prov:wasCharacterisedBy [
     foaf:name "Luc Moreau" ;
     foaf:based_near "Southampton"
  ] .
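
A sketch of how a consumer could still recover the nested view from flat characterisations, in plain Python. The dictionary layout and the expand function are hypothetical; the only idea taken from the example above is that an attribute value may name another entity that has its own characterisation.

```python
# Flat characterisations keyed by entity, mirroring the Turtle above.
CHARACTERISATIONS = {
    ":e3": {
        "car:company": "Toyota",
        "car:model": "Corolla",
        "car:identification": "1a",
        "car:owner": ":luc",
    },
    ":luc": {
        "foaf:name": "Luc Moreau",
        "foaf:based_near": "Southampton",
    },
}


def expand(entity, seen=frozenset()):
    """Follow values that name another characterised entity, avoiding cycles."""
    seen = seen | {entity}
    expanded = {}
    for name, value in CHARACTERISATIONS[entity].items():
        if value in CHARACTERISATIONS and value not in seen:
            expanded[name] = expand(value, seen)
        else:
            expanded[name] = value
    return expanded
```

Each stored characterisation stays flat, and expand(":e3") rebuilds the nested owner record only on demand, by looking up :luc.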


Luc, note that I only used your foaf-stuff as examples here, I'm not
suggesting you stop owning the car if you move out of Southampton!
But this would be a good way to talk about narrow complements such as
"Luc as the professor teaching subject 101 in 2011 at Southampton
University".


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Friday, 9 September 2011 13:32:52 UTC