Re: publicly available RDF* datasets

A change of perspective...

> On 2. Sep 2020, at 23:18, Patrick J Hayes <phayes@ihmc.us> wrote:
> 
> Hi Thomas
> 
> So, you are trying to represent n-ary relationships in RDF*. How to encode n-ary relationships in what is essentially a binary language (RDF properties) is an issue that goes back at least to the 1960s and arguably to Peirce’s work around 1885, so there are plenty of ideas out there. If, as here, we want the n in ‘n-ary’ to be extendable, there is really only one good way to do it, which is to reify the ‘fact’ of the relation’s holding and connect it to its relational arguments (in this case, JohnDoe, Bar, 1996, 2002 and the rank number) by suitable binary relations, RDF triples in this case. In plain RDF that would look something like 
> 
> _:x rdf:type PresidencyOf .
> _:x subject JohnDoe .
> _:x object Bar .
> _:x startDate 1996 .
> _:x endDate 2002 .
> _:x rankOrdinal 2 .
> 
> And this _:x thing is variously called a situation, a state of affairs (SOA), an event, a history, an occurrence, a circumstance, a happening and probably a lot of other things depending on which background literature you find the idea described in. In defence intelligence work it is often called the 5-W, because information is organized around five basic questions: What happened, Who was involved, Why it happened, Where it happened and When it happened. 
> 
> The trouble with this is not that it doesn't work (it's about the only way that does work) but that it requires simple binary facts to also be written in this peculiar way, so instead of the single triple 
> 
> JohnDoe presidentOf Bar .
> 
> we would have to write the first three triples from the larger set. And nobody wants to have to do this for every simple fact.  
> 
> So, this is where RDF* might be useful, since we can keep the basic triple form for a simple unadorned fact and put the rest of it into an annotation. But we still need the SOA as a first-class entity so all the other parts can be hung on it. You have already described this, though I would change annotatedBy to something like hasCircumstance or trueInSOA: 
> 
>> <JohnDoe presOf Bar> trueInSOA _:x .
>> _:x rdf:type StateOfAffairs .
>> _:x startDate 1996 .
>> _:x endDate 2002 .
>> _:x rank 2 .
>> 
>> <JohnDoe presOf Bar> trueInSOA _:y .
>> _:y rdf:type StateOfAffairs .
>> _:y startDate 2008 .
>> _:y endDate 2012 .
>> _:y rank 4 .
>> _:y replaces AliceDoe .
> 
> And I think this now works reasonably well. (You can routinely omit the rdf:type triples in practice because an RDFS or OWL ontology for trueInSOA would restrict its range to the appropriate class.)
> 
> The reason for the name change is that this _:x thing isn't an annotation, but a real thing in the world being described. Although this uses the same machinery, this is not adding metadata to data, it is extending the data in a new way. So calling it an ‘annotation’ is misleading. 

One could also understand the whole n-ary relation as a detailed description of something (citing from above):

> _:x rdf:type PresidencyOf .
> _:x subject JohnDoe .
> _:x object Bar .
> _:x startDate 1996 .
> _:x endDate 2002 .
> _:x rankOrdinal 2 .

whereas the basic triple 

>> JohnDoe presOf Bar

is just an abbreviated form derived from that complete description,
and (again citing from above)

>> <JohnDoe presOf Bar> trueInSOA _:x .


is a kind of back-tracing mechanism that connects the basic fact with the more detailed description from which it was derived. 

The basic, core fact facilitates navigation and orientation in large, unknown dataspaces. The full description that it is derived from provides more detail on demand. Of course one wouldn’t want to derive all possible triple permutations from some complex n-ary relation, but for the essential core facts it would make sense. The relationship between the derived fact and the full situation would not be that of an annotation.
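
As a rough sketch of that idea (Python, plain tuples rather than a real RDF* library; the "predicate" key, the dict layout and the function names are my own assumptions for illustration, not anything taken from the proposals above): each full description yields one core triple plus one trueInSOA link back to it, and the details are fetched only when asked for.

```python
# A triple is a (subject, predicate, object) tuple; a quoted triple used
# as a subject is simply a nested tuple. All names here are hypothetical.

def derive_core_fact(soa_id, description):
    """Return the abbreviated triple and its back-link to the full description."""
    core = (description["subject"], description["predicate"], description["object"])
    link = (core, "trueInSOA", soa_id)
    return core, link

def details_for(core, links, descriptions):
    """Follow every trueInSOA link from a core fact to its full descriptions."""
    return [descriptions[soa] for (s, p, soa) in links
            if s == core and p == "trueInSOA"]

# Two states of affairs that share the same core fact.
descriptions = {
    "_:x": {"subject": "JohnDoe", "predicate": "presOf", "object": "Bar",
            "startDate": 1996, "endDate": 2002, "rank": 2},
    "_:y": {"subject": "JohnDoe", "predicate": "presOf", "object": "Bar",
            "startDate": 2008, "endDate": 2012, "rank": 4},
}

core_facts = set()  # the navigable, abbreviated layer
links = []          # the back-tracing layer
for soa_id, description in descriptions.items():
    core, link = derive_core_fact(soa_id, description)
    core_facts.add(core)
    links.append(link)

# Detail on demand: one (start, end, rank) tuple per state of affairs.
periods = [(d["startDate"], d["endDate"], d["rank"])
           for d in details_for(("JohnDoe", "presOf", "Bar"), links, descriptions)]
```

The core fact appears only once in core_facts even though two descriptions produce it, while the two links keep the periods and ranks grouped per state of affairs.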


Thomas (L)



> You say: 
>> 
>> If we create multiple blank nodes per separate fact, I believe we
>> would come back to the initial problem.
>> E.g with this:
>> 
>> <JohnDoe presOf Bar> during _:x .
>> _:x rdf:type TimePeriod .
>> _:x startDate 1996 .
>> _:x endDate 2002 .
>> <JohnDoe presOf Bar> rank 2 .
> 
> But why is the subject of this last triple not _:x? It should be. By mentioning the <JohnDoe presOf Bar> twice, you have opened up the possibility that a different state of affairs is being talked about. 
> 
>> <JohnDoe presOf Bar> during _:y .
>> _:y rdf:type TimePeriod .
>> _:y startDate 2008 .
>> _:y endDate 2012 .
>> <JohnDoe presOf Bar> rank 4 .
>> 
>> I don't see a way for a query to get the proper (position, time
>> period, rank) tuples for JohnDoe.
> 
> Not unless the data is properly organized, no. But then that is a general issue with data. But at least we now have the possibility of doing it right and avoiding confusion. 
> 
> Pat
> 
>> 
>> There are indeed much better ways to encode these facts but my aim was
>> to find a proper encoding of the existing Wikidata ontology in RDF*,
>> not building another ontology based on Wikidata content.
>> 
>> Do you see another way?
>> 
>> Best,
>> 
>> Thomas
>> 
>> 
>> 
>> On Wed, 2 Sep 2020 at 08:07, Patrick J Hayes <phayes@ihmc.us> wrote:
>>> 
>>> 
>>> 
>>>> On Sep 1, 2020, at 8:41 AM, Thomas Pellissier Tanon <thomas@pellissier-tanon.fr> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>>> I don't know if anyone has attempted it yet, but an RDFStar version of Wikidata could be very interesting. There are a lot of per-factoid annotations.
>>>> 
>>>> I had a look at it while building YAGO 4. There are two challenges with Wikidata mapping to RDF*:
>>>> 
>>>> 1. Different statements could have the same "main triple". For example, Wikidata could have a first statement stating that JohnDoe has been the president of Bar between 1996 and 2002 and another statement stating he has been the president of Bar between 2008 and 2012. A simple RDF* encoding would lead to:
>>>> <JohnDoe presidentOf Bar> startDate 1996 .
>>>> <JohnDoe presidentOf Bar> endDate 2002 .
>>>> <JohnDoe presidentOf Bar> startDate 2008 .
>>>> <JohnDoe presidentOf Bar> endDate 2012 .
>>>> This encoding might lead query and reasoning systems to assume that JohnDoe has been the president of Bar between 1996 and 2012, which is wrong.
>>> 
>>> But that would be a really bad encoding which should never have been considered in the first place. At this point one needs just a little experience with ontology design. This would work:
>>> 
>>> <JohnDoe presOf Bar> during _:x .
>>> _:x rdf:type TimePeriod .
>>> _:x startDate 1996 .
>>> _:x endDate 2002 .
>>> 
>>> or of course a skolemization of it to avoid the bnode. Better still would be a typed literal using a datatype of time periods, if only we had such a thing.
>>> 
>>> Pat Hayes
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> 2. Wikidata contains "deprecated" statements that should not be asserted as facts. For example, we could have in an RDF*-like syntax:
>>>> <JohnDoe presidentOf Bar> prov:wasDerivedFrom Source ; wikibase:rank wikibase:Deprecated .
>>>> In this case the fact "JohnDoe presidentOf Bar" should not be asserted by itself.
>>>> So, an RDF*-Wikidata would only be valid in "SA" mode and not in "PG" mode.
>>>> 
>>>> Thomas
>>>> 
>>>> 
>>>> On Tue, 1 Sep 2020 at 10:54, Dan Brickley <danbri@google.com> wrote:
>>>>> On Tue, 1 Sep 2020 at 08:53, Jeen Broekstra <jb@metaphacts.com> wrote:
>>>>>> Hi folks,
>>>>>> Does anyone have any pointers to publicly available datasets that make use of RDF*?
>>>>>> I am aware that Yago 4 makes some limited use of RDF* annotations, but I was curious if there are any other good examples that people use for testing, demonstration, or even production use.
>>>>> I don't know if anyone has attempted it yet, but an RDFStar version of Wikidata could be very interesting. There are a lot of per-factoid annotations.
>>>>> https://www.mediawiki.org/wiki/Wikibase/DataModel
>>>>>> Regards,
>>>>>> Jeen
>>>>>> --
>>>>>> Dr Jeen Broekstra (he, him)
>>>>>> principal software engineer
>>>>>> jb@metaphacts.com
>>>>>> www.metaphacts.com
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
> 
> 

Received on Thursday, 3 September 2020 16:56:45 UTC