Re: RDF* to RDF mapping (Re: First version of a test suite for RDF*) from Pierre-Antoine Champin on 2020-11-25 (public-rdf-star@w3.org from November 2020)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Wed, 25 Nov 2020 13:27:19 +0100
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, public-rdf-star@w3.org
Message-ID: <54b54f3e-f4b4-f336-eeb9-d67270fca3fe@ercim.eu>
On 24/11/2020 17:45, Peter F. Patel-Schneider wrote:
> Any encoding will have to transform RDF terms so that there are no collisions
> between them and the terms used in the encoding of embedded triples.
> Otherwise you end up with entailments like:
>
> << :a :b :c >> :d :e .
>
> entails
>
> _:x rdf*:subject* ":a"^^rdf*:term .
>
> (Yes, this is a bit sloppy.)
>
>
> If you are willing to expose the details of the encoding in this way,

Yes, that's how I envisioned a "syntactic sugar" solution, i.e. Turtle* 
as a compact notation for something that could, otherwise, be written in 
plain Turtle.
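
To make the "syntactic sugar" reading concrete, here is a minimal sketch of the short version of the -* mapping quoted further down. It is an illustration under assumptions, not Peter's definition: terms are modeled as plain Python strings, embedded triples as nested tuples, `RDF_STAR` is a placeholder namespace, `M` is one arbitrary injective encoding, malformed literals are ignored, and the fresh `_:t…` labels are assumed not to occur in the input.

```python
from itertools import count

RDF_STAR = "https://example.org/rdf-star#"  # placeholder, not an official IRI


def M(term):
    """Opaque string encoding of an IRI or literal (assumed form: "X" prefix)."""
    return "X" + term


def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")


def unstar(graph):
    """Replace embedded triples by fresh blank nodes plus describing triples."""
    fresh = count()
    labels = {}  # plays the role of L: embedded triple -> blank node
    out = set()

    def encode(term):
        if not isinstance(term, tuple):  # ordinary term, nothing to do
            return term
        s, p, o = (encode(t) for t in term)  # innermost embedded triples first
        key = (s, p, o)
        if key not in labels:
            node = labels[key] = f"_:t{next(fresh)}"
            out.add((node, RDF_STAR + "type", RDF_STAR + "Statement"))
            for name, value in (("subject", s), ("predicate", p), ("object", o)):
                out.add((node, RDF_STAR + name, value))
                if not is_bnode(value):  # starred triple only for non-bnodes
                    out.add((node, RDF_STAR + name + "*", M(value)))
        return labels[key]

    for triple in graph:
        out.add(tuple(encode(t) for t in triple))
    return out
```

Applied to `<< :a :b :c >> :d :e`, the result contains the encoding triples in the open, which is exactly the kind of exposed detail discussed above.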

>   or
> exclude certain RDF graphs from RDF*, then it is not necessary to transform
> RDF terms.
>
>
> And, of course, this is simply a specification of entailment in RDF*.
> Implementations are free to do something else that doesn't involve
> transforming RDF terms.
>
>
>
>
> peter
>
>
>
> On 11/24/20 11:22 AM, Pierre-Antoine Champin wrote:
>>
>> On 24/11/2020 12:40, Peter F. Patel-Schneider wrote:
>>> Consider this as a counter-example to your claim in [1] that
>>>
>>> 3. Otherwise, IMO, we need to somehow extend RDF semantics.
>> Yes, I understood this was your intent :)
>>> But, yes, the two approaches do share the idea of encoding subjects,
>>> predicates, and objects to achieve referential opacity for IRIs and
>>> literals.   My approach handles blank nodes in embedded triples, which is
>>> where the bulk of the difficulty lies.
>> Indeed.
>>
>> Now, as I mentioned before, the reason why I came to the conclusion that a
>> semantic extension was required is that I wanted to enforce /in the
>> semantics/ the consistency between the properties pointing to referentially
>> opaque terms, and the corresponding properties pointing to their denotation.
>>
>> I think that your approach can do without this semantic extension because
>> you prevent anyone (but your -* mapping) from using the rdf*: namespace. So
>> those constraints are enforced structurally by the -* mapping, rather than by
>> the semantics. Would you agree?
>>
>>> To make the approach work for all RDF* graphs care has to be taken that the
>>> encoding terms do not occur in the RDF* graph and cause problems.  As I encode
>>> literals as string literals and permit extended graphs where literals can be
>>> subjects (and predicates) I had to map strings away from my encoding strings.
>>> The encoding could be changed to use a new datatype as you do and that would
>>> obviate the need to map string literals away.  But there would still need to
>>> be a mapping of IRIs.
>> I understand why you want to map IRIs. I think however that this is not
>> required for literals: what harm could happen if I wrote a triple involving
>> "Xhttp://example.com/"? Since I can't use the rdf*: predicate, I don't think
>> this would cause any problem... But that's a detail.
>>
>> As I understand, this mapping would have to be applied to all triples
>> entering the system, regardless of the format. Not only would Turtle* files
>> need to be transformed, /but also all vanilla RDF files/ (Turtle, RDF/XML,
>> N-Triples...), just in case they would use the rdf*: namespace, to prevent
>> them from breaking the internal structure of embedded triples.
>>
>> It feels like a big price to pay to keep the same semantics. And I feel
>> that implementers would rather add a special treatment for RDF* embedded
>> triples, than transform any triple that enters their system, just to be able
>> to represent embedded triples internally as plain RDF. And if I am right, I
>> would rather extend the semantics accordingly.
>>
>>    pa
>>
>>> There is a bug in my proposal having to do with malformed literals.  To fix
>>> it, add the encoding of malformed literals as the subject, predicate, or
>>> object of the reified embedded triple, as needed.
>>> peter
>>>
>>>
>>>
>>>
>>> On 11/24/20 3:40 AM, Pierre-Antoine Champin wrote:
>>>> Peter,
>>>>
>>>> thanks for this proposal; this is in fact quite similar to what I had in
>>>> mind in my proposal in issue 37 [1].
>>>>
>>>> Two remarks:
>>>>
>>>> 1) Is it really necessary that N transforms literals? I find this quite
>>>> intrusive that a simple graph such as:
>>>>
>>>>      :xavier :name "Xavier".
>>>>
>>>> would be encoded as another graph... (of course, you can use a less probable
>>>> marker than "X", but still...).
>>>>
>>>> Since the only rdf*:subject*/rdf*:predicate*/rdf*:object* triples in the
>>>> encoded graph are those generated by the mapping, I don't think values need
>>>> to be "protected" by N.
>>>>
>>>> 2) In my idea, RDF semantics was extended to enforce some dependency between
>>>> subject and subject*, between predicate and predicate*, and between object
>>>> and object*, and to ensure that subject, predicate and object were
>>>> functional (a statement describes only one "fact"). In other words, the
>>>> following things would have been *inconsistent* (under RDF* entailment
>>>> recognizing xsd:integer):
>>>>
>>>> # G1
>>>>
>>>> _:stmt rdf*:object* M("a"^^xsd:integer); rdf*:object 42.
>>>>
>>>> # G2
>>>>
>>>> _:stmt rdf*:object* M("42"^^xsd:integer); rdf*:object 43.
>>>>
>>>> Disclaimer: those semantic constraints make the malformed-literal-bnode test
>>>> [2] problematic, I don't have a solution for that, and I am not even sure
>>>> that this test should be kept (but I put it here for discussion). The
>>>> dedicated semantics in my pull-request [3], however, passes this test, and
>>>> the others (I think).
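
A minimal sketch of the consistency constraint behind G1 and G2, under assumptions: the opaque value recorded by `rdf*:object*` is modeled as a `(lexical form, datatype)` pair, only xsd:integer is recognized, and the helper names `value_of` and `consistent` are hypothetical. It checks a single pair of values; it is not a model-theoretic semantics and does not cover the functionality requirement.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def value_of(lexical, datatype):
    """Return the value denoted by a literal, or None if it is malformed."""
    if datatype != XSD_INTEGER:
        raise NotImplementedError(datatype)  # only xsd:integer in this sketch
    if re.fullmatch(r"[+-]?[0-9]+", lexical) is None:
        return None  # outside the xsd:integer lexical space
    return int(lexical)


def consistent(opaque, transparent):
    """True if the object* literal denotes exactly the value given by object."""
    value = value_of(*opaque)
    return value is not None and value == transparent
```

With this check, G1 fails because "a" is not a well-formed integer, and G2 fails because "42" denotes 42 rather than 43.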
>>>>
>>>>
>>>> [1] https://github.com/w3c/rdf-star/issues/37#issue-746823745
>>>> [2]
>>>> https://w3c.github.io/rdf-star/tests/semantics/manifest.html#malformed-literal-bnode
>>>> [3] https://github.com/w3c/rdf-star/pull/19
>>>>
>>>>
>>>> On 24/11/2020 00:59, Peter F. Patel-Schneider wrote:
>>>>> Here is a mapping from generalized RDF* to generalized RDF that I believe
>>>>> satisfies all the test cases:
>>>>>
>>>>>
>>>>> Short and a bit sloppy version:
>>>>>
>>>>> Define M on RDF terms as follows:
>>>>>     M(I) is the string of I for I an IRI
>>>>>     M(L) is the long lexical form of L for L an RDF literal
>>>>>
>>>>> Let L be an injective mapping from RDF triples to "fresh" blank nodes.
>>>>>
>>>>> Define the mapping -* from a generalized RDF* graph G* to a generalized RDF
>>>>> graph G as follows:
>>>>>     Pick some embedded triple < s p o > such that none of s, p, and o are
>>>>> embedded triples, replace all occurrences of it by L(< s p o >), and add the
>>>>> triples
>>>>>       < L(< s p o >) rdf:type rdf:Statement >
>>>>>       < L(< s p o >) rdf:subject s > if s is not a malformed literal
>>>>>       < L(< s p o >) rdf:subject* M(s) > if s is not a blank node
>>>>>       < L(< s p o >) rdf:predicate p > if p is not a malformed literal
>>>>>       < L(< s p o >) rdf:predicate* M(p) > if p is not a blank node
>>>>>       < L(< s p o >) rdf:object o > if o is not a malformed literal
>>>>>       < L(< s p o >) rdf:object* M(o) >  if o is not a blank node
>>>>>     Finish when there are no embedded triples left.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Longer and more careful version:
>>>>>
>>>>> Let rdf* be an IRI namespace.
>>>>>
>>>>> Let M be an injective mapping from IRIs and RDF literals to strings starting
>>>>> with "X"
>>>>>
>>>>> Let L be an injective mapping from RDF triples to blank nodes whose range
>>>>> excludes an infinite number of blank nodes.
>>>>>
>>>>> Let N be an injective mapping from IRIs to IRIs, strings to strings, and blank
>>>>> nodes to blank nodes that does not include any IRIs in the rdf* namespace or
>>>>> strings that start with "X" or blank nodes in the range of L in its range.
>>>>> (This is just used to ensure that the encodings of IRIs, strings, and blank
>>>>> nodes do not clash with RDF terms in the graphs.)
>>>>>
>>>>> Define the mapping -* from a generalized RDF* graph G* to a generalized RDF
>>>>> graph G as follows:
>>>>>     First replace each RDF term by its mapping under N.
>>>>>     Pick some embedded triple < s p o > such that none of s, p, and o are
>>>>> embedded triples, replace all occurrences of it by L(< s p o >), and add the
>>>>> triples
>>>>>       < L(< s p o >) rdf*:type rdf*:Statement >
>>>>>       < L(< s p o >) rdf*:subject s > if s is not a malformed literal
>>>>>       < L(< s p o >) rdf*:subject* M(s) > if s is not a blank node
>>>>>       < L(< s p o >) rdf*:predicate p > if p is not a malformed literal
>>>>>       < L(< s p o >) rdf*:predicate* M(p) > if p is not a blank node
>>>>>       < L(< s p o >) rdf*:object o > if o is not a malformed literal
>>>>>       < L(< s p o >) rdf*:object* M(o) >  if o is not a blank node
>>>>>     Finish when there are no embedded triples left.
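
One possible shape for the renaming N in the careful version, sketched under assumptions: terms are `(kind, value)` pairs, the `urn:x-renamed:` prefix and the "Y" and "n" markers are arbitrary illustrative choices, `RDF_STAR` is a placeholder namespace, and L is assumed to hand out blank node labels that do not start with "n". Prefixing is used only because it makes injectivity and the range restrictions easy to see.

```python
RDF_STAR = "https://example.org/rdf-star#"  # placeholder for the rdf* namespace


def N(term):
    """Injective renaming whose range avoids the terms used by the encoding."""
    kind, value = term
    if kind == "iri":
        # Every result starts with the hypothetical prefix, so none is in rdf*.
        return (kind, "urn:x-renamed:" + value)
    if kind == "string":
        # Every result starts with "Y", so none starts with "X".
        return (kind, "Y" + value)
    if kind == "bnode":
        # Results start with "n"; L is assumed to use other labels.
        return (kind, "n" + value)
    return term  # other literals are left untouched in this sketch


def rename_graph(graph):
    """First step of the mapping: replace each term by its image under N."""
    return {tuple(N(t) for t in triple) for triple in graph}
```

Under this sketch even a graph as simple as `:xavier :name "Xavier"` comes out rewritten, which is the intrusiveness raised in the replies to this proposal.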
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> peter
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 11/23/20 6:30 AM, Pierre-Antoine Champin wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> as per my action assigned on our last call [1], I pushed a first version of
>>>>>> a test-suite. The first goal is to provide a set of concrete examples of how
>>>>>> RDF* implementations are expected to behave. An HTML rendering of that
>>>>>> test-suite is available here:
>>>>>>
>>>>>> https://w3c.github.io/rdf-star/tests/semantics/manifest.html
>>>>>>
>>>>>> so that everyone can review, reference, and comment on each of the test cases.
>>>>>>
>>>>>> If "entail" sounds esoteric to you, think of the 2nd graph in each test case
>>>>>> as a SPARQL* ASK query, which is expected to return TRUE (or FALSE, in the
>>>>>> case of "not entail").
>>>>>>
>>>>>> Coming next: a similar test suite for SPARQL*.
>>>>>>
>>>>>>     best
>>>>>>
>>>>>> [1] https://github.com/w3c/rdf-star/issues/40
>>>>>>
Received on Wednesday, 25 November 2020 12:27:25 UTC