Re: RDF* to RDF mapping (Re: First version of a test suite for RDF*) from Peter F. Patel-Schneider on 2020-11-24 (public-rdf-star@w3.org from November 2020)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Tue, 24 Nov 2020 11:45:31 -0500
To: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>, public-rdf-star@w3.org
Message-ID: <409988ca-ca58-7a64-a679-dc63ef93504b@gmail.com>
Any encoding will have to transform RDF terms so that there are no collisions
between them and the terms used in the encoding of embedded triples. 
Otherwise you end up with entailments like:

<< :a :b :c >> :d :e .

entails

_:x rdf*:subject* ":a"^^rdf*:term .

(Yes, this is a bit sloppy.)
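
To make the entailment above concrete, here is a minimal Python sketch of the -* mapping from the longer version quoted below, deliberately leaving out the N renaming step. Everything in it is an assumption made for illustration: terms are plain strings in a Turtle-like syntax, embedded triples are Python tuples, M is approximated by prefixing "X" (rather than the "..."^^rdf*:term form used above), the fresh blank nodes are named _:t0, _:t1, ..., and malformed literals are ignored.

```python
def M(term):
    # Hypothetical stand-in for M: a string literal starting with "X".
    return '"X' + term + '"'

def encode(graph):
    """Sketch of the -* mapping WITHOUT the N renaming step."""
    names = {}   # plays the role of L: embedded triple -> fresh blank node
    out = set()

    def enc(term):
        if not isinstance(term, tuple):      # ordinary term, keep as is
            return term
        key = tuple(enc(t) for t in term)    # innermost embedded triples first
        if key not in names:
            node = names[key] = f"_:t{len(names)}"
            out.add((node, "rdf*:type", "rdf*:Statement"))
            for role, value in zip(("subject", "predicate", "object"), key):
                out.add((node, f"rdf*:{role}", value))
                if not value.startswith("_:"):   # no starred triple for blank nodes
                    out.add((node, f"rdf*:{role}*", M(value)))
        return names[key]

    for s, p, o in graph:
        out.add((enc(s), enc(p), enc(o)))
    return out

# << :a :b :c >> :d :e .
encoded = encode({((":a", ":b", ":c"), ":d", ":e")})

# The encoding is visible: "_:x rdf*:subject* M(:a)" is simply entailed.
exposed = any(p == "rdf*:subject*" and o == M(":a") for _, p, o in encoded)
assert exposed
```

Because the query side is not renamed either, a graph that mentions rdf*:subject* directly can match the encoded triples, which is the collision described above.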


If you are willing to expose the details of the encoding in this way, or
exclude certain RDF graphs from RDF*, then it is not necessary to transform
RDF terms.
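
For comparison, a transformation of the kind N performs could look like the following Python sketch. It is only an illustration under assumed conventions: terms are strings in a Turtle-like syntax, the prefix "n:" is a made-up escape for IRIs, "Y" is a made-up escape for strings, and the range of L is assumed to be the blank nodes named _:t0, _:t1, and so on.

```python
def N(term: str) -> str:
    """Injective renaming that keeps user terms away from the encoding terms."""
    if term.startswith("_:"):
        # Blank nodes: never land on "_:t...", the assumed range of L.
        return "_:n" + term[2:]
    if term.startswith('"') and term.endswith('"'):
        # Plain strings: never start with "X", the marker used by M.
        return '"Y' + term[1:]
    if term.startswith('"'):
        # Other literals (typed or language-tagged) are left alone.
        return term
    # IRIs: never fall in the rdf*: namespace.
    return "n:" + term

assert not N("rdf*:subject*").startswith("rdf*:")
assert not N('"Xhttp://example.com/"').startswith('"X')
assert not N("_:t0").startswith("_:t")
```

Prefixing is injective, so distinct terms stay distinct, but every graph entering the system has to go through it, including graphs with no embedded triples.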


And, of course, this is simply a specification of entailment in RDF*.  
Implementations are free to do something else that doesn't involve
transforming RDF terms.




peter



On 11/24/20 11:22 AM, Pierre-Antoine Champin wrote:
>
>
> On 24/11/2020 12:40, Peter F. Patel-Schneider wrote:
>> Consider this as a counter-example to your claim in [1] that
>>
>> 3. Otherwise, IMO, we need to somehow extend RDF semantics.
> Yes, I understood this was your intent :)
>> But, yes, the two approaches do share the idea of encoding subjects,
>> predicates, and objects to achieve referential opacity for IRIs and
>> literals.   My approach handles blank nodes in embedded triples, which is
>> where the bulk of the difficulty lies.
>
> Indeed.
>
> Now, as I mentioned before, the reason why I came to the conclusion that a
> semantic extension was required is that I wanted to enforce /in the
> semantics/ the consistency between the properties pointing to referentially
> opaque terms, and the corresponding properties pointing to their denotation.
>
> I think that your approach can do without this semantic extension because
> you prevent anyone (but your -* mapping) from using the rdf*: namespace. So
> those constraints are enforced structurally by the -* mapping, rather than by
> the semantics. Would you agree?
>
>> To make the approach work for all RDF* graphs, care has to be taken that the
>> encoding terms do not occur in the RDF* graph and cause problems.  As I encode
>> literals as string literals and permit extended graphs where literals can be
>> subjects (and predicates), I had to map strings away from my encoding strings.
>> The encoding could be changed to use a new datatype as you do and that would
>> obviate the need to map string literals away.  But there would still need to
>> be a mapping of IRIs.
>
> I understand why you want to map IRIs. I think however that this is not
> required for literals: what harm could happen if I wrote a triple involving
> "Xhttp://example.com/"? Since I can't use the rdf*: predicate, I don't think
> this would cause any problem... But that's a detail.
>
> As I understand, this mapping would have to be applied to all triples
> entering the system, regardless of the format. Not only would Turtle* files
> need to be transformed, /but also all vanilla RDF files/ (Turtle, RDF/XML,
> N-Triples...), just in case they would use the rdf*: namespace, to prevent
> them from breaking the internal structure of embedded triples.
>
> It feels like a big price to pay to keep the same semantics. And I feel
> that implementers would rather add a special treatment for RDF* embedded
> triples, than transform any triple that enters their system, just to be able
> to represent embedded triples internally as plain RDF. And if I am right, I
> would rather extend the semantics accordingly.
>
>   pa
>
>> There is a bug in my proposal having to do with malformed literals.  To fix
>> it, add the encoding of malformed literals as the subject, predicate, or
>> object of the reified embedded triple, as needed.
>> peter
>>
>>
>>
>>
>> On 11/24/20 3:40 AM, Pierre-Antoine Champin wrote:
>>> Peter,
>>>
>>> thanks for this proposal; this is in fact quite similar to what I had in
>>> mind in my proposal in issue 37 [1].
>>>
>>> Two remarks:
>>>
>>> 1) Is it really necessary that N transforms literals? I find it quite
>>> intrusive that a simple graph such as:
>>>
>>>     :xavier :name "Xavier".
>>>
>>> would be encoded as another graph... (of course, you can use a less probable
>>> marker than "X", but still...).
>>>
>>> Since the only rdf*:subject*/rdf*:predicate/rdf*:object triples in the
>>> encoded graph are those generated by the mapping, I don't think values need
>>> to be "protected" by N.
>>>
>>> 2) In my idea, RDF semantics was extended to enforce some dependency between
>>> subject and subject*, between predicate and predicate*, and between object
>>> and object*, and to ensure that subject, predicate and object were
>>> functional (a statement describes only one "fact"). In other words, the
>>> following things would have been *inconsistent* (under RDF* entailment
>>> recognizing xsd:integer):
>>>
>>> # G1
>>>
>>> _:stmt rdf*:object* M("a"^^xsd:integer); rdf*:object 42.
>>>
>>> # G2
>>>
>>> _:stmt rdf*:object* M("42"^^xsd:integer); rdf*:object 43.
>>>
>>> Disclaimer: those semantic constraints make the malformed-literal-bnode test
>>> [2] problematic, I don't have a solution for that, and I am not even sure
>>> that this test should be kept (but I put it here for discussion). The
>>> dedicated semantics in my pull-request [3], however, passes this test, and
>>> the others (I think).
>>>
>>>
>>> [1] https://github.com/w3c/rdf-star/issues/37#issue-746823745
>>> [2] https://w3c.github.io/rdf-star/tests/semantics/manifest.html#malformed-literal-bnode
>>> [3] https://github.com/w3c/rdf-star/pull/19
>>>
>>>
>>> On 24/11/2020 00:59, Peter F. Patel-Schneider wrote:
>>>> Here is a mapping from generalized RDF* to generalized RDF that I believe
>>>> satisfies all the test cases:
>>>>
>>>>
>>>> Short and a bit sloppy version:
>>>>
>>>> Define M on RDF terms as follows:
>>>>    M(I) is the string of I for I an IRI
>>>>    M(L) is the long lexical form of L for L an RDF literal
>>>>
>>>> Let L be an injective mapping from RDF triples to "fresh" blank nodes.
>>>>
>>>> Define the mapping -* from a generalized RDF* graph G* to a generalized RDF
>>>> graph G as follows:
>>>>    Pick some embedded triple < s p o > such that none of s, p, and o are
>>>> embedded triples, replace all occurrences of it by L(< s p o >), and add the
>>>> triples
>>>>      < L(< s p o >) rdf:type rdf:Statement >
>>>>      < L(< s p o >) rdf:subject s > if s is not a malformed literal
>>>>      < L(< s p o >) rdf:subject* M(s) > if s is not a blank node
>>>>      < L(< s p o >) rdf:predicate p > if p is not a malformed literal
>>>>      < L(< s p o >) rdf:predicate* M(p) > if p is not a blank node
>>>>      < L(< s p o >) rdf:object o > if o is not a malformed literal
>>>>      < L(< s p o >) rdf:object* M(o) >  if o is not a blank node
>>>>    Finish when there are no embedded triples left.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Longer and more careful version:
>>>>
>>>> Let rdf* be an IRI namespace.
>>>>
>>>> Let M be an injective mapping from IRIs and RDF literals to strings starting
>>>> with "X"
>>>>
>>>> Let L be an injective mapping from RDF triples to blank nodes whose range
>>>> excludes an infinite number of blank nodes.
>>>>
>>>> Let N be an injective mapping from IRIs to IRIs, strings to strings, and blank
>>>> nodes to blank nodes that does not include any IRIs in the rdf* namespace or
>>>> strings that start with "X" or blank nodes in the range of L in its range.
>>>> (This is just used to ensure that the encodings of IRIs, strings, and blank
>>>> nodes do not clash with RDF terms in the graphs.)
>>>>
>>>> Define the mapping -* from a generalized RDF* graph G* to a generalized RDF
>>>> graph G as follows:
>>>>    First replace each RDF term by its mapping under N.
>>>>    Pick some embedded triple < s p o > such that none of s, p, and o are
>>>> embedded triples, replace all occurrences of it by L(< s p o >), and add the
>>>> triples
>>>>      < L(< s p o >) rdf*:type rdf*:Statement >
>>>>      < L(< s p o >) rdf*:subject s > if s is not a malformed literal
>>>>      < L(< s p o >) rdf*:subject* M(s) > if s is not a blank node
>>>>      < L(< s p o >) rdf*:predicate p > if p is not a malformed literal
>>>>      < L(< s p o >) rdf*:predicate* M(p) > if p is not a blank node
>>>>      < L(< s p o >) rdf*:object o > if o is not a malformed literal
>>>>      < L(< s p o >) rdf*:object* M(o) >  if o is not a blank node
>>>>    Finish when there are no embedded triples left.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> peter
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 11/23/20 6:30 AM, Pierre-Antoine Champin wrote:
>>>>> Hi all,
>>>>>
>>>>> as per my action assigned on our last call [1], I pushed a first version of
>>>>> a test-suite. The first goal is to provide a set of concrete examples of how
>>>>> RDF* implementations are expected to behave. An HTML rendering of that
>>>>> test-suite is available here:
>>>>>
>>>>> https://w3c.github.io/rdf-star/tests/semantics/manifest.html
>>>>>
>>>>> so that everyone can review, reference, and comment on each of the test cases.
>>>>>
>>>>> If "entail" sounds esoteric to you, think of the 2nd graph in each test case
>>>>> as a SPARQL* ASK query, which is expected to return TRUE (or FALSE, in the
>>>>> case of "not entail").
>>>>>
>>>>> Coming next: a similar test suite for SPARQL*.
>>>>>
>>>>>    best
>>>>>
>>>>> [1] https://github.com/w3c/rdf-star/issues/40
>>>>>
Received on Tuesday, 24 November 2020 16:45:47 UTC