Re: RDF* to RDF mapping (Re: First version of a test suite for RDF*) from Pierre-Antoine Champin on 2020-11-25 (public-rdf-star@w3.org from November 2020)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Wed, 25 Nov 2020 19:05:03 +0100
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, public-rdf-star@w3.org
Message-ID: <ad814803-1a62-2db9-564d-3fddfaa89865@ercim.eu>
On 25/11/2020 16:54, Peter F. Patel-Schneider wrote:

> So you want something like the following:
>
> Define the following datatypes:
>
> rdf:IRI a datatype for absolute IRIs
>    The lexical-to-value map is from https://tools.ietf.org/html/rfc3987
>    No unescaping is performed.
> rdf:literal a datatype for RDF literals
>    Well-typed lexical forms start with a Unicode string enclosed in " with
> interior " doubled, followed by an optional language tag preceded by @,
> followed by an IRI preceded by <.
>    The lexical-to-value mapping takes the two or three parts of the lexical
> form and tries to construct an RDF literal in the obvious way.  If the result
> is not a valid RDF literal then the lexical form is not in the lexical space
> of the datatype.
>    No unescaping is performed except for undoubling of "" in the lexical form.
> This is not the normal writing of RDF literals and was chosen to reduce the
> need for unescaping and to have a unique representation of RDF literals.
>
> Add rdf:subject*, rdf:predicate*, and rdf:object* to the RDF reification
> vocabulary.  These three predicates are used to hold the syntactic form of
> reified statements when this is desired.
>
> RDF* is then RDF with the addition of rdf:IRI and rdf:literal as recognized
> datatypes.
>
> That's it (for RDF*)!
>
>
>
> Extend Turtle to Turtle* by adding the recursive << s p o >> embedded triples
> construct.
>
> This construct is processed as follows:
>
> Let L be an injective mapping from RDF triples to blank nodes that are
> different from blank nodes resulting from other Turtle constructs.
>
> Define M(i) as the RDF literal with datatype rdf:IRI and lexical form i for i
> an IRI
> and M(l) as the RDF literal with datatype rdf:literal and the lexical form
> corresponding to l for l an RDF literal
>
> Process an embedded triple << s p o >> construct as follows:
> If s, p, or o is an embedded triple, process it before processing << s p o >>.
> Otherwise, replace << s p o >> by L(<s p o>) and add the following triples to
> the graph being created:
>      < L(< s p o >) rdf:type rdf:Statement >
>      < L(< s p o >) rdf:subject s >
>      < L(< s p o >) rdf:subject* M(s) > if s is not a blank node
>      < L(< s p o >) rdf:predicate p >
>      < L(< s p o >) rdf:predicate* M(p) > if p is not a blank node
>      < L(< s p o >) rdf:object o > if o is not a malformed literal
>      < L(< s p o >) rdf:object M(o) > if o is a malformed literal
>      < L(< s p o >) rdf:object* M(o) >  if o is not a blank node

the second to last line is wicked :-)

In a way, it strikes me as "wrong" because if the value of rdf:object* 
has no denotation, then rdf:object should have no value at all...

On the other hand, it fixes the problem with the test-case 
malformed-literal-bnode. Let me state again that I included this test, 
because this is an extension of how bnodes work elsewhere in RDF (you 
can always replace a term with a bnode); but I did it reluctantly, for 
the same reason that this trick seems "wrong" to me.
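
For readers who want to experiment with the processing rules quoted above, here is a minimal Python sketch of them. It is only an illustration under stated assumptions, not a reference implementation: the tagged-tuple term model, the `well_formed` flag on literals, the `Expander` class and the `emb0`, `emb1`, ... blank node labels are all invented for this sketch. Only `L`, `M` and the list of generated triples follow the quoted text.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Term model (an assumption of this sketch):
#   ("iri", value), ("bnode", label),
#   ("literal", lexical, datatype, lang, well_formed), ("embedded", s, p, o)

def iri(value):
    return ("iri", value)

def lit(lexical, datatype, lang=None, well_formed=True):
    return ("literal", lexical, datatype, lang, well_formed)

def M(term):
    """Syntactic form of an IRI or literal, typed rdf:IRI or rdf:literal."""
    if term[0] == "iri":
        return lit(term[1], RDF + "IRI")
    if term[0] == "literal":
        _, lexical, datatype, lang, _ = term
        # Enclose in ", double interior ", optional @lang, then <datatype.
        form = '"' + lexical.replace('"', '""') + '"'
        if lang:
            form += "@" + lang
        return lit(form + "<" + datatype, RDF + "literal")
    raise ValueError("M is only defined for IRIs and literals")

class Expander:
    def __init__(self):
        self._fresh = count()
        self._bnode_for = {}   # backs the injective mapping L
        self.triples = set()

    def L(self, triple):
        """Injective mapping from triples to dedicated blank nodes."""
        if triple not in self._bnode_for:
            self._bnode_for[triple] = ("bnode", "emb%d" % next(self._fresh))
        return self._bnode_for[triple]

    def term(self, t):
        """Replace an embedded triple by L(<s p o>), adding its triples."""
        if t[0] != "embedded":
            return t
        # Nested embedded triples are processed first.
        s, p, o = (self.term(x) for x in t[1:])
        node = self.L((s, p, o))
        out = self.triples
        out.add((node, iri(RDF + "type"), iri(RDF + "Statement")))
        out.add((node, iri(RDF + "subject"), s))
        out.add((node, iri(RDF + "predicate"), p))
        # The "wicked" rule: a malformed literal is replaced by M(o).
        malformed = o[0] == "literal" and not o[4]
        out.add((node, iri(RDF + "object"), M(o) if malformed else o))
        for name, x in (("subject*", s), ("predicate*", p), ("object*", o)):
            if x[0] != "bnode":
                out.add((node, iri(RDF + name), M(x)))
        return node

    def add(self, s, p, o):
        """Add a Turtle* triple, expanding any embedded triples in it."""
        self.triples.add(tuple(self.term(x) for x in (s, p, o)))
```

With this sketch, adding `<< :a :b "c" >> :d :e` to a fresh `Expander` yields the asserted triple plus seven reification triples. An embedded triple whose object is flagged as malformed gets `M(o)` as the value of both rdf:object and rdf:object*.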


Then, the question remains: what about people messing with the vocabulary?

How should the following graph be handled?

     << :a :b "c" >> rdf:object "d" .

We may decide that it is satisfiable (bite the syntactic sugar bullet 
completely), but that makes it impractical to optimize the handling of 
"correct" embedded triples, as pointed out by Andy: 
https://github.com/w3c/rdf-star/issues/37#issuecomment-730670232 . I am 
not sure implementers want to go that way. As I understand, RDF* was 
created precisely to avoid the verbosity of standard reification.
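
To make the optimization concern concrete, here is a small Python sketch of what the graph above looks like once fully desugared. The expansion is written by hand with abbreviated terms as plain strings. `_:t` is a hypothetical label for the blank node standing for the embedded triple, and the rdf:subject*, rdf:predicate* and rdf:object* triples are left out for brevity.

```python
from collections import defaultdict

# Hand-expanded form of:  << :a :b "c" >> rdf:object "d" .
graph = {
    ("_:t", "rdf:type", "rdf:Statement"),
    ("_:t", "rdf:subject", ":a"),
    ("_:t", "rdf:predicate", ":b"),
    ("_:t", "rdf:object", '"c"'),   # produced by the expansion
    ("_:t", "rdf:object", '"d"'),   # asserted explicitly by the user
}

def object_values(triples):
    """Group the rdf:object values found in the graph by their subject."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:object":
            values[s].add(o)
    return dict(values)

# Nodes that no longer correspond to exactly one embedded triple.
ambiguous = {n: v for n, v in object_values(graph).items() if len(v) > 1}
```

Here `ambiguous` contains `_:t` with both `"c"` and `"d"`, so a store that compactly indexes one embedded triple per node could not represent this graph without falling back to the verbose form.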

Or we may decide that this graph is unsatisfiable, per a special 
semantics assigned to rdf:object (and friends) in any RDF* 
interpretation (call this approach "syntactic sweetener", as it is not 
real syntactic sugar, since it changes the semantics). That way, 
implementers can assume that the reification properties will never be 
messed up, and may optimize them internally. But that makes RDF*'s 
semantics significantly more complex than RDF's.

Or we go back to extend the abstract syntax and the semantics, as per 
the current state of the report (which, by the way, passes all tests, 
including the infamous malformed-literal-bnode, as far as I can tell). 
In that case, of course, the graph above would be satisfiable, because 
rdf:object would have no special semantics w.r.t. embedded triples.

That's how I see our options now.

But I am sure you can see other options, or other pros and cons to the 
ones above... :-)

   pa

> peter
>
>
>
> On 11/25/20 7:27 AM, Pierre-Antoine Champin wrote:
>> On 24/11/2020 17:45, Peter F. Patel-Schneider wrote:
>>> Any encoding will have to transform RDF terms so that there are no collisions
>>> between them and the terms used in the encoding of embedded triples.
>>> Otherwise you end up with entailments like:
>>>
>>> << :a :b :c >> :d :e .
>>>
>>> entails
>>>
>>> _:x rdf*:subject* ":a"^^rdf*:term .
>>>
>>> (Yes, this is a bit sloppy.)
>>>
>>>
>>> If you are willing to expose the details of the encoding in this way,
>> Yes, that's how I envisioned a "syntactic sugar" solution, i.e. Turtle* as a
>> compact notation for something that could, otherwise, be written in plain
>> Turtle.
>>
>>>    or
>>> exclude certain RDF graphs from RDF*, then it is not necessary to transform
>>> RDF terms.
>>>
>>>
>>> And, of course, this is simply a specification of entailment in RDF*.
>>> Implementations are free to do something else that doesn't involve
>>> transforming RDF terms.
>>>
>>>
>>>
>>>
>>> peter
>>>
Received on Wednesday, 25 November 2020 18:05:10 UTC