Re: Consolidating triple/edges from Thomas Lörtsch on 2023-12-18 (public-rdf-star-wg@w3.org from December 2023)

From: Thomas Lörtsch <tl@rat.io>
Date: Mon, 18 Dec 2023 22:06:01 +0100
To: Olaf Hartig <olaf.hartig@liu.se>
Cc: "andy@apache.org" <andy@apache.org>, "public-rdf-star-wg@w3.org" <public-rdf-star-wg@w3.org>
Message-Id: <6D435CFB-876E-42F8-AD0C-5A281B88135F@rat.io>
> On 18. Dec 2023, at 15:33, Olaf Hartig <olaf.hartig@liu.se> wrote:
> 
> On Mon, 2023-12-18 at 00:09 +0100, Thomas Lörtsch wrote:
>> On 15. Dec 2023, at 15:33, Olaf Hartig <olaf.hartig@liu.se> wrote:
>>> On Fri, 2023-12-15 at 13:57 +0100, Thomas Lörtsch wrote:
>>>>> On 15. Dec 2023, at 00:01, Olaf Hartig <olaf.hartig@liu.se> wrote:
>>>>> [...]
>>>>> If so, what would you expect to happen if someone writes the
>>>>> following?
>>>>> 
>>>>>  :T rdfx:typeOf << :s1 :p1 :o1 >> .
>>>>>  :T rdfx:typeOf << :s2 :p2 :o2 >> .
>>>>> 
>>>>> Which triple type would :T denote in this case (if any)?
>>>> 
>>>> In the strictly monotonic world of RDF, :T would then probably
>>>> refer to a little graph term.
>>> 
>>> I don't think this works, but I also think there is no point going
>>> further into this discussion for now.
>> 
>> Not attempting to force a discussion on you, but
>> 
>>    :X a :House, :Bird .
>> 
>> is perfectly legal RDF. RDF is not especially safe.
> 
> Okay, I backtrack. It certainly depends on how you define the semantics
> of rdfx:typeOf.
> 
> Based on how you introduced it, I was assuming that the meaning you
> have in mind for it is to state that the subject of an rdfx:typeOf
> triple (i.e., :T in the examples) is "a reference to the [triple] type"
> captured by the triple term in the object position of the triple. Based
> on this meaning, I see the two lines above as an inconsistency.

I don’t think so. Although I don’t know anything that qualifies as a house-bird, it IMO is not an _inconsistent_ concept, as it is not in itself contradictory or per se unsatisfiable. But I’ll let some logicians make the call.

> With your statement above (":T would then probably refer to a little
> graph term") you are implying a different meaning of rdfx:typeOf; being
> a reference to a triple term/type and being a reference to graph term
> are different things (at least, I see it that way). Moreover, if an
> rdfx:typeOf triple is meant to state that the subject of this triple is
> a reference to a graph term, then I would expect the object of this
> triple to be that graph term (as a whole), rather than just one of the
> triples that is part of the graph term.

I would expect that too, and I would expect a language to provide me with the means to unambiguously express something that matches that expectation, e.g.:

    :T rdfx:typeOf << :s1 :p1 :o1. :s2 :p2 :o2 >> .

But if the foreseeable and legitimate needs and expectations of users are not met, they will create cow paths like

    :T rdfx:typeOf ( << :s1 :p1 :o1 >> << :s2 :p2 :o2 >> ) .

because that’s just what humans do when they need a tool that isn’t provided: they get creative and build the missing tool from whatever is at their disposal. And frankly: I love that, even if it bends semantics quite a bit. It’s the tool providers, i.e. us, who are to blame if something like that happens in practice, not the users. And it will happen.
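
To make concrete how far the list workaround bends the semantics: Turtle expands a collection into a chain of blank nodes linked by rdf:first/rdf:rest and closed by rdf:nil, so the object of rdfx:typeOf becomes a list node rather than a graph term. The following is only a minimal Python sketch of that expansion, assuming quoted triples can be modelled as plain tuples and terms as strings; expand_collection is a hypothetical helper, not part of any RDF library, and the _:l1, _:l2 labels merely stand in for whatever labels a parser would generate.

```python
from itertools import count

RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def expand_collection(subject, predicate, items, fresh):
    """Expand `subject predicate ( item1 item2 ... )` the way Turtle
    expands a collection: one blank node per list cell, linked by
    rdf:first / rdf:rest and terminated by rdf:nil.
    (Hypothetical helper for illustration only.)"""
    if not items:
        # An empty collection is just rdf:nil.
        return [(subject, predicate, RDF_NIL)]
    cells = [f"_:l{next(fresh)}" for _ in items]
    triples = [(subject, predicate, cells[0])]
    for i, (cell, item) in enumerate(zip(cells, items)):
        triples.append((cell, RDF_FIRST, item))
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.append((cell, RDF_REST, rest))
    return triples


# Quoted triples are modelled as plain Python tuples (an assumption).
t1 = (":s1", ":p1", ":o1")
t2 = (":s2", ":p2", ":o2")

for triple in expand_collection(":T", "rdfx:typeOf", [t1, t2], count(1)):
    print(triple)
```

Whatever :T is meant to denote, the data now says it is typed by a list cell, and a query for "the triples :T is a type of" has to walk the rdf:rest chain, which is the same hardship with querying lists raised further down in the thread.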


>> [...]
>>>>> In other words, should the two occurrences of the
>>>>> subexpression  << :s :p :o >>  in the following two lines be
>>>>> understood to "implicitly reference" the same token or two
>>>>> different tokens?
>>>>> 
>>>>>  << :s :p :o >> :p2 :o2 .
>>>>>  << :s :p :o >> :p3 :o3 .
>>>> 
>>>> Always a different one, and that’s indeed crucial (I pointed that
>>>> out in the nested graph proposal too).
>>> 
>>> In this case, I cannot see how it would be possible to make more
>>> than one annotation statement for each token? (If you attempt to
>>> answer this question based on an example, please write the example
>>> either in terms of the abstract syntax or the N-Triples-star format,
>>> but not in Turtle-star.)
>> 
>> I was sloppy in this example, but I seem to remember that in the
>> context of the whole mail it might have been clearer. The idea was
>> (and is) that providing no identifier is interpreted as "I don’t
>> care about the name of this", and a blank node identifier is
>> created. That motivates the need to define a way to refer to the type
>> (above). I just realized that the type reference, given in the syntax
>> above, could then be interpreted as denoting a type of occurrence;
>> that would have to be explained away…
>> 
>>    << :s :p :o >> :p2 :o2 .
>>    << :s :p :o >> :p3 :o3 ;
>>                   :p4 :o4 .       # multiple annotations only as trees
>>                                   # if no explicit ID is provided
>> 
>> is then the same as
>> 
>>    << _:b1 | :s :p :o >> :p2 :o2 .
>>    << _:b2 | :s :p :o >> :p3 :o3 ;
>>                          :p4 :o4 .
> 
> No, I would not say that these are the same. In contrast, the first of
> these two snippets of Turtle is the same as the following.
> 
>   << :s :p :o >> :p2 :o2 .
>   << :s :p :o >> :p3 :o3 .
>   << :s :p :o >> :p4 :o4 .

At present, yes, but not in the proposal I’m making.
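
The difference can be sketched in a few lines of Python, under the assumption that each `<< ... >>` statement is represented as its quoted triple (a tuple) plus its predicate-object list. expand_as_type and expand_as_occurrences are hypothetical helpers, and rdfx:occurrenceOf is the vocabulary proposed in this thread, not a standard term. The first function reproduces the current reading Olaf describes; the second the proposed reading, where every un-named occurrence gets its own fresh blank node.

```python
from itertools import count


def expand_as_type(blocks):
    """Current RDF-star reading: the quoted triple itself is the
    subject, so every block about << :s :p :o >> talks about the
    same term and the annotations simply accumulate."""
    out = []
    for quoted, pairs in blocks:
        for p, o in pairs:
            out.append((quoted, p, o))
    return out


def expand_as_occurrences(blocks, fresh):
    """Proposed reading (hypothetical rdfx: vocabulary): each << ... >>
    without an explicit ID gets a fresh blank node, linked to the
    triple via rdfx:occurrenceOf; ';' continues on that same node."""
    out = []
    for quoted, pairs in blocks:
        occ = f"_:b{next(fresh)}"
        out.append((occ, "rdfx:occurrenceOf", quoted))
        for p, o in pairs:
            out.append((occ, p, o))
    return out


spo = (":s", ":p", ":o")
# The two Turtle statements from the example above.
blocks = [
    (spo, [(":p2", ":o2")]),
    (spo, [(":p3", ":o3"), (":p4", ":o4")]),
]

as_type = expand_as_type(blocks)
as_occ = expand_as_occurrences(blocks, count(1))

# Type reading: all three annotations share one subject.
print({s for s, _, _ in as_type})          # {(':s', ':p', ':o')}
# Occurrence reading: they are split across two tokens.
print(sorted({s for s, _, _ in as_occ}))   # ['_:b1', '_:b2']
```

The type reading yields exactly Olaf's three-statement version; the occurrence reading yields the same triples as the N-Triples-star sketch quoted below.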

Thomas

> Best,
> Olaf
> 
> 
>> N-Triples-star might look like this:
>> 
>>    _:b1 rdfx:occurrenceOf << <http://example/s> <http://example/p> <http://example/o> >> .
>>    _:b2 rdfx:occurrenceOf << <http://example/s> <http://example/p> <http://example/o> >> .
>>    _:b1 <http://example/p2> <http://example/o2> .
>>    _:b2 <http://example/p3> <http://example/o3> .
>>    _:b2 <http://example/p4> <http://example/o4> .
>> 
>> 
>> Best,
>> Thomas
>> 
>> 
>>> Thanks,
>>> Olaf
>>> 
>>>> The preference for types in the semantics of RDF might be
>>>> characterized as early optimization: understandable for an
>>>> integration-focused technology, and well understood in logic.
>>>> However, the unification of tokens into types risks losing context
>>>> (and annotations). It can just as well be postponed to querying
>>>> (DISTINCT) or to a conscious data management operation (spring
>>>> cleaning in the dataset). The one thing that one doesn’t want to
>>>> lose when working with data is … data. So late unification of
>>>> tokens into types has some merit.
>>>> 
>>>> Best,
>>>> Thomas
>>>> 
>>>> 
>>>> 
>>>>> Thanks,
>>>>> Olaf
>>>>> 
>>>>> 
>>>>>> and may either provide a custom name or will be provided with a
>>>>>> new blank node to name the reference.
>>>>>> 
>>>>>> 
>>>>>> ## Syntax
>>>>>> 
>>>>>> We should try to make the naming syntactically as uniform and
>>>>>> predictable as possible. The nested graph proposal uses a pair of
>>>>>> square brackets [] prepended to constructs to indicate the name.
>>>>>> If a custom name is given, it is entered into that pair. That
>>>>>> violates the rules for [] in Turtle/TriG but seems to parse
>>>>>> unambiguously. Not providing any name syntactically and still
>>>>>> assuming the presence of a blank node name is a bit more tricky.
>>>>>> 
>>>>>>   :liz :spouse :dick [id:1]{| :start 1964; :end 1974 |} .
>>>>>>   :liz :spouse :dick {| :start 1975; :end 1976 |} .       # _:id2
>>>>>> 
>>>>>>   [] << :s :p :o >> :start 1964 ; :end 1974 .
>>>>>> 
>>>>>> In any case: if it doesn’t parse without a prepended name, then
>>>>>> prepend a [].
>>>>>> 
>>>>>> 
>>>>>> ## Unasserted vs Asserted
>>>>>> 
>>>>>> Why not define a property that not only references a token, but
>>>>>> also creates the triple, e.g.:
>>>>>> 
>>>>>>  :liz :spouse :dick [id:1]{| :start 1964; :end 1974 |} .
>>>>>> 
>>>>>> mapping to
>>>>>> 
>>>>>>   id:1 rdfx:assertionOf << :liz :spouse :dick >> ;
>>>>>>       :start 1964; :end 1974 .
>>>>>> 
>>>>>> instead of
>>>>>> 
>>>>>>   id:1 rdfx:occurrenceOf << :liz :spouse :dick >> ;
>>>>>>       :start 1964; :end 1974 .
>>>>>>   :liz :spouse :dick .
>>>>>> 
>>>>>> That way we get identifiers for each triple occurrence together
>>>>>> with the triple being asserted: direct identification, not early
>>>>>> optimization. See above why that is important.
>>>>>> 
>>>>>> All this unasserted business may seem a bit eccentric, but it’s
>>>>>> the key to any sort of configurable semantics like quotation etc.
>>>>>> It therefore has huge potential, if done right.
>>>>>> 
>>>>>> 
>>>>>> ## SPARQL sugar
>>>>>> 
>>>>>> You compare the occurrence-based shortcut relation to syntactic
>>>>>> sugar for RDF lists, which is fine, except that querying those
>>>>>> lists is a hardship. The same goes for RDF/XML’s syntactic support
>>>>>> for standard RDF reification. Any kind of RDF syntactic sugar also
>>>>>> needs proper support in SPARQL to be effective in practice.
>>>>>> 
>>>>>> 
>>>>>> ## Triple terms vs Graph terms
>>>>>> 
>>>>>> Just for completeness: all of this can easily be extended to
>>>>>> graph terms. The syntax
>>>>>> 
>>>>>>   []{ :s :p :o. :u :v :w }
>>>>>> 
>>>>>> is explored in the nested graph proposal.
>>>>>> 
>>>>>> 
>>>>>> ## Graph Terms vs Named Graphs
>>>>>> 
>>>>>> I like Adrian’s example [0] of a complicated named-graph-based
>>>>>> application and I’m taking it seriously. However, it should also
>>>>>> be clear that triple/graph terms in the end are always stored in
>>>>>> a way very similar to named graphs. There is just no other way in
>>>>>> a quad-based system. Triple/graph terms can be represented as
>>>>>> named graphs, and named graphs can be represented as graph terms.
>>>>>> It’s a practical question of how to encode belonging/membership:
>>>>>> syntactically as nested graphs, via a new term type as in
>>>>>> RDF-star that transforms a triple into a term at the surface (but
>>>>>> NOT in the underlying storage layer, for obvious performance
>>>>>> reasons), via explicit binding relations as Niklas proposes [1]
>>>>>> (and as Dydra implements nested graphs), etc. The main question
>>>>>> is how to ensure that those binding relations don’t get lost in
>>>>>> the process, but that IMHO is true for any solution. Nested
>>>>>> graphs can be serialized to graph terms, which are just an
>>>>>> extension of triple terms. That requires an additional
>>>>>> encoding/decoding step to fit them into an environment that
>>>>>> reserves named graphs for its own purposes. That extra step is the
>>>>>> price that those applications have to pay for being so particular
>>>>>> about their use of named graphs. That’s only fair, and probably
>>>>>> still economical for them.
>>>>>> 
>>>>>> 
>>>>>> ## Term types vs Datatypes
>>>>>> 
>>>>>> The most fundamental grievance with RDF-star is the introduction
>>>>>> of a new term type when a new datatype of type RDF/TTL would
>>>>>> suffice. All I proposed above is readily implementable in the
>>>>>> nested graph proposal, which does map to TriG and regular N-Quads
>>>>>> and such a datatype (and even to Turtle and N-Triples, but that’s
>>>>>> another discussion).
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> Thomas
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> [0] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Dec/0019.html
>>>>>> [1] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Nov/0032.html
>>>>>> 
>>>>>> 
>>>>>>>  Andy
>>>>>>> 
Received on Monday, 18 December 2023 21:06:16 UTC