Re: An outline of RDFn -- RDF with (auto- and custom-) names from Thomas Lörtsch on 2023-11-27 (public-rdf-star-wg@w3.org from November 2023)

From: Thomas Lörtsch <tl@rat.io>
Date: Mon, 27 Nov 2023 11:45:42 +0100
To: public-rdf-star-wg@w3.org, Olaf Hartig <olaf.hartig@liu.se>, "public-rdf-star-wg@w3.org" <public-rdf-star-wg@w3.org>, "souripriya.das@oracle.com" <souripriya.das@oracle.com>
Message-ID: <7BD77B1F-AD2F-4EB4-A355-927D143FBEF1@rat.io>
Olaf,

You should acknowledge that RDF-star is defined only on types of statements, whereas the overwhelming majority of actual use cases (including the "seminal example" that you wrongly used in your papers on RDF*) work on tokens, i.e. individual occurrences of a statement. A mechanism for referring to such a token is mentioned in the RDF-star CG report only in the most informal way possible.

Ergo, the RDF-star formalization is irrelevant in practice, and for actual practical applications that use some derivative of ':occurrenceOf' no formalization exists, not even an informal standard vocabulary, years after the problem was pointed out to the CG. In that respect RDFn is definitely one or two steps ahead of RDF-star.

Also, I think Souri is right in another way: RDFn provides quins out of the box, whereas RDF-star hand-wavingly resorts to out-of-band means to define a token identifier; both approaches add a fifth element to the subject, predicate, object and graph that we already have.
That, however, is a problem with both RDFn and RDF-star, and one that a named-graph-based approach can avoid, to the considerable benefit of implementors as well as users.

Thomas 

On 27 November 2023 at 10:34:45 CET, Olaf Hartig <olaf.hartig@liu.se> wrote:
>Hi Souri,
>
>I don't think your claim that "RDFn = RDF-star" is true (assuming "="
>means something like: is the same as).
>
>In your previous email you introduce the notion of an "RDFn statement"
>about which you say the following.
>
>"""
>An RDFn statement is uniquely identified using the tuple <s, p, o, g,
>n>, where the component n is the "name" of the statement. (The
>components s, p, and o represent the subject, predicate, and object,
>respectively. The component g, representing graph name, is non-NULL
>only for quads and will not be used in the examples below.)
>"""
>
>First of all, notice that you are not explicitly saying what an RDFn
>statement actually is; you are only saying how it is uniquely
>identified. Moreover, you do not specify what kind of a thing this
>"component n" is (neither do you explicitly say what kinds of things
>the components s, p, o, and g are, respectively). Also, I wonder how
>the notion of "is uniquely identified" would be captured explicitly as
>an extension of the abstract syntax of RDF (or are you proposing to
>change the abstract syntax such that it is based on such 5-tuples
>rather than RDF triples?).
>
>Now, regarding your claim, your notion of an RDFn statement is not a
>concept of RDF-star [1]. Also, among the concepts of RDF-star, there is
>no such thing as what you informally call ''the "name" of the
>statement,'' and neither is there any notion of NULL in RDF-star
>(whereas you seem to assume such a notion for RDFn).
>
>Best,
>Olaf
>
>[1] https://www.w3.org/2021/12/rdf-star.html#concepts
>
>
>On Mon, 2023-11-27 at 03:08 +0000, Souripriya Das wrote:
>> Since I did not hear any comments on RDFn during the first half of
>> our last meeting that I was able to attend (except, maybe, Gregg
>> might have said something right at the beginning but I had audio
>> issues on my side), I thought it may be helpful to mention below a
>> few high-level points about RDFn and how it relates to RDF-star
>> concepts and syntax ("statement" here simply means "a triple or
>> quad"):
>> 
>> 1) RDFn = RDF-star (which, I think, uses implicit naming in some
>> sense, with << s p o >> as the name) + explicit naming (using IRIs as
>> custom names).
>> 
>> 2) RDFn (with appropriate syntactic shortcut) would appear exactly
>> the same as RDF-star to a user who does not use multi-edges or
>> statement-sets.
>> 
>> 3) RDFn does not change anything regarding how users work with
>> default graph and named graphs today.
>> 
>> 4) RDFn requires the use of explicit naming if a user needs to store
>> multi-edges. To model multi-edges, the user does not need to
>> introduce new triples or quads with special properties like
>> :isOccurrenceOf or :hasOccurrence.
>> 
>> 5) RDFn requires use of explicit naming for modeling statement-sets
>> as well. A statement-set in RDFn can include (asserted or unasserted)
>> triples from the default graph and the named graphs. The custom-name
>> of a statement-set can be used for making statements about it.
>> 
>> Thanks,
>> Souri.
>> From: Souripriya Das <souripriya.das@oracle.com>
>> Sent: Wednesday, November 15, 2023 9:39 PM
>> To: RDF-star WG <public-rdf-star-wg@w3.org>
>> Subject: [External] : An outline of RDFn -- RDF with (auto- and
>> custom-) names
>>  
>> As the group tries to decide on options, the following outline of a
>> revised version of RDFn may be useful for discussions.
>>  
>> Core concepts and ideas in RDFn: 
>> An RDFn statement is uniquely identified using the tuple <s, p, o, g,
>> n>, where the component n is the "name" of the statement. (The
>> components s, p, and o represent the subject, predicate, and object,
>> respectively. The component g, representing graph name, is non-NULL
>> only for quads and will not be used in the examples below.)
>> Example 1: An RDFn statement, with ex:jSm as its name, representing
>> the tuple <ex:john, ex:spouseOf, ex:mary, null, ex:jSm>:
>> --> ex:john ex:spouseOf ex:mary | ex:jSm .
>> Based on how its name was created, a statement can belong to one of
>> two possible types:
>> - auto-named: The name n for an auto-named statement <s, p, o, g, n>
>>   is computed as rdfnAuto:foo(s, p, o, g), where
>>   - rdfnAuto is an exclusive namespace used only for the names of
>>     auto-named statements, and
>>   - foo is an implementation-specific function that generates a
>>     unique string from the <s, p, o, g> portion of the statement.
>> - custom-named: The name of a custom-named statement is an IRI that
>>   is supplied by the data creator. (The IRI cannot have rdfnAuto as
>>   its namespace prefix.)
>> The name of a statement may be used as subject or object of other
>> statements as long as there is no direct or indirect self-recursion
>> involving the name (e.g., <n, p, o, g, n> is not allowed because n
>> has to be computed using n).
>> Example 2: Adding statements about an auto-named statement (using
>> placeholder for the auto-generated name):
>> --> ex:Cleveland ex:servedAs ex:POTUS | rdfnAuto:term1 .
>> --> rdfnAuto:term1 ex:startYear 1885 ; ex:endYear 1889 .
>> Example 3: Adding statements about a custom-named statement:
>> --> ex:Cleveland ex:servedAs ex:POTUS | ex:term2 .
>> --> ex:term2 ex:startYear 1893 ; ex:endYear 1897 .
>> Core concepts and ideas in SPARQLn:
>> A new filter isAuto(<name>) is introduced to allow distinguishing
>> between auto-named and custom-named statements. If this filter is not
>> used, all statements will qualify, regardless of whether they are
>> auto-named or custom-named, provided they match the regular SPARQL
>> criteria.
>> Example 4: The following query returns ?cnt = 2 if the data about
>> both of President Cleveland's terms (from Example 2 and Example 3
>> above) is present in the RDF dataset:
>> --> SELECT (count(*) as ?cnt) { ?s ex:servedAs ex:POTUS }
>> Example 5: The following query returns ?cnt=1 due to the presence of
>> the isAuto() filter:
>> --> SELECT (count(*) as ?cnt) { ?s ex:servedAs ex:POTUS | ?n . FILTER
>> ( isAuto(?n) ) }
>> Example 6: The following query returns ?minStartYr = 1885, ?maxEndYr
>> = 1897:
>> --> SELECT (min(?startYr) as ?minStartYr) (max(?endYr) as ?maxEndYr)
>>         { ?s ex:servedAs ex:POTUS | ?n .
>>            ?n ex:startYear ?startYr ; ex:endYear ?endYr } 
>> A custom-named statement is considered unasserted unless an
>> auto-named statement exists with the same <s, p, o, g>. This has
>> implications for SPARQL query processing. A new triple-pattern
>> format that uses the << ... >> enclosure is introduced in SPARQL to
>> indicate whether matching with unasserted statements is allowed.
>> Example 7: Consider the following data, which consists of just a
>> single custom-named statement. Since no auto-named statement with
>> <s, p, o, g> equal to <ex:bob, ex:fatherOf, ex:john, null> is
>> present, the custom-named statement is considered unasserted. The
>> first query below looks for matches with asserted statements only
>> and hence returns no results. The second query, on the other hand,
>> also considers unasserted statements (due to the use of the
>> << ... >> enclosure for the triple pattern) and returns the result:
>> ?dad = ex:bob, ?kid = ex:john.
>> DATA:
>> --> ex:bob ex:fatherOf ex:john | ex:cname1 .
>> QUERY 1:
>> --> SELECT ?dad ?kid { ?dad ex:fatherOf ?kid }
>> QUERY 2: 
>> --> SELECT ?dad ?kid { << ?dad ex:fatherOf ?kid >> }
>> A few other relevant points:
>> - For cross-system sharing of query results, include a list
>>   containing <s, p, o, g, n> for each auto-generated name n that is
>>   (directly or indirectly) included in the result. This is necessary
>>   because triplestores have full autonomy in implementing the
>>   function foo used for generating auto-names; given the same
>>   <s, p, o, g>, two different triplestores could therefore generate
>>   two different auto-names. Hence, the recipient needs to know the
>>   <s, p, o, g> corresponding to each auto-name returned (or
>>   indirectly involved) in the result in order to generate the
>>   appropriate auto-name for its local use.
>> - Statement-Set: This can be done by having multiple distinct
>>   <s, p, o, g> share the same custom-name. While the advantage over
>>   named graphs is that statements from distinct graphs (or the
>>   default graph) can form a group, a disadvantage would be that
>>   auto-named statements cannot be part of a (non-singleton)
>>   statement-set.
>> - Ref. Transparency vs. Opacity: The current idea of "opaque by
>>   default and transparent in case TEPs are involved" would work fine
>>   for RDFn too.
>> Based on the above outline, I'd argue that the use of RDFn to
>> support the desired extensions to RDF would also satisfy some of the
>> practical constraints that are critical for adoption by enterprises,
>> specifically:
>> - full backward compatibility for RDF 1.1 data (each RDF 1.1
>>   statement becomes an auto-named (asserted) statement in RDFn)
>> - continued validity of pre-existing SPARQL 1.1 queries even as data
>>   evolves to include more expressive content by taking advantage of
>>   new capabilities to include statements about statements and
>>   multi-edges
>> - minimization of the custom naming burden on the user, because
>>   custom names are needed only for those cases where multi-edges or
>>   (non-singleton) statement-sets are involved
>> Thanks,
>> Souri.
>>
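
For readers who want to experiment with the semantics outlined in the quoted message, the following Python sketch models RDFn statements as 5-tuples <s, p, o, g, n> and replays Examples 4, 5 and 7. It is only an illustration under stated assumptions: the hash-based auto_name function is a hypothetical stand-in for the implementation-specific function foo, and the helpers auto, custom, is_asserted and match are invented names that are part of neither RDFn nor RDF-star.

```python
from dataclasses import dataclass
from hashlib import sha256
from typing import List, Optional

# Reserved namespace for auto-generated names (taken from the outline).
AUTO_NS = "rdfnAuto:"


def auto_name(s: str, p: str, o: str, g: Optional[str] = None) -> str:
    """Hypothetical stand-in for 'foo': derive a name from <s, p, o, g>."""
    key = "\x1f".join([s, p, o, g or ""])
    return AUTO_NS + sha256(key.encode("utf-8")).hexdigest()[:16]


@dataclass(frozen=True)
class Statement:
    s: str
    p: str
    o: str
    g: Optional[str]  # None plays the role of NULL (no graph name)
    n: str            # the statement's name


def auto(s: str, p: str, o: str, g: Optional[str] = None) -> Statement:
    """Build an auto-named statement."""
    return Statement(s, p, o, g, auto_name(s, p, o, g))


def custom(s: str, p: str, o: str, name: str, g: Optional[str] = None) -> Statement:
    """Build a custom-named statement; the name must not use rdfnAuto."""
    if name.startswith(AUTO_NS):
        raise ValueError("custom names must not use the rdfnAuto namespace")
    return Statement(s, p, o, g, name)


def is_auto(name: str) -> bool:
    """Assumed reading of the isAuto() filter: test the name's namespace."""
    return name.startswith(AUTO_NS)


def is_asserted(st: Statement, data: List[Statement]) -> bool:
    """Asserted iff an auto-named statement with the same <s, p, o, g> exists."""
    return any(
        is_auto(x.n) and (x.s, x.p, x.o, x.g) == (st.s, st.p, st.o, st.g)
        for x in data
    )


def match(data: List[Statement], p: str, include_unasserted: bool = False) -> List[Statement]:
    """Match on predicate; include_unasserted mimics the << ... >> pattern."""
    return [
        st for st in data
        if st.p == p and (include_unasserted or is_asserted(st, data))
    ]


# Data from Examples 2 and 3: one auto-named and one custom-named statement.
cleveland = [
    auto("ex:Cleveland", "ex:servedAs", "ex:POTUS"),
    custom("ex:Cleveland", "ex:servedAs", "ex:POTUS", "ex:term2"),
]
# Data from Example 7: a single custom-named (hence unasserted) statement.
family = [custom("ex:bob", "ex:fatherOf", "ex:john", "ex:cname1")]

terms = match(cleveland, "ex:servedAs")
print(len(terms))                                    # Example 4: 2
print(len([st for st in terms if is_auto(st.n)]))    # Example 5: 1
print(len(match(family, "ex:fatherOf")))             # Example 7, query 1: 0
print(len(match(family, "ex:fatherOf", True)))       # Example 7, query 2: 1
```

Because auto_name here is just one arbitrary choice, two systems using different functions would produce different names for the same <s, p, o, g>, which is the reason the outline asks for the full 5-tuples to accompany shared query results.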
Received on Monday, 27 November 2023 10:46:00 UTC