Re: An outline of RDFn -- RDF with (auto- and custom-) names from Olaf Hartig on 2023-11-27 (public-rdf-star-wg@w3.org from November 2023)

From: Olaf Hartig <olaf.hartig@liu.se>
Date: Mon, 27 Nov 2023 09:34:45 +0000
To: "public-rdf-star-wg@w3.org" <public-rdf-star-wg@w3.org>, "souripriya.das@oracle.com" <souripriya.das@oracle.com>
Message-ID: <8967552924c7fc0b034b923df06f1192f79160d6.camel@liu.se>

Hi Souri,

I don't think your claim that "RDFn = RDF-star" is true (assuming "="
means something like: is the same as).

In your previous email you introduce the notion of an "RDFn statement"
about which you say the following.

"""
An RDFn statement is uniquely identified using the tuple <s, p, o, g,
n>, where the component n is the "name" of the statement. (The
components s, p, and o represent the subject, predicate, and object,
respectively. The component g, representing graph name, is non-NULL
only for quads and will not be used in the examples below.)
"""

First of all, notice that you are not explicitly saying what an RDFn
statement actually is; you are only saying how it is uniquely
identified. Moreover, you do not specify what kind of thing this
"component n" is (nor do you explicitly say what kinds of things
the components s, p, o, and g are, respectively). Also, I wonder how
the notion of "is uniquely identified" would be captured explicitly as
an extension of the abstract syntax of RDF (or are you proposing to
change the abstract syntax such that it is based on such 5-tuples
rather than on RDF triples?).
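
To make the question concrete, here is a minimal Python sketch of the two
readings. It is only an illustration under my own assumptions: the class
names Triple and Statement5 are hypothetical, terms are plain strings, and
None stands in for the NULL graph component. None of it is taken from the
RDF or RDF-star documents.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """An RDF triple (terms simplified to plain strings for this sketch)."""
    s: str
    p: str
    o: str


class Statement5(NamedTuple):
    """Hypothetical 5-tuple reading of an "RDFn statement"."""
    s: str
    p: str
    o: str
    g: Optional[str]  # None plays the role of NULL (no graph name)
    n: str            # the "name" of the statement


# An RDF graph is a set of triples, so the same triple cannot occur twice.
rdf_graph = {
    Triple("ex:Cleveland", "ex:servedAs", "ex:POTUS"),
    Triple("ex:Cleveland", "ex:servedAs", "ex:POTUS"),
}

# A set of 5-tuples keeps both entries because their names differ.
rdfn_statements = {
    Statement5("ex:Cleveland", "ex:servedAs", "ex:POTUS", None, "rdfnAuto:term1"),
    Statement5("ex:Cleveland", "ex:servedAs", "ex:POTUS", None, "ex:term2"),
}

print(len(rdf_graph), len(rdfn_statements))
```

If the abstract syntax were based on the second structure, its basic
building block would no longer be the RDF triple.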

Now, regarding your claim, your notion of an RDFn statement is not a
concept of RDF-star [1]. Also, among the concepts of RDF-star, there is
no such thing as what you informally call ''the "name" of the
statement,'' and neither is there any notion of NULL in RDF-star
(whereas you seem to assume such a notion for RDFn).

Best,
Olaf

[1] https://www.w3.org/2021/12/rdf-star.html#concepts



On Mon, 2023-11-27 at 03:08 +0000, Souripriya Das wrote:
> Since I did not hear any comments on RDFn during the first half of
> our last meeting that I was able to attend (except, maybe, Gregg
> might have said something right at the beginning but I had audio
> issues on my side), I thought it may be helpful to mention below a
> few high-level points about RDFn and how it is related to RDF-star
> concepts and syntax: ("statement" here simply means "a triple or
> quad"):
> 
> 1) RDFn = RDF-star (which, I think, uses implicit naming in some
> sense, with << s p o >> as the name) + explicit naming (using IRIs as
> custom names).
> 
> 2) RDFn (with appropriate syntactic shortcut) would appear exactly
> the same as RDF-star to a user who does not use multi-edges or
> statement-sets.
> 
> 3) RDFn does not change anything regarding how users work with
> default graph and named graphs today.
> 
> 4) RDFn requires use of explicit naming if user needs to store multi-
> edges. For modeling multi-edges, user does not need to introduce new
> triples or quads with special properties like :isOccurrenceOf or
> :hasOccurrence.
> 
> 5) RDFn requires use of explicit naming for modeling statement-sets
> as well. A statement-set in RDFn can include (asserted or unasserted)
> triples from the default graph and the named graphs. The custom-name
> of a statement-set can be used for making statements about it.
> 
> Thanks,
> Souri.
> From: Souripriya Das <souripriya.das@oracle.com>
> Sent: Wednesday, November 15, 2023 9:39 PM
> To: RDF-star WG <public-rdf-star-wg@w3.org>
> Subject: [External] : An outline of RDFn -- RDF with (auto- and
> custom-) names
>  
> As the group tries to decide on options, the following outline of a
> revised version of RDFn may be useful for discussions.
>  
> Core concepts and ideas in RDFn: 
> An RDFn statement is uniquely identified using the tuple <s, p, o, g,
> n>, where the component n is the "name" of the statement. (The
> components s, p, and o represent the subject, predicate, and object,
> respectively. The component g, representing graph name, is non-NULL
> only for quads and will not be used in the examples below.)
> Example 1: An RDFn statement, with ex:jSm as its name, representing
> the tuple <ex:john, ex:spouseOf, ex:mary, null, ex:jSm>:
> --> ex:john ex:spouseOf ex:mary | ex:jSm .
> Based on how its name was created, a statement can belong to one of
> two possible types:
> - auto-named: The name n for an auto-named statement <s, p, o, g, n>
>   is computed as rdfnAuto:foo(s, p, o, g), where
>   - rdfnAuto is an exclusive namespace used only for the names of
>     auto-named statements, and
>   - foo is an implementation-specific function that generates a
>     unique string from the <s, p, o, g> portion of the statement.
> - custom-named: The name of a custom-named statement is an IRI that
>   is supplied by the data creator. (The IRI cannot have rdfnAuto as
>   its namespace prefix.)
> The name of a statement may be used as subject or object of other
> statements as long as there is no direct or indirect self-recursion
> involving the name (e.g., <n, p, o, g, n> is not allowed because n
> has to be computed using n).
> Example 2: Adding statements about an auto-named statement (using
> placeholder for the auto-generated name):
> --> ex:Cleveland ex:servedAs ex:POTUS | rdfnAuto:term1 .
> --> rdfnAuto:term1 ex:startYear 1885 ; ex:endYear 1889 .
> Example 3: Adding statements about a custom-named statement:
> --> ex:Cleveland ex:servedAs ex:POTUS | ex:term2 .
> --> ex:term2 ex:startYear 1893 ; ex:endYear 1897 .
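
A rough sketch of how the naming rule quoted above could be realized,
purely as an illustration. The outline leaves foo implementation-specific,
so the SHA-256 hash used here is just one possible choice; the namespace
IRI RDFN_AUTO and the helper names auto_name, is_auto, and
check_custom_name are hypothetical.

```python
import hashlib
from typing import Optional

# Hypothetical namespace IRI; the outline only says rdfnAuto is reserved.
RDFN_AUTO = "http://example.org/rdfnAuto#"


def auto_name(s: str, p: str, o: str, g: Optional[str] = None) -> str:
    """One possible 'foo': derive a name deterministically from <s, p, o, g>."""
    # The unit separator keeps the four components apart; "" stands for NULL g.
    key = "\x1f".join([s, p, o, g or ""])
    return RDFN_AUTO + hashlib.sha256(key.encode("utf-8")).hexdigest()


def is_auto(name: str) -> bool:
    """Treat a name as auto-generated iff it lies in the reserved namespace."""
    return name.startswith(RDFN_AUTO)


def check_custom_name(name: str) -> str:
    """Reject custom names that try to use the reserved namespace."""
    if is_auto(name):
        raise ValueError("custom names must not use the rdfnAuto namespace")
    return name


term1 = auto_name("ex:Cleveland", "ex:servedAs", "ex:POTUS")
term2 = check_custom_name("http://example.org/term2")
```

Because the name is a pure function of <s, p, o, g>, this particular foo
yields exactly one auto-name per distinct <s, p, o, g>.
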
> Core concepts and ideas in SPARQLn:
> A new filter isAuto(<name>) is introduced to allow distinguishing
> between auto-named and custom-named statements. If this filter is not
> used, all statements will qualify, regardless of whether they are
> auto-named or custom-named, provided they match regular SPARQL
> criteria.
> Example 4: The following query returns ?cnt = 2 if the data about
> both of President Cleveland's terms (from Example 2 and Example 3
> above) is present in the RDF dataset:
> --> SELECT (count(*) as ?cnt) { ?s ex:servedAs ex:POTUS }
> Example 5: The following query returns ?cnt=1 due to the presence of
> the isAuto() filter:
> --> SELECT (count(*) as ?cnt) { ?s ex:servedAs ex:POTUS | ?n . FILTER
> ( isAuto(?n) ) }
> Example 6: The following query returns ?minStartYr = 1885, ?maxEndYr
> = 1897:
> --> SELECT (min(?startYr) as ?minStartYr) (max(?endYr) as ?maxEndYr)
>         { ?s ex:servedAs ex:POTUS | ?n .
>            ?n ex:startYear ?startYr ; ex:endYear ?endYr } 
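
The expected answers of Examples 4 to 6 can be checked with a small
in-memory emulation. This is only a sketch under stated assumptions:
statements are modelled as 5-tuples, the names rdfnAuto:y1 to rdfnAuto:y4
for the annotation statements are made up, is_auto is approximated by a
prefix test, and the asserted/unasserted distinction is ignored here.

```python
from typing import NamedTuple, Optional


class Stmt(NamedTuple):
    s: str
    p: str
    o: object
    g: Optional[str]
    n: str


def is_auto(name: str) -> bool:
    # Assumption: auto-generated names are recognizable by their prefix.
    return name.startswith("rdfnAuto:")


# Data from Examples 2 and 3 (names y1..y4 are invented for this sketch).
data = [
    Stmt("ex:Cleveland", "ex:servedAs", "ex:POTUS", None, "rdfnAuto:term1"),
    Stmt("rdfnAuto:term1", "ex:startYear", 1885, None, "rdfnAuto:y1"),
    Stmt("rdfnAuto:term1", "ex:endYear", 1889, None, "rdfnAuto:y2"),
    Stmt("ex:Cleveland", "ex:servedAs", "ex:POTUS", None, "ex:term2"),
    Stmt("ex:term2", "ex:startYear", 1893, None, "rdfnAuto:y3"),
    Stmt("ex:term2", "ex:endYear", 1897, None, "rdfnAuto:y4"),
]

served = [st for st in data if st.p == "ex:servedAs" and st.o == "ex:POTUS"]

# Example 4: count every matching statement.
cnt_all = len(served)

# Example 5: keep only matches whose name passes isAuto().
cnt_auto = sum(1 for st in served if is_auto(st.n))

# Example 6: join each statement name with its start and end years.
names = {st.n for st in served}
start_years = [st.o for st in data if st.s in names and st.p == "ex:startYear"]
end_years = [st.o for st in data if st.s in names and st.p == "ex:endYear"]
min_start_yr, max_end_yr = min(start_years), max(end_years)

print(cnt_all, cnt_auto, min_start_yr, max_end_yr)
```
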
> A custom-named statement is considered unasserted unless an auto-
> named statement exists with the same <s, p, o, g>. This has
> implications in SPARQL query processing. A new triple-pattern format
> that uses the << ... >> enclosure is introduced in SPARQL to
> indicate whether matching with unasserted statements is allowed.
> Example 7: Consider the following data that consists of just a single
> custom-named statement. Since there is no auto-named statement with
> <s, p, o, g> as <ex:bob, ex:fatherOf, ex:john, null> present, the
> custom-named statement is considered as unasserted. The first query
> below is looking for match with asserted statements only and hence
> will return no results. The second query on the other hand is open to
> considering unasserted statements as well (due to the use of the <<
> ...>> enclosure for the triple-pattern) and will return the result:
> ?dad = ex:bob, ?kid = ex:john. 
> DATA:
> --> ex:bob ex:fatherOf ex:john | ex:cname1 .
> QUERY 1:
> --> SELECT ?dad ?kid { ?dad ex:fatherOf ?kid }
> QUERY 2: 
> --> SELECT ?dad ?kid { << ?dad ex:fatherOf ?kid >> }
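
The asserted/unasserted rule of Example 7 can be sketched as follows.
This is an illustration only: the 5-tuple model, the prefix-based
is_auto test, and the function names is_asserted and match_father_of are
assumptions of the sketch, not part of the outline.

```python
from typing import List, NamedTuple, Optional, Tuple


class Stmt(NamedTuple):
    s: str
    p: str
    o: str
    g: Optional[str]
    n: str


def is_auto(name: str) -> bool:
    # Assumption: auto-generated names are recognizable by their prefix.
    return name.startswith("rdfnAuto:")


def is_asserted(st: Stmt, data: List[Stmt]) -> bool:
    """Asserted iff auto-named, or an auto-named twin with the same <s, p, o, g> exists."""
    return is_auto(st.n) or any(
        is_auto(other.n) and other[:4] == st[:4] for other in data
    )


def match_father_of(data: List[Stmt], include_unasserted: bool) -> List[Tuple[str, str]]:
    """Emulate { ?dad ex:fatherOf ?kid } with or without the << ... >> enclosure."""
    return [
        (st.s, st.o)
        for st in data
        if st.p == "ex:fatherOf" and (include_unasserted or is_asserted(st, data))
    ]


data = [Stmt("ex:bob", "ex:fatherOf", "ex:john", None, "ex:cname1")]

query1 = match_father_of(data, include_unasserted=False)  # plain pattern
query2 = match_father_of(data, include_unasserted=True)   # << ... >> pattern
print(query1, query2)
```
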
> A few other relevant points:
> - For cross-system sharing of query results, include a list
>   containing <s, p, o, g, n> for each auto-generated name n that is
>   (directly or indirectly) included in the result. This is necessary
>   due to the fact that triplestores have full autonomy for
>   implementing the function foo used for generating auto-names and
>   therefore, given the same <s, p, o, g>, two different triplestores
>   could generate two different auto-names. Hence, the recipient needs
>   to know the <s, p, o, g> corresponding to each auto-name returned
>   (or indirectly involved) in the result to generate the appropriate
>   auto-name for its local use.
> - Statement-Set: This can be done by having multiple distinct <s, p,
>   o, g> share the same custom-name. While the advantage over named
>   graphs is that statements from distinct graphs (or default graph)
>   can form a group, a disadvantage would be that auto-named
>   statements cannot be part of a (non-singleton) statement-set.
> - Ref. Transparency vs. Opacity: The current idea of "opaque by
>   default and transparent in case TEPs are involved" would work fine
>   for RDFn too.
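
The cross-system sharing point can be illustrated with a small sketch in
which two stores use different foo functions. Everything here is assumed
for illustration: sender_foo and recipient_foo stand for two arbitrary
implementations (SHA-1 and SHA-256 digests, truncated), and translate is
a hypothetical helper on the recipient side.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

Spog = Tuple[str, str, str, Optional[str]]


def _key(spog: Spog) -> bytes:
    # "" stands for a NULL graph component.
    return "\x1f".join(part or "" for part in spog).encode("utf-8")


def sender_foo(spog: Spog) -> str:
    """The sending store's (assumed) auto-naming function."""
    return "rdfnAuto:" + hashlib.sha1(_key(spog)).hexdigest()[:12]


def recipient_foo(spog: Spog) -> str:
    """The receiving store's (assumed) auto-naming function."""
    return "rdfnAuto:" + hashlib.sha256(_key(spog)).hexdigest()[:12]


def translate(names: List[str], shipped: Dict[str, Spog],
              local_foo: Callable[[Spog], str]) -> List[str]:
    """Swap each sender auto-name for the local auto-name of the same <s, p, o, g>."""
    return [local_foo(shipped[n]) if n in shipped else n for n in names]


spog: Spog = ("ex:Cleveland", "ex:servedAs", "ex:POTUS", None)
sent_name = sender_foo(spog)

# The result carries names plus the <s, p, o, g> behind each auto-name.
result_names = [sent_name, "ex:term2"]
shipped = {sent_name: spog}

local_names = translate(result_names, shipped, recipient_foo)
print(local_names)
```

Custom names such as ex:term2 pass through unchanged, since they do not
depend on any store-specific function.
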
> Based on the above outline, I'd argue that use of RDFn to support the
> desired extensions to RDF would also satisfy some of the practical
> constraints that are critical for adoption by enterprise,
> specifically:
> - full backward-compatibility for RDF1.1 data (each RDF1.1 statement
>   becomes an auto-named (asserted) statement in RDFn)
> - continued validity of pre-existing SPARQL1.1 queries even as data
>   evolves to include more expressive content by taking advantage of
>   new capabilities to include statements about statements and
>   multi-edges
> - minimization of the custom naming burden on the user because custom
>   names are needed only for those cases where multi-edges or
>   (non-singleton) statement-sets are involved
> Thanks,
> Souri.
>
Received on Monday, 27 November 2023 09:34:56 UTC