Re: An outline of RDFn -- RDF with (auto- and custom-) names

Hi Souri,

On Thu, 2023-11-30 at 03:05 +0000, Souripriya Das wrote:
> Hi Olaf,
> 
> There are two kinds of triples in RDFn: 1) auto-named and 2) custom-
> named.

I see. A question then is: how would this distinction between these two
kinds of triples be captured in the abstract syntax of the data model?
So far, RDF has the concept of an RDF triple, the concept of an RDF
graph (defined to be a set of RDF triples), and the concept of an RDF
dataset (defined to be a collection of (name,graph) pairs plus an
additional graph called the default graph of the dataset). How do you
intend to extend these concepts, or to augment them with additional
concepts, in order to capture the aforementioned distinction?

Or, do you think that this distinction does not actually need to be
captured in the abstract syntax of the data model?
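
One way to make the question concrete is the following minimal Python sketch of the three existing concepts, plus one possible (purely hypothetical) way the distinction could be made explicit. The names Triple, Dataset, NamedStatement and the auto flag are assumptions for illustration only; they are not part of RDF 1.1, RDF-star, or the RDFn outline.

```python
from dataclasses import dataclass, field
from typing import Optional

Term = str  # simplification: IRIs, blank nodes and literals are plain strings here


@dataclass(frozen=True)
class Triple:
    """An RDF triple (subject, predicate, object)."""
    s: Term
    p: Term
    o: Term


# An RDF graph is a set of triples.
Graph = frozenset


@dataclass
class Dataset:
    """An RDF dataset: a default graph plus a collection of (name, graph) pairs."""
    default_graph: frozenset = frozenset()
    named_graphs: dict = field(default_factory=dict)


@dataclass(frozen=True)
class NamedStatement:
    """HYPOTHETICAL extension: a triple with a name and an auto/custom marker."""
    triple: Triple
    name: Term
    auto: bool                    # True = auto-named, False = custom-named
    graph: Optional[Term] = None  # graph name, None outside named graphs


t = Triple("ex:Cleveland", "ex:servedAs", "ex:POTUS")

# In a plain RDF graph the same triple can occur only once.
plain_graph: Graph = frozenset({t, t})

# With names attached, two custom-named statements over the same triple coexist.
named_statements = {
    NamedStatement(t, "ex:term1", auto=False),
    NamedStatement(t, "ex:term2", auto=False),
}

print(len(plain_graph), len(named_statements))
```

The point of the sketch is only that a set of triples cannot hold the two Cleveland terms apart, so some additional concept seems to be needed if the distinction is to live in the abstract syntax.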

Thanks,
Olaf


> For a given <s,p,o>, the auto-named one can be used as a type and all
> of the corresponding custom-named ones as tokens.
> 
> Example:
>   ex:Cleveland ex:servedAs ex:POTUS | rdfnAuto:type .    # type (using a locally-scoped placeholder for the auto-generated name)
>   ex:Cleveland ex:servedAs ex:POTUS | ex:term1 .         # token 1
>   ex:Cleveland ex:servedAs ex:POTUS | ex:term2 .         # token 2
> 
>   rdfnAuto:type ex:occurredBefore ex:WorldWar1 .         # subject is a "quoted triple" (uses placeholder for auto-named triple above)
>   ex:term1 ex:startYear 1885 ; ex:endYear 1889 .         # subject is the name of token 1 above
>   ex:term2 ex:startYear 1893 ; ex:endYear 1897 .         # subject is the name of token 2 above
> 
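The type/token reading of this example can be sketched in Python. This is only an illustration: the tuple layout, the is_auto flag, and the function name group_types_and_tokens are assumptions, and rdfnAuto:type is the placeholder from the example rather than a real generated name.

```python
from collections import defaultdict

# (subject, predicate, object, name, is_auto) rows mirroring the example.
statements = [
    ("ex:Cleveland", "ex:servedAs", "ex:POTUS", "rdfnAuto:type", True),
    ("ex:Cleveland", "ex:servedAs", "ex:POTUS", "ex:term1", False),
    ("ex:Cleveland", "ex:servedAs", "ex:POTUS", "ex:term2", False),
]


def group_types_and_tokens(rows):
    """Map each <s, p, o> to its auto name (the type) and its custom names (the tokens)."""
    grouped = defaultdict(lambda: {"type": None, "tokens": []})
    for s, p, o, name, is_auto in rows:
        entry = grouped[(s, p, o)]
        if is_auto:
            entry["type"] = name
        else:
            entry["tokens"].append(name)
    return dict(grouped)


result = group_types_and_tokens(statements)
for spo, entry in result.items():
    print(spo, entry)
```

Under this reading, a statement about rdfnAuto:type (such as ex:occurredBefore) applies to the type, while the start and end years attach to the individual tokens.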
> Thanks,
> Souri.
> 
> From: Olaf Hartig <olaf.hartig@liu.se>
> Sent: Tuesday, November 28, 2023 3:15 AM
> To: public-rdf-star-wg@w3.org <public-rdf-star-wg@w3.org>; Souripriya
> Das <souripriya.das@oracle.com>
> Subject: [External] : Re: An outline of RDFn -- RDF with (auto- and
> custom-) names
>  
> Hi Souri,
> 
> On Mon, 2023-11-27 at 18:19 +0000, Souripriya Das wrote:
> > Hi Olaf,
> > 
> > Thanks for your comments. Let me take one comment at a time, just to
> > make sure that we are on the same page before moving to the next
> > comment.
> 
> Makes sense :-)
> 
> > I am not sure if you noticed the '+ explicit naming ...' in the
> > following. All I was trying to say, talking as a practitioner, is
> > that if you extend RDF-star by adding ("+") the idea of "explicit
> > naming (using IRIs as custom names)", you can arrive at RDFn. [...]
> 
> While I saw the "+ explicit naming ..." part of that bullet point, due
> to the parenthetical in between, I did indeed *not* notice that this was
> meant to be a term of the equation in that bullet point. Thanks for the
> clarification!
> 
> > > > > 1) RDFn = RDF-star (which, I think, uses implicit naming in
> > > > > some sense, with << s p o >> as the name) + explicit naming
> > > > > (using IRIs as custom names).
> > 
> > Please let me know how you feel about the above statement (and
> > whether it is simple enough for a practitioner to get the basic
> > idea). If we agree on this, we can move to your other comments.
> 
> Now that I understand the complete equation illustrated by this bullet
> point, I agree that a practitioner may get the idea from it. (Yet, I
> would suggest you remove the first parenthetical because it is
> distracting.)
> 
> Having said that, I still think the (complete) equation in this bullet
> point is incorrect in terms of the details. As you may remember,
> regarding the type/token distinction, quoted triples in RDF-star are
> considered as types, not as tokens. In contrast, RDFn is about tokens
> [2]. As a consequence, one can talk about types (of triples) in
> RDF-star but not in RDFn. Therefore, the equation
> 
>     RDFn = RDF-star + explicit naming
> 
> cannot be right.
> 
> Best,
> Olaf
> 
> 
> [1] https://plato.stanford.edu/entries/types-tokens/
> 
> [2] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Oct/0106.html
> 
> 
> > Thanks,
> > Souri.
> > From: Olaf Hartig <olaf.hartig@liu.se>
> > Sent: Monday, November 27, 2023 7:40 AM
> > To: tl@rat.io <tl@rat.io>; public-rdf-star-wg@w3.org
> > <public-rdf-star-wg@w3.org>; Souripriya Das <souripriya.das@oracle.com>
> > Subject: [External] : Re: An outline of RDFn -- RDF with (auto- and
> > custom-) names
> >  
> > Hi Thomas,
> > 
> > How do you know that RDFn is about tokens? I have not seen Souri making
> > any explicit statements in this direction.
> > 
> > Also, it is not correct to say that "both approaches add a fifth
> > element to the subject, predicate, object and graph that we already
> > have."  RDF-star does not add a fifth element. Strictly speaking,
> > RDF-star does not even have "graph" as a fourth element--there is no
> > notion of a quad in the abstract syntax of RDF-star (and neither is
> > there any such notion in the abstract syntax of RDF). Instead, RDF-star
> > is about
> > i) triples (which may be nested),
> > ii) graphs as sets of such triples, and
> > iii) datasets as collections of (IRI/bnode, graph) pairs, with an
> > additional graph called the default graph.
> > That is all there is in RDF-star. Adding "a fifth element" (as RDFn
> > seems to do) requires extending the abstract syntax with additional
> > concepts, and that's why "RDFn = RDF-star" is not true.
> > 
> > Olaf
> > 
> > 
> > On Mon, 2023-11-27 at 11:45 +0100, Thomas Lörtsch wrote:
> > > Olaf,
> > >
> > > you should acknowledge that RDF-star is only defined on types of
> > > statements, but actual use cases in their overwhelming majority
> > > (including the "seminal example" that you wrongly used in your papers
> > > on RDF*) work on tokens. A mechanism to define a reference to such a
> > > token is mentioned in the RDF-star CG report only in the most
> > > informal way possible.
> > >
> > > Ergo the RDF-star formalization is irrelevant in practice, and for
> > > actual practical applications using some derivative of
> > > ':occurrenceOf' no formalization exists, not even an informal
> > > standard vocabulary - years after the problem has been pointed out to
> > > the CG. In that respect RDFn is definitely one or two steps ahead of
> > > RDF-star.
> > >
> > > Also, I think Souri is right in another way: RDFn provides quins out
> > > of the box, RDF-star handwavingly resorts to out-of-band means to
> > > define a token identifier, but both approaches add a fifth element to
> > > the subject, predicate, object and graph that we already have.
> > > However, that is a problem with both RDFn and RDF-star that a named
> > > graph based approach can avoid, to the considerable benefit of
> > > implementors as well as users.
> > >
> > > Thomas
> > >
> > > On 27 November 2023 at 10:34:45 CET, Olaf Hartig
> > > <olaf.hartig@liu.se> wrote:
> > > > Hi Souri,
> > > >
> > > > I don't think your claim that "RDFn = RDF-star" is true (assuming
> > > > "=" means something like: is the same as).
> > > >
> > > > In your previous email you introduce the notion of an "RDFn
> > > > statement" about which you say the following.
> > > >
> > > > """
> > > > An RDFn statement is uniquely identified using the tuple <s, p, o,
> > > > g, n>, where the component n is the "name" of the statement. (The
> > > > components s, p, and o represent the subject, predicate, and
> > > > object, respectively. The component g, representing graph name, is
> > > > non-NULL only for quads and will not be used in the examples below.)
> > > > """
> > > >
> > > > First of all, notice that you are not explicitly saying what an
> > > > RDFn statement actually is; you are only saying how it is uniquely
> > > > identified. Moreover, you do not specify what kind of a thing this
> > > > "component n" is (neither do you explicitly say what kinds of
> > > > things the components s, p, o, and g are, respectively). Also, I
> > > > wonder how the notion of "is uniquely identified" would be captured
> > > > explicitly as an extension of the abstract syntax of RDF (or are
> > > > you proposing to change the abstract syntax such that it is based
> > > > on such 5-tuples rather than RDF triples?).
> > > >
> > > > Now, regarding your claim, your notion of an RDFn statement is not
> > > > a concept of RDF-star [1]. Also, among the concepts of RDF-star,
> > > > there is no such thing as what you informally call ''the "name" of
> > > > the statement,'' and neither is there any notion of NULL in
> > > > RDF-star (whereas you seem to assume such a notion for RDFn).
> > > >
> > > > Best,
> > > > Olaf
> > > >
> > > > [1] https://www.w3.org/2021/12/rdf-star.html#concepts
> > > >
> > > >
> > > > On Mon, 2023-11-27 at 03:08 +0000, Souripriya Das wrote:
> > > > > > Since I did not hear any comments on RDFn during the first half
> > > > > > of our last meeting that I was able to attend (except, maybe,
> > > > > > Gregg might have said something right at the beginning but I had
> > > > > > audio issues on my side), I thought it may be helpful to mention
> > > > > > below a few high-level points about RDFn and how it is related
> > > > > > to RDF-star concepts and syntax ("statement" here simply means
> > > > > > "a triple or quad"):
> > > > > >
> > > > > > 1) RDFn = RDF-star (which, I think, uses implicit naming in some
> > > > > > sense, with << s p o >> as the name) + explicit naming (using
> > > > > > IRIs as custom names).
> > > > > >
> > > > > > 2) RDFn (with appropriate syntactic shortcut) would appear
> > > > > > exactly the same as RDF-star to a user who does not use
> > > > > > multi-edges or statement-sets.
> > > > > >
> > > > > > 3) RDFn does not change anything regarding how users work with
> > > > > > default graph and named graphs today.
> > > > > >
> > > > > > 4) RDFn requires use of explicit naming if user needs to store
> > > > > > multi-edges. For modeling multi-edges, user does not need to
> > > > > > introduce new triples or quads with special properties like
> > > > > > :isOccurrenceOf or :hasOccurrence.
> > > > > >
> > > > > > 5) RDFn requires use of explicit naming for modeling
> > > > > > statement-sets as well. A statement-set in RDFn can include
> > > > > > (asserted or unasserted) triples from the default graph and the
> > > > > > named graphs. The custom-name of a statement-set can be used for
> > > > > > making statements about it.
> > > > >
> > > > > Thanks,
> > > > > Souri.
> > > > > From: Souripriya Das <souripriya.das@oracle.com>
> > > > > Sent: Wednesday, November 15, 2023 9:39 PM
> > > > > To: RDF-star WG <public-rdf-star-wg@w3.org>
> > > > > > Subject: [External] : An outline of RDFn -- RDF with (auto- and
> > > > > > custom-) names
> > > > > >
> > > > > > As the group tries to decide on options, the following outline
> > > > > > of a revised version of RDFn may be useful for discussions.
> > > > >
> > > > > > Core concepts and ideas in RDFn:
> > > > > >
> > > > > > An RDFn statement is uniquely identified using the tuple <s, p,
> > > > > > o, g, n>, where the component n is the "name" of the statement.
> > > > > > (The components s, p, and o represent the subject, predicate,
> > > > > > and object, respectively. The component g, representing graph
> > > > > > name, is non-NULL only for quads and will not be used in the
> > > > > > examples below.)
> > > > > >
> > > > > > Example 1: An RDFn statement, with ex:jSm as its name,
> > > > > > representing the tuple <ex:john, ex:spouseOf, ex:mary, null,
> > > > > > ex:jSm>:
> > > > > > --> ex:john ex:spouseOf ex:mary | ex:jSm .
> > > > > >
> > > > > > Based on how its name was created, a statement can belong to
> > > > > > one of two possible types:
> > > > > >
> > > > > > auto-named: The name n for an auto-named statement <s, p, o, g,
> > > > > > n> is computed as rdfnAuto:foo(s, p, o, g), where rdfnAuto is an
> > > > > > exclusive namespace used only for names of auto-named
> > > > > > statements, and foo is an implementation-specific function that
> > > > > > generates a unique string from the <s, p, o, g> portion of the
> > > > > > statement.
> > > > > >
> > > > > > custom-named: The name of a custom-named statement is an IRI
> > > > > > that is supplied by the data creator. (The IRI cannot have
> > > > > > rdfnAuto as its namespace prefix.)
> > > > > >
> > > > > > The name of a statement may be used as subject or object of
> > > > > > other statements as long as there is no direct or indirect
> > > > > > self-recursion involving the name (e.g., <n, p, o, g, n> is not
> > > > > > allowed because n has to be computed using n).
> > > > > >
> > > > > > Example 2: Adding statements about an auto-named statement
> > > > > > (using a placeholder for the auto-generated name):
> > > > > > --> ex:Cleveland ex:servedAs ex:POTUS | rdfnAuto:term1 .
> > > > > > --> rdfnAuto:term1 ex:startYear 1885 ; ex:endYear 1889 .
> > > > > >
> > > > > > Example 3: Adding statements about a custom-named statement:
> > > > > > --> ex:Cleveland ex:servedAs ex:POTUS | ex:term2 .
> > > > > > --> ex:term2 ex:startYear 1893 ; ex:endYear 1897 .
> > > > > >
> > > > > > Core concepts and ideas in SPARQLn:
> > > > > >
> > > > > > A new filter isAuto(<name>) is introduced to allow
> > > > > > distinguishing between auto-named and custom-named statements.
> > > > > > If this filter is not used, all statements will qualify,
> > > > > > regardless of whether they are auto-named or custom-named,
> > > > > > provided they match regular SPARQL criteria.
> > > > > >
> > > > > > Example 4: The following query returns ?cnt = 2 if the data
> > > > > > about both of President Cleveland's terms (from Example 2 and
> > > > > > Example 3 above) are present in the RDF dataset:
> > > > > > --> SELECT (count(*) as ?cnt) { ?s ex:servedAs ex:POTUS }
> > > > > >
> > > > > > Example 5: The following query returns ?cnt = 1 due to the
> > > > > > presence of the isAuto() filter:
> > > > > > --> SELECT (count(*) as ?cnt) { ?s ex:servedAs ex:POTUS | ?n .
> > > > > >         FILTER ( isAuto(?n) ) }
> > > > > >
> > > > > > Example 6: The following query returns ?minStartYr = 1885,
> > > > > > ?maxEndYr = 1897:
> > > > > > --> SELECT (min(?startYr) as ?minStartYr) (max(?endYr) as ?maxEndYr)
> > > > > >         { ?s ex:servedAs ex:POTUS | ?n .
> > > > > >            ?n ex:startYear ?startYr ; ex:endYear ?endYr }
> > > > > >
> > > > > > A custom-named statement is considered as unasserted unless an
> > > > > > auto-named statement exists with the same <s, p, o, g>. This has
> > > > > > implications in SPARQL query processing. A new triple-pattern
> > > > > > format, that uses the << ... >> enclosure, is introduced in
> > > > > > SPARQL to indicate whether matching with unasserted statements
> > > > > > is allowed.
> > > > > >
> > > > > > Example 7: Consider the following data that consists of just a
> > > > > > single custom-named statement. Since there is no auto-named
> > > > > > statement with <s, p, o, g> as <ex:bob, ex:fatherOf, ex:john,
> > > > > > null> present, the custom-named statement is considered as
> > > > > > unasserted. The first query below is looking for matches with
> > > > > > asserted statements only and hence will return no results. The
> > > > > > second query, on the other hand, is open to considering
> > > > > > unasserted statements as well (due to the use of the << ... >>
> > > > > > enclosure for the triple-pattern) and will return the result:
> > > > > > ?dad = ex:bob, ?kid = ex:john.
> > > > > DATA:
> > > > > --> ex:bob ex:fatherOf ex:john | ex:cname1 .
> > > > > QUERY 1:
> > > > > --> SELECT ?dad ?kid { ?dad ex:fatherOf ?kid }
> > > > > QUERY 2:
> > > > > --> SELECT ?dad ?kid { << ?dad ex:fatherOf ?kid >> }
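The asserted/unasserted rule of Example 7 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not how any RDFn store is implemented: the Stmt record, the auto flag, and the match function with its include_unasserted switch (standing in for the << ... >> enclosure) are all hypothetical names.

```python
from typing import NamedTuple, Optional


class Stmt(NamedTuple):
    """Hypothetical record for an RDFn statement <s, p, o, g, n>."""
    s: str
    p: str
    o: str
    g: Optional[str]  # graph name, None when not in a named graph
    n: str            # statement name
    auto: bool        # True if the name is auto-generated


def match(store, predicate, include_unasserted=False):
    """Return (subject, object) pairs of statements with the given predicate.

    A statement counts as asserted only if an auto-named statement with the
    same <s, p, o, g> exists. include_unasserted=True plays the role of the
    << ... >> enclosure in the outline.
    """
    asserted = {(st.s, st.p, st.o, st.g) for st in store if st.auto}
    results = []
    for st in store:
        if st.p != predicate:
            continue
        if include_unasserted or (st.s, st.p, st.o, st.g) in asserted:
            results.append((st.s, st.o))
    return results


# DATA from Example 7: a single custom-named statement.
store = [Stmt("ex:bob", "ex:fatherOf", "ex:john", None, "ex:cname1", auto=False)]

query1 = match(store, "ex:fatherOf")                           # like QUERY 1
query2 = match(store, "ex:fatherOf", include_unasserted=True)  # like QUERY 2
print(query1, query2)
```

With only the custom-named statement present, the first call finds nothing and the second returns the ex:bob / ex:john pair, matching the behaviour described for QUERY 1 and QUERY 2.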
> > > > > >
> > > > > > A few other relevant points:
> > > > > >
> > > > > > For cross-system sharing of query results, include a list
> > > > > > containing <s, p, o, g, n> for each auto-generated name n that
> > > > > > is (directly or indirectly) included in the result: This is
> > > > > > necessary due to the fact that triplestores have full autonomy
> > > > > > for implementing the function foo used for generating auto-names
> > > > > > and therefore, given the same <s, p, o, g>, two different
> > > > > > triplestores could generate two different auto-names. Hence, the
> > > > > > recipient needs to know the <s, p, o, g> corresponding to each
> > > > > > auto-name returned (or indirectly involved) in the result to
> > > > > > generate the appropriate auto-name for its local use.
> > > > > >
> > > > > > Statement-Set: This can be done by having multiple distinct <s,
> > > > > > p, o, g> share the same custom-name. While the advantage over
> > > > > > named graphs is that statements from distinct graphs (or default
> > > > > > graph) can form a group, a disadvantage would be that auto-named
> > > > > > statements cannot be part of a (non-singleton) statement-set.
> > > > > >
> > > > > > Ref. Transparency vs. Opacity: The current idea of "opaque by
> > > > > > default and transparent in case TEPs are involved" would work
> > > > > > fine for RDFn too.
> > > > > >
> > > > > > Based on the above outline, I'd argue that use of RDFn to
> > > > > > support the desired extensions to RDF would also satisfy some of
> > > > > > the practical constraints that are critical for adoption by
> > > > > > enterprise, specifically:
> > > > > >
> > > > > > full backward-compatibility for RDF1.1 data (each RDF1.1
> > > > > > statement becomes an auto-named (asserted) statement in RDFn)
> > > > > >
> > > > > > continued validity of pre-existing SPARQL1.1 queries even as
> > > > > > data evolves to include more expressive content by taking
> > > > > > advantage of new capabilities to include statements about
> > > > > > statements and multi-edges
> > > > > >
> > > > > > minimization of the custom naming burden on the user because
> > > > > > custom names are needed only for those cases where multi-edges
> > > > > > or (non-singleton) statement-sets are involved
> > > > > >
> > > > > Thanks,
> > > > > Souri.
> > > > >

Received on Thursday, 30 November 2023 10:23:50 UTC