- From: Souripriya Das <souripriya.das@oracle.com>
- Date: Mon, 27 Nov 2023 03:08:34 +0000
- To: RDF-star WG <public-rdf-star-wg@w3.org>
- Message-ID: <CY5PR10MB6071B98639916EBD64D691ECFAB8A@CY5PR10MB6071.namprd10.prod.outlook.com>
Since I did not hear any comments on RDFn during the first half of our last meeting that I was able to attend (except, maybe, Gregg might have said something right at the beginning, but I had audio issues on my side), I thought it might be helpful to mention a few high-level points about RDFn and how it relates to RDF-star concepts and syntax. ("Statement" here simply means "a triple or quad".)

1) RDFn = RDF-star (which, I think, uses implicit naming in some sense, with << s p o >> as the name) + explicit naming (using IRIs as custom names).

2) RDFn (with an appropriate syntactic shortcut) would appear exactly the same as RDF-star to a user who does not use multi-edges or statement-sets.

3) RDFn does not change anything about how users work with the default graph and named graphs today.

4) RDFn requires explicit naming if the user needs to store multi-edges. To model multi-edges, the user does not need to introduce new triples or quads with special properties like :isOccurrenceOf or :hasOccurrence.

5) RDFn requires explicit naming for modeling statement-sets as well. A statement-set in RDFn can include (asserted or unasserted) triples from the default graph and the named graphs. The custom name of a statement-set can be used for making statements about it.

Thanks,
Souri.

________________________________
From: Souripriya Das <souripriya.das@oracle.com>
Sent: Wednesday, November 15, 2023 9:39 PM
To: RDF-star WG <public-rdf-star-wg@w3.org>
Subject: [External] : An outline of RDFn -- RDF with (auto- and custom-) names

As the group tries to decide on options, the following outline of a revised version of RDFn may be useful for discussions.

Core concepts and ideas in RDFn:

1. An RDFn statement is uniquely identified by the tuple <s, p, o, g, n>, where the component n is the "name" of the statement. (The components s, p, and o represent the subject, predicate, and object, respectively. The component g, representing the graph name, is non-NULL only for quads and will not be used in the examples below.)

Example 1: An RDFn statement, with ex:jSm as its name, representing the tuple <ex:john, ex:spouseOf, ex:mary, null, ex:jSm>:
--> ex:john ex:spouseOf ex:mary | ex:jSm .

2. Based on how its name was created, a statement belongs to one of two possible types:
* auto-named: The name n of an auto-named statement <s, p, o, g, n> is computed as rdfnAuto:foo(s, p, o, g), where
  * rdfnAuto is an exclusive namespace used only for the names of auto-named statements, and
  * foo is an implementation-specific function that generates a unique string from the <s, p, o, g> portion of the statement.
* custom-named: The name of a custom-named statement is an IRI that is supplied by the data creator. (The IRI cannot have rdfnAuto as its namespace prefix.)

3. The name of a statement may be used as the subject or object of other statements as long as there is no direct or indirect self-recursion involving the name (e.g., <n, p, o, g, n> is not allowed because n would have to be computed using n).

Example 2: Adding statements about an auto-named statement (using a placeholder for the auto-generated name):
--> ex:Cleveland ex:servedAs ex:POTUS | rdfnAuto:term1 .
--> rdfnAuto:term1 ex:startYear 1885 ; ex:endYear 1889 .

Example 3: Adding statements about a custom-named statement:
--> ex:Cleveland ex:servedAs ex:POTUS | ex:term2 .
--> ex:term2 ex:startYear 1893 ; ex:endYear 1897 .

Core concepts and ideas in SPARQLn:

1. A new filter isAuto(<name>) is introduced to distinguish between auto-named and custom-named statements. If this filter is not used, all statements qualify, regardless of whether they are auto-named or custom-named, provided they match the regular SPARQL criteria.
Example 4: The following query returns ?cnt = 2 if the data about both of President Cleveland's terms (from Example 2 and Example 3 above) is present in the RDF dataset:
--> SELECT (count(*) as ?cnt) { ?s ex:servedAs ex:POTUS }

Example 5: The following query returns ?cnt = 1 due to the presence of the isAuto() filter:
--> SELECT (count(*) as ?cnt) { ?s ex:servedAs ex:POTUS | ?n . FILTER ( isAuto(?n) ) }

Example 6: The following query returns ?minStartYr = 1885, ?maxEndYr = 1897:
--> SELECT (min(?startYr) as ?minStartYr) (max(?endYr) as ?maxEndYr) { ?s ex:servedAs ex:POTUS | ?n . ?n ex:startYear ?startYr ; ex:endYear ?endYr }

2. A custom-named statement is considered unasserted unless an auto-named statement exists with the same <s, p, o, g>. This has implications for SPARQL query processing. A new triple-pattern format that uses the << ... >> enclosure is introduced in SPARQL to indicate that matching with unasserted statements is allowed.

Example 7: Consider the following data, which consists of just a single custom-named statement. Since no auto-named statement with <s, p, o, g> equal to <ex:bob, ex:fatherOf, ex:john, null> is present, the custom-named statement is considered unasserted. The first query below looks for matches with asserted statements only and hence returns no results. The second query, on the other hand, is open to considering unasserted statements as well (due to the use of the << ... >> enclosure for the triple-pattern) and returns the result: ?dad = ex:bob, ?kid = ex:john.

DATA:
--> ex:bob ex:fatherOf ex:john | ex:cname1 .

QUERY 1:
--> SELECT ?dad ?kid { ?dad ex:fatherOf ?kid }

QUERY 2:
--> SELECT ?dad ?kid { << ?dad ex:fatherOf ?kid >> }

A few other relevant points:

1. For cross-system sharing of query results, include a list containing <s, p, o, g, n> for each auto-generated name n that is (directly or indirectly) included in the result. This is necessary because triplestores have full autonomy in implementing the function foo used for generating auto-names; therefore, given the same <s, p, o, g>, two different triplestores could generate two different auto-names. Hence, the recipient needs to know the <s, p, o, g> corresponding to each auto-name returned (or indirectly involved) in the result in order to generate the appropriate auto-name for its local use.

2. Statement-Set: This can be done by having multiple distinct <s, p, o, g> share the same custom-name. While the advantage over named graphs is that statements from distinct graphs (or the default graph) can form a group, a disadvantage is that auto-named statements cannot be part of a (non-singleton) statement-set.

3. Ref. Transparency vs. Opacity: The current idea of "opaque by default and transparent in case TEPs are involved" would work fine for RDFn too.

Based on the above outline, I'd argue that using RDFn to support the desired extensions to RDF would also satisfy some of the practical constraints that are critical for adoption by enterprises, specifically:
* full backward-compatibility for RDF1.1 data (each RDF1.1 statement becomes an auto-named (asserted) statement in RDFn)
* continued validity of pre-existing SPARQL1.1 queries even as data evolves to include more expressive content by taking advantage of new capabilities to include statements about statements and multi-edges
* minimization of the custom naming burden on the user, because custom names are needed only for those cases where multi-edges or (non-singleton) statement-sets are involved

Thanks,
Souri.
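
The naming and assertion rules in the outline (auto-names computed from <s, p, o, g>, custom names supplied by the data creator, and a custom-named statement counting as asserted only when an auto-named statement with the same <s, p, o, g> exists) can be modeled in a few lines of Python. This is only an illustrative sketch, not part of the RDFn proposal: the Store class, its add and match methods, the include_unasserted flag (standing in for the << ... >> enclosure), and the hash-based auto_name function (standing in for the implementation-specific foo) are all assumptions made here. Graph-scoped matching and the self-recursion check are left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

AUTO_NS = "rdfnAuto:"  # namespace reserved for auto-generated names


@dataclass(frozen=True)
class Statement:
    s: str
    p: str
    o: str
    g: Optional[str]  # graph name; None for triples
    n: str            # statement name (auto or custom)


def auto_name(s, p, o, g=None):
    # Hypothetical stand-in for foo: any function that maps <s, p, o, g>
    # to a unique string would do; a truncated SHA-256 is used here.
    digest = hashlib.sha256(repr((s, p, o, g)).encode("utf-8")).hexdigest()
    return AUTO_NS + digest[:16]


def is_auto(name):
    # Plays the role of the isAuto() filter.
    return name.startswith(AUTO_NS)


class Store:
    def __init__(self):
        self.statements = set()

    def add(self, s, p, o, g=None, name=None):
        # No name given: auto-named. Otherwise: custom-named.
        if name is None:
            name = auto_name(s, p, o, g)
        elif is_auto(name):
            raise ValueError("custom names must not use the rdfnAuto namespace")
        self.statements.add(Statement(s, p, o, g, name))
        return name

    def is_asserted(self, st):
        # Asserted iff an auto-named statement with the same <s, p, o, g> exists.
        twin = Statement(st.s, st.p, st.o, st.g, auto_name(st.s, st.p, st.o, st.g))
        return twin in self.statements

    def match(self, s=None, p=None, o=None, include_unasserted=False):
        # None acts as a wildcard; include_unasserted mimics << ... >>.
        return [
            st for st in self.statements
            if (s is None or st.s == s)
            and (p is None or st.p == p)
            and (o is None or st.o == o)
            and (include_unasserted or self.is_asserted(st))
        ]


# Data from Examples 2 and 3: one auto-named and one custom-named statement.
store = Store()
store.add("ex:Cleveland", "ex:servedAs", "ex:POTUS")
store.add("ex:Cleveland", "ex:servedAs", "ex:POTUS", name="ex:term2")
served = store.match(p="ex:servedAs", o="ex:POTUS")
count_all = len(served)                             # as in Example 4
count_auto = sum(is_auto(st.n) for st in served)    # as in Example 5

# Data from Example 7: a single custom-named (hence unasserted) statement.
only_custom = Store()
only_custom.add("ex:bob", "ex:fatherOf", "ex:john", name="ex:cname1")
query1 = only_custom.match(p="ex:fatherOf")
query2 = only_custom.match(p="ex:fatherOf", include_unasserted=True)

print(count_all, count_auto, len(query1), len(query2))
```

Under these assumptions the counts line up with Examples 4, 5, and 7: two matches without the filter, one with it, and the lone custom-named statement is visible only when unasserted statements are allowed.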
Received on Monday, 27 November 2023 03:08:47 UTC