- From: Thomas Lörtsch <tl@rat.io>
- Date: Thu, 30 Nov 2023 17:36:06 +0100
- To: Niklas Lindström <lindstream@gmail.com>
- Cc: RDF-star Working Group <public-rdf-star-wg@w3.org>
Hi Niklas,

I did of course read until the end ;-) but I’m top-posting for readability. But I also incorporated your corrections in the quoted response below.

I too think that some proposals are not too far away from each other and that we can get to a coherent whole (better with graphs, but also with triples). Your version B comes pretty close to the nested graph proposal. Version C however, your fallback, will probably not gain much acceptance from implementors because it requires introspecting the triple/graph identifier. I might be wrong, but I guess this is a no-go, and the standard-reification-based version A would not only be easier to implement but also more performant - and perfectly backwards compatible.

I like your attempts to firmly tie a token and its identifier together, like

>     <bob> foaf:birthday "1970-01-01" {<#t1>} .

and

>     << _:b1 | <bob> foaf:birthday "1970-01-01" >> ex:certainty 0.9 ;
>         dct:source <s1> .
>     << _:b2 | <bob> foaf:birthday "1970-01-01" >> ex:certainty 0.8 ;
>         dct:source <s2> .

Such tight coupling really helps with use cases like that described in "3.1. Challenge #1: Edge Properties, Multiple Edge Instances, and Reification" in "The OneGraph Vision: Challenges of Breaking the Graph Model Lock-In", 2023, Lassila et al [0] (which I also mentioned an hour ago in a response to Andy in another thread).

I would like to propose yet another variant:

    <bob> foaf:birthday "1970-01-01" {| _:b1 | dct:source <s1> ; ex:certainty 0.9 |},
                                     {| _:b2 | dct:source <s2>; ex:certainty 0.8 |} .

However, those identifiers must be provided automatically and cannot rely on users to take the extra step and define them. In that respect I find the nested graph syntax more succinct:

    []{ :s :p :o }

One can’t omit the preceding name (blank node or explicit IRI) without running into a parsing error - and it’s not more keystrokes either. Please forgive the blatant self-advertisement, camouflaged as suggestion.

Relegating opacity to future work seems a waste.
Although I would not interpret the charter as explicitly asking for it - I just doubt that many people realized it was there in the first place - any mechanism that distinguishes 'accepted' triples from other triples already has everything in place to enable opacity, unassertedness and whatever else one might desire.

Best,
Thomas

[0] https://content.iospress.com/articles/semantic-web/sw223273

> On 30. Nov 2023, at 14:39, Niklas Lindström <lindstream@gmail.com> wrote:
>
> Dear all,
>
> I actually think the current proposals are closer to each other than
> it might seem.
>
> What Souri proposes with RDFn [1] is very close to what I was seeking
> with "bound" named graphs ([2], [3]). Both are "about tokens" (as in
> the same triple can be named by more than one identifier (blank node
> or IRI), which are considered distinct unless asserted to be the
> same). But Souri proposes something valuable, which has been around in
> various guises before (e.g. in [4] and [5]), and I think is also
> alluded to by Peter in [6] (option 2,1,1, expanding to "the same
> central node").
>
> Here is an attempt at consolidation of these various ideas, taking
> what the CG was seeking into account (and this time keeping all of its
> syntax).
>
>
> ## The Troubles of Describing Triples
>
> Having triple terms as "types" has shown to be troublesome, both in
> theory and practice. They are *universals* (like literals), and
> neither provenance nor qualification (our actual use cases) are about
> universals. Cases describe instantiated occurrences of them, in
> various contexts (graphs). Admittedly, these are *mainly* the asserted
> triples in the current graph (one unique s,p,o per g). So the "type"
> point of view is understandable, and in the simplest cases is all you
> see.
> But also "referenced" or "possible" triples come into view a lot;
> and they all are "identified by their singleton sets". Such referenced
> ("backing") triples also cater for the LPG cases; but can stay
> unasserted, in the background, without "polluting" RDF with multisets.
>
> (It is not logically wrong to talk about universals directly, but it
> is rarely (if ever) the intent. RDF has this *cautious* design of
> disallowing literals in the subject position for this reason. To
> prevent users from "shooting themselves in the foot", if you will.)
>
>
> ## Consolidating Occurrences: Default Token Identifiers
>
> This "auto-named triple" approach solves the disconnect, in that it
> "talks about tokens", without abandoning the effect of concentrating
> on a default triple in a graph in the simplest cases.
>
> So, we can:
>
> * Define a function (tripleId) that maps s,p,o to a unique identifier
> (blank node or IRI). This denotes a "default triple token", or, if you
> will, the triple occurrence *in a graph*.
>
>
> ## Options at Hand
>
> Let's examine a case and some options. I'll use this example (not
> because it's my favorite, but because it is common, and also contains
> the "seminal error", which we "save ourselves from" by describing
> tokens):
>
>     << <bob> foaf:birthday "1970-01-01" >> ex:certainty 0.9 ;
>         dct:source <s1> .
>
> This is the same "default triple token" throughout the graph, and the
> above is the same as:
>
>     << <bob> foaf:birthday "1970-01-01" >> ex:certainty 0.9 .
>     << <bob> foaf:birthday "1970-01-01" >> dct:source <s1> .
>
> (Note: Of course the date should be `"1970-01-01"^^xsd:date`; it's
> omitted for brevity.)
>
> For this syntax, we use `tripleId` to get a unique identifier from the
> syntactic triple term. Below we'll use a simple bnode id, `_:bb70`;
> but anything goes as long as it is unique, e.g.
> a hash-based bnode id
> like `_:gen6e16a579edbbf4dc3339be9415c39ea8`, an IRI like
> `<urn:tdb:2014:urn:md5:6e16a579edbbf4dc3339be9415c39ea8>` or a
> data-URL-variant thereof (no hash; terribly long).
>
> ## Option A: Reification
>
> This can be used as the identifier of a simple reified statement:
>
>     _:bb70 rdf:subject <bob> .
>     _:bb70 rdf:predicate foaf:birthday .
>     _:bb70 rdf:object "1970-01-01" .
>
>     _:bb70 ex:certainty 0.9 .
>     _:bb70 dct:source <s1> .
>
> For the annotation shorthand:
>
>     <bob> foaf:birthday "1970-01-01" {| ex:certainty 0.9 ;
>                                         dct:source <s1> |} .
>
> This could become:
>
>     <bob> foaf:birthday "1970-01-01" .
>
>     _:bb70 rdf:subject <bob> .
>     _:bb70 rdf:predicate foaf:birthday .
>     _:bb70 rdf:object "1970-01-01" .
>
>     _:bb70 ex:certainty 0.9 .
>     _:bb70 dct:source <s1> .
>
> We do want repeated annotations too (in some form):
>
>     <bob> foaf:birthday "1970-01-01" {| dct:source <s1> ;
>                                         ex:certainty 0.9 |},
>                         "1970-01-01" {| dct:source <s2>;
>                                         ex:certainty 0.8 |} .
>
> When there is more than one "referenced occurrence" like this, the
> auto-naming isn't used, since the reference triples "decohere". So we
> reasonably get regular blank nodes:
>
>     <bob> foaf:birthday "1970-01-01" .
>
>     _:b1 rdf:subject <bob> .
>     _:b1 rdf:predicate foaf:birthday .
>     _:b1 rdf:object "1970-01-01" .
>     _:b1 dct:source <s1> .
>     _:b1 ex:certainty 0.9 .
>
>     _:b2 rdf:subject <bob> .
>     _:b2 rdf:predicate foaf:birthday .
>     _:b2 rdf:object "1970-01-01" .
>     _:b2 dct:source <s2> .
>     _:b2 ex:certainty 0.8 .
>
> It could make sense to always use regular blank nodes for the
> annotation form; *or* to require explicit names for repetitions.
>
> On that note, here is a form for explicitly named annotations:
>
>     <bob> foaf:birthday "1970-01-01" {<#t1>} .
>
>     <#t1> ex:certainty 0.9;
>         dct:source <s1> .
>
> In "terse" triples:
>
>     <bob> foaf:birthday "1970-01-01" .
>
>     <#t1> rdf:subject <bob> .
>     <#t1> rdf:predicate foaf:birthday .
>     <#t1> rdf:object "1970-01-01" .
>     <#t1> ex:certainty 0.9 .
>     <#t1> dct:source <s1> .
>
> With this, we finally have a Turtle equivalent to RDF/XML statement
> annotations (used extensively in UniProt):
>
>     <rdf:Description rdf:about="bob">
>       <foaf:birthday rdf:ID="t1">1970-01-01</foaf:birthday>
>     </rdf:Description>
>
>     <rdf:Description rdf:about="#t1">
>       <ex:certainty rdf:datatype="&xsd;double">0.9</ex:certainty>
>       <dct:source rdf:resource="s1"/>
>     </rdf:Description>
>
> How do we "save ourselves from the seminal error" then, if triple
> terms are at least type-like? In this basic form we could just resort
> to reification; or triple terms could have an optional identifier,
> like:
>
>     << _:b1 | <bob> foaf:birthday "1970-01-01" >> ex:certainty 0.9 ;
>         dct:source <s1> .
>     << _:b2 | <bob> foaf:birthday "1970-01-01" >> ex:certainty 0.8 ;
>         dct:source <s2> .
>
> Or (which I prefer) the completing object could be marked as "quoted"
> (I've previously used `--`, but it has been considered hard to spot):
>
>     <bob> foaf:birthday << "1970-01-01" >> {| dct:source <s1> ;
>                                               ex:certainty 0.9 |},
>                         << "1970-01-01" >> {| dct:source <s2>;
>                                               ex:certainty 0.8 |} .
>
> Exact syntax isn't important yet, only whether this is what we can
> converge upon or not.
>
> For named graphs, this:
>
>     <g1> {
>         << <bob> foaf:birthday "1970-01-01" >> ex:certainty 0.9 .
>     }
>     <g2> {
>         << <bob> foaf:birthday "1970-01-01" >> ex:certainty 0.8 .
>     }
>
> becomes, in "terse" quads:
>
>     _:bb70 rdf:subject <bob> <g1> .
>     _:bb70 rdf:predicate foaf:birthday <g1> .
>     _:bb70 rdf:object "1970-01-01" <g1> .
>     _:bb70 ex:certainty 0.9 <g1> .
>
>     _:bb70 rdf:subject <bob> <g2> .
>     _:bb70 rdf:predicate foaf:birthday <g2> .
>     _:bb70 rdf:object "1970-01-01" <g2> .
>     _:bb70 ex:certainty 0.8 <g2> .
>
> Granted, given the reasoning above (an instantiated triple occurrence
> in a graph) it might make sense that `tripleId` mint different
> identifiers for different graphs.
> Annotation forms achieve that anyway
> though, and the above is simpler as is (*if* the *union* of the two
> graphs share blank nodes, the certainty claims in them are in conflict
> (assuming such semantics for the property), which can be important
> information).
>
> Of course, we're still at square one here. It's more *convenient*
> reification, but perhaps not *better*. While this could be all we
> need, let's look a bit further.
>
>
> ## Option B: Attempting Semantics for Datasets
>
> What I've been aiming for is isolated (as in unasserted, from the open
> world point of view) named triple sets, bound to another "graph name
> resource" in a dataset.
>
> I *tried* to base my approach on the open-ended options for dataset
> semantics, without touching the abstract syntax. This was not about
> giving all uses of named graphs fixed semantics, but about *opting in*
> to semantic datasets. I thought this was respectful of what's out
> there, given what RDF 1.1 Concepts states [7]:
>
>> RDF does not place any formal restrictions on what resource the graph name may denote, nor on the relationship between that resource and the graph. A discussion of different RDF dataset semantics can be found in [RDF11-DATASETS].
>
> Given that, claiming that graph names mean nothing is only *one* of
> many possible interpretations. And while formal means for doing so are
> still undefined, I hoped they didn't have to be. Looking at
> RDF11-DATASETS [8]:
>
>> A vocabulary specifically tailored for describing the intended dataset semantics could be defined in a future specification.
>
> It suggests that through description of the resource naming a graph,
> defining how the graph it is paired with is interpreted, within a
> dataset, could be possible. Its dataset semantics option 3.4 [9] is
> close to what I've attempted.
> With such semantics for named graphs, in
> order not to break monotonicity, graphs must reasonably be explicitly
> "accepted" to be considered asserted in a union default graph [10].
>
> So my option for the above was, out of band (in an implementation),
> *selecting* a semantic dataset profile, in which named graphs are
> isolated unless accepted. (The simple act of loading them into graph
> names in a semantic graph store would "accept" the default graph here,
> but not the named graph.)
>
> So our example simply becomes:
>
>     _:bb70 ex:certainty 0.9 .
>     _:bb70 dct:source <s1> .
>
>     <bob> foaf:birthday "1970-01-01" _:bb70 .
>
> And for scoping this (for graph store management), I proposed
> `rdfx:boundBy` to relate two graph name resources to ensure that the
> "bound" ones remain isolated, and "owned" by their binding resource
> (for atomic updates and deletes). So if we read the above into named
> graph `<g1>`, we get:
>
>     _:bb70 rdfx:boundBy <g1> .
>
>     _:bb70 ex:certainty 0.9 <g1> .
>     _:bb70 dct:source <s1> <g1> .
>
>     <bob> foaf:birthday "1970-01-01" _:bb70 .
>
> *Of course* this is not an easy thing to formalize and get
> implemented. It requires "semantic datasets", and is hard to get right
> (defining semantics by the presence of statements (without breaking
> monotonicity), requiring an explicit opt-in profile, etc).
>
> Thus I said it might be a tall order. Too tall, I've gathered. So
> let's defer this option, and see if we can do something else *now*
> which does not prevent semantic datasets in the future.
>
>
> ## Option C: Explicit Abstract Syntax Instead
>
> Another way to get isolated named triple sets is to make them explicit
> in the concepts and abstract syntax, but without adding new terms that
> regular users will come across (so neither the subject, predicate nor
> object positions of triples have access to anything novel).
>
> This is drawing from Souri's RDFn *and* Andy's graph terms [11], in a
> kind of amalgam (or compromise).
>
> * Define a new kind of quoted identifier. *Not* for general use,
> *only* for the fourth position in a quad.
> * It is formed by a regular identifier (blank node id or IRI) and an
> optional graph name identifier. Formally: quoted(id=some-id, optional
> graph=some-graph).
> * Triples named by this term are *not asserted*.
>
> (It is called "quoted" here, but could of course be called e.g.
> "isolated" or "protected".)
>
> Here I use this syntax for such "quoted identifiers" for something in
> a default graph (again, *only* usable in the fourth position of a
> quad):
>
>     {_:bb70}
>
> And this for a quoted identifier in a named graph `<g1>`:
>
>     <g1>{_:bb70}
>
> Structurally, it is related to typed literals. To a lesser extent it
> is reminiscent of the triple terms it replaces; the main difference
> being that this is not a recursive structure; and that the identifier
> "within" is a regular RDF identifier which is used in subjects and
> objects.
>
> Here is the initial example in "terse pseudo-quads":
>
>     <bob> foaf:birthday "1970-01-01" {_:bb70} .
>     _:bb70 ex:certainty 0.9 .
>     _:bb70 dct:source <s1> .
>
> And for a triple description in a named graph:
>
>     <g1> {
>         << <bob> foaf:birthday "1970-01-01" >> ex:certainty 0.9 ;
>             dct:source <s1> .
>     }
>
> In "terse pseudo-quads":
>
>     <bob> foaf:birthday "1970-01-01" <g1>{_:bb70} .
>     _:bb70 ex:certainty 0.9 <g1> .
>     _:bb70 dct:source <s1> <g1> .
>
> Of course, this can be considered as "quins in disguise". As such this
> option is *very* close to what RDFn proposes. The main difference is
> that not *all* triples are auto-named, only "RDF-star-described" ones,
> and that such names are always isolated triples, marked through
> "quoted" quad identifiers (fusing position 4 and 5 of RDFn).
>
> Note: While this proposal requires a quad representation, it is not
> necessarily restricted to TriG (but to N-quads and not N-triples). But
> since "statements about statements" is not basic RDF 101, it should be
> discussed.
> For provenance, this is related to named graphs, and should
> be explained alongside them. For "qualification", it is the *last*
> resort when you've got granular data but "run out of modelling
> options"; usually in a production scenario. In schema.org, we've got
> "impure" but pragmatic, triples-only options. In Wikidata, this is
> more interesting.
>
> (For LPG usage, I've gotten the impression that semantics have a back
> seat, and putting raw data into "something" is more common practice.
> Not unlike some RDF usage in the wild; and that's fine. We just need
> to ensure that it's hard to "shoot yourself in the foot" with what we
> introduce.)
>
>
> ## What About Opacity?
>
> Controlling opacity is left to a future semantics for datasets (as in
> [8], also thought of e.g. in [12]). For now, it depends on specific
> implementation options for the union default graph, and for what their
> inference engines take into account.
>
> I think this is acceptable since the majority of collected use cases
> and examples rely on a practical transparent interpretation of
> triples, whether asserted or not. Also, since if we "get closer" to
> named graphs, these options could work on asserted and "protected"
> triple sets alike.
>
>
> ## Future Convergence: Upgrading From Option C to B?
>
> Option C is upgradable to semantic datasets, if such are eventually defined.
>
> * The "quoted fourth term" can be made equal to an explicit graph
> semantics of that "wrapped" identifier. It is a syntactic marker that
> could be interpreted as a semantic declaration.
>
> * With named annotations, we can also have named, isolated triple
> sets. It can still fall back to reification, but would require a
> relationship (e.g. `rdfx:triple`) from that named, isolated set to
> each isolated triple.
>
> * There is a path towards graph terms as default names for graph
> "token" structures, using RDF C14N on its triple set (a `graphId`
> function along the lines of the above `tripleId` mapping function).
>
>
> Thank you if you read this far!
>
> Best regards,
> Niklas
>
> [1]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Nov/0028.html>
> [2]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Nov/0026.html>
> [3]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Nov/0032.html>
> [4]: <https://lists.w3.org/Archives/Public/public-rdf-star/2020Dec/0062.html>
> [5]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023May/0063.html>
> [6]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Nov/0031.html>
> [7]: <https://www.w3.org/TR/rdf11-concepts/#section-dataset>
> [8]: <https://www.w3.org/TR/rdf11-datasets/#declaring>
> [9]: <https://www.w3.org/TR/rdf11-datasets/#each-named-graph-defines-its-own-context>
> [10]: <https://www.w3.org/TR/sparql11-service-description/#sd-uniondefaultgraph>
> [11]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Oct/0038.html>
> [12]: <https://gist.github.com/niklasl/c22994e664663b6730613ecc1321c418#opacity-as-conditional-entailment>
>
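
A minimal Python sketch of the `tripleId` mapping and the Option A expansion discussed in the quoted message may help make the idea concrete. It assumes terms are passed as already-serialized strings and hashes them with MD5 to get a bnode label of the `_:gen...` shape mentioned there. The names `triple_id` and `reify`, the exact hash input, and the choice of MD5 are assumptions made for illustration only; neither proposal prescribes them.

```python
import hashlib


def triple_id(s: str, p: str, o: str) -> str:
    """Map serialized s, p, o terms to a deterministic blank node label.

    Hypothetical helper: the hash input format and MD5 are illustrative
    assumptions, not part of any of the proposals.
    """
    key = f"{s} {p} {o} ."
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"_:gen{digest}"


def reify(s: str, p: str, o: str, annotations: list[tuple[str, str]]) -> list[str]:
    """Expand an annotated triple into reification statements (Option A style).

    Returns the statements about the auto-named token; the asserted triple
    itself is not included.
    """
    tid = triple_id(s, p, o)
    lines = [
        f"{tid} rdf:subject {s} .",
        f"{tid} rdf:predicate {p} .",
        f"{tid} rdf:object {o} .",
    ]
    # Each annotation becomes a statement about the same token identifier.
    lines += [f"{tid} {ap} {ao} ." for ap, ao in annotations]
    return lines


if __name__ == "__main__":
    for line in reify(
        "<bob>", "foaf:birthday", '"1970-01-01"',
        [("ex:certainty", "0.9"), ("dct:source", "<s1>")],
    ):
        print(line)
```

Because the label depends only on s, p and o, the same triple gets the same identifier in every graph, which matches the named-graph example under Option A. Including the graph name in the hash input would instead give per-graph identifiers, the alternative the quoted message also mentions.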
Received on Thursday, 30 November 2023 16:36:18 UTC