- From: Thomas Lörtsch <tl@rat.io>
- Date: Thu, 2 Oct 2025 23:45:13 +0200
- To: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Cc: danbri@gmail.com, Filip Kolarik <filip26@gmail.com>, semantic-web@w3.org
- Message-Id: <5840A1E3-0B2C-48BC-BF1E-BDBAEF0765D6@rat.io>
> On 2. Oct 2025, at 19:44, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
>
> Thomas,
>
> I'll focus on your argument that what I propose would double the count of triples, and is therefore not tractable.
>
> First, please note that I wrote
>
> > I'm not suggesting that the n-triples serialization should be preferred to n-quads+metadata, nor that implementations should stop representing quads natively and move to embedded triples as above. I see this as a conceptual mapping (...)
>
> So this doubling of the number of triples would not have to reflect on serializations or implementation internals. It is for the purpose of reasoning only. (Note that any graph RDF-entails an infinite number of axiomatic triples <https://www.w3.org/TR/rdf11-semantics/#rdf_d_interpretations>, and yet people have implemented RDF efficiently).
>
> Second, and more importantly, the mapping that I proposed does not double the number of triples. Given the following TriG file:
>
> <x:g1> { <x:s1> <x:p1> <x:o1> }.
> <x:g2> { <x:s2> <x:p2> <x:o2> }.
>
> it would map it to the following Turtle 1.2 file:
>
> <x:g1> <some:predicate> <<( <x:s1> <x:p1> <x:o1> )>>.
> <x:g2> <some:predicate> <<( <x:s2> <x:p2> <x:o2> )>>.
>
> Why would you want to also assert the triples from the named graphs?
> (I have a hunch, but I'd rather check before I respond further)
>

The WG decided that it is important to provide means to annotate both asserted triples and unasserted triples. How does your sketch provide those means, and how does it disambiguate asserted from unasserted triples? I reckon that either the information whether the triple term is also asserted is encoded in <someOrAnother:predicate>, or, following the current design of RDF 1.2, the triple has to be added to the graph (i.e. asserted). E.g., using 'rdf:reifies' and 'rdfx:states' to refer to unasserted and asserted statements respectively, either

<x:s1> <x:p1> <x:o1> .
<x:g1> <rdf:reifies> <<( <x:s1> <x:p1> <x:o1> )>>.
<x:g2> <rdf:reifies> <<( <x:s2> <x:p2> <x:o2> )>>.

or

<x:g1> <rdfx:states> <<( <x:s1> <x:p1> <x:o1> )>>.
<x:g2> <rdf:reifies> <<( <x:s2> <x:p2> <x:o2> )>>.

A third possibility would be to let a further statement add assertedness:

<x:g1> <rdf:reifies> <<( <x:s1> <x:p1> <x:o1> )>>.
<x:g1> a <rdfx:asserted> .
<x:g2> <rdf:reifies> <<( <x:s2> <x:p2> <x:o2> )>>.

That wouldn’t hurt the triple count, but it might be considered brittle.

> pa
>
>
> On 02/10/2025 10:58, Thomas Lörtsch wrote:
>> Hi,
>>
>> Dan’s question about billion triple graphs, and Pierre-Antoine’s answer illustrating the use of RDF 1.2 triple terms to model graphs, point to a serious problem with the current proposal for RDF 1.2 triple terms.
>>
>> RDF 1.2 reified triple terms require one extra triple to create a reference node that can then be annotated. Because that is much better than the old-style reification quad, it makes the proposal look pretty good when annotating one or a few triple terms. However, it requires twice as many triples as named graphs, as each triple has to be asserted and, separately, a reference to it has to be created via a second triple. That makes it look pretty bad when annotating graphs of some size!
>>
>> It follows that the current WG proposal doesn’t really scale to graphs. However, it could if the reference mechanism didn’t reify but did entail the referenced triple term. The reification mechanism proposed by the WG is very concerned to also enable the corner use case of annotating triples without stating them, hence the 'rdf:reifies' property, which references an abstract proposition without asserting it. Another property, 'rdfs:states', that was also discussed [1], caters more directly for the general use case of annotating a statement in the graph, by entailing the triple (term) it references (a second property would be needed to reference a triple without asserting it). That proposal was turned down by the WG, but I think we should revisit that decision.
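To make the triple counts argued about in this thread concrete, here is a minimal sketch in plain Python (no RDF library). It models triples and triple terms as tuples and counts the stored triples for three ways of relating a graph name to an asserted statement: 'rdf:reifies' plus a separate assertion, a property like 'rdfx:states' that entails the triple, and 'rdf:reifies' plus an 'rdfx:asserted' type marker. The function names are hypothetical, the 'rdfx:' terms are not standardized, and entailed triples are deliberately not materialized, so this only counts what would have to be written down.

```python
# Illustrative sketch only: counts stored triples per encoding.
# Triples are (s, p, o) tuples; a triple term is a nested tuple in object position.

REIFIES = "rdf:reifies"      # references a triple term without asserting it
STATES = "rdfx:states"       # assumption: references AND entails the triple
ASSERTED = "rdfx:asserted"   # assumption: marker class for "asserted" reifiers


def reifies_plus_assertion(quads):
    """Assert every triple and add one rdf:reifies triple per statement."""
    graph = set()
    for s, p, o, g in quads:
        graph.add((s, p, o))                 # the asserted triple itself
        graph.add((g, REIFIES, (s, p, o)))   # the reference to it
    return graph


def states_only(quads):
    """One triple per statement; assertion is left to entailment."""
    return {(g, STATES, (s, p, o)) for s, p, o, g in quads}


def reifies_plus_marker(quads):
    """One rdf:reifies triple per statement plus one type triple per graph name."""
    graph = set()
    for s, p, o, g in quads:
        graph.add((g, REIFIES, (s, p, o)))
        graph.add((g, "rdf:type", ASSERTED))  # deduplicated by the set
    return graph


if __name__ == "__main__":
    # 1000 distinct statements, all in the single named graph x:g1
    quads = [(f"x:s{i}", "x:p", f"x:o{i}", "x:g1") for i in range(1000)]
    for encode in (reifies_plus_assertion, states_only, reifies_plus_marker):
        print(encode.__name__, len(encode(quads)))
```

For 1000 distinct statements in one named graph, the three encodings store 2000, 1000 and 1001 triples respectively. The third stays cheap only as long as the marker is attached once per graph name, which is also where its brittleness lies: everything hinges on that single type statement.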
>>
>> The WG was tasked to concentrate on triples, and it tries to be pragmatic and defer graphs to a later effort, albeit laying the foundation through multi-triple term reifications. However, the issue that Dan brought up illustrates how dangerous that approach is, because it is so easy to overlook possible complications down the road. The current proposal for RDF 1.2 is completely focused on encouraging the use of 'rdf:reifies', and supports it with massive amounts of syntactic sugar. It would be very hard to turn that clock back later and make people use another property instead. It would however also be very hard, quite impossible actually, to define when one should use the triple-centric property and syntax and when a graph-centric property and syntax should be used, since no use case is clearly one or the other, and no clear rule can be established for what to do with just a very few triples (and what to do if their number grows or shrinks, or what to query for if one doesn’t know in advance).
>>
>> I think the WG would be well advised to tackle the graph perspective now, and adjust its approach to reification. Some more comments inline.
>>
>>> On 30. Sep 2025, at 08:56, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
>>>
>>>
>>> On 29/09/2025 16:54, Dan Brickley wrote:
>>>> On Thu, 25 Sept 2025 at 08:20, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
>>>> Hi Filip,
>>>> On 19/09/2025 22:50, Filip Kolarik wrote:
>>>>> Dear Semantic Web Community,
>>>>> I’m seeking feedback on the conceptual and practical aspects of RDF graphs.
>>>>>
>>>>> In RDF 1.2, an RDF graph is defined as: "An RDF graph is the conjunction (logical AND) of all the claims made by its asserted triples." This definition captures the logical aggregation of triples, but it leaves open questions about how graphs are used in practice.
>>>> Indeed. RDF Semantics is only defined for a given graph. How you construct that graph (e.g. by picking and aggregating different RDF resources based on your own criteria) is out of scope of the specification -- even though that's also a bunch of interesting questions :)
>>>>> I would appreciate the community’s insights on questions such as:
>>>>> * How do you interpret the role of graphs?
>>>>> * Are graphs primarily conceptual constructs to organize triples, or are they treated as concrete, addressable units in practice?
>>>>> * Do you see named graphs as a way to scope statements, manage provenance, or isolate data for processing, while the “default graph” serves a different purpose?
>>>> To be clear: named graphs and datasets are defined in RDF's abstract syntax, but are not covered by RDF Semantics. The reason is that, back in 2014 when RDF 1.1 was specified, datasets were already largely deployed, and used in many different ways (including the ones you list above). The working group at the time considered that it could not decide on a specific semantics for datasets and named graphs without breaking many people's implementations... That's the reason for this status quo.
>>>>
>>>> It doesn't seem a giant problem. We probably have enough experience now to characterise 2, 3 or however many common patterns for using named graphs in RDF applications and platforms.
>>> Indeed, and that was also published by the RDF 1.1 WG (although non normatively): https://www.w3.org/TR/rdf11-datasets/
>> See there especially section 4. on "Declaring the intended semantics":
>>
>> "The RDF Working Group did not define a formal semantics for a multiple graph data model because none of the semantics presented before could obtain consensus. Choosing one or another of the propositions before would have gone against some deployed implementations. Therefore, the Working Group discussed the possibility to define several semantics, among which an implementation could choose, and provide the means to declare which semantics is adopted.
>> This was not retained eventually, because of the lack of experience, so there is no definite option for this."
>>
>> As Dan suggests, we do have a lot of experience now.
>>
>>>> Metadata about the way a particular repository manages and names its graphs should be fairly straightforward to describe in RDF. Any standardization could be funneled into that kind of descriptive role.
>>> The WG has not (yet?) explored that path, but I agree.
>>
>> IMO we keep piling too narrowly focused solutions on each other, each with different syntax and semantics, in effect just adding to the mess. Deferring the graph issue to a later standardization effort might result in a solution competing not only with named graphs but also with reified triple terms.
>>
>> Some discussions that bogged down the RDF 1.1 WG had to be repeated anyway, like those about referential transparency and how to handle the type/token distinction, and the conclusions found can very well also inform the work on graphs.
>>
>>>> So yes, you can use named graphs for all of these things, just remember that this will not be broadly interoperable. In other words, if you send your dataset to someone else, or if you make it available via a SPARQL endpoint, you will need to provide additional out-of-band knowledge explaining what the (custom) semantics of your named graphs is. This may not be an issue in some cases, but it may be in others.
>>>> With RDF 1.2's triple terms, on the other hand, we have a way to address all these use cases explicitly in a single RDF graph: you can describe triple terms (or sets thereof) with dedicated vocabularies (for provenance, or confidence, etc.), and have this knowledge included in your RDF graph, and available for reasoning.
>>>> It does not mean that named graphs will disappear -- most systems using them today will probably continue to do so if that works for them. But triple terms provide an alternative design option for new systems (or for migrating some old ones).
>>>>
>>>> Triple terms sound like they address use cases at the level of a particular triple, or perhaps a small bundle of related triples. Named graphs can operate usefully with graphs populated by millions or billions of triples. Is it realistic to use triple terms for the latter too?
>>> I believe it is. Let me explain (with my W3C hat still off -- this does not represent the WG's position, only my own ideas):
>>> RDF 1.2 semantics could be used as a foundation for assigning a precise semantics to a dataset if we also have metadata clarifying the relationship between graph names and graphs. Then any quad
>>>     S P O G . # in n-quads
>>> could be seen as having the same semantics as
>>>     G X <<( S P O )>> . # in n-triples 1.2
>>> where X is a predicate depending on the metadata associated to the dataset.
>>>
>>> I'm not suggesting that the n-triples serialization should be preferred to n-quads+metadata, nor that implementations should stop representing quads natively and move to embedded triples as above. I see this as a conceptual mapping that would allow us to reason with these datasets that have the appropriate metadata.
>> As explained above, using 'rdf:reifies' doubles the triple count, while using another property risks doubling the modelling and querying complexity.
>>
>>> But still I rest my case about existing datasets in the wild:
>>> * The absence of such metadata makes datasets inherently ambiguous.
>> The solution to this problem is to define a corresponding vocabulary.
>>
>>> * People are actually embracing this ambiguity by using named graphs any way they see fit, and we should not prevent them.
>> The fear of "preventing" something is unfounded. Trivially, nobody can be required to use such a vocabulary because of all the named graphs that already exist. Consequently it could only be optional, improving the potential to share semantics, but not preventing anything. Whoever "embraces" undefined semantics is free to continue to do so.
>> A mapping from "triple term graphs" to named graphs, using some property 'X' from a tbd vocabulary, could however encourage uptake.
>>
>>> And no, the WG has no immediate plan to standardize how this kind of metadata could be expressed, but any suggestion or incubation work in the RDF-Dev Community Group would be welcome ;-)
>> But such a future CG has to deal with what the RDF 1.2 WG defines, which only adds to the complexity of an already quite complex task. The RDF 1.2 WG proposal doesn’t make sure that triple terms can be successfully extended to graph terms. It provides no mappings of triple terms to RDF 1.0/1.1 reification and named graphs. Syntactic sugar in Turtle 1.2 uses curly braces '{…}' for something other than graph data, contrary to what convention would suggest. The discussion wrt. 'rdfs:states' above shows how the triple-centric mechanism developed by the WG could be quite unsuitable for graphs. And what then: start over again, with yet another syntax, yet another semantics, yet another mapping, yet another modelling choice to make and to cater for in queries?
>>
>>
>>>> Dan
>>>>
>>>> pa
>>>> PS: this is only my personal position on the subject; this is not an official statement from the Working Group
>>
>> Thomas
>>
>> (also not talking for the WG)
>>
>>
>> [1] https://github.com/w3c/rdf-star-wg/issues/128
>>
>>
>>
>>>>> * How do you decide when to create separate graphs versus keeping data in a single graph?
>>>>> * In your experience, does the choice of graph boundaries affect reasoning, querying, or data integration in practical applications? For instance, do you treat multiple graphs as separate units, or are there scenarios where it’s helpful to merge graphs and process a subject’s properties across them?
>>>>>
>>>>> Any references, examples, or experiences you can share would be extremely valuable in understanding the balance between the conceptual model and its practical applications.
>>>>>
>>>>> Thank you for your time and expertise.
>>>>>
>>>>> Best regards,
>>>>> Filip
>>>>> https://www.linkedin.com/in/filipkolarik/
>>
Received on Thursday, 2 October 2025 21:45:24 UTC