- From: Andy Seaborne <andy@apache.org>
- Date: Thu, 25 Jan 2024 13:42:08 +0000
- To: RDF-star Working Group <public-rdf-star-wg@w3.org>
On 25/01/2024 12:27, Thomas Lörtsch wrote:
>
>
>> On 25. Jan 2024, at 12:22, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>> On 1/25/24 06:08, Andy Seaborne wrote:
>>>> But what does it *mean*? Optimizations should only be applied after we know that it means what we want it to mean.
>>> Agreed.
>>> We can start with our goals. "bloat" has been used in two senses: "visual bloat" and "size bloat".
>
> You’re forgetting "term bloat"…
>
>>> Is the WG addressing the size bloat issue?
>
> IMO it can’t be addressed in N-Triples, as N-Triples is per definition a strictly triple-based serialization, with pretty atomic terms

A bit more complicated than that, especially for annotation usage.

Both in-memory and persistent storage systems largely share RDF terms that occur more than once in a graph, either as pointers or as "node ids", which are at most 16 bytes, often 8.

A term is not stored as its whole string each time: "rdf:subject" does not need to be a repeated copy of <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject>.

The new term can be about the size of a triple, whatever that is, which, with existing approaches of term dictionaries and compression, is quite small.

> (language tags to literals being the acceptable exception). Its whole purpose is to ease processing by eliminating shortcuts. Everything you add to that - especially new term types that combine already defined atomic term types into more complex term types, e.g. triple terms - breaks this simplicity and straightforwardness.
>
> B.t.w. "streaming" is an argument that has been brought forward a lot. Can you point me to any halfway concise treatise of the problems and practices of streaming RDF data? I’d like to understand how solving problems with reification by means of a triple term would relate to other issues. Would it be a decisive breakthrough, or more like a drop in the bucket?
> Because my hunch is that it’s rather the latter. And where does it end? What about list terms? What about CBD terms? Or even graph terms?
>
>
> I’ve been peeking into your draft proposal that you mentioned to Felix the other day, at
> https://github.com/afs/rdf-star-notes/blob/main/reif-atoms.md
> You give a list of 7 problems with RDF reification. Some of them (problems 4, 5 and 6) would be handled by a notion of wellformedness.

And it does not need to be checked across the whole graph. It works with RDF merge.

> Problem 1 would be solved by the proposed annotation syntax. Problems 3 and 7, especially blank nodes split over multiple graphs when breaking a big graph into files of a more manageable size, are not specific to reification but a general problem.

Agreed - it is mentioned because the "Turtle syntax only" approach encourages blank nodes - which is probably a good choice for annotations.

7 is not just about blank nodes - it's about finding groups of rdf:subject/rdf:predicate/rdf:object, which is relevant for the optimization discussions.

3 is the visual aspect within one large graph document. N-Triples does not preserve proximity of input. It very much depends on the indexing.

> That leaves problem 2, verbosity in N-Triples, and that just comes with the terrain. There sometimes are more or less verbose ways to represent a complex type in straight triples - RDF Collections are much worse than RDF Containers - but RDF standard reification is not too bad in that respect.
> You mention a reification atom <<(s p o)>> as a possible addition to N-Triples. That seems like a slight variation of N-Triples-Star to me,

Sort of - it is similar at the abstract syntax level, but it's not the same semantics, or if it is, that's by chance or from keeping close to reification.

There are several implementations of RDF-star-CG, so it does suggest that a new term has some acceptance.

> and I’m not fundamentally opposed if it helps and doesn’t rely on a new term type. My question is: does it really help? And would you also add list atoms, CBD atoms, graph atoms?

Would I *like* list terms - yes! - but that's out of charter :-(

In the abstract syntax, the approach does leave open (does not block) graph terms. I think they bring additional challenges around entailment and "graph reification" that will take a long time to explore.

    Andy

>
> Thomas
>
>
>>> Optimization is not just storage space (and the choices there change over the space of a few years at the moment) - it's also preserving the outcome of queries.
>>> What does SELECT (count(*) AS ?C) { ?s ?p ?o } return?
>>> or any query with a ?p.
>
>>> Andy
>>>>
>>>> I just realized that saying *at least* makes an implicit assumption about different terms in object position referring to the same entity in the realm of interpretation, i.e. a kind of owl:sameAs-ness. That may be way beyond what we want to fix, and insofar saying *exactly* might be the safer and more restrained definition.
>>>> Still it introduces a hint of opacity that I’m not happy with.
>>>>
>>>> Thomas
>>>>
>>>>> peter
>>>>>
>>>>
>>>>
>>
>
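A minimal sketch of two points from the reply above: terms shared through a term dictionary, and the effect of standard reification on the result of SELECT (count(*) AS ?C) { ?s ?p ?o }. It is an illustration under stated assumptions, not any particular store's design: TermDictionary, reify, load and the http://example.org/ names are hypothetical, terms are kept as plain strings, and a graph is just a set of id triples.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


class TermDictionary:
    """Maps each distinct term (a plain string here) to a small integer id."""

    def __init__(self):
        self._ids = {}
        self._terms = []

    def encode(self, term):
        # A term seen before reuses its id, so a long IRI such as
        # rdf:subject is stored once however often it is used.
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def __len__(self):
        return len(self._terms)


def reify(s, p, o, reifier):
    """Standard RDF reification of (s, p, o): four triples about `reifier`."""
    return [
        (reifier, RDF + "type", RDF + "Statement"),
        (reifier, RDF + "subject", s),
        (reifier, RDF + "predicate", p),
        (reifier, RDF + "object", o),
    ]


def load(triples, dictionary):
    """Store a graph as a set of (id, id, id) tuples."""
    return {tuple(dictionary.encode(t) for t in triple) for triple in triples}


# Three asserted triples.
asserted = [(EX + f"s{i}", EX + "p", EX + f"o{i}") for i in range(3)]

plain_dict = TermDictionary()
plain = load(asserted, plain_dict)

# The same graph with every asserted triple also reified via a blank node.
with_reification = list(asserted)
for i, (s, p, o) in enumerate(asserted):
    with_reification += reify(s, p, o, f"_:r{i}")

reified_dict = TermDictionary()
reified = load(with_reification, reified_dict)

# Counting all triples is what the count(*) query over ?s ?p ?o does.
print(len(plain))                           # 3
print(len(reified))                         # 15
print(len(reified_dict) - len(plain_dict))  # 8
```

Under these assumptions the triple count goes from 3 to 15, while the dictionary gains only 8 terms (five from the rdf: vocabulary, stored once, plus three blank nodes). A store that replaced the reification triples with some internal representation would need to keep answering that count, and any query with a ?p, in the same way.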
Received on Thursday, 25 January 2024 13:42:17 UTC