- From: Andy Seaborne <andy@apache.org>
- Date: Thu, 25 Jan 2024 13:42:08 +0000
- To: RDF-star Working Group <public-rdf-star-wg@w3.org>
On 25/01/2024 12:27, Thomas Lörtsch wrote:
>
>
>> On 25. Jan 2024, at 12:22, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>> On 1/25/24 06:08, Andy Seaborne wrote:
>>>> But what does it *mean*? Optimizations should only be applied after we know that it means what we want it to mean.
>>> Agreed.
>>> We can start with our goals. "bloat" has been used in two senses : "visual bloat" and "size bloat".
>
> You’re forgetting "term bloat"…
>
>>> Is the WG addressing the size bloat issue?
>
> IMO it can’t be addressed in N-Triples, as N-Triples is by definition a strictly triple-based serialization, with pretty atomic terms
A bit more complicated than that, especially for annotation usage.
Both in-memory and persistent storage systems largely share RDF terms
that occur more than once in a graph, either as pointers or as "node ids",
which are at most 16 bytes, often 8. The stored term is not the whole
string, and "rdf:subject" does not need to be a repeated copy of
<http://www.w3.org/1999/02/22-rdf-syntax-ns#subject>.
The new term can be about the size of whatever a triple is which, with
existing approaches of term dictionaries and compression, is quite small.
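To illustrate the term-sharing point, here is a minimal sketch of a term dictionary in Python. It is not taken from any particular store: the class TermDictionary, the function encode and the example IRIs are hypothetical names chosen for illustration. Each distinct term string is stored once and triples hold only small integer node ids.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class TermDictionary:
    """Hypothetical term dictionary: each distinct term is stored once."""

    def __init__(self):
        self._ids = {}    # term string -> node id
        self._terms = []  # node id -> term string

    def intern(self, term):
        """Return the node id for a term, allocating one on first sight."""
        node_id = self._ids.get(term)
        if node_id is None:
            node_id = len(self._terms)
            self._ids[term] = node_id
            self._terms.append(term)
        return node_id

    def lookup(self, node_id):
        """Return the term string for a node id."""
        return self._terms[node_id]

    def __len__(self):
        return len(self._terms)


def encode(triples, dictionary):
    """Replace every term in every triple by its node id."""
    return [tuple(dictionary.intern(t) for t in triple) for triple in triples]


# Illustrative data only: two reifiers that share rdf:subject and its object.
triples = [
    ("_:r1", RDF + "subject", "http://example/s"),
    ("_:r1", RDF + "predicate", "http://example/p"),
    ("_:r1", RDF + "object", "http://example/o"),
    ("_:r2", RDF + "subject", "http://example/s"),
]
dictionary = TermDictionary()
encoded = encode(triples, dictionary)
```

In this sketch the long rdf:subject IRI is stored once in the dictionary, and the second use costs only another copy of its node id.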
> (language tags to literals being the acceptable exception). Its whole purpose is to ease processing by eliminating shortcuts. Everything you add to that - especially new term types that combine already defined atomic term types into more complex term types, e.g. triple terms - breaks this simplicity and straightforwardness.
>
> B.t.w. "streaming" is an argument that has been brought forward a lot. Can you point me to any halfway concise treatise of the problems and practices of streaming RDF data? I’d like to understand how solving problems with reification by means of a triple term would relate to other issues. Would it be a decisive breakthrough, or more like a drop in the bucket?
> Because my hunch is that it’s rather the latter. And where does it end? What about list terms? What about CBD terms? Or even graph terms?
>
>
> I’ve been peeking into your draft proposal that you mentioned to Felix the other day, at
> https://github.com/afs/rdf-star-notes/blob/main/reif-atoms.md
> You give a list of 7 problems with RDF reification. Some of them (problems 4, 5 and 6) would be handled by a notion of wellformedness.
And it does not need to be checked across the whole graph.
It works with RDF merge.
> Problem 1 would be solved by the proposed annotation syntax. Problems 3 and 7, especially blank nodes split over multiple graphs when breaking a big graph into files of a more manageable size, are not specific to reification but a general problem.
Agreed - it is mentioned because the "Turtle syntax only" approach
encourages blank nodes - which is probably a good choice for annotations.
7 is not just about blank nodes - it's dealing with finding groups of
rdf:subject/rdf:predicate/rdf:object which is relevant for the
optimization discussions.
3 is the visual aspect within one large graph document.
N-Triples does not preserve proximity of input. It very much depends on
the indexing.
> That leaves problem 2, verbosity in N-Triples, and that just comes with the terrain. There sometimes are more or less verbose ways to represent a complex type in straight triples - RDF Collections are much worse than RDF Containers - but RDF standard reification is not too bad in that respect.
> You mention a reification atom <<(s p o)>> as a possible addition to N-Triples. That seems like a slight variation of N-Triples-Star to me,
Sort of - it is similar at the abstract syntax level, but it's not the same
semantics, or if it is, that's by chance or from keeping close to reification.
There are several implementations of RDF-star-CG so it does suggest that
a new term has some acceptance.
> and I’m not fundamentally opposed if it helps and doesn’t rely on a new term type. My question is: does it really help? And would you also add list atoms, CBD atoms, graph atoms?
Would I *like* list terms - yes! - but that's out of charter :-(
In the abstract syntax, the approach does leave open (does not block)
graph terms. I think they bring additional challenges around entailment
and "graph reification" that will take a long time to explore.
Andy
>
> Thomas
>
>
>>> Optimization is not just storage space (and the choices there change over the space of a few years at the moment) - it's also preserving the outcome of queries.
>>> What does SELECT (count(*) AS ?C) { ?s ?p ?o } return?
>>> or any query with a ?p.
>
>>> Andy
>>>>
>>>> I just realized that saying *at least* makes an implicit assumption about different terms in object position referring to the same entity in the realm of interpretation, i.e. a kind of owl:sameAs-ness. That may be way beyond what we want to fix, and insofar saying *exactly* might be the safer and more restrained definition.
>>>> Still it introduces a hint of opacity that I’m not happy with.
>>>>
>>>> Thomas
>>>>
>>>>> peter
>>>>>
>>>>
>>>>
>>
>
Received on Thursday, 25 January 2024 13:42:17 UTC