- From: Filip Kolarik <filip26@gmail.com>
- Date: Tue, 30 Sep 2025 14:14:10 +0200
- To: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Cc: danbri@gmail.com, semantic-web@w3.org
- Message-ID: <CADRK2_OJaEVBd3_oJkmH+fv7N1uOOvhCUKT9hR=-SoVBke7-tw@mail.gmail.com>
On Tue, Sep 30, 2025 at 8:56 AM Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:

> But still I rest my case about *existing* datasets in the wild:
>
> * The absence of such metadata makes datasets inherently ambiguous.
> * People are actually embracing this ambiguity by using named graphs any
>   way they see fit, and we should not prevent them.
>
> And no, the WG has no immediate plan to standardize how this kind of
> metadata could be expressed, but any suggestion or incubation work in the
> RDF-Dev Community Group would be welcome ;-)

Thank you for the input. I've shared a similar post elsewhere, and the most
common response was simply to cite the generic definitions. What I found
most insightful here:

* Graphs as semantic groupings: a way to group statements and annotate them
  with an identifier or other metadata. Triple terms provide similar
  functionality, but at a finer granularity (a small sketch contrasting the
  two follows at the end of this message).

* Graphs/datasets as processing units: this distinction might help decide
  when to use graphs versus triple terms.

RDF gives us great expressivity, but this comes at a cost: its generality
and high-level definitions can easily overcomplicate processing and add
complexity to reasoning and understanding. This seems somewhat at odds with
the original vision of the Semantic Web, which is to make data integration
and reasoning easier, not harder.

There is unlikely to be a single "right" approach, but from what I see
there are distinct categories of use cases that would benefit from going
beyond the generic definition of a graph, toward clearer best practices and
shared conventions.

Some perspectives where the differences between graphs, named graphs, and
triple terms matter might include:

* Processing
  - Document-oriented: smaller, curated datasets, often self-contained.
  - Big data: large, heterogeneous datasets where partitioning and
    provenance are critical.

* Provenance and trust
  - Tracking the origin of statements (datasets from multiple contributors,
    trust boundaries, licensing).
  - Distinguishing between authoritative and third-party data.

* Data management
  - Efficient partitioning and indexing for very large graphs.
  - Isolation of subsets of data for domain-specific reasoning or
    processing.

* Interoperability
  - Metadata standards could help reduce ambiguity.

Would others here be interested in working together on documenting such
practices (perhaps as a Community Group note)? I'd be glad to contribute to
that effort if there's interest.

Best,
Filip
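P.S. To make the named-graph vs. triple-term contrast above concrete, here
is a minimal TriG sketch. All identifiers are made up, and the triple-term
part uses the RDF 1.2 working-draft syntax, which may still change:

  @prefix ex:  <http://example.org/> .
  @prefix dct: <http://purl.org/dc/terms/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  # A named graph groups several statements under one identifier...
  ex:g1 {
    ex:alice ex:knows ex:bob .
    ex:alice ex:age   30 .
  }

  # ...so that the group as a whole can be annotated in the default graph.
  ex:g1 dct:source  ex:aliceHomepage ;
        dct:created "2025-09-30"^^xsd:date .

  # A triple term makes the same kind of annotation possible at the
  # granularity of a single statement (RDF 1.2 working-draft annotation
  # syntax, shown here in its shorthand form):
  ex:alice ex:knows ex:bob {| dct:source ex:aliceHomepage |} .

  # The annotation above is shorthand for an explicit reifier and triple
  # term, roughly:
  #   _:r rdf:reifies <<( ex:alice ex:knows ex:bob )>> .
  #   _:r dct:source ex:aliceHomepage .

The named graph attaches the metadata to the group as a whole; the triple
term attaches it to exactly one statement.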
Received on Tuesday, 30 September 2025 12:14:26 UTC