- From: Filip Kolarik <filip26@gmail.com>
- Date: Tue, 30 Sep 2025 14:14:10 +0200
- To: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Cc: danbri@gmail.com, semantic-web@w3.org
- Message-ID: <CADRK2_OJaEVBd3_oJkmH+fv7N1uOOvhCUKT9hR=-SoVBke7-tw@mail.gmail.com>
On Tue, Sep 30, 2025 at 8:56 AM Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:

> But still I rest my case about *existing* datasets in the wild:
>
> * The absence of such metadata makes datasets inherently ambiguous.
> * People are actually embracing this ambiguity by using named graphs any
>   way they see fit, and we should not prevent them.
>
> And no, the WG has no immediate plan to standardize how this kind of
> metadata could be expressed, but any suggestion or incubation work in the
> RDF-Dev Community Group would be welcome ;-)

Thank you for the input. I've shared a similar post elsewhere, and the most
common response was simply to cite the generic definitions. What I found
most insightful here:

* Graphs as semantic groupings: a way to group statements and annotate them
  with an identifier or other metadata. Triple terms provide similar
  functionality, but at a finer granularity (a small sketch contrasting the
  two follows at the end of this message).

* Graphs/datasets as processing units: this distinction might help decide
  when to use graphs versus triple terms.

RDF gives us great expressivity, but this comes at a cost: its generality
and high-level definitions can easily overcomplicate processing and add
complexity to reasoning and understanding. This seems somewhat at odds with
the original vision of the Semantic Web, which is to make data integration
and reasoning easier, not harder.

There is unlikely to be a single "right" approach, but from what I see
there are distinct categories of use cases that would benefit from going
beyond the generic definition of a graph, toward clearer best practices and
shared conventions.

Some perspectives where the differences between graphs, named graphs, and
triple terms matter might include:

* Processing
  - Document-oriented: smaller, curated datasets, often self-contained.
  - Big data: large, heterogeneous datasets where partitioning and
    provenance are critical.

* Provenance and trust
  - Tracking the origin of statements (datasets from multiple contributors,
    trust boundaries, licensing).
  - Distinguishing between authoritative and third-party data.

* Data management
  - Efficient partitioning and indexing for very large graphs.
  - Isolation of subsets of data for domain-specific reasoning or
    processing.

* Interoperability
  - Metadata standards could help reduce ambiguity.

Would others here be interested in working together on documenting such
practices (perhaps as a Community Group note)? I'd be glad to contribute to
that effort if there's interest.

Best,
Filip
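P.S. To make the named-graph vs. triple-term contrast above concrete, here
is a minimal TriG sketch. All identifiers are made up, and the triple-term
part uses the RDF 1.2 working-draft syntax, which may still change:

  @prefix ex:  <http://example.org/> .
  @prefix dct: <http://purl.org/dc/terms/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  # A named graph groups several statements under one identifier...
  ex:g1 {
    ex:alice ex:knows ex:bob .
    ex:alice ex:age   30 .
  }

  # ...so that the group as a whole can be annotated in the default graph.
  ex:g1 dct:source  ex:aliceHomepage ;
        dct:created "2025-09-30"^^xsd:date .

  # A triple term makes the same kind of annotation possible at the
  # granularity of a single statement (RDF 1.2 working-draft annotation
  # syntax, shown here in its shorthand form):
  ex:alice ex:knows ex:bob {| dct:source ex:aliceHomepage |} .

  # The annotation above is shorthand for an explicit reifier and triple
  # term, roughly:
  #   _:r rdf:reifies <<( ex:alice ex:knows ex:bob )>> .
  #   _:r dct:source ex:aliceHomepage .

The named graph attaches the metadata to the group as a whole; the triple
term attaches it to exactly one statement.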
Received on Tuesday, 30 September 2025 12:14:26 UTC