Re: A case for a datatype for identifying RDF graphs from Pierre-Antoine Champin on 2021-05-19 (semantic-web@w3.org from May 2021)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Wed, 19 May 2021 18:53:19 +0200
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>
Message-ID: <02de704e-a84c-0af2-dbee-5550a1039365@ercim.eu>

Hi Antoine,

I have been meaning to reply to your mail for some time. Sorry for the delay.

On 07/05/2021 17:03, Antoine Zimmermann wrote:
> Dear semwebers,
>
>
> [I CC the RDF-star list but if you want to reply and are not a member 
> of the RDF-star group, I think it's enough to reply to the 
> semantic-web list, as the overlap is nearly total I guess]
>
> TL;DR: I propose a datatype that has values that are RDF graphs 
> [Note1]. Let us implement it and use it to talk about RDF graphs 
> instead of proposing extensions of the RDF model that include special 
> syntax and semantics for referring to RDF graphs. Then, extensions of 
> RDF can be used for other purposes.

TL;DR: I generally like this idea -- but I also have a number of 
remarks, caveats, comments below...

>
>
> [Note1] In fact, the idea is not new: Ivan Herman drafted a document 
> in 2010 specifying a datatype for RDF graphs. See RDF Graph Literals 
> and Named Graphs, W3C Editor's Draft 26 February 2010 (Ivan Herman, 
> editor) https://www.w3.org/2009/07/NamedGraph.html

Reading the beginning of your email, I was going to try and dig this 
out, till I realized you had already ;-)

>
>
> Long version:
> ============
>
> Datatypes in RDF are used in order to refer to specific values that 
> can be processed adequately by computer systems, as opposed to 
> entities from the real world that can merely be referred to in such 
> systems without access to the real thing. For instance, if one wants 
> to refer to the integer "one", it is much better to use 
> "1"^^xsd:integer, that any system understands as the number "one" -- 
> less than "2"^^xsd:integer and more than "0"^^xsd:integer -- rather 
> than using an identifier like 
> http://km.aifb.kit.edu/projects/numbers/n1 that may refer to a real 
> world entity of which the system has no understanding.
>
> If one wants to refer to a *specific* instant in time, it is 
> preferable to identify it using xsd:dateTime rather than minting a URI 
> for the instant and relating it to its year, month, day, etc.
>
> If one wants to refer to a *specific* geometry on the Earth, it is 
> better to use a geo:wktLiteral rather than introducing a URI and a 
> vocabulary that would partially, and complexly define the geometry.

Theoretically, literals are cleaner, granted.

In practice, I see this more as a trade-off, and the best answer depends 
on your use-cases. For dates, it is sometimes useful to be able to 
access their "parts" by simply following an arc.
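
A minimal sketch of that trade-off, in plain Python with no RDF library: the `ex:` terms and the `follow` helper are hypothetical names made up for illustration, and triples are modelled as simple tuples. With a literal, getting at a part means parsing the lexical form; with a node and arcs, it is a single lookup.

```python
from datetime import datetime

# Option 1: the instant as a literal. Its parts are only reachable by
# parsing the lexical form (here, the lexical form of an xsd:dateTime).
literal = "2021-05-19T18:53:19+02:00"
year_from_literal = datetime.fromisoformat(literal).year

# Option 2: the instant as a node with explicit arcs.
# The ex: property names are hypothetical, not from any real vocabulary.
triples = {
    ("ex:instant1", "ex:year", 2021),
    ("ex:instant1", "ex:month", 5),
    ("ex:instant1", "ex:day", 19),
}

def follow(subject, predicate, graph):
    """Return the objects reached from `subject` by following `predicate`."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]

year_from_arcs = follow("ex:instant1", "ex:year", triples)[0]
```

Both routes give the same year; the difference is whether the consumer needs datatype-aware parsing or only graph traversal.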

(this is where N3 shines, by the way, but that's another topic)

>
> If one wants to refer to a specific RDF graph, that could be processed 
> based on the datatype that identifies it as such, it would be more 
> efficient and robust to use a literal rather than a URI that 
> supposedly identifies it.
>
> I see many discussions that attempt to address the problem of 
> identifying specific RDF graphs (or specific triples) using extensions 
> of RDF, such as named graphs, or RDF-star, when all we need is a 
> datatype that is fully inside the scope of RDF syntax and semantics.

Again, theoretically you are right; however... the specifics of that 
datatype would still have to be built into RDF implementations...
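
To give a rough idea of what "built into implementations" would involve, here is a toy sketch of a lexical-to-value mapping for such a datatype. It is an assumption-laden illustration, not a proposal: `turtle_value` is a hypothetical function, it only accepts a tiny subset of Turtle (whitespace-separated prefixed names, `a` for rdf:type, statements separated by a standalone `.`), and it ignores prefix resolution, blank nodes and isomorphism entirely.

```python
def turtle_value(lexical):
    """Map a lexical form to a graph, modelled as a frozenset of triples.

    Toy subset only: three whitespace-separated terms per statement,
    statements separated by a standalone '.', final '.' optional.
    """
    triples, terms = set(), []
    # Appending "." lets the last statement omit its terminating dot.
    for token in lexical.split() + ["."]:
        if token != ".":
            terms.append(token)
            continue
        if terms:
            if len(terms) != 3:
                raise ValueError(f"not in the toy lexical space: {lexical!r}")
            s, p, o = terms
            # 'a' is Turtle shorthand for rdf:type.
            triples.add((s, "rdf:type" if p == "a" else p, o))
            terms = []
    return frozenset(triples)

# Two different lexical forms that denote the same value (the same graph).
g1 = turtle_value("ex:x a owl:Thing")
g2 = turtle_value("ex:x   rdf:type owl:Thing .")
same_value = (g1 == g2)
```

Comparing literals by value rather than by string is exactly the part that a generic RDF implementation does not get for free: it has to know the datatype.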

RDF datasets, on the other hand, are already largely implemented. (and 
so is RDF-star, to some extent :->)

>
> If we have a datatype, say rdf:turtle, whose lexical space is the set 
> of UNICODE strings that encode Turtle documents, and whose value space 
> is the set of RDF graphs (or perhaps, the set of equivalence classes of 
> RDF graphs wrt the isomorphism relation), then we have a way to refer 
> to RDF graphs (or the graphs isomorphic to them) without extending RDF 
> at all.
>
> For instance:
>
> <antoine>  <published>  "ex:x a owl:Thing"^^rdf:turtle .
>
> precisely refers to the RDF triple (ex:x, rdf:type, owl:Thing).
> If we had this datatype, then it would not be necessary to define an 
> extension of RDF for expressing statements about RDF graphs or triples.
>
> As a consequence, extensions of RDF such as named graphs, RDF 
> datasets, or RDF-star, could be used for other things.
>
> As an example, named graphs could be used to refer to a possible world 
> that a named RDF graph may describe:
>
> <a> <says>
>       ":a rdfs:subClassOf :b . :b rdfs:subClassOf :c ."^^rdf:turtle .
> <ng> {
>   :a rdfs:subClassOf :b .
>   :b rdfs:subClassOf :c .
> }
>
> could entail:
>
> <ng> {
>   :a rdfs:subClassOf :c .
> }
>
> because <ng> would describe a world in which :a is really a subclass 
> of :c .
>
> However, it would not entail:
>
> <a> <says>  ":a rdfs:subClassOf :c ."^^rdf:turtle .
>
> because the initial graph tells that <a> said something with a graph 
> of 2 triples, not a graph with one triple.
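
The distinction drawn in this example can be sketched in a few lines of plain Python, assuming graphs are modelled as sets of tuples and that the only inference applied is transitivity of rdfs:subClassOf. The `subclass_closure` helper is hypothetical and only stands in for a real RDFS reasoner.

```python
SUB = "rdfs:subClassOf"

def subclass_closure(graph):
    """Add the triples implied by transitivity of rdfs:subClassOf."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(closed):
            for (b2, p2, c) in list(closed):
                if (p1 == SUB and p2 == SUB and b == b2
                        and (a, SUB, c) not in closed):
                    closed.add((a, SUB, c))
                    changed = True
    return frozenset(closed)

# The graph that <a> said, and the one-triple graph from the example.
said = frozenset({(":a", SUB, ":b"), (":b", SUB, ":c")})
inferred = frozenset({(":a", SUB, ":c")})

# Inside <ng>, the triple follows from the two asserted ones...
entailed_in_ng = inferred <= subclass_closure(said)
# ...but as literal values, the two graphs are simply different.
same_literal_value = (said == inferred)
```

Under these assumptions `entailed_in_ng` holds while `same_literal_value` does not, which matches the behaviour described above: entailment applies to the content of the named graph, not to the identity of the graph denoted by the literal.
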
> While I'm aware that there are limitations to this approach, it seems 
> to me that it covers a lot of use cases for the people who would like 
> to identify RDF graphs.
>
> As an example of the limitations I can see, there is the problem of 
> blank nodes that could occur in both the rdf:turtle literal and 
> externally to the literal. It would probably not be possible to ensure 
> that bnodes inside the literal are the same as the ones outside, even 
> if the same bnode identifiers were used. Yet, the use cases that 
> require going beyond the limitations seem to be rather narrow.
> Regardless of whether this covers all the use cases, wouldn't it be 
> worthwhile investigating this direction, given that it does not 
> require any extension of RDF at all?
>
> One noticeable issue remains: people may want to talk about RDF graphs 
> as the subject of their triples. We could envision an extension of RDF 
> where literals are allowed as the subject of RDF triples. 

+1, that would be awesome (did I mention N3 already?... :-P)

However...

> Would this be more crazy to implement than a whole new data model like 
> RDF-star that allows a completely new type of entities in both object 
> and subject positions?

... I don't know if one is more crazy than the other. But I can only 
acknowledge that RDF-star got much more traction than literals as 
subjects. Go figure...

> (this is not an attack against RDF-star, which has some other values 
> that can justify the effort, notably wrt querying, that I don't want 
> to discuss here).

duly noted :-)

   best

>
> Thoughts? Criticisms? Acclaims?
> -- 
> Antoine Zimmermann
> ISI - Institut Henri Fayol
> École des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 66 03
> https://www.emse.fr/~zimmermann/
>
Received on Wednesday, 19 May 2021 16:53:25 UTC