A case for a datatype for identifying RDF graphs

Dear semwebers,


[I CC the RDF-star list but if you want to reply and are not a member of 
the RDF-star group, I think it's enough to reply to the semantic-web 
list, as the overlap is nearly total I guess]

TL;DR: I propose a datatype whose values are RDF graphs [Note1]. 
Let us implement it and use it to talk about RDF graphs instead of 
proposing extensions of the RDF model that include special syntax and 
semantics for referring to RDF graphs. Then, extensions of RDF can be 
used for other purposes.


[Note1] In fact, the idea is not new: Ivan Herman drafted a document in 
2010 specifying a datatype for RDF graphs. See RDF Graph Literals and 
Named Graphs, W3C Editor's Draft 26 February 2010 (Ivan Herman, editor) 
https://www.w3.org/2009/07/NamedGraph.html


Long version:
============

Datatypes in RDF are used in order to refer to specific values that can 
be processed adequately by computer systems, as opposed to entities from 
the real world that can merely be referred to in such systems without 
access to the real thing. For instance, if one wants to refer to the 
integer "one", it is much better to use "1"^^xsd:integer, which any 
system understands as the number "one" -- less than "2"^^xsd:integer and 
more than "0"^^xsd:integer -- rather than using an identifier like 
http://km.aifb.kit.edu/projects/numbers/n1 that may refer to a real 
world entity of which the system has no understanding.
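
To illustrate what "understanding the value" means, here is a small 
Python sketch. The function name integer_value is made up for this 
example, and the regular expression is only my approximation of the 
xsd:integer lexical space, so take it as an illustration rather than a 
reference implementation.

```python
import re

def integer_value(lexical):
    """Map an xsd:integer lexical form to a Python int (illustrative only)."""
    # Assumption: an optional sign followed by decimal digits.
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError("not an xsd:integer lexical form: %r" % lexical)
    return int(lexical)

# Values can be compared, unlike opaque identifiers.
print(integer_value("0") < integer_value("1") < integer_value("2"))  # prints True

# Distinct lexical forms can denote the same value.
print(integer_value("01") == integer_value("+1"))  # prints True
```

Nothing comparable can be done with the URI alone: a system can only 
compare it with other URIs as a string.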

If one wants to refer to a *specific* instant in time, it is preferable 
to identify it using xsd:dateTime rather than minting a URI for the 
instant and relating it to its year, month, day, etc.

If one wants to refer to a *specific* geometry on the Earth, it is 
better to use a geo:wktLiteral rather than introducing a URI and a 
vocabulary that would define the geometry only partially and in a 
complicated way.

If one wants to refer to a specific RDF graph, that could be processed 
based on the datatype that identifies it as such, it would be more 
efficient and robust to use a literal rather than a URI that supposedly 
identifies it.

I see many discussions that attempt to address the problem of 
identifying specific RDF graphs (or specific triples) using extensions 
of RDF, such as named graphs, or RDF-star, when all we need is a 
datatype that is fully inside the scope of RDF syntax and semantics.

If we have a datatype, say rdf:turtle, whose lexical space is the set of 
Unicode strings that encode Turtle documents, and whose value space is 
the set of RDF graphs (or perhaps, the set of equivalence classes of RDF 
graphs wrt the isomorphism relation), then we have a way to refer to RDF 
graphs (or the graphs isomorphic to them) without extending RDF at all.

For instance:

<antoine>  <published>  "ex:x a owl:Thing ."^^rdf:turtle .

precisely refers to the RDF triple (ex:x, rdf:type, owl:Thing).
If we had this datatype, then it would not be necessary to define an 
extension of RDF for expressing statements about RDF graphs or triples.
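
The lexical-to-value mapping of such a datatype can be sketched in a few 
lines of Python. This is not an implementation of the proposed 
datatype: the name turtle_value is hypothetical, and the "parser" only 
accepts a tiny subset of Turtle (whitespace-separated "subject predicate 
object ." statements and the "a" keyword, with no prefix declarations, 
no literals and no blank nodes). The only point is that different 
lexical forms can map to the same value, here a set of triples.

```python
RDF_TYPE = "rdf:type"

def turtle_value(lexical):
    """Map a lexical form to a graph value (a frozenset of triples).

    Assumption: only 'subject predicate object .' statements with
    whitespace-separated terms are handled; this is not a Turtle parser.
    """
    tokens = lexical.split()
    triples = set()
    for i in range(0, len(tokens), 4):
        statement = tokens[i:i + 4]
        if len(statement) != 4 or statement[3] != ".":
            raise ValueError("not in the supported subset: %r" % lexical)
        s, p, o, _ = statement
        if p == "a":  # Turtle shorthand for rdf:type
            p = RDF_TYPE
        triples.add((s, p, o))
    return frozenset(triples)

g1 = turtle_value("ex:x a owl:Thing .")
g2 = turtle_value("ex:x   rdf:type   owl:Thing .")
print(g1 == g2)  # prints True
```

Two literals are then equal as values exactly when their graphs are 
equal, whatever the spelling of the Turtle text.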

As a consequence, extensions of RDF such as named graphs, RDF datasets, 
or RDF-star, could be used for other things.

As an example, named graphs could be used to refer to a possible world 
that a named RDF graph may describe:

<a> <says>
       ":a rdfs:subClassOf :b . :b rdfs:subClassOf :c ."^^rdf:turtle .
<ng> {
   :a rdfs:subClassOf :b .
   :b rdfs:subClassOf :c .
}

could entail:

<ng> {
   :a rdfs:subClassOf :c .
}

because <ng> would describe a world in which :a is really a subclass of :c .

However, it would not entail:

<a> <says>  ":a rdfs:subClassOf :c ."^^rdf:turtle .

because the initial graph states that <a> said something with a graph of 
two triples, not a graph with one triple.
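
The difference between the two readings can be sketched in Python. The 
helper subclass_closure is a name I made up; it only applies the 
transitivity of rdfs:subClassOf, not full RDFS entailment, and triples 
are modelled as plain tuples of strings.

```python
SUBCLASS = "rdfs:subClassOf"

def subclass_closure(triples):
    """Add the triples implied by transitivity of rdfs:subClassOf."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(closed):
            for (c, p2, d) in list(closed):
                if p1 == p2 == SUBCLASS and b == c:
                    new = (a, SUBCLASS, d)
                    if new not in closed:
                        closed.add(new)
                        changed = True
    return closed

two_triples = frozenset({(":a", SUBCLASS, ":b"), (":b", SUBCLASS, ":c")})
one_triple = frozenset({(":a", SUBCLASS, ":c")})

# Read as the content of the named graph <ng>: the derived triple follows.
print(one_triple <= subclass_closure(two_triples))  # prints True

# Read as literal values: these are two different graphs, so a statement
# about the first one says nothing about the second one.
print(two_triples == one_triple)  # prints False
```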

While I'm aware that there are limitations to this approach, it seems to 
me that it covers a lot of use cases for the people who would like to 
identify RDF graphs.

As an example of the limitations I can see, there is the problem of 
blank nodes that could occur in both the rdf:turtle literal and 
externally to the literal. It would probably not be possible to ensure 
that bnodes inside the literal are the same as the ones outside, even if 
the same bnode identifiers were used. Yet, the use cases that require 
going beyond the limitations seem to be rather narrow.
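
The blank node issue can be illustrated with a Python sketch. It 
assumes, as is usual for RDF syntaxes, that blank node labels are local 
to the document being parsed, so that the content of a literal is 
parsed in its own scope. The function scope_bnodes and the ex:p, ex:q 
and ex:o terms are invented for the example.

```python
import itertools

_fresh = itertools.count()

def scope_bnodes(triples):
    """Rename every blank node label ('_:x') to one unique to this call.

    Assumption: terms are plain strings and each call stands for the
    parsing of one separate document.
    """
    scope = next(_fresh)

    def rename(term):
        if term.startswith("_:"):
            return "_:g%d_%s" % (scope, term[2:])
        return term

    return frozenset((rename(s), p, rename(o)) for (s, p, o) in triples)

outer = scope_bnodes({("_:b", "ex:p", "ex:o")})  # triples outside the literal
inner = scope_bnodes({("_:b", "ex:q", "ex:o")})  # content of the literal

outer_nodes = {s for (s, _, _) in outer}
inner_nodes = {s for (s, _, _) in inner}
print(outer_nodes & inner_nodes)  # prints set()
```

Both sides use the label _:b, but after parsing they share no blank 
node.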

Regardless of whether this covers all the use cases, wouldn't it be 
worthwhile investigating this direction, given that it does not require 
any extension of RDF at all?

One notable issue remains: people may want to talk about RDF graphs 
as the subject of their triples. We could envision an extension of RDF 
where literals are allowed as the subject of RDF triples. Would this be 
any crazier to implement than a whole new data model like RDF-star that 
allows a completely new type of entity in both object and subject 
positions?
(This is not an attack against RDF-star, which has other merits 
that can justify the effort, notably wrt querying, that I don't want to 
discuss here.)

Thoughts? Criticisms? Acclaim?
--
Antoine Zimmermann
ISI - Institut Henri Fayol
École des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
https://www.emse.fr/~zimmermann/

Received on Friday, 7 May 2021 15:03:53 UTC