Re: A case for a datatype for identifying RDF graphs from thomas lörtsch on 2021-05-25 (semantic-web@w3.org from May 2021)

From: thomas lörtsch <tl@rat.io>
Date: Tue, 25 May 2021 16:09:48 +0200
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: Semantic Web <semantic-web@w3.org>
Message-Id: <75769E51-9564-4ADF-AFEB-56B7D330233B@rat.io>
> On 7. May 2021, at 17:03, Antoine Zimmermann <antoine.zimmermann@emse.fr> wrote:
> 
> Dear semwebers,
> 
> 
> [I CC the RDF-star list but if you want to reply and are not a member of the RDF-star group, I think it's enough to reply to the semantic-web list, as the overlap is nearly total I guess]
> 
> TL;DR: I propose a datatype that has values that are RDF graphs [Note1]. Let us implement it and use it to talk about RDF graphs instead of proposing extensions of the RDF model that include special syntax and semantics for referring to RDF graphs. Then, extensions of RDF can be used for other purposes.
> 
> 
> [Note1] In fact, the idea is not new: Ivan Herman drafted a document in 2010 specifying a datatype for RDF graphs. See RDF Graph Literals and Named Graphs, W3C Editor's Draft 26 February 2010 (Ivan Herman, editor) https://www.w3.org/2009/07/NamedGraph.html
> 
> 
> Long version:
> ============
> 
> Datatypes in RDF are used in order to refer to specific values that can be processed adequately by computer systems, as opposed to entities from the real world that can merely be referred to in such systems without access to the real thing. For instance, if one wants to refer to the integer "one", it is much better to use "1"^^xsd:integer, which any system understands as the number "one" -- less than "2"^^xsd:integer and more than "0"^^xsd:integer -- rather than using an identifier like http://km.aifb.kit.edu/projects/numbers/n1 that may refer to a real-world entity of which the system has no understanding.
> 
> If one wants to refer to a *specific* instant in time, it is preferable to identify it using xsd:dateTime rather than minting a URI for the instant and relating it to its year, month, day, etc.
> 
> If one wants to refer to a *specific* geometry on the Earth, it is better to use a geo:wktLiteral rather than introducing a URI and a vocabulary that would only partially, and in a complicated way, define the geometry.
> 
> If one wants to refer to a specific RDF graph, that could be processed based on the datatype that identifies it as such, it would be more efficient and robust to use a literal rather than a URI that supposedly identifies it.
> 
> I see many discussions that attempt to address the problem of identifying specific RDF graphs (or specific triples) using extensions of RDF, such as named graphs, or RDF-star, when all we need is a datatype that is fully inside the scope of RDF syntax and semantics.
> 
> If we have a datatype, say rdf:turtle, whose lexical space is the set of Unicode strings that encode Turtle documents, and whose value space is the set of RDF graphs (or perhaps, the set of equivalence classes of RDF graphs wrt the isomorphism relation), then we have a way to refer to RDF graphs (or the graphs isomorphic to them) without extending RDF at all.
> 
> For instance:
> 
> <antoine>  <published>  "ex:x a owl:Thing"^^rdf:turtle .
> 
> precisely refers to the RDF triple (ex:x, rdf:type, owl:Thing).
> If we had this datatype, then it would not be necessary to define an extension of RDF for expressing statements about RDF graphs or triples.
> 
> As a consequence, extensions of RDF such as named graphs, RDF datasets, or RDF-star, could be used for other things.
> 
> As an example, named graphs could be used to refer to a possible world that a named RDF graph may describe:
> 
> <a> <says>
>      ":a rdfs:subClassOf :b . :b rdfs:subClassOf :c ."^^rdf:turtle .
> <ng> {
>  :a rdfs:subClassOf :b .
>  :b rdfs:subClassOf :c .
> }
> 
> could entail:
> 
> <ng> {
>  :a rdfs:subClassOf :c .
> }
> 
> because <ng> would describe a world in which :a is really a subclass of :c .
> 
> However, it would not entail:
> 
> <a> <says>  ":a rdfs:subClassOf :c ."^^rdf:turtle .
> 
> because the initial graph states that <a> said something with a graph of two triples, not a graph of one triple.
> 
> While I'm aware that there are limitations to this approach, it seems to me that it covers a lot of use cases for the people who would like to identify RDF graphs.
> 
> As an example of the limitations I can see, there is the problem of blank nodes that could occur in both the rdf:turtle literal and externally to the literal. It would probably not be possible to ensure that bnodes inside the literal are the same as the ones outside, even if the same bnode identifiers were used. Yet, the use cases that require going beyond the limitations seem to be rather narrow.
> 
> Regardless of whether this covers all the use cases, wouldn't it be worthwhile investigating this direction, given that it does not require any extension of RDF at all?
> 
> One noticeable issue remains: people may want to talk about RDF graphs as the subject of their triples. We could envision an extension of RDF where literals are allowed as the subject of RDF triples. Would this be crazier to implement than a whole new data model like RDF-star, which allows a completely new type of entity in both object and subject positions?
> (this is not an attack against RDF-star, which has some other values that can justify the effort, notably wrt querying, that I don't want to discuss here).
> 
> Thoughts? Criticisms? Acclaims?

Acclaim! This seems like a very sensible way to cite triples and graphs for the purpose of documenting specific states in the life of some piece of RDF data, e.g. after ingestion but before its deductive closure was computed and co-references were materialized, before versions were created, or when it was signed.

I also like the user interface aspect: literals are easily understood as not interpreted and not part of the "active" data and the quotes make that syntactically very obvious. 

I don’t see the limitation w.r.t. bnodes as problematic. I would however think that being able to properly query such RDF literals could be very helpful. How crazy would an effort to that effect be? 
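
To make the question concrete, here is a rough sketch (not a proposal for how rdf:turtle would actually be specified or implemented) of treating such a literal by value and querying inside it. Everything named here is an assumption for illustration: parse_tiny_turtle is a hypothetical toy that only understands whitespace-separated "s p o ." statements with prefixed names and the "a" keyword, and match is a hypothetical single-pattern lookup, nowhere near SPARQL. Blank nodes, prefix resolution and graph isomorphism are deliberately left out.

```python
from typing import FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def parse_tiny_turtle(lexical: str) -> Graph:
    """Map a lexical form to a value: an unordered set of triples.

    Toy subset only (an assumption of this sketch): plain `s p o .`
    statements, prefixed names kept as opaque strings, `a` read as rdf:type.
    """
    triples = set()
    statement: List[str] = []

    def flush() -> None:
        # Turn the collected terms into one triple, if there are any.
        if not statement:
            return
        if len(statement) != 3:
            raise ValueError(f"expected 3 terms, got {statement!r}")
        s, p, o = statement
        triples.add((s, "rdf:type" if p == "a" else p, o))
        statement.clear()

    for token in lexical.split():
        ends_statement = token.endswith(".")
        term = token[:-1] if ends_statement else token
        if term:
            statement.append(term)
        if ends_statement:
            flush()
    flush()  # the final "." is optional in this toy syntax
    return frozenset(triples)


def match(graph: Graph,
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching one pattern; None acts as a wildcard."""
    pattern = (s, p, o)
    return sorted(
        t for t in graph
        if all(want is None or want == got for want, got in zip(pattern, t))
    )


said = parse_tiny_turtle(":a rdfs:subClassOf :b . :b rdfs:subClassOf :c .")
derived = parse_tiny_turtle(":a rdfs:subClassOf :c .")

# The two literals denote different graphs, so a statement about one
# is not a statement about the other.
print(said == derived)  # False

# Looking inside the literal's value with a triple pattern.
print(match(said, p="rdfs:subClassOf", o=":c"))
```

The point of the sketch is only that, once the lexical form is mapped to a set of triples, comparing and pattern-matching literals works on the value rather than on the string; a real effort would need a full Turtle parser and a decision on how blank nodes inside the literal are handled.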

Thomas


> --
> Antoine Zimmermann
> ISI - Institut Henri Fayol
> École des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 66 03
> https://www.emse.fr/~zimmermann/
>
Received on Tuesday, 25 May 2021 14:10:13 UTC