Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?] from Jiří Procházka on 2020-07-16 (semantic-web@w3.org from July 2020)

From: Jiří Procházka <ojirio@gmail.com>
Date: Thu, 16 Jul 2020 19:22:07 +0200
To: Patrick J Hayes <phayes@ihmc.us>, Dan Brickley <danbri@danbri.org>
Cc: David Booth <david@dbooth.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <2935cc30-e34b-c400-0b0f-d745246b2296@gmail.com>
I'll join in on the brainstorming.
The use cases I am thinking of:

1. I have defined a structure (list, geo point, n-ary relation etc.) in
RDF using URI references (names, not blank nodes). When merging data
from external sources and reasoning over it, I don't want the
result to change the *definition* of the structure.

2. The same applies to less structured value types: references to
people, events etc. (real things). Merging should not change their
meaning, create contradicting statements or merge multiple references
together. Example: external data adds *defining* information to a person
URI, such as place of birth. I want to take special care in deciding
whether to merge this data, so as not to accidentally make the URI
refer to multiple people.

It is a question of handling an operation on a believed/true graph and
an external/merged/inferred graph. I'm not sure SHACL is the tool for
the job, as AFAIK it only works on a single graph, whereas even a
perfectly logically consistent RDF graph can be an undesirable result:

{ :group1 :member :Alice } & { :group1 :member :Bob }
=>
{ :group1 :member :Alice, :Bob } # changed definition of :group1

One could say it is a problem of interacting with the open world.
Handling it could be part of a recommendation, along with tooling and
vocabularies, for consuming external RDF data.

The ingestion operation would have to be able to consist of multiple
graph merges and inferences (a sort of DB transaction), as graphs often
overlap. More precisely, the operation would be on two datasets (sets of
graphs).
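To make the "sort of a DB transaction" idea concrete, here is a minimal sketch, assuming a dataset is modelled as a plain dict from graph name to a set of (subject, predicate, object) string tuples. The function `ingest`, the `is_acceptable` and `infer` callbacks and the example check are all hypothetical names for illustration, not an existing API. Every incoming graph is merged into a working copy, inference runs on the copy, and the result replaces the believed dataset only if the whole thing passes the check.

```python
from copy import deepcopy
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical representation: plain strings stand in for IRIs.
Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]  # graph name -> triples in that graph


def ingest(believed: Dataset,
           incoming: Dataset,
           is_acceptable: Callable[[Dataset], bool],
           infer: Optional[Callable[[Dataset], None]] = None
           ) -> Tuple[Dataset, bool]:
    """All-or-nothing ingestion of one dataset into another.

    Every incoming graph is merged into a working copy, optional inference
    runs on the copy, and the result is committed only if the whole
    dataset passes is_acceptable. Returns (dataset, committed).
    """
    working = deepcopy(believed)  # the believed dataset is never mutated
    for name, triples in incoming.items():
        # Graphs with the same name are unioned; new names are added.
        working.setdefault(name, set()).update(triples)
    if infer is not None:
        infer(working)  # inference sees all merged graphs at once
    if is_acceptable(working):
        return working, True
    return believed, False  # roll back: nothing partial is exposed


# Usage: reject any ingestion that changes :group1's believed members.
def group1_unchanged(ds: Dataset) -> bool:
    members = {o for (s, p, o) in ds.get("g", set())
               if s == ":group1" and p == ":member"}
    return members == {":Alice"}


believed = {"g": {(":group1", ":member", ":Alice")}}
incoming = {"g": {(":group1", ":member", ":Bob")},
            "h": {(":Bob", ":name", "Bob")}}
result, committed = ingest(believed, incoming, group1_unchanged)
```

Because the check runs on the merged dataset as a whole, the unrelated graph `h` is rejected together with the offending `:member` triple; a finer-grained design could split the incoming dataset and ingest graphs independently.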

Way back I thought the answer would be to mark some RDF properties as
*defining* and have tooling require special handling (or issue
warnings to users etc.) when *defining* relations are merged onto URIs
that already have some. I'm not sure this would be a good approach.
Alternatively, RDF properties could be marked as safe to merge
(non-defining).

The tooling would have to account for inherent disagreements about which
relations are defining, allowing selective application and different
handling (discarding or putting aside conflicting datasets/graphs,
issuing warnings, or going ahead and merging).
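The "defining property" idea and the handling options above can be sketched roughly as follows, assuming graphs are sets of (subject, predicate, object) string tuples. `merge_with_defining_check` and the policy strings are hypothetical. Each consumer passes its own `defining` set, which models the disagreement about which relations are defining. The policy is applied per triple here; putting aside or discarding whole graphs, as mentioned above, would be a graph-level variant.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # plain strings stand in for IRIs

# Hypothetical handling options, mirroring the ones listed above.
POLICIES = {"merge", "warn", "set_aside", "discard"}


def merge_with_defining_check(believed: Set[Triple],
                              incoming: Iterable[Triple],
                              defining: Set[str],
                              policy: str = "warn"
                              ) -> Tuple[Set[Triple], Set[Triple], List[str]]:
    """Merge incoming triples into a copy of the believed graph.

    An incoming triple conflicts when its predicate is in `defining`, the
    subject already has a value for that predicate in the believed graph,
    and the triple is not already believed.
    Returns (merged, set_aside, warnings).
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    merged = set(believed)
    set_aside: Set[Triple] = set()
    warnings: List[str] = []
    # (subject, predicate) pairs already "defined" before merging.
    defined = {(s, p) for (s, p, _o) in believed if p in defining}
    for triple in incoming:
        s, p, _o = triple
        conflict = (s, p) in defined and triple not in believed
        if not conflict or policy == "merge":
            merged.add(triple)
        elif policy == "warn":
            warnings.append(f"{s} {p}: new value in {triple}")
            merged.add(triple)
        elif policy == "set_aside":
            set_aside.add(triple)  # kept for later review, not merged
        # policy == "discard": the conflicting triple is dropped
    return merged, set_aside, warnings


believed = {(":group1", ":member", ":Alice")}
incoming = [(":group1", ":member", ":Bob")]
merged, aside, warns = merge_with_defining_check(
    believed, incoming, {":member"}, "set_aside")
```

With `:member` marked as defining, the `:group1` example is caught and `:Bob` is put aside rather than silently changing the group. With an empty `defining` set, the same call simply performs the plain RDF merge.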

Cheers,
Jiri

On 7/16/20 5:49 PM, Patrick J Hayes wrote:
> 
> 
>> On Jul 16, 2020, at 10:29 AM, Dan Brickley <danbri@danbri.org> wrote:
>>
>> On Thu, 16 Jul 2020 at 15:43, Patrick J Hayes <phayes@ihmc.us> wrote:
>> I just noticed that Dan already said this in his email. Sorry, Dan, but +1.
>>
>> Let's talk it through.
>>
>> In normal RDF, the marketplace of structures you can use to make statements operates at a painfully fine-grained level: triple by triple, you can draw upon types and properties that are already in use, as well as URIs standing for the things being described.
> 
> Well, one person’s pain, etc. But yes.
>>
>> In a "ShapedRDF" data format (and database?) there would be chunks that (could/should/must) correspond to shapes defined in SHACL/ShEx/etc., and which ...
>>
>> - in the data format, a publisher would be either asserting the whole thing, or not;
>> - in a database (e.g. accessed via SPARQL) something would ensure that it was either all there, or all gone;
>> - for a parser, there would be checking to not generate triples for incomplete or ill-formed shape chunks?
> 
> Yes. The fact that SHACL is seen as a syntactic constraint, rather than just another description, is touted as a big advantage of SHACL over OWL. Sounds like just what we need here. I haven't checked the details, admittedly, but the advantages of using an existing recommendation outweigh any minor places of less than perfect fit.
> 
> OWL/RDF parsers have been in this position, doing OWL syntax checks on chunks of RDF, for over a decade. And the RDF spec does say explicitly that a semantic extension can impose syntactic conditions on RDF graphs (and keeping list descriptions well-formed was exactly what I had in mind.)
> 
>>
>> Something like RDFStar or Property Graphs could allow the shapes to be explicitly indicated in concrete syntax. But maybe that isn't needed? Perhaps the shape commitments would be declared up front at the top of the file like namespaces or json-ld @context definitions?
> 
> Yes. I imagine it being rather like the relationship of CSS to HTML, where you can put all the structural specification into a file and just reference it in the RDF file header somewhere. That allows people to publicly agree on formatting (just like they do now for datatyping, by using the XML schema URI) but also allows communities to develop and use new ideas without having to reconvene a W3C WG to develop a new standard.
> 
> Also thinking out loud. 
> 
> Pat
>>  
>> Thinking out loud...
>>
>> Dan
>>
>>
>> Pat
>>
>>> On Jul 16, 2020, at 9:36 AM, Patrick J Hayes <phayes@ihmc.us> wrote:
>>>
>>> Sounds like a task for SHACL, no? Ignore the nonsense about comparing it to OWL: it is a macro language for describing /syntactic/ constraints on RDF graphs. Doesn’t that give you the required atomicity?
>>>
>>> Pat
>>>
>>>> On Jul 16, 2020, at 9:18 AM, David Booth <david@dbooth.org> wrote:
>>>>
>>>> On 7/16/20 9:58 AM, Dan Brickley wrote:
>>>>> I believe the big appeal of putting it all into the zone we call "literals" is that you get a kind of atomicity; that chunk of data is either there, or not there; it is asserted, or not asserted. With a triples-based (description of a) data structure you have to be constantly on your guard that every subset of the full graph pattern is at least sensible and harmless, even when subsetting these chunks is often confusing or misleading for data consumers. I can't help wondering whether notions of graph shapes [ . . . ] could be exploited to create an RDF-based data format which had atomicity at the level of entire shapes.
>>>>
>>>> +1
>>>>
>>>> IMO the ability to manipulate chunks of data atomically -- arrays, n-ary tuples and hierarchical objects -- is a key requirement in developing a higher-level form of RDF.   This will include the need to conveniently construct and deconstruct such chunks in rules or query languages.
>>>>
>>>> David Booth
>>>>
>>>
>>>
>>
>>
> 
>
Received on Thursday, 16 July 2020 17:22:29 UTC