Re: Blank nodes must DIE! [was Re: Blank nodes semantics - existential variables?]

Hi Aidan,

On 6/30/20 6:45 PM, Aidan Hogan wrote:
> I think that getting rid of blank nodes entirely is a reasonable 
> position to discuss. Assuming we have blank nodes, then the RDF 
> semantics makes sense to me: I think they should remain local and 
> existential. But it is another question whether or not they are worth it 
> in the first place. 

Agreed, and that's the position that I am seeking to advance -- not that 
they should be removed from the underlying semantic model, but that they 
should be removed from the *user* experience, by allowing the user to 
work at a higher level, without having to see or think about them.

> Note that I am a big fan of minimality. If we could 
> get away without blank nodes, and if things would be simpler without 
> them, then I would be all for it. My opinion is based on the suspicion 
> that things would be more complex without the *option* of using blank 
> nodes. But in the context of Linked Data, for example, their use is 
> discouraged, and many important datasets heed that advice. I think this 
> is a good balance: blank nodes are an option if you need them, but if 
> you don't like them and/or don't need them, don't use them.

That approach might work in a limited context, such as a single team. 
But when RDF is shared and reused by many others, RDF users would still 
be forced to deal with the blank nodes that others have used.

> 
> A third option that various people have worked on, including myself, is 
> to develop methods to skolemise blank nodes, converting them into IRIs 
> and assigning them consistent canonical labels. So if you don't want the 
> headache of dealing with blank nodes (as common in legacy data), there 
> is always the option of eliminating the blank nodes by skolemising as 
> part of a pre-processing step (though it would of course require an 
> additional dependency in the project to include the skolemisation code).

Yes, excellent work!  I think skolemizing may be useful as an underlying 
mechanism, largely hidden from users.
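
For example (a purely illustrative sketch -- the vocabulary and the 
genid label are invented, but the /.well-known/genid/ path is the one 
RDF 1.1 suggests for skolem IRIs), skolemisation just rewrites each 
blank node as a globally unique IRI while leaving its triples intact:

    @prefix : <http://example.com/ns#> .

    # Before: Alice's address is a blank node.
    :alice :address [ :city "Boston" ; :zip "02134" ] .

    # After skolemisation: the blank node becomes a stable IRI,
    # and the triples it carried are unchanged.
    :alice :address <http://example.com/.well-known/genid/d26a2d0e> .
    <http://example.com/.well-known/genid/d26a2d0e>
        :city "Boston" ;
        :zip  "02134" .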

> 
>> In practical terms, this means adopting a new, higher level RDF-based 
>> syntax that allows RDF tooling to be reused as much as possible.
>>
>> A minimum contender would be Turtle/TriG without blank node labels, 
>> but if we are contemplating a new syntax then I personally think it 
>> would be worth making a few more changes at the same time, to make it 
>> even higher level and easier to use.  A number of ideas have been 
>> collected here, though somewhat haphazardly:
>> https://github.com/w3c/EasierRDF/issues
>>
>> But note that a new RDF-based syntax is only one part of the entire 
>> tool chain.  A SPARQL successor would also be needed, to support the 
>> new features and restrictions, and libraries would have to support 
>> them also.
> 
> In terms of higher level RDF-based syntaxes, my first thought is that 
> this would be Turtle or JSON-LD? You mention Turtle removing blank 
> nodes, but I don't immediately agree that it would make the syntax all 
> that much easier to understand (I would need to be convinced). 

The point is to completely eliminate blank nodes from the user 
experience, so that users never even need to learn about blank nodes or 
be puzzled by the relentless discussions that go on about their subtle 
semantics.  Eliminating blank node labels would be a necessary first 
step, but more would be needed to fully reach that goal.  For example, 
we still need easy ways for RDF authors to express multi-part objects, 
n-ary relations and arrays.  But we do NOT need blank node labels to do 
that.
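
For instance, Turtle's bracket syntax already lets an author write a 
multi-part object or n-ary relation without ever typing a blank node 
label (a hypothetical example with a made-up vocabulary):

    @prefix :    <http://example.com/ns#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # An n-ary "marriage" relation with a date and a place,
    # written with nested brackets -- no _:label appears anywhere.
    :bob :marriage [
        :spouse :alice ;
        :date   "1990-06-15"^^xsd:date ;
        :place  :Chicago
    ] .

A higher-level syntax could keep that surface form while hiding (or 
skolemising away) the blank node that it currently expands to.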

> It would 
> also require removing shortcuts for lists, which creates other issues. 

Lists are another issue.  A new syntax should offer arrays, and they 
should be index-based like in common programming languages, not 
first/rest linked lists (which are *awful* for SPARQL).
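
To make that concrete, here is the usual SPARQL idiom for reading the 
members of an RDF list -- the data and vocabulary are invented for 
illustration:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX :    <http://example.com/ns#>

    # Suppose the data says  :alice :children ( :carol :dave ) .
    # The rdf:rest*/rdf:first path walks the hidden first/rest chain,
    # but the solutions come back unordered, so the list's order is
    # lost unless you do considerably more work.
    SELECT ?child WHERE {
      :alice :children/rdf:rest*/rdf:first ?child .
    }

An index-based array would let the position be queried directly instead.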

> (Also most of the Semantic Web standards would need to be rewritten, 
> which is maybe more of an appeal to historical context or practical 
> concerns and thus should perhaps initially take a back-seat to what is 
> actually best as a guiding principle.)
> 
> I think though it would be interesting to look at a concrete proposal 
> along the lines you mention and compare it with the existing standards.

Yes!  But it's not a simple undertaking.  That's why I'm really hoping 
that some PhD students will take it on as their thesis projects.

>> I REALLY wish that some PhD students would take on this challenge: to 
>> design a higher-level successor to RDF, with a top-line goal of making 
>> it easy enough for AVERAGE developers (middle 33% of skill), who are 
>> new to it, to be consistently successful.  Note to such PhD 
>> students/researchers: pay particular attention to Sean Palmer's 
>> insightful comments also:
>> https://github.com/w3c/EasierRDF/issues/68
>>
>> IMO blank nodes have been a significant factor in pushing RDF over the 
>> cognitive complexity threshold that average developers are willing to 
>> tolerate.  Given how rapidly other easier-to-use graph databases have 
>> become popular and have far overtaken RDF in market share, I think it 
>> is URGENT that we address the problem of making RDF easier for AVERAGE 
>> developers:
>> https://db-engines.com/en/ranking/graph+dbms
> 
> I don't think the comparison is all that simple. 

Agreed.  But it is still eye-popping to see how much market share 
they've gained.

> RDF is a standard 
> format for data exchange (particularly on the Web). Graph databases are 
> systems with query languages for querying graphs. Regarding the adoption 
> (or "market share") of RDF, a better statistic might be: "[of 32 million 
> websites] approximately 6.3 million of these websites use Microdata, 5.1 
> million websites use JSON-LD, and 1 million websites make use of RDFa" 
> [1]. Regarding SPARQL more specifically, one might also mention the 
> millions of daily queries being processed on Wikidata [2].
> 
> That is not to say that we do not have something to learn from graph 
> databases like Neo4j. On the contrary, their documentation, demos, 
> installation, etc., are geared towards developers in a way that the RDF 
> et al. standards/primers have not traditionally been and in a way that 
> suggests a possible opportunity that we have been missing. 

Exactly.  And I think we (as a community) REALLY need to address this. 
If we don't, I think RDF will eventually die of neglect, as the 
capabilities of other technologies gradually expand to swallow RDF's 
use cases.

David Booth

Received on Wednesday, 1 July 2020 11:03:09 UTC