- From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Date: Sat, 22 Jan 2022 09:12:08 +0100
- To: Jonas Smedegaard <jonas@jones.dk>, Kingsley Idehen <kidehen@openlinksw.com>, public-webid@w3.org
- Message-ID: <e3f0a022-1ed0-e7b3-5946-19540be9890c@informatik.uni-leipzig.de>
Hi Jonas,

On 22.01.22 01:09, Jonas Smedegaard wrote:
> Oh well.
>
> I understand your desire to simplify, I really do.
>
> Ruben Verborgh also wrote about that desire in his latest blog entry:
> https://ruben.verborgh.org/blog/2021/12/23/reflections-of-knowledge/
>
> He links to a single paragraph by Dan Brickley and Libby Miller, about
> that complexity issue: https://book.validatingrdf.com/bookHtml005.html
>
> Let me quote here the first two sentences of that paragraph:
>
>> People think RDF is a pain because it is complicated. The truth is
>> even worse. RDF is painfully simplistic, but it allows you to work
>> with real-world data and problems that are horribly complicated.

I will try to phrase this diplomatically: it has become a recent trend to talk down the achievements of Linked Data. At the core lies the fact that maybe 50% of the datasets in the LOD Cloud have become stale or unreachable. The LOD Cloud is pretty much manually curated, and the resources are missing to keep it properly updated, so some people have started saying that it is going down. However, 50% is still 95% better than other approaches to putting data on the web. I see huge non-LD "data" repositories that do not have many downloads; if you count them, they amount to 10k downloads over 5 years or so. Basically, Linked Data has already achieved FAIR.

Then some core people of the community keep repeating perseverance slogans (not meaning Kingsley here in particular, he is more educational) while ignoring the fact that there are some problems we would need to address in order to make it fly. Not being able to update the LOD Cloud properly (by automatic crawling) is one of them. Why is that? I see an identity problem, i.e. what is the identity of the bubbles, plus the lack of WebIDs for the people/orgs publishing data, and also no discovery mechanism. The question here is also: is it a lack of infrastructure (nobody doing it) or the lack of a feature/patch to the system?
There are still more things in RDF that are not complicated, but painful. The wiggle room of JSON-LD is one; basically this sentence by Aaron:

> Without a well-defined context, however, the vagaries in
> compact/expanded/flattened JSON-LD serializations provide a high bar
> for data parsing, and you lose a lot of the advantages that JSON-LD
> has to offer in the first place. In fact, when given the choice
> between Turtle (or other RDF serializations) and JSON-LD without a
> structured context, I would always choose Turtle.

That is an insight from somebody who has taken the effort of digging through complex things to find a technical best practice for working with RDF in a simple manner. Simplicity doesn't come up front, but has to be discovered.

Then there are many small-scale issues, besides the JSON issues, that we could avoid:

1. Upgrading tooling to xsd:string as given by RDF 1.1.

2. I don't remember it exactly, but we encountered a ";" problem with Turtle:

   cert:key [ <a> <b> "" ; ] ;

   vs.

   cert:key [ <a> <b> "" ] ;

3. DBpedia's CTO Kontokostas, my PhD student, created SHACL because we wanted to patch a particular gap in RDF. By using more SHACL to define RDF, a lot can be achieved. This issue is also related to the current spec, https://www.w3.org/2005/Incubator/webid/spec/identity/ :

   a) foaf:img -> URI in a plain literal,

   b) foaf:name with xsd:string, with a language tag, or without either,

   c) the datatypes for <http://www.w3.org/ns/auth/cert#modulus> are defined as

      :range <http://www.w3.org/2001/XMLSchema#base64Binary>, <http://www.w3.org/2001/XMLSchema#hexBinary> ;

      which means they are always both per inference, so in the actual WebID document you can put both, one, or none.

4. For https://github.com/dbpedia/databus / databus.dbpedia.org we implemented WebID login at first, but e.g. on Apple devices the keystore kept popping up immediately, so people thought the website was password protected.
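Aaron's point about JSON-LD wiggle room can be made concrete with nothing but Python's standard json module. The IRI and the person below are made up for illustration; the two documents serialize the very same RDF triple, once compacted with a context and once expanded:

```python
import json

# Two JSON-LD serializations of the same triple (hypothetical data):
#   <https://example.org/#me> foaf:name "Alice" .

compacted = json.loads("""
{
  "@context": {"name": "http://xmlns.com/foaf/0.1/name"},
  "@id": "https://example.org/#me",
  "name": "Alice"
}
""")

expanded = json.loads("""
[
  {
    "@id": "https://example.org/#me",
    "http://xmlns.com/foaf/0.1/name": [{"@value": "Alice"}]
  }
]
""")

# A consumer written against the compacted shape with plain JSON tooling...
def naive_name(doc):
    return doc.get("name")

print(naive_name(compacted))    # "Alice"
# ...silently finds nothing in the expanded form of the *same* graph:
print(naive_name(expanded[0]))  # None
```

Without a mandated context, every consumer has to either run a full JSON-LD processor or guess which of these shapes it will receive, which is exactly the high bar for data parsing that Aaron describes.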
   There is definitely no help, guidance, or standard that tells website creators how to implement WebID login properly, which would help adoption and could also push browsers to make user-friendly certificate authentication a core feature. We removed it.

5. Regarding the WebID itself, we tried to have people create one in their own space, but it was a mess. We tried to fix this with SHACL: https://github.com/dbpedia/wall-of-fame/blob/master/src/main/resources/shacl/shapes.ttl . In the end, Databus does the following now: each account comes with a WebID created by appending #me , i.e. https://databus.dbpedia.org/kurzum#me (the feature is not yet deployed online, but it is in the GitHub repo).

Then we thought it would be good to provide some metadata for the Databus itself, and my developer asked me how to do it, e.g.

   <https://databus.dbpedia.org> a dataid:Databus ;
       dct:hasVersion "2.0b" .

Even I am struggling with this, i.e. is it https://databus.dbpedia.org or https://databus.dbpedia.org/ or https://databus.dbpedia.org#this or https://databus.dbpedia.org/#this , or a 303 redirect to https://databus.dbpedia.org/webid.ttl#this ? Or should it go into .well-known or robots.txt ?

My main point here is: it could be simple, and if you have a lot of experience it might become simpler. Beginners are struggling with a plethora of hard micro decisions. This could be avoided by 1. tackling the technical details, e.g. SHACL, a context, and providing an official validator, and 2. maybe not mandating, but giving one simple way that can be adapted without taking micro decisions.

-- Sebastian
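P.S. To give an idea of what "one simple way" could look like as SHACL, here is a minimal sketch of a shape that pins down two of the micro decisions above. It is hypothetical (the shape, cardinalities, and choice of constraints are mine, not from any spec): foaf:name must be a plain xsd:string, and a key's cert:modulus must use exactly one of the two datatypes instead of both-per-inference:

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical WebID profile shape, for illustration only.
<#WebIDShape> a sh:NodeShape ;
    sh:targetClass foaf:Person ;
    sh:property [
        sh:path foaf:name ;
        sh:datatype xsd:string ;   # no language tag, no missing datatype
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ( cert:key cert:modulus ) ;
        sh:minCount 1 ;
        sh:maxCount 1 ;            # exactly one modulus ...
        sh:or (                    # ... in exactly one of the two encodings
            [ sh:datatype xsd:hexBinary ]
            [ sh:datatype xsd:base64Binary ]
        ) ;
    ] .
```

An official validator shipping a shape like this would turn several of the micro decisions into machine-checkable answers.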
Received on Saturday, 22 January 2022 08:12:29 UTC