Re: WebID default serialization for WebID 2.x from Kingsley Idehen on 2022-01-22 (public-webid@w3.org from January 2022)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sat, 22 Jan 2022 14:24:30 -0500
To: public-webid@w3.org
Message-ID: <3e4adefe-8d1a-9fcf-0bac-3445615f2f5b@openlinksw.com>

On 1/22/22 3:12 AM, Sebastian Hellmann wrote:
>
> Hi Jonas,
>
> On 22.01.22 01:09, Jonas Smedegaard wrote:
>> Oh well.
>>
>> I understand your desire to simplify, I really do.
>>
>> Ruben Verborgh also wrote about that desire in his latest blog entry:
>> https://ruben.verborgh.org/blog/2021/12/23/reflections-of-knowledge/
>>
>> He links to a single paragraph by Dan Brickley and Libby Miller, about
>> that complexity issue: https://book.validatingrdf.com/bookHtml005.html
>>
>> Let me quote here the first two sentences of that paragraph:
>>
>>> People think RDF is a pain because it is complicated. The truth is
>>> even worse. RDF is painfully simplistic, but it allows you to work
>>> with real-world data and problems that are horribly complicated.
>
> I will try to phrase it in a diplomatic manner:  It kind of became a 
> recent trend to talk down Linked Data achievements. At the core, lies 
> the fact that maybe 50% of the datasets in the LOD-Cloud have become 
> stale/unreachable. Now the LOD-Cloud is pretty much manually curated 
> and resources are missing to properly keep it updated. So some people 
> start saying that it is going down. However, 50% is still 95% better 
> than other approaches to put data on the web. I see huge non-LD "data" 
> repos that do not have many downloads and, if you count them, they amount 
> to 10k downloads over 5 years or so. Basically, Linked Data already 
> achieved FAIR.
>
> Then some core people of the community are repeating perseverance 
> slogans (not meaning Kingsley here in particular, he is more 
> educational) but ignoring the fact that there are some problems that 
> we would need to address in order to make it fly. Not being able to 
> update the LOD Cloud properly (by automatic crawling) is one of the 
> things. Why is that? I would see an identity problem, i.e. what is the 
> identity of the bubbles, also the lack of WebID for people/orgs 
> publishing data, then also no discovery mechanism. Also the question 
> here: is it for the lack of infrastructure (nobody doing it) or the 
> lack of a feature/patch to the system.
>
> Still more things in RDF, that are not complicated, but painful. 
> Wiggle space of JSON-LD is one, basically this sentence by Aaron:
>
>> Without a well-defined context, however, the vagaries in 
>> compact/expanded/flattened JSON-LD serializations provide a high bar 
>> for data parsing, and you lose a lot of the advantages that JSON-LD 
>> has to offer in the first place. In fact, when given the choice 
>> between Turtle (or other RDF serializations) and JSON-LD without a 
>> structured context, I would always choose Turtle.
>
> That is an insight from somebody who has taken the effort of digging 
> through complex things to find a technical best practice how to work 
> with RDF in a simple manner. Simplicity doesn't come up-front, but has 
> to be discovered.
>
> Then there are many small-scale issues that we could avoid besides the 
> JSON issues:
>
> 1. upgrading tooling to xsd:string as given by RDF 1.1,
>
> 2. I don't remember correctly, but we encountered a ";" problem with 
> turtle cert:key [ <a> <b> "" ; ] ;   vs. cert:key [ <a> <b> "" ] ;
>
> 3. DBpedia's CTO Kontokostas, my PhD student,  created SHACL, because 
> we wanted to patch a particular gap in RDF. By using more SHACL to 
> define RDF a lot can be achieved. This issue is also related to the 
> current spec: https://www.w3.org/2005/Incubator/webid/spec/identity/ :
>
> a) foaf:img -> URI in plain literal, b) foaf:name with xsd:string, 
> language tag or without, c) datatypes for 
> <http://www.w3.org/ns/auth/cert#modulus> are defined as  :range 
> <http://www.w3.org/2001/XMLSchema#base64Binary>, 
> <http://www.w3.org/2001/XMLSchema#hexBinary> ;  which means they are 
> always both per inference, so in the actual WebID doc, you can put 
> both, one or none.
>
> 4. for https://github.com/dbpedia/databus / databus.dbpedia.org  we 
> implemented WebID at first, but e.g. on an Apple the keystore kept 
> popping up immediately, so people thought the website was password 
> protected. There is definitely no help or guidance or standard that 
> tells web site creators how to implement the WebID login properly, 
> which would help adoption and also influence browsers to make a 
> user-friendly certificate authentication as a core feature. We removed 
> it.
>
> 5. Regarding WebID, we tried to have people create this in their own 
> space, but it was a mess. We tried to fix this with SHACL 
> https://github.com/dbpedia/wall-of-fame/blob/master/src/main/resources/shacl/shapes.ttl 
> . In the end, Databus does the following now: Each account comes with 
> a webid by appending #me , i.e. https://databus.dbpedia.org/kurzum#me 
> The feature is not yet deployed online, but in the Github repo.
>
> Then we thought it would be good to provide some metadata for the 
> Databus itself and my developer asked me how to do it, e.g.
>
> <https://databus.dbpedia.org> a dataid:Databus ;   dct:hasVersion "2.0b" .
>
> Even I am struggling with this, i.e. is it https://databus.dbpedia.org 
> or https://databus.dbpedia.org/ or https://databus.dbpedia.org#this , 
> https://databus.dbpedia.org/#this , or 303 to 
> https://databus.dbpedia.org/webid.ttl#this ? Or put it into 
> .well-known or robots.txt ?
>
> My main point here is: It could be simple and if you have a lot of 
> experience it might become simpler. Beginners are struggling with a 
> plethora of hard micro decisions. This could be avoided by 1. tackling 
> technical details e.g. SHACL, context, providing an official validator 
> and 2. maybe not mandating, but giving one simple way that can be 
> adapted without taking micro decisions.
>
> -- Sebastian
>

Hi Sebastian,

Thanks for the detailed breakdown.

Simplicity is the quest, and history has taught me that "deceptively 
simple" is how it is achieved.

Anything of importance to a note-taker (a data curator) should be named 
unambiguously using a URL + a "#" indexical. It works spectacularly well 
for the problem at hand.

Here is an example that addresses your problem, using RDF-Turtle notation.

## Turtle Start ##

# Note I can replicate this using JSON or JSON-LD too!

@prefix databus: <https://databus.dbpedia.org#> .
@prefix databus-github: <https://github.com/dbpedia/databus#> .
@prefix dataid: <http://dataid.dbpedia.org/ns/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix : <#> .

databus:this
    a dataid:Databus ;
    owl:sameAs databus-github:this ;
    rdfs:label "Databus" ;
    dct:hasVersion "2.0b" .

## Turtle End ##
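
To make the role of the "#" indexical concrete, here is a minimal Python 
sketch (standard library only) that splits a few of the candidate names 
from Sebastian's message into the document URL that gets fetched and the 
fragment that names the entity. The helper name `describe` is hypothetical 
and chosen only for illustration.

```python
from urllib.parse import urldefrag


def describe(name):
    """Split an entity name into (document URL, fragment)."""
    document, fragment = urldefrag(name)
    return document, fragment


# Candidate names taken from the quoted message above.
candidates = [
    "https://databus.dbpedia.org",
    "https://databus.dbpedia.org#this",
    "https://databus.dbpedia.org/webid.ttl#this",
]

for name in candidates:
    document, fragment = describe(name)
    # A non-empty fragment means the name can denote something other
    # than the document that is retrieved.
    kind = "hash-based entity name" if fragment else "document URL only"
    print(f"{name} -> document={document} fragment={fragment!r} [{kind}]")
```

A client that dereferences such a name requests only the document part, 
since the fragment is not sent to the server. That is why one URL can serve 
the description while the "#" name denotes the thing being described.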

This approach provides a baseline for transforming to a variety of 
formats, whether by the data publisher or by the consumer.
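
As one illustration of such a transformation, the sketch below writes the 
same description as JSON-LD, assuming only Python's standard `json` module 
and no JSON-LD library. The `expand` helper is hypothetical; it resolves 
prefixed names against the context and is not a full JSON-LD processor.

```python
import json

# Prefixes copied from the Turtle example; owl and dct use their
# standard namespace IRIs.
context = {
    "databus": "https://databus.dbpedia.org#",
    "databus-github": "https://github.com/dbpedia/databus#",
    "dataid": "http://dataid.dbpedia.org/ns/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dct": "http://purl.org/dc/terms/",
}

# The same statements as the Turtle block, one key per predicate.
document = {
    "@context": context,
    "@id": "databus:this",
    "@type": "dataid:Databus",
    "owl:sameAs": {"@id": "databus-github:this"},
    "rdfs:label": "Databus",
    "dct:hasVersion": "2.0b",
}


def expand(compact_iri, prefixes):
    """Expand a prefix:local name (illustrative helper only)."""
    prefix, _, local = compact_iri.partition(":")
    return prefixes[prefix] + local


print(json.dumps(document, indent=2))
print(expand(document["@id"], context))
```

The subject and the `owl:sameAs` target both expand to "#this" names, so 
the JSON-LD form denotes the same entities as the Turtle form.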

The statement above is best understood with the tools at our disposal, 
which is why OpenLink created the Structured Data Sniffer (OSDS) 
<https://chrome.google.com/webstore/detail/openlink-structured-data/egdaiaihbdoiibopledjahjaihbmjhdj> 
-- a browser extension that handles translation to and from other 
structured data representation notations and formats.

You can even read this email using a browser that includes OSDS to see 
the full effect of what I mean.

Here's a thread worth digesting, in light of my comments above:

https://twitter.com/kidehen/status/1481637744099082241

That's a real-life example involving an RDF skeptic (the author of the 
Conzept Faceted Browser) whom I first met some 14 years ago. He, like many 
others, encountered variations of the problems outlined by Sebastian 
while trying to make use of RDF deployed using Linked Data principles.

By the end of my conversation with the author of Conzept, he realized 
the following:

1. A "#" indexical gave him an integration point into the LOD Cloud 
without writing a single line of code

2. He could rework his RDFa to reflect what I demonstrated to him in 
Turtle, courtesy of OSDS 
<https://twitter.com/kidehen/status/1483858451457585157#this>.

Here's what didn't happen:

1. A debate about RDFa vs Turtle

2. A debate about how to denote an Entity unambiguously

He just got it.

What's different now? There are tools that enable simplification of the 
education and learning process.

What I've learned personally over the last 15 years regarding RDF and 
Linked Data principles is as follows:

1. Tools shortage was a problem

2. Tool building didn't happen because of confusion swirling around RDF

What am I doing with what I learned?

Encouraging OpenLink and others to build more tools that help folks 
cross the chasm of confusion regarding Entity Relationship Graphs, 
Entity Relationship Type Semantics, and Structured Data Representation 
-- in regards to RDF and deployment using Linked Data Principles.

NetID is just a reflection of many of these lessons and the way we are 
moving forward :)


-- 
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Home Page:http://www.openlinksw.com
Community Support:https://community.openlinksw.com
Weblogs (Blogs):
Company Blog:https://medium.com/openlink-software-blog
Virtuoso Blog:https://medium.com/virtuoso-blog
Data Access Drivers Blog:https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers

Personal Weblogs (Blogs):
Medium Blog:https://medium.com/@kidehen
Legacy Blogs:http://www.openlinksw.com/blog/~kidehen/
               http://kidehen.blogspot.com

Profile Pages:
Pinterest:https://www.pinterest.com/kidehen/
Quora:https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter:https://twitter.com/kidehen
Google+:https://plus.google.com/+KingsleyIdehen/about
LinkedIn:http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal:http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
         :http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
Received on Saturday, 22 January 2022 19:25:51 UTC