Re: What is a WebID? from Melvin Carvalho on 2023-11-10 (public-webid@w3.org from November 2023)

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Fri, 10 Nov 2023 22:10:09 +0100
To: Martynas Jusevičius <martynas@atomgraph.com>
Cc: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>, Kingsley Idehen <kidehen@openlinksw.com>, public-webid@w3.org
Message-ID: <CAKaEYhJR_95=fsVHdDzF5OyuTqYng7PMs4H4475vQQdaA-rtzg@mail.gmail.com>
On Fri, 10 Nov 2023 at 17:52, Martynas Jusevičius <
martynas@atomgraph.com> wrote:

> https://dbpedia.org/resource/Berlin#Turtle-Doc
>
> This makes no sense. Document URIs are inherently hash-less because
> servers do not see fragment identifiers.
>

A good read on hash vs. slash:

https://lists.w3.org/Archives/Public/www-tag/2002Mar/0121.html

It's more nuanced than people think.  Also consider all those years (10+?)
curl sent # to the server before realising it was a bug.
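
To make the point above concrete, here is a minimal sketch (Python standard
library only) of what a client is expected to put on the wire for the URI
under discussion. The helper name request_target is my own, purely for
illustration; it only models fragment stripping and makes no HTTP request.

```python
from urllib.parse import urldefrag


def request_target(uri: str) -> str:
    """Return the URI without its fragment.

    This models what a conforming HTTP client sends: the fragment is
    kept on the client side and never reaches the server.
    (Hypothetical helper name, used only for this illustration.)
    """
    document_uri, _fragment = urldefrag(uri)
    return document_uri


uri = "https://dbpedia.org/resource/Berlin#Turtle-Doc"
document_uri, fragment = urldefrag(uri)

print(document_uri)  # the part the server sees
print(fragment)      # the part resolved by the client, if at all
```

So two hash URIs that differ only after the # map to the same request,
which is why the server cannot tell them apart on its own.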


>
> On Fri, 10 Nov 2023 at 16.20, Sebastian Hellmann <
> hellmann@informatik.uni-leipzig.de> wrote:
>
>> Hi Kingsley,
>> On 11/10/23 14:33, Kingsley Idehen wrote:
>>
>>
>>
>> I have amendments:
>>
>> 1. we really should go for HTTPS URLs here. We can add a note that HTTP
>> URIs are the more general case, however, these are not meant here in a
>> goal-oriented manner. Ultimately, we cannot securely authenticate a WebID
>> using HTTP, plus I cannot think of a case where it would be useful to have
>> a URI that is not a URL.
>>
>>
>>
>> We SHOULD encourage the use of HTTPS, but not force it on users. Most
>> WebIDs generated by way of SSEO are HTTPS based anyway, since Google has
>> signaled their HTTPS preference to the SEO community etc.
>>
>> Today, only older WebIDs are HTTP based.
>>
>> Whenever you want to authenticate with WebID you MUST NOT use HTTP and
>> you MUST use URLs. As I said, we can add a note and say the "older WebIDs"
>> were HTTP based and that it's conceptually fine.
>>
>>
>>
>>
>>
>> 2. I wouldn't be strict about the # and the Agent (for legacy reasons,
>> i.e. LD published as '/'). I think, it can be either:
>>
>>
>>
>> "#" usage is just an option that carries low costs, that's all.
>> Fundamentally, it's "#" if you want to leverage resolution by way of
>> implicit indirection, or "/" if you want to use explicit indirection via
>> content negotiation. Disambiguation is always the core objective.
>>
>>
>>
>> a) example.org/agent5 a Agent . example.org/agent5#doc a ProfileDoc
>>
>> b) example.org/agent5#agent a Agent . example.org/agent5 a ProfileDoc
>>
>> c) example.org/agent5#agent a Agent . example.org/agent5#doc a
>> ProfileDoc
>>
>> b and c would be clearer.
>>
>> 3. Non-information resources can resolve directly with 200 using #
>> entities. This would integrate well in REST APIs.  I can see cases where
>> you would want 303, so it should be acceptable to do content negotiation.
>>
>>
>>
>> It is so much easier to speak about these matters in terms of entities
>> and entity description documents. Entities are uniquely identifiable things
>> that comprise perceived structure, represented in machine-computable form
>> using an entity relationship graph.
>>
>> These fundamental concepts date back to the beginning of computing i.e.,
>> we can't compute without this kind of baseline clarity.
>>
>> If you name something using a "#" based HTTP URI the denotation->connotation
>> indirection just happens without any work. If circumstances lead to using
>> "/" then content negotiation is part of the cost inherited re
>> denotation->connotation indirection. There are no ways around these
>> fundamental matters -- when it comes to the matter of unambiguous entity
>> naming.
>>
>> Another analogy I used to use years ago is as follows:
>>
>> The projector provides a surface for perceiving what's projected. If that
>> distinction doesn't exist, how do we perceive anything bar the projector
>> itself?
>>
>> Well, I am thinking more of a tablet than a projector.  I am also a big
>> fan of layered architectures and my opinion is that we should push
>> semantics to the uppermost layers.  I think it is a misconception that
>> machines can know semantics. At the end of the day they work best
>> translating strings into other strings.  I think we might fare better with
>> having a graph transport layer (ISO-OSI style). So URLs can be used to get
>> more graph resources via HTTP, then when you have the graph, you can treat
>> URIs as entities. I would consider this way more practical.
>>
>>
>>
>>
>>
>> 4. I am getting more and more skeptical about the "URI as names for
>> things". Was this really the best way of realizing the GGG? Would it make a
>> significant difference to say that "URLs as a tool to retrieve graph nodes
>> and graphs that describe entities"?  It would be more in line with the Web,
>> that also delivers docs about entities. Semantically, most people think
>> about data retrieval first and then interpret them as entities later.
>>
>>
>>
>> You can have a collection of documents comprising entities named using
>> indefinite pronouns (blank nodes), but the onus of disambiguation is then
>> pushed to apps, thereby handing everything off to silo vectors etc..
>>
>> Not saying blank nodes here.  Just saying that you use URIs to resolve to
>> more graph data, then interpret the URIs in the retrieved graph as entities.
>> The result is the same, but you can skip the content negotiation.
>>
>>
>> TimBL thought a lot of this through eons ago, but getting it through en
>> masse has clearly been a big challenge.
>>
>> Maybe if we solve some things like HR-14 and the semantic web stack.
>>
>> My main question here is: What part of the web architecture breaks, if we
>> implement conneg free /# mixed URIs? I asked this to a lot of people in
>> different ways, but nobody can tell me.
>>
>> For this example, let's say DBpedia URIs were native https
>>
>> If I do `curl -H "Accept: text/turtle"
>> https://dbpedia.org/resource/Berlin` and get a 200 OK Content-type:
>> text/turtle, I don't see any need to disambiguate anything. The graph
>> says that https://dbpedia.org/resource/Berlin is a dbo:City. So what would
>> actually break? We can add a node
>> "https://dbpedia.org/resource/Berlin#Turtle-Doc" if we want to talk about
>> the data payload itself, if necessary.
>>
>> -- Sebastian
>>
>>
>>
>>
>>
>> 5. Using
>> https://www.openlinksw.com/data/pdf/Semantic_Web_and_LLM-based_Chat_Bot_Symbiosis.pdf#page=26
>> it would be possible to make a CSV/TSV subset spec.
>>
>> 6. Might be good to suggest some default strings to use after # , just as
>> a no-brainer suggestion for implementation, so people don't struggle
>> choosing between #me, #i, #this, etc. #organisation, #person, #agent,
>> #website.
>>
>>
>>
>> That's a great point! The challenge is getting the right audience to
>> understand the story being told. In my experience, I've found that the
>> story and the audience are typically out of sync. For instance, developers
>> just want to parse stuff and implement algorithms, while architects, on the
>> other hand, typically think more conceptually, lending themselves to
>> matters of abstraction.
>>
>> Kingsley
>>
>>
Received on Friday, 10 November 2023 21:10:27 UTC