Re: W3C position on URIs http:// vs. https:// from Chris Mungall on 2023-06-14 (semantic-web@w3.org from June 2023)

From: Chris Mungall <cjmungall@lbl.gov>
Date: Wed, 14 Jun 2023 07:27:39 -0700
To: Pierre-Antoine Champin <pierre-antoine@w3.org>
Cc: "Patrick J. Hayes" <phayes@ihmc.org>, Melvin Carvalho <melvincarvalho@gmail.com>, "Hubauer, Thomas" <thomas.hubauer@siemens.com>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <CAN9AiftzgzwN1D83Hx0L-U+KV+Rtu5YKa8TdyuoviDi=hdyEBw@mail.gmail.com>
On Wed, Jun 14, 2023 at 6:28 AM Pierre-Antoine Champin <
pierre-antoine@w3.org> wrote:

>
> On 14/06/2023 00:53, Chris Mungall wrote:
>
> Hi Pat!
>
> While this could work in principle, in practice there are likely millions
> of lines of code like this:
>
> >>> if pred == "http://www.w3.org/2004/02/skos/core#altLabel":
> >>>   ...
>
> This is not really the issue, I believe. My (mis?)reading of Pat's
> suggestion is that the https transparency should be implemented whenever
> these IRIs are used as URLs. I.e. at the "linked data" level, not the "RDF
> level".
>
> In other words, all RDF files, RDF databases, and code dealing with them,
> should use <http://www.w3.org/2004/02/skos/core#altLabel> (no "s"). That's
> the identifier of the "alternative label" property in SKOS; we should not
> change it.
> However, any code that wishes to dereference this identifier to get more
> info about what it means, could (should?) be updated to automatically
> replace the http: at the beginning by https:, and fall back to http: if
> the https attempt fails.
>
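A minimal sketch of the https-first dereferencing described above. The helper names `candidate_urls`, `fetch_turtle`, and `dereference` are hypothetical and not part of any RDF library; the identifier itself is never rewritten, only the URL used for retrieval.

```python
from typing import Callable, List
from urllib.request import Request, urlopen


def candidate_urls(iri: str) -> List[str]:
    """Return the URLs to try for an identifier: https first, then the http original."""
    if iri.startswith("http://"):
        return ["https://" + iri[len("http://"):], iri]
    return [iri]


def fetch_turtle(url: str) -> bytes:
    """Default fetcher (an assumption): a plain GET that asks for Turtle."""
    req = Request(url, headers={"Accept": "text/turtle"})
    with urlopen(req, timeout=10) as resp:
        return resp.read()


def dereference(iri: str, fetch: Callable[[str], bytes] = fetch_turtle) -> bytes:
    """Try each candidate URL in order and return the first successful body."""
    last_error = None
    for url in candidate_urls(iri):
        try:
            return fetch(url)
        except OSError as err:  # urllib's URLError and HTTPError subclass OSError
            last_error = err
    raise last_error
```

Note that silently falling back to http: reintroduces the man-in-the-middle exposure raised elsewhere in this thread, so a stricter client might prefer to fail instead.
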
Oh I see, well that's relatively straightforward, but it's a non-use case
for us in OBO. I'm not aware of anyone ever writing code to obtain more
information about an entity by dereferencing its URI. Although we originally
set up a lot of infrastructure <https://pubmed.ncbi.nlm.nih.gov/27733503/>
to do this, the follow-your-nose approach has always been a fantasy in the
biomedical linked data world, and I suspect in other domains too. I don't
really get what kind of code would be able to do anything meaningful with
what it gets by grabbing the Turtle for skos:altLabel.

In fact most biomedical ontology class PURLs resolve only to HTML:

curl -L -H "Accept: text/turtle"  http://purl.obolibrary.org/obo/CL_0000540

=> html

Only humans care about the links. Machines use the whole ontology.

>   pa
>
>
> or
>
> >>> if pred == SKOS.altLabel:
> >>>    ...
>
> That would need to be rewritten to be s-transparent. Perhaps not Y2K code
> rewrite levels, but a lot. For some of those codebases there may be
> efficiency considerations - string equality is fast, string processing can
> be slow.
>
> A lot of libraries use objects rather than strings, which would allow for
> custom definitions of ==, but this would be a big breaking change; some
> applications may depend on http and https being unequal.
>
> Nevertheless it might be an idea to build for the future. Core libraries
> like rdflib, jena, owlapi could provide sTransparentEquals operations and
> sNormalize functions such that developers can start writing more
> future-proof code. Care would have to be taken in defining how sTransparent
> and legacy codebases interact. It may be difficult for sTransparent code to
> be s-preserving, which would necessitate complicated re-normalization if
> codebases are to be mixed. I'm imagining strange bugs in what is already
> quite a complicated layered stack (owl over rdf, I'm looking at you). And I
> fear that using a non-standard equality operator would make a lot of semweb
> code look even more opaque than it already is.
>
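To make the idea concrete, here is a rough sketch of what such operations might look like on plain strings. `s_normalize` and `s_transparent_equals` are hypothetical names; nothing here is an existing API of rdflib, Jena, or the OWL API. It assumes lowercase schemes and treats the http form as canonical.

```python
def s_normalize(iri: str) -> str:
    """Map an https: IRI to its http: form; leave every other IRI untouched."""
    if iri.startswith("https://"):
        return "http://" + iri[len("https://"):]
    return iri


def s_transparent_equals(a: str, b: str) -> bool:
    """Compare two IRIs while ignoring an http/https scheme difference."""
    return s_normalize(a) == s_normalize(b)


ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
pred = "https://www.w3.org/2004/02/skos/core#altLabel"

plain_match = pred == ALT_LABEL                        # ordinary string equality
transparent_match = s_transparent_equals(pred, ALT_LABEL)
```

Here `plain_match` is `False` and `transparent_match` is `True`. The sketch also shows the s-preserving problem: once `s_normalize` has run, the original scheme is gone, so code that needs to write the IRI back out unchanged has to keep the original string alongside the normalized one.
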
> I think a lot of information ecosystems would opt to keep the code simple,
> and if forced to make the change, just bite the bullet, rewire all
> accessible RDF and provide converters to help do this.
>
> Both options have high costs, which is why in OBO we have no plans to
> change our existing http PURLs. But we don't know if there will be further
> developments that make continued use of http difficult.
>
>    ...
>    ...
>
> On Tue, Jun 13, 2023 at 12:49 PM Patrick J. Hayes <phayes@ihmc.org> wrote:
>
>> (On a more constructive note…)
>>
>> Chris, greetings. I agree with everything you say here, but wonder
>> whether there might be a slightly less painful way to bring the Sweb up to
>> date than rewriting every extant ontology.
>>
>> The Web is much bigger than the total Sweb, including all the RDF/OWL
>> ontologies, but that is probably bigger than the sum total of the code of
>> Sweb tools that manipulate these ontologies. So on the principle of making
>> the fix where it causes least pain, could we not encourage semantic web
>> tool-builders to make their engines treat URIs in a s-transparent way, so
>> that http:foodleblax and https:foodleblax are simply treated as identical
>> when occurring in any RDF triple. I am not a developer but surely this
>> would not be too onerous a task, would it? It's a tweak to some low-level
>> part of the code that extracts URIs from data structures or text. Call such
>> an RDF tool 'S-transparent', then asking Sweb developers to ensure
>> 'S-transparency' would seem (?) to solve the problem and still keep other
>> Web developers happy, for surely they do not care what happens to URIs
>> embedded inside RDF triples, which are never used as Web identifiers in any
>> transfer protocol. (Or do they?)
>>
>> Anyway, I will leave y'all with this thought. I'm sure it must have
>> occurred to someone already in any case.
>>
>> If this is nonsense or unworkable, please just ignore it.
>>
>> Best wishes
>>
>> Pat Hayes
>>
>> On Jun 13, 2023, at 10:01 AM, Chris Mungall <cjmungall@lbl.gov> wrote:
>>
>> I think it's important for the semantic web community to communicate
>> clearly, simply, unambiguously, and non-dogmatically when it comes to this
>> issue.
>>
>> While I agree with many points in the TimBL article, the ship has long
>> sailed. I can't show that article to web developers who are asking me why
>> we don't change our PURLs to https, because chrome refuses to allow
>> downloads of them when linked from an https site. They don't understand why
>> we are reluctant to change, because frankly using URLs for identifiers was
>> a pretty odd thing to do in the first place, mixing two separate concerns
>> (semantic identity and network protocols). Browsers and http libraries can
>> happily treat http and https as equivalent, but this is obviously a massive
>> problem for semantic web interoperability.
>>
>> The lack of guidance has led to confusion. For example, it looks like
>> schema.org is in some superposition state where http or https is
>> considered canonical for semantic identifiers.
>>
>> https://github.com/solid/solid-namespace/issues/21
>> https://github.com/linkeddata/rdflib.js/issues/550
>>
>> We are faced with this problem in the OBO community. We adopted http
>> PURLs for both OWL classes and OWL ontologies around 15 years ago,
>> rejecting URN-based LSIDs. We are now in a situation where things
>> are breaking as various pieces of web infrastructure start making life
>> difficult for http.
>>
>> We tried reading
>> https://www.w3.org/blog/2016/05/https-and-the-semantic-weblinked-data/
>> But the advice about URI and HSTS is hard to follow for a bunch of
>> ontologists. We just want to make useful ontologies, and not be forced to
>> be network engineers.
>>
>> Our discussion and eventual decisions are recorded here, if it's useful
>> (and comments welcome if we are doing things incorrectly):
>>
>> https://github.com/OBOFoundry/purl.obolibrary.org/issues/705
>>
>> Summary:
>>
>> 1. Our infrastructure supports both https and http URLs, for both terms
>> and ontologies, these both 302 redirect to the relevant browser or download
>> (using cloudflare)
>> 2. We encourage web sites that need to link to an ontology download to
>> use the https URLs in HTML, but to make it clear that the *PURL is the
>> http URI, and the http PURL *must* be used in RDF documents*
>> 3. Even though we support https variants of http PURLs for OWL classes,
>> with both 302 redirecting to the same location, *we strongly discourage
>> their use in any context*, because this can lead to confusion about the
>> canonical URL to use in RDF/OWL documents. We don't want to end up in the
>> schema.org situation. We are building lots of tooling that will check
>> for cases where https is used accidentally in a linked data context, as we
>> expect this to happen a lot.
>>
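A check of the kind mentioned in point 3 could be sketched as below. This is not the actual OBO tooling; `find_https_purls` is a hypothetical helper, and the only assumption taken from this thread is that canonical PURLs start with http://purl.obolibrary.org/obo/.

```python
import re
from typing import List, Tuple

# Matches the https variant of an OBO PURL, stopping at whitespace, angle
# brackets, or quotes so that RDF syntax around the IRI is not captured.
HTTPS_PURL = re.compile(r"https://purl\.obolibrary\.org/obo/[^\s<>\"']*")


def find_https_purls(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, iri) pairs for every https OBO PURL found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in HTTPS_PURL.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Run over a Turtle or OWL file's text, this reports where the https form slipped in, so it can be corrected to the http PURL before the data is published.
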
>> This has been sufficient to placate frustrated web developers, but it
>> feels like we are delaying the inevitable and that there will one day be
>> pressure to deprecate our http PURLs and switch to https. This would have a
>> massive cost in terms of rewiring massive distributed troves of RDF data
>> and OWL documents, database tables, and a highly painful, long, and
>> confusing transition period. But we are hoping that this day never comes or
>> we can delay it as long as possible, or LLMs will make the whole thing
>> irrelevant.
>>
>> On Tue, Jun 13, 2023 at 8:48 AM Melvin Carvalho <melvincarvalho@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, 13 Jun 2023 at 17:37, Hubauer, Thomas <
>>> thomas.hubauer@siemens.com> wrote:
>>>
>>>> Hi SemWeb community,
>>>>
>>>>
>>>> One of my projects is considering making some of our ontologies
>>>> accessible to customers. As part of these considerations, we have been
>>>> discussing resolving ontology references (e.g. for imports) which led us
>>>> to some lengthy arguments about http:// vs. https:// as protocol part
>>>> in our URIs (primarily ontology URIs, potentially element URIs as well).
>>>>
>>>>
>>>> I am aware of a 2016 post (
>>>> https://www.w3.org/blog/2016/05/https-and-the-semantic-weblinked-data/)
>>>> stating that W3C currently considers http and https to be “equivalent” for
>>>> w3c.org. However, the security guys I am working with are not too
>>>> happy with this as using a http URI for downloading imported ontologies is
>>>> vulnerable to a man-in-the-middle attack.
>>>>
>>>>
>>>> I was unable to find any more recent statement by the W3C on the use of
>>>> http vs. https. Specifically, I’d be interested to understand if this
>>>> community (and the W3C) intend to stick with http for the foreseeable
>>>> future, or if there are any plans to migrate some/all URIs (e.g. ontology
>>>> URIs but not element URIs) to https? Would be nice for us to understand
>>>> what “the outer world” plans so we can maybe take this as a blueprint for
>>>> our own guidance on URIs.
>>>>
>>>
>>> I'm with TimBL on this:
>>>
>>> "HTTPS Everywhere" considered harmful
>>>
>>> https://www.w3.org/DesignIssues/Security-NotTheS.html
>>>
>>> The Semantic Web has been around for a couple of decades.  Is there any
>>> documented instance of an MITM attack on an ontology ever causing an issue?
>>>
>>>>
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> Thomas
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
Received on Wednesday, 14 June 2023 14:28:01 UTC