- From: Chris Mungall <cjmungall@lbl.gov>
- Date: Wed, 14 Jun 2023 07:27:39 -0700
- To: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Cc: "Patrick J. Hayes" <phayes@ihmc.org>, Melvin Carvalho <melvincarvalho@gmail.com>, "Hubauer, Thomas" <thomas.hubauer@siemens.com>, "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <CAN9AiftzgzwN1D83Hx0L-U+KV+Rtu5YKa8TdyuoviDi=hdyEBw@mail.gmail.com>
On Wed, Jun 14, 2023 at 6:28 AM Pierre-Antoine Champin
<pierre-antoine@w3.org> wrote:

> On 14/06/2023 00:53, Chris Mungall wrote:
>
> Hi Pat!
>
> While this could work in principle, in practice there are likely
> millions of lines of code like this:
>
> >>> if pred == "http://www.w3.org/2004/02/skos/core#altLabel":
> >>> ...
>
> This is not really the issue, I believe. My (mis?)reading of Pat's
> suggestion is that the https transparency should be implemented
> whenever these IRIs are used as URLs, i.e. at the "linked data" level,
> not the "RDF level".
>
> In other words, all RDF files, RDF databases, and code dealing with
> them should use <http://www.w3.org/2004/02/skos/core#altLabel> (no
> "s"). That's the identifier of the "alternative label" property in
> SKOS; we should not change it.
>
> However, any code that wishes to dereference this identifier to get
> more info about what it means could (should?) be updated to
> automatically replace the http: at the beginning with https:, and
> fall back to http:// if the former attempt fails.

Oh I see, well that's relatively straightforward, but it's a non-use
case for us in OBO. I'm not aware of anyone ever writing code to obtain
more information about an entity by dereferencing its URI, although we
originally set up a lot of infrastructure
<https://pubmed.ncbi.nlm.nih.gov/27733503/> to do this. This
follow-your-nose idea has always been a fantasy in the biomedical linked
data world, and I suspect in other domains too. I don't really get what
kind of code would be able to do anything meaningful with what it gets
by grabbing the turtle for skos:altLabel.

In fact most biomedical ontology class PURLs resolve only to HTML:

    curl -L -H "Accept: text/turtle" http://purl.obolibrary.org/obo/CL_0000540
    => html

Only humans care about the links. Machines use the whole ontology.

>   pa
>
> or
>
> >>> if pred == SKOS.altLabel:
> >>> ...
>
> That would need to be rewritten to be s-transparent. Perhaps not Y2K
> code rewrite levels, but a lot. For some of those codebases there may
> be efficiency considerations: string equality is fast, string
> processing can be slow.
>
> A lot of libraries use objects rather than strings, which would allow
> for custom definitions of ==, but this would be a big breaking change;
> some applications may depend on http and https being unequal.
>
> Nevertheless it might be an idea to build for the future. Core
> libraries like rdflib, jena, owlapi could provide sTransparentEquals
> operations and sNormalize functions so that developers can start
> writing more future-proof code. Care would have to be taken in
> defining how sTransparent and legacy codebases interact. It may be
> difficult for sTransparent code to be s-preserving, which would
> necessitate complicated re-normalization if codebases are to be mixed.
> I'm imagining strange bugs in what is already quite a complicated
> layered stack (owl over rdf, I'm looking at you). And I fear that
> using a non-standard equality operator would make a lot of semweb code
> look even more opaque than it already is.
>
> I think a lot of information ecosystems would opt to keep the code
> simple and, if forced to make the change, just bite the bullet, rewire
> all accessible RDF, and provide converters to help do this.
>
> Both options have high costs, which is why in OBO we have no plans to
> change our existing http PURLs. But we don't know if there will be
> further developments that make continued use of http difficult.
>
> ...
> ...
>
> On Tue, Jun 13, 2023 at 12:49 PM Patrick J. Hayes <phayes@ihmc.org> wrote:
>
>> (On a more constructive note…)
>>
>> Chris, greetings. I agree with everything you say here, but wonder
>> whether there might be a slightly less painful way to bring the Sweb
>> up to date than rewriting every extant ontology.
>>
>> The Web is much bigger than the total Sweb, including all the RDF/OWL
>> ontologies, but that is probably bigger than the sum total of the
>> code of Sweb tools that manipulate these ontologies. So on the
>> principle of making the fix where it causes least pain, could we not
>> encourage semantic web tool-builders to make their engines treat URIs
>> in an s-transparent way, so that http:foodleblax and https:foodleblax
>> are simply treated as identical when occurring in any RDF triple? I
>> am not a developer but surely this would not be too onerous a task,
>> would it? It's a tweak to some low-level part of the code that
>> extracts URIs from data structures or text. Call such an RDF tool
>> 'S-transparent'; then asking Sweb developers to ensure
>> 'S-transparency' would seem (?) to solve the problem and still keep
>> other Web developers happy, for surely they do not care what happens
>> to URIs embedded inside RDF triples, which are never used as Web
>> identifiers in any transfer protocol. (Or do they?)
>>
>> Anyway, I will leave y'all with this thought. I'm sure it must have
>> occurred to someone already in any case.
>>
>> If this is nonsense or unworkable, please just ignore it.
>>
>> Best wishes
>>
>> Pat Hayes
>>
>> On Jun 13, 2023, at 10:01 AM, Chris Mungall <cjmungall@lbl.gov> wrote:
>>
>> I think it's important for the semantic web community to communicate
>> clearly, simply, unambiguously, and non-dogmatically when it comes to
>> this issue.
>>
>> While I agree with many points in the TimBL article, the ship has
>> long sailed. I can't show that article to web developers who are
>> asking me why we don't change our PURLs to https, because Chrome
>> refuses to allow downloads of them when linked from an https site.
>> They don't understand why we are reluctant to change, because frankly
>> using URLs for identifiers was a pretty odd thing to do in the first
>> place, mixing two separate concerns (semantic identity and network
>> protocols).
>> Browsers and http libraries can happily treat http and https as
>> equivalent, but this is obviously a massive problem for semantic web
>> interoperability.
>>
>> The lack of guidance has led to confusion. For example, it looks like
>> schema.org is in some superposition state where http or https is
>> considered canonical for semantic identifiers.
>>
>> https://github.com/solid/solid-namespace/issues/21
>> https://github.com/linkeddata/rdflib.js/issues/550
>>
>> We are faced with this problem in the OBO community: we adopted http
>> PURLs for both OWL classes and OWL ontologies around 15 years ago,
>> rejecting URN-based LSIDs. We are now faced with the situation where
>> things are breaking as various pieces of web infrastructure start
>> making life for http difficult.
>>
>> We tried reading
>> https://www.w3.org/blog/2016/05/https-and-the-semantic-weblinked-data/
>> but the advice about URI and HSTS is hard to follow for a bunch of
>> ontologists. We just want to make useful ontologies, and not be
>> forced to be network engineers.
>>
>> Our discussion and eventual decisions are recorded here, if it's
>> useful (and comments welcome if we are doing things incorrectly):
>>
>> https://github.com/OBOFoundry/purl.obolibrary.org/issues/705
>>
>> Summary:
>>
>> 1. Our infrastructure supports both https and http URLs, for both
>>    terms and ontologies; these both 302 redirect to the relevant
>>    browser or download (using Cloudflare).
>> 2. We encourage web sites that need to link to an ontology download
>>    to use the https URLs in HTML, but to make it clear that the *PURL
>>    is the http URI, and the http PURL *must* be used in RDF
>>    documents*.
>> 3. Even though we support https variants of http PURLs for OWL
>>    classes, with both 302 redirecting to the same location, *we
>>    strongly discourage their use in any context*, because this can
>>    lead to confusion about the canonical URL to use in RDF/OWL
>>    documents. We don't want to end up in the schema.org situation.
>>    We are building lots of tooling that will check for cases where
>>    https is used accidentally in a linked data context, as we expect
>>    this to happen a lot.
>>
>> This has been sufficient to placate frustrated web developers, but it
>> feels like we are delaying the inevitable and that there will one day
>> be pressure to deprecate our http PURLs and switch to https. This
>> would have a massive cost in terms of rewiring huge distributed
>> troves of RDF data, OWL documents, and database tables, and would
>> entail a long, painful, and confusing transition period. But we are
>> hoping that this day never comes, or that we can delay it as long as
>> possible, or that LLMs will make the whole thing irrelevant.
>>
>> On Tue, Jun 13, 2023 at 8:48 AM Melvin Carvalho
>> <melvincarvalho@gmail.com> wrote:
>>
>>> On Tue, 13 Jun 2023 at 17:37, Hubauer, Thomas
>>> <thomas.hubauer@siemens.com> wrote:
>>>
>>>> Hi SemWeb community,
>>>>
>>>> One of my projects is considering making some of our ontologies
>>>> accessible to customers. As part of these considerations, we have
>>>> been discussing resolving ontology references (e.g. for imports),
>>>> which led us to some lengthy arguments about http:// vs. https://
>>>> as the protocol part of our URIs (primarily ontology URIs,
>>>> potentially element URIs as well).
>>>>
>>>> I am aware of a 2016 post
>>>> (https://www.w3.org/blog/2016/05/https-and-the-semantic-weblinked-data/)
>>>> stating that W3C currently considers http and https to be
>>>> “equivalent” for w3.org. However, the security guys I am working
>>>> with are not too happy with this, as using an http URI for
>>>> downloading imported ontologies is vulnerable to a
>>>> man-in-the-middle attack.
>>>>
>>>> I was unable to find any more recent statement by the W3C on the
>>>> use of http vs. https.
>>>> Specifically, I’d be interested to understand if this community
>>>> (and the W3C) intend to stick with http for the foreseeable
>>>> future, or if there are any plans to migrate some/all URIs (e.g.
>>>> ontology URIs but not element URIs) to https. It would be nice for
>>>> us to understand what “the outer world” plans so we can maybe take
>>>> this as a blueprint for our own guidance on URIs.
>>>
>>> I'm with TimBL on this:
>>>
>>> "HTTPS Everywhere" considered harmful
>>>
>>> https://www.w3.org/DesignIssues/Security-NotTheS.html
>>>
>>> The Semantic Web has been around for a couple of decades. Is there
>>> any documented instance of an MITM attack on an ontology ever
>>> causing an issue?
>>>
>>>> Best regards,
>>>>
>>>> Thomas
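
For concreteness, the two ideas discussed in this thread (s-transparent
comparison at the RDF level, and https-first dereferencing with an http
fallback at the linked data level) might be sketched in Python roughly
as below. This is only an illustration under stated assumptions: the
names s_normalize, s_transparent_equals and dereference_candidates are
hypothetical and are not existing functions in rdflib, Jena, or OWL API,
and the sketch treats IRIs as plain strings and only touches the
http/https prefix.

```python
# Hypothetical helpers; none of these names come from an existing library.
_PREFIXES = ("https://", "http://")


def s_normalize(iri: str, scheme: str = "http") -> str:
    """Rewrite a leading http:// or https:// to the requested scheme.

    Everything after the scheme is left byte-for-byte unchanged, and
    IRIs with any other scheme (e.g. urn:) are returned as-is.
    """
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    for prefix in _PREFIXES:
        if iri[: len(prefix)].lower() == prefix:
            return f"{scheme}://{iri[len(prefix):]}"
    return iri


def s_transparent_equals(a: str, b: str) -> bool:
    """Compare two IRIs, ignoring an http vs. https difference."""
    return s_normalize(a) == s_normalize(b)


def dereference_candidates(iri: str) -> list[str]:
    """URLs to try when dereferencing: https first, then http.

    The identifier itself is not changed; only the URLs used for
    fetching differ. Non-http(s) IRIs yield a single candidate.
    """
    https_url = s_normalize(iri, "https")
    http_url = s_normalize(iri, "http")
    if https_url == http_url:  # not an http(s) IRI
        return [iri]
    return [https_url, http_url]


if __name__ == "__main__":
    alt_label = "http://www.w3.org/2004/02/skos/core#altLabel"
    pred = "https://www.w3.org/2004/02/skos/core#altLabel"
    print(pred == alt_label)                      # plain string equality
    print(s_transparent_equals(pred, alt_label))  # opt-in comparison
    print(dereference_candidates(alt_label))
```

In this sketch plain == on the stored strings stays the default and the
s-transparent comparison is opt-in, which sidesteps the breaking change
of redefining equality but also means mixed codebases would still need
to agree on when to normalize.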
Received on Wednesday, 14 June 2023 14:28:01 UTC