Re: The use of https: IRIs on the semantic web

From: Nathan Rixham <nathan@webr3.org>
Date: Mon, 10 Jul 2017 14:39:06 +0100
Message-ID: <CANiy74yOezSWeDLdK2GiTMGSPj6r0O=4YZ_TrrVNaSk=Pzb+Og@mail.gmail.com>
To: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
Cc: Richard Smith <richard@ex-parrot.com>, "semantic-web@w3.org" <semantic-web@w3.org>
What about xyz:// and abc://? In ten years we'll have them. What's the
well-defined way to make schemas and data visible through multiple protocols?
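One concrete consequence of this question: RDF 1.1 compares IRIs by simple character-by-character string comparison, so the same vocabulary term published under two schemes is two different terms unless something maps one onto the other. The sketch below is illustrative only and is not a mechanism proposed in this thread; `ALIASES`, `canonicalize` and the `example.org` namespace are hypothetical placeholders.

```python
from __future__ import annotations

# Hypothetical alias table a consumer might maintain; example.org is a placeholder.
ALIASES = {"http://example.org/vocab#": "https://example.org/vocab#"}


def same_rdf_term(a: str, b: str) -> bool:
    """RDF 1.1 IRI equality: simple character-by-character string comparison."""
    return a == b


def canonicalize(iri: str, aliases: dict[str, str]) -> str:
    """Rewrite `iri` onto its canonical namespace if it starts with a known alias."""
    for alt, canonical in aliases.items():
        if iri.startswith(alt):
            return canonical + iri[len(alt):]
    return iri


if __name__ == "__main__":
    plain = "http://example.org/vocab#Term"
    secure = "https://example.org/vocab#Term"
    print(same_rdf_term(plain, secure))                          # False
    print(same_rdf_term(canonicalize(plain, ALIASES), secure))   # True
```

Inside RDF data the same mapping would typically be stated with OWL equivalence axioms (e.g. owl:equivalentClass, owl:equivalentProperty), which only helps consumers that actually reason over them.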

On Mon, Jul 10, 2017 at 11:56 AM, Stian Soiland-Reyes <
soiland-reyes@manchester.ac.uk> wrote:

> I would wholeheartedly support using an https namespace for a new ontology,
> coupled with a longevity-secured location. With https://letsencrypt.org/
> there is not really an excuse today to have a non-encrypted web service,
> even for experimental work.
> Probably the biggest real danger with an https:// namespace is that someone
> will forget to renew the SSL certificates (or, more recently, that SSL
> clients will start requiring stronger encryption) and the vocabulary
> becomes “inaccessible”.
> SSL expiry is, however, not as bad as expiry of a custom domain name for
> the ontology (bad idea), as at least there are technical ways to bypass
> the expiry warnings.
> The JSON-LD community has gone with https from day one, as nobody wants
> their @context resolution to be vulnerable, or to not work from an https://
> web application.
> Thus I find that vocabularies using PURLs with a https://w3id.org/
> registered namespace are pretty much always on https – although some of
> them redirect to an insecure http location (a redirection I know Java’s URL
> support will protest against) or may have accidentally used
> http://w3id.org/ in their declared namespaces. Here are some of them
> (where people added an accompanying README):
> https://github.com/perma-id/w3id.org/search?utf8=%E2%9C%93&q=ontology&type=
> --
> Stian Soiland-Reyes, eScience Lab
> School of Computer Science, The University of Manchester
> http://orcid.org/0000-0001-9842-9718
> From: Richard Smith <richard@ex-parrot.com>
> Sent: 07 July 2017 22:45
> To: semantic-web@w3.org
> Subject: The use of https: IRIs on the semantic web
> I hope this is an appropriate mailing list to ask this
> question.  I'd be happy to be directed elsewhere if not.
> I am defining a new vocabulary.  It's not an extension of an
> existing vocabulary, nor will it use the same domain as any
> existing vocabulary.  Should I use https: IRIs?
> Every source I've consulted says I should prefer http: IRIs.
> This includes the Linked Data book [1], the W3C note on "Cool
> URIs" [2], and the W3C note on best practices for publishing RDF
> vocabularies [3].
> This surprises me slightly.  The world seems to be moving
> away from HTTP to HTTPS, yet I know of no vocabulary that
> uses https: IRIs, and none of the documents quoted above
> even discuss the question.  I can find discussion on why
> ftp: or urn: are less well suited, but nothing about https:.
> Even though IRIs on the semantic web are primarily
> identifiers rather than locators, certainly in the linked
> data world, the IRI is assumed to be a good place to look
> for more information about the entity, and various authorities
> recommend 303 redirects to documents like RDF or OWL schemas
> or other descriptive documents with further information.
> If the IRI is just an identifier, the choice of IRI scheme
> is largely irrelevant, and https: is neither better nor
> worse than http:.  If it's used as a locator, then at least
> the initial request to an http: IRI will be made over plain
> HTTP.  It may then be redirected to HTTPS, and HSTS headers
> may mean subsequent requests go directly over HTTPS, but the
> first request is still unencrypted.  This has the following
> potential problems:
> * It is susceptible to a man-in-the-middle attack.  A
>    malicious party could inject deliberately inconsistent
>    schema information that may affect processing decisions
>    made by applications, potentially causing a DoS or otherwise
>    disrupting the user experience.
> * ISPs may do a MITM themselves to inject adverts into
>    content (e.g. [4]).  Really they oughtn't to do this for
>    non-HTML content (well, they shouldn't do it at all, but
>    that's another matter), but that relies on the ISP caring
>    enough to get this right.
> * The request might be tracked by ISPs.  The fact that a
>    user is using an application that consults a particular
>    schema is itself valuable information about the customer
>    that can be sold to advertisers, something the US Senate
>    recently voted to make legal [5].  People are increasingly
>    privacy conscious and want to minimise this.
> These are all mitigated to a significant degree by using
> https: IRIs.  So far as I can see, the counter-arguments are
> as follows:
> * Not all HTTP client libraries support TLS, but few, if any,
>    support only HTTP over TLS.
> * Best practice with TLS changes more frequently than plain
>    HTTP.  An HTTP client from 20 years ago will still
>    probably work, but TLS has moved on a lot and few servers
>    now support SSL 3.
> * MITM attacks and injection can still happen over TLS, as
>    the "Superfish" fiasco demonstrated [6], and are probably
>    better prevented with digital signatures.
> * Use of TLS at the transport level prevents HTTP caching on
>    intermediate servers, unless a trusted root certificate is
>    used.
> * ISP tracking still happens over TLS because the SNI field
>    is not encrypted, and encrypted SNI seems to have been
>    dropped from TLS 1.3 [7].
> * Whether TLS is used at the transport level should be an
>    implementation detail that is not exposed in the
>    vocabulary, a point Berners-Lee has made forcefully [8].
> * In the future it's likely that the functionality of
>    HSTS will be put in DNS [9].
> These seem fairly weak arguments to me.  Digital signatures
> can be used regardless of whether the resource was fetched
> over TLS, and adding authentication at the top of the
> semantic web stack shouldn't preclude encryption at the
> bottom.  HSTS-in-DNS technologies, which in conjunction with
> DNSSEC would alleviate the problem, seem to be stalled, and
> I've seen no drafts on the subject since 2011 [9].
> I'm wondering whether there's something I'm missing, because
> almost universally people are still defining vocabularies
> using http: IRIs.
> I can see why converting an existing vocabulary from http:
> to https: would be difficult, to the point of being
> undesirable; I can see too that there are logistic
> conveniences to having all vocabulary IRIs on a given domain
> use the same IRI scheme, both points Berners-Lee makes in
> [8].  But these don't apply to new vocabularies.
> Is there some other consideration I'm missing?
> Richard
> [1] http://linkeddatabook.com/editions/1.0/#htoc10
> [2] https://www.w3.org/TR/cooluris/
> [3] https://www.w3.org/TR/swbp-vocab-pub/
> [4] http://preview.tinyurl.com/om3xxdb
> [5] http://preview.tinyurl.com/ybeka8yv
> [6] https://brennan.io/2015/02/20/superfish-explained/
> [7] https://www.ietf.org/mail-archive/web/tls/current/msg23251.html
> [8] https://www.w3.org/DesignIssues/Security-NotTheS.html
> [9] https://tools.ietf.org/html/draft-hallambaker-esrv-01
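Richard's point that the first request to an http: IRI goes out unencrypted, even when the server redirects to HTTPS and sends HSTS headers, and Stian's remark about redirects from https: back to plain http:, can both be modelled without any network access. This is a minimal sketch, not code from the thread: `first_request_url`, `audit_redirect_chain` and the `example.org` host are hypothetical, and the HSTS model deliberately ignores details such as explicit ports, includeSubDomains and max-age expiry.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def first_request_url(iri: str, hsts_hosts: set[str]) -> str:
    """Return the URL a client would fetch first when dereferencing `iri`.

    Models HSTS only: if the host is already in `hsts_hosts` (learnt from an
    earlier Strict-Transport-Security header or a preload list), an http: IRI
    is rewritten to https: before anything is sent.  Explicit ports are left
    untouched for simplicity.
    """
    # The fragment (e.g. "#Term") is never sent to the server.
    parts = urlsplit(iri)._replace(fragment="")
    if parts.scheme == "http" and parts.hostname in hsts_hosts:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def audit_redirect_chain(chain: list[str]) -> list[str]:
    """List plaintext hops and https-to-http downgrades in a redirect chain."""
    problems = []
    for i, url in enumerate(chain):
        if urlsplit(url).scheme == "http":
            problems.append(f"hop {i}: plaintext request to {url}")
            # A plaintext hop reached from an https: hop is a downgrade.
            if i > 0 and urlsplit(chain[i - 1]).scheme == "https":
                problems.append(f"hop {i}: downgrade from https to http")
    return problems


if __name__ == "__main__":
    vocab_term = "http://example.org/vocab#Term"
    # No HSTS state yet: the very first request goes out in plaintext.
    print(first_request_url(vocab_term, set()))
    # After an HSTS header for example.org has been seen, the client upgrades.
    print(first_request_url(vocab_term, {"example.org"}))
    # A typical http -> https redirect still exposes the first hop.
    print(audit_redirect_chain(["http://example.org/vocab",
                                "https://example.org/vocab"]))
```

Starting from an https: IRI makes the first hop encrypted regardless of HSTS state, which is the mitigation Richard attributes to https: IRIs; the audit would still flag a later redirect back to http:.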
Received on Monday, 10 July 2017 13:39:40 UTC