
Re: The use of https: IRIs on the semantic web

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Fri, 07 Jul 2017 21:59:35 +0000
Message-ID: <CAE1ny+491Vyis_JAd1s0m4pc1Nw6rSbUvKi7Egtau4==f_PbTg@mail.gmail.com>
To: Richard Smith <richard@ex-parrot.com>, semantic-web@w3.org
The answer should be yes. There is no perfectly safe way of upgrading from
HTTP to HTTPS without cert pinning as well. I have a detailed analysis of
this as regards RDF that I can share soon (it's under review).
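
As a rough illustration of what "cert pinning" means here, the following is a minimal Python sketch, not the analysis mentioned above. It assumes the simplest form of pinning: comparing the SHA-256 fingerprint of the server's whole DER-encoded certificate against a value the client already holds. The function names and the placeholder bytes are hypothetical. In a real client the certificate bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)`.

```python
import hashlib
import hmac


def fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_hex: str) -> bool:
    """Check a certificate against a pinned fingerprint (hypothetical helper).

    compare_digest is used so the comparison takes constant time.
    """
    return hmac.compare_digest(fingerprint(der_cert), pinned_hex.lower())


# Placeholder bytes stand in for a real certificate (assumption for illustration).
known_cert = b"placeholder certificate bytes"
pin = fingerprint(known_cert)

print(matches_pin(known_cert, pin))               # same certificate as pinned
print(matches_pin(b"some other certificate", pin))  # a substituted certificate
```

A client that pins this way would refuse the connection when `matches_pin` returns False, even if the substituted certificate chains to a trusted root.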

On Fri, Jul 7, 2017 at 11:44 PM Richard Smith <richard@ex-parrot.com> wrote:

> I hope this is an appropriate mailing list to ask this
> question.  I'd be happy to be directed elsewhere if not.
> I am defining a new vocabulary.  It's not an extension of an
> existing vocabulary, nor will it use the same domain as any
> existing vocabulary.  Should I use https: IRIs?
> Every source I've consulted says I should prefer http: IRIs.
> This includes the Linked Data book [1], the W3C note on "Cool
> URIs" [2], and the W3C note on best practices for RDF
> vocabularies [3].
> This surprises me slightly.  The world seems to be moving
> away from HTTP to HTTPS, yet I know of no vocabulary that
> uses https: IRIs, and none of the documents quoted above
> even discuss the question.  I can find discussion on why
> ftp: or urn: are less well suited, but nothing about https:.
> Even though IRIs on the semantic web are primarily
> identifiers rather than locators, certainly in the linked
> data world, the IRI is assumed to be a good place to look
> for more information about the entity, and various authorities
> recommend 303 redirects to documents like RDF or OWL schemas
> or other descriptive documents with further information.
> If the IRI is just an identifier, the choice of IRI scheme
> is largely irrelevant, and https: is neither better nor
> worse than http:.  If it's used as a locator, then at least
> the initial request to an http: IRI will be made over plain
> HTTP.  It may then be redirected to HTTPS, and HSTS headers
> may mean subsequent requests go directly over HTTPS, but the
> first request is still unencrypted.  This has the following
> potential problems:
> * It is susceptible to a man-in-the-middle attack.  A
>    malicious party could inject deliberately inconsistent
>    schema information that may affect processing decisions
>    made by applications, potentially causing a DoS or otherwise
>    disrupting the user experience.
> * ISPs may do a MITM themselves to inject adverts into
>    content (e.g. [4]).  Really they oughtn't to do this for
>    non-HTML content (well, they shouldn't do it at all, but
>    that's another matter), but that relies on the ISP caring
>    enough to get this right.
> * The request might be tracked by ISPs.  The fact that a
>    user is using an application that consults a particular
>    schema is itself valuable information about the customer
>    that can be sold to advertisers, which the US Senate
>    recently voted to make legal [5].  People are increasingly
>    privacy conscious and want to minimise this.
> These are all mitigated to a significant degree by using
> https: IRIs.  So far as I can see, the counter-arguments are
> as follows:
> * Not all HTTP client libraries support TLS, but few if any
>    only support HTTP over TLS.
> * Best practice with TLS changes more frequently than plain
>    HTTP.  An HTTP client from 20 years ago will still
>    probably work, but TLS has moved on a lot and few servers
>    now support SSL 3.
> * MITM attacks and injection can still happen over TLS, as
>    the "Superfish" fiasco demonstrated [6], and are probably
>    better prevented with digital signatures.
> * Use of TLS at the transport level prevents HTTP caching on
>    intermediate servers, unless a trusted root certificate is
>    used.
> * ISP tracking still happens over TLS because the SNI field
>    is not encrypted, and encrypted SNI seems to have been
>    dropped from TLS 1.3 [7].
> * Whether TLS is used at the transport level should be an
>    implementation detail that is not exposed in the
>    vocabulary, a point Berners-Lee has made forcefully [8].
> * In the future it's likely that the functionality of
>    HSTS will be put in DNS [9].
> These seem fairly weak arguments to me.  Digital signatures
> can be used regardless of whether the resource was fetched
> over TLS, and adding authentication at the top of the
> semantic web stack shouldn't preclude encryption at the
> bottom.  HSTS-in-DNS technologies, which in conjunction with
> DNSSEC would alleviate the problem, seem to be stalled, and
> I've seen no drafts on the subject since 2011 [9].
> I'm wondering whether there's something I'm missing, because
> almost universally people are still defining vocabularies
> using http: IRIs.
> I can see why converting an existing vocabulary from http:
> to https: would be difficult, to the point of being
> undesirable; I can see too that there are logistical
> conveniences to having all vocabulary IRIs on a given domain
> use the same IRI scheme, both points Berners-Lee makes in
> [8].  But these don't apply to new vocabularies.
> Is there some other consideration I'm missing?
> Richard
> [1] http://linkeddatabook.com/editions/1.0/#htoc10
> [2] https://www.w3.org/TR/cooluris/
> [3] https://www.w3.org/TR/swbp-vocab-pub/
> [4] http://preview.tinyurl.com/om3xxdb
> [5] http://preview.tinyurl.com/ybeka8yv
> [6] https://brennan.io/2015/02/20/superfish-explained/
> [7] https://www.ietf.org/mail-archive/web/tls/current/msg23251.html
> [8] https://www.w3.org/DesignIssues/Security-NotTheS.html
> [9] https://tools.ietf.org/html/draft-hallambaker-esrv-01
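
The quoted point that the first request to an http: IRI travels unencrypted, even when the server redirects to HTTPS, can be sketched in Python as follows. This is only an illustration under stated assumptions: the `vocab.example.org` IRIs and the redirect chains are hypothetical, and the function inspects a list of URLs and makes no network requests.

```python
from urllib.parse import urlsplit


def plaintext_hops(redirect_chain):
    """Return the URLs in a redirect chain that would be fetched over plain HTTP."""
    return [url for url in redirect_chain if urlsplit(url).scheme == "http"]


# Hypothetical chain for a vocabulary term minted with an http: IRI:
# the term IRI, an upgrade redirect to HTTPS, then a 303 to a schema document.
chain_for_http_iri = [
    "http://vocab.example.org/ns/Thing",
    "https://vocab.example.org/ns/Thing",
    "https://vocab.example.org/doc/Thing.ttl",
]

# The same term minted with an https: IRI skips the plaintext hop entirely.
chain_for_https_iri = [
    "https://vocab.example.org/ns/Thing",
    "https://vocab.example.org/doc/Thing.ttl",
]

print(plaintext_hops(chain_for_http_iri))
print(plaintext_hops(chain_for_https_iri))
```

With these inputs, only the first hop of the http: chain is reported, which is the request an on-path party could read or alter before any redirect or HSTS header is seen.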
Received on Friday, 7 July 2017 22:00:22 UTC
