Re: The use of https: IRIs on the semantic web

The answer should be yes. There is no perfectly safe way of upgrading from
HTTP to HTTPS without cert pinning as well. I have a detailed analysis of
this as regards RDF that I can share soon (it's under review).
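
To make the first-hop problem in the quoted message concrete, here is a
minimal Python sketch, not a definitive tool. It inspects a redirect chain
that you supply (it makes no network requests) and reports which hops would
travel without TLS, and it reads the max-age out of a
Strict-Transport-Security header value. The host vocab.example.org and the
function names plaintext_hops and hsts_max_age are made up for illustration.

```python
from urllib.parse import urlsplit


def plaintext_hops(chain):
    """Return the URLs in a redirect chain that would be fetched without TLS."""
    return [url for url in chain if urlsplit(url).scheme.lower() != "https"]


def hsts_max_age(header_value):
    """Return max-age in seconds from an HSTS header value, or None if absent."""
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            value = value.strip().strip('"')
            if value.isascii() and value.isdigit():
                return int(value)
            return None
    return None


# Hypothetical chain: an http: vocabulary IRI that the server upgrades to
# HTTPS and then 303-redirects to a schema document.
chain = [
    "http://vocab.example.org/ns/Person",
    "https://vocab.example.org/ns/Person",
    "https://vocab.example.org/ns.ttl",
]
exposed = plaintext_hops(chain)
print(exposed)
print(hsts_max_age("max-age=31536000; includeSubDomains"))
```

For an http: IRI the first entry is always reported, whatever HSTS policy
the server sends afterwards. Starting the same chain from the https: IRI
leaves the list empty.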

On Fri, Jul 7, 2017 at 11:44 PM Richard Smith <richard@ex-parrot.com> wrote:

>
> I hope this is an appropriate mailing list to ask this
> question.  I'd be happy to be directed elsewhere if not.
>
> I am defining a new vocabulary.  It's not an extension of an
> existing vocabulary, nor will it use the same domain as any
> existing vocabulary.  Should I use https: IRIs?
>
> Every source I've consulted says I should prefer http: IRIs.
> This includes the Linked Data book [1], the W3C note on "Cool
> URIs" [2], and the W3C note on best practices for RDF
> vocabularies [3].
>
> This surprises me slightly.  The world seems to be moving
> away from HTTP to HTTPS, yet I know of no vocabulary that
> uses https: IRIs, and none of the documents quoted above
> even discuss the question.  I can find discussion on why
> ftp: or urn: are less well suited, but nothing about https:.
>
> Even though IRIs on the semantic web are primarily
> identifiers rather than locators, certainly in the linked
> data world, the IRI is assumed to be a good place to look
> for more information about the entity, and various authorities
> recommend 303 redirects to documents like RDF or OWL schemas
> or other descriptive documents with further information.
>
> If the IRI is just an identifier, the choice of IRI scheme
> is largely irrelevant, and https: is neither better nor
> worse than http:.  If it's used as a locator, then at least
> the initial request to an http: IRI will be made over plain
> HTTP.  It may then be redirected to HTTPS, and HSTS headers
> may mean subsequent requests go directly over HTTPS, but the
> first request is still unencrypted.  This has the following
> potential problems:
>
> * It is susceptible to a man-in-the-middle attack.  A
>    malicious party could inject deliberately inconsistent
>    schema information that may affect processing decisions
>    made by applications, potentially causing a DoS or otherwise
>    disrupting the user experience.
>
> * ISPs may do a MITM themselves to inject adverts into
>    content (e.g. [4]).  Really they oughtn't to do this for
>    non-HTML content (well, they shouldn't do it at all, but
>    that's another matter), but that relies on the ISP caring
>    enough to get this right.
>
> * The request might be tracked by ISPs.  The fact that a
>    user is using an application that consults a particular
>    schema is itself valuable information about the customer
>    that can be sold to advertisers, a practice the US Senate
>    recently voted to make legal [5].  People are increasingly
>    privacy conscious and want to minimise this.
>
> These are all mitigated to a significant degree by using
> https: IRIs.  So far as I can see, the counter-arguments are
> as follows:
>
> * Not all HTTP client libraries support TLS, but few if any
>    only support HTTP over TLS.
>
> * Best practice with TLS changes more frequently than plain
>    HTTP.  An HTTP client from 20 years ago will still
>    probably work, but TLS has moved on a lot and few servers
>    now support SSL 3.
>
> * MITM attacks and injection can still happen over TLS, as
>    the "Superfish" fiasco demonstrated [6], and are probably
>    better prevented with digital signatures.
>
> * Use of TLS at the transport level prevents HTTP caching on
>    intermediate servers, unless a trusted root certificate is
>    used.
>
> * ISP tracking still happens over TLS because the SNI field
>    is not encrypted, and encrypted SNI seems to have been
>    dropped from TLS 1.3 [7].
>
> * Whether TLS is used at the transport level should be an
>    implementation detail that is not exposed in the
>    vocabulary, a point Berners-Lee has made forcefully [8].
>
> * In the future it's likely that the functionality of
>    HSTS will be put in DNS [9].
>
> These seem fairly weak arguments to me.  Digital signatures
> can be used regardless of whether the resource was fetched
> over TLS, and adding authentication at the top of the
> semantic web stack shouldn't preclude encryption at the
> bottom.  HSTS-in-DNS technologies, which in conjunction with
> DNSSEC would alleviate the problem, seem to be stalled, and
> I've seen no drafts on the subject since 2011 [9].
>
> I'm wondering whether there's something I'm missing, because
> almost universally people are still defining vocabularies
> using http: IRIs.
>
> I can see why converting an existing vocabulary from http:
> to https: would be difficult, to the point of being
> undesirable; I can see too that there are logistic
> conveniences to having all vocabulary IRIs on a given domain
> use the same IRI scheme, both points Berners-Lee makes in
> [8].  But these don't apply to new vocabularies.
>
> Is there some other consideration I'm missing?
>
> Richard
>
>
> [1] http://linkeddatabook.com/editions/1.0/#htoc10
> [2] https://www.w3.org/TR/cooluris/
> [3] https://www.w3.org/TR/swbp-vocab-pub/
> [4] http://preview.tinyurl.com/om3xxdb
> [5] http://preview.tinyurl.com/ybeka8yv
> [6] https://brennan.io/2015/02/20/superfish-explained/
> [7] https://www.ietf.org/mail-archive/web/tls/current/msg23251.html
> [8] https://www.w3.org/DesignIssues/Security-NotTheS.html
> [9] https://tools.ietf.org/html/draft-hallambaker-esrv-01
>
>

Received on Friday, 7 July 2017 22:00:22 UTC