- From: Harry Halpin <hhalpin@ibiblio.org>
- Date: Fri, 07 Jul 2017 21:59:35 +0000
- To: Richard Smith <richard@ex-parrot.com>, semantic-web@w3.org
- Message-ID: <CAE1ny+491Vyis_JAd1s0m4pc1Nw6rSbUvKi7Egtau4==f_PbTg@mail.gmail.com>
The answer should be yes. There is no perfectly safe way of upgrading from HTTP to HTTPS without cert pinning as well. I have a detailed analysis of this as regards RDF that I can share soon (it's under review).

On Fri, Jul 7, 2017 at 11:44 PM Richard Smith <richard@ex-parrot.com> wrote:
>
> I hope this is an appropriate mailing list to ask this
> question. I'd be happy to be directed elsewhere if not.
>
> I am defining a new vocabulary. It's not an extension of an
> existing vocabulary, nor will it use the same domain as any
> existing vocabulary. Should I use https: IRIs?
>
> Every source I've consulted says I should prefer http: IRIs.
> This includes the Linked Data book [1], the W3 note on "Cool
> URIs" [2], and the W3 note on best practices for RDF
> vocabularies [3].
>
> This surprises me slightly. The world seems to be moving
> away from HTTP to HTTPS, yet I know of no vocabulary that
> uses https: IRIs, and none of the documents quoted above
> even discuss the question. I can find discussion on why
> ftp: or urn: are less well suited, but nothing about https:.
>
> Even though IRIs on the semantic web are primarily
> identifiers rather than locators, certainly in the linked
> data world, the IRI is assumed to be a good place to look
> for more information about an entity, and various authorities
> recommend 303 redirects to documents like RDF or OWL schemas
> or other descriptive documents with further information.
>
> If the IRI is just an identifier, the choice of IRI scheme
> is largely irrelevant, and https: is neither better nor
> worse than http:. If it's used as a locator, then at least
> the initial request to an http: IRI will be made over plain
> HTTP. It may then be redirected to HTTPS, and HSTS headers
> may mean subsequent requests go directly over HTTPS, but the
> first request is still unencrypted. This has the following
> potential problems:
>
> * It is susceptible to a man-in-the-middle attack. A
>   malicious party could inject deliberately inconsistent
>   schema information that may affect processing decisions
>   made by applications, potentially causing a DoS or otherwise
>   disrupting the user experience.
>
> * ISPs may do a MITM themselves to inject adverts into
>   content (e.g. [4]). Really they oughtn't to do this for
>   non-HTML content (well, they shouldn't do it at all, but
>   that's another matter), but that relies on the ISP caring
>   enough to get this right.
>
> * The request might be tracked by ISPs. The fact that a
>   user is using an application that consults a particular
>   schema is itself valuable information about the customer
>   that can be sold to advertisers, which the US Senate
>   recently voted to make legal [5]. People are increasingly
>   privacy conscious and want to minimise this.
>
> These are all mitigated to a significant degree by using
> https: IRIs. So far as I can see, the counter-arguments are
> as follows:
>
> * Not all HTTP client libraries support TLS, but few if any
>   only support HTTP over TLS.
>
> * Best practice with TLS changes more frequently than plain
>   HTTP. An HTTP client from 20 years ago will still
>   probably work, but TLS has moved on a lot and few servers
>   now support SSL 3.
>
> * MITM attacks and injection can still happen over TLS, as
>   the "Superfish" fiasco demonstrated [6], and are probably
>   better prevented with digital signatures.
>
> * Use of TLS at the transport level prevents HTTP caching on
>   intermediate servers, unless a trusted root certificate is
>   used.
>
> * ISP tracking still happens over TLS because the SNI field
>   is not encrypted, and encrypted SNI seems to have been
>   dropped from TLS 1.3 [7].
>
> * Whether TLS is used at the transport level should be an
>   implementation detail that is not exposed in the
>   vocabulary, a point Berners-Lee has made forcefully [8].
>
> * In the future it's likely that the functionality of
>   HSTS will be put in DNS [9].
>
> These seem fairly weak arguments to me. Digital signatures
> can be used regardless of whether the resource was fetched
> over TLS, and adding authentication at the top of the
> semantic web stack shouldn't preclude encryption at the
> bottom. HSTS-in-DNS technologies, which in conjunction with
> DNSSEC would alleviate the problem, seem to be stalled, and
> I've seen no drafts on the subject since 2011 [9].
>
> I'm wondering whether there's something I'm missing, because
> almost universally people are still defining vocabularies
> using http: IRIs.
>
> I can see why converting an existing vocabulary from http:
> to https: would be difficult, to the point of being
> undesirable; I can see too that there are logistical
> conveniences to having all vocabulary IRIs on a given domain
> use the same IRI scheme, both points Berners-Lee makes in
> [8]. But these don't apply to new vocabularies.
>
> Is there some other consideration I'm missing?
>
> Richard
>
>
> [1] http://linkeddatabook.com/editions/1.0/#htoc10
> [2] https://www.w3.org/TR/cooluris/
> [3] https://www.w3.org/TR/swbp-vocab-pub/
> [4] http://preview.tinyurl.com/om3xxdb
> [5] http://preview.tinyurl.com/ybeka8yv
> [6] https://brennan.io/2015/02/20/superfish-explained/
> [7] https://www.ietf.org/mail-archive/web/tls/current/msg23251.html
> [8] https://www.w3.org/DesignIssues/Security-NotTheS.html
> [9] https://tools.ietf.org/html/draft-hallambaker-esrv-01
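
The first-request exposure discussed in the quoted message can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the host vocab.example, the example IRIs, and the helper name plaintext_hops are all hypothetical and do not come from the thread. The function does no network access; it just reports which requests in a given redirect chain a client would send unencrypted, treating a host as HTTPS-only when the client already holds an HSTS entry for it.

```python
from urllib.parse import urlsplit


def plaintext_hops(chain, hsts_hosts=frozenset()):
    """Return the URLs in a redirect chain that would be sent over plain HTTP.

    chain      -- URLs in the order a client would request them.
    hsts_hosts -- lowercase host names the client already treats as
                  HTTPS-only (e.g. from an earlier visit). This is a
                  simplification: includeSubDomains and entry expiry
                  are not modelled.
    """
    exposed = []
    for url in chain:
        parts = urlsplit(url)
        # An http: URL is only upgraded before sending if the client
        # already knows the host is HTTPS-only.
        if parts.scheme == "http" and parts.hostname not in hsts_hosts:
            exposed.append(url)
    return exposed


# Hypothetical vocabulary term minted with an http: IRI, redirected to HTTPS.
http_chain = [
    "http://vocab.example/ns/Person",
    "https://vocab.example/ns/Person",
    "https://vocab.example/ns.ttl",
]

# First visit: the initial request is unencrypted.
first_visit = plaintext_hops(http_chain)

# Later visit, after the client has stored an HSTS entry for the host.
later_visit = plaintext_hops(http_chain, hsts_hosts={"vocab.example"})

# The same term minted with an https: IRI never touches plain HTTP.
https_minted = plaintext_hops(["https://vocab.example/ns/Person"])
```

Here first_visit contains only the initial http: IRI, while later_visit and https_minted are empty. The sketch says nothing about whether the TLS connection itself can be trusted, which is the point of the cert pinning remark in the reply.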
Received on Friday, 7 July 2017 22:00:22 UTC