- From: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
- Date: Mon, 10 Jul 2017 10:56:49 +0000
- To: Richard Smith <richard@ex-parrot.com>, "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <D5780135E58FC940BDB87E7D499910184B3F08C0@MBXP14.ds.man.ac.uk>
I would wholeheartedly support using an https namespace for a new ontology, coupled with a longevity-secured location. With https://letsencrypt.org/ there is not really an excuse today to have a non-encrypted web service, even for experimental work.

Probably the biggest real danger with an https:// namespace is that someone will forget to renew the SSL certificates (or, as seen recently, that SSL clients start requiring higher encryption quality) and the vocabulary becomes “inaccessible”. SSL expiry is however not as bad as expiry of a custom domain name for the ontology (bad idea), as at least there are technical ways to bypass the expiry warnings.

The JSON-LD community has gone with https from day one, as nobody wants their @context resolution to be vulnerable, or to not work from an https:// web application.

Thus I find that vocabularies using PURLs with a https://w3id.org/ registered namespace are pretty much always on https, although some of them redirect to an insecure http location (a redirection I know Java’s URL support will protest against) or may have accidentally used http://w3id.org/ in their declared namespaces.

Here are some of them (where people added an accompanying README):
https://github.com/perma-id/w3id.org/search?utf8=%E2%9C%93&q=ontology&type=

--
Stian Soiland-Reyes, eScience Lab
School of Computer Science, The University of Manchester
http://orcid.org/0000-0001-9842-9718

From: Richard Smith<mailto:richard@ex-parrot.com>
Sent: 07 July 2017 22:45
To: semantic-web@w3.org<mailto:semantic-web@w3.org>
Subject: The use of https: IRIs on the semantic web

I hope this is an appropriate mailing list to ask this question. I'd be happy to be directed elsewhere if not.

I am defining a new vocabulary. It's not an extension of an existing vocabulary, nor will it use the same domain as any existing vocabulary. Should I use https: IRIs?

Every source I've consulted says I should prefer http: IRIs.
This includes the Linked Data book [1], the W3 note on "Cool URIs" [2], and the W3 note on best practices for RDF vocabularies [3]. This surprises me slightly. The world seems to be moving away from HTTP to HTTPS, yet I know of no vocabulary that uses https: IRIs, and none of the documents quoted above even discuss the question. I can find discussion on why ftp: or urn: are less well suited, but nothing about https:.

Even though IRIs on the semantic web are primarily identifiers rather than locators, certainly in the linked data world, the IRI is assumed to be a good place to look for more information about the entity, and various authorities recommend 303 redirects to documents like RDF or OWL schemas or other descriptive documents with further information.

If the IRI is just an identifier, the choice of IRI scheme is largely irrelevant, and https: is neither better nor worse than http:. If it's used as a locator, then at least the initial request to an http: IRI will be made over plain HTTP. It may then be redirected to HTTPS, and HSTS headers may mean subsequent requests go directly over HTTPS, but the first request is still unencrypted. This has the following potential problems:

* It is susceptible to a man-in-the-middle attack. A malicious party could inject deliberately inconsistent schema information that may affect processing decisions made by applications, potentially causing a DoS or otherwise disrupting the user experience.
* ISPs may do a MITM themselves to inject adverts into content (e.g. [4]). Really they oughtn't to do this for non-HTML content (well, they shouldn't do it at all, but that's another matter), but that relies on the ISP caring enough to get this right.
* The request might be tracked by ISPs. The fact that a user is using an application that consults a particular schema is itself valuable information about the customer that can be sold to advertisers, a practice which the US Senate recently voted to make legal [5].
People are increasingly privacy conscious and want to minimise this.

These are all mitigated to a significant degree by using https: IRIs. So far as I can see, the counter-arguments are as follows:

* Not all HTTP client libraries support TLS, but few if any only support HTTP over TLS.
* Best practice with TLS changes more frequently than with plain HTTP. An HTTP client from 20 years ago will still probably work, but TLS has moved on a lot and few servers now support SSL 3.
* MITM attacks and injection can still happen over TLS, as the "Superfish" fiasco demonstrated [6], and are probably better prevented with digital signatures.
* Use of TLS at the transport level prevents HTTP caching on intermediate servers, unless a trusted root certificate is used.
* ISP tracking still happens over TLS because the SNI field is not encrypted, and encrypted SNI seems to have been dropped from TLS 1.3 [7].
* Whether TLS is used at the transport level should be an implementation detail that is not exposed in the vocabulary, a point Berners-Lee has made forcefully [8].
* In the future it's likely that the functionality of HSTS will be put in DNS [9].

These seem fairly weak arguments to me. Digital signatures can be used regardless of whether the resource was fetched over TLS, and adding authentication at the top of the semantic web stack shouldn't preclude encryption at the bottom. HSTS-in-DNS technologies, which in conjunction with DNSSEC would alleviate the problem, seem to be stalled, and I've seen no drafts on the subject since 2011 [9].

I'm wondering whether there's something I'm missing, because almost universally people are still defining vocabularies using http: IRIs. I can see why converting an existing vocabulary from http: to https: would be difficult, to the point of being undesirable; I can see too that there are logistic conveniences to having all vocabulary IRIs on a given domain use the same IRI scheme, both points Berners-Lee makes in [8].
But these don't apply to new vocabularies. Is there some other consideration I'm missing?

Richard

[1] http://linkeddatabook.com/editions/1.0/#htoc10
[2] https://www.w3.org/TR/cooluris/
[3] https://www.w3.org/TR/swbp-vocab-pub/
[4] http://preview.tinyurl.com/om3xxdb
[5] http://preview.tinyurl.com/ybeka8yv
[6] https://brennan.io/2015/02/20/superfish-explained/
[7] https://www.ietf.org/mail-archive/web/tls/current/msg23251.html
[8] https://www.w3.org/DesignIssues/Security-NotTheS.html
[9] https://tools.ietf.org/html/draft-hallambaker-esrv-01
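
Two points from the thread can be illustrated with a small Python sketch: dereferencing an http: IRI sends the first request unencrypted even when the server redirects to HTTPS, and an https: PURL can still redirect to an insecure http location. This is only an illustration under stated assumptions. The helper names `plaintext_hops` and `first_request_is_exposed` and both redirect chains are hypothetical; they are not responses observed from any real server.

```python
from urllib.parse import urlsplit


def plaintext_hops(chain):
    """Return the URLs in a redirect chain that are requested over plain HTTP."""
    return [url for url in chain if urlsplit(url).scheme.lower() == "http"]


def first_request_is_exposed(chain):
    """Return True if the very first request in the chain goes out unencrypted."""
    return bool(chain) and urlsplit(chain[0]).scheme.lower() == "http"


# Hypothetical redirect chains, for illustration only.
http_iri_chain = [
    "http://example.org/vocab",        # initial request: plain HTTP
    "https://example.org/vocab.ttl",   # server redirects to HTTPS
]
https_iri_chain = [
    "https://w3id.org/example/vocab",  # initial request: HTTPS (made-up PURL)
    "http://example.org/vocab.ttl",    # PURL redirects to an insecure location
]

for name, chain in [("http IRI", http_iri_chain), ("https IRI", https_iri_chain)]:
    print(name, "first request exposed:", first_request_is_exposed(chain))
    print(name, "plain-HTTP hops:", plaintext_hops(chain))
```

In a real check the chain would have to be collected by following each redirect's Location header by hand, since many HTTP clients either follow redirects silently or refuse to cross between schemes.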
Received on Monday, 10 July 2017 10:57:22 UTC