- From: Pete Rivett <pete.rivett@adaptive.com>
- Date: Fri, 7 Jul 2017 22:03:29 +0000
- To: Richard Smith <richard@ex-parrot.com>, "semantic-web@w3.org" <semantic-web@w3.org>
I agree with your points, Richard. FWIW, the Financial Industry Business Ontology (FIBO), which I'm involved in, uses https:// for all IRIs; see the published ontology files [1]. This is an extensive ontology developed by the Enterprise Data Management Council (EDMC) and supported by many major financial institutions.

Regards
Pete

[1] https://spec.edmcouncil.org/fibo/ontology/master/latest/tree.html

Pete Rivett (pete.rivett@adaptive.com)
CTO, Adaptive Inc
65 Enterprise, Aliso Viejo, CA 92656
cell: +1 949 338 3794
Follow me on Twitter @rivettp or http://twitter.com/rivettp

-----Original Message-----
From: Richard Smith [mailto:richard@ex-parrot.com]
Sent: Friday, July 7, 2017 2:35 PM
To: semantic-web@w3.org
Subject: The use of https: IRIs on the semantic web

I hope this is an appropriate mailing list to ask this question. I'd be happy to be directed elsewhere if not.

I am defining a new vocabulary. It's not an extension of an existing vocabulary, nor will it use the same domain as any existing vocabulary. Should I use https: IRIs?

Every source I've consulted says I should prefer http: IRIs. This includes the Linked Data book [1], the W3C note on "Cool URIs" [2], and the W3C note on best practices for publishing RDF vocabularies [3]. This surprises me slightly. The world seems to be moving from HTTP to HTTPS, yet I know of no vocabulary that uses https: IRIs, and none of the documents cited above even discusses the question. I can find discussion of why ftp: or urn: IRIs are less well suited, but nothing about https:.

Even though IRIs on the semantic web are primarily identifiers rather than locators, in the linked data world the IRI is certainly assumed to be a good place to look for more information about the entity, and various authorities recommend 303 redirects to RDF or OWL schemas or other documents with further information. If the IRI is just an identifier, the choice of IRI scheme is largely irrelevant, and https: is neither better nor worse than http:.
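The 303 pattern mentioned above can be sketched roughly as follows. This is a minimal, hypothetical server-side handler, assuming a single vocabulary term whose descriptive documents live at made-up example.org URLs (`DOCUMENTS`, `preferred_types` and `resolve_term` are illustrative names, not part of any library). A request on the term's IRI is answered with 303 See Other pointing at the document that best matches the client's Accept header.

```python
# Hypothetical layout: one vocabulary term, several descriptive documents.
DOCUMENTS = {
    "text/html": "https://example.org/vocab/Widget.html",
    "application/rdf+xml": "https://example.org/vocab/Widget.rdf",
    "text/turtle": "https://example.org/vocab/Widget.ttl",
}

def preferred_types(accept):
    """Parse an Accept header into media types ordered by q-value, highest first."""
    ranked = []
    for index, item in enumerate(accept.split(",")):
        media_type, *params = [part.strip() for part in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        # Sort key: higher q first, then original order in the header.
        ranked.append((-q, index, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]

def resolve_term(accept):
    """Return (status, Location) for a GET on the term's IRI."""
    for media_type in preferred_types(accept):
        if media_type == "*/*":
            # Client accepts anything: fall back to the human-readable page.
            return 303, DOCUMENTS["text/html"]
        if media_type in DOCUMENTS:
            return 303, DOCUMENTS[media_type]
    return 406, None  # nothing acceptable to offer

print(resolve_term("text/turtle;q=0.9, application/rdf+xml"))
```

The Location target's scheme is chosen by the server independently of the IRI's own scheme, but the request that receives this 303 is sent over whatever scheme the IRI names.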
If it's used as a locator, then at least the initial request to an http: IRI will be made over plain HTTP. It may then be redirected to HTTPS, and HSTS headers may mean subsequent requests go directly over HTTPS, but the first request is still unencrypted. This has the following potential problems:

* It is susceptible to a man-in-the-middle (MITM) attack. A malicious party could inject deliberately inconsistent schema information that may affect processing decisions made by applications, potentially causing a DoS or otherwise disrupting the user experience.
* ISPs may perform a MITM themselves to inject adverts into content (e.g. [4]). Really they oughtn't to do this for non-HTML content (well, they shouldn't do it at all, but that's another matter), but that relies on the ISP caring enough to get this right.
* The request might be tracked by ISPs. The fact that a user is using an application that consults a particular schema is itself valuable information about the customer that can be sold to advertisers, a practice the US Senate recently voted to make legal [5]. People are increasingly privacy-conscious and want to minimise this.

These are all mitigated to a significant degree by using https: IRIs. So far as I can see, the counter-arguments are as follows:

* Not all HTTP client libraries support TLS, whereas few, if any, support only HTTP over TLS.
* Best practice with TLS changes more frequently than with plain HTTP. An HTTP client from 20 years ago will probably still work, but TLS has moved on a lot and few servers now support SSL 3.
* MITM attacks and injection can still happen over TLS, as the "Superfish" fiasco demonstrated [6], and are probably better prevented with digital signatures.
* Use of TLS at the transport level prevents HTTP caching on intermediate servers, unless a trusted root certificate is used.
* ISP tracking still happens over TLS because the SNI field is not encrypted, and encrypted SNI seems to have been dropped from TLS 1.3 [7].
* Whether TLS is used at the transport level should be an implementation detail that is not exposed in the vocabulary, a point Berners-Lee has made forcefully [8].
* In the future, the functionality of HSTS is likely to be put in DNS [9].

These seem fairly weak arguments to me. Digital signatures can be used regardless of whether the resource was fetched over TLS, and adding authentication at the top of the semantic web stack shouldn't preclude encryption at the bottom. HSTS-in-DNS technologies, which in conjunction with DNSSEC would alleviate the problem, seem to have stalled; I've seen no drafts on the subject since 2011 [9].

I'm wondering whether there's something I'm missing, because people are still almost universally defining vocabularies with http: IRIs. I can see why converting an existing vocabulary from http: to https: would be difficult, to the point of being undesirable; I can also see that there are logistical conveniences to having all vocabulary IRIs on a given domain use the same IRI scheme, both points Berners-Lee makes in [8]. But these don't apply to new vocabularies. Is there some other consideration I'm missing?

Richard

[1] http://linkeddatabook.com/editions/1.0/#htoc10
[2] https://www.w3.org/TR/cooluris/
[3] https://www.w3.org/TR/swbp-vocab-pub/
[4] http://preview.tinyurl.com/om3xxdb
[5] http://preview.tinyurl.com/ybeka8yv
[6] https://brennan.io/2015/02/20/superfish-explained/
[7] https://www.ietf.org/mail-archive/web/tls/current/msg23251.html
[8] https://www.w3.org/DesignIssues/Security-NotTheS.html
[9] https://tools.ietf.org/html/draft-hallambaker-esrv-01
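The first-request problem with http: IRIs and HSTS can be made concrete with a toy model of a client's HSTS cache. This is a sketch under stated assumptions, not a real HTTP client: `toy_server` is a hypothetical stand-in for the network that redirects http: to https: and sends a Strict-Transport-Security header only over https:, and `HstsClient` ignores max-age expiry, includeSubDomains and explicit ports. Following RFC 6797, an STS header is only honoured when received over a secure connection.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

def parse_max_age(header):
    """Return the max-age directive of a Strict-Transport-Security header, or None."""
    for directive in header.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            return int(value.strip().strip('"'))
    return None

def toy_server(url):
    """Hypothetical server: redirects http: to https:, sends STS only over https:."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        return 301, urlunsplit(parts._replace(scheme="https")), None
    return 200, None, "max-age=31536000"

class HstsClient:
    """Toy HSTS-aware client; `server` stands in for the network (no expiry handling)."""

    def __init__(self, server):
        self.server = server      # callable: url -> (status, location, sts_header)
        self.hsts_hosts = set()   # hosts that sent a valid STS header over https:
        self.log = []             # every URL actually requested, in order

    def get(self, url, max_hops=5):
        for _ in range(max_hops):
            parts = urlsplit(url)
            if parts.scheme == "http" and parts.hostname in self.hsts_hosts:
                # Known HSTS host: upgrade locally before anything is sent in plaintext.
                parts = parts._replace(scheme="https")
                url = urlunsplit(parts)
            self.log.append(url)
            status, location, sts = self.server(url)
            # RFC 6797: an STS header received over plain HTTP must be ignored.
            if sts and parts.scheme == "https":
                max_age = parse_max_age(sts)
                if max_age == 0:
                    self.hsts_hosts.discard(parts.hostname)
                elif max_age is not None:
                    self.hsts_hosts.add(parts.hostname)
            if status not in REDIRECT_CODES:
                return url
            url = urljoin(url, location)
        raise RuntimeError("too many redirects")

client = HstsClient(toy_server)
client.get("http://example.org/vocab/Widget")
client.get("http://example.org/vocab/Widget")
print(client.log)
```

Only the very first request in `client.log` goes out over plain http:; every later one is upgraded locally. Pre-seeding `hsts_hosts`, which is roughly what HSTS preload lists and the HSTS-in-DNS proposals aim to provide, would remove that first hop, while an https: IRI never needs it.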
Received on Friday, 7 July 2017 22:04:16 UTC