- From: Nathan Rixham <nathan@webr3.org>
- Date: Mon, 10 Jul 2017 14:39:06 +0100
- To: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
- Cc: Richard Smith <richard@ex-parrot.com>, "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <CANiy74yOezSWeDLdK2GiTMGSPj6r0O=4YZ_TrrVNaSk=Pzb+Og@mail.gmail.com>
What about xyz:// and abc://? In ten years we'll have them. What's the well-defined way to have schemas and data visible through multiple protocols?

On Mon, Jul 10, 2017 at 11:56 AM, Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk> wrote:

> I would wholeheartedly support using a https namespace for a new ontology,
> coupled with a longevity-secured location. With https://letsencrypt.org/
> there is not really an excuse today to have a non-encrypted web service,
> even for experimental work.
>
> Probably the biggest real danger with a https:// namespace is that someone
> will forget to renew the SSL certificates (or, more recently, that SSL
> clients will require higher encryption quality) and the vocabulary becomes
> “inaccessible”.
>
> SSL expiry is however not as bad as expiry of a custom domain name for the
> ontology (bad idea), as at least there are technical ways to bypass the
> expiry warnings.
>
> The JSON-LD community has gone with https from day one, as nobody wants
> their @context resolution to be vulnerable, or to not work from a https://
> web application.
>
> Thus I find that vocabularies using PURLs with a https://w3id.org/
> registered namespace are pretty much always on https – although some of
> them redirect to an insecure http location (a redirection I know Java’s URL
> support will protest against) or may have accidentally used
> http://w3id.org/ in their declared namespaces. Here are some of them
> (where people added an accompanying README):
> https://github.com/perma-id/w3id.org/search?utf8=%E2%9C%93&q=ontology&type=
>
> --
> Stian Soiland-Reyes, eScience Lab
> School of Computer Science, The University of Manchester
> http://orcid.org/0000-0001-9842-9718
>
> *From: *Richard Smith <richard@ex-parrot.com>
> *Sent: *07 July 2017 22:45
> *To: *semantic-web@w3.org
> *Subject: *The use of https: IRIs on the semantic web
>
> I hope this is an appropriate mailing list to ask this
> question.
> I'd be happy to be directed elsewhere if not.
>
> I am defining a new vocabulary. It's not an extension of an
> existing vocabulary, nor will it use the same domain as any
> existing vocabulary. Should I use https: IRIs?
>
> Every source I've consulted says I should prefer http: IRIs.
> This includes the Linked Data book [1], the W3 note on "Cool
> URIs" [2], and the W3 note on best practices for RDF
> vocabularies [3].
>
> This surprises me slightly. The world seems to be moving
> away from HTTP to HTTPS, yet I know of no vocabulary that
> uses https: IRIs, and none of the documents quoted above
> even discuss the question. I can find discussion on why
> ftp: or urn: are less well suited, but nothing about https:.
>
> Even though IRIs on the semantic web are primarily
> identifiers rather than locators, certainly in the linked
> data world, the IRI is assumed to be a good place to look
> for more information about the entity, and various authorities
> recommend 303 redirects to documents like RDF or OWL schemas
> or other descriptive documents with further information.
>
> If the IRI is just an identifier, the choice of IRI scheme
> is largely irrelevant, and https: is neither better nor
> worse than http:. If it's used as a locator, then at least
> the initial request to an http: IRI will be made over plain
> HTTP. It may then be redirected to HTTPS, and HSTS headers
> may mean subsequent requests go directly over HTTPS, but the
> first request is still unencrypted. This has the following
> potential problems:
>
> * It is susceptible to a man-in-the-middle attack. A
>   malicious party could inject deliberately inconsistent
>   schema information that may affect processing decisions
>   made by applications, potentially causing a DoS or otherwise
>   disrupting the user experience.
>
> * ISPs may do a MITM themselves to inject adverts into
>   content (e.g. [4]).
>   Really they oughtn't to do this for
>   non-HTML content (well, they shouldn't do it at all, but
>   that's another matter), but that relies on the ISP caring
>   enough to get this right.
>
> * The request might be tracked by ISPs. The fact that a
>   user is using an application that consults a particular
>   schema is itself valuable information about the customer
>   that can be sold to advertisers, which the US Senate
>   recently voted to make legal [5]. People are increasingly
>   privacy conscious and want to minimise this.
>
> These are all mitigated to a significant degree by using
> https: IRIs. So far as I can see, the counter-arguments are
> as follows:
>
> * Not all HTTP client libraries support TLS, but few if any
>   only support HTTP over TLS.
>
> * Best practice with TLS changes more frequently than plain
>   HTTP. An HTTP client from 20 years ago will still
>   probably work, but TLS has moved on a lot and few servers
>   now support SSL 3.
>
> * MITM attacks and injection can still happen over TLS, as
>   the "Superfish" fiasco demonstrated [6], and are probably
>   better prevented with digital signatures.
>
> * Use of TLS at the transport level prevents HTTP caching on
>   intermediate servers, unless a trusted root certificate is
>   used.
>
> * ISP tracking still happens over TLS because the SNI field
>   is not encrypted, and encrypted SNI seems to have been
>   dropped from TLS 1.3 [7].
>
> * Whether TLS is used at the transport level should be an
>   implementation detail that is not exposed in the
>   vocabulary, a point Berners-Lee has made forcefully [8].
>
> * In the future it's likely that the functionality of
>   HSTS will be put in DNS [9].
>
> These seem fairly weak arguments to me. Digital signatures
> can be used regardless of whether the resource was fetched
> over TLS, and adding authentication at the top of the
> semantic web stack shouldn't preclude encryption at the
> bottom.
> HSTS-in-DNS technologies, which in conjunction with
> DNSSEC would alleviate the problem, seem to be stalled, and
> I've seen no drafts on the subject since 2011 [9].
>
> I'm wondering whether there's something I'm missing, because
> almost universally people are still defining vocabularies
> using http: IRIs.
>
> I can see why converting an existing vocabulary from http:
> to https: would be difficult, to the point of being
> undesirable; I can see too that there are logistical
> conveniences to having all vocabulary IRIs on a given domain
> use the same IRI scheme, both points Berners-Lee makes in
> [8]. But these don't apply to new vocabularies.
>
> Is there some other consideration I'm missing?
>
> Richard
>
> [1] http://linkeddatabook.com/editions/1.0/#htoc10
> [2] https://www.w3.org/TR/cooluris/
> [3] https://www.w3.org/TR/swbp-vocab-pub/
> [4] http://preview.tinyurl.com/om3xxdb
> [5] http://preview.tinyurl.com/ybeka8yv
> [6] https://brennan.io/2015/02/20/superfish-explained/
> [7] https://www.ietf.org/mail-archive/web/tls/current/msg23251.html
> [8] https://www.w3.org/DesignIssues/Security-NotTheS.html
> [9] https://tools.ietf.org/html/draft-hallambaker-esrv-01
Received on Monday, 10 July 2017 13:39:40 UTC