Re: HTTPS and the Semantic Web

Hi Phil,

Thank you for opening the discussion on this topic.  I've been thinking
about it in the context of IRI dereferencing (is a dereference of https://a
allowed to return statements about http://a?), but I will answer the
questions you ask in your blog post first.

> Firstly, is the community agreed that if two URIs differ only in the
> scheme (http://, https:// and perhaps whatever comes in future) then
> they identify the same resource?

According to the RDF 1.1 specification an HTTP and an HTTPS IRI are _not_
to be considered equal names.  Specifically: "Two IRIs are equal if and
only if they are equivalent under Simple String Comparison [...].  Further
normalization MUST NOT be performed when comparing IRIs for equality."
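
To make the comparison rule concrete, here is a minimal Python sketch.  The
function name iri_equal is made up for illustration and is not taken from
any RDF library.  It assumes IRIs are held as plain strings and compares
them character by character, which is how I read Simple String Comparison:

```python
def iri_equal(iri1: str, iri2: str) -> bool:
    """Compare two IRIs as RDF terms: character by character, no normalization."""
    return iri1 == iri2


http_name = "http://a"
https_name = "https://a"

# The same string is the same name.
same_name = iri_equal(http_name, http_name)

# The scheme is part of the string, so these are two different names.
different_scheme = iri_equal(http_name, https_name)

# No case normalization is applied either, so these differ as well.
different_case = iri_equal("http://a", "HTTP://a")
```

Under this reading, any equivalence between the two names has to come from
somewhere other than term equality.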

One may reason that even though HTTP and HTTPS IRIs are different names,
they can still identify the same resource.  However, the only reliable way
to express this is through explicit identity assertions.  If HTTP and HTTPS
IRIs denoted the same resource by community convention, this would conflict
with many aspects of the current RDF and OWL standards.  For instance, such
a convention would contradict valid explicit assertions such as
〈http://a, owl:differentFrom, https://a〉.  Also, 〈http://a, http://b,
http://c〉 currently does not entail the seven other HTTP/HTTPS variants of
the same triple that would encode the same proposition according to the
community convention.
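
The variants in question can be enumerated mechanically.  The sketch below
is only an illustration: scheme_variants and triple_variants are
hypothetical helper names, and triples are modelled as plain tuples of IRI
strings rather than terms from a real RDF library.

```python
from itertools import product


def scheme_variants(iri: str) -> list[str]:
    """Return the http:// and https:// spellings of an HTTP(S) IRI."""
    for scheme in ("http://", "https://"):
        if iri.startswith(scheme):
            rest = iri[len(scheme):]
            return ["http://" + rest, "https://" + rest]
    return [iri]  # IRIs with other schemes are left alone


def triple_variants(triple):
    """All spellings of a triple obtained by varying the scheme of each term."""
    return list(product(*(scheme_variants(term) for term in triple)))


original = ("http://a", "http://b", "http://c")
variants = triple_variants(original)  # two choices per term, original included
others = [t for t in variants if t != original]
```

None of the triples in others follows from the original under the current
RDF semantics; each would need an explicit identity assertion to be
entailed.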

> Secondly, some members of the Semantic Web community have already moved
> to HTTPS [...].
> How steep is the path from where we are today to moving to a more secure
> Semantic Web, i.e. one that habitually uses HTTPS rather than HTTP?

Today I updated an RDF-based website that runs on the SWI-Prolog-based
ClioPatria <https://github.com/ClioPatria/ClioPatria> triple store.  In my
experience the conversion is painless if the API IRIs are converted at the
same time as the database IRIs.  If only one of the two is updated, one has
to translate between HTTP and HTTPS names on every request, which is
cumbersome.
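
As a rough illustration of the database side of such a conversion (this is
not ClioPatria's API, just a Python sketch), the rewrite amounts to
changing the scheme of every IRI inside the namespace being moved.  The
names to_https and convert_triples, and the example.org namespace, are
assumptions made for the example; all terms are assumed to be IRIs stored
as strings.

```python
def to_https(term: str, namespace: str) -> str:
    """Rewrite a term to HTTPS if it is an IRI inside the given HTTP namespace."""
    if namespace.startswith("http://") and term.startswith(namespace):
        return "https://" + term[len("http://"):]
    return term  # terms outside the namespace keep their name


def convert_triples(triples, namespace):
    """Apply the rewrite to subject, predicate and object of every triple."""
    return [tuple(to_https(term, namespace) for term in triple) for triple in triples]


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
old_data = [
    ("http://example.org/resource/1", RDF_TYPE, "http://example.org/vocab/Thing"),
]
new_data = convert_triples(old_data, "http://example.org/")
```

IRIs from other namespaces, such as rdf:type above, are left untouched, so
only the names the site itself controls change.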

> Thirdly, [...] editing definitions in turtle files such as the one at
> http://www.w3.org/ns/dcat# to make it explicit that
> http://www.w3.org/ns/dcat#Dataset is owl:equivalentClass to
> https://www.w3.org/ns/dcat#Dataset (or even worse, having to go through
> and actually duplicate all the definitions with the different subject).

If a namespace makes the move from HTTP to HTTPS then identity statements
will have to be created for every concept and instance in the dataset.
This is quite cumbersome, but (as I explain in my answer to your first
question) having HTTP and HTTPS IRIs denote the same resource by community
convention does not play well with existing RDF and OWL standards.  The
linkset that results from making these HTTP/HTTPS identity assertions only
has to be used when the old and new versions of the data are used
together.
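
Such a linkset does not have to be written by hand.  As a minimal sketch,
assuming the local names of the namespace are available as a list, the
Python below emits one N-Triples statement per name.  The function name
identity_linkset is hypothetical, and owl:sameAs is only a default; for
classes one would pass owl:equivalentClass, as in your example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"


def identity_linkset(http_namespace, local_names, predicate=OWL_SAME_AS):
    """Return N-Triples lines linking each HTTP IRI to its HTTPS counterpart."""
    if not http_namespace.startswith("http://"):
        raise ValueError("expected a namespace IRI starting with http://")
    https_namespace = "https://" + http_namespace[len("http://"):]
    return [
        f"<{http_namespace}{name}> <{predicate}> <{https_namespace}{name}> ."
        for name in local_names
    ]


# Illustrative input: two class names from the DCAT namespace.
links = identity_linkset(
    "http://www.w3.org/ns/dcat#",
    ["Dataset", "Catalog"],
    predicate=OWL_EQUIVALENT_CLASS,
)
```

The output can be kept in a separate file and loaded only when HTTP and
HTTPS data are combined.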

What's your take on IRI dereferencing in this context?  I can imagine
situations in which a Web server is updated to HTTPS but the names in the
exposed dataset are not.  Should dereferencing https://a return statements
about http://a in that case?  The de facto implementations of IRI
dereferencing do not close the dereferenced result set under identity, so
there may even be cases in which http://a dereferences to a different set
of propositions than https://a.
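
To show what I mean, here is a simplified Python model of dereferencing.
It does not describe any particular server: the graph is an in-memory set
of tuples, describe returns the triples whose subject is the requested
IRI, and describe_closed additionally follows owl:sameAs links.  All
function names and the http://p, http://q, http://x and http://y terms are
made up for the example, and the closed version only unions descriptions;
it does not rewrite names.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def describe(graph, iri):
    """Triples whose subject is exactly the requested IRI."""
    return {t for t in graph if t[0] == iri}


def identity_class(graph, iri):
    """IRIs reachable from iri through owl:sameAs links in either direction."""
    seen = {iri}
    frontier = [iri]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p != OWL_SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def describe_closed(graph, iri):
    """Triples whose subject is any IRI asserted to be identical to iri."""
    names = identity_class(graph, iri)
    return {t for t in graph if t[0] in names}


graph = {
    ("http://a", "http://p", "http://x"),
    ("https://a", "http://q", "http://y"),
    ("http://a", OWL_SAME_AS, "https://a"),
}

plain_http = describe(graph, "http://a")
plain_https = describe(graph, "https://a")
closed_http = describe_closed(graph, "http://a")
closed_https = describe_closed(graph, "https://a")
```

In this model the two plain results differ, while the two closed results
coincide.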

---
Cheers,
Wouter.

On Fri, May 20, 2016 at 8:08 PM, Phil Archer <phila@w3.org> wrote:
>
> Not a moan about spam, or a CfP, but an actual discussion point, yay!
>
> I've just blogged about our use of HTTPS across www.w3.org which raises
> some questions for this community. Please see
> https://www.w3.org/blog/2016/05/https-and-the-semantic-weblinked-data/
>
> Comments welcome.
>
> Thanks
>
> --
>
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
>

Received on Friday, 20 May 2016 22:29:42 UTC