Re: The ability to automatically upgrade a reference to HTTPS from HTTP from Reto Gmür on 2014-08-27 (semantic-web@w3.org from August 2014)

From: Reto Gmür <reto@gmuer.ch>
Date: Wed, 27 Aug 2014 10:35:39 +0200
To: Hugh Glaser <hugh@glasers.org>
Cc: Tim Berners-Lee <timbl@w3.org>, SW-forum Web <semantic-web@w3.org>, Public TAG List <www-tag@w3.org>
Message-ID: <CALvhUEWhsREF95OmvbfrzT5pJu2WT7LK_JPw8qFWGU_rr0XN3A@mail.gmail.com>
On Mon, Aug 25, 2014 at 5:46 PM, Hugh Glaser <hugh@glasers.org> wrote:

> Thanks Reto,
>
> > On 25 Aug 2014, at 15:00, Reto Gmür <reto@gmuer.ch> wrote:
> >
> >
> >
> >
> >> On Sat, Aug 23, 2014 at 10:13 PM, Hugh Glaser <hugh@glasers.org> wrote:
> >> Hi,
> >> Of course I can’t comment on TAG issues, but you did cc this to SemWeb,
> so maybe I dare to comment.
> >> (Although I have always found indicating the security in the URI rather
> a strange thing, but I am sure there were/are good reasons for it.)
> >>
> >> There seems to be some confusion here between the URI and what it
> serves, something well known to SemWeb people :-)
> >> Or at least some stuff that might usefully be picked apart.
> >> And I think I detect it in your initial email.
> >>
> >> On 22 Aug 2014, at 18:00, Tim Berners-Lee <timbl@w3.org> wrote:
> >>
> >> >
> >> > There is a massive and reasonable push to get everything from HTTP
> space into HTTPS.
> >> > While this is laudable, the effect on the web as a hypertext system
> could be
> >> > very severe, in that links into http: space will basically break all
> over the place.
> >> > Basically every link in the HTTP web we are used to breaks.
> >> >
> >> > Here is a proposal, that we need this convention:
> >> >
> >> >        If two URIs differ only in the 's' of 'https:', then they may
> never be used for different things.
> >> This, to me, implies the URIs identify the same “thing” (NIR?).
> >> So
> >> <http://foo.bar/baz> owl:sameAs <https://foo.bar/baz>
> >> for all foo, bar and baz…
> >> That sounds like a pretty sound suggestion to me.
> >> Especially in the SemWeb/Linked Data world.
> >>
> >> Then you go on to talk about GET and the documents that come back in a
> similar vein.
> >> When we consider what gets served from a GET (or whatever we do), the
> owl:sameAs doesn’t mean that the documents served have the same content.
> >> As you say, for html, there might be a message saying to use the  other
> one - and of course that might actually be true of the RDF.
> >> I am well used to getting different information about NIRs from a
> variety of resources - after all, no-one enforces that the ttl, n3 and rdf
> documents have exactly the same stuff in them, and sometimes they don’t.
> And if html is also served, it certainly is very different from the RDF.
> >>
> >> As you say later, switching from one to the other should not give
> misleading information.
> >> Simply by considering the two URIs owl:sameAs, that should essentially
> make that happen, even if the server chooses to provide different RDF on
> the two URIs.
> >> They are just two (possibly) different documents about the same NIR.
> >>
> >> If we allow the sameAs meaning, then we can go on to discuss whether
> the documents actually can be assumed to be the same...
> >>
> > If the resources A and B are owl:sameAs we can conclude that if X is an
> appropriate representation of A it is also an appropriate representation of
> B. So while A and B might happen to dereference to different
> representations, each of their representations is a representation of the
> single resource identified by both A and B.
> I am actually not quite sure I understand what the “problem” you are
> describing is.
>

What I was trying to say is that the set of representations a server might
reasonably return on an HTTP request may differ from the set it returns on
an HTTPS request. But rethinking it, I think I was wrong to conclude from
this that the resources are not owl:sameAs. The server might well exclude
some representations based on criteria other than the meaning of the
requested URI (e.g. Accept headers, authenticated user, encryption level),
so this is no argument against the identity of the resource.

> >
> > You can also see the problem when you try to express your redirect
> message in RDF. You cannot just send back the triple
> >
> > <http://foo.bar/baz> <eg:redirectsTo> <https://foo.bar/baz> .
> Possibly because I am not talking about a “redirect”, and I am not sure
> what your interpretation of <eg:redirectsTo> is.
> >
> >
> > As if <http://foo.bar/baz> <owl:sameAs> <https://foo.bar/baz>,
> >
> > the following triple can be inferred: <https://foo.bar/baz>
> <eg:redirectsTo> <http://foo.bar/baz> - the opposite of what was intended.
> Well, yes, since it is actually the same information as the first
> redirects triple the other way round, if we have the sameAs triple in our
> system, or have decided that we will have virtual sameAs triples for all the URIs.
> It is sort of strange to say something <eg:redirectsTo> to itself, which
> is what both of them say (I think), for any meaning I can think of for
> <eg:redirectsTo>.
>
> In fact saying <eg:redirectsTo> at all between the URIs is sort of
> superfluous.
> Operationally the sameAs suggests to the consuming agent that if they want
> to find out about http://foo.bar/baz they may well want to get the stuff
> from https://foo.bar/baz, and vice versa (and in this case would probably
> be well advised to).
>
> It seems to me that this addresses Tim’s issue fine.
> If I have a http URI, then I can assume that any similar https URI is the
> same resource (for purposes of Linked Data/SemWeb, primarily).
>

I think that, regardless of whether eg:redirectsTo is useful, the point
remains that if the resources are automatically identical it becomes harder
to express some things in RDF. If I know that HTTP connections to my server
are regularly intercepted and the client gets fake responses from an evil
man in the middle, I want to make sure people never dereference the HTTP
URI. If the two URIs are defined to identify the same resource, it would
require additional ontological terms to express "dereference me over HTTPS
only".
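
To make the inference quoted above concrete, here is a minimal sketch in
Python. It assumes triples are plain tuples and uses a naive substitution
rule for owl:sameAs; eg:redirectsTo is the made-up predicate from this
thread, and same_as_closure is a hypothetical helper, not part of any RDF
library.

```python
# Illustration only: triples are plain tuples, and eg:redirectsTo is the
# hypothetical predicate from this thread, not a term from a real vocabulary.
HTTP = "http://foo.bar/baz"
HTTPS = "https://foo.bar/baz"
SAME_AS = "owl:sameAs"
REDIRECTS_TO = "eg:redirectsTo"


def same_as_closure(triples):
    """Naively substitute owl:sameAs-equal terms until nothing new appears."""
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        # owl:sameAs is symmetric, so record each pair in both directions.
        pairs = set()
        for s, p, o in graph:
            if p == SAME_AS:
                pairs.add((s, o))
                pairs.add((o, s))
        for s, p, o in list(graph):
            for a, b in pairs:
                candidates = [
                    (b if s == a else s, p, o),  # substitute in subject position
                    (s, p, b if o == a else o),  # substitute in object position
                ]
                for triple in candidates:
                    if triple not in graph:
                        graph.add(triple)
                        changed = True
    return graph


asserted = {
    (HTTP, SAME_AS, HTTPS),
    (HTTP, REDIRECTS_TO, HTTPS),  # the intended direction
}
inferred = same_as_closure(asserted)

# Check whether the reversed statement is entailed as well.
print((HTTPS, REDIRECTS_TO, HTTP) in inferred)
```

Under these assumptions the reversed triple ends up in the closure, so the
direction of the redirect can no longer be told apart at the RDF level.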


> If I decide I actually want to know about the http URI, I can fetch the
> https document as well/instead (particularly if the http URI 404ed).
> The https GET may 404, of course, or it may give me back a document.
>
404 is very unspecific; a 403 with an appropriate message might be better.


> Note that the document doesn’t actually have to have any RDF about either
> URI at all! Or even RDF. But if it does have RDF about the https URI (or
> the http one for that matter), then it will be useful, and I have the
> sameAs between them to keep everything running smoothly.
>
> One might think that seeAlso is similar - it is for documents, and might
> be close to your <eg:redirectsTo>, but doesn’t let me do the really rich ID
> stuff that Tim (I think) wants.
> >
> > If you wanted to reference to a specific resource name in RDF you would
> have to use literals, but this would make things significantly more
> complex. Think of an HTML redirect page with RDFa, the value of the href
> attribute is considered to name a resource by IRI, so by owl-inference it
> would be exchangeable with the other one.
> >
> > So I think the original owl:sameAs suggestion ("If two URIs differ only
> in the 's' of 'https:', then they may never be used for different things")
> should be relaxed to allow to describe a distinction between the two such
> as needed to express a redirect.
> I think I am still worried about what <eg:redirectsTo> means. It seems to
> be about documents, whereas sameAs is about resources.
> If I understand you correctly, then you are here pointing out the problem
> of it not being possible to actually “unpick” sameAs relations - you either
> get them or you don’t (well, at least in theory).
> This is true, and a much wider issue we should avoid, I think.
>
The set of resources is a superset of the set of documents. Each
representation of a resource can have its own URI (which should be the
value of the Content-Location header). There is no fundamental problem in
saying that the HTTP and HTTPS URIs identify the same resource in every
case, but it can cause difficulties in combination with the best practice
that dereferencing any HTTP URI should return a representation of the
resource (especially if we have some man in the middle sending fake
responses).
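
As a rough sketch of what a cautious client could do under that reading,
the following Python rewrites an http: URI to its https: counterpart before
dereferencing. upgrade_to_https is a hypothetical helper name, and the
sketch assumes the URI carries no explicit port.

```python
from urllib.parse import urlsplit


def upgrade_to_https(uri):
    """Return the https: form of an http: URI; leave other URIs untouched."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        return uri
    # Assumption: no explicit port in the URI. "http://host:80/x" would
    # need its port handled separately, which this sketch does not do.
    return parts._replace(scheme="https").geturl()


print(upgrade_to_https("http://foo.bar/baz"))
```

This only changes which URI is dereferenced; it says nothing about whether
the two URIs identify the same resource, which is the convention under
discussion.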

Cheers,
Reto



>
> Cheers
> >
> > Cheers,
> > Reto
> >
> >
> >> Hope that helps a bit :-)
> >>
> >> Hugh
> >>
> >> PS
> >> I might fix sameAs.org to automatically give the other URI for any
> requests - would that be a good idea?
> >> Anyone want to tell me it would help them?
> >> >
> >> > That sounds like a double negative way of putting it, but avoids
> saying things we don't want to mean.
> >> > I don't mean you must always serve up https or always serve up http.
> >> > Basically we are saying the 's' isn't a part of the identity of the
> resource, it is just a tip.
> >> >
> >> > So if I have successfully retrieved https:x  (for some value of x)
> and I have a link to http:x then I can satisfy following the link, by
> presenting what I got from https:x.
> >> > I know that whatever I get if I do do the GET on the http:x, it can't
> be different from what I have.
> >> >
> >> > The opposite however is NOT true, as a page which links to https:x
> requires the transaction to be made securely.  Even if I have already
> looked up http:x, I can't assume that I can use it for https:x.  But for
> reasons of security alone -- it would still be against the principle if the
> server did deliberately serve something different.
> >> >
> >> > This means that if you have built two completely separate web sites
> in HTTPS and HTTP space, and you may have used the same path (modulo the
> 's') for different things, then you are in trouble. But who would do that?
>  I assume the large search engines know who.
> >> >
> >> > I suppose an exception for human readable pages may be that the http:
> version has a warning on it that the user should be accessing the https: one.
> >> >
> >> > With linked data pages, where a huge amount of the Linked Open Data
> cloud is in http: space last time I looked, systems using URIs for
> identifiers need to be able to canonicalize them so that anything said about
> http:x applies equally to https:x.
> >> >
> >> > What this means is that a client given an http:  URL in a reference
> is always free to try out the HTTPS, just adding an S, and use result if
> the   is successful.
> >> > Sometimes, if browser security prevents a https-origin web page from
> loading any http resources as Firefox proudly does, [1], if you are writing
> a general purpose web app which has to read arbitrary web resources with
> XHR, ironically, you have to serve it over HTTP!     In the mean time, many
> client libraries will I assume need to just try HTTPS as that is all they
> are allowed.
> >> >
> >> > Or do we have to only build serious internet applications as browser
> extensions or native apps?
> >> >
> >> > For this and many related reasons, we need to first get a very high
> level principle that if a client switches from http to https of its own
> accord, then it can't be given misleading data as a result.
> >> >
> >> > I suspect this has been discussed in many fora -- apologies if the issue
> is already noted and resolved, and do point to where it has
> >> >
> >> > TimBL
> >> >
> >> > [1]
> https://blog.mozilla.org/tanvi/2013/04/10/mixed-content-blocking-enabled-in-firefox-23/
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > In order for this switch to be made, transitions
> >> >
> >>
> >> --
> >> Hugh Glaser
> >>    20 Portchester Rise
> >>    Eastleigh
> >>    SO50 4QS
> >> Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
> >>
> >>
> >>
> >
> --
> Hugh Glaser
>    20 Portchester Rise
>    Eastleigh
>    SO50 4QS
> Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
>
>
Received on Wednesday, 27 August 2014 08:36:09 UTC