Re: Best practices for choosing http: or https: scheme for URIs from Christopher Gutteridge on 2022-08-03 (public-lod@w3.org from August 2022)

From: Christopher Gutteridge <totl@soton.ac.uk>
Date: Wed, 3 Aug 2022 11:23:52 +0100
To: Franck Michel <franck.michel@inria.fr>, John Walker <john.walker@semaku.com>, "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <45581325-58f1-ff11-69b0-b723ecacbf06@soton.ac.uk>
Resolvable URIs are a fundamentally flawed system because they rely on 
DNS registrations which will not last indefinitely and are already 
failing. They also seem to suffer from the assumption that the thing you 
get back from resolving a URI is then identified by that URI, but there's 
no guarantee that the result of two resolutions of the same URI will 
give the same content... I really wish the convention had been that you 
got back something relevant to the URI you resolve but the URI of the 
response was part of the response headers not assumed from the request.

We also went through the pain of deciding whether we should move to 
using HTTPS URIs, and decided not to, but our open data service was 
already on the decline by that point as it didn't provide enough 
measurable value to the organisation to justify ongoing support. Its 
main value was as a 
data warehouse of data approved for publication that could be used on 
small infrastructure web projects without having to get permission for 
everything all over again every time. But the university got rid of the 
innovation team in the IT department so even that's no longer useful -- 
we're only doing formal process services now. The days of the Wild Web 
are over. But the Wild West only lasted 30 years...


On 03/08/2022 10:07, Franck Michel wrote:
> Hi John,
>
> Thanks for raising this issue, because I think we will all eventually 
> have to deal with this sort of thing.
>
> I have a related use case on a domain where all applications are 
> hidden behind a web application firewall (WAF). The WAF does not allow 
> http requests, whatsoever. It always replies with a 301 redirect with 
> the https version of the url. The next time, the client may send the 
> Referer header since this now complies with the default referrer policy.
>
> So, whenever I get a URI dereferencing query, the URI has the https 
> scheme.
>
> To dereference it, I need to submit a request to my triple store, 
> something like https://mytriplesstore.org/describe?uri=<the_uri>.
> So the option I'm thinking of is to do internal URI rewriting and 
> proxying while switching https to http. If I receive a GET request for:
> *https*://myuri.org/something
> I'll rewrite it into:
> https://mytriplesstore.org/describe?uri=*http*://myuri.org/something
> and transparently proxy the result back to the client.
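
The rewrite step described above could be sketched roughly as follows. This is only an illustration in Python, not the Apache configuration the message has in mind: the function name `describe_url` is hypothetical, and percent-encoding the `uri` parameter is an assumption about what the triple store endpoint accepts. The proxying itself is not shown.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Endpoint as written in the message; how it parses `uri` is an assumption.
DESCRIBE_ENDPOINT = "https://mytriplesstore.org/describe"


def describe_url(requested_uri: str) -> str:
    """Map an incoming https: URI to a describe query for the stored http: URI."""
    parts = urlsplit(requested_uri)
    # Only the scheme changes; host, path, query and fragment are kept as-is.
    scheme = "http" if parts.scheme == "https" else parts.scheme
    stored_uri = urlunsplit(
        (scheme, parts.netloc, parts.path, parts.query, parts.fragment)
    )
    # Percent-encode the whole URI so it survives as a single query parameter.
    return f"{DESCRIBE_ENDPOINT}?uri={quote(stored_uri, safe='')}"


print(describe_url("https://myuri.org/something"))
# -> https://mytriplesstore.org/describe?uri=http%3A%2F%2Fmyuri.org%2Fsomething
```
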
>
> To be honest, I've not yet made sure I can do that with Apache's 
> rewriting module... But this way, a client should be able to submit a 
> GET for an https URI that would include the Referer header.
>
> Hope that helps.
>
> Franck.
>
> Le 18/07/2022 à 14:39, John Walker a écrit :
>> Greetings fellow linked data lovers,
>>
>> I've been working on a project where linked data is used in the enterprise setting over the past several years.
>> We've relatively recently chosen to change the approach to naming resources which has thrown up some interesting challenges.
>> For many years we operated what you might call a typical DTAP approach with separate environments for development/testing, acceptance and production.
>> Years ago we had opted to use different URIs per environment for the same resource, for which we used subdomains.
>> So for a product T-800 we might have <http://www.id.example.com/product/t-800> in production, <http://acceptance.id.example.com/product/t-800> in acceptance, and so forth.
>>
>> This allowed us to cleanly separate environments and have those URIs resolve to (a representation of) the data from the respective environment.
>> One downside was having to rewrite URIs when copying data between environments.
>>
>> So we recently changed the approach to use the same URI for a resource across all environments.
>> Part of the thinking here is that we would not use environment-specific URIs in an ontology or vocabulary, so why keep that practice for other identifiers?
>> This does make it easier to move data and queries between environments.
>>
>> However it does mean that to do linked data, where representations are retrievable (dereferenceable) using these names, then we have one service that needs to handle redirects for all environments.
>> So a client running in the acceptance environment should get the representation with data from the acceptance environment.
>> To enable this we intend to use a Referer-based deflector pattern to redirect requests based on the Referer from which the request came.
>>
>> This means the server handling the redirects needs to know in advance which Referer maps to which environment.
>> We think in the enterprise setting that this is manageable to maintain such a list, or handle via some pattern matching.
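
A minimal sketch of such a Referer-based deflector, assuming a simple lookup table from the Referer's hostname to an environment. Every hostname, the `DEFAULT_ENV` fallback, and the function name `redirect_target` are hypothetical and only illustrate the pattern; the real service may well use pattern matching instead.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical mapping from the referring application's host to an environment.
REFERER_HOST_TO_ENV = {
    "app.example.com": "production",
    "acceptance.app.example.com": "acceptance",
}

# Hypothetical per-environment services that serve the representations.
ENV_BASE = {
    "production": "https://data.example.com",
    "acceptance": "https://acceptance.data.example.com",
}

# Assumption: requests with a missing or unknown Referer go to production.
DEFAULT_ENV = "production"


def redirect_target(request_path: str, referer: Optional[str]) -> str:
    """Return the Location to redirect to for a request and its Referer header."""
    host = urlsplit(referer).hostname if referer else None
    env = REFERER_HOST_TO_ENV.get(host, DEFAULT_ENV)
    return ENV_BASE[env] + request_path


print(redirect_target("/product/t-800", "https://acceptance.app.example.com/"))
# -> https://acceptance.data.example.com/product/t-800
```

Note that a request arriving without a Referer header silently falls back to the default environment here, which is exactly the failure mode discussed next.
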
>>
>> The problem comes when we have applications that show these URIs as clickable hyperlinks in the user interface.
>> The applications are generally served over HTTPS, whilst we use http: scheme in the URIs.
>> So when clicking these links, the default strict-origin-when-cross-origin referrer policy means the client browser does not send the Referer header to less secure destinations.
>> As a result our Referer-based deflector does not work as intended (-_-)
>>
>> So far as I see we have a couple of options around this:
>>
>> 1. Change the URIs from http: to https: scheme, such that the default referrer policy allows the Referer header to be sent. This could be accomplished in a couple of ways:
>>      - Change the scheme in the data as it is stored in the RDF store to use https: in the URIs
>>      - Rewrite the scheme to https: in URIs as part of application logic (e.g. in SPARQL query, or in presentation layer)
>> 2. Change the referrer policy to something more permissive like 'origin-when-cross-origin' that will send the origin to a less secure destination (HTTPS-->HTTP). This could be accomplished in two ways:
>>      - The application serving the HTML page where link is shown should add the Referrer-Policy response header on the HTML resource
>>      - The application serving the HTML page where link is shown should set the referrer policy in the HTML content
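
To make the difference between the two policies concrete, here is a simplified model of which Referer value a browser would send. It is a sketch only: it covers just the two policies named above, treats "https" as the only trustworthy scheme (the actual specification is broader), and ignores credentials in URLs. The function names and the `app.example.com` host are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
SUPPORTED = ("strict-origin-when-cross-origin", "origin-when-cross-origin")


def origin_of(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    return (scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(scheme))


def referer_for(policy: str, page_url: str, target_url: str) -> Optional[str]:
    """Referer sent when following a link from page_url to target_url."""
    if policy not in SUPPORTED:
        raise ValueError(f"policy not modelled in this sketch: {policy}")
    page = urlsplit(page_url)
    if origin_of(page_url) == origin_of(target_url):
        # Same origin: the full URL is sent, minus the fragment.
        return urlunsplit(
            (page.scheme, page.netloc, page.path or "/", page.query, "")
        )
    # Simplification: a downgrade is an https page linking to a non-https URL.
    downgrade = page.scheme == "https" and urlsplit(target_url).scheme != "https"
    if policy == "strict-origin-when-cross-origin" and downgrade:
        return None  # no Referer header at all
    return f"{page.scheme}://{page.netloc}/"  # origin only


page = "https://app.example.com/products?id=1"
link = "http://www.id.example.com/product/t-800"
print(referer_for("strict-origin-when-cross-origin", page, link))  # -> None
print(referer_for("origin-when-cross-origin", page, link))
# -> https://app.example.com/
```

Under this model, option 1 (https: URIs) restores the origin-only Referer with the default policy, while option 2 restores it by relaxing the policy on each page that shows the links.
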
>>
>> I am loth to make this a concern for all applications that work with these names/URIs as it seems a big ask to the developers to all comprehend these concerns and implement accordingly.
>> If things are not implemented correctly, then clients might leak sensitive information over HTTP (where even the name/URI might be considered sensitive information).
>>
>> That said, the pedantic web in me thinks that the http: scheme would make for a cooler URI in the long term from a persistence angle.
>>
>> What are the thoughts around these options?
>> Do any strike you as being (non-)preferred, or is there an existing best practice [ld-bp] to follow regarding the use of the http: or https: scheme in URIs?
>> Is there some other approach we've not considered, like Content-Security-Policy, that would indicate to clients that they should upgrade to https: for these navigational requests?
>>
>> [ld-bp] https://www.w3.org/TR/ld-bp/
>>
>> Regards,
>>
>> John Walker
>> Principal Consultant & co-founder
>>
>> Semaku B.V. | Torenallee 20 (SFJ 3.065) | 5617 BC Eindhoven | T +31 6 42590072 | https://semaku.com/
>> KvK: 58031405 | BTW: NL852842156B01 | IBAN: NL94 INGB 0008 3219 95
>>
>>
>
-- 

Christopher Gutteridge <totl@soton.ac.uk>
You should read our team blog at http://blog.soton.ac.uk/webteam/

Industrial Action

Sadly my trade union is currently in dispute over pay, pensions and 
casualisation. You can read more at 
https://www.ucu.org.uk/article/11896/Why-were-taking-action

The Southampton branch is currently working on "Action Short Of a 
Strike" (ASOS). This means only doing work we are contracted to do, so 
no working on any additional voluntary tasks. It's frustrating, but so 
are below-inflation pay rises.

As a result, so far I've had to turn down or stop working on:

  * Coordinating the iSolutions Communities of Practice program
  * Coordinating the System Documentation Community of Practice
  * Helping with a workshop on data visualisation
  * Providing a Minecraft activity for the Archaeology family day
  * Helping another team recruit someone for a post
  * Helping a colleague debug something in a service I'm an expert on
    but is no longer my responsibility
  * Offering to "keep an eye" on changes impacting our systems while
    I'm on holiday

I look forward to getting back into these kinds of activity as soon as 
the industrial action permits.

Please do not cover for people taking ASOS. If it causes problems, it is 
helpful to make management aware. The most unhelpful thing is for people 
to mitigate the impacts of industrial action or hide it from management. 
The best thing to help is to join the union and the action and/or donate 
to the strike fund.
Received on Wednesday, 3 August 2022 10:24:10 UTC