Re: Best practices for choosing http: or https: scheme for URIs

Hi John,

Thanks for raising this issue; I think we will all eventually have 
to deal with this sort of thing.

I have a related use case on a domain where all applications are hidden 
behind a web application firewall (WAF). The WAF does not allow HTTP 
requests whatsoever: it always replies with a 301 redirect to the 
https version of the URL. On the follow-up request, the client may send the 
Referer header, since this now complies with the default referrer policy.
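
For illustration only, the WAF rule described above behaves roughly like the following Python model. The function name is hypothetical, and this says nothing about how the WAF is actually implemented:

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def upgrade_redirect(url: str) -> Optional[Tuple[int, str]]:
    """Model of the WAF rule: answer any http: request with a 301 to https:."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Same host, path and query; only the scheme is upgraded.
        return 301, parts._replace(scheme="https").geturl()
    # https: requests are passed through to the application.
    return None
```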

So whenever I receive a request to dereference a URI, that URI has the https scheme.

To dereference it, I need to submit a request to my triple store, 
something like https://mytriplesstore.org/describe?uri=<the_uri>.
So the option I'm thinking of is to do internal URI rewriting and 
proxying, switching https to http. If I receive a GET request for:
*https*://myuri.org/something
I'll rewrite it into:
https://mytriplesstore.org/describe?uri=*http*://myuri.org/something
and transparently proxy the result back to the client.
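
A minimal Python sketch of the rewrite step, assuming the describe endpoint takes the target URI in a `uri` query parameter as in the example above. The function name is hypothetical, and the proxying itself is not shown. Unlike the example, it percent-encodes the URI so that it survives as a single query parameter:

```python
from urllib.parse import urlencode, urlsplit

# Describe endpoint from the example above (hypothetical host).
DESCRIBE_ENDPOINT = "https://mytriplesstore.org/describe"


def to_describe_url(requested_uri: str) -> str:
    """Map an incoming https: URI to a describe query for its http: twin."""
    parts = urlsplit(requested_uri)
    if parts.scheme != "https":
        raise ValueError(f"expected an https: URI, got {requested_uri!r}")
    # Keep host, path and query; only the scheme changes back to http.
    http_uri = parts._replace(scheme="http").geturl()
    # urlencode percent-encodes ':' and '/' inside the parameter value.
    return DESCRIBE_ENDPOINT + "?" + urlencode({"uri": http_uri})
```

The triple store then sees the http: URI that is actually stored, while the client only ever deals with the https: one.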

To be honest, I have not yet checked that I can do this with the Apache 
rewrite module (mod_rewrite)... But this way, a client should be able to 
submit a GET for an https URI that includes the Referer header.

Hope that helps.

Franck.

On 18/07/2022 at 14:39, John Walker wrote:
> Greetings fellow linked data lovers,
>
> Over the past several years I've been working on a project where linked data is used in an enterprise setting.
> We've relatively recently chosen to change the approach to naming resources which has thrown up some interesting challenges.
> For many years we operated what you might call a typical DTAP approach with separate environments for development/testing, acceptance and production.
> Years ago we opted to use different URIs for the same resource in each environment, distinguished by subdomain.
> So for a product T-800 we might have <http://www.id.example.com/product/t-800> in production, <http://acceptance.id.example.com/product/t-800> in acceptance, and so forth.
>
> This allowed us to cleanly separate environments and have those URIs resolve to (a representation of) the data from the respective environment.
> One downside was having to rewrite URIs when copying data between environments.
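
To illustrate that downside, here is a sketch of the rewrite needed for each URI when copying data from one environment to another. It assumes a simple prefix mapping per environment. The mapping and the function name are hypothetical, and in practice this would have to be applied to every affected IRI in the copied data:

```python
# Hypothetical per-environment URI bases, following the T-800 example.
ENV_BASES = {
    "production": "http://www.id.example.com/",
    "acceptance": "http://acceptance.id.example.com/",
}


def rewrite_env_uri(uri: str, source: str, target: str) -> str:
    """Rewrite one URI from the source environment's base to the target's."""
    src, dst = ENV_BASES[source], ENV_BASES[target]
    if uri.startswith(src):
        return dst + uri[len(src):]
    # URIs outside the source namespace (e.g. vocabulary terms) are left alone.
    return uri
```
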
>
> So we recently changed the approach to use the same URI for a resource across all environments.
> Part of the thinking here is that we would not use environment-specific URIs in an ontology or vocabulary, so why keep that practice for other identifiers?
> This does make it easier to move data and queries between environments.
>
> However, it does mean that to do linked data, where representations are retrievable (dereferenceable) using these names, we have one service that needs to handle redirects for all environments.
> So a client running in the acceptance environment should get the representation with data from the acceptance environment.
> To enable this, we intend to use a Referer-based deflector pattern, redirecting each request according to its Referer header, i.e. the page from which the request came.
>
> This means the server handling the redirects needs to know in advance which Referer maps to which environment.
> We think that in an enterprise setting it is manageable to maintain such a list, or to handle it via some pattern matching.
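
Here is a minimal sketch of such a deflector. It assumes pattern matching on the Referer host, a fallback to production when no Referer is sent, and a 303 See Other redirect to a per-environment describe service. All host names, patterns and function names below are hypothetical:

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlencode

# Hypothetical Referer patterns (application hosts) per environment.
REFERER_RULES = [
    (re.compile(r"^https?://[^/]*\.acc\.example\.com(?::\d+)?(/|$)"), "acceptance"),
    (re.compile(r"^https?://[^/]*\.dev\.example\.com(?::\d+)?(/|$)"), "development"),
]

# Hypothetical per-environment services that serve representations.
DATA_SERVICES = {
    "production": "https://data.example.com/describe",
    "acceptance": "https://data.acc.example.com/describe",
    "development": "https://data.dev.example.com/describe",
}


def environment_for(referer: Optional[str]) -> str:
    """Pick an environment from the Referer; fall back to production."""
    if referer:
        for pattern, env in REFERER_RULES:
            if pattern.match(referer):
                return env
    return "production"


def deflect(resource_uri: str, referer: Optional[str]) -> Tuple[int, str]:
    """Return (status, Location) redirecting to the chosen environment."""
    env = environment_for(referer)
    return 303, DATA_SERVICES[env] + "?" + urlencode({"uri": resource_uri})
```

With this design, a missing Referer silently yields production data. That is exactly what happens in the failure case described below.
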
>
> The problem comes when we have applications that show these URIs as clickable hyperlinks in the user interface.
> The applications are generally served over HTTPS, whilst we use http: scheme in the URIs.
> So when clicking these links, the default strict-origin-when-cross-origin referrer policy means the client browser does not send the Referer header to less secure destinations.
> As a result our Referer-based deflector does not work as intended (-_-)
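
The browser's decision here can be modelled in a few lines. The sketch below is a simplified reading of the W3C Referrer Policy rules for the two policies discussed in this thread. It treats only https: as a secure scheme and ignores details such as the referrer length limit, so it is not a full implementation:

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> Tuple[str, str, int]:
    """(scheme, host, port) tuple, filling in the default port."""
    p = urlsplit(url)
    return p.scheme, p.hostname or "", p.port or DEFAULT_PORTS.get(p.scheme, 0)


def _origin_only(url: str) -> str:
    """Serialise just the origin, as sent for cross-origin requests."""
    scheme, host, port = _origin(url)
    suffix = "" if port == DEFAULT_PORTS.get(scheme) else f":{port}"
    return f"{scheme}://{host}{suffix}/"


def _stripped(url: str) -> str:
    """Full referrer URL minus credentials and fragment."""
    p = urlsplit(url)
    return urlunsplit((p.scheme, p.netloc.rsplit("@", 1)[-1], p.path, p.query, ""))


def referer_for(policy: str, referrer: str, target: str) -> Optional[str]:
    """Referer value a browser would send for a navigation, or None."""
    same_origin = _origin(referrer) == _origin(target)
    # Simplification: only https: counts as "potentially trustworthy".
    downgrade = urlsplit(referrer).scheme == "https" and urlsplit(target).scheme != "https"
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return _stripped(referrer)
        if downgrade:
            return None  # HTTPS page -> http: link: no Referer at all
        return _origin_only(referrer)
    if policy == "origin-when-cross-origin":
        return _stripped(referrer) if same_origin else _origin_only(referrer)
    raise ValueError(f"policy not modelled here: {policy}")
```

With the current http: URIs, the downgrade case applies, which is why the deflector sees no Referer. Both options below (https: links, or a more permissive policy) would instead yield the page's origin.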
>
> As far as I can see, we have a couple of options:
>
> 1. Change the URIs from the http: to the https: scheme, so that the default referrer policy allows the Referer header to be sent. This could be accomplished in a couple of ways:
>      - Change the scheme in the data as it is stored in the RDF store to use https: in the URIs
>      - Rewrite the scheme to https: in URIs as part of application logic (e.g. in the SPARQL query, or in the presentation layer)
> 2. Change the referrer policy to something more permissive, like 'origin-when-cross-origin', which sends the origin (scheme, host and port) to less secure destinations (HTTPS-->HTTP). This could be accomplished in two ways:
>      - The application serving the HTML page where the link is shown should add the Referrer-Policy response header to the HTML resource
>      - The application serving the HTML page where the link is shown should set the referrer policy in the HTML content
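
If option 2 were chosen, one way to avoid making every developer responsible for it might be to set the header once in shared middleware. The sketch below assumes a Python WSGI application, and the wrapper name is hypothetical. The HTML-content variant would instead be a `<meta name="referrer" content="origin-when-cross-origin">` element in the page head:

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def with_referrer_policy(app: Callable, policy: str = "origin-when-cross-origin") -> Callable:
    """Wrap a WSGI app so that every response carries a Referrer-Policy header."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start(status: str, headers: Headers, exc_info=None):
            # Drop any policy the wrapped app set itself, then add ours.
            headers = [(k, v) for k, v in headers if k.lower() != "referrer-policy"]
            headers.append(("Referrer-Policy", policy))
            return start_response(status, headers, exc_info)

        return app(environ, start)

    return wrapped
```

With this policy, only the origin (not the full page URL) is sent to cross-origin destinations, including http: ones.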
>
> I am loth to make this a concern for every application that works with these names/URIs, as it seems a big ask for all the developers to comprehend these concerns and implement accordingly.
> If things are not implemented correctly, then clients might leak sensitive information over HTTP (where even the name/URI might be considered sensitive information).
>
> That said, the pedantic web in me thinks that the http: scheme makes for cooler URIs in the long term, from a persistence angle.
>
> What are people's thoughts on these options?
> Do any strike you as being (non-)preferred, or is there an existing best practice [ld-bp] to follow regarding the use of the http: or https: scheme in URIs?
> Is there some other approach we've not considered, like Content-Security-Policy, that would indicate to clients that they should upgrade to https: for these navigational requests?
>
> [ld-bp] https://www.w3.org/TR/ld-bp/
>
> Regards,
>
> John Walker
> Principal Consultant & co-founder
>
> Semaku B.V. | Torenallee 20 (SFJ 3.065) | 5617 BC Eindhoven | T +31 6 42590072 | https://semaku.com/
> KvK: 58031405 | BTW: NL852842156B01 | IBAN: NL94 INGB 0008 3219 95
>
>

Received on Wednesday, 3 August 2022 09:07:33 UTC