- From: Franck Michel <franck.michel@inria.fr>
- Date: Wed, 3 Aug 2022 11:07:17 +0200
- To: John Walker <john.walker@semaku.com>, "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <bb0a90b8-af46-bab5-091f-8cf6c2e83381@inria.fr>
Hi John,

Thanks for raising this issue, because I think we will all eventually have to deal with this sort of thing.

I have a related use case on a domain where all applications are hidden behind a web application firewall (WAF). The WAF does not allow http requests whatsoever. It always replies with a 301 redirect to the https version of the URL. The next time, the client may send the Referer header, since this now complies with the default referrer policy.

So, whenever I get a URI dereferencing query, the URI has the https scheme. To dereference it, I need to submit a request to my triple store, something like https://mytriplesstore.org/describe?uri=<the_uri>.

So the option I'm thinking of is to do internal URI rewriting and proxying while switching https to http. If I receive a GET request for:

  *https*://myuri.org/something

I'll rewrite it into:

  https://mytriplesstore.org/describe?uri=*http*://myuri.org/something

and transparently proxy the result back to the client.

To be honest, I have not yet made sure I can do that with the Apache rewriting module... But this way, a client should be able to submit a GET for an https URI that would include the Referer header.

Hope that helps.
Franck.

On 18/07/2022 at 14:39, John Walker wrote:
> Greetings fellow linked data lovers,
>
> I've been working on a project where linked data is used in the enterprise setting over the past several years.
> We've relatively recently chosen to change the approach to naming resources which has thrown up some interesting challenges.
> For many years we operated what you might call a typical DTAP approach with separate environments for development/testing, acceptance and production.
> Years ago we had opted to use different URIs per environment for the same resource, for which we used subdomains.
> So for a product T-800 we might have <http://www.id.example.com/product/t-800> in production, <http://acceptance.id.example.com/product/t-800> in acceptance, and so forth.
>
> This allowed us to cleanly separate environments and have those URIs resolve to the (representation of) the data from the respective environment.
> One downside was having to rewrite URIs when copying data between environments.
>
> So we recently changed the approach to use the same URI for a resource across all environments.
> Part of the thinking here is that we would not use environment-specific URIs in an ontology or vocabulary, so why keep that practice for other identifiers?
> This does make it easier to move data and queries between environments.
>
> However it does mean that to do linked data, where representations are retrievable (dereferenceable) using these names, we have one service that needs to handle redirects for all environments.
> So a client running in the acceptance environment should get the representation with data from the acceptance environment.
> To enable this we intend to use a Referer-based deflector pattern to redirect requests based on the Referer from which the request came.
>
> This means the server handling the redirects needs to know in advance which Referer maps to which environment.
> We think that in the enterprise setting it is manageable to maintain such a list, or to handle it via some pattern matching.
>
> The problem comes when we have applications that show these URIs as clickable hyperlinks in the user interface.
> The applications are generally served over HTTPS, whilst we use http: scheme in the URIs.
> So when clicking these links, the default strict-origin-when-cross-origin referrer policy means the client browser does not send the Referer header to less secure destinations.
> As a result our Referer-based deflector does not work as intended (-_-)
>
> So far as I see we have a couple of options around this:
>
> 1. Change the URIs from http: to https: scheme, such that the default referrer policy allows the Referer header to be sent.
>    This could be accomplished in a couple of ways:
>    - Change the scheme in the data as it is stored in the RDF store to use https: in the URIs
>    - Rewrite the scheme to https: in URIs as part of application logic (e.g. in SPARQL query, or in presentation layer)
> 2. Change the referrer policy to something more permissive like 'origin-when-cross-origin' that will send the origin hostname to less secure destinations (HTTPS-->HTTP).
>    This could be accomplished in two ways:
>    - The application serving the HTML page where the link is shown should add the Referrer-Policy response header on the HTML resource
>    - The application serving the HTML page where the link is shown should set the referrer policy in the HTML content
>
> I am loth to make this a concern for all applications that work with these names/URIs, as it seems a big ask for the developers to all comprehend these concerns and implement accordingly.
> If things are not implemented correctly, then clients might leak sensitive information over HTTP (where even the name/URI might be considered sensitive information).
>
> Whilst the pedantic web in me thinks that http: scheme would be a cooler URI for the long term from persistence angle.
>
> What are the thoughts around these options?
> Do any strike you as being (non-)preferred, or is there an existing best practice [ld-bp] to follow in regards to use of http: or https: scheme in URIs?
> Is there some other approach we've not considered, like Content-Security-Policy, that would indicate to clients that they should upgrade to https: for these navigational requests?
>
> [ld-bp] https://www.w3.org/TR/ld-bp/
>
> Regards,
>
> John Walker
> Principal Consultant & co-founder
>
> Semaku B.V. | Torenallee 20 (SFJ 3.065) | 5617 BC Eindhoven | T +31 6 42590072 | https://semaku.com/
> KvK: 58031405 | BTW: NL852842156B01 | IBAN: NL94 INGB 0008 3219 95
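
For illustration only, the two mechanisms discussed in this thread could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not what either author runs: `to_describe_url` mirrors the https-to-http rewrite described in the reply (percent-encoding the `uri` parameter is an added assumption, since the message shows it unencoded), and `deflect` mirrors the Referer-based deflector from the quoted message. The hostnames in `ENVIRONMENTS` and `DEFAULT_BASE` are hypothetical placeholders.

```python
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit

# Describe endpoint as written in the reply above.
DESCRIBE_ENDPOINT = "https://mytriplesstore.org/describe"

# Hypothetical mapping from Referer host to the environment that should serve the data.
ENVIRONMENTS = {
    "app.acceptance.example.com": "https://acceptance.data.example.com",
    "app.test.example.com": "https://test.data.example.com",
}
# Hypothetical fallback (production) used when the Referer is missing or unknown.
DEFAULT_BASE = "https://data.example.com"


def to_describe_url(requested_uri: str) -> str:
    """Build the triple-store URL for the http form of an https request URI."""
    parts = urlsplit(requested_uri)
    if parts.scheme == "https":
        # The resource is named with the http scheme in the data.
        parts = parts._replace(scheme="http")
    http_uri = urlunsplit(parts)
    # Assumption: percent-encode the URI so it is safe as a query parameter value.
    return f"{DESCRIBE_ENDPOINT}?uri={quote(http_uri, safe='')}"


def deflect(path: str, referer: Optional[str]) -> str:
    """Pick the redirect target for a request path based on the Referer host."""
    host = urlsplit(referer).hostname if referer else None
    base = ENVIRONMENTS.get(host, DEFAULT_BASE)
    return base + path


if __name__ == "__main__":
    print(to_describe_url("https://myuri.org/something"))
    print(deflect("/product/t-800", "https://app.acceptance.example.com/page"))
    # Without a Referer (e.g. an HTTPS page linking to an http: URI under the
    # default policy), the deflector can only fall back to the default.
    print(deflect("/product/t-800", None))
```

The last call shows the failure mode raised in the quoted message: when the browser withholds the Referer header, the deflector has nothing to match on and falls back to its default environment.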
Received on Wednesday, 3 August 2022 09:07:33 UTC