Re: test URL redirections...

From: Ivan Herman <ivan@w3.org>
Date: Sat, 13 Jun 2015 20:13:27 +0200
Cc: Gregg Kellogg <gregg@greggkellogg.net>, Jeni Tennison <jeni@jenitennison.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <0F313C3C-33ED-4BC4-A03D-D0D3233130F2@w3.org>
To: Gregg Kellogg <gregg@greggkellogg.com>
Ah, ok, I understand.

Actually, I would expect a tool to be prepared to handle local files, too, whether using file:/// URLs or direct file access. But that is a detail.
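
A minimal sketch of what "prepared for local files" could look like, assuming Python and only its standard library. The name read_resource is hypothetical and not taken from any existing CSVW tool; it accepts an http(s) URL, a file:/// URL, or a plain file path.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname, urlopen


def read_resource(ref: str) -> bytes:
    """Read bytes from an http(s) URL, a file: URL, or a plain file path.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(ref)
    if parsed.scheme in ("http", "https"):
        # urlopen follows redirects (including 303) by default.
        with urlopen(ref) as response:
            return response.read()
    if parsed.scheme == "file":
        # Convert the URL path (possibly percent-encoded) to a local path.
        return Path(url2pathname(parsed.path)).read_bytes()
    # Anything else is treated as direct file access.
    return Path(ref).read_bytes()
```

With this, read_resource("/tmp/tests/test011/tree-ops.csv") and read_resource("file:///tmp/tests/test011/tree-ops.csv") would return the same bytes, assuming that file exists.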

Thanks.

Ivan

---
Ivan Herman
Tel:+31 641044153
http://www.ivan-herman.net

(Written on mobile, sorry for brevity and misspellings...)



> On 13 Jun 2015, at 15:23, Gregg Kellogg <gregg@greggkellogg.com> wrote:
> 
>> On Jun 13, 2015, at 12:20 AM, Ivan Herman <ivan@w3.org> wrote:
>> 
>> 
>>> On 12 Jun 2015, at 19:29 , Gregg Kellogg <gregg@greggkellogg.net> wrote:
>>> 
>>> One caveat to using a local copy concerns missing files. The metadata resolution calls for looking at a sequence of paths, and relies upon getting back a 404 Not Found. We would need to add something to the test entry to tell a client to act as if the requested file is not found; right now, my tests actually try to retrieve the file. It’s not for that many tests, but it will cause requests to be sent to the test-suite endpoint.
>> 
>> Hm. I am not really sure I understand. If a request goes first to the w3c site, but that request is then sent to github, doesn't the latter send back a 404 to the original client? I.e., if the 303 response is handled properly, why wouldn't that work?
> 
> It works just fine, but our advice is for developers to download files and run locally. My point is that even in this case, you can't avoid a remote request. At least the way I've implemented it, if I have a request with a URI prefix that is in a cache directory, I first look for the file there, and use that, if it exists. Otherwise, I request the original URL, and in this case get back a 404.
> 
> I suppose simply returning the 404 if the file doesn't exist locally would work, but I've typically used that to handle other corner-cases before.
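
The cache lookup described above could be sketched roughly as follows, assuming Python. This is not Gregg's actual implementation: the names fetch, TEST_PREFIX, and the remote callable are hypothetical, and the offline flag stands in for the "simply return the 404" variant.

```python
from pathlib import Path
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status, body)

# Assumed URL prefix that is mirrored in the local cache directory.
TEST_PREFIX = "http://www.w3.org/2013/csvw/tests/"


def fetch(url: str, cache_dir: Path,
          remote: Callable[[str], Response],
          offline: bool = False) -> Response:
    """Serve a cached file if present, otherwise fall back.

    remote is whatever function performs the real HTTP request
    (an assumption here, injected so the sketch stays testable).
    """
    if url.startswith(TEST_PREFIX):
        local = cache_dir / url[len(TEST_PREFIX):]
        if local.is_file():
            return 200, local.read_bytes()
        if offline:
            # Act as if the server answered 404, with no network access.
            return 404, b""
    # File not cached (or URL outside the prefix): ask the original URL.
    return remote(url)
```

With offline=False a missing file still triggers one remote request, which is the behaviour described above; with offline=True the metadata lookup sees a 404 without touching the test-suite endpoint.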
> 
> Gregg
> 
>> Ivan
>> 
>>> 
>>> Gregg Kellogg
>>> gregg@greggkellogg.net
>>> 
>>>> On Jun 12, 2015, at 7:58 AM, Gregg Kellogg <gregg@greggkellogg.com> wrote:
>>>> 
>>>> On Jun 12, 2015, at 2:21 AM, Ivan Herman <ivan@w3.org> wrote:
>>>>> 
>>>>> 
>>>>>>> On 11 Jun 2015, at 21:17 , Gregg Kellogg <gregg@greggkellogg.net> wrote:
>>>>>>> 
>>>>>>> On Jun 11, 2015, at 4:13 AM, Ivan Herman <ivan@w3.org> wrote:
>>>>>>> 
>>>>>>> Gregg,
>>>>>>> 
>>>>>>> I must admit I am in a territory that I do not really know…
>>>>>>> 
>>>>>>> I did set up a redirection, through .htaccess, in http://www.w3.org/2013/csvw/ using:
>>>>>> 
>>>>>> Should this be 2014 or 2015? 2013 was based on the RDF WG active time, AFAIK.
>>>>> 
>>>>> /2013/csvw is the official homepage of the group (2013 being the year when it started).
>>>> 
>>>> Okay, seems like a long time ago now!
>>>> 
>>>>>>> RewriteRule ^tests/(.*) http://w3c.github.io/csvw/tests/$1 [R=303]
>>>>>>> 
>>>>>>> However, I am not sure it will really work for the test suites. If, in a browser, I type in, say,
>>>>>>> 
>>>>>>> http://www.w3.org/2013/csvw/tests/test011/result.json
>>>>>>> 
>>>>>>> then I indeed get to the relevant json file whose URI is http://w3c.github.io/csvw/tests/test011/tree-ops.csv. The redirection is made through a 303 flag, though, which means that the browser address bar will show the w3c.github.io address, not the www.w3.org one. I do not know whether that matters.
>>>>>> 
>>>>>> It works for my test runner (mostly), and I think this is a reasonable way to go about it.
>>>>> 
>>>>> Great.
>>>> 
>>>> RDF and Validation tests all work, JSON will need to be updated with that location.
>>>> 
>>>>>> In the case of the JSON tests, because they contain absolute URLs, they will need to be updated with this location. (RDF tests can make use of the result location and use relative URLs internally).
>>>>>> 
>>>>>> Unfortunately, without doing a real proxy, we can’t also set HTTP response headers beyond the redirect. A caching proxy might also better allow HTTP caching of the results, so as to not burden the W3C infrastructure, and allow clients to reasonably perform client-side caching; it might be worth investigating with the systems team if something like this is possible.
>>>>> 
>>>>> I am almost sure that the security aspects will dominate: a proxy being, essentially, github.com would not be acceptable. And, I must admit, I understand...
>>>>> 
>>>>>> Another possibility would be to use a post-receive hook to pull the data from GitHub on commit; we do this with rdfa.info and json-ld.org to automatically update site contents on commit. This would avoid any redirect issues. It’s implementable using several different mechanisms available in PHP, Ruby and most other infrastructures. It works by setting up a URL to receive an HTTP POST when commits are made, which causes it to do a git pull to refresh a local directory. That might be the simplest thing, if it is possible. See https://github.com/json-ld/json-ld.org/tree/master/utils for how json-ld.org does it using PHP.
>>>>> 
>>>>> I will ask, but the issue is that all this is related to CVS, too (I presume in contrast to the json-ld.org site): the W3C Web site is, for better or worse, one giant CVS repository…
>>>>> 
>>>>> I think we should not rely on this for now.
>>>>> 
>>>>> Does the test suite work, for the most part, offline? I mean if I download the test suite then I should be able to test most of the features, right? If so, we should add a notice asking people to download things from github to avoid network bottlenecks. We can then install the test suite manually on W3C when we issue a Proposed Recommendation.
>>>> 
>>>> With enough client setup, yes it does. I intercept all requests to load a URL and replace them with my local path for test-suite, vocabulary and .well-known files, faking HTTP headers. Of course, this also makes things run much faster. Anyone depending on this for development would likely want to do something similar.
>>>> 
>>>>>>> More importantly, what it requires is for the csvw clients to use a URI library that automatically handles redirection. Is that always the case? I do not know. (I tried to use the [P] flag, but that (understandably) does not work because it would instruct apache to use an external server as proxy which W3C does not allow for security reasons.)
>>>>>> 
>>>>>> Or it requires that developers do this on their own; I typically handle redirects myself to make sure that redirect semantics are handled properly, as most URL libraries don’t honor the details of this very well, IMO.
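
For clients whose URL library does not follow the 303 on its own, handling the redirect by hand could look roughly like the sketch below, assuming Python's standard library. The names send_once and get_following_redirects are hypothetical, and the hop limit of 5 is an arbitrary choice.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx
        # response instead of following it automatically.
        return None


def send_once(url):
    """One GET, no redirect following: (status, lower-cased headers, body)."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url) as resp:
            headers = {k.lower(): v for k, v in resp.headers.items()}
            return resp.status, headers, resp.read()
    except urllib.error.HTTPError as err:
        headers = {k.lower(): v for k, v in err.headers.items()}
        return err.code, headers, err.read()


def get_following_redirects(url, send=send_once, max_hops=5):
    """Follow redirects manually; returns (final_url, status, body)."""
    for _ in range(max_hops + 1):
        status, headers, body = send(url)
        location = headers.get("location")
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return url, status, body
    raise RuntimeError("too many redirects")
```

The returned final_url shows where the content was actually served from, so a test runner can still resolve relative references against the original www.w3.org URL if it prefers to.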
>>>>>> 
>>>>>>> Strangely enough: I tried to do a wget on the w3c address, and I got a 404. I then realized that doing a wget on the github.io address also leads to a 404; I am not sure how github handles these requests.
>>>>>> 
>>>>>> curl -L http://www.w3.org/2013/csvw/tests/test011/result.json works okay.
>>>>> 
>>>>> O.k. I will not try to understand the difference between curl and wget:-)
>>>> 
>>>> Okay, I'll update the test-suite and test vocabulary documentation with this location.
>>>> 
>>>> Gregg
>>>> 
>>>>> Cheers
>>>>> 
>>>>> Ivan
>>>>> 
>>>>>> 
>>>>>>> So… do you think it is possible to use the test cases with such caveats? Of course it would be nice but I am a bit afraid some clients may have issues handling the 303…
>>>>>>> 
>>>>>>> Any good ideas?
>>>>>> 
>>>>>> I trust developers to work it out properly. Once we settle on a permanent location, we can make that change. I do want to track down my specific test failures, though.
>>>>>> 
>>>>>> Gregg
>>>>>> 
>>>>>>> Ivan
>>>>>>> 
>>>>>>> P.S. I am still waiting for our system people to set up /.well-known for me.
>>>>>>> 
>>>>>>> ----
>>>>>>> Ivan Herman, W3C
>>>>>>> Digital Publishing Activity Lead
>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>> mobile: +31-641044153
>>>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Saturday, 13 June 2015 18:13:39 UTC
