Re: Some thoughts on a new publication approach

On 29/10/2013 16:39, Simon Sapin wrote:
> On 29/10/2013 15:22, Robin Berjon wrote:
>> (There are also resource issues to consider, a spider going through all
>> the history of a long and complex draft would likely use up
>> non-negligible resources.)
>
> I don’t think a spider is needed.

It's not; I meant what happens if someone starts spidering the history.

> It could be server side-software that
> serves files directly from the repository based on a commit hash in the
> URL, which AFAIK is not very resource-intensive.

My knowledge of git internals is a bit rusty, but at the very least I 
believe that you need to:

- grab and parse the commit object
- grab (parse, etc.) the root tree that it points to
- depending on the resource you're serving, possibly walk several 
subtrees in sequence until you reach the file's SHA
- grab the blob that SHA points to and return its contents
- do the same again for every subresource loaded by the page

If your implementation language has a good, low-level git library, it's 
probably not the end of the world. But if you have to shell out at every 
step, you're going to have a bad time.
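
To give a rough idea, here's the kind of walk I mean in Python, using 
dulwich (a pure-Python git library) as one example of the library route; 
the repository path, SHA and file path below are made up:

    from dulwich.repo import Repo

    def blob_at_commit(repo_path, commit_sha, file_path):
        """Walk commit -> root tree -> subtrees -> blob, as in the steps above."""
        repo = Repo(repo_path)
        commit = repo[commit_sha.encode('ascii')]  # grab and parse the commit object
        obj = repo[commit.tree]                    # the root tree it points to
        for part in file_path.split('/'):          # descend one path component at a time
            mode, sha = obj[part.encode('utf-8')]  # tree entry gives (mode, object SHA)
            obj = repo[sha]                        # next subtree, or finally the blob
        return obj.data                            # raw bytes to hand to the HTTP response

    # e.g. blob_at_commit('/srv/specs/html.git', 'abc123...', 'Overview.html')

Each request is then a handful of object lookups in-process rather than 
a subprocess per step, which is the difference I'm worried about.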

Another option is that whenever a given SHA is first requested, you 
create a clone with its working tree checked out at that commit. That 
would be a lot less processing-intensive once the first request has been 
handled, but it could use up a lot of disk space.
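
Something along these lines (again Python, shelling out this time; the 
cache directory and repository path are placeholders), where the first 
request for a SHA pays for the clone and checkout, and everything after 
that is plain static file serving:

    import os
    import subprocess

    def snapshot_dir(source_repo, cache_dir, commit_sha):
        """Return a directory holding a checkout of commit_sha, creating it on first use."""
        target = os.path.join(cache_dir, commit_sha)
        if not os.path.isdir(target):
            # first request for this SHA: clone, sharing objects with the
            # source repository, then check out the commit (detached HEAD)
            subprocess.check_call(['git', 'clone', '--shared', source_repo, target])
            subprocess.check_call(['git', 'checkout', commit_sha], cwd=target)
        return target  # point the static file server at this directory

One full working tree per SHA ever requested is where the disk space 
would go.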

In any case, I'm not saying it's impossible, just that I want to be 
cautious about this. It's not required for v0; we'll look at it once we 
have something usable.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
