
Re: Thinking about cross references and ReSpec

From: Peter Linss <peter.linss@hp.com>
Date: Thu, 2 Oct 2014 10:08:46 -0700
Cc: Robin Berjon <robin@w3.org>, Shane McCarron <shane@aptest.com>, "spec-prod@w3.org Prod" <spec-prod@w3.org>
Message-Id: <7E5541B0-2E2D-4363-825B-491A4BCD024A@hp.com>
To: Tobie Langel <tobie.langel@gmail.com>

On Oct 2, 2014, at 4:41 AM, Tobie Langel <tobie.langel@gmail.com> wrote:

> On Thu, Oct 2, 2014 at 12:10 PM, Robin Berjon <robin@w3.org> wrote:
> On 02/10/2014 10:10 , Tobie Langel wrote:
> My plan for this solution is to do daily crawling of relevant specs and
> extract the dfn and put them in a DB. Further refinements could include
> a search API, like I added for Specref and exposed within Respec.
> Could you somehow reuse or modify what Shepherd does here? If it includes enough information (or additional extraction can be easily added) and new specs can be added to its crawling (which I suspect ought to be relatively easy; I recall Peter's code being able to process quite a lot of different documents)

Yes, adding specs to its crawl is trivial.

> then we can all align, which I reckon is a win (even without counting the saved cycles).
> I've bumped into way too many painful issues with non browser-based HTML parsers to waste more time with them.

FWIW, Shepherd uses html5lib and AFAICT sees a browser-equivalent DOM, which it traverses. This hasn't been an issue to date.
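The kind of DOM traversal described above can be sketched with the standard library's html.parser as a stand-in (Shepherd itself uses html5lib); the class name, the extracted fields, and the sample markup below are illustrative assumptions, not Shepherd's actual code:

```python
from html.parser import HTMLParser

class DfnCollector(HTMLParser):
    """Collect <dfn> ids and their text, roughly what a spec crawler extracts."""
    def __init__(self):
        super().__init__()
        self._in_dfn = False
        self._current_id = None
        self._text = []
        self.dfns = {}  # maps dfn id -> definition text

    def handle_starttag(self, tag, attrs):
        if tag == "dfn":
            self._in_dfn = True
            self._current_id = dict(attrs).get("id")
            self._text = []

    def handle_data(self, data):
        if self._in_dfn:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "dfn" and self._in_dfn:
            if self._current_id:
                self.dfns[self._current_id] = "".join(self._text).strip()
            self._in_dfn = False

collector = DfnCollector()
collector.feed('<p>A <dfn id="widget">widget</dfn> is a thing.</p>')
print(collector.dfns)  # {'widget': 'widget'}
```

A real crawler would also record the spec's shortname and the anchor URI alongside each definition.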

> I'm also very interested in gathering data from editor's draft which requires a JS runtime for those which use ReSpec.

At one point I did start adding code to Shepherd's spec parser (which has actually been completely factored out of Shepherd these days) to handle ReSpec source files. I stopped because ReSpec was under heavy development at the time and I didn't want to chase a moving target.

Finishing this wouldn't be that big a deal (and would be made easier if ReSpec used the Bikeshed dfn markup).

> Shepherd exposes an API that allows you to simply dump the data it has. If you look inside update.py in Bikeshed you can see how it works. What Bikeshed does is, instead of querying services live, allow the user to regularly call bikeshed update and get a fresh DB (of a bunch of stuff). The same could be injected into SpecRef.

Yes, and it's all JSON over http(s). You can currently query anchor data per spec or simply dump the entire DB. More advanced queries can be added to the API easily.
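Consuming such a JSON dump might look like the sketch below; the shape of the dump (spec shortname mapped to a list of anchors) is an assumption for illustration, not Shepherd's documented format:

```python
import json

# Hypothetical shape of an anchor-data dump: spec shortname -> list of anchors.
sample_dump = """
{
  "css-flexbox-1": [
    {"name": "flex container", "uri": "#flex-container"},
    {"name": "main axis", "uri": "#main-axis"}
  ]
}
"""

def index_anchors(dump_text):
    """Build a (spec, anchor name) -> URI lookup table from a JSON dump."""
    data = json.loads(dump_text)
    return {(spec, anchor["name"]): anchor["uri"]
            for spec, anchors in data.items()
            for anchor in anchors}

lookup = index_anchors(sample_dump)
print(lookup[("css-flexbox-1", "main axis")])  # #main-axis
```

This is essentially what a tool like bikeshed update does: fetch the dump once, then resolve cross-references locally instead of querying the service live.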

The API is also self-describing via a json-home page (per [1]). Bikeshed uses a Python APIClient I wrote that uses the json-home page to process requests; it's available stand-alone on GitHub [2].
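A json-home page (per the draft cited at [1]) lists resources keyed by link relation, each with an href or an href-template plus variables. A minimal sketch of resolving one relation follows; the relation name and template here are made up, and a real client (like APIClient) should use proper RFC 6570 URI Template expansion rather than this naive substitution:

```python
import json

# A minimal json-home document in the style of draft-nottingham-json-home-03;
# the relation URI and template below are illustrative, not Shepherd's actual API.
home_doc = """
{
  "resources": {
    "https://example.net/rel/anchors": {
      "href-template": "/anchors/{spec}",
      "href-vars": {"spec": "https://example.net/param/spec"}
    }
  }
}
"""

def resolve(home_text, rel, **params):
    """Resolve a link relation to a request path, naively expanding {var}
    placeholders (real clients should use RFC 6570 URI Templates)."""
    resource = json.loads(home_text)["resources"][rel]
    path = resource.get("href") or resource["href-template"]
    for name, value in params.items():
        path = path.replace("{%s}" % name, value)
    return path

print(resolve(home_doc, "https://example.net/rel/anchors", spec="css-flexbox-1"))
# /anchors/css-flexbox-1
```

The point of the indirection is that clients discover endpoints from the home page instead of hard-coding URLs, so the server can move resources without breaking callers.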

> That sounds like a worthwhile idea to explore but seems somewhat orthogonal to this project, no?
> My focus will be on gathering the data and providing a JSON API. Not
> on actual implementation within ReSpec (which I won't have cycles for at
> that time, I'm afraid).
> The hard part is getting the data. Hooking it into ReSpec oughtn't be difficult, unless I'm missing something.
> Good. (I haven't thought about this at all, so I'll take your word for it). 
> --tobie

[1] http://tools.ietf.org/html/draft-nottingham-json-home-03
[2] https://github.com/plinss/apiclient
Received on Thursday, 2 October 2014 17:09:19 UTC
