Re: Thinking about cross references and ReSpec from Tobie Langel on 2014-10-02 (spec-prod@w3.org from October to December 2014)

From: Tobie Langel <tobie.langel@gmail.com>
Date: Thu, 2 Oct 2014 20:57:50 +0200
To: Peter Linss <peter.linss@hp.com>
Cc: Robin Berjon <robin@w3.org>, Shane McCarron <shane@aptest.com>, "spec-prod@w3.org Prod" <spec-prod@w3.org>
Message-ID: <CAMK=o4fRcJ=ZofhewNHdWudxtM6odeW+eFMqosqda2rSrnK2-A@mail.gmail.com>

On Thu, Oct 2, 2014 at 7:08 PM, Peter Linss <peter.linss@hp.com> wrote:

>
> On Oct 2, 2014, at 4:41 AM, Tobie Langel <tobie.langel@gmail.com> wrote:
>
> On Thu, Oct 2, 2014 at 12:10 PM, Robin Berjon <robin@w3.org> wrote:
>
>> On 02/10/2014 10:10 , Tobie Langel wrote:
>>
>>> My plan for this solution is to do daily crawling of relevant specs and
>>> extract the dfn and put them in a DB. Further refinements could include
>>> a search API, like I added for Specref and exposed within Respec.
>>>
>>
>> Could you somehow reuse or modify what Shepherd does here? If it includes
>> enough information (or additional extraction can be easily added) and new
>> specs can be added to its crawling (which I suspect ought to be relatively
>> easy — I recall Peter's code being able to process quite a lot of different
>> documents)
>
> Yes, adding specs to its crawl is trivial.
>
>> then we can all align, which I reckon is a win (even without counting the
>> saved cycles).
>>
>
> I've bumped into way too many painful issues with non browser-based HTML
> parsers to waste more time with them.
>
> FWIW, Shepherd uses html5lib and AFAICT sees a browser-equivalent DOM
> which it traverses. This hasn’t been an issue to date.
>

So does jsdom[1]. Yet I've bumped into plenty of very annoying issues with
it (even though jsdom actually has a JS runtime, which afaik html5lib
doesn't).


> I'm also very interested in gathering data from editor's drafts, which
> requires a JS runtime for those that use ReSpec.
>
>
> At one point I did start to add code to Shepherd’s spec parser (which
> actually has been completely factored out of Shepherd these days) to handle
> ReSpec source files. I stopped because ReSpec was under heavy development
> at the time and I didn’t want to chase a moving target.
>
> Finishing this wouldn’t be that big a deal (and would be made easier if
> ReSpec used the Bikeshed dfn markup).
>

Unfortunately, I need a solution that works for ReSpec drafts right away.

>> Shepherd exposes an API that allows you to just simply dump the data it
>> has. If you look inside update.py in Bikeshed you can see how it works.
>> What Bikeshed does is, instead of querying services live, allow the user to
>> regularly call bikeshed update and get a fresh DB (of a bunch of stuff).
>> The same could be injected into SpecRef.
>
> Yes, and it’s all JSON over http(s). You can currently query anchor data
> per spec or simply dump the entire DB. More advanced queries can be added
> to the API easily.
>

Neat.
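To picture the "crawl relevant specs, extract the dfn and put them in a DB" plan from this thread, here is a minimal sketch. It is not Tobie's code and not Shepherd's parser. It assumes Python's standard-library html.parser and an in-memory sqlite3 table. The table schema, the function names (DfnExtractor, index_spec) and the spec URL are hypothetical. Note that html.parser is not a browser-equivalent HTML5 tree builder and runs no JavaScript, so for ReSpec drafts it would only see the source markup, not what ReSpec generates at runtime.

```python
import sqlite3
from html.parser import HTMLParser


class DfnExtractor(HTMLParser):
    """Hypothetical helper: collects (id, text) pairs for every <dfn> element."""

    def __init__(self):
        super().__init__()
        self.dfns = []           # list of (anchor_id, term) tuples
        self._in_dfn = False     # True while inside a <dfn> element
        self._current_id = None  # id attribute of the current <dfn>, if any
        self._buffer = []        # text chunks seen inside the current <dfn>

    def handle_starttag(self, tag, attrs):
        if tag == "dfn":
            self._in_dfn = True
            self._current_id = dict(attrs).get("id")
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "dfn" and self._in_dfn:
            # Collapse whitespace so "event\n  loop" becomes "event loop".
            term = " ".join("".join(self._buffer).split())
            self.dfns.append((self._current_id, term))
            self._in_dfn = False

    def handle_data(self, data):
        if self._in_dfn:
            self._buffer.append(data)


def index_spec(conn, spec_url, html):
    """Hypothetical: extract dfns from one spec's HTML and store them by spec URL."""
    parser = DfnExtractor()
    parser.feed(html)
    parser.close()
    conn.executemany(
        "INSERT INTO dfns (spec, anchor, term) VALUES (?, ?, ?)",
        [(spec_url, anchor, term) for anchor, term in parser.dfns],
    )
    return len(parser.dfns)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dfns (spec TEXT, anchor TEXT, term TEXT)")
sample = ('<p>An <dfn id="dfn-event">event</dfn> is fired at a '
          '<dfn id="dfn-target">target</dfn>.</p>')
count = index_spec(conn, "https://example.org/spec", sample)
rows = conn.execute("SELECT anchor, term FROM dfns ORDER BY anchor").fetchall()
print(count, rows)  # prints: 2 [('dfn-event', 'event'), ('dfn-target', 'target')]
```

A daily crawl would call index_spec once per fetched spec. Serving the same table as JSON over HTTP would give roughly the "dump the entire DB" style of access Peter describes for Shepherd.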

--tobie
---
[1]: https://github.com/tmpvar/jsdom

Received on Thursday, 2 October 2014 18:58:19 UTC