- From: Tobie Langel <tobie.langel@gmail.com>
- Date: Thu, 2 Oct 2014 20:57:50 +0200
- To: Peter Linss <peter.linss@hp.com>
- Cc: Robin Berjon <robin@w3.org>, Shane McCarron <shane@aptest.com>, "spec-prod@w3.org Prod" <spec-prod@w3.org>
- Message-ID: <CAMK=o4fRcJ=ZofhewNHdWudxtM6odeW+eFMqosqda2rSrnK2-A@mail.gmail.com>
On Thu, Oct 2, 2014 at 7:08 PM, Peter Linss <peter.linss@hp.com> wrote:
>
> On Oct 2, 2014, at 4:41 AM, Tobie Langel <tobie.langel@gmail.com> wrote:
>
> On Thu, Oct 2, 2014 at 12:10 PM, Robin Berjon <robin@w3.org> wrote:
>
>> On 02/10/2014 10:10 , Tobie Langel wrote:
>>
>>> My plan for this solution is to do daily crawling of relevant specs and
>>> extract the dfn and put them in a DB. Further refinements could include
>>> a search API, like I added for Specref and exposed within ReSpec.
>>>
>>
>> Could you somehow reuse or modify what Shepherd does here? If it includes
>> enough information (or additional extraction can be easily added) and new
>> specs can be added to its crawling (which I suspect ought to be relatively
>> easy - I recall Peter's code being able to process quite a lot of different
>> documents)
>
> Yes, adding specs to its crawl is trivial.
>
> then we can all align, which I reckon is a win (even without counting the
>> saved cycles).
>>
>
> I've bumped into way too many painful issues with non browser-based HTML
> parsers to waste more time with them.
>
> FWIW, Shepherd uses html5lib and AFAICT sees a browser equivalent DOM
> which it traverses. This hasn’t been an issue to date.
>

So does jsdom[1]. Yet I've bumped into plenty of very annoying issues with
it (even though jsdom actually has a JS runtime, which afaik html5lib
doesn't).

> I'm also very interested in gathering data from editor's draft which
> requires a JS runtime for those which use ReSpec.
>
>
> At one point I did start to add code to Shepherd’s spec parser (which
> actually has been completely factored out of Shepherd these days) to handle
> ReSpec source files. I stopped because ReSpec was under heavy development
> at the time and I didn’t want to chase a moving target.
>
> Finishing this wouldn’t be that big a deal (and would be made easier if
> ReSpec uses the Bikeshed dfn markup).
>

Unfortunately, I need a solution that works for ReSpec drafts right away.

> Shepherd exposes an API that allows you to just simply dump the data it
>> has. If you look inside update.py in Bikeshed you can see how it works.
>> What Bikeshed does is, instead of querying services live, allow the user to
>> regularly call bikeshed update and get a fresh DB (of a bunch of stuff).
>> The same could be injected into SpecRef.
>
> Yes, and it’s all JSON over http(s). You can currently query anchor data
> per spec or simply dump the entire DB. More advanced queries can be added
> to the API easily.
>

Neat.

--tobie

---
[1]: https://github.com/tmpvar/jsdom
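
For illustration only, the "extract the dfn and put them in a DB" step discussed in this thread could be sketched roughly as below. This is not Shepherd's, Bikeshed's, or Specref's code. The names DfnExtractor, extract_dfns, and store_dfns and the table layout are hypothetical, and the sketch uses Python's standard-library html.parser rather than html5lib or a browser DOM, so it is exactly the kind of non-browser parser the thread is wary of and it will not see definitions that ReSpec generates at runtime.

```python
import sqlite3
from html.parser import HTMLParser


class DfnExtractor(HTMLParser):
    """Collect (id, text) pairs for every <dfn> element in a document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.dfns = []      # finished (id, text) pairs
        self._depth = 0     # greater than 0 while inside a <dfn>
        self._id = None     # id attribute of the <dfn> being read, if any
        self._chunks = []   # text fragments seen inside the current <dfn>

    def handle_starttag(self, tag, attrs):
        if tag == "dfn":
            if self._depth == 0:
                self._id = dict(attrs).get("id")
                self._chunks = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "dfn" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join the fragments and collapse runs of whitespace.
                text = " ".join("".join(self._chunks).split())
                self.dfns.append((self._id, text))

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def extract_dfns(html):
    """Return a list of (id, text) pairs for the <dfn> elements in html."""
    parser = DfnExtractor()
    parser.feed(html)
    parser.close()
    return parser.dfns


def store_dfns(conn, spec, dfns):
    """Insert the extracted definitions for one spec into a (hypothetical) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dfn (spec TEXT, anchor TEXT, term TEXT)"
    )
    conn.executemany(
        "INSERT INTO dfn (spec, anchor, term) VALUES (?, ?, ?)",
        [(spec, anchor, term) for anchor, term in dfns],
    )
    conn.commit()
```

A daily crawl would fetch each spec, pass the markup to extract_dfns, and call store_dfns with a connection such as sqlite3.connect("dfn.db"); the fetching step and the file name are likewise assumptions, not something described in the thread.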
Received on Thursday, 2 October 2014 18:58:19 UTC