- From: Peter Linss <peter.linss@hp.com>
- Date: Thu, 6 Jun 2013 17:17:16 -0700
- To: Takashi Hayakawa <T.Hayakawa@CableLabs.com>
- Cc: Tobie Langel <tobie@w3.org>, public-test-infra <public-test-infra@w3.org>, ext Robin Berjon <robin@w3.org>
- Message-Id: <05F53876-81E8-464B-AA9D-A8FB87DB10DA@hp.com>
Hello Tak,
There's a compatibility wrapper for html5lib's parser in the code (our server currently uses an older, patched version of html5lib; I'm working on removing that dependency on the server). With html5lib 1.0 or later, you probably need to remove the wrapper.
You can either modify the _parseData method in your local copy or subclass the parser and override that method. Either way, replace the line:
tree = self.mParser.parse(StringReader(data), encoding = encoding) # XXX remove stringreader when upgrade to html5lib 1.0
with:
tree = self.mParser.parse(data, encoding = encoding)
The root section should indeed have a list of the top-level sections and the anchors present at the root level. You can then traverse the section tree from the root to find the sub-sections.
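A minimal sketch of that traversal, assuming mSubSections is a dict mapping section names to Section objects and mAnchors is a dict keyed by anchor name. The Section class below is only a hypothetical stand-in so the example runs on its own; the sample section and anchor names are made up, and the real classes in specificationparser.py may differ:

```python
class Section(object):
    """Hypothetical stand-in for specificationparser.Section (an assumption)."""

    def __init__(self, name=None, level=0):
        self.mSectionName = name
        self.mHeadingLevel = level
        self.mAnchors = {}      # assumed: anchor name -> anchor data
        self.mSubSections = {}  # assumed: section name -> Section


def walk_sections(section, depth=0):
    """Yield (depth, section) for a section and all of its descendants."""
    yield depth, section
    for name in sorted(section.mSubSections):
        for item in walk_sections(section.mSubSections[name], depth + 1):
            yield item


# Build a small made-up tree in place of parser.getRootSection().
root = Section()
intro = Section('introduction', 2)
backgrounds = Section('backgrounds', 2)
backgrounds.mAnchors['background-color'] = 'placeholder anchor data'
colour = Section('the-background-color', 3)
backgrounds.mSubSections[colour.mSectionName] = colour
root.mSubSections[intro.mSectionName] = intro
root.mSubSections[backgrounds.mSectionName] = backgrounds

# Collect (depth, section name, anchor names) for every section in the tree.
visited = [(depth, s.mSectionName, sorted(s.mAnchors))
           for depth, s in walk_sections(root)]
for depth, name, anchors in visited:
    print('%s%s %s' % ('  ' * depth, name, anchors))
```

With the real parser you would pass the result of parser.getRootSection() to walk_sections instead of the made-up root.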
Peter
On Jun 6, 2013, at 5:06 PM, Takashi Hayakawa wrote:
> Hello Peter and all,
>
> I started looking into the media element part of HTML-5 as a member of
> the small team at CableLabs that Bob Lund mentioned a couple of weeks
> ago.
>
> Peter Linss wrote:
>> Done. I pulled the spec parsing bits into [1], all the Shepherd
>> specific code remains in SynchronizeSpec.py. There are no dependencies
>> on any other Shepherd code.
>
> I am trying this spec parser to see if it helps our development. But
> it does not seem to be working for me so far. The code at the bottom of this
> message produces the following output in my environment (Ubuntu 12.04
> and Python 2.7.3 with html5lib-1.0b1.tar.gz and six-1.3.0.tar.gz
> installed):
>
> $ sha1sum specificationparser.py
> c8fe736b74cab561c84a699b99abed2471fa0bc5 specificationparser.py
> $ python a.py http://www.w3.org/TR/css3-background/Overview.html
> parser = <specificationparser.SpecificationParser object at 0xb727ed2c>
> uri = http://www.w3.org/TR/css3-background/Overview.html
> DEBUG ('Loading: ', 'http://www.w3.org/TR/css3-background/', '\n')
> STATUS ('Processing ', 'http://www.w3.org/TR/css3-background/', '\n')
> parse_result = True
> root = <specificationparser.Section object at 0x8b541cc>
> root.mSectionName = None
> root.mHeadingLevel = 0
> root.mAnchors = {}
> root.mSubSections = {}
> $
>
> Can someone show sample code that works?
>
> What would be a good input page/URI for it? What are the meaningful ways to
> use the resulting RootSection? I am expecting root.mAnchors and
> root.mSubSections to be nonempty; is that the right expectation?
>
> Thank you,
>
> --tak
>
> --
> Takashi Hayakawa <t.hayakawa@cablelabs.com>
> CableLabs
>
>
> ################################################################
>
> #!/usr/bin/python
>
> import sys
> import specificationparser
>
> ################
>
> class myUi:
>     def __init__(self):
>         pass
>
>     def status(self, *x):
>         print "STATUS", x
>
>     def warn(self, *x):
>         print "WARN", x
>
>     def debug(self, *x):
>         print "DEBUG", x
>
> ################
>
> ui = myUi()
> parser = specificationparser.SpecificationParser(ui)
> print "parser =", parser
>
> uri = sys.argv[1]
> print "uri =", uri
>
> parse_result = parser.parseSpec(uri)
> print "parse_result =", parse_result
>
> parser.postProcess()
>
> root = parser.getRootSection()
> print "root =", root
> print "root.mSectionName =", root.mSectionName
> print "root.mHeadingLevel =", root.mHeadingLevel
> print "root.mAnchors =", root.mAnchors
> print "root.mSubSections =", root.mSubSections
Received on Friday, 7 June 2013 00:17:40 UTC