Re: Spec parser

Hello Tak,

there's a compatibility wrapper for html5lib's parser in the code. Our server currently runs an older, patched version of html5lib, and I'm working on removing that dependency on the server. With html5lib 1.0 or later you will probably need to remove the wrapper.

You can either modify the _parseData method in your local copy or subclass the parser and override that method. Either way, replace the line:
            tree = self.mParser.parse(StringReader(data), encoding = encoding)  # XXX remove stringreader when upgrade to html5lib 1.0

with:
            tree = self.mParser.parse(data, encoding = encoding)


The root section should indeed have a list of the top-level sections and the anchors present at the root level. You can then traverse the section tree from the root to find the sub-sections.
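
As a rough illustration of that traversal, here is a minimal sketch. It does not use the real parser: the Section class below is a hypothetical stand-in that only mimics the attributes printed in your output (mSectionName, mHeadingLevel, mAnchors, mSubSections), and it assumes mSubSections is a dict whose values are child sections. The walk_sections helper and the sample section names are made up for the example.

```python
class Section(object):
    """Hypothetical stand-in; mimics only the attributes shown in the quoted output."""
    def __init__(self, name=None, level=0):
        self.mSectionName = name
        self.mHeadingLevel = level
        self.mAnchors = {}
        self.mSubSections = {}


def walk_sections(section, depth=0):
    """Yield (depth, section) for a section and all of its descendants, depth first."""
    yield depth, section
    # Assumption: mSubSections is a dict whose values are child Section objects.
    for key in sorted(section.mSubSections):
        for item in walk_sections(section.mSubSections[key], depth + 1):
            yield item


# Build a tiny tree by hand so the sketch runs without the real parser.
root = Section()
intro = Section('1', 1)
intro.mAnchors['introduction'] = None
backgrounds = Section('2', 1)
backgrounds.mSubSections['2.1'] = Section('2.1', 2)
root.mSubSections['1'] = intro
root.mSubSections['2'] = backgrounds

for depth, section in walk_sections(root):
    print('%s%s anchors=%s' % ('  ' * depth, section.mSectionName,
                               sorted(section.mAnchors)))
```

With a real parse you would pass the object returned by getRootSection() to a helper like this instead of the hand-built root; if mSubSections turns out to be keyed or ordered differently, the loop over the keys is the part to adjust.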

Peter

On Jun 6, 2013, at 5:06 PM, Takashi Hayakawa wrote:

> Hello Peter and all,
> 
> I started looking into the media element part of HTML-5 as a member of
> the small team at CableLabs that Bob Lund mentioned a couple of weeks
> ago.
> 
> Peter Linss wrote:
>> Done. I pulled the spec parsing bits into [1], all the Shepherd
>> specific code remains in SynchronizeSpec.py. There are no dependencies
>> on any other Shepherd code.
> 
> I am trying this spec parser to see if it helps our development, but
> it does not seem to be working for me so far.  The code at the bottom of this
> message produces the following output in my environment (Ubuntu 12.04
> and Python 2.7.3 with html5lib-1.0b1.tar.gz and six-1.3.0.tar.gz
> installed):
> 
>    $ sha1sum specificationparser.py
>    c8fe736b74cab561c84a699b99abed2471fa0bc5  specificationparser.py
>    $ python a.py http://www.w3.org/TR/css3-background/Overview.html
>    parser = <specificationparser.SpecificationParser object at 0xb727ed2c>
>    uri = http://www.w3.org/TR/css3-background/Overview.html
>    DEBUG ('Loading: ', 'http://www.w3.org/TR/css3-background/', '\n')
>    STATUS ('Processing ', 'http://www.w3.org/TR/css3-background/', '\n')
>    parse_result = True
>    root = <specificationparser.Section object at 0x8b541cc>
>    root.mSectionName = None
>    root.mHeadingLevel = 0
>    root.mAnchors = {}
>    root.mSubSections = {}
>    $ 
> 
> Can someone show some sample code that works?
> 
> What would be a good input page/URI for it?  What are the meaningful ways to
> use the resulting RootSection?  I am expecting root.mAnchors and
> root.mSubSections to be nonempty; is that the right expectation?
> 
> Thank you,
> 
> --tak
> 
> --
> Takashi Hayakawa <t.hayakawa@cablelabs.com>
> CableLabs
> 
> 
> ################################################################
> 
> #!/usr/bin/python
> 
> import sys
> import specificationparser
> 
> ################
> 
> class myUi:
>    def __init__(self):
>        pass
> 
>    def status(self, *x):
>        print "STATUS", x
> 
>    def warn(self, *x):
>        print "WARN", x
> 
>    def debug(self, *x):
>        print "DEBUG", x
> 
> ################
> 
> ui = myUi()
> parser = specificationparser.SpecificationParser(ui)
> print "parser =", parser
> 
> uri = sys.argv[1]
> print "uri =", uri
> 
> parse_result = parser.parseSpec(uri)
> print "parse_result =", parse_result
> 
> parser.postProcess()
> 
> root = parser.getRootSection()
> print "root =", root
> print "root.mSectionName =", root.mSectionName
> print "root.mHeadingLevel =", root.mHeadingLevel
> print "root.mAnchors =", root.mAnchors
> print "root.mSubSections =", root.mSubSections

Received on Friday, 7 June 2013 00:17:40 UTC