- From: Peter Linss <peter.linss@hp.com>
- Date: Thu, 6 Jun 2013 17:17:16 -0700
- To: Takashi Hayakawa <T.Hayakawa@CableLabs.com>
- Cc: Tobie Langel <tobie@w3.org>, public-test-infra <public-test-infra@w3.org>, ext Robin Berjon <robin@w3.org>
- Message-Id: <05F53876-81E8-464B-AA9D-A8FB87DB10DA@hp.com>
Hello Tak,

There's a compatibility wrapper for html5lib's parser in the code (our server is currently using an older patched version; I'm working on removing that dependency on the server). When using version 1.0+ of html5lib you probably need to remove the wrapper.

You can either modify the _parseData method in your local copy or subclass the parser and override that method. Either way, replace the line:

    tree = self.mParser.parse(StringReader(data), encoding = encoding) # XXX remove stringreader when upgrade to html5lib 1.0

with:

    tree = self.mParser.parse(data, encoding = encoding)

The root section should indeed have a list of the top-level sections and the anchors present at the root level. You can then traverse the section tree to find the sub-sections.

Peter

On Jun 6, 2013, at 5:06 PM, Takashi Hayakawa wrote:

> Hello Peter and all,
>
> I started looking into the media element part of HTML-5 as a member of
> the small team at CableLabs that Bob Lund mentioned a couple of weeks
> ago.
>
> Peter Linss wrote:
>> Done. I pulled the spec parsing bits into [1], all the Shepherd
>> specific code remains in SynchronizeSpec.py. There are no dependencies
>> on any other Shepherd code.
>
> I am trying this spec parser to see if this helps our development. But
> it doesn't seem to be working for me so far.
> The code at the bottom of this
> message produces the following output in my environment (Ubuntu 12.04
> and Python 2.7.3 with html5lib-1.0b1.tar.gz and six-1.3.0.tar.gz
> installed):
>
> $ sha1sum specificationparser.py
> c8fe736b74cab561c84a699b99abed2471fa0bc5 specificationparser.py
> $ python a.py http://www.w3.org/TR/css3-background/Overview.html
> parser = <specificationparser.SpecificationParser object at 0xb727ed2c>
> uri = http://www.w3.org/TR/css3-background/Overview.html
> DEBUG ('Loading: ', 'http://www.w3.org/TR/css3-background/', '\n')
> STATUS ('Processing ', 'http://www.w3.org/TR/css3-background/', '\n')
> parse_result = True
> root = <specificationparser.Section object at 0x8b541cc>
> root.mSectionName = None
> root.mHeadingLevel = 0
> root.mAnchors = {}
> root.mSubSections = {}
> $
>
> Can someone show some sample code that works?
>
> What would be a good input page/URI for it? What are the meaningful ways to
> use the resulting RootSection? I am expecting root.mAnchors and
> root.mSubSections to be nonempty; is that the right expectation?
>
> Thank you,
>
> --tak
>
> --
> Takashi Hayakawa <t.hayakawa@cablelabs.com>
> CableLabs
>
>
> ################################################################
>
> #!/usr/bin/python
>
> import sys
> import specificationparser
>
> ################
>
> class myUi:
>     def __init__(self):
>         pass
>
>     def status(self, *x):
>         print "STATUS", x
>
>     def warn(self, *x):
>         print "WARN", x
>
>     def debug(self, *x):
>         print "DEBUG", x
>
> ################
>
> ui = myUi()
> parser = specificationparser.SpecificationParser(ui)
> print "parser =", parser
>
> uri = sys.argv[1]
> print "uri =", uri
>
> parse_result = parser.parseSpec(uri)
> print "parse_result =", parse_result
>
> parser.postProcess()
>
> root = parser.getRootSection()
> print "root =", root
> print "root.mSectionName =", root.mSectionName
> print "root.mHeadingLevel =", root.mHeadingLevel
> print "root.mAnchors =", root.mAnchors
> print "root.mSubSections =", root.mSubSections
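
For the traversal step mentioned in the reply (walking the section tree from the root to reach the sub-sections), here is a minimal sketch. The Section class below is a hypothetical stand-in, not the class from specificationparser.py: only the four attribute names come from the output quoted above. That mSubSections maps names to Section objects and that mAnchors is keyed by anchor name are assumptions, inferred only from both printing as {}. The tree data is made up so the sketch runs without the real parser.

```python
from __future__ import print_function


class Section(object):
    """Hypothetical stand-in for specificationparser.Section.

    Only the four attribute names are taken from the quoted output;
    the constructor and the dict layouts are assumptions.
    """

    def __init__(self, name=None, level=0):
        self.mSectionName = name
        self.mHeadingLevel = level
        self.mAnchors = {}      # assumed: anchor name -> anchor data
        self.mSubSections = {}  # assumed: section name -> Section


def walk(section, depth=0):
    """Yield (depth, section) for this section and all of its descendants."""
    yield depth, section
    for name in sorted(section.mSubSections):
        for item in walk(section.mSubSections[name], depth + 1):
            yield item


def describe(root):
    """Return one indented line per section, listing its anchor names."""
    lines = []
    for depth, section in walk(root):
        anchors = ", ".join(sorted(section.mAnchors))
        lines.append("%s%s [%s]" % ("  " * depth, section.mSectionName, anchors))
    return lines


# Build a tiny tree by hand (made-up data) in place of a parsed spec.
root = Section()
intro = Section("1", 1)
intro.mAnchors["introduction"] = None
backgrounds = Section("3", 1)
backgrounds.mSubSections["3.1"] = Section("3.1", 2)
root.mSubSections["1"] = intro
root.mSubSections["3"] = backgrounds

for line in describe(root):
    print(line)
```

With the real parser, root would instead come from parser.getRootSection() after parseSpec(), as in the quoted script. If the real attributes turn out to be lists or hold different value types, the loop in walk() would need adjusting accordingly.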
Received on Friday, 7 June 2013 00:17:40 UTC