
simple TOC parsing example

From: Jeff Buehler <jeff.buehler@knowbly.com>
Date: Fri, 14 Sep 2018 13:18:22 -0700
Message-ID: <CAGr-Qs-yZ-Nk6U8HAgXUfSOfi8C7b8oBJuFNtvH4D=2mm2VGew@mail.gmail.com>
To: PubWG <public-publ-wg@w3.org>
Hi everyone -

I created a very simple example (called tocparse.js) to show parsing a TOC
as a *starting point* for the conversation regarding how much control we
want to impose over TOC structure for WP and PWP, discussed here:

https://github.com/w3c/wpub/issues/291#issuecomment-415510019

During the meeting on this there was discussion about variance in TOCs. I
am not familiar with how much TOCs actually vary across industries and
needs, so this script is not intended to be a "catch-all", just a start,
and it has been kept simple rather than hardened. It shows a simple way to
determine the level of nesting and pull out the pertinent data, and it
writes the results to stdout. It expects only a <nav> (or a series of
<nav>s) and <a> tags within the <nav>s.
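
As a rough illustration of the idea (this is not the actual tocparse.js,
and it does not use xpath or xmldom), the sketch below walks the tags in a
string of markup, counts how many <ol>/<ul> lists are open inside a <nav>,
and records each <a> with that count as its level. The function name
parseToc, the sample markup, and the assumption that nesting is expressed
with nested lists are all hypothetical and chosen only for this example.

```javascript
// Hypothetical sketch only: not the real tocparse.js.
// Assumes well-formed markup in which TOC nesting is expressed by
// nested <ol>/<ul> lists inside a <nav>, and href values use double quotes.
function parseToc(html) {
  const entries = [];
  const tagPattern = /<(\/?)(nav|ol|ul|a)\b([^>]*)>/gi;
  let inNav = 0;   // number of <nav> elements currently open
  let depth = 0;   // number of <ol>/<ul> lists currently open inside the nav
  let open = null; // the <a> currently being read, if any
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const closing = match[1] === '/';
    const name = match[2].toLowerCase();
    if (name === 'nav') {
      inNav += closing ? -1 : 1;
      if (inNav === 0) depth = 0; // reset between separate <nav>s
    } else if (inNav > 0 && (name === 'ol' || name === 'ul')) {
      depth += closing ? -1 : 1;
    } else if (inNav > 0 && name === 'a') {
      if (!closing) {
        const href = /href\s*=\s*"([^"]*)"/i.exec(match[3]);
        open = {
          level: depth,
          href: href ? href[1] : null,
          start: tagPattern.lastIndex, // link text begins after the opening tag
        };
      } else if (open) {
        // Drop any inline markup inside the link and keep the text.
        const title = html
          .slice(open.start, match.index)
          .replace(/<[^>]*>/g, '')
          .trim();
        entries.push({ level: open.level, href: open.href, title });
        open = null;
      }
    }
  }
  return entries;
}

// Made-up sample TOC, printed to stdout one entry per line.
const sample = `
<nav role="doc-toc">
  <ol>
    <li><a href="c1.html">Chapter 1</a>
      <ol>
        <li><a href="c1.html#s1">Section 1.1</a></li>
      </ol>
    </li>
    <li><a href="c2.html">Chapter 2</a></li>
  </ol>
</nav>`;

for (const e of parseToc(sample)) {
  console.log(`${e.level} ${e.title} -> ${e.href}`);
}
// 1 Chapter 1 -> c1.html
// 2 Section 1.1 -> c1.html#s1
// 1 Chapter 2 -> c2.html
```

A regular-expression scan like this is only adequate for tidy, predictable
input; a real DOM parser is the safer choice once TOC markup starts to
vary, which is the question under discussion.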

tocparse.js is a nodejs command line script, and it is available here:

https://github.com/buehlerart/tocparse

It should run on OSX, Windows or Linux variants with more or less any
version of nodejs installed. I wrote and tested it using nodejs v8.12.0.

As I mentioned during the meeting, this is a hugely simplified version of
another tool I wrote for specific needs a while back. It uses xpath and
xmldom for DOM parsing, which fit my needs originally.

If anyone has any questions please feel free to contact me.

Thanks,
Jeff

Jeffry Buehler

Client Solutions

@learnknowbly <https://twitter.com/learnknowbly/>

kls.knowbly.com


This message contains information which may be confidential and privileged.
Unless you are the addressee (or authorized to receive for the addressee),
you may not use, copy or disclose to anyone the message or any of its
contents. If you have received the message in error, please advise by
replying to this e-mail and deleting the message.

Received on Friday, 14 September 2018 20:18:48 UTC
