Re: Using TidyLib as an HTML parser

From: John Snelson <john.snelson@oracle.com>
Date: Tue, 22 Jan 2008 21:05:08 +0000
Message-ID: <47965A84.9030909@oracle.com>
To: Arnaud Desitter <arnaud02@users.sourceforge.net>
CC: html-tidy@w3.org

I've uploaded my patch that implements tidyNodeGetValue(), which can be 
found here:

http://sourceforge.net/tracker/index.php?func=detail&aid=1877642&group_id=27659&atid=390965
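
To illustrate what "unescaped" means for a node value, here is a minimal
sketch. It is not code from the patch. The helper names put_utf8() and
sketch_unescape() are hypothetical, and the sketch assumes UTF-8 output
and handles only the five predefined entities plus numeric character
references, copying anything else through unchanged:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write code point cp to out as UTF-8.
   Returns the number of bytes written, or 0 if cp is not encodable
   (zero, a surrogate, or beyond U+10FFFF). */
static size_t put_utf8(unsigned long cp, char *out)
{
    if (cp == 0) return 0;
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper, not part of TidyLib: copy src to dst, replacing
   recognised character references with their UTF-8 value.
   Returns 1 on success, 0 if dst (dst_size bytes, including the
   terminating NUL) is too small. */
static int sketch_unescape(const char *src, char *dst, size_t dst_size)
{
    static const struct { const char *name; char ch; } named[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    size_t o = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0) return 0;

    while (*src) {
        char tmp[4];
        size_t n = 0, used = 0, i;

        if (*src == '&') {
            /* Named references first. */
            for (i = 0; i < sizeof named / sizeof named[0]; i++) {
                size_t len = strlen(named[i].name);
                if (strncmp(src, named[i].name, len) == 0) {
                    tmp[0] = named[i].ch; n = 1; used = len;
                    break;
                }
            }
            /* Then numeric references: &#NNN; or &#xHHH; */
            if (n == 0 && src[1] == '#') {
                int hex = (src[2] == 'x' || src[2] == 'X');
                const char *digits = src + (hex ? 3 : 2);
                unsigned char first = (unsigned char)*digits;
                if (hex ? isxdigit(first) : isdigit(first)) {
                    char *end;
                    unsigned long cp = strtoul(digits, &end, hex ? 16 : 10);
                    if (*end == ';') {
                        n = put_utf8(cp, tmp);
                        if (n) used = (size_t)(end - src) + 1;
                    }
                }
            }
        }
        /* Anything not recognised is copied through byte for byte. */
        if (n == 0) { tmp[0] = *src; n = 1; used = 1; }

        if (o + n >= dst_size) return 0;   /* keep room for the NUL */
        memcpy(dst + o, tmp, n);
        o += n;
        src += used;
    }
    dst[o] = '\0';
    return 1;
}
```

Under those assumptions, unescaping "Fish &amp; chips &#233;" into a
64-byte buffer gives "Fish & chips é" encoded as UTF-8. A complete
implementation would also need the full HTML entity table, which this
sketch leaves out.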

John

Arnaud Desitter wrote:
> On 22/01/2008, John Snelson <john.snelson@oracle.com> wrote:
>> Arnaud Desitter wrote:
>>> On 22/01/2008, John Snelson <john.snelson@oracle.com> wrote:
>>>> Is there a better way to do what I want? I would be quite happy to
>>>> implement a new API method to do this if that's required - does anyone
>>>> else think this would be useful?
>>> Please refer to http://tidy.sf.net/issue/1636028.
>>> Your contribution to a new API would be welcome. Please post it using the
>>> tidy patch tracker.
>> Thanks for the pointer. From the bug report linked, it's not obvious
>> what the correct way to fix this is. Should I change tidyNodeGetText()
>> to return the unescaped value of the node, or should I add a new method?
> 
> From the bug reports, please add a new function.
> 
>> Here's what I propose - I'll add a new method:
>>
>> Bool tidyNodeGetValue( TidyDoc tdoc, TidyNode tnod, TidyBuffer* buf );
>>
>> For attribute, text, comment, and processing instruction nodes this
>> method will fill the buffer with the value of the node. The value will
>> be unescaped, and not serialized (no "<!--" or "<?" etc.).
>>
>> Some questions:
>>
>> 1) Are there other node types the method should work for?
>> 2) Should I respect the specified output encoding, or use UTF-8? (For
>> instance, the tidyNodeGetName() function always returns UTF-8)
> 
> Could you add that to include/tidy.h please?
> 
>> 3) What should I do about unrepresentable characters?
> 
> IMO, UTF8 is a good choice. Bjorn or others may comment.
> Because it is a new function, there is no backward compatibility issue
> so it can be modified until it feels right.

-- 
John Snelson, Oracle Corporation            http://snelson.org.uk/john
Berkeley DB XML:        http://www.oracle.com/database/berkeley-db/xml
XQilla:                                  http://xqilla.sourceforge.net
Received on Tuesday, 22 January 2008 21:07:02 GMT
