- From: Arnaud Desitter <arnaud02@users.sourceforge.net>
- Date: Wed, 23 Jan 2008 09:54:34 +0000
- To: "John Snelson" <john.snelson@oracle.com>
- Cc: html-tidy@w3.org
Thanks. I will get down to it when time allows.
You can always post a revised patch in issue 1877642 if you have new ideas.

Regards,

On 22/01/2008, John Snelson <john.snelson@oracle.com> wrote:
> I've uploaded my patch that implements tidyNodeGetValue(), which can be
> found here:
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=1877642&group_id=27659&atid=390965
>
> John
>
> Arnaud Desitter wrote:
> > On 22/01/2008, John Snelson <john.snelson@oracle.com> wrote:
> >> Arnaud Desitter wrote:
> >>> On 22/01/2008, John Snelson <john.snelson@oracle.com> wrote:
> >>>> Is there a better way to do what I want? I would be quite happy to
> >>>> implement a new API method to do this if that's required - does anyone
> >>>> else think this would be useful?
> >>> Please refer to http://tidy.sf.net/issue/1636028.
> >>> Your contribution to a new API would be welcome. Please post it using the
> >>> tidy patch tracker.
> >> Thanks for the pointer. From the bug report linked, it's not obvious
> >> what the correct way to fix this is. Should I change tidyNodeGetText()
> >> to return the unescaped value of the node, or should I add a new method?
> >
> > From the bug reports, please add a new function.
> >
> >> Here's what I propose - I'll add a new method:
> >>
> >> Bool tidyNodeGetValue( TidyDoc tdoc, TidyNode tnod, TidyBuffer* buf );
> >>
> >> For attribute, text, comment, and processing instruction nodes this
> >> method will fill the buffer with the value of the node. The value will
> >> be unescaped, and not serialized (no "<!--" or "<?" etc.).
> >>
> >> Some questions:
> >>
> >> 1) Are there other node types the method should work for?
> >> 2) Should I respect the specified output encoding, or use UTF-8? (For
> >> instance, the tidyNodeGetName() function always returns UTF-8)
> >
> > Could you add that to include/tidy.h please?
> >
> >> 3) What should I do about unrepresentable characters?
> >
> > IMO, UTF-8 is a good choice. Bjorn or others may comment.
> > Because it is a new function, there is no backward compatibility issue
> > so it can be modified until it feels right.
>
> --
> John Snelson, Oracle Corporation http://snelson.org.uk/john
> Berkeley DB XML: http://www.oracle.com/database/berkeley-db/xml
> XQilla: http://xqilla.sourceforge.net
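To illustrate the semantics proposed in the thread for tidyNodeGetValue() (the value is unescaped and not serialized, so no "<!--" or "<?" delimiters), here is a minimal self-contained C sketch. It is not libtidy code: sketch_node_value() is a hypothetical helper that works on an already-serialized node string rather than on a TidyNode, decodes only the five predefined XML entities, assumes processing instructions end in "?>", and does not handle attribute nodes. All other bytes are copied unchanged, so UTF-8 input stays UTF-8, in line with the UTF-8 suggestion above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only: NOT part of the libtidy API.
 * Given the serialized form of a text, comment, or processing-instruction
 * node, write its "value" into out: <!-- --> or <? ?> delimiters are
 * dropped and the five predefined XML entities are decoded.
 * Returns 1 on success, 0 if out is too small. */
static int sketch_node_value(const char *serialized, char *out, size_t outsz)
{
    static const struct { const char *name; char ch; } entities[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    const char *start = serialized;
    size_t len = strlen(serialized);
    size_t i = 0, o = 0;

    assert(out != NULL && outsz > 0);

    /* Strip comment or processing-instruction delimiters, if present. */
    if (len >= 7 && strncmp(start, "<!--", 4) == 0
            && strcmp(start + len - 3, "-->") == 0) {
        start += 4;
        len -= 7;
    } else if (len >= 4 && strncmp(start, "<?", 2) == 0
            && strcmp(start + len - 2, "?>") == 0) {
        start += 2;
        len -= 4;
    }

    /* Copy the remaining bytes, decoding known entities on the way. */
    while (i < len) {
        char ch = start[i];
        size_t consumed = 1;

        if (ch == '&') {
            for (size_t e = 0; e < sizeof entities / sizeof entities[0]; e++) {
                size_t n = strlen(entities[e].name);
                if (n <= len - i && strncmp(start + i, entities[e].name, n) == 0) {
                    ch = entities[e].ch;
                    consumed = n;
                    break;
                }
            }
        }
        if (o + 1 >= outsz)   /* keep room for the terminating NUL */
            return 0;
        out[o++] = ch;
        i += consumed;
    }
    out[o] = '\0';
    return 1;
}
```

For example, the comment node `<!-- a &lt; b -->` yields the value ` a < b `, and the text node `Tom &amp; Jerry` yields `Tom & Jerry`. Unknown entities such as `&nbsp;` are left as-is in this sketch; a real implementation inside Tidy would work from the parsed node content instead.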
Received on Wednesday, 23 January 2008 09:56:53 UTC