- From: John Snelson <john.snelson@oracle.com>
- Date: Tue, 22 Jan 2008 13:30:40 +0000
- To: Arnaud Desitter <arnaud02@users.sourceforge.net>
- CC: html-tidy@w3.org
Arnaud Desitter wrote:
> On 22/01/2008, John Snelson <john.snelson@oracle.com> wrote:
>> Is there a better way to do what I want? I would be quite happy to
>> implement a new API method to do this if that's required - does anyone
>> else think this would be useful?
>
> Please refer to http://tidy.sf.net/issue/1636028.
> Your contribution to a new API would be welcome. Please post it using the
> tidy patch tracker.

Thanks for the pointer. From the bug report linked, it's not obvious
what the correct way to fix this is. Should I change tidyNodeGetText()
to return the unescaped value of the node, or should I add a new method?

Here's what I propose - I'll add a new method:

Bool tidyNodeGetValue( TidyDoc tdoc, TidyNode tnod, TidyBuffer* buf );

For attribute, text, comment, and processing instruction nodes this
method will fill the buffer with the value of the node. The value will
be unescaped, and not serialized (no "<!--" or "<?" etc.).

Some questions:

1) Are there other node types the method should work for?

2) Should I respect the specified output encoding, or use UTF-8? (For
instance, the tidyNodeGetName() function always returns UTF-8)

3) What should I do about unrepresentable characters?

John

--
John Snelson, Oracle Corporation
http://snelson.org.uk/john
Berkeley DB XML: http://www.oracle.com/database/berkeley-db/xml
XQilla: http://xqilla.sourceforge.net
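
To make the "unescaped, and not serialized" distinction above concrete, here is a minimal, self-contained sketch in plain C. It does not use libtidy and is not the proposed implementation: unescape_value() and comment_value() are hypothetical helper names invented for this illustration, only the five predefined XML entities are handled, and the encoding questions are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libtidy): copy src into dst, replacing
 * the five predefined XML entities with the characters they name.
 * Unknown entities are copied through unchanged.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if dst is too small. */
static size_t unescape_value(const char *src, char *dst, size_t cap)
{
    static const struct { const char *name; char ch; } ents[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        char ch = *src;
        size_t adv = 1;

        if (ch == '&') {
            for (size_t i = 0; i < sizeof ents / sizeof ents[0]; ++i) {
                size_t n = strlen(ents[i].name);
                if (strncmp(src, ents[i].name, n) == 0) {
                    ch = ents[i].ch;   /* emit the named character */
                    adv = n;           /* skip the whole entity */
                    break;
                }
            }
        }
        if (out + 1 >= cap)            /* need room for ch plus NUL */
            return (size_t)-1;
        dst[out++] = ch;
        src += adv;
    }
    if (cap == 0)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}

/* Hypothetical helper (not part of libtidy): given a serialized comment
 * such as "<!-- note -->", copy only the text between the delimiters.
 * Returns the value length, or (size_t)-1 on malformed input or a
 * too-small buffer. */
static size_t comment_value(const char *ser, char *dst, size_t cap)
{
    size_t len;

    assert(ser != NULL && dst != NULL);

    len = strlen(ser);
    if (len < 7 || strncmp(ser, "<!--", 4) != 0 ||
        strcmp(ser + len - 3, "-->") != 0)
        return (size_t)-1;

    len -= 7;                          /* drop "<!--" and "-->" */
    if (len + 1 > cap)
        return (size_t)-1;
    memcpy(dst, ser + 4, len);
    dst[len] = '\0';
    return len;
}
```

With these helpers, the serialized text "a &lt; b" would yield the value "a < b", and the serialized comment "<!-- note -->" would yield " note ". A real tidyNodeGetValue() would presumably read the value from the parsed node rather than undoing serialization after the fact, so this only shows the intended result, not how to get there.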
Received on Tuesday, 22 January 2008 13:31:53 UTC