- From: Christopher Woods <cwoods_eol@yahoo.com>
- Date: Thu, 20 Oct 2005 07:58:49 -0700 (PDT)
- To: html-tidy@w3.org
I'm not 100% sure whether this falls into the "bug" category, so I'll start this out on the lists and see where it goes from there...

The Issue:

I think the behavior of Tidy needs to change with respect to implied/inferred tables (those tables where Tidy injects the "missing" table start element). Currently, once an inferred table is started it is treated just like a regular table: any non-table content* is moved before the table, and the standard exit points are used to determine when the end of the table has been reached. For inferred tables, this can result in significant reordering of content and can even merge two tables into one, altering the layout of both**.

For example, the following HTML snippet:

    <body>
    <tr><td>T1 - R1 - C1</td><td>T1 - R1 - C2</td></tr>
    This is content after T1
    <table>
    <tr><td>T2 - R1 - C1</td><td>T2 - R1 - C2</td></tr>
    </table>
    This is content after T2

would be "Tidy"d to:

    <body>
    This is content after T1
    <table>
    <tr><td>T1 - R1 - C1</td><td>T1 - R1 - C2</td></tr>
    <tr><td>T2 - R1 - C1</td><td>T2 - R1 - C2</td></tr>
    </table>
    This is content after T2

While this is a very simple sample, it does show the potential problem: imagine that "This is content after T1" were several hundred lines of text, or that table 1 had a narrow first column and a wide second column while table 2 had the inverse.

The Proposed Behavior:

I think what Tidy is attempting to do by injecting the "missing" table element and processing the content as a table block has merit. I'm not proposing that we change the intent, only the scope of the content that is treated as part of that inferred table. What I'd like to propose is that, in addition to the current exit points***, an inferred table is exited upon the first content (inline, block, or text, including the table start tag itself, but excluding whitespace) encountered within the block but outside of any TBODY, THEAD, TFOOT, or TR sub-blocks.
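To make the proposed exit rule concrete, here is a minimal sketch in C (the language Tidy is written in). It is a toy working on a hypothetical, pre-tokenized stream and is NOT Tidy's real lexer or parser: the Token type, tidy_sketch(), and the in_inferred flag (standing in for the lexer state flag this message proposes) are all invented for illustration. Only row-level start tags trigger an inferred table here, and of the existing exit points only end of input is modelled; stray end tags are simply passed through.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified token model; NOT Tidy's real lexer types. */
typedef enum { TOK_START, TOK_END, TOK_TEXT } TokKind;

typedef struct {
    TokKind kind;
    const char *text; /* tag name for TOK_START/TOK_END, character data for TOK_TEXT */
} Token;

/* The sub-blocks inside which content stays part of the inferred table. */
static bool is_row_container(const char *name)
{
    return strcmp(name, "tr") == 0 || strcmp(name, "tbody") == 0 ||
           strcmp(name, "thead") == 0 || strcmp(name, "tfoot") == 0;
}

/* Whitespace-only text never triggers the exit. */
static bool is_blank(const char *s)
{
    for (; *s != '\0'; s++)
        if (!isspace((unsigned char)*s))
            return false;
    return true;
}

/* Append a, b and c to out, asserting that the buffer is large enough. */
static void append(char *out, size_t cap, const char *a, const char *b, const char *c)
{
    assert(strlen(out) + strlen(a) + strlen(b) + strlen(c) < cap);
    strcat(out, a);
    strcat(out, b);
    strcat(out, c);
}

/* Serialize toks into out, inferring <table> before a stray row-level start tag
   and closing that inferred table at the first non-blank content found outside
   any TBODY/THEAD/TFOOT/TR (the proposed extra exit point). */
void tidy_sketch(const Token *toks, size_t n, char *out, size_t cap)
{
    bool in_inferred = false; /* hypothetical lexer flag: table opened by an inferred <table> */
    int row_depth = 0;        /* open tr/tbody/thead/tfoot while in_inferred */
    int explicit_depth = 0;   /* open author-written <table> elements */

    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const Token *t = &toks[i];

        /* Proposed exit point: text or a non-row start tag (including <table>)
           at table level ends the inferred table instead of being hoisted. */
        if (in_inferred && row_depth == 0) {
            bool is_content = (t->kind == TOK_TEXT && !is_blank(t->text)) ||
                              (t->kind == TOK_START && !is_row_container(t->text));
            if (is_content) {
                append(out, cap, "</table>", "", "");
                in_inferred = false;
            }
        }

        switch (t->kind) {
        case TOK_START:
            if (is_row_container(t->text)) {
                if (!in_inferred && explicit_depth == 0) {
                    append(out, cap, "<table>", "", ""); /* inject the missing start tag */
                    in_inferred = true;
                }
                if (in_inferred)
                    row_depth++;
            } else if (strcmp(t->text, "table") == 0) {
                explicit_depth++;
            }
            append(out, cap, "<", t->text, ">");
            break;
        case TOK_END:
            if (in_inferred && row_depth > 0 && is_row_container(t->text))
                row_depth--;
            else if (strcmp(t->text, "table") == 0 && explicit_depth > 0)
                explicit_depth--;
            append(out, cap, "</", t->text, ">");
            break;
        case TOK_TEXT:
            append(out, cap, t->text, "", "");
            break;
        }
    }

    if (in_inferred) /* end of input remains an exit point */
        append(out, cap, "</table>", "", "");
}
```

Fed a tokenized version of the T1/T2 example, the sketch emits </table> before "This is content after T1" instead of hoisting that text ahead of the table. Because an explicit <table> start tag also counts as content, rows that run straight into an author-written table are not merged into it, while whitespace between rows leaves the inferred table open.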
Using the example from above, the "Tidy"d output would become:

    <body>
    <table>
    <tr><td>T1 - R1 - C1</td><td>T1 - R1 - C2</td></tr>
    </table>
    This is content after T1
    <table>
    <tr><td>T2 - R1 - C1</td><td>T2 - R1 - C2</td></tr>
    </table>
    This is content after T2

The Results:

The most noticeable and intended results are minimal content reordering and the prevention of merging implied/inferred tables with other tables (whether properly declared or implied/inferred). Tidy's general goal is to preserve the author's original intent while correcting flawed HTML. By minimizing content reordering, limiting the scope of implied tables, and preventing table merges, this change would keep Tidy more in tune with that goal than the current behavior does.

As a side note, I have not reviewed the exit points in other areas of the code where inferred elements are inserted. It is possible that those areas need to be reviewed for similar behavioral changes as well.

The code:

I think this change would most likely require a new state flag in the lexer to denote that a block was entered via the insertion of an inferred start element. This would probably allow for the cleanest set of adjustments to the various exit points for any/all blocks.

Footnotes:

* Block/inline elements and/or text data that appear within a table block but outside of any th/td/caption blocks.

** When combined with issue #1331849 [http://tidy.sf.net/issue/1331849], once an inferred table is entered the parser/lexer potentially never exits table processing until the end of the file is reached, and all tables from the inferred one onward could be merged into a single table placed at the end of the output.
Using the example, the "Tidy"d output becomes:

    <body>
    This is content after T1
    This is content after T2
    <table>
    <tr><td>T1 - R1 - C1</td><td>T1 - R1 - C2</td></tr>
    <tr><td>T2 - R1 - C1</td><td>T2 - R1 - C2</td></tr>
    </table>

*** See related issues #1316307 [http://tidy.sf.net/issue/1316307] and #1316258 [http://tidy.sf.net/issue/1316258] for problems with the exit points of the current table/row Parse functionality, and suggested changes to correct them.
Received on Thursday, 20 October 2005 14:58:58 UTC