Proposed change for Exiting from "Inferred" Tables

I'm not 100% sure whether this falls into the "bug" category, so I'll start
this out on the lists and see where it goes from there...

The Issue:
I think there needs to be a change to the behavior of Tidy with respect to
implied/inferred tables (those tables where Tidy injects the "missing" table
block start element).  Currently, once an inferred table is started it is
treated just like a regular table: any non-table content* is moved before the
table, and the standard exit points determine when the end of the table has
been reached.  For inferred tables, this can result in significant reordering
of content and can merge two tables into one, altering the layout of both**.

For example, the following HTML snippet:

    <body>
    <tr><td>T1 - R1 - C1</td><td>T1 - R1 - C2</td></tr>
    This is content after T1
    <table>
    <tr><td>T2 - R1 - C1</td><td>T2 - R1 - C2</td></tr>
    </table>
    This is content after T2

would be "Tidy"d to:

    <body>
    This is content after T1
    <table>
    <tr><td>T1 - R1 - C1</td><td>T1 - R1 - C2</td></tr>
    <tr><td>T2 - R1 - C1</td><td>T2 - R1 - C2</td></tr>
    </table>
    This is content after T2

While this is a very simple sample, it does show the potential issue.  Imagine
that "This is content after T1" were instead several hundred lines of text, or
that table 1 had a narrow first column and a wide second column while table 2
had the inverse.

The Proposed Behavior:
I think what Tidy is attempting to do by injecting the "missing" table element
and processing as a table block has merit.  I'm not proposing that we change
the intent - only the scope of the content that is treated as part of that
inferred table.

What I'd like to propose is that, in addition to the current exit points***, an
"inferred table" is exited upon the first content (inline/block/text, including
the table start tag itself, but excluding whitespace) encountered within the
block but outside of any TBODY, THEAD, TFOOT, or TR sub-blocks.

Using the example from above, the "Tidy"d output would become:

    <body>
    <table>
    <tr><td>T1 - R1 - C1</td><td>T1 - R1 - C2</td></tr>
    </table>
    This is content after T1
    <table>
    <tr><td>T2 - R1 - C1</td><td>T2 - R1 - C2</td></tr>
    </table>
    This is content after T2
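
To make the proposed rule concrete, here is a minimal sketch in C.  It is not
Tidy code: the TokKind enum and both function names are hypothetical and
invented for illustration, and a whole row or row group is collapsed into a
single token for brevity.  It only shows the decision "does this item end an
inferred table?" as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified token kinds; these are NOT Tidy's node types. */
typedef enum {
    TOK_WHITESPACE,    /* whitespace-only text */
    TOK_TEXT,          /* text containing non-whitespace characters */
    TOK_TABLE_START,   /* an explicit <table> start tag */
    TOK_ROW,           /* a complete <tr>...</tr>, treated as one unit */
    TOK_SECTION,       /* a complete thead/tbody/tfoot block */
    TOK_OTHER_ELEMENT  /* any other inline or block element */
} TokKind;

/* Proposed additional exit test.  Only inferred tables are affected;
 * explicit tables keep the existing exit points (not modelled here). */
static bool exits_inferred_table(bool table_is_inferred, TokKind kind)
{
    if (!table_is_inferred)
        return false;

    switch (kind) {
    case TOK_WHITESPACE:   /* whitespace never ends the table */
    case TOK_ROW:          /* rows and row groups stay inside it */
    case TOK_SECTION:
        return false;
    default:               /* text, <table>, or any other element */
        return true;
    }
}

/* Count the rows that would remain in an inferred table that starts at
 * toks[0], stopping at the first item that triggers the proposed exit. */
static size_t rows_in_inferred_table(const TokKind *toks, size_t n)
{
    size_t rows = 0;

    assert(toks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (exits_inferred_table(true, toks[i]))
            break;
        if (toks[i] == TOK_ROW)
            rows++;
    }
    return rows;
}
```

Fed the example above as row, whitespace, text, table start, row, the scan
stops at the text item, so only the T1 row stays in the inferred table and the
text and second table keep their original positions.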

The Results:
The most noticeable and intended results are the minimization of content
reordering and the prevention of merging implied/inferred tables with other
tables (either properly stated or implied/inferred).  Tidy's goal in general is
to preserve the author's original intent while correcting flawed HTML.  By
minimizing content reordering, limiting the scope of implied tables, and
preventing table merges, this proposed change would keep Tidy more in tune
with that goal than the current behavior does.

As a side note, I have not reviewed exit points for other areas of the code
where "inferred" elements are inserted.  It is possible that these other areas
need to be reviewed as well for similar behavioral changes.

The code:
I think this change would most likely require a new state flag in the lexer to
denote entry into a block via insertion of an inferred block start element.
This would probably allow for the cleanest set of adjustments to the various
exit points for any/all blocks.

Footnotes:
* Block/inline elements and/or text data that appear within a table block but
outside of any th/td/caption blocks.

** When combined with issue #1331849 [http://tidy.sf.net/issue/1331849], once
an inferred table is entered the parser/lexer potentially never exits table
processing until the end of the file is reached, and all tables from the
inferred one onward could be merged into a single table placed at the end of
the output.  Using the example, the "Tidy"d output becomes:

    <body>
    This is content after T1
    This is content after T2
    <table>
    <tr><td>T1 - R1 - C1</td><td>T1 - R1 - C2</td></tr>
    <tr><td>T2 - R1 - C1</td><td>T2 - R1 - C2</td></tr>
    </table>

*** See related issues #1316307 [http://tidy.sf.net/issue/1316307] and #1316258
[http://tidy.sf.net/issue/1316258] for problems with the exit points of the
current table/row Parse functionality and suggested changes to correct them.




	
		

Received on Thursday, 20 October 2005 14:58:58 UTC