Re: Validator fails to detect invalid document

[ NOTE WELL: The below is "FYI" and not arguments in the discussion! ]



Ian Hickson <ian@hixie.ch> wrote:

>On Mon, 11 Nov 2002, Terje Bless wrote:
>
>>But see
>><URL:http://validator.w3.org:8001/check?uri=http://2onsdag.dk/index_wrong.php>.
>>Depending on feedback, this may or may not change before release.
>
>If you leave this enabled, please make it _abundantly_ clear that this
>is not technically correct. Better would be to add an "uber strict" mode
>which checks this as well.
>
>Note that this will also prevent users from omitting #FIXED or other
>non-#IMPLIED attributes, it will disallow the well-supported literal
>delimiter omission feature (foo=bar vs foo="bar"), and it will disallow
>the partly-supported attribute name omission feature (<h1 center> vs <h1
>align="center">).
>
>Now, it would also disallow the commonly misunderstood <foo/ syntax
>(with its resulting trailing '>' in the output).

Ah, I was not entirely clear about what we do[0].

What's going on here is that we've turned on the "-wunclosed" flag to
OpenSP. This has the effect of changing the SGML Declaration used from:

...
  FEATURES
    MINIMIZE
      SHORTTAG YES
...

to approximately

...
  FEATURES
    MINIMIZE
      SHORTTAG
        STARTTAG UNCLOSED NO
        ENDTAG   UNCLOSED NO
...

with the rest of the FEATURES MINIMIZE SHORTTAG settings left the way they
were originally ("FEATURES MINIMIZE SHORTTAG YES"). IOW, it will allow all
your examples above except the missing ">" on a start or end tag.
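
To make "unclosed" concrete, here is a rough Python sketch of the one
construct this setting rejects: a start or end tag that runs straight into
the next "<" without its own ">". It is only a regex heuristic for
illustration, not what OpenSP does internally; the names UNCLOSED_TAG and
find_unclosed_tags are made up for this example, and it ignores comments,
marked sections, and quoted attribute values containing "<" or ">".

```python
import re
from typing import List

# Heuristic only: a tag opener ("<name" or "</name") whose text reaches the
# next "<" before any ">" closes it. A real SGML parser is needed for
# comments, marked sections, and "<" or ">" inside quoted attribute values.
UNCLOSED_TAG = re.compile(r"</?[A-Za-z][^<>]*(?=<)")


def find_unclosed_tags(markup: str) -> List[str]:
    """Return the text of each tag that is not terminated by its own '>'."""
    return [match.group(0) for match in UNCLOSED_TAG.finditer(markup)]


if __name__ == "__main__":
    # "<p" and "</em" are unclosed; "<em>" and "</p>" are fine.
    print(find_unclosed_tags("<p<em>text</em</p>"))
    # Unquoted values and omitted attribute names are not flagged.
    print(find_unclosed_tags('<h1 center><a href=foo>x</a></h1>'))
```

The second call shows that the other minimizations from the quoted examples
(unquoted values, omitted attribute names) are left alone by this check.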

Ideally though, the SGML Decl would list:

...
  FEATURES
    MINIMIZE
      SHORTTAG
        STARTTAG
          EMPTY    NO  -- outlaws "<>" -- 
          UNCLOSED NO  -- outlaws "<foo" --
          NETENABL NO  -- outlaws "<p/text<em/more text/ nested/" --
        ENDTAG
          EMPTY    NO  -- outlaws "</>" -- 
          UNCLOSED NO  -- outlaws "</foo" --
        ATTRIB
          DEFAULT  YES -- allows defaulted attributes --
          OMITNAME NO  -- possibly "YES" for compat; allows "<gi attr>" --
          VALUE    NO  -- "YES" would allow unquoted attrs; "<gi att=val>" --
...

and, frankly, I consider it a bug in the HTML Rec that it doesn't; that's
more in line with how HTML appears to have been intended to work, and only
ignorance and NIH (or possibly the Rec predating WebSGML) kept it from
being specified that way in the first place.
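
For reference, the constructs named in the comments of that declaration can
be spotted with a few regexes. This is a minimal Python sketch, assuming
simple input with no comments or marked sections; SHORTTAG_PATTERNS and
find_shorttag_uses are hypothetical names for this example, and the labels
simply reuse the keywords from the declaration above.

```python
import re
from typing import Dict, List

# Heuristic patterns keyed by the declaration keywords they illustrate.
# They do not understand comments, marked sections, or quoted values.
SHORTTAG_PATTERNS = {
    "STARTTAG EMPTY": re.compile(r"<>"),                          # "<>"
    "STARTTAG NETENABL": re.compile(r"<[A-Za-z][A-Za-z0-9.-]*/"),  # "<p/"
    "ENDTAG EMPTY": re.compile(r"</>"),                           # "</>"
}


def find_shorttag_uses(markup: str) -> Dict[str, List[str]]:
    """Map each keyword to the matching substrings found in the markup."""
    return {
        name: pattern.findall(markup)
        for name, pattern in SHORTTAG_PATTERNS.items()
    }


if __name__ == "__main__":
    print(find_shorttag_uses("<p/text<em/more text/ nested/"))
    print(find_shorttag_uses("<b>bold</>"))
```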



[0] -  Quite naturally, since I don't really understand this subject
       matter in that much detail. But with the gracious help of
       Daniel Biddle on IRC, the details are beginning to emerge.

Received on Monday, 11 November 2002 13:48:25 UTC