[Bug 5031] Doctype detection fails if root element includes non "word" character

http://www.w3.org/Bugs/Public/show_bug.cgi?id=5031

           Summary: Doctype detection fails if root element includes non
                    "word" character
           Product: Validator
           Version: 0.8.1
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: check
        AssignedTo: ot@w3.org
        ReportedBy: ot@w3.org
         QAContact: www-validator-cvs@w3.org


The doctype detection routine in preparse_doctype() has the following regexp to
detect FPI and SI:

m(<!DOCTYPE\s+(\w+)\s+(?:PUBLIC|SYSTEM)\s+...

The first group, (\w+), captures the name of the document type, which has to
match the root element
(ref: http://www.w3.org/TR/xml/#vc-roottype ).
However, \w+ is too restrictive, as the root element name can (among other
characters) contain a dash or a dot
(ref: http://www.w3.org/TR/xml/#IDANQDS ).

This partially breaks detection of the doctype for languages whose root
element name includes characters outside the Perl "word" class (alphanumerics
plus _).
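
The failure can be reproduced outside the validator. The sketch below is
Python rather than the validator's Perl, and uses only the part of the regexp
quoted above (up to PUBLIC|SYSTEM); the rest of the real pattern is not
reproduced. The "my-root.v2" doctype and the RELAXED pattern are hypothetical:
RELAXED is an ASCII-only approximation of an XML Name, given as one possible
direction for a fix, not as the actual patch.

```python
import re

# Fragment of the pattern as quoted in this report (truncated after
# PUBLIC|SYSTEM). Python's \w, like Perl's, does not match "-" or ".".
CURRENT = re.compile(r'<!DOCTYPE\s+(\w+)\s+(?:PUBLIC|SYSTEM)\s+')

# Hypothetical relaxed pattern: ASCII-only approximation of an XML Name
# (letter, "_" or ":" first, then also digits, "." and "-").
# The full XML Name production allows many more characters.
RELAXED = re.compile(
    r'<!DOCTYPE\s+([A-Za-z_:][A-Za-z0-9_:.-]*)\s+(?:PUBLIC|SYSTEM)\s+'
)


def doctype_name(pattern, text):
    """Return the captured document type name, or None if detection fails."""
    match = pattern.search(text)
    return match.group(1) if match else None


# A doctype whose root element name contains only "word" characters.
html = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')

# Made-up doctype whose root element name contains a dash and a dot.
dashed = '<!DOCTYPE my-root.v2 SYSTEM "my-root.dtd">'

print(doctype_name(CURRENT, html))    # html
print(doctype_name(CURRENT, dashed))  # None: \w+ stops at "-", so \s+ fails
print(doctype_name(RELAXED, dashed))  # my-root.v2
```

With the quoted pattern, the name is not merely truncated: the whole match
fails, because \s+ cannot match the "-" that follows "my".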

Received on Tuesday, 11 September 2007 06:36:01 UTC