[check] strange usage of $File->{Version}

Terje, all.

I'd appreciate it if we could figure out what $File->{Version} actually
is in check.

In most cases, it's used as "the best name we have for the document
type". 

e.g.:
[[
  # Set Version to be the FPI initially.
  $File->{Version} = $File->{DOCTYPE};
]] -- check, lines 749-750, sub parse()

Eventually, it holds the "pretty" version of the document type:
[[
# Get the pretty text version of the FPI if a mapping exists.
if (my $prettyver = $CFG->{Types}->{$File->{Version}}->{Display}) {
  $File->{Version} = $prettyver;
}
]] -- check, lines 779 - 782, sub parse()

and that's what is passed to the templates:
[[
  if (! $File->{Doctype} and ($File->{Version} eq 'unknown' or $File->{Version} eq 'SGML')) {
      $T->param(file_version => '(no Doctype found)');
  }
  else {
    $T->param(file_version => $File->{Version});
  }
]] -- check, lines 848-853, sub parse()

Fine, so $File->{Version} is basically $File->{DOCTYPE}, later
prettified for output. Dubious choice of name notwithstanding, that's
fine. 

Except that it also seems to be used to hold the content of the
"version" attribute of the root element, "sniffed" from the ESIS:
[[
  # Extract any version attribute from the ESIS.
  for (@{$File->{ESIS}}) {
    no warnings 'uninitialized';
    next unless /^AVERSION CDATA (.*)/i;
    if ($1 =~ '-//W3C//DTD (SGML|XML) Fallback//EN') {
      $File->{Tentative} |= (T_ERROR | T_FALL);
      my $dtd = $1 eq 'SGML' ? 'HTML 4.01 Transitional' : 'XHTML 1.0 Strict';
      &add_warning('W09', { W09_dtd => $dtd });
    }
    $File->{Version} = $1;
  }
]] -- check, lines 756-763, sub parse()

And while that bit of information can probably be useful (for what?), I
don't think it's equivalent to the FPI/SI. Yet the code treats the two
as interchangeable:
[[
  $File->{Version} = $File->{DOCTYPE} unless $File->{Version};
]] -- check, line 775, sub parse()
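
To make the sequence concrete, here is a minimal standalone sketch that replays the four assignments quoted above in the order they appear in parse(). It is not the actual check code: the %types table stands in for $CFG->{Types}, and replay_version(), the sample FPI mapping and the sample ESIS lines are all made up for illustration. The W09 fallback handling is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $CFG->{Types}; the real table lives in the
# validator configuration.
my %types = (
  '-//W3C//DTD SVG 1.1//EN' => { Display => 'SVG 1.1' },
);

# Simplified replay of the assignments to Version; returns the final value.
sub replay_version {
  my ($doctype, @esis) = @_;
  my %file = (DOCTYPE => $doctype, ESIS => \@esis);

  # 1. Version starts out as the FPI.
  $file{Version} = $file{DOCTYPE};

  # 2. Any version attribute found in the ESIS overwrites it.
  for my $line (@{ $file{ESIS} }) {
    next unless $line =~ /^AVERSION CDATA (.*)/i;
    $file{Version} = $1;
  }

  # 3. Fall back to the FPI only if Version ended up empty.
  $file{Version} = $file{DOCTYPE} unless $file{Version};

  # 4. Prettify, if a mapping exists for whatever Version now holds.
  my $entry = $types{ $file{Version} };
  $file{Version} = $entry->{Display} if $entry && $entry->{Display};

  return $file{Version};
}

# With a version attribute on the root element, the FPI is lost.
print replay_version('-//W3C//DTD SVG 1.1//EN', 'AVERSION CDATA 1.1', '(SVG'), "\n";
# Without one, the pretty name is found as intended.
print replay_version('-//W3C//DTD SVG 1.1//EN', '(SVG'), "\n";
```

Under these assumptions the first call prints "1.1" and the second prints "SVG 1.1": once the attribute value has replaced the FPI, the lookup in step 4 no longer has a key that matches.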

Questions:
(mostly for Terje, since a large part of the quoted code comes from 
http://dev.w3.org/cvsweb/validator/httpd/cgi-bin/check#rev1.200.2.18 
but if you have an idea or any answer, please chime in)

- is my analysis correct? Did I miss or misunderstand anything?
- is that an honest mistake, due to some confusion in variable names, or
  is it on purpose?
  * if the former, I think we could use $File->{Version} only for the
  version attribute info, and use a new variable for the "Document
  Type". And document them in the code...
  * if the latter, please explain. It may just be part of the design I
  don't get, or a spec I don't know well enough, I don't know. But it
  doesn't make sense to me at this point. 
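
A rough sketch of what the first option could look like, assuming the two pieces of information are kept in separate keys. The key name DisplayType, the split_fields() helper and the %types table (standing in for $CFG->{Types}) are hypothetical, and the W09 fallback handling is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $CFG->{Types}.
my %types = (
  '-//W3C//DTD SVG 1.1//EN' => { Display => 'SVG 1.1' },
);

# Keep the version attribute and the document type in separate fields.
sub split_fields {
  my ($doctype, @esis) = @_;
  my %file = (DOCTYPE => $doctype);

  # Version: only the root element's version attribute, if there is one.
  for my $line (@esis) {
    next unless $line =~ /^AVERSION CDATA (.*)/i;
    $file{Version} = $1;
  }

  # DisplayType (hypothetical key): derived from the FPI alone,
  # prettified when a mapping exists, otherwise the raw FPI.
  my $entry = defined $doctype ? $types{$doctype} : undef;
  $file{DisplayType} = ($entry && $entry->{Display}) || $doctype;

  return \%file;
}

my $file = split_fields('-//W3C//DTD SVG 1.1//EN', 'AVERSION CDATA 1.1', '(SVG');
print "type: $file->{DisplayType}, version attribute: $file->{Version}\n";
```

The templates would then be given DisplayType, and the version attribute could no longer leak into the reported document type.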
  
Note that this is, as far as I can tell, the reason for the "Valid 1.1"
bug: http://lists.w3.org/Archives/Public/www-validator/2005May/0134.html
(the content of the version attribute of <svg> is used instead of the
pretty version of the detected FPI). Therefore, I don't think we can
conveniently file this under "Resolved: LATER".

Thanks,
-- 
olivier

Received on Friday, 3 June 2005 00:07:07 UTC