Re: SGML Id Tool (was: suggest validator prefer URI to FPI)

* Sean B. Palmer wrote:
>Though SP doesn't include external identifiers in its ESIS output, it
>does provide access to them through its generic API, making it possible
>to build a tool on top of it in C++ that outputs just the PubID and
>SysID (if present) on two consecutive lines. And I've done just that.

Using Perl SAX2 (and XML::SAX::Expat), you could do:

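  #!perl
  # Print the public and system identifiers from an XML document's
  # DOCTYPE declaration, using a Perl SAX2 handler.
  package PrintXmlId;
  use strict;
  use warnings;
  use base qw(XML::SAX::Base);
  
  sub start_dtd
  {
      my $self = shift;
      my $dtd = shift;
      
      # Depending on the driver, the keys may be present with undef
      # values when an identifier is missing, so test definedness
      # rather than existence.
      printf "PUBLIC: %s\n", $dtd->{PublicId}
        if defined $dtd->{PublicId};
      printf "SYSTEM: %s\n", $dtd->{SystemId}
        if defined $dtd->{SystemId};
  }
  
  package main;
  use XML::SAX::Expat;
  
  die "Usage: $0 file.xml\n" unless @ARGV;
  XML::SAX::Expat->new(Handler=>PrintXmlId->new)->parse_uri(shift);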

That works for XML documents. Our current plan is to write a Perl wrapper
for OpenSP's generic interface that is compatible with Perl SAX2; the
PrintXmlId handler would then work for all SGML and XML documents. Using
just XML::Parser it is even simpler; its Doctype handler receives the
system identifier in $_[2] and the public identifier in $_[3]:

  % perl -MXML::Parser -e 'XML::Parser->new(Handlers=>{Doctype=>sub{
    printf qq(PUBLIC: %s\n), $_[3] if defined $_[3];
    printf qq(SYSTEM: %s\n), $_[2] if defined $_[2];
    }})->parsefile(shift)' ...

However, whatever special handling the Validator needs, it is best
implemented as an XML::SAX::Base-based handler, so that it can later be
reused with SGML documents through the planned OpenSP wrapper.

>Note that SP may raise various errors along the way that you have to
>redirect off to /dev/null as shown.

You can avoid that by calling egp->inhibitMessages(true); on your
EventGenerator before running it.

>Whatever the tool used, I suggest you simply peek at the PubId and
>SysId, and if they don't match raise some kind of achingly obvious
>"Fatal Warning" to users: I don't believe that defaulting to either ID
>is acceptable given that inconsistency will never be intentional on the
>behalf of the user or their authoring tool.

Such inconsistency can be intentional: http://www.w3.org/TR/xhtml1-schema/
does exactly that (unless you consider
http://www.w3.org/2002/08/xhtml/xhtml1-strict.dtd to match the XHTML 1.0
Strict document type definition).
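To make the trade-off concrete, a check along those lines might look
like the minimal Perl sketch below. It assumes a hypothetical,
hand-maintained table mapping known public identifiers to the system
identifier conventionally paired with them, compares by exact string
match, and returns a warning message instead of dying, since a mismatch
like the one in the XHTML schema note may be deliberate. check_ids and
%KNOWN_SYSID are illustrative names, not part of the Validator or any
library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table: public identifier => the system identifier it is
# conventionally paired with. Only two entries are shown here; a real
# tool would need a fuller catalog.
my %KNOWN_SYSID = (
    '-//W3C//DTD XHTML 1.0 Strict//EN'
        => 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd',
    '-//W3C//DTD HTML 4.01//EN'
        => 'http://www.w3.org/TR/html4/strict.dtd',
);

# Returns a warning string if the pair disagrees with the table.
# Returns undef if either identifier is missing, the public identifier
# is not in the table, or the pair matches exactly.
sub check_ids {
    my ($pubid, $sysid) = @_;
    return unless defined $pubid && defined $sysid;
    my $expected = $KNOWN_SYSID{$pubid};
    return unless defined $expected;
    return if $sysid eq $expected;
    return "System identifier <$sysid> differs from the one usually "
         . "paired with \"$pubid\" <$expected>";
}

# Example: the pairing discussed above is reported, not rejected.
my $msg = check_ids('-//W3C//DTD XHTML 1.0 Strict//EN',
    'http://www.w3.org/2002/08/xhtml/xhtml1-strict.dtd');
print "$msg\n" if defined $msg;
```

A real check would probably also need some URI normalization (relative
versus absolute system identifiers, for instance) before comparing, and
the caller decides whether the message is shown as a warning or an error.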

Received on Saturday, 21 August 2004 22:27:07 UTC