Re: S::P::O and temp files

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sun, 12 Sep 2004 13:18:20 +0200
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: public-qa-dev@w3.org
Message-ID: <41442e35.107007899@smtp.bjoern.hoehrmann.de>

* Bjoern Hoehrmann wrote:
>  Another issue that needs to be resolved in order to use S::P::O is the
>creation of temporary files. While OpenSP generally supports doing IO on
>strings, the implementation is unusable as it consumes hundreds of times
>the memory the string consumes. OpenSP also supports reading from file
>descriptors, but this is not portable, e.g. on my Win32 system Perl and
>the XS/OpenSP would use different runtimes and the file descriptor table
>is local to the runtime. Now there are two approaches, one is to rely on
>temporary file names and the other would be to optimize this on systems
>where reading from <OSFD>s is supported. 

This should have been something like

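  use strict;
  use warnings;

  use SGML::Parser::OpenSP qw();
  use File::Temp qw();
  use Data::Dumper qw(Dumper);

  # this would be a config setting or somesuch
  our $SUPPORTS_OSFD_READING = 0;

  # high security on systems that support it
  File::Temp->safe_level(File::Temp::HIGH());

  # new parser
  my $p = SGML::Parser::OpenSP->new;

  # minimal handler class that dumps the arguments of
  # every start_element event
  sub x::new           { bless {}, shift }
  sub x::start_element { print Dumper(\@_) }

  # register the handler
  $p->handler(x->new);

  # the html to parse
  my $html = "<!DOCTYPE html []><p>...";

  # create temp file, this would croak if it fails, so
  # there is no need for us to check the return value
  my $fh = File::Temp->new();

  # store content and flush it so OpenSP sees all of it
  print $fh $html;
  $fh->flush;

  if ($SUPPORTS_OSFD_READING) {
      # OpenSP reads via the descriptor and never needs the
      # name, so the file can be unlinked right away
      File::Temp::unlink0($fh, $fh->filename)
          or die "could not unlink temporary file\n";

      # seek to start
      seek $fh, 0, 0;
      $p->parse_file("<OSFD>" . fileno($fh));
  } else {
      # OpenSP opens the file by name, so it must not be
      # unlinked yet; File::Temp removes it once $fh goes
      # out of scope
      $p->parse_file($fh->filename);
  }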

which, I am certain, still contains some minor flaws...
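
The same steps could be wrapped in a reusable sub. This is only a sketch: parse_via_tempfile is a hypothetical helper name, not part of S::P::O, and it assumes the parser object offers the parse_file method used above. It does not call unlink0 at all; instead it keeps $fh in scope until parse_file returns, so the file name stays valid for the fallback branch, and File::Temp deletes the file when the sub exits.

```perl
use strict;
use warnings;
use File::Temp qw();

# Hypothetical helper (not part of SGML::Parser::OpenSP): write $text to a
# temporary file and hand it to $parser->parse_file, either as an OpenSP
# "<OSFD>n" storage object identifier or as a plain file name.
sub parse_via_tempfile {
    my ($parser, $text, $use_osfd) = @_;

    # File::Temp->new croaks on failure; since UNLINK defaults to true,
    # the file is removed when $fh goes out of scope at the end of this sub.
    my $fh = File::Temp->new();
    binmode $fh;
    print {$fh} $text or die "write to temporary file failed: $!\n";
    $fh->flush or die "flush of temporary file failed: $!\n";

    if ($use_osfd) {
        # Rewind so that a reader of the descriptor starts at offset 0.
        seek $fh, 0, 0 or die "seek failed: $!\n";
        return $parser->parse_file('<OSFD>' . fileno($fh));
    }

    # The file still exists under its name while we are in this sub.
    return $parser->parse_file($fh->filename);
}
```

Called as parse_via_tempfile($p, $html, $SUPPORTS_OSFD_READING), it would stand in for everything from the File::Temp->new call onwards.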
Received on Sunday, 12 September 2004 11:19:14 GMT
