
S::P::O and temp files

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sun, 12 Sep 2004 05:45:43 +0200
To: public-qa-dev@w3.org
Message-ID: <4143c446.79888493@smtp.bjoern.hoehrmann.de>

Hi,

  Another issue that needs to be resolved before we can use S::P::O
(SGML::Parser::OpenSP) is the creation of temporary files. While OpenSP
generally supports doing IO on strings, that implementation is unusable
because it consumes hundreds of times the memory the string itself
occupies. OpenSP also supports reading from file descriptors, but this is
not portable: on my Win32 system, for example, Perl and the XS/OpenSP
code would use different C runtimes, and the file descriptor table is
local to each runtime. That leaves two approaches: rely on temporary file
names everywhere, or optimize by reading from <OSFD>s on systems where
that is supported.

  use strict;
  use warnings;

  use SGML::Parser::OpenSP qw();
  use File::Temp qw();
  use Fcntl qw();
  use Data::Dumper qw(Dumper);

  our $SUPPORTS_OSFD_READING = 0;

  # high security on systems that support it
  File::Temp->safe_level(File::Temp::HIGH);

  # new parser
  my $p = SGML::Parser::OpenSP->new;

  # minimal handler class that dumps start_element events
  sub x::new { bless {}, shift }
  sub x::start_element { print Dumper(\@_) }

  $p->handler(x->new);

  # the html to parse
  my $html = "<!DOCTYPE html []><p>...";

  # create temp file (deleted again when the object is destroyed)
  my $fh = File::Temp->new();

  # store content and make sure it actually reaches the file
  print $fh $html or die "Can't write to temp file: $!\n";
  $fh->flush;

  # seek to start
  seek $fh, 0, 0;

  if ($SUPPORTS_OSFD_READING)
  {
      $p->parse_file("<OSFD>" . fileno($fh));
  }
  else
  {
      # not all systems have F_SETFD; calling the constant dies where
      # the platform does not define it (parens keep 'use strict' happy)
      my $have_setfd = eval { Fcntl::F_SETFD(); 1 };

      if ($have_setfd)
      {
          fcntl($fh, Fcntl::F_SETFD(), 0)
            or die "Can't clear close-on-exec flag on temp fh: $!\n";
      }

      $p->parse_file($fh->filename);
  }

  # destroying the File::Temp object removes the temp file
  undef $fh;
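Instead of hard-coding $SUPPORTS_OSFD_READING, the flag could be derived from the platform Perl reports in $^O. This is a minimal sketch assuming Win32 ('MSWin32') is the only platform where Perl and OpenSP cannot share descriptors; the post only establishes the Linux and Win32 cases, and supports_osfd_reading is a hypothetical helper, not part of S::P::O.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SGML::Parser::OpenSP: guess whether a
# descriptor from Perl's fileno() can be handed to OpenSP as <OSFD>n.
# Assumption: on Win32 Perl and the XS/OpenSP code may use different C
# runtimes with separate descriptor tables, so answer 0 there; on all
# other platforms (e.g. 'linux') assume a shared table and answer 1.
sub supports_osfd_reading {
    my ($os) = @_;
    $os = $^O unless defined $os;    # default to the running platform
    return $os eq 'MSWin32' ? 0 : 1;
}

our $SUPPORTS_OSFD_READING = supports_osfd_reading();
```

Keeping the platform name as an optional argument makes the decision easy to check without running on each system.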

With $SUPPORTS_OSFD_READING false this would work on both Linux and
Win32; with $SUPPORTS_OSFD_READING true it should work on Linux. Are
there any major issues with this approach or implementation?

Received on Sunday, 12 September 2004 03:46:28 GMT
