RE: Betsie / Gateway comparison

Hi,

> You've hard-coded a lot of details into Betsie...

You're right. I have. It's not good and it needs to be fixed. Thank you for
pointing this out.
 
> It therefore needs to be able to cope with just about anything the web
> can throw at it - and if the web throws something at it that it can't
> cope with, I have to fix it.

While I take your point about the differences between the two gateways, I
think the need to cope with a wide range of dodgy code applies to both your
gateway and Betsie. There has to be a limit somewhere on what either program
is prepared to take on: however lenient a gateway is with dodgy HTML, there
will always be a set of pages that it cannot fix up for accessibility. There
will also always be a large tranche of problems that require editorial fixes
rather than automated code munging. I'm not sure what the point is of moving
heaven and earth to get the code to make sense when the actual content of
the page still doesn't, and never will until it gets rewritten by a human
who understands what an ALT attribute is and knows that one cannot assume
the user gets what the designer of the page thinks they get. ("On the left
of your page you will see a large button marked 'Not If You're Blind You
Won't, Mate!'")

But thank you for pointing out five things wrong with Betsie.

#1 

> ... a style sheet (untouched by Betsie and hence can override all font
> choices...

Style sheets. Oops. Ta.

#2

> spaces and new lines at the start of an APPLET tag (Betsie processes the
> applets before it remove the spaces, so it won't remove that applet)

Argh! Doh! Ta, again, for pointing this out.
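
For what it's worth, the fix is presumably just a matter of ordering the two
passes. Here is a rough sketch of the idea, in Python rather than Betsie's
Perl; the function names are made up for illustration and this is not
Betsie's actual code. It assumes that anything shaped like '<', optional
whitespace, then a letter was meant to be a tag.

```python
import re

def tidy_tag_starts(html: str) -> str:
    """Remove spaces and newlines between '<' (or '</') and the tag name."""
    # Assumption: '<' + whitespace + letter was intended as a tag.
    return re.sub(r"<\s*(/?)\s*(?=[A-Za-z])", r"<\1", html)

def strip_applets(html: str) -> str:
    """Delete every <applet ...>...</applet> element, ignoring case."""
    return re.sub(r"<applet\b.*?</applet\s*>", "", html,
                  flags=re.IGNORECASE | re.DOTALL)

def clean(html: str) -> str:
    # Order matters: tidy first, so "<\n  APPLET" is seen as "<APPLET".
    return strip_applets(tidy_tag_starts(html))
```

Run the other way round, the applet pass never sees the awkwardly spaced
tag, which is exactly the bug described above.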

#3

> an old-style comment (if the string --> does not occur in the document
> then you should treat > as a comment terminator), and the < and > signs
> within quotes in a tag (which may confuse your tag parsing).

Ug. Ouch.
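
To make sure I've understood both cases, here is a minimal sketch in Python
(not Betsie's Perl; both function names are hypothetical). The first finds
the real end of a tag by ignoring any '>' inside a quoted attribute value.
The second finds the end of a comment, falling back to a bare '>' when no
'-->' turns up; note that it only looks from the comment onwards, which is a
slight simplification of the "anywhere in the document" rule quoted above.

```python
def find_tag_end(html: str, start: int) -> int:
    """Index of the '>' closing the tag that opens at html[start].

    A '>' inside a quoted attribute value is skipped.
    Returns -1 if the tag is never closed.
    """
    quote = None
    for i in range(start + 1, len(html)):
        ch = html[i]
        if quote:
            if ch == quote:
                quote = None          # closing quote found
        elif ch in "\"'":
            quote = ch                # entering a quoted value
        elif ch == ">":
            return i
    return -1

def comment_end(html: str, start: int) -> int:
    """Index just past the comment that opens with '<!--' at html[start]."""
    close = html.find("-->", start + 4)
    if close != -1:
        return close + 3
    # Old-style comment: no '-->' follows, so the first '>' ends it.
    close = html.find(">", start + 4)
    return len(html) if close == -1 else close + 1
```

Unquoted attribute values containing an apostrophe would still confuse the
first function, so this is a starting point rather than a complete parser.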

#4

> It also contains the common idiom of using /TD to terminate a link; when
> Betsie removes the /TD, the link goes too far because it isn't aware of
> this.

And... yes, but it's not a common idiom, it's a common error. Mind you, it
would be good if Betsie could fix that too.
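
One way it might be done, sketched in Python rather than Betsie's Perl and
with a made-up function name: while dropping the table cell tags, remember
whether a link is still open, and if a /TD arrives before the /A does, emit
the missing /A in its place. The sketch assumes no '>' appears inside
attribute values.

```python
import re

# Matches opening or closing A and TD tags only (not ABBR, ADDRESS, etc.).
TOKEN = re.compile(r"<\s*(/?)\s*(a|td)\b[^>]*>", re.IGNORECASE)

def drop_td_closing_links(html: str) -> str:
    """Remove <td>/</td> tags, adding a missing </a> where </td> ends a link."""
    out = []
    pos = 0
    link_open = False
    for m in TOKEN.finditer(html):
        out.append(html[pos:m.start()])   # text before this tag
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()
        if name == "a":
            link_open = not closing
            out.append(m.group(0))        # keep link tags as they are
        elif closing and link_open:
            out.append("</a>")            # </td> was standing in for </a>
            link_open = False
        # any other <td> or </td> is simply dropped
    out.append(html[pos:])
    return "".join(out)
```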

#5

> Also, the link is to a graphics file, which Betsie is unaware of and 
> tries to redirect through itself.

Um.. doh! Again! Hadn't thought of that. (Not that it happens often).
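
A possible fix, sketched in Python rather than Betsie's Perl: check the path
of each link before rewriting it, and leave it alone if it looks like
something other than a page. The gateway address, the extension list and the
function name are all assumptions for illustration, not Betsie's real ones.

```python
from urllib.parse import quote, urlsplit

# Hypothetical gateway address; the real script's URL will differ.
GATEWAY = "http://gateway.example.org/cgi-bin/gateway.pl?url="

# Assumed list of extensions the gateway should not try to parse as HTML.
NON_HTML_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".wav", ".zip")

def rewrite_link(absolute_url: str) -> str:
    """Route a link through the gateway unless it points at a non-HTML file."""
    path = urlsplit(absolute_url).path.lower()
    if path.endswith(NON_HTML_EXTENSIONS):
        return absolute_url               # leave graphics etc. untouched
    return GATEWAY + quote(absolute_url, safe="")
```

Going by the extension is only a guess at the content type, of course; a
server can send anything under any name.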

> Not only is Perl an interpreted language, but the way you've done string
> parsing seems to introduce countless passes through the document.  For
> example, you pass right through the whole document removing surplus
> spaces from tags, and you pass through again to remove applets, and so
> on.  I don't know much Perl but this is probably a natural side-effect of
> the language.  If it's given a big document, that can mean quite a chunk
> of processing.  I take it the BBC have a good high-end server then!

Firstly, I cannot by any means claim to be the best Perl programmer in the
world. One of the luckiest, perhaps, in that I get to work in a very
supportive environment on projects that are clearly worthwhile. But I don't
know enough about Perl to be sure that the way I have written things in
Betsie is always the best way, and I am more or less certain that various
things could be tweaked to be a lot smoother and sleeker. And faster.

Clearly, you have already pointed out at least five or six potential bugs or
issues, which I have noted with thanks and will resolve ASAP. There's
another bug you missed, in fact, which is that you can point Betsie at
herself and she doesn't mind, and you can keep doing that until your browser
crashes with an over-long URL. This has to be fixed too. (My strong
suspicion is that you've already sorted that one out in yours.)
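
The check itself ought to be simple enough. A minimal sketch, in Python
rather than Betsie's Perl, with a hypothetical host name, script path and
function name: before fetching anything, refuse any target whose host and
path are the gateway's own.

```python
from urllib.parse import urlsplit

# Hypothetical values; substitute wherever the gateway actually lives.
GATEWAY_HOST = "gateway.example.org"
GATEWAY_PATH = "/cgi-bin/gateway.pl"

def points_at_gateway(target_url: str) -> bool:
    """True if the requested page is the gateway script itself."""
    parts = urlsplit(target_url)
    host = (parts.hostname or "").lower()
    return host == GATEWAY_HOST and parts.path.startswith(GATEWAY_PATH)
```

A request for which this returns True could be answered with an error page,
or unwrapped to the innermost URL, instead of being nested once more.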

In terms of multiple passes through the page, though: they're not
countless, no. But to implement the site-specific code manipulation stuff it
seemed necessary to have the whole page in hand, in order to check that
whatever is supposed to be there is there (it might not be) and then to
manipulate it. The idea was to do as little as possible in the 'countless
passes through the document' stage and as much as possible in the 'one last
line by line pass through the document' stage, although some of the 'delete
all instances of this' stuff seemed to sit better in the first stage than in
the second.

Thing is, though, it turns out that Larry Wall and the Perl people are gods
and geniuses and that the current Perl regular expressions functionality not
merely allows but positively encourages (it seems) scripts such as Betsie to
make countless passes through documents because.. you can, and it isn't too
slow as long as you start sending as soon as possible.. which Betsie does.
If there's a better way of doing all this stuff.. I need to learn C++ and
look at your code, Silas, don't I?

Meanwhile, as pages get larger, the Betsie version of a page seems to
download proportionately faster than the original page would have. This is
the thing: Betsie is fast enough. Obviously, yes, C++ is way faster... true.
But Perl is easier. :)

> By the way, I'm not sure what you were saying about Java security.  When
> it's absolutely necessary for me to get to Java applets, I can check the
> "enable applets" box in my gateway and it seems to work fine (it just
> re-directs the code URL to point back to the original page).

Um.. oh. It's quite possible that I'm wrong then, but I couldn't get it to
work. I'll have to look at how it was you got yours to do it. How *did* you
get yours to do it? I think I tried to do what you suggested above and it
didn't want to work. Then I just thought sod it and deleted the lot. It
would be nice to bring it back though, maybe... Hmm.

But thank you very much indeed for the analysis, the errors and the
suggestions,

Cheers etc.,

Wayne

Received on Friday, 26 February 1999 07:03:38 UTC