A more general way to find ASCII Art would be to use statistical properties 
of English (or whatever language is in use).  For example, if you look at 
the frequency of letter pairs, some are relatively high like "th" and some 
are relatively low, like "mq".
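
To make the letter-pair idea concrete, here is a minimal Python sketch of the counting step. The function name and the sample sentence are just illustrations I made up, not anything from the technique document.

```python
from collections import Counter

def letter_pair_frequencies(text):
    """Return relative frequencies of adjacent letter pairs in text.

    Pairs that include a non-letter (space, digit, punctuation) are skipped.
    """
    text = text.lower()
    pairs = Counter(
        a + b
        for a, b in zip(text, text[1:])
        if a.isalpha() and b.isalpha()
    )
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pairs.items()}

# Hypothetical sample text, chosen only to show the effect.
sample = "the weather in the north is rather cold this month"
freqs = letter_pair_frequencies(sample)
print("th:", freqs.get("th", 0.0))
print("mq:", freqs.get("mq", 0.0))
```

On real text you would want a much larger sample before trusting the numbers.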

There are lots of refs on this.  It's a classic topic.  If you're into 
50's style Experimental Psychology, you can find references in any 
intermediate psych textbook that deals with "information theory".  Garner's 
a good author.  For the engineering inclined, check out elementary 
information theory textbooks.  Computer science fans can check out 
compression theory.  Cryptography devotees can check out elementary methods 
for substitution ciphers.  As you see, it's used all over the place.

And there are standard statistical tests, such as the chi-square 
goodness-of-fit test, to see if distributions match.  See any intermediate 
stat book.
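
One such test is Pearson's chi-square. Below is a rough Python sketch of just the statistic, with no p-value lookup. The `floor` parameter is my own crude assumption for handling pairs the reference never saw, and the example counts are invented.

```python
def chi_square_statistic(observed_counts, expected_probs, floor=1e-6):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected.

    observed_counts maps each pair to how often it occurred in the sample.
    expected_probs maps each pair to its probability in the reference.
    Pairs missing from the reference get the tiny probability `floor`
    (an assumption made here to avoid dividing by zero).
    """
    total = sum(observed_counts.values())
    if total == 0:
        return 0.0
    statistic = 0.0
    for pair in set(observed_counts) | set(expected_probs):
        expected = total * max(expected_probs.get(pair, 0.0), floor)
        observed = observed_counts.get(pair, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic

# Invented numbers, for illustration only.
reference = {"th": 0.5, "he": 0.3, "in": 0.2}
prose_counts = {"th": 48, "he": 31, "in": 21}
art_counts = {"--": 60, "+-": 40}

prose_stat = chi_square_statistic(prose_counts, reference)
art_stat = chi_square_statistic(art_counts, reference)
print(prose_stat, art_stat)
```

A small value means the sample looks like the reference; a huge one means a poor match. Where to put the cutoff is something you'd have to tune on real pages.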

So if you compare the contents of <PRE> or <XMP> with the statistics for 
English (or whatever language is in use) and the match is poor, it's 
probably ASCII Art.  Unless you have someone who likes to write long 
strings of acronyms.  But hey, acronyms are arguably ASCII Art in a sense.
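
Before any comparison you need the text inside those elements. Here is one possible way to pull it out, sketched with Python's standard `html.parser` module. The class and function names are my own, and this assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

class PreformattedExtractor(HTMLParser):
    """Collect the text found inside <pre> and <xmp> elements."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # one string per finished top-level block
        self._depth = 0       # nesting depth inside pre/xmp
        self._current = []    # text pieces of the block being read

    def handle_starttag(self, tag, attrs):
        if tag in ("pre", "xmp"):
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in ("pre", "xmp") and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._current))
                self._current = []

    def handle_data(self, data):
        if self._depth > 0:
            self._current.append(data)

def extract_preformatted(html):
    """Return a list of the text contents of each <pre>/<xmp> block."""
    parser = PreformattedExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.blocks

page = "<p>Hello</p><pre>+--+\n|  |\n+--+</pre><xmp>just some words</xmp>"
for block in extract_preformatted(page):
    print(repr(block))
```

Each extracted block can then be scored against the language statistics.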

So you may want to set your intern loose on this approach...

Actually, what you really want are statistics that take into account use of 
other characters like underlines, spaces, and other ASCII characters: the 
sorts of things that showed up in the ad hoc rules.  So what you really 
want is a program that just does those statistics, which you can turn loose 
on ordinary web pages to get distributions to compare against.
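
Something along these lines, as a sketch in Python. It counts every adjacent character pair (spaces and punctuation included), pools the counts from ordinary text, and scores a block by its average negative log-probability under that pool. The smoothing constant, the alphabet size of 128, and the sample strings are all assumptions of mine.

```python
import math
from collections import Counter

def char_pair_counts(text):
    """Count every adjacent character pair, including spaces and punctuation."""
    return Counter(zip(text, text[1:]))

def build_reference(documents):
    """Pool pair counts from ordinary text into one reference Counter."""
    reference = Counter()
    for doc in documents:
        reference.update(char_pair_counts(doc))
    return reference

def mismatch_score(text, reference, smoothing=1.0, alphabet_size=128):
    """Average negative log-probability per pair; higher means poorer match.

    Add-`smoothing` smoothing over alphabet_size**2 possible pairs keeps
    unseen pairs from having zero probability (an assumed choice).
    """
    pairs = char_pair_counts(text)
    n = sum(pairs.values())
    if n == 0:
        return 0.0
    total = sum(reference.values()) + smoothing * alphabet_size ** 2
    score = 0.0
    for pair, count in pairs.items():
        probability = (reference.get(pair, 0) + smoothing) / total
        score -= count * math.log(probability)
    return score / n

# Stand-ins for text gathered from ordinary web pages.
ordinary_pages = [
    "the frequency of letter pairs in ordinary english text is far from uniform",
    "some pairs such as th are common and others are rare",
    "statistics gathered from ordinary web pages give a baseline to compare against",
]
reference = build_reference(ordinary_pages)

prose = "letter pairs in ordinary text"
art = "+--+--+\n|  |  |\n+--+--+"
print(mismatch_score(prose, reference), mismatch_score(art, reference))
```

With a reference this tiny the scores are only suggestive; the point is that the box drawing scores worse than the prose.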

(of course you can also check statistics of strings of 3, 4, 5... letters 
but for this purpose I bet 2 is enough.)


These statistics can also be used to check what language something is 
written in.
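
A toy Python sketch of the language check: build a letter-pair profile per language and pick the one closest to the text. Cosine similarity is just one convenient closeness measure I picked here, and the one-sentence "training" samples are far too small for real use.

```python
import math
from collections import Counter

def pair_profile(text):
    """Counts of adjacent letter pairs, ignoring case and non-letters."""
    text = text.lower()
    return Counter(
        a + b
        for a, b in zip(text, text[1:])
        if a.isalpha() and b.isalpha()
    )

def cosine_similarity(p, q):
    """Cosine of the angle between two count vectors (1.0 = same direction)."""
    dot = sum(p[key] * q[key] for key in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

def guess_language(text, profiles):
    """Return the name of the profile most similar to the text."""
    target = pair_profile(text)
    return max(profiles, key=lambda name: cosine_similarity(target, profiles[name]))

# Tiny made-up samples; real profiles need far more text.
samples = {
    "english": "the quick brown fox jumps over the lazy dog and then the other thing happens",
    "german": "der schnelle braune fuchs springt über den faulen hund und dann geschieht etwas anderes",
}
profiles = {name: pair_profile(text) for name, text in samples.items()}
print(guess_language("the other thing is over there", profiles))
```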

At 03:38 PM 1/3/00 -0500, you wrote:
>For technique 1.1.K we
>need to determine if a page contains ASCII art. Our intern had a look at
>how ASCII art is used on the web and prepared the following document:
>Note: this does not deal with emoticons " :) " etc.
>From this report I think we can create an algorithm that will reliably test
>a page for ASCII art. I'll code something this week and test it on several

Leonard R. Kasday, Ph.D.
Institute on Disabilities/UAP, and
Department of Electrical Engineering
Temple University
423 Ritter Annex, Philadelphia, PA 19122

(215) 204-2247 (voice)
(800) 750-7428 (TTY)

Received on Tuesday, 4 January 2000 17:00:27 UTC