# Re: ASCII Art

From: Leonard R. Kasday <kasday@acm.org>
Date: Tue, 04 Jan 2000 17:02:15 -0500
Message-Id: <4.2.2.20000104163853.00d93ab0@pop3.concentric.net>
To: "Chris Ridpath" <chris.ridpath@utoronto.ca>

```A more general way to find ASCII Art would be to use statistical properties
of English (or whatever language is in use).  For example, if you look at
the frequency of letter pairs, some are relatively high like "th" and some
are relatively low, like "mq".

There are lots of refs on this.   It's a classic topic.  If you're into
50's style Experimental Psychology, You can find references in any
intermediate psych textbook that deals with "information theory".  Garner's
a good author.  For the Engineering inclined, check out elmentary
infromation theory textbooks. Computer science fans can check out
compression theory.  Cryptography devotees can check out elementary methods
for substitution cyphers.   As you see, it's used all over place.

And there are standard statistical tests to see if distributions
match.  See any intermediate stat book.

So if you compare the contents of <PRE> or <XMP> with the statistics for
English (or whatever language is in use) and the match is poor, it's
probably ACSII Art.  Unless you have someone who likes to write long
strings of Acronyms.  But hey, acronyms are arguably ACSII Art in a sense
anyway.

So you may want to set your intern loose on this approach...

Actually, what you really want are statistics that take into account use of
other characters like underlines, spaces, other ACSII characters.  The
sorts of things that showed up in the ad hoc rules in
http://www.w3.org/WAI/ER/IG/ert/AsciiArt.htm.  So what you really want is a
program that just does those statistics, which you can turn loose on
ordinary web pages, and get distributions to compare against.

(of course you can also check statistics of strings of 3, 4, 5... letters
but for this purpose I bet 2 is enough.)

Len

p.s.
These statistics can also be used to check what language something is
written in.

At 03:38 PM 1/3/00 -0500, you wrote:
>For technique 1.1.K (http://www.w3.org/WAI/ER/IG/ert/#Technique1.1.K) we
>need to determine if a page contains ASCII art. Our intern had a look at a
>how ASCII art is used on the web and prepared the following document:
>
>http://www.w3.org/WAI/ER/IG/ert/AsciiArt.htm
>
>Note: this does not deal with emoticons " :) " etc.
>
> >From this report I think we can create an algorithm that will reliably test
>a page for ASCII art. I'll code something this week and test it on several
>sites.
>
>Chris
>

-------
Leonard R. Kasday, Ph.D.
Institute on Disabilities/UAP, and
Department of Electrical Engineering
Temple University
423 Ritter Annex, Philadelphia, PA 19122

kasday@acm.org
http://astro.temple.edu/~kasday

(215) 204-2247 (voice)
(800) 750-7428 (TTY)
```
Received on Tuesday, 4 January 2000 17:00:27 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:01:29 UTC