Bugs in the Semantic Data Extractor from Mihai Danila on 2007-03-29 (public-qa-dev@w3.org from March 2007)

From: Mihai Danila <viridium@gmail.com>
Date: Thu, 29 Mar 2007 12:44:28 -0400
To: public-qa-dev@w3.org
Message-ID: <b00576380703290944k3c123d24ne30dcd1e463e4cc4@mail.gmail.com>

I found two issues with the semantic data extractor
(http://www.w3.org/2003/12/semantic-extractor.html) right off the bat:

1. The input URL comes back escaped

    To reproduce:
    Open the semantic data extractor
    Type a URL (for example http://www.viridium.ro/)
    Submit
    On the report page, the input URL is shown, but it is URL-escaped

2. When an invalid URL is submitted, the page blows up

    To reproduce:
    Open the semantic data extractor
    Input this URL: http%3A%2F%2Fwww.viridium.ro%2F
    Submit

Using org.apache.xerces.parsers.SAXParser
Exception java.io.IOException: Server returned HTTP response code: 403 for URL: http://cgi.w3.org/cgi-bin/tidy-if?docAddr=http%253A%252F%252Fwww.viridium.ro%252F
Server returned HTTP response code: 403 for URL: http://cgi.w3.org/cgi-bin/tidy-if?docAddr=http%253A%252F%252Fwww.viridium.ro%252F
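
The doubled `%25` in the logged tidy-if URL is what you get when an already-encoded string is percent-encoded again: each `%` becomes `%25`. A minimal Python sketch, assuming the extractor percent-encodes the submitted address once (leaving no characters unescaped) before appending it as the `docAddr` parameter. `build_tidy_url` and `TIDY_ENDPOINT` are hypothetical names, not taken from the extractor's source:

```python
from urllib.parse import quote

# Hypothetical constant: the tidy-if endpoint seen in the error output.
TIDY_ENDPOINT = "http://cgi.w3.org/cgi-bin/tidy-if?docAddr="

def build_tidy_url(doc_addr: str) -> str:
    # Percent-encode every reserved character, including ':' and '/',
    # so the address travels as a single query-string value.
    return TIDY_ENDPOINT + quote(doc_addr, safe="")

# Encoding the plain URL once gives a value the server can decode back.
print(build_tidy_url("http://www.viridium.ro/"))
# http://cgi.w3.org/cgi-bin/tidy-if?docAddr=http%3A%2F%2Fwww.viridium.ro%2F

# The typed input is already encoded, so every '%' becomes '%25'.
print(build_tidy_url("http%3A%2F%2Fwww.viridium.ro%2F"))
# http://cgi.w3.org/cgi-bin/tidy-if?docAddr=http%253A%252F%252Fwww.viridium.ro%252F
```

Decoding the second `docAddr` value once yields `http%3A%2F%2Fwww.viridium.ro%2F`, which is still not an absolute URL.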

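I suspect this could serve as a springboard for attacks, since the
unvalidated input is passed on to another W3C service (tidy-if).
Moreover, the error output reveals information about the technology
in use (Java and the Xerces SAX parser).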
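
One way to avoid both the crash and the disclosure would be to check the address before fetching anything and to answer bad input with a generic message. The Python sketch below is an illustrative assumption, not the extractor's actual code: `is_fetchable` and `input_error` are hypothetical names, and accepting only absolute http/https URLs is an assumed policy.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed policy: only absolute web addresses are worth fetching.
ALLOWED_SCHEMES = {"http", "https"}

def is_fetchable(addr: str) -> bool:
    """Return True only for absolute http(s) URLs that name a host."""
    try:
        parts = urlsplit(addr.strip())
        host = parts.hostname
    except ValueError:
        # urlsplit rejects some malformed input (e.g. an unclosed IPv6 bracket).
        return False
    return parts.scheme.lower() in ALLOWED_SCHEMES and host is not None

def input_error(addr: str) -> Optional[str]:
    """Return a generic error for rejected input, or None if it may be fetched.

    The message deliberately omits parser names, exception text and
    upstream URLs, so a bad submission reveals nothing about the backend.
    """
    if not is_fetchable(addr):
        return "Please enter an absolute http:// or https:// URL."
    return None
```

With this check, `http%3A%2F%2Fwww.viridium.ro%2F` has no scheme or host after parsing, so it would be rejected before any request to tidy-if is made.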

Received on Friday, 30 March 2007 02:48:54 UTC