Bugs in the Semantic Data Extractor

Two issues with the semantic data extractor
(http://www.w3.org/2003/12/semantic-extractor.html), right off the bat:

1. The input URL comes back escaped

    To reproduce:
    Open the semantic data extractor
    Type a URL (for example, http://www.viridium.ro/)
    Submit
    On the report page, the input URL is shown, but it is URL-escaped

2. Submitting an invalid URL makes the page blow up

    To reproduce:
    Open the semantic data extractor
    Input this URL: http%3A%2F%2Fwww.viridium.ro%2F
    Submit

    The page then shows this raw error output:

        Using org.apache.xerces.parsers.SAXParser
        Exception java.io.IOException: Server returned HTTP response code: 403 for URL: http://cgi.w3.org/cgi-bin/tidy-if?docAddr=http%253A%252F%252Fwww.viridium.ro%252F
        Server returned HTTP response code: 403 for URL: http://cgi.w3.org/cgi-bin/tidy-if?docAddr=http%253A%252F%252Fwww.viridium.ro%252F
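The %253A and %252F sequences in the docAddr parameter look like a URL that was percent-encoded twice, since %25 is the escape for "%" itself. A minimal Python sketch, using the standard library's percent-encoding as a stand-in (it is not the extractor's own code), reproduces both the escaped form entered in the steps above and the docAddr value in the error output:

```python
from urllib.parse import quote, unquote

# Illustration only: Python's standard percent-encoding, used as a stand-in
# for whatever escaping the extractor actually performs.
original = "http://www.viridium.ro/"

# Encoding once escapes ':' and '/' (safe="" exempts no characters).
once = quote(original, safe="")

# Encoding the already-escaped string again turns each '%' into '%25'.
twice = quote(once, safe="")

print(once)   # http%3A%2F%2Fwww.viridium.ro%2F
print(twice)  # http%253A%252F%252Fwww.viridium.ro%252F

# Each decode removes exactly one layer of escaping.
assert unquote(twice) == once
assert unquote(once) == original
```

This only shows the encoding pattern; whether the double escaping is what triggers the 403 from tidy-if is an assumption.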

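I suspect this could be used as a launching pad for attacks, since the
extractor passes user input on to another service
(http://cgi.w3.org/cgi-bin/tidy-if). Moreover, the error output reveals
information about the technology in use (Java, the Xerces SAX parser).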

Received on Friday, 30 March 2007 02:48:54 UTC