Fwd: [pedantic-web] Encoding issues when dereferencing "formats:" URIs

Forwarded from the pedantic-web list.

Initially this was (erroneously) reported as an issue with ARP and UTF-8 BOMs, but there's no BOM involved and ARP has never had an issue with BOMs.

It seems that validating (all?) RDF files under www.w3.org results in errors of the form:

"An attempt to load the RDF from URI 'http://www.w3.org/ns/formats/data/RDF_XML' failed. (Undecodable data when reading URI at byte 0 using encoding 'UTF-8'. Please check encoding and encoding declaration of your document.)"

The reported byte offset varies, though, e.g. 24574 for http://www.w3.org/ns/ma-ont.rdf.

I understand that the same file (RDF_XML) validated without issue when copied to a remote server.

The code in question is presumably:

	try {// read whole file as characters
	    int c;
	    while ((c = isr.read()) != -1) {
		sb.append((char)c);
		bytenum++;
	    }
	} 
	catch (IOException e){
	    throw new getRDFException("Undecodable data when reading URI at byte "+bytenum+" using encoding '"+finalCharset+"'."+" Please check encoding and encoding declaration of your document.");
	}

<http://dev.w3.org/cvsweb/2006/RDFValidator/WEB-INF/src/org/w3c/rdfvalidator/ARPServlet.java?rev=1.6>

So the issue may not be encoding at all: the same message is reported for any IOException thrown while reading.
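
To illustrate the distinction, here is a minimal sketch (not the validator's actual code) of a read loop that reports decoding failures separately from other I/O failures. The class name ReadDiagnostics and the method readAll are hypothetical. It assumes the reader is built from a CharsetDecoder set to REPORT, since an InputStreamReader built from just a charset name substitutes U+FFFD for bad bytes rather than throwing. The counter tracks characters rather than bytes, and is only approximate because the reader decodes ahead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, for illustration only.
final class ReadDiagnostics {

    // Reads the whole stream as characters, keeping decoding errors
    // distinct from transport-level I/O errors.
    static String readAll(InputStream in, Charset charset) throws IOException {
        // REPORT makes the decoder throw on undecodable bytes instead of
        // silently replacing them.
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder sb = new StringBuilder();
        long charnum = 0; // characters appended so far (not bytes)
        try (Reader reader = new InputStreamReader(in, decoder)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
                charnum++;
            }
        } catch (CharacterCodingException e) {
            // Genuine encoding problem: the bytes do not fit the charset.
            throw new IOException("Undecodable data near character " + charnum
                    + " using encoding '" + charset.name() + "'.", e);
        } catch (IOException e) {
            // Anything else: timeout, reset connection, truncated body, ...
            throw new IOException("I/O failure (not a decoding error) near character "
                    + charnum + ": " + e.getMessage(), e);
        }
        return sb.toString();
    }
}
```

With something along these lines, a dropped connection while fetching from www.w3.org would no longer be reported as an encoding problem.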

Thanks for your help,

Damian Steer

Begin forwarded message:

> From: Damian Steer <pldms@mac.com>
> Subject: Re: [pedantic-web] Encoding issues when dereferencing "formats:" URIs
> Date: 25 April 2012 16:07:07 GMT+01:00
> To: pedantic-web@googlegroups.com
> Reply-To: pedantic-web@googlegroups.com
> 
> On 25/04/12 15:49, Andreas Radinger wrote:
>> Hi,
>> 
>> I don't think any of these files (neither .rdf nor .ttl) have a BOM at
>> the beginning of the file.
>> http://people.w3.org/rishida/utils/bomtester/index.php?filename=http%3A%2F%2Fwww.w3.org%2Fns%2Fformats%2Fdata%2FRDF_XML.rdf
>> 
>> The W3C RDF Validator has also no bug in dealing with RDF/XML files that
>> have a BOM.
> 
> +1.
> 
> I tried another file under ns/:
> 
> <http://www.w3.org/ns/ma-ont.rdf>
> 
> => "Undecodable data when reading URI at byte 24574 using encoding 'UTF-8'."
> 
> And then the rdf namespace:
> 
> => "... byte 0 ..."
> 
> But <http://people.w3.org/simon/foaf.rdf> was fine.
> 
> Hypothesis: validating rdf under the www.w3.org domain is broken.
> 
> It may be unrelated to encoding. The error is triggered by any
> IOException reading characters from an input stream reader.
> 
> Damian

Received on Thursday, 26 April 2012 15:33:48 UTC