- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 27 May 2002 15:31:02 +0900
- To: Aaron Swartz <me@aaronsw.com>, Misha.Wolf@reuters.com
- Cc: www-tag@w3.org, www-rdf-validator@w3.org
[I'm copying www-rdf-validator@w3.org, because there is an error report for the validator, and some suggestions of how to fix it.]

At 19:18 02/05/24 -0500, Aaron Swartz wrote:
>On Friday, May 24, 2002, at 05:04 PM, Misha.Wolf@reuters.com wrote:
>>Which utilities?
>
>All the current RDF tools, I think. I don't think any of them have been
>updated to support normalization or Unicode storage. Certainly all the
>tools I've written don't support it. If you take a look at the RDF
>Validator[1] you'll find that it %-encodes characters like ü, as most of
>the RDF tools I know do.

How much work would it be for the RDF Validator to change this? My guess is that it would be quite easy, and that it would result in less code overall. I would be very glad to help.

By the way, I just tested the RDF Validator with some simple input. While it gets the %hh escaping in URIs right, it messes up the literals. That's because the validator input page is labeled as being in iso-8859-1, and the output is labeled as being in UTF-8, but for literals, there is no conversion in between.

To fix it, the following steps are needed:

- Set the encoding of http://www.w3.org/RDF/Validator/Overview.html to UTF-8. I can do that in about one minute. Please tell me when to do it.

- Find the place in the code where the URIs are converted from iso-8859-1 to UTF-8. Remove that conversion. This should be rather easy. Please tell me if you need help.

- Fix GraphViz. This seems to currently run under the assumption that everything (.dot files,...) is in iso-8859-1. In the short run, it could be called by converting from UTF-8 to iso-8859-1 and replacing characters not representable in iso-8859-1 with something like a ? or so. In the long term, it should be changed so that it can correctly render more than just iso-8859-1. This applies only to PNG and GIF; for SVG, GraphViz currently does gigo (garbage in, garbage out), but feeding it UTF-8 would do the right thing. For the others, the easiest would be to use a batch SVG renderer.

- Go through the collection of RDF saved for test purposes, and change the first line of anything that contains bytes higher than 0x7F from
  <?xml version="1.0"?>
  to
  <?xml version="1.0" encoding='iso-8859-1'?>
  and additionally check the data for garbage cases. I may be able to help with this, too.

My conclusions from this are:

- Yes, there are indeed problems with RDF tools and i18n.
- Such problems should be fixed asap.
- The problems start with literals, not with resource identifiers.
- Fixing the problems with literals will fix the problems with resource identifiers too, in most cases.
- For the most part, fixing the problems probably takes less time than this discussion.

Regards, Martin.
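P.S. To make the steps above a bit more concrete, here is a rough sketch in Python. It is only an illustration under assumptions, not the validator's actual code: all function names are hypothetical, and I assume that form data arrives as raw iso-8859-1 bytes and that a test file starts with exactly the bare XML declaration shown above.

```python
from urllib.parse import quote

# Assumption: test files start with exactly this declaration (no BOM, no variants).
BARE_DECL = b'<?xml version="1.0"?>'
LATIN1_DECL = b"<?xml version=\"1.0\" encoding='iso-8859-1'?>"


def literal_without_conversion(form_bytes: bytes) -> str:
    """The bug: iso-8859-1 bytes are passed through and then read as UTF-8."""
    return form_bytes.decode("utf-8", errors="replace")


def literal_with_conversion(form_bytes: bytes) -> bytes:
    """What literals would need as long as the input page stays iso-8859-1."""
    return form_bytes.decode("iso-8859-1").encode("utf-8")


def latin1_fallback(text: str) -> bytes:
    """Short-term fallback for a renderer that only handles iso-8859-1:
    characters that cannot be represented are replaced with '?'."""
    return text.encode("iso-8859-1", errors="replace")


def relabel_test_file(data: bytes) -> bytes:
    """Declare iso-8859-1 on test data that contains bytes above 0x7F.
    The data still has to be checked by hand for garbage cases."""
    if data.startswith(BARE_DECL) and any(byte > 0x7F for byte in data):
        return LATIN1_DECL + data[len(BARE_DECL):]
    return data


def escape_for_uri(text: str) -> str:
    """%hh escaping based on UTF-8, which the validator already gets right."""
    return quote(text)
```

With the input page itself in UTF-8, neither conversion function would be needed any more, which is why I expect the fix to result in less code.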
Received on Monday, 27 May 2002 02:34:08 UTC