
Re: Turtle LC Draft

From: Ivan Herman <ivan@w3.org>
Date: Fri, 22 Jun 2012 15:32:41 +0200
Cc: RDF-WG WG <public-rdf-wg@w3.org>, "Eric Prud'hommeaux" <eric@w3.org>
Message-Id: <5F27A2DA-174D-4D69-8E8D-B1641322DE38@w3.org>
To: Gavin Carothers <gavin@carothers.name>

On Jun 22, 2012, at 15:20 , Gavin Carothers wrote:


>> - Appendix A says: "The character encoding of the embedded Turtle will match the HTML documents encoding." Isn't this in contradiction with the fact that Turtle must be UTF-8? Formally, that means a Turtle parser cannot just take the content of the <script> element and parse it; it has to make sure that the content is converted into UTF-8 first. Proposal: add a remark to A.2 that the content of the <script> element must be converted to UTF-8 before being parsed by a Turtle parser.
> Not exactly true. If parsing from HTML the parser may need to simply
> accept a character stream rather than a byte stream. Providing
> specific instructions on what to do with the encoding feels a bit too
> restrictive on how it could be parsed. For example a JavaScript parser
> would likely see the character stream from the DOM and not even care
> about the encoding. If you used libhtml5 or another DOM parser to
> parse the HTML the same would also be true.

I do not want to get into a bike-shed-painting exercise here, since the whole section is non-normative anyway. But isn't there a contradiction, in the sense that the Turtle document specifies Turtle to be UTF-8 while, according to the current text, the script content may well be, say, UTF-16 because it was written in China or, worse, in one of the non-UTF Windows encodings? I guess something has to be said in the text, even if it is less prescriptive than what I wrote...
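
To make the two viewpoints concrete, here is a minimal Python sketch, not a proposal for spec text. It assumes the raw bytes of the <script> content and the name of the HTML document's encoding are both known; the helper name script_bytes_to_utf8 and the example.org triple are hypothetical. A parser that reads a character stream (the DOM case Gavin describes) only needs the decoded string, whereas a parser that insists on UTF-8 bytes needs the extra re-encoding step.

```python
def script_bytes_to_utf8(raw: bytes, html_encoding: str) -> bytes:
    """Hypothetical helper: re-encode embedded Turtle as UTF-8.

    `raw` is the <script> content as bytes in the HTML document's
    encoding; `html_encoding` is that encoding's name (an assumption:
    the caller has already determined it).
    """
    text = raw.decode(html_encoding)  # byte stream -> character stream
    return text.encode("utf-8")       # character stream -> UTF-8 bytes


# Illustrative triple containing a non-ASCII character.
turtle = '<http://example.org/s> <http://example.org/p> "caf\u00e9" .'

# The same Turtle text as it might sit inside differently encoded HTML.
utf16_bytes = turtle.encode("utf-16")    # includes a byte order mark
cp1252_bytes = turtle.encode("cp1252")   # a non-UTF Windows encoding

# A character-stream parser sees identical input in both cases.
chars_from_utf16 = utf16_bytes.decode("utf-16")
chars_from_cp1252 = cp1252_bytes.decode("cp1252")

# A byte-stream parser that requires UTF-8 needs the conversion.
utf8_from_utf16 = script_bytes_to_utf8(utf16_bytes, "utf-16")
utf8_from_cp1252 = script_bytes_to_utf8(cp1252_bytes, "cp1252")
```

Once the content has been decoded to characters, the original HTML encoding no longer matters, which is why a less prescriptive remark may be enough.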


Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 22 June 2012 13:33:07 UTC
