- From: Markus Lanthaler <markus.lanthaler@gmx.net>
- Date: Wed, 13 Nov 2013 10:23:06 +0100
- To: "'RDF WG'" <public-rdf-wg@w3.org>

Hi,

I've just had a look at the section defining literals in RDF Concepts [1] and believe it needs some love. Currently it says:

    A literal in an RDF graph consists of two or three elements:

    - a lexical form, being a Unicode [UNICODE] string, which SHOULD be in Normal Form C [NFC],
    - a datatype IRI, being an IRI identifying a datatype that determines how the lexical form maps to a literal value.

The third element, the language tag, isn't described at all in that list. IMO we should add it. Then Concepts goes on and says:

    A literal is a language-tagged string if and only if its datatype IRI is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString, and only in this case the third element is present:

    - a non-empty language tag as defined by [BCP47]. The language tag MUST be well-formed according to section 2.2.9 of [BCP47]. Lexical representations of language tags MAY be converted to lower case. The value space of language tags is always in lower case.
    - A badly formed language tag MUST be treated as a syntax error.

    Implementors might wish to note that language tags conform to the regular expression '@' [a-zA-Z]{1,8} ('-' [a-zA-Z0-9]{1,8})* before normalizing to lowercase.

Not only does this contain grammatical glitches and a wrong regex (as pointed out earlier), but it probably also confuses readers. None (!) of our syntaxes allows serializing a literal with both a datatype *and* a language tag. In fact, apart from JSON-LD and Turtle, none of the syntax specs even mentions rdf:langString, which has to be fixed. Despite that, using a datatype and a language tag always results in a syntax error, even if you use rdf:langString as the datatype.

The confusion is made even worse by the sentence that follows the description above:

    Concrete syntaxes MAY support simple literals, consisting of only a lexical form without any datatype IRI or language tag.

This leaves the impression that it is fine to serialize a literal without a datatype and without a language tag, but it doesn't mention that it is also fine to serialize it with just a language tag. The natural conclusion thus seems to be that that's not allowed.

I know why rdf:langString has been introduced in the first place, and you know that I'm not happy with restricting language-tagging to that type - but there's very little we can do about that at this stage given our charter, I think. What we could do, though, is define language-tagged strings so that the datatype is implicit, i.e., a valid language-tagged string consists of a lexical form and a language tag and always has the implicit type rdf:langString (which formally isn't a datatype anyway).

Perhaps it would also make sense to introduce a term like "typed value" (as used in JSON-LD, but I would be fine with "typed literal" as well) to make it easier to talk about literals which are not language-tagged strings.

Thoughts?

Cheers,
Markus

[1] http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal

--
Markus Lanthaler
@markuslanthaler
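
For illustration only, the proposal above (a language-tagged string is just a lexical form plus a language tag, with rdf:langString implied) could be sketched roughly as follows in Python. The names `Literal` and `make_literal` are hypothetical and not taken from any spec or library. Treating a literal without datatype and language tag as xsd:string is an assumption of this sketch, and checking that the tag is well-formed per [BCP47] is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
# Assumption of this sketch: a simple literal is given the datatype xsd:string.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical abstract-syntax literal: two or three elements."""
    lexical_form: str
    datatype: str
    language: Optional[str] = None


def make_literal(lexical_form: str,
                 datatype: Optional[str] = None,
                 language: Optional[str] = None) -> Literal:
    """Build a literal the way a concrete syntax would hand it over."""
    if language is not None:
        if datatype is not None:
            # Mirrors the concrete syntaxes: a datatype together with a
            # language tag is an error, even if it is rdf:langString.
            raise ValueError("datatype and language tag cannot be combined")
        if not language:
            raise ValueError("language tag must be non-empty")
        # The datatype is implicit; the tag is normalized to lower case.
        # Well-formedness checking against BCP47 is omitted here.
        return Literal(lexical_form, RDF_LANGSTRING, language.lower())
    # No language tag: a typed literal, or a simple literal if no datatype.
    return Literal(lexical_form, datatype if datatype is not None else XSD_STRING)
```

With this shape, `make_literal("chat", language="fr")` never requires the caller to mention rdf:langString, while `make_literal("chat", datatype=RDF_LANGSTRING, language="fr")` is rejected, which matches what the concrete syntaxes do today.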
Received on Wednesday, 13 November 2013 09:23:40 UTC