Re: Rethinking how literals are defined

Markus,

The section as currently written is the result of long and protracted arguments. If you think you can improve the wording, it would be helpful if you could make a concrete proposal.

Concrete syntaxes need to say that the datatype of a literal is implicitly rdf:langString if a language tag is present, and that it is implicitly xsd:string if neither a datatype nor a language tag is present.
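
To make that rule concrete, here is a minimal Python sketch of how a parser might assign the datatype. The names Literal and make_literal are hypothetical. Rejecting a datatype combined with a language tag follows the behaviour Markus describes for our syntaxes, and lowercasing the tag is something Concepts only permits (MAY), so treat both as assumptions of this sketch rather than normative behaviour.

```python
from typing import NamedTuple, Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Hypothetical in-memory form of an RDF literal."""
    lexical_form: str
    datatype: str
    language: Optional[str] = None


def make_literal(lexical_form: str,
                 datatype: Optional[str] = None,
                 language: Optional[str] = None) -> Literal:
    """Fill in the implicit datatype the way a concrete syntax would."""
    if language is not None:
        if language == "":
            raise ValueError("language tag must be non-empty")
        if datatype is not None:
            # Assumption: mirror the syntaxes, which reject datatype + tag.
            raise ValueError("a literal cannot carry both a datatype and a language tag")
        # Language tag present: the datatype is implicitly rdf:langString.
        # Lowercasing is optional per Concepts; done here for easy comparison.
        return Literal(lexical_form, RDF_LANGSTRING, language.lower())
    if datatype is None:
        # Neither datatype nor language tag: implicitly xsd:string.
        datatype = XSD_STRING
    return Literal(lexical_form, datatype, None)
```

With this, make_literal("chat", language="FR") yields an rdf:langString literal tagged "fr", and make_literal("chat") yields an xsd:string literal.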

I agree that the regex is entirely counterproductive and should be removed.

Best,
Richard


On 13 Nov 2013, at 09:23, Markus Lanthaler <markus.lanthaler@gmx.net> wrote:

> Hi,
> 
> I've just had a look at the section defining literals in RDF Concepts [1]
> and believe it needs some love. Currently it says:
> 
>  A literal in an RDF graph consists of two or three elements:
>    . a lexical form, being a Unicode [UNICODE] string, which 
>      SHOULD be in Normal Form C [NFC],
>    . a datatype IRI, being an IRI identifying a datatype that
>      determines how the lexical form maps to a literal value.
> 
> The third element, the language tag, isn't described at all in that list.
> IMO we should add it. Then Concepts goes on and says:
> 
>  A literal is a language-tagged string if and only if its datatype IRI
>  is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString, and only in
>  this case the third element is present:
>    . a non-empty language tag as defined by [BCP47]. The language tag
>      MUST be well-formed according to section 2.2.9 of [BCP47]. Lexical
>      representations of language tags MAY be converted to lower case. The
>      value space of language tags is always in lower case.
>    . A badly formed language tag MUST be treated as a syntax error.
> 
>  Implementors might wish to note that language tags conform to the regular
>  expression '@' [a-zA-Z]{1,8} ('-' [a-zA-Z0-9]{1,8})* before normalizing
>  to lowercase.
> 
> Not only does this contain grammatical glitches and a wrong regex (as
> pointed out earlier), but it probably also confuses readers. None (!) of our
> syntaxes allows serializing a literal with both a datatype *and* a
> language tag. In fact, apart from JSON-LD and Turtle, none of the syntax
> specs even mention rdf:langString, which has to be fixed. Despite that, using
> a datatype and a language tag always results in a syntax error, even if you
> use rdf:langString as the datatype.
> 
> The description above is made even worse by the
> sentence that follows it:
> 
>  Concrete syntaxes MAY support simple literals, consisting of only a 
>  lexical form without any datatype IRI or language tag.
> 
> This leaves the impression that it is fine to serialize a literal without
> datatype and without language tag but doesn't mention that it is also fine
> to serialize it with just a language tag and thus the natural conclusion
> seems to be that that's not allowed.
> 
> I know why rdf:langString has been introduced in the first place and you
> know that I'm not happy with restricting language-tagging to that type - but
> there's very little we can do about that at this stage given our charter I
> think. What we could do though, is to define language-tagged strings so that
> the datatype is implicit, i.e., a valid language-tagged string consists of a
> lexical form and a language tag and always has the implicit type
> rdf:langString (which formally isn't a datatype anyway).
> 
> Perhaps it would also make sense to introduce a term like "typed value" (as
> used in JSON-LD, but I would be fine with typed literal as well) to make it
> easier to talk about literals which are not language-tagged strings.
> 
> Thoughts?
> 
> 
> Cheers,
> Markus
> 
> 
> [1] http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
> 
> 
> --
> Markus Lanthaler
> @markuslanthaler
> 
> 

Received on Thursday, 14 November 2013 21:46:00 UTC