Re: Are any tags valid as literal language tags?

From: Andy Seaborne <andy@apache.org>
Date: Thu, 17 Nov 2016 11:52:36 +0000
To: public-rdf-comments@w3.org
Message-ID: <eccfe7e0-b1de-51af-2852-f26cd9d6efb5@apache.org>

Hi Rob,

RDF 1.1 Concepts [1] says that language tags must be syntactically
correct but does not require them to be registered.  As the registry is
not fixed, it would not make sense otherwise.

[1] https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
"""
The language tag MUST be well-formed according to section 2.2.9 of [BCP47].
"""

"well-formed" is a syntax condition from BCP47:

BCP47, section 2.2.9 says:
"""
A tag is considered "well-formed" if it conforms to the ABNF
    (Section 2.1).  Language tags may be well-formed in terms of syntax
    but not valid in terms of content.
"""

 > Does that mean that RDF language literals that have language tags not
 > in that list is invalid?
 >
 > E.g. "Foo@en" is a valid language literal node, whereas "Foo@zz" is an
 > invalid language literal node?

"Foo"@zz is legal RDF, with an unregistered language tag.  That does not
make unregistered tags a good idea though.

As long as the tag is syntactically correct, it's legal RDF.  The grammar
for Turtle and other formats uses a weaker form than even the older RFC
3066 syntax:

[144s] 	LANGTAG 	::= 	'@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*

See BCP 47 section 2.2.9
https://tools.ietf.org/html/bcp47#section-2.2.9
for the discussion on this.

This is a pragmatic design to provide some syntax even though further
validation is necessary. (The same is true of IRIs - the language
grammar is a step in the process but not a complete syntax.)
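
To illustrate the gap between the Turtle token and BCP 47 well-formedness,
here is a minimal Python sketch.  The function names are made up for this
example and are not from any RDF library.  The second function is only a
necessary condition taken from the BCP 47 ABNF (no subtag is longer than 8
characters); it is not a full well-formedness check.

```python
import re

# Turtle production [144s]: LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
TURTLE_LANGTAG = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")


def matches_turtle_langtag(token: str) -> bool:
    """True if the whole token matches the Turtle LANGTAG production."""
    return TURTLE_LANGTAG.fullmatch(token) is not None


def subtags_within_bcp47_length(token: str) -> bool:
    """Partial check only: every subtag is 1 to 8 characters long.

    The BCP 47 ABNF allows no subtag longer than 8 characters, so a tag
    failing this test cannot be well-formed.  Passing it does NOT prove
    the tag is well-formed.
    """
    subtags = token.lstrip("@").split("-")
    return all(1 <= len(subtag) <= 8 for subtag in subtags)


# "@zz" is unregistered but passes the grammar.
print(matches_turtle_langtag("@zz"))
# A 11-character subtag passes the Turtle grammar but fails the length rule.
print(matches_turtle_langtag("@en-abcdefghijk"),
      subtags_within_bcp47_length("@en-abcdefghijk"))
```

Neither function consults the IANA registry, which matches the point above:
registration is not part of what RDF requires.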

	Andy

On 17/11/16 10:30, Rob Stewart wrote:
> Hi,
>
> Document "Resource Description Framework (RDF): Concepts and Abstract
> Syntax" from 2004:
> https://www.w3.org/TR/2004/REC-rdf-concepts-20040210
>
> In section 6.5 says "Plain literals have a lexical form and optionally
> a language tag as defined by [RFC-3066], normalized to lowercase."
>
> Here's a copy of RFC-3066:
> https://www.ietf.org/rfc/rfc3066.txt
>
> In Section 2.2 "Language tag sources", it says:
>
> "All 2-letter subtags are interpreted according to assignments found
> in ISO standard 639, "Code for the representation of names of
> languages" [ISO 639]"
>
> And here are the list of ISO-639-1 codes:
> https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
>
> Does that mean that RDF language literals that have language tags not
> in that list is invalid?
>
> E.g. "Foo@en" is a valid language literal node, whereas "Foo@zz" is an
> invalid language literal node?
>
> Thanks,
>
> --
> Rob
>
Received on Thursday, 17 November 2016 11:53:11 UTC