- From: Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>
- Date: Thu, 26 May 2011 15:43:01 +0000
- Cc: public-rdf-wg <public-rdf-wg@w3.org>
On 26/05/2011 16:27, Richard Cyganiak wrote:
> On 25 May 2011, at 17:50, Antoine Zimmermann wrote:
>> Adding datatypes for each language tags may work as follows:
>
> Thanks for writing this up Antoine. Would be great if you could
> maintain this in a wiki page as well!

Yes, I'm going to do it.

>> For a language tag {langTag}, "xxx"@{langTag} would be interpreted
>> as a typed literal of type rdf:lang{langTag}.
>
> Make that rdf:string-{langTag}, so we'd end up with rdf:string-fr,
> rdf:string-en-gb and so on.

You're right, it looks better.

> So, would it be accurate to say that "xxx"@en is syntactic sugar for
> "xxx"^^rdf:string-en ?

Yes. In any case, the interpretations of "xxx"@en and of
"xxx"^^rdf:string-en would be the same, so the syntax "xxx"@en would be
preferred for backward compatibility reasons.

> Would serializers be allowed to emit the "xxx"^^rdf:string-en form?

Preferably not.

>> For any language tag {langTag}, there is a datatype
>> rdf:lang{langTag} such that:
>>
>> - the lexical space is all unicode strings.
>> - the value space is all pairs <string,{langTag}>
>> - the lexical to value space is
>>   L2V(rdf:lang{langTag})(xxx)=<xxx,{langTag}>
>
> I see
>
>> There is an infinite number of lang datatypes and {langTag} SHOULD
>> be restricted to what RFC 5646 defines, but implementation MAY
>> accept any string for lang tags (e.g., "foo"@mylangtag-bar42 MAY be
>> considered as a valid literal by parsers),
>
> RDF Concepts currently says that the language tag must be valid
> according to RFC 5646, and lowercase. So I'd say that anything of the
> form rdf:lang{langTag} where {langTag} is not lowercase or not
> syntactically valid according to RFC 5646 is an ill-typed literal.

The MAY is there to allow implementers to avoid digging into the
horrible lang tag specifications when reading the tag as a simple string
is enough to do a good job. I'm not sure it would cause any failure if a
string such as "mylang" was incorrectly accepted as a language tag.
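
To make the mapping above concrete, here is a minimal Python sketch of the lexical-to-value mapping and of the claim that "xxx"@en and "xxx"^^rdf:lang{langTag} get the same interpretation. The names LangString, lang_datatype_iri, l2v, interpret_plain and interpret_typed are hypothetical and only serve the illustration. The sketch assumes the rdf:lang{langTag} naming and takes the lenient reading of the MAY: the tag is treated as an opaque string and is not checked against RFC 5646.

```python
from typing import Callable, NamedTuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LANG_PREFIX = RDF_NS + "lang"  # assumed naming scheme: rdf:lang{langTag}


class LangString(NamedTuple):
    """An element of the value space: the pair <string, langTag>."""
    text: str
    tag: str


def lang_datatype_iri(tag: str) -> str:
    """Build the (hypothetical) datatype IRI for a language tag."""
    return LANG_PREFIX + tag


def l2v(tag: str) -> Callable[[str], LangString]:
    """Return L2V(rdf:lang{tag}), which maps xxx to <xxx, tag>."""
    def mapping(lexical: str) -> LangString:
        # The lexical space is all strings, so no lexical form is rejected.
        return LangString(lexical, tag)
    return mapping


def interpret_plain(lexical: str, tag: str) -> LangString:
    """Interpret the sugared form "xxx"@tag."""
    return l2v(tag)(lexical)


def interpret_typed(lexical: str, datatype_iri: str) -> LangString:
    """Interpret the typed form "xxx"^^rdf:lang{tag}."""
    if not datatype_iri.startswith(LANG_PREFIX) or datatype_iri == LANG_PREFIX:
        raise ValueError("not a language datatype IRI: " + datatype_iri)
    return l2v(datatype_iri[len(LANG_PREFIX):])(lexical)


sugared = interpret_plain("xxx", "en")
typed = interpret_typed("xxx", lang_datatype_iri("en"))
print(sugared == typed)
```

A stricter implementation would reject tags that are not lowercase or not well-formed according to RFC 5646 before building the datatype; that check is deliberately left out here.
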
>> in which case, a corresponding datatype rdf:lang{langTag} MUST
>> exist.
>
> I don't know what that is supposed to mean.

I wanted to say that if {langTag} is a valid language tag (according to
RFC 5646), then there is a datatype rdf:lang{langTag} (or
rdf:lang-{langTag} if you prefer).

>> Additionally, we can add an additional datatype which is a
>> superclass of all the lang datatypes (e.g.,
>> rdf:LangTaggedLiteral).
>
> Make that rdf:LangTaggedString for increased clarity.

Temporarily, and make it shorter when a better name is found.

>> This additional datatype has an empty lexical space but its value
>> space is the set of all pairs <string,tag>.
>
> This doesn't have to be a datatype. Making it a class would be easier
> and sufficient for using it in rdfs:range declarations.

Yes, and in fact it is better since, by the definition of datatype, the
lexical space must be a non-empty set.

>> It follows that the following triples are valid under the
>> appropriate entailment regime:
>>
>> rdf:lang{langTag} rdf:type rdfs:Datatype;
>>     rdfs:subClassOf rdf:LangTaggedLiteral .
>
> I see
>
>> rdf:LangTaggedLiteral rdf:type rdfs:Datatype;
>
> I'd make this:
>
> rdf:LangTaggedString a rdfs:Class;

Agreed.

>> rdfs:subClassOf rdf:PlainLiteral .
>>
>> In OWL, we have, for all pairs of distinct {langTag1} and
>> {langTag2}:
>>
>> rdf:lang{langTag1} owl:disjointWith rdf:lang{langTag2}.
>> rdf:LangTaggedLiteral owl:equivalentClass [ rdf:type
>> rdfs:Datatype; owl:onDatatype rdf:PlainLiteral;
>> owl:withRestrictions( [rdf:langRange "*"] ) ].
>> rdf:lang{langTag} owl:equivalentClass [ rdf:type rdfs:Datatype;
>> owl:onDatatype rdf:PlainLiteral;
>> owl:withRestrictions( [rdf:langRange "{langTag}"] ) ].
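
The subclass and disjointness statements quoted above follow directly from the value spaces: a pair <string,tag> carries exactly one tag, so it can belong to at most one rdf:lang{langTag}, and every such pair belongs to the common superclass. The following Python sketch illustrates this under the assumption that values are modelled as 2-tuples of strings; the function names are hypothetical.

```python
from typing import Any


def is_lang_pair(value: Any) -> bool:
    """True if value has the shape <string, tag> (both parts are strings)."""
    return (
        isinstance(value, tuple)
        and len(value) == 2
        and all(isinstance(part, str) for part in value)
    )


def in_lang_datatype(value: Any, tag: str) -> bool:
    """Membership in the value space of rdf:lang{tag}."""
    return is_lang_pair(value) and value[1] == tag


def in_lang_tagged_string(value: Any) -> bool:
    """Membership in the superclass: the set of all pairs <string, tag>."""
    return is_lang_pair(value)


value = ("chat", "fr")
tags = ["fr", "en", "en-gb"]

# The value belongs to exactly one of the per-language value spaces.
memberships = [tag for tag in tags if in_lang_datatype(value, tag)]
print(memberships)

# It also belongs to the superclass, while a bare string does not.
print(in_lang_tagged_string(value), in_lang_tagged_string("chat"))
```

Tags are compared as plain strings here, so "en" and "en-gb" are unrelated, which matches the absence of any sublanguage relationship in the proposal.
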
>>
>> DRAWBACKS:
>> - an infinite number of datatypes (but we already have an infinite
>>   number of RDF properties anyway);
>> - OWL 2 does not talk about these new types, so the OWL 2 RDF-based
>>   semantics is incomplete wrt RDF 1.1 semantics;
>> - there is no relationship between "sublanguages" like "en" VS
>>   "en-GB".
>
> This point is no different than in current RDF, nor is it any
> different from any other proposal considered so far, so it's not a
> drawback.

Yes, I just wanted to emphasise that this proposal does not add this
feature, because some people in the WG said they would like to see a
relationship between, e.g., "foo"@en and "foo"@en-GB (see the answers to
your quiz).

>> - others?
>>
>> ADVANTAGES:
>> - compared to rdf:PlainLiteral, we distinguish langTagged and
>>   non-langTagged literals; and the lexical form is more natural;
>> - one can define language-specific range restrictions (e.g.,
>>   ex:englishLabel rdfs:range rdf:langen.) in RDF without the need
>>   for OWL 2 datatype machinery;
>> - compared to RDF alone, we have everything typed, which can be
>>   seen as a simplification.
>> - others?
>>
>> Regards,

--
Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
France
Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex
France
antoine.zimmermann@insa-lyon.fr
http://zimmer.aprilfoolsreview.com/
Received on Saturday, 28 May 2011 10:14:08 UTC