Re: "Language-tagged strings Re: Toward easier RDF: a proposal"

From: Pat Hayes <phayes@ihmc.us>
Date: Sat, 24 Nov 2018 12:01:19 -0600
To: Christian Chiarcos <christian.chiarcos@web.de>, andy@seaborne.org
Cc: hugh@glasers.org, SW-forum <semantic-web@w3.org>, w.g.j.beek@vu.nl
Message-ID: <bdf2142d-6ba0-a6a0-0ff6-af21b81be1b0@ihmc.us>
On 11/23/18 9:53 AM, Christian Chiarcos wrote:
> On Fri, 23 Nov 2018 at 15:55, Christian Chiarcos
> <christian.chiarcos@web.de> wrote:
> 
>     A much more convenient solution would be to identify the
>     language by means of a URI. This can be an ISO 639 category
>     (see under http://id.loc.gov/vocabulary/iso639-2.html and
>     http://id.loc.gov/vocabulary/iso639-1.html; for ISO 639, cf.
>     http://www.lexvo.org/), or provided by another authority
>     (e.g., https://glottolog.org/). Other properties (e.g., xsd
>     datatypes) could also be stated about a literal. Two strings
>     could be considered identical if the values are the same and
>     the properties of one are a subset of the properties
>     of the other.
> 
>     Not sure what the right data structure or representation
>     should be. Maybe a kind of container structure for literal
>     metadata (similar to the @ notation and the lang() properties
>     that we have now).
> 
> 
> Thinking about this, a backward-compatible notation is possible:
> - take @ as a shorthand for ^^xsd:string, with language
> identifiers following
> - if the language identifier is not a URI, it must be BCP47
> - BCP47 codes can be decomposed in the background into their 
> sub-properties
> - permit multiple language URIs/BCP47 codes (if you want to 
> provide both a BCP47 code [indicating region and script] and a 
> URI [unambiguously identifying the language])
> - let plain literals be untyped

No. All literals MUST have a type, so that queries can have a 
unique response when they ask for the type or specify the type. 
The RDF 1.1 WG spent a lot of time and effort on this. Allowing 
untyped plain literals in RDF 2004 was a bug. Please do not screw 
this up again. Plain literals are syntactically legal (to 
preserve backward compatibility) but they now have type xsd:string.
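The rule that every literal has exactly one datatype can be made concrete with a minimal sketch of how a literal's surface form determines its datatype. It assumes the RDF 1.1 rules (a plain literal is an xsd:string, a language-tagged string has datatype rdf:langString, otherwise the declared datatype applies); `datatype_of` is a hypothetical helper, not part of any RDF library.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def datatype_of(lang: Optional[str] = None, datatype: Optional[str] = None) -> str:
    """Return the single datatype IRI of an RDF 1.1 literal, given its surface form."""
    if lang is not None and datatype is not None:
        # Turtle does not allow both "..."@tag and "..."^^type on one literal.
        raise ValueError("a literal cannot have both a language tag and a datatype")
    if lang is not None:
        # "рука"@sr: language-tagged strings have datatype rdf:langString.
        return RDF_LANGSTRING
    if datatype is not None:
        # "рука"^^xsd:string: the explicit datatype applies.
        return datatype
    # "рука": a plain literal is shorthand for "рука"^^xsd:string.
    return XSD_STRING
```

Because `"рука"` and `"рука"^^xsd:string` get the same answer, a SPARQL filter such as `FILTER(datatype(?o) = xsd:string)` matches both forms.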

Pat Hayes

> 
> If literals can carry any number of properties, we get (something 
> like) the following pairs of literals and properties:
> 
> 1. "рука"@sr-Cyrl-RS
> => [ rdf:value "рука"; a xsd:string; dct:language 
> <http://id.loc.gov/vocabulary/iso639-1/sr>; dct:coverage 
> <http://lexvo.org/id/iso3166/RS>; 
> <http://lexvo.org/ontology#usesScript> 
> <http://lexvo.org/id/script/Cyrl> ]
> 
> 2. "рука"
> => [ rdf:value "рука" ]

Which is an xsd:string.

> 
> 3. "рука"@sr
> => [ rdf:value "рука"; a xsd:string; dct:language 
> <http://id.loc.gov/vocabulary/iso639-1/sr>]
> 
> 4. "рука"^^xsd:string
> => [ rdf:value "рука"; a xsd:string ]
> 
> 5. "рука"@<https://glottolog.org/resource/languoid/id/serb1264>
> => [ rdf:value "рука"; a xsd:string; dct:language 
> <https://glottolog.org/resource/languoid/id/serb1264>]
> 
> 6. "рука"@sr-Cyrs
> => [ rdf:value "рука"; a xsd:string; dct:language 
> <http://id.loc.gov/vocabulary/iso639-1/sr>; 
> <http://lexvo.org/ontology#usesScript>
> <http://lexvo.org/id/script/Cyrs> ]
> (Serbian in the Old Church Slavonic variant of Cyrillic)
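The decomposition of an `@` identifier into the property pairs listed above can be sketched as follows. This is a minimal sketch under stated assumptions: it accepts only a simplified language[-script][-region] tag grammar (full BCP 47 also allows extlang, variant, extension and private-use subtags, which are rejected here), it reuses the vocabulary URIs from the examples in this thread, and it assumes two-letter codes resolve under id.loc.gov ISO 639-1 and three-letter codes under ISO 639-2. The function name `decompose` is hypothetical.

```python
import re

# Property and class URIs, following the examples in this thread.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
DCT_LANGUAGE = "http://purl.org/dc/terms/language"
DCT_COVERAGE = "http://purl.org/dc/terms/coverage"
USES_SCRIPT = "http://lexvo.org/ontology#usesScript"

# Simplified grammar: language[-script][-region]; other BCP 47 subtags are rejected.
TAG_RE = re.compile(
    r"(?P<lang>[a-z]{2,3})(?:-(?P<script>[a-z]{4}))?(?:-(?P<region>[a-z]{2}))?",
    re.IGNORECASE,
)


def decompose(identifier: str) -> set:
    """Map the identifier after '@' to a set of (property, object) pairs.

    Following the proposal, every '@' literal is typed xsd:string.
    """
    props = {(RDF_TYPE, XSD_STRING)}
    if identifier.startswith("<") and identifier.endswith(">"):
        # A URI names the language directly, e.g. a Glottolog languoid.
        props.add((DCT_LANGUAGE, identifier[1:-1]))
        return props
    m = TAG_RE.fullmatch(identifier)
    if m is None:
        raise ValueError(f"not a supported BCP 47 tag: {identifier!r}")
    lang = m.group("lang").lower()
    # Assumption: 2-letter codes under ISO 639-1, 3-letter codes under ISO 639-2.
    scheme = "iso639-1" if len(lang) == 2 else "iso639-2"
    props.add((DCT_LANGUAGE, f"http://id.loc.gov/vocabulary/{scheme}/{lang}"))
    if m.group("script"):
        script = m.group("script").title()  # conventional casing, e.g. "Cyrl"
        props.add((USES_SCRIPT, f"http://lexvo.org/id/script/{script}"))
    if m.group("region"):
        region = m.group("region").upper()  # conventional casing, e.g. "RS"
        props.add((DCT_COVERAGE, f"http://lexvo.org/id/iso3166/{region}"))
    return props
```

BCP 47 puts the script subtag before the region subtag, so `decompose("sr-Cyrl-RS")` yields the four pairs of example 1, while `sr-RS-Cyrl` is not well-formed and is rejected by this sketch with a `ValueError`.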
> 
> Assuming that equality checks whether the values are identical and the
> properties of one string are a subset of the properties of the
> other, strings 1-4 are equal.
> For string 5, it's more complicated, but
> https://glottolog.org/resource/languoid/id/serb1264 does also
> provide an ISO 639 code. Unfortunately, it does so only as a string
> value, not as an owl:sameAs link to the resources of the ISO 639-1/2
> maintainers, but this could be requested from the Glottolog maintainers.
> String 6 would be equal to 2, 3 and 4, but not to 1.
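The subset-based equality rule can be checked against the six examples with a minimal sketch. It models each literal as its value plus a set of (property, object) pairs and uses the "subset" reading of the rule; the helper names `lit`, `subset_equal` and `matches`, and the `lexvo:` prefix, are hypothetical abbreviations introduced here.

```python
from typing import FrozenSet, List, Tuple

Prop = Tuple[str, str]
Lit = Tuple[str, FrozenSet[Prop]]

# Property pairs from examples 1-6; prefixed names are abbreviations used here.
TYPE_STRING = ("rdf:type", "xsd:string")
LANG_SR = ("dct:language", "http://id.loc.gov/vocabulary/iso639-1/sr")
LANG_GLOTTOLOG = ("dct:language", "https://glottolog.org/resource/languoid/id/serb1264")
REGION_RS = ("dct:coverage", "http://lexvo.org/id/iso3166/RS")
SCRIPT_CYRL = ("lexvo:usesScript", "http://lexvo.org/id/script/Cyrl")
SCRIPT_CYRS = ("lexvo:usesScript", "http://lexvo.org/id/script/Cyrs")


def lit(value: str, *props: Prop) -> Lit:
    """Build a literal as its value plus a set of (property, object) pairs."""
    return (value, frozenset(props))


EXAMPLES = {
    1: lit("рука", TYPE_STRING, LANG_SR, REGION_RS, SCRIPT_CYRL),
    2: lit("рука"),
    3: lit("рука", TYPE_STRING, LANG_SR),
    4: lit("рука", TYPE_STRING),
    5: lit("рука", TYPE_STRING, LANG_GLOTTOLOG),
    6: lit("рука", TYPE_STRING, LANG_SR, SCRIPT_CYRS),
}


def subset_equal(a: Lit, b: Lit) -> bool:
    """Equal if the values match and one property set contains the other."""
    (value_a, props_a), (value_b, props_b) = a, b
    return value_a == value_b and (props_a <= props_b or props_b <= props_a)


def matches(i: int) -> List[int]:
    """Numbers of the other examples that example i is equal to."""
    return [j for j in EXAMPLES if j != i and subset_equal(EXAMPLES[i], EXAMPLES[j])]
```

`matches(6)` returns `[2, 3, 4]`, as stated above, and `matches(5)` returns `[2, 4]`: without a link from the Glottolog languoid to the ISO 639-1 URI, string 5 is not equal to 3. The sketch also makes two consequences visible: the plain literal 2 is equal to every other example, and the relation is not transitive, since 6 equals 2 and 2 equals 1, yet 6 does not equal 1.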
> 
> This creates some overhead, but the nice thing about this is that 
> we no longer need to cast between language-specific and plain 
> literals, nor between xsd:string and plain literals. An 
> (unintended?) side-effect would be that a plain literal can match 
> against any language.
> 
> [BTW: No need to model this as blank nodes, but it kind of feels 
> natural here ;) ]
> 
> Best,
> Christian
> -- 
> Prof. Dr. Christian Chiarcos
> Applied Computational Linguistics
> Johann Wolfgang Goethe Universität Frankfurt a. M.
> 60054 Frankfurt am Main, Germany
> 
> office: Robert-Mayer-Str. 10, #401b
> mail: chiarcos@informatik.uni-frankfurt.de
> web: http://acoli.cs.uni-frankfurt.de
> tel: +49-(0)69-798-22463
> fax: +49-(0)69-798-28931
> 

-- 
-----------------------------------
call or text to 850 291 0667
www.ihmc.us/groups/phayes/
www.facebook.com/the.pat.hayes
Received on Saturday, 24 November 2018 18:01:58 UTC
