- From: Pat Hayes <phayes@ihmc.us>
- Date: Sat, 24 Nov 2018 12:01:19 -0600
- To: Christian Chiarcos <christian.chiarcos@web.de>, andy@seaborne.org
- Cc: hugh@glasers.org, SW-forum <semantic-web@w3.org>, w.g.j.beek@vu.nl
On 11/23/18 9:53 AM, Christian Chiarcos wrote:
> On Fri, 23 Nov 2018 at 15:55, Christian Chiarcos
> <christian.chiarcos@web.de <mailto:christian.chiarcos@web.de>> wrote:
>
> > A much more convenient solution would be to identify the language
> > by means of a URI. This can be an ISO 639 category (see under
> > http://id.loc.gov/vocabulary/iso639-2.html and
> > http://id.loc.gov/vocabulary/iso639-1.html; for ISO 639, cf.
> > http://www.lexvo.org/), or provided by another authority (e.g.,
> > https://glottolog.org/). Other properties (e.g., xsd datatypes)
> > could also be stated about a literal. Two strings could be
> > considered identical if the values are the same and the properties
> > of one are a proper subset of the properties of the other.
> >
> > Not sure what the right data structure or representation should
> > be. Maybe a kind of container structure for literal metadata
> > (similar to the @ notation and the lang() properties that we have
> > now).
>
> Thinking about this, a downward-compatible notation is possible:
> - take @ as a short-hand for ^^xsd:string, with language identifiers
>   following
> - if the language identifier is not a URI, it must be BCP47
> - BCP47 codes can be decomposed in the background into their
>   sub-properties
> - permit multiple language URIs/BCP47 codes (if you want to provide
>   both a BCP47 code [indicating region and script] and a URI
>   [unambiguously identifying the language])
> - let plain literals be untyped

No. All literals MUST have a type, so that queries can have a unique
response when they ask for the type or specify the type. The RDF 1.1 WG
spent a lot of time and effort on this. Allowing untyped plain literals
in RDF 2004 was a bug. Please do not screw this up again. Plain
literals are syntactically legal (to preserve backward compatibility)
but they now have type xsd:string.

Pat Hayes

> If literals can carry any number of properties, we get (something
> like) the following pairs of literals and properties:
>
> 1. "рука"@sr-RS-Cyrl
>    => [ rdf:value "рука"; a xsd:string;
>         dct:language <http://id.loc.gov/vocabulary/iso639-1/sr>;
>         dct:coverage <http://lexvo.org/id/iso3166/RS>;
>         <http://lexvo.org/ontology#usesScript>
>             <http://lexvo.org/id/script/Cyrl> ]
>
> 2. "рука"
>    => [ rdf:value "рука" ]

Which is an xsd:string.

> 3. "рука"@sr
>    => [ rdf:value "рука"; a xsd:string;
>         dct:language <http://id.loc.gov/vocabulary/iso639-1/sr> ]
>
> 4. "рука"^^xsd:string
>    => [ rdf:value "рука"; a xsd:string ]
>
> 5. "рука"@<https://glottolog.org/resource/languoid/id/serb1264>
>    => [ rdf:value "рука"; a xsd:string;
>         dct:language
>             <https://glottolog.org/resource/languoid/id/serb1264> ]
>
> 6. "рука"@sr-Cyrs
>    => [ rdf:value "рука"; a xsd:string;
>         dct:language <http://id.loc.gov/vocabulary/iso639-1/sr>;
>         <http://lexvo.org/ontology#usesScript>
>             <http://lexvo.org/id/script/Cyrs> ]
>    (Serbian in Cyrillic/Old Church Slavonic variant)
>
> Assuming that equality checks whether the values are identical and
> the properties of one string are a subset of the properties of the
> other, strings 1-4 are equal.
> For string 5, it's more complicated, but
> https://glottolog.org/resource/languoid/id/serb1264 does also provide
> an ISO 639 code. Unfortunately, it does so not with an owl:sameAs
> link to the ISO 639-1/2 maintainers but only as a string value; a
> link could be requested from the glottolog maintainers.
> String 6 would be equal to 2, 3, 4, but not to 1.
>
> This creates some overhead, but the nice thing about this is that we
> no longer need to cast between language-specific and plain literals,
> nor between xsd:string and plain literals. An (unintended?)
> side-effect would be that a plain literal can match against any
> language.
>
> [BTW: No need to model this as blank nodes, but it kind of feels
> natural here ;) ]
>
> Best,
> Christian
> --
> Prof. Dr. Christian Chiarcos
> Applied Computational Linguistics
> Johann Wolfgang Goethe Universität Frankfurt a. M.
> 60054 Frankfurt am Main, Germany
>
> office: Robert-Mayer-Str. 10, #401b
> mail: chiarcos@informatik.uni-frankfurt.de
> <mailto:chiarcos@informatik.uni-frankfurt.de>
> web: http://acoli.cs.uni-frankfurt.de
> tel: +49-(0)69-798-22463
> fax: +49-(0)69-798-28931

--
-----------------------------------
call or text to 850 291 0667
www.ihmc.us/groups/phayes/
www.facebook.com/the.pat.hayes
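
For readers who want to check the equality claims in the quoted proposal, here is a minimal Python sketch. It is an illustration only, not part of RDF or of any library: the function names (tag_properties, literal, proposed_equal), the shape-based subtag classification, and the prefixed property names are all assumptions made for this example. Language tags are split naively on "-" rather than validated against BCP47, and the ISO 639-1 base URI is only meaningful for two-letter codes. The "typed" flag defaults to True to follow the point made in the reply that plain literals have type xsd:string; passing typed=False reproduces item 2 as originally proposed.

```python
from typing import FrozenSet, Optional, Tuple

Prop = Tuple[str, str]              # (predicate, object)
Lit = Tuple[str, FrozenSet[Prop]]   # (lexical value, property set)

# Base URIs taken from the examples in the quoted proposal.
ISO639_1 = "http://id.loc.gov/vocabulary/iso639-1/"
REGION = "http://lexvo.org/id/iso3166/"
SCRIPT = "http://lexvo.org/id/script/"
STRING_TYPE: Prop = ("rdf:type", "xsd:string")


def tag_properties(tag: str) -> FrozenSet[Prop]:
    """Turn a language URI or a BCP47-style tag into a set of properties.

    Hypothetical helper: subtags are classified by shape only
    (4 letters = script, 2 letters or 3 digits = region).
    """
    if tag.startswith(("http://", "https://")):
        return frozenset({("dct:language", tag)})
    primary, *rest = tag.split("-")
    props = {("dct:language", ISO639_1 + primary.lower())}
    for sub in rest:
        if len(sub) == 4 and sub.isalpha():
            props.add(("lexvo:usesScript", SCRIPT + sub.title()))
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            props.add(("dct:coverage", REGION + sub.upper()))
    return frozenset(props)


def literal(value: str, tag: Optional[str] = None, typed: bool = True) -> Lit:
    """Build a (value, properties) pair for a literal."""
    props = set()
    if typed or tag is not None:
        props.add(STRING_TYPE)  # "@" is read as short-hand for ^^xsd:string
    if tag is not None:
        props |= tag_properties(tag)
    return (value, frozenset(props))


def proposed_equal(a: Lit, b: Lit) -> bool:
    """Equality as proposed: same value, and one property set contains the other."""
    (va, pa), (vb, pb) = a, b
    return va == vb and (pa <= pb or pb <= pa)


s1 = literal("рука", "sr-RS-Cyrl")
s2 = literal("рука", typed=False)   # item 2 as proposed: no properties at all
s3 = literal("рука", "sr")
s4 = literal("рука")
s5 = literal("рука", "https://glottolog.org/resource/languoid/id/serb1264")
s6 = literal("рука", "sr-Cyrs")

for name, other in (("2", s2), ("3", s3), ("4", s4), ("1", s1)):
    print("6 vs", name, proposed_equal(s6, other))
```

Under these assumptions the sketch agrees with the claims in the message: strings 1-4 compare equal pairwise, and string 6 compares equal to 2, 3 and 4 but not to 1. It also shows that the proposed relation is not transitive (1 equals 3 and 3 equals 6, yet 1 does not equal 6), and that string 5 does not compare equal to string 3 unless the Glottolog URI is linked to the ISO 639 URI in some way.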
Received on Saturday, 24 November 2018 18:01:58 UTC