- From: Christian Chiarcos <christian.chiarcos@web.de>
- Date: Sat, 24 Nov 2018 19:54:28 +0100
- To: phayes@ihmc.us
- Cc: Andy Seaborne <andy@seaborne.org>, Hugh Glaser <hugh@glasers.org>, SW-forum <semantic-web@w3.org>, w.g.j.beek@vu.nl
- Message-ID: <CAC1YGdi_g+3oDRF6bKrPrZvtb4vyCmohu9EJX51vq82hB_fpAQ@mail.gmail.com>
On .11.2018 at 19:01, Pat Hayes <phayes@ihmc.us> wrote:

> No. All literals MUST have a type, so that queries can have a unique
> response when they ask for the type or specify the type. ...
> Plain literals are syntactically legal (to preserve backward
> compatibility) but they now have type xsd:string.

Point taken. But this only means that "рука" entails [a xsd:string] below. As far as comparisons between strings are concerned, this makes no difference to the example, as the subset relation between the (implicit) properties of "рука"@sr and "рука" still holds ;)

>> => [ rdf:value "рука" ]
>
> Which is a xsd:string.

Right.

Best,
Christian
--
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931

On Sat, 24 Nov 2018 at 19:01, Pat Hayes <phayes@ihmc.us> wrote:

> On 11/23/18 9:53 AM, Christian Chiarcos wrote:
> > On Fri, 23 Nov 2018 at 15:55, Christian Chiarcos
> > <christian.chiarcos@web.de> wrote:
> >
> >     A much more convenient solution would be to identify the
> >     language by means of a URI. This can be an ISO 639 category
> >     (see under http://id.loc.gov/vocabulary/iso639-2.html and
> >     http://id.loc.gov/vocabulary/iso639-1.html; for ISO 639, cf.
> >     http://www.lexvo.org/), or provided by another authority
> >     (e.g., https://glottolog.org/). Other properties (e.g., xsd
> >     datatypes) could also be stated about a literal. Two strings
> >     could be considered identical if the values are the same and
> >     the properties of one are a proper subset of the properties
> >     of the other.
> >
> >     Not sure what the right data structure or representation
> >     should be. Maybe a kind of container structure for literal
> >     metadata (similar to the @ notation and the lang() properties
> >     that we have now).
> >
> >
> > Thinking about this, a downward-compatible notation is possible:
> > - take @ as a shorthand for ^^xsd:string, with language
> >   identifiers following
> > - if the language identifier is not a URI, it must be BCP47
> > - BCP47 codes can be decomposed in the background into their
> >   sub-properties
> > - permit multiple language URIs/BCP47 codes (if you want to
> >   provide both a BCP47 code [indicating region and script] and a
> >   URI [unambiguously identifying the language])
> > - let plain literals be untyped
>
> No. All literals MUST have a type, so that queries can have a
> unique response when they ask for the type or specify the type.
> The RDF 1.1 WG spent a lot of time and effort on this. Allowing
> untyped plain literals in RDF 2004 was a bug. Please do not screw
> this up again. Plain literals are syntactically legal (to
> preserve backward compatibility) but they now have type xsd:string.
>
> Pat Hayes
>
> >
> > If literals can carry any number of properties, we get (something
> > like) the following pairs of literals and properties:
> >
> > 1. "рука"@sr-RS-Cyrl
> > => [ rdf:value "рука"; a xsd:string; dct:language
> > <http://id.loc.gov/vocabulary/iso639-1/sr>; dct:coverage
> > <http://lexvo.org/id/iso3166/RS>;
> > <http://lexvo.org/ontology#usesScript>
> > <http://lexvo.org/id/script/Cyrl> ]
> >
> > 2. "рука"
> > => [ rdf:value "рука" ]
>
> Which is a xsd:string.
>
> >
> > 3. "рука"@sr
> > => [ rdf:value "рука"; a xsd:string; dct:language
> > <http://id.loc.gov/vocabulary/iso639-1/sr> ]
> >
> > 4. "рука"^^xsd:string
> > => [ rdf:value "рука"; a xsd:string ]
> >
> > 5. "рука"@<https://glottolog.org/resource/languoid/id/serb1264>
> > => [ rdf:value "рука"; a xsd:string; dct:language
> > <https://glottolog.org/resource/languoid/id/serb1264> ]
> >
> > 6. "рука"@sr-Cyrs
> > => [ rdf:value "рука"; a xsd:string; dct:language
> > <http://id.loc.gov/vocabulary/iso639-1/sr>;
> > <http://lexvo.org/ontology#usesScript>
> > <http://lexvo.org/id/script/Cyrs> ]
> > (Serbian in the Old Church Slavonic variant of Cyrillic)
> >
> > Assuming that equality checks whether the values are identical and
> > the properties of one string are a subset of the properties of the
> > other, strings 1-4 are equal.
> > For string 5, it's more complicated, but
> > https://glottolog.org/resource/languoid/id/serb1264 also provides
> > an ISO 639 code. Unfortunately, it does so only as a string value
> > rather than as an owl:sameAs link to the ISO 639-1/2 maintainers'
> > URIs, but this could be requested from the Glottolog maintainers.
> > String 6 would be equal to 2, 3 and 4, but not to 1.
> >
> > This creates some overhead, but the nice thing about this is that
> > we no longer need to cast between language-specific and plain
> > literals, nor between xsd:string and plain literals. An
> > (unintended?) side effect would be that a plain literal can match
> > against any language.
> >
> > [BTW: No need to model this as blank nodes, but it kind of feels
> > natural here ;) ]
> >
> > Best,
> > Christian
>
> --
> -----------------------------------
> call or text to 850 291 0667
> www.ihmc.us/groups/phayes/
> www.facebook.com/the.pat.hayes
Received on Saturday, 24 November 2018 18:55:02 UTC
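
To make the subset-based comparison from the quoted message concrete, here is a minimal Python sketch. It is only an illustration of the proposal, not RDF 1.1 semantics and not anyone's implementation. The names `Literal`, `literal` and `matches` are hypothetical, the BCP47 handling is deliberately simplified (a two-letter language subtag, with four-letter subtags read as scripts and two-letter subtags as regions), and every literal is given the type xsd:string, following Pat Hayes' correction.

```python
from dataclasses import dataclass

# Property names and URI prefixes taken from the quoted example.
TYPE = "rdf:type"
XSD_STRING = "xsd:string"
LANGUAGE = "dct:language"
COVERAGE = "dct:coverage"
USES_SCRIPT = "http://lexvo.org/ontology#usesScript"
ISO639_1 = "http://id.loc.gov/vocabulary/iso639-1/"
LEXVO_REGION = "http://lexvo.org/id/iso3166/"
LEXVO_SCRIPT = "http://lexvo.org/id/script/"


@dataclass(frozen=True)
class Literal:
    """Hypothetical container: a lexical value plus (property, object) pairs."""
    value: str
    properties: frozenset


def literal(value, tag=None):
    """Build a Literal from a value and an optional language identifier.

    The identifier may be a URI or a BCP47-style code (simplified parsing).
    """
    props = {(TYPE, XSD_STRING)}  # assumption: every literal is an xsd:string
    if tag is not None:
        if tag.startswith(("http://", "https://")):
            # A language URI is stored as-is.
            props.add((LANGUAGE, tag))
        else:
            # Decompose the code into language, script and region properties.
            parts = tag.split("-")
            props.add((LANGUAGE, ISO639_1 + parts[0].lower()))
            for part in parts[1:]:
                if len(part) == 4 and part.isalpha():
                    props.add((USES_SCRIPT, LEXVO_SCRIPT + part.title()))
                elif len(part) == 2 and part.isalpha():
                    props.add((COVERAGE, LEXVO_REGION + part.upper()))
    return Literal(value, frozenset(props))


def matches(a, b):
    """Same value, and the properties of one are a subset of the other's."""
    return a.value == b.value and (
        a.properties <= b.properties or b.properties <= a.properties
    )


# The six strings from the quoted message.
s1 = literal("рука", "sr-RS-Cyrl")
s2 = literal("рука")
s3 = literal("рука", "sr")
s4 = literal("рука")  # "рука"^^xsd:string carries the same properties as s2
s5 = literal("рука", "https://glottolog.org/resource/languoid/id/serb1264")
s6 = literal("рука", "sr-Cyrs")

for name, other in [("2", s2), ("3", s3), ("4", s4), ("5", s5), ("6", s6)]:
    print("1 vs", name, matches(s1, other))
```

Under these assumptions, strings 1 to 4 match one another, and string 6 matches 2, 3 and 4 but not 1, as described in the quoted message. String 5 matches only the literals without a language property (2 and 4), because the sketch has no owl:sameAs mapping between the Glottolog URI and the ISO 639 URI. The relation is also not transitive: 1 matches 3 and 3 matches 6, yet 1 does not match 6.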