Re: "Language-tagged strings Re: Toward easier RDF: a proposal"

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Sat, 24 Nov 2018 19:54:28 +0100
Message-ID: <CAC1YGdi_g+3oDRF6bKrPrZvtb4vyCmohu9EJX51vq82hB_fpAQ@mail.gmail.com>
To: phayes@ihmc.us
Cc: Andy Seaborne <andy@seaborne.org>, Hugh Glaser <hugh@glasers.org>, SW-forum <semantic-web@w3.org>, w.g.j.beek@vu.nl
On 24.11.2018 at 19:01, Pat Hayes <phayes@ihmc.us> wrote:

> No. All literals MUST have a type, so that queries can have a unique
> response when they ask for the type or specify the type. ...
> Plain literals
> are syntactically legal (to preserve backward compatibility) but they
> now have type xsd:string.

Point taken. But this only means that "рука" entails [a xsd:string] below.
As far as comparisons between strings are concerned, this makes no
difference to the example, as the subset relation between the (implicit)
properties of "рука"@sr and "рука" still holds ;)
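
A minimal sketch of the subset comparison, with plain literals already carrying xsd:string as Pat requires. The (predicate, object) pair representation, the prefixed names standing in for the URIs from the quoted examples, and the function name subset_equal are all assumptions made for illustration, not an existing API.

```python
# Property sets for the example literals, as (predicate, object) pairs.
# The prefixed names abbreviate the URIs used in the quoted examples;
# this representation is an assumption of the sketch.
XSD_STRING = ("rdf:type", "xsd:string")
LANG_SR = ("dct:language", "iso639-1:sr")
REGION_RS = ("dct:coverage", "iso3166:RS")
SCRIPT_CYRL = ("lexvo:usesScript", "script:Cyrl")
SCRIPT_CYRS = ("lexvo:usesScript", "script:Cyrs")

plain = frozenset({XSD_STRING})                                    # "рука"
tagged_sr = frozenset({XSD_STRING, LANG_SR})                       # "рука"@sr
sr_rs_cyrl = frozenset({XSD_STRING, LANG_SR, REGION_RS, SCRIPT_CYRL})
sr_cyrs = frozenset({XSD_STRING, LANG_SR, SCRIPT_CYRS})


def subset_equal(value_a, props_a, value_b, props_b):
    """Hypothetical equality test: identical values, and the properties
    of one literal are a subset of the properties of the other."""
    return value_a == value_b and (props_a <= props_b or props_b <= props_a)


print(subset_equal("рука", plain, "рука", tagged_sr))         # True
print(subset_equal("рука", tagged_sr, "рука", sr_rs_cyrl))    # True
print(subset_equal("рука", sr_cyrs, "рука", sr_rs_cyrl))      # False
```

In this sketch "рука"@sr matches both script variants while the two variants do not match each other, so the comparison is not transitive.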

>> => [ rdf:value "рука" ]
>
> Which is a xsd:string.

Right.
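
For illustration, the mapping from a literal and its language tag to a property list, with untagged literals typed as xsd:string, might look roughly like this. The function name literal_properties, the dict layout, and the rule of recognising script and region subtags by their length are assumptions of this sketch; it only handles two-letter primary language subtags and is not a full BCP47 parser. The URI prefixes are the ones used in the quoted examples.

```python
ISO639_1 = "http://id.loc.gov/vocabulary/iso639-1/"
ISO3166 = "http://lexvo.org/id/iso3166/"
SCRIPT = "http://lexvo.org/id/script/"


def literal_properties(value, tag=None):
    """Hypothetical helper: expand a literal plus optional language tag
    into a dict of properties. Every literal is typed as xsd:string."""
    props = {"rdf:value": value, "rdf:type": "xsd:string"}
    if tag is None:
        return props  # plain literal: value and type only
    if tag.startswith("<") and tag.endswith(">"):
        # Language given directly as a URI, e.g. a Glottolog languoid.
        props["dct:language"] = tag[1:-1]
        return props
    # Otherwise treat the tag as a BCP47-style code and split it up.
    subtags = tag.split("-")
    props["dct:language"] = ISO639_1 + subtags[0].lower()
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():    # four letters: script
            props["lexvo:usesScript"] = SCRIPT + sub.title()
        elif len(sub) == 2 and sub.isalpha():  # two letters: region
            props["dct:coverage"] = ISO3166 + sub.upper()
    return props


for tag in (None, "sr", "sr-RS-Cyrl"):
    print(tag, literal_properties("рука", tag))
```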

Best,
Christian
--
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany
office: Robert-Mayer-Str. 10, #401b
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931

On Sat, 24 Nov 2018 at 19:01, Pat Hayes <phayes@ihmc.us> wrote:

> On 11/23/18 9:53 AM, Christian Chiarcos wrote:
> > On Fri, 23 Nov 2018 at 15:55, Christian Chiarcos
> > <christian.chiarcos@web.de> wrote:
> >
> >     A much more convenient solution would be to identify the
> >     language by means of a URI. This can be an ISO 639 category
> >     (see under http://id.loc.gov/vocabulary/iso639-2.html and
> >     http://id.loc.gov/vocabulary/iso639-1.html; for ISO 639, cf.
> >     http://www.lexvo.org/), or provided by another authority
> >     (e.g., https://glottolog.org/). Other properties (e.g., xsd
> >     datatypes) could also be stated about a literal. Two strings
> >     could be considered identical if the values are the same and
> >     the properties of one are a proper subset of the properties
> >     of the other.
> >
> >     Not sure what the right data structure or representation
> >     should be. Maybe a kind of container structure for literal
> >     metadata (similar to the @ notation and the lang() properties
> >     that we have now).
> >
> >
> > Thinking about this, a downward-compatible notation is possible:
> > - take @ as a short-hand for ^^xsd:string, with language
> > identifiers following
> > - if the language identifier is not a URI, it must be BCP47
> > - BCP47 codes can be decomposed in the background into their
> > sub-properties
> > - permit multiple language URIs/BCP47 codes (if you want to
> > provide both a BCP47 code [indicating region and script] and a
> > URI [unambiguously identifying the language])
> > - let plain literals be untyped
>
> No. All literals MUST have a type, so that queries can have a
> unique response when they ask for the type or specify the type.
> The RDF 1.1 WG spent a lot of time and effort on this. Allowing
> untyped plain literals in RDF 2004 was a bug. Please do not screw
> this up again. Plain literals are syntactically legal (to
> preserve backward compatibility) but they now have type xsd:string.
>
> Pat Hayes
>
> >
> > If literals can carry any number of properties, we get (something
> > like) the following pairs of literals and properties:
> >
> > 1. "рука"@sr-RS-Cyrl
> > => [ rdf:value "рука"; a xsd:string; dct:language
> > <http://id.loc.gov/vocabulary/iso639-1/sr>; dct:coverage
> > <http://lexvo.org/id/iso3166/RS>;
> > <http://lexvo.org/ontology#usesScript>
> > <http://lexvo.org/id/script/Cyrl> ]
> >
> > 2. "рука"
> > => [ rdf:value "рука" ]
>
> Which is a xsd:string.
>
> >
> > 3. "рука"@sr
> > => [ rdf:value "рука"; a xsd:string; dct:language
> > <http://id.loc.gov/vocabulary/iso639-1/sr>]
> >
> > 4. "рука"^^xsd:string
> > => [ rdf:value "рука"; a xsd:string ]
> >
> > 5. "рука"@<https://glottolog.org/resource/languoid/id/serb1264>
> > => [ rdf:value "рука"; a xsd:string; dct:language
> > <https://glottolog.org/resource/languoid/id/serb1264>]
> >
> > 6. "рука"@sr-Cyrs
> > => [ rdf:value "рука"; a xsd:string; dct:language
> > <http://id.loc.gov/vocabulary/iso639-1/sr>;
> > <http://lexvo.org/ontology#usesScript>
> > <http://lexvo.org/id/script/Cyrs> ]
> > (Serbian in Cyrillic, Old Church Slavonic variant)
> >
> > Assuming that equality checks whether the values are identical and
> > the properties of one string are a subset of the properties of the
> > other, strings 1-4 are equal.
> > For string 5, it's more complicated, but
> > https://glottolog.org/resource/languoid/id/serb1264 does also
> > provide an ISO 639 code. Unfortunately, it is given only as a string
> > value, not as an owl:sameAs link to the ISO 639-1/2 maintainers, but
> > this could be requested from the Glottolog maintainers.
> > String 6 would be equal to 2,3,4, but not to 1.
> >
> > This creates some overhead, but the nice thing about this is that
> > we no longer need to cast between language-specific and plain
> > literals, nor between xsd:string and plain literals. An
> > (unintended?) side-effect would be that a plain literal can match
> > against any language.
> >
> > [BTW: No need to model this as blank nodes, but it kind of feels
> > natural here ;) ]
> >
> > Best,
> > Christian
> > --
> > Prof. Dr. Christian Chiarcos
> > Applied Computational Linguistics
> > Johann Wolfgang Goethe Universität Frankfurt a. M.
> > 60054 Frankfurt am Main, Germany
> >
> > office: Robert-Mayer-Str. 10, #401b
> > mail: chiarcos@informatik.uni-frankfurt.de
> > web: http://acoli.cs.uni-frankfurt.de
> > tel: +49-(0)69-798-22463
> > fax: +49-(0)69-798-28931
> >
>
> --
> -----------------------------------
> call or text to 850 291 0667
> www.ihmc.us/groups/phayes/
> www.facebook.com/the.pat.hayes
>
>
>
Received on Saturday, 24 November 2018 18:55:02 UTC
