Re: "Language-tagged strings Re: Toward easier RDF: a proposal" from Lucie-Aimée Kaffee on 2018-11-23 (semantic-web@w3.org from November 2018)

From: Lucie-Aimée Kaffee <kaffee@soton.ac.uk>
Date: Fri, 23 Nov 2018 13:18:46 +0000
To: frans.knibbe@geodan.nl
Cc: semantic-web@w3.org
Message-ID: <CAHdSuxHpsL_2pkF7yEyzaGXazY+bXmNQPbHa85nqJhC9mLZYVQ@mail.gmail.com>
Pronunciation is usually expressed in languages such as IPA. Lemon suggests
a solution for that, treating it as another language:

:color lemon:representation "ˈkʌl.ə(ɹ)"@en-fonipa
https://lemon-model.net/learn/5mins.php
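
As a rough illustration of why this works (a minimal sketch, not part of Lemon): "fonipa" is a registered variant subtag for the International Phonetic Alphabet, so a consumer can tell an IPA transcription from ordinary English text by looking at the tag alone. The helper names below are hypothetical, and the naive split on "-" is an assumption that ignores most of the BCP 47 grammar.

```python
def split_language_tag(tag: str) -> tuple[str, list[str]]:
    """Naively split a language tag into (primary language, remaining subtags).

    Assumption: a plain split on "-" is enough for illustration; a real
    BCP 47 parser also handles scripts, regions, extensions and private use.
    """
    primary, *rest = tag.lower().split("-")
    return primary, rest


def is_ipa_transcription(tag: str) -> bool:
    """Return True if the tag carries the 'fonipa' variant subtag."""
    _, rest = split_language_tag(tag)
    return "fonipa" in rest


if __name__ == "__main__":
    for tag in ("en-fonipa", "en-GB", "en"):
        print(tag, split_language_tag(tag), is_ipa_transcription(tag))
```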

Therefore, indicating language makes sense; everything else (as in further
linguistic information) might be expressed in statements about an entity.
Plugging Wikidata's linguistic model here, too:
https://www.wikidata.org/wiki/Wikidata:Lexicographical_data


On Fri, 23 Nov 2018 at 13:02, Frans Knibbe <frans.knibbe@geodan.nl> wrote:

> Using a general way to make statements about literals sounds good to me.
> For geographical data I also see too many statements being squashed into a
> single literal.  It is difficult to process and to store.
> Extensibility could also be an issue. Why have a standard provision for
> indicating the language of a text string and not its pronunciation for
> example? How else can we tell the difference between the English nouns
> "shower" and "shower"?
>
> Regards,
> Frans
>
> On Fri, 23 Nov 2018 at 13:07, Hugh Glaser <hugh@glasers.org> wrote:
>
>> Ah, good topic.
>>
>> So another thing I don't understand (:-)) is why we have to have language
>> tags on strings at all, and indeed datatypes.
>> (OK, it's because of XML heritage or something, I guess.)
>> But we have a perfectly good way of representing knowledge about things.
>> It is a real pain to create these 3 component literals and to query for
>> different languages and datatypes in SPARQL.
>> And worse still, if you want to query for strings that may or may not
>> have language tags on, you need to do some real messing about.
>> I often end up adding @en to all the strings, or removing region tags
>> etc., just so I can do things more easily, which is surely a Bad Thing.
>>
>> Surely languages and datatypes should simply be RDF properties of
>> Literals, which are 1 component things?
>> Much easier to explain to developers, and for them to use.
>> (If indeed they want to use raw RDF.)
>>
>> > On 23 Nov 2018, at 11:48, Andy Seaborne <andy@seaborne.org> wrote:
>> >
>> > The RDF 1.1 WG did spend some time on this - both on putting the
>> > langtag into the lexical space and putting the lang tag into the datatype.
>> > Both are not so easy; in the end rdf:langString at least meant all
>> > literals had a datatype.
>> >
>> > With the lexical form being a pair (string, lang), squeezing that into
>> > a single string gets a bit unintuitive when strlen("hello@en") is 8,
>> > not 5. See also rdf:PlainLiteral.
>> >
>> > For datatypes, language tags have their own structure and hierarchy
>> > (lang-script-region-...) for their requirements which does not really fit
>> > with datatype subtyping very well.
>> >
>> > I don't think changes would simplify.
>> >
>> > We have what we have and people have been explaining it to the wider
>> > community (i.e. it's not just people on this list affected). So
>> > "technically better" isn't the criterion, it should be "unlocks potential
>> > that is currently, provably blocked".
>> >
>> >    Andy
>> >
>> > On 23/11/2018 08:42, Wouter Beek wrote:
>> >> Dear David, others,
>> >> As another attempt at simplifying RDF, would it be possible to do away
>> >> with the special status of language-tagged strings?
>> >> In RDF 1.1 literals consist of 3 components: lexical form, datatype
>> >> IRI, and language tag.  The last component is only used in
>> >> language-tagged strings.  Would it be possible to define
>> >> `rdf:langString' as a regular datatype IRI and have literals consist
>> >> of 2 components instead?
>> >> RDF 1.1 Concepts and Abstract Syntax currently contains many caveats
>> >> to accommodate the idiosyncratic nature of language-tagged strings,
>> >> e.g.,:
>> >>> Language-tagged strings have the datatype IRI
>> >>> http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. No datatype is
>> >>> formally defined for this IRI because the definition of datatypes does not
>> >>> accommodate language tags in the lexical space. The value space associated
>> >>> with this datatype IRI is the set of all pairs of strings and language tags.
>> >> Would it be possible to define a regular lexical space, e.g.,
>> >> containing "hello@en"^^rdf:langString, together with a value-2-lexical
>> >> and a lexical-2-value mapping?
>> >> The N3 and SPARQL notation "hello"@en will of course still be
>> >> available, and will be syntactic sugar for "hello@en"^^rdf:langString.
>> >> ---
>> >> Best regards,
>> >> Wouter Beek.
>> >> Email: w.g.j.beek@vu.nl
>> >> WWW: https://wouterbeek.org
>> >> Tel: +31647674624
>> >
>>
>> --
>> Hugh
>> 023 8061 5652
>>
>>
>>
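
To make the lexical-space proposal quoted above a bit more concrete, here is a minimal sketch of what a lexical-2-value and value-2-lexical mapping for "hello@en" could look like. It is only an illustration under stated assumptions, not anything defined by RDF 1.1: the names LangString, lexical_to_value and value_to_lexical are hypothetical, the split is taken at the last "@" (language tags cannot contain "@", so the text part may), and the tag is lowercased on the assumption that tags compare case-insensitively.

```python
from typing import NamedTuple


class LangString(NamedTuple):
    """Hypothetical value: a (text, language tag) pair."""
    text: str
    lang: str


def lexical_to_value(lexical: str) -> LangString:
    """Map a combined lexical form such as 'hello@en' to a (text, lang) pair."""
    # Split at the LAST '@' so that text containing '@' still round-trips.
    text, sep, lang = lexical.rpartition("@")
    if not sep or not lang:
        raise ValueError(f"no language tag in lexical form: {lexical!r}")
    # Assumption: normalise the tag to lowercase.
    return LangString(text, lang.lower())


def value_to_lexical(value: LangString) -> str:
    """Map a (text, lang) pair back to the combined lexical form."""
    return f"{value.text}@{value.lang}"


if __name__ == "__main__":
    value = lexical_to_value("hello@en")
    # The combined form is longer than the text a user would expect to measure.
    print(len("hello@en"), len(value.text))
    print(value_to_lexical(value))
```

The two lengths printed at the end are the strlen mismatch mentioned in the quoted reply: any string function applied to the combined form also sees the "@en" suffix.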

-- 
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
Received on Friday, 23 November 2018 16:21:36 UTC