- From: Christian Chiarcos <christian.chiarcos@web.de>
- Date: Sat, 24 Nov 2018 21:16:59 +0100
- To: Andy Seaborne <andy@seaborne.org>
- Cc: Hugh Glaser <hugh@glasers.org>, SW-forum <semantic-web@w3.org>
- Message-ID: <CAC1YGdjKhy2i9bosPkv3SsqUurtQ7XNM0sLhDCs3yNWD2An-Kg@mail.gmail.com>
On Sat, 24 Nov 2018 at 18:42, Andy Seaborne <andy@seaborne.org> wrote:

> "chat"^^xsd:string is a string of characters.
>
> I think of language as a bit like units 23 lb != 23 kg. and neither
> aren't 23.

This is an oversimplification, because we don't have subtypes of kilogram. But we do have region and script codes that combine with language tags to form a complex language tag that is a specialization of the original language tag.

It would be nice to recognize that "cat"@en and "cat"@en-US are the same thing, whereas "cat"@en-US and "cat"@en-NZ are not. And it would be nice to say that Resian (a specific variety of Slovene spoken in Italy) is something different from sl-IT (standard Slovene that happens to be written in Italy) -- BCP47 conflates these, but a (more easily extensible) URI-based solution (by reference to, say, https://glottolog.org/resource/languoid/id/resi1246) would support that. And it would be nice if we could interpret "cat" as an underspecification of either "cat"@en or "cat"@en-NZ and match them without having to explain to RDF novices that a string is not a string if it has a language, and that New Zealand English is just as different from "generic" English as Scots and Vietnamese are from each other.

Triple-based identification of language (region, script, etc.) would be more appropriate, because we could then link the tags with information about the language variety intended.

> "chat"@en and "chat"@fr are different.
>
> "chat" rdf:lang "en" .
> "chat" rdf:lang "fr" .
>
> makes every use of "chat" both @en and @fr.

I think the only way to avoid this would be to take subject literals as a notational shorthand for a blank node that carries the literal as an rdf:value. (And, in a separate step, a problem-specific bnode skolemization routine could be provided to give it a proper URI.)

>> I often end up adding @en to all the strings, or removing region tags
>> etc., just so I can do things more easily, which is surely a Bad
>> Thing.
>
> I don't think it is bad.

It is, because such an extra step is very hard to justify to newcomers. You explain to them that SPARQL is actually quite intuitive if you understand Turtle and SQL, but in the next second, you need to introduce an extra construct just so they can match a value on real-world data. You basically lose the next generation, because the very first thing they learn about SPARQL or RDF is that it is a nice concept with an idiosyncratic implementation -- and this is not the last idiosyncrasy they'll encounter.

Best,
Christian

--
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931
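
P.S. The matching behaviour wished for in this message ("cat"@en matching "cat"@en-US, "cat"@en-US not matching "cat"@en-NZ, and an untagged "cat" matching either) can be illustrated with a minimal sketch. The helper name lang_matches and its rules are hypothetical, chosen only for illustration; they are not part of RDF or SPARQL. SPARQL's built-in langMatches() is related but one-directional, and it does not treat untagged literals as underspecified.

```python
from typing import Optional


def lang_matches(tag: Optional[str], wanted: Optional[str]) -> bool:
    """Hypothetical helper: symmetric, prefix-style language tag matching.

    Assumptions made for this sketch (not standard behaviour):
    - None stands for an untagged literal and is treated as
      underspecified, so it matches any language.
    - Two tags match if the shorter one is a subtag-wise prefix of
      the longer one ("en" vs "en-US"), compared case-insensitively.
    """
    if tag is None or wanted is None:
        return True
    a = tag.lower().split("-")
    b = wanted.lower().split("-")
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    return longer[: len(shorter)] == shorter


if __name__ == "__main__":
    for left, right in [("en", "en-US"), ("en-US", "en-NZ"), (None, "en-NZ")]:
        print(left, right, lang_matches(left, right))
```

Comparing whole subtags rather than raw string prefixes avoids treating two unrelated tags as a match merely because one begins with the letters of the other.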
Received on Saturday, 24 November 2018 20:17:32 UTC