
Re: "Language-tagged strings Re: Toward easier RDF: a proposal"

From: Hugh Glaser <hugh@glasers.org>
Date: Fri, 23 Nov 2018 13:16:02 +0000
Cc: semantic-web@w3.org
Message-Id: <8E962B02-9029-44E1-B19E-280524D957AE@glasers.org>
To: Frans Knibbe <frans.knibbe@geodan.nl>


> On 23 Nov 2018, at 12:57, Frans Knibbe <frans.knibbe@geodan.nl> wrote:
> 
> Using a general way to make statements about literals sounds good to me. For geographical data I also see too many statements being squashed into a single literal.  It is difficult to process and to store.
> Extensibility could also be an issue. Why have a standard provision for indicating the language of a text string and not its pronunciation, for example? How else can we tell the difference between the English nouns "shower" and "shower"?

"shower" and "shower" and not English nouns - they are strings, and both the same.
If you want the English nouns, you should be using URIs for the nouns, which possibly have that string attached.
Similarly, strings don't usually have pronunciations - things associated with strings do.
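A minimal sketch of the modelling described above, in plain Python rather than any RDF library: two distinct nouns get their own IRIs, both carry the same label string, and each pronunciation hangs off the noun rather than the string. The `example.org` IRIs and the `pronunciation` property are hypothetical; only the `rdfs:label` IRI is real.

```python
# Hypothetical namespace and IRIs, for illustration only.
EX = "http://example.org/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
PRONUNCIATION = EX + "pronunciation"  # hypothetical property, not from a real vocabulary

# (subject, predicate, object) triples; plain Python strings stand in for literals.
triples = {
    (EX + "shower_rain", RDFS_LABEL, "shower"),
    (EX + "shower_rain", PRONUNCIATION, "ˈʃaʊə"),
    (EX + "shower_one_who_shows", RDFS_LABEL, "shower"),
    (EX + "shower_one_who_shows", PRONUNCIATION, "ˈʃəʊə"),
}

def things_labelled(label):
    """All subjects that carry this string as a label."""
    return sorted(s for s, p, o in triples if p == RDFS_LABEL and o == label)

def pronunciations(subject):
    """Pronunciations attached to the noun (the thing), not to its label string."""
    return sorted(o for s, p, o in triples if s == subject and p == PRONUNCIATION)

for noun in things_labelled("shower"):
    print(noun, pronunciations(noun))
```

The one string "shower" is shared; the difference lives entirely in the two things it labels.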
(My three ha'p'orth, others' mileage may vary.) 

> 
> Regards,
> Frans
> 
> Op vr 23 nov. 2018 om 13:07 schreef Hugh Glaser <hugh@glasers.org>:
> Ah, good topic.
> 
> So another thing I don't understand (:-)) is why we have to have language tags on strings at all, and indeed datatypes.
> (OK, it's because of XML heritage or something, I guess.)
> But we have a perfectly good way of representing knowledge about things.
> It is a real pain to create these three-component literals and to query for different languages and datatypes in SPARQL.
> And worse still, if you want to query for strings that may or may not have language tags on, you need to do some real messing about.
> I often end up adding @en to all the strings, or removing region tags etc., just so I can do things more easily, which is surely a Bad Thing.
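A small sketch of why mixing tagged and untagged strings is awkward, modelling RDF 1.1 literals as plain three-field tuples (an assumption for illustration; no RDF library is used). The datatype IRIs are the real `xsd:string` and `rdf:langString` ones. Exact term equality finds only one of three "colour" literals, so one ends up comparing lexical forms only (roughly what `STR(?o) = "colour"` does in SPARQL) or stripping region subtags.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

class Literal(NamedTuple):
    lexical: str
    datatype: str
    lang: Optional[str] = None

data = [
    Literal("colour", XSD_STRING),
    Literal("colour", RDF_LANGSTRING, "en"),
    Literal("colour", RDF_LANGSTRING, "en-GB"),
]

# Exact term match: only the literal identical in all three components.
exact = [t for t in data if t == Literal("colour", XSD_STRING)]

# Workaround 1: compare lexical forms only, ignoring datatype and tag.
by_string = [t for t in data if t.lexical == "colour"]

# Workaround 2: drop region (and other) subtags so "en-GB" compares equal to "en".
def primary_language(lit):
    return lit.lang.split("-")[0].lower() if lit.lang else None

english = [t for t in data if primary_language(t) == "en"]
```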
> 
> Surely languages and datatypes should simply be RDF properties of Literals, which would then be one-component things?
> Much easier to explain to developers, and for them to use.
> (If indeed they want to use raw RDF.)
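A rough sketch of the suggestion above, assuming literals are bare strings and that a language is stated with an ordinary triple whose subject is the literal (this needs literals in subject position, which RDF 1.1 does not allow). The `example.org` namespace and `language` property are hypothetical. Language-agnostic lookup becomes a plain equality test and language-specific lookup an ordinary join; one consequence of this simplified model is that every use of the same string shares the same language statements.

```python
EX = "http://example.org/"  # hypothetical namespace for the illustrative properties
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LANGUAGE = EX + "language"  # hypothetical property, not a real vocabulary term

triples = [
    (EX + "cat", RDFS_LABEL, "cat"),
    (EX + "cat", RDFS_LABEL, "Katze"),
    ("cat", LANGUAGE, "en"),    # a statement about the literal itself
    ("Katze", LANGUAGE, "de"),
]

def labels(subject):
    # Language-agnostic lookup: plain string equality, no tag handling needed.
    return sorted(o for s, p, o in triples if s == subject and p == RDFS_LABEL)

def labels_in(subject, lang):
    # Language-specific lookup: an ordinary join on the hypothetical language property.
    langs = {(s, o) for s, p, o in triples if p == LANGUAGE}
    return [o for o in labels(subject) if (o, lang) in langs]
```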
> 
> > On 23 Nov 2018, at 11:48, Andy Seaborne <andy@seaborne.org> wrote:
> > 
> > The RDF 1.1 WG did spend some time on this - both on putting the lang tag into the lexical space and on putting it into the datatype. Neither is so easy; in the end, rdf:langString at least meant that all literals had a datatype.
> > 
> > With the lexical form being a pair (string, lang) squeezed into a single string, it gets a bit unintuitive when strlen("hello@en") is 8, not 5. See also rdf:PlainLiteral.
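To make the length point concrete, a sketch that packs (string, lang) into one lexical form in the style of rdf:PlainLiteral (text, then "@", then a possibly empty tag, split at the last "@"). The helper names `pack` and `unpack` are hypothetical.

```python
def pack(text, lang=""):
    # rdf:PlainLiteral-style lexical form: the text, an "@", then the (possibly empty) tag.
    return f"{text}@{lang}"

def unpack(lexical):
    # Split at the *last* "@", so text that itself contains "@" survives.
    text, sep, lang = lexical.rpartition("@")
    if not sep:
        raise ValueError(f"no '@' separator in {lexical!r}")
    return text, lang

lexical = pack("hello", "en")
print(len(lexical), len(unpack(lexical)[0]))
```

A string function applied naively to the packed form sees the tag as part of the text, which is exactly the surprise described.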
> > 
> > For datatypes, language tags have their own structure and hierarchy (lang-script-region-...) for their requirements which does not really fit with datatype subtyping very well.
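A sketch of that structure, assuming RFC 4647 basic filtering (the rule SPARQL's `langMatches()` is defined on) as the matching rule. A tag's broader forms are found by truncating hyphen-separated subtags, not through a declared type hierarchy as with XSD datatypes. Function names are illustrative.

```python
def subtags(tag):
    """Split a BCP 47 tag into its hyphen-separated subtags."""
    return tag.split("-")

def broader_tags(tag):
    """The tag and its successively truncated prefixes, most specific first."""
    parts = subtags(tag)
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]

def lang_matches(tag, lang_range):
    """RFC 4647 basic filtering: case-insensitive prefix match on subtag boundaries."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")
```

So "en-GB" matches the range "en" but "eng" does not, a relation that subtyping between datatypes would not express.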
> > 
> > I don't think changes would simplify.
> > 
> > We have what we have, and people have been explaining it to the wider community (i.e. it's not just people on this list who are affected). So "technically better" isn't the criterion; it should be "unlocks potential that is currently, provably blocked".
> > 
> >    Andy
> > 
> > On 23/11/2018 08:42, Wouter Beek wrote:
> >> Dear David, others,
> >> As another attempt at simplifying RDF, would it be possible to do away
> >> with the special status of language-tagged strings?
> >> In RDF 1.1 literals consist of 3 components: lexical form, datatype
> >> IRI, and language tag.  The last component is only used in
> >> language-tagged strings.  Would it be possible to define
> >> `rdf:langString' as a regular datatype IRI and have literals consist
> >> of 2 components instead?
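For reference, a sketch of the RDF 1.1 literal as described above: three components, with the tag present exactly when the datatype is rdf:langString. The class and field names are illustrative; the IRIs are the real `xsd:string` and `rdf:langString` IRIs. Under the proposal, the `language` field would disappear and its content would move into the lexical form.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: str = XSD_STRING  # RDF 1.1: a literal written without a datatype is xsd:string
    language: Optional[str] = None

    def __post_init__(self):
        # The third component is used only by rdf:langString, and rdf:langString requires it.
        if (self.datatype == RDF_LANGSTRING) != (self.language is not None):
            raise ValueError("language tag present iff datatype is rdf:langString")
        if self.language == "":
            raise ValueError("language tag must be non-empty")
```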
> >> RDF 1.1 Concepts and Abstract Syntax currently contains many caveats
> >> to accommodate the idiosyncratic nature of language-tagged strings,
> >> e.g.:
> >>> Language-tagged strings have the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. No datatype is formally defined for this IRI because the definition of datatypes does not accommodate language tags in the lexical space. The value space associated with this datatype IRI is the set of all pairs of strings and language tags.
> >> Would it be possible to define a regular lexical space, e.g.,
> >> containing "hello@en"^^rdf:langString, together with a value-to-lexical
> >> and a lexical-to-value mapping?
> >> The N3 and SPARQL notation "hello"@en will of course still be
> >> available, and will be syntactic sugar for "hello@en"^^rdf:langString.
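A sketch of what the proposed mappings might look like, using hypothetical function names. Lower-casing the tag follows RDF 1.1's statement that the value space of language tags is lower case, so "hello@EN" and "hello@en" would be two lexical forms for one value, as with other datatypes. No BCP 47 well-formedness check is attempted.

```python
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

def lexical_to_value(lexical):
    # Hypothetical L2V: split at the last "@"; normalise the tag to lower case.
    text, sep, tag = lexical.rpartition("@")
    if not sep or not tag:
        raise ValueError(f"not in the proposed lexical space: {lexical!r}")
    return (text, tag.lower())

def value_to_lexical(value):
    # Hypothetical canonical V2L: the inverse of the above for lower-case tags.
    text, tag = value
    return f"{text}@{tag}"

def desugar(text, tag):
    # "hello"@en  ->  "hello@en"^^rdf:langString, as a (lexical form, datatype IRI) pair.
    return (value_to_lexical((text, tag.lower())), RDF_LANGSTRING)
```

Usage: `desugar("hello", "en")` yields the pair for `"hello@en"^^rdf:langString`, and `lexical_to_value` maps it back to `("hello", "en")`.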
> >> ---
> >> Best regards,
> >> Wouter Beek.
> >> Email: w.g.j.beek@vu.nl
> >> WWW: https://wouterbeek.org
> >> Tel: +31647674624
> > 
> 
> -- 
> Hugh
> 023 8061 5652
> 
> 

-- 
Hugh
023 8061 5652
Received on Friday, 23 November 2018 13:16:35 UTC
