Re: "Language-tagged strings Re: Toward easier RDF: a proposal" from Hugh Glaser on 2018-11-25 (semantic-web@w3.org from November 2018)

From: Hugh Glaser <hugh@glasers.org>
Date: Sun, 25 Nov 2018 14:28:05 +0000
To: Andy Seaborne <andy@seaborne.org>
Cc: Semantic Web <semantic-web@w3.org>
Message-Id: <1E7F2BB6-1BDE-4F0D-AF74-D63279EBDD49@glasers.org>
Hi Andy,

> On 24 Nov 2018, at 17:42, Andy Seaborne <andy@seaborne.org> wrote:
> 
> 
> 
> On 24/11/2018 11:50, Hugh Glaser wrote:
>> Thanks Andy.
>> I'm trying to get this straight in my head, so I may need to ask you to indulge me if/when I miss something.
>> Sorry to go on, but I see Literals being what I think is Badly Used quite a lot, and to me it actually negates the value of the Semantic Web/Linked Data, and is close to turning the way we use RDF into a Graph Database.
>>> As someone who works with a product that is used by users in different geographies, I can say that language tags matter.
>> Yes, language tags matter hugely - that's why I am concerned about them (and spend a lot of my time dealing with them).
>> This is the Web after all.
>> And by the way - I would never suggest that we change the format of the tags (I think) - we deeply need the goodness of the web stack that does all that Accept stuff to do the negotiation and everything for us.
> 
> Original context:
> >>> Surely languages and datatypes should simply be RDF properties of Literals, which are 1 component things?
> 
>> But this is where I have difficulty:
>>> That would make all occurrences of "chat" @en.
>> Yes.
>> But they are - that is the RDF world, I think.
>> And they are also any other languages the RDF says they are.
>> "chat" is a literal - it isn't a word with any meaning
> 
> You might be able to argue it is a string of characters in a script.
> 
> 
> But for use in SKOS labels where does the language now come from? (you discount blank nodes below.)
> 
> "chat"^^xsd:string is a string of characters.
> 
> I think of language as a bit like units: 23 lb != 23 kg, and neither is 23.
OK.
But we don't argue for adding complexity to those Literals so that they include the units.
> 
>> - if I want one of those, I should be using Opencyc or Wikidata or my own URI or whatever for that.
>> And of course the great thing about that is that you get all the free multi-lingual information (possibly including pronunciation) that such sites give you - that is the value of Linked Data/Semantic Web. You may not even have/want to state the Literal yourself - just use the URI and get any Literals in any languages you want if it has them.
> 
> "chat"@en and "chat"@fr are different.
> 
> "chat" rdf:lang "en" .
> "chat" rdf:lang "fr" .
> 
> makes every use of "chat" both @en and @fr.
> 
> Or did you mean a different way to use properties? Example?
Your example is fine, thanks.
Yes, I mean exactly that.
I really don't have a problem with every instance of "chat"^^xsd:string being both en and fr if someone has asserted that using rdf:lang.
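
A minimal sketch of what such a global assertion implies, using a toy in-memory triple set rather than any real RDF library. The property rdf:lang is the hypothetical one discussed in this thread (it is not defined in the RDF namespace), and the ex: resources are made up for illustration.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# "rdf:lang" is the hypothetical property from this thread, and a literal
# appears in subject position, which standard RDF 1.1 does not allow.
triples = {
    ("ex:cat", "skos:prefLabel", "chat"),
    ("ex:conversation", "skos:prefLabel", "chat"),
    ("chat", "rdf:lang", "en"),
    ("chat", "rdf:lang", "fr"),
}

def languages_of(literal, graph):
    """Return every language asserted for a literal, anywhere in the graph."""
    return {o for (s, p, o) in graph if s == literal and p == "rdf:lang"}

def labels_with_languages(graph):
    """Map each (resource, label) pair to the languages asserted for that label."""
    return {
        (s, o): languages_of(o, graph)
        for (s, p, o) in graph
        if p == "skos:prefLabel"
    }

result = labels_with_languages(triples)
```

In this sketch both labelled resources end up with a label that is en and fr at once, because the language statements are about the string itself and not about one particular use of it.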
> 
> I don't want my SKOS labels to suddenly all become both @en and @fr at the same time.
I can see that.
But what you are doing is putting a serious amount of knowledge into what is after all just a Literal.
Knowledge should be in the RDF graph, not embedded in Literals.
Then I can manipulate and process it using my normal RDF tools, not have to start messing about with string matching etc. 
> 
> >> I often end up adding @en to all the strings, or removing region tags etc., just so I can do things more easily, which is surely a Bad Thing.
> 
> I don't think it is bad.
> 
> Publishing data does not mean the data is in the ideal form for any application - that would be end-to-end design.  In publishing, the use of the data isn't known - the more detail the better.  (William makes a similar point about processing to get the data in the way the app wants.)
> 
>> My rule of thumb is that if the predicate isn't rdfs:label or similar, then my RDF is probably wrong, because it is attaching too much meaning to a literal.
>> (That was part of my point about "shower".)
>> I consider even foaf:{xxxx}name "naughty" in this respect - I think that it should probably have the concept of Name with labels attached; then you could have things like the same name rendered in different scripts easily, or say that "Amadeus", "Gottlieb" and "Theophilus" are all different literals of the same name, as Mozart thought, and you can attach different language tags if you want.
>> But I get that people wanted one less link for FOAF, of course.
>> So I don't think you are writing good RDF if you say
>>> And I live near "cymru"@cy.
>> How can you live near a Literal?!
> Not the point I was making.
Ah, sorry.
> 
> I live near a region where labels need to reflect @cy and @en.
> 
> ... skos:prefLabel "Cymru"@cy ;
>    skos:prefLabel "Wales"@en .

> 
> Losing that distinction is a loss.
I can't really see that.
Instead we would have (something like):

...skos:prefLabel "Cymru"@cy ;
skos:prefLabel "Wales"@en .
"Cymru" rdf:lang lang:cy .
"Wales" rdf:lang lang:en .

Is there something really important for the way things are used that changes from one to the other?
I can see that they may say different things, but the way in which they get queried seems to me to have the same functions.
And actually I am more comfortable with the meaning of my RDF than what I think is yours - it sort of feels like Closed world v. Open World.

And there is a nice gain for usage and developers, which is what these threads are about:
I (and the 33% we keep talking about) don't need to learn all about FILTER langMatches( lang(?title), "FR" ) etc., but can just do things the same way as everything else we do.
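
To make the contrast concrete, here is a rough sketch of the two query styles over toy Python data, not real SPARQL. The lang_matches helper is a simplified stand-in for the basic language-range filtering that langMatches performs. The dc:title data, the ex: resources, and the rdf:lang / lang:fr terms are assumptions for illustration only.

```python
def lang_matches(tag, lang_range):
    """Simplified basic filtering in the style of SPARQL langMatches."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")

# Style 1: the language tag lives inside the literal, modelled here as
# a (lexical form, language tag) pair.
tagged = [
    ("ex:book1", "dc:title", ("Le chat", "fr")),
    ("ex:book1", "dc:title", ("The cat", "en")),
    ("ex:book2", "dc:title", ("Le chien", "fr-CA")),
]

def french_titles_tagged(graph):
    # Needs a dedicated function that looks inside the literal.
    return sorted(lex for (_, p, (lex, tag)) in graph
                  if p == "dc:title" and lang_matches(tag, "FR"))

# Style 2: plain literals, with language stated as ordinary triples.
# The region subtag of "fr-CA" is simply dropped in this sketch.
plain = [
    ("ex:book1", "dc:title", "Le chat"),
    ("ex:book1", "dc:title", "The cat"),
    ("ex:book2", "dc:title", "Le chien"),
    ("Le chat", "rdf:lang", "lang:fr"),
    ("The cat", "rdf:lang", "lang:en"),
    ("Le chien", "rdf:lang", "lang:fr"),
]

def french_titles_plain(graph):
    # An ordinary join on two triple patterns, like any other query.
    french = {s for (s, p, o) in graph if p == "rdf:lang" and o == "lang:fr"}
    return sorted(o for (_, p, o) in graph if p == "dc:title" and o in french)
```

Both functions return the same titles for this data. The second uses only pattern matching on triples, at the cost of having to decide separately how region subtags would be modelled.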

> 
>> You should say you live near a location that has labels (or whatever), one of which is in @cy and is "cymru"
>> And no, I don't want a Blank Node for it - the system that generates this should create a URI if it doesn't already have one ;-)
> 
> You want to use RDF properties but on what?
> RDF has global statements, so you have to use properties on something that is a usage of the string, because a property of the string itself applies throughout the graph.
> 
> <uri> skos:prefLabel :label1234 .
> :label1234 rdf:value "Cymru" .
> :label1234 rdf:lang "cy" .
> 
> <uri> skos:prefLabel :label1235 .
> :label1235 rdf:value "Wales" .
> :label1235 rdf:lang "en" .
I agree that this wouldn't be a great way to do things.
But I don't think that is usually the best way to do it - the form I gave above is what I have in mind.
I probably don't want the :label1234 URI for "Cymru"^^xsd:string in "cy" - although I can have one if I think it is useful.
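
For comparison, the per-usage label nodes from the quoted example can be sketched as a toy triple set in Python. The identifier ex:wales stands in for the unnamed <uri>, and rdf:lang is again the hypothetical property from this thread, so this is an illustration and not a recommended vocabulary.

```python
# Each label is its own node, so a language applies to one usage only.
# "ex:wales" is an assumed placeholder for <uri>; "rdf:lang" is hypothetical.
triples = {
    ("ex:wales", "skos:prefLabel", ":label1234"),
    (":label1234", "rdf:value", "Cymru"),
    (":label1234", "rdf:lang", "cy"),
    ("ex:wales", "skos:prefLabel", ":label1235"),
    (":label1235", "rdf:value", "Wales"),
    (":label1235", "rdf:lang", "en"),
}

def pref_labels(resource, graph):
    """Return {language: string} for a resource, following each label node."""
    out = {}
    for (s, p, node) in graph:
        if s == resource and p == "skos:prefLabel":
            value = next(o for (s2, p2, o) in graph
                         if s2 == node and p2 == "rdf:value")
            lang = next(o for (s2, p2, o) in graph
                        if s2 == node and p2 == "rdf:lang")
            out[lang] = value
    return out

labels = pref_labels("ex:wales", triples)
```

Here the language stays attached to one usage of the string, but every label costs two extra triples and one extra hop when querying.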

As I have said, I am very happy that an assertion that
	"chat" rdf:lang lang:en .
is global.
It's "true", after all, just like
	"chat" rdf:lang lang:fr .

Of course, I can currently do all this using untagged Literals without anyone's permission!
But because of the language tags in Literals, there is no tooling etc that supports it, and the data I get uses the tags.
I would like to see RDF move away from that.

Basically I think language tags are trying to avoid having to say in RDF what should be in the RDF.

Hugh

> 
> 	Andy
> 
> Of course, given the internet is a device for publishing cat pictures, maybe chatbots really are catbots.
> 
>> Best
>>> On 23 Nov 2018, at 14:03, Andy Seaborne <andy@seaborne.org> wrote:
>>> 
>>> 
>>> 
>>> On 23/11/2018 12:03, Hugh Glaser wrote:
>>>> Ah, good topic.
>>>> So another thing I don't understand (:-)) is why we have to have language tags on strings at all, and indeed datatypes.
>>> 
>>> As someone who works with a product that is used by users in different geographies, I can say that language tags matter.
>>> 
>>> And I live near "cymru"@cy.
>>> 
>>>> (OK, it's because of XML heritage or something, I guess.)
>>>> But we have a perfectly good way of representing knowledge about things.
>>>> It is a real pain to create these 3 component literals and to query for different languages and datatypes in SPARQL.
>>>> And worse still, if you want to query for strings that may or may not have language tags on, you need to do some real messing about.
>>> 
>>> STR(?var) in SPARQL.
>>> 
>>> xsd:string("abc"@en) if you are lucky.
>>> 
>>>> I often end up adding @en to all the strings, or removing region tags etc., just so I can do things more easily, which is surely a Bad Thing.
>>>> Surely languages and datatypes should simply be RDF properties of Literals, which are 1 component things?
>>>> Much easier to explain to developers, and for them to use.
>>>> (If indeed they want to use raw RDF.)
>>> 
>>> As in:
>>> 
>>>  "chat" rdf:lang "en" .
>>> 
>>> ?
>>> 
>>> That would make all occurrences of "chat" @en.
>>> 
>>> They really are different literals.
>>> 
>>>    Andy
>>> 
>>> 
>>>>> On 23 Nov 2018, at 11:48, Andy Seaborne <andy@seaborne.org> wrote:
>>>>> 
>>>>> The RDF 1.1 WG did spend some time on this - both on putting the langtag into the lexical space and putting the lang tag into the datatype.  Both are not so easy; in the end the rdf:langString at least meant all literals had a datatype.
>>>>> 
>>>>> When the lexical form is a pair (string, lang) squeezed into a single string, it gets a bit unintuitive: strlen("hello@en") is 8, not 5. See also rdf:PlainLiteral.
>>>>> 
>>>>> For datatypes, language tags have their own structure and hierarchy (lang-script-region-...) for their requirements which does not really fit with datatype subtyping very well.
>>>>> 
>>>>> I don't think changes would simplify.
>>>>> 
>>>>> We have what we have and people have been explaining it to the wider community (i.e. it's not just people on this list who are affected). So "technically better" isn't the criterion, it should be "unlocks potential that is currently, provably blocked".
>>>>> 
>>>>>    Andy
>>>>> 
>>>>> On 23/11/2018 08:42, Wouter Beek wrote:
>>>>>> Dear David, others,
>>>>>> As another attempt at simplifying RDF, would it be possible to do away
>>>>>> with the special status of language-tagged strings?
>>>>>> In RDF 1.1 literals consist of 3 components: lexical form, datatype
>>>>>> IRI, and language tag.  The last component is only used in
>>>>>> language-tagged strings.  Would it be possible to define
>>>>>> `rdf:langString' as a regular datatype IRI and have literals consist
>>>>>> of 2 components instead?
>>>>>> RDF 1.1 Concepts and Abstract Syntax currently contains many caveats
>>>>>> to accommodate the idiosyncratic nature of language-tagged strings,
>>>>>> e.g.,:
>>>>>>> Language-tagged strings have the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. No datatype is formally defined for this IRI because the definition of datatypes does not accommodate language tags in the lexical space. The value space associated with this datatype IRI is the set of all pairs of strings and language tags.
>>>>>> Would it be possible to define a regular lexical space, e.g.,
>>>>>> containing "hello@en"^^rdf:langString, together with a value-2-lexical
>>>>>> and a lexical-2-value mapping?
>>>>>> The N3 and SPARQL notation "hello"@en will of course still be
>>>>>> available, and will be syntactic sugar for "hello@en"^^rdf:langString.
>>>>>> ---
>>>>>> Best regards,
>>>>>> Wouter Beek.
>>>>>> Email: w.g.j.beek@vu.nl
>>>>>> WWW: https://wouterbeek.org
>>>>>> Tel: +31647674624
>>>>> 
>>> 
> 

-- 
Hugh
023 8061 5652
Received on Sunday, 25 November 2018 14:28:35 UTC