Re: Use of "has" or "is" in DPV's properties from Harshvardhan J. Pandit on 2022-03-31 (public-dpvcg@w3.org from March 2022)

From: Harshvardhan J. Pandit <me@harshp.com>
Date: Thu, 31 Mar 2022 16:38:28 +0100
To: Pat McBennett <patm@inrupt.com>
Cc: Beatriz Esteves <besteves@delicias.dia.fi.upm.es>, public-dpvcg@w3.org
Message-ID: <38c29b94-2941-e3e8-55f6-c1adb39b5e49@harshp.com>
Hi. Replies to selected parts are inline. For Pat's full email, see 
https://lists.w3.org/Archives/Public/public-dpvcg/2022Mar/0027.html

On 31/03/2022 15:40, Pat McBennett wrote:
> 
> */>>> @Pat - would you be willing to do this? /*
> Yeah, absolutely. By "/the semantic web mailing list/", I assume you 
> mean this: https://lists.w3.org/Archives/Public/semantic-web/ 
> <https://lists.w3.org/Archives/Public/semantic-web/> ?

Yes.

> 
> About Rob's concern (i.e., ".../that some languages do not have the 
> upper/lower case characters we use in English/Western languages/"), can 
> anyone provide a couple of simple examples, as I'm not sure I understand 
> (at least not in the context of DPV, where the lingua franca (for term 
> names) has already been agreed to be English (and we are talking here 
> about the names of vocab terms, right? - since as Harsh says, the values 
> for `rdfs:label` or `rdfs:comment` or whatever predicates 
> /associated/ with these vocab terms can provide whatever values they 
> want in whatever local, non-English/Western languages they want, 
> right?)). Anyway, I'm sure a couple of simple examples might help 
> highlight what I'm probably missing here...

AFAIK - Languages that don't have capital letters: Sanskrit or Tamil 
language families, Japanese, Mandarin, Korean, Arabic (in general), 
Hebrew, Semitic language families.
> 
> On Harsh's point (i.e., "/...one would have to 'create' the label to 
> distinguish between a label for Class and a Property with the same name 
> i.e. class would be 'Concept' and property would have to be 'has 
> Concept'/"). I agree here, although I would propose having the labels 
> (as in `rdfs:label`, right?), in this example, as "Concept" for the 
> property and "Class of Concept" for the Class.

Yes, label as in any annotation adding a name, e.g. rdfs:label or 
skos:prefLabel or dct:title or foaf:name. Though your example is not 
correct for what I was saying. The label for a class representing a 
concept would be 'concept' and not 'class of concept', and that for the 
property would be 'has concept' in cases where there are no capital 
letters and there is a need to distinguish between labels of a class and 
a property.

For IRIs, by convention, we use English (or rather the Latin or Roman 
script) and HTTP both of which support capitals - so this issue doesn't 
apply.

> 
> */>>> However, should there be consistency between multi-lingual labels/*
> I don't follow what you mean here. For me, I have no trouble simply 
> translating any labels as appropriate, which could be very independent 
> and different (and therefore may appear 'inconsistent' perhaps), but 
> that's fine (which is why I think I may not be following what you mean), 
> e.g.:
> 
>      ex:Concept a rdfs:Class ;
>        rdfs:label "Class of Concept"@en ;
>        rdfs:label "Clase de Concepto"@es ;
>        rdfs:label "All the yokes (in Dublin English, everyting's a 
> 'yoke' (and 'th' is pronounced 't'!))"@en-dublin .
> 
> So can you expand a little on what you mean by "/*consistency* between 
> multi-lingual labels/"...?

Try it in a language which doesn't support capitals. Label of a class is 
"Concept" (we write Class of Concept only if we're building an 
ontological representation, otherwise just the title), and that of its 
associated property is "concept" (differentiated using Capitals). In 
languages without this distinction, the following are what Class and 
Property labels look like:

Hindi: अवधारणा, अवधारणा
Japanese: 概念, 概念

So if such languages wish to distinguish between class and property 
labels, they must add a prefix or suffix which won't be present in 
languages that differentiate using Capitals. For example, in Japanese, 
the equivalent of "uses concept" as a different label than "Concept" is 
"コンセプトを使用する" (as per machine translation), but the English 
label would still be just "concept" unless everyone applies "uses 
concept" as the label text in translation. This is what I mean by having 
consistency across languages for a label.
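
To make the point concrete, here is a minimal Python sketch (purely illustrative, not part of DPV or DCAT tooling; the function names are made up) that checks whether capitalisation alone can tell a class label from a property label. It uses the English, Hindi and Japanese labels from above.

```python
def script_has_case(label: str) -> bool:
    """True if upper- and lower-casing the label give different strings."""
    return label.upper() != label.lower()


def capitalisation_distinguishes(class_label: str, property_label: str) -> bool:
    """True when the two labels differ, but only by letter case."""
    return (class_label != property_label
            and class_label.casefold() == property_label.casefold())


# (class label, property label) per language, as in the examples above
labels = {
    "en": ("Concept", "concept"),
    "hi": ("अवधारणा", "अवधारणा"),
    "ja": ("概念", "概念"),
}

for lang, (cls, prop) in labels.items():
    print(lang, script_has_case(cls), capitalisation_distinguishes(cls, prop))
# en True True
# hi False False
# ja False False
```

Only the English pair can rely on capitals; the other two would need a prefix or suffix to differ.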

> 
> *>> DCAT by way of example.*
> Yeah, I love DCAT as an example vocab. But in fact on close inspection, 
> there appear to be quite a number of inconsistencies and issues with 
> some of their terms names, and their choices for `rdfs:label` values.
> 
> So the major change I'd suggest making to DCAT (relevant to this 
> discussion anyway) is my point above about providing 'better' (i.e., 
> more useful, helpful, and unambiguous) labels for their Classes and 
> Properties, for instance:
> 
>    dcat:Catalog a rdfs:Class ;
>      rdfs:label "Class of Catalogs"@en .
> 
>    dcat:catalog a rdf:Property ;
>      rdfs:label "Catalog"@en .

Depends on what you're modelling. If you're representing an ontology 
that models just classes, this will look fine. But if you're using them 
to model data, this is not a good representation. For example, if I want 
to model that you're using data for Marketing, I'm not going to label 
that concept "Class of Marketing", but just "Marketing". Similarly, in 
DCAT, the label is "Catalog" rather than "Class of Catalog". This 
reflects how these concepts are used in the real-world in terms of labels.

> 
> 
> At first I thought that the label values for both the Class 
> `dcat:Catalog` and the Property `dcat:catalog` were both "Catalog". But 
> in fact, the English label for `dcat:Catalog` is `Catalog` (both capital 
> 'C'), and the English label for the `dcat:catalog` property is `catalog` 
> (both lowercase 'c'))
> 
> So Harsh, on your points:
>    "[DCAT] either have (i) exact same label for classes and concepts;"
>    Well, no, not the '/exact same/' labels at all (e.g., even in the 
> case of "Catalog" and "catalog").
> 
>   "or (ii) do not have the same language labels across classes and 
> properties. "

>    I don't follow what you mean here - they consistently '/do not/ have 
> the same language labels across classes and properties', i.e., they 
> differ by just the case of the first letter (in the 2 cases of 
> 'dcat:Catalog' and 'dcat:catalog', and 'dcat:Distribution' and 
> 'dcat:distribution'), or they differ more broadly in words (in the case 
> of 'CatalogRecord' and 'record').

In the context of your proposal, the issue was the prefix before 
property names. In DCAT, there is no such prefix, the IRI and label for 
class and property is the same i.e. 'catalog' - just differentiated by 
capitals. Now if you look in the DCAT file for labels in languages that 
don't have capitals, you will find the EXACT same label for both class 
and property, or no label for that language. Here is an example:

dcat:Catalog has the following language labels which dcat:catalog does 
not have - Arabic, Japanese (don't support capitals)

dcat:Dataset has the following language labels which dcat:dataset also 
has, where they are EXACTLY SAME, and the language does not have 
capitals: "قائمة بيانات"@ar , "データセット"@ja

> 
> But perhaps all these DCAT issues are actually being resolved in the v3 
> (proposed) you mentioned. I see the HTML of the v3 spec (here 
> <https://www.w3.org/TR/vocab-dcat-3/>) - but how do I see the Turtle for 
> this new version (since the namespace IRI is still 
> http://www.w3.org/ns/dcat# <http://www.w3.org/ns/dcat#>, which right now 
> only gives me back the v2 Turtle, right?!)

https://github.com/w3c/dxwg/blob/e89e7a5f313cc30b7c4504c1ad9bbadd01e88609/dcat/rdf/dcat3.ttl

Same labels as v2.

Now after all this, let's also consider why we have labels, and how we 
use them.

1) Documentation e.g. on a website for that concept
- here the information is almost always accompanied by the type of that 
concept e.g. "Concept", is a Class, has definition ...
- So the same label for a property and a class is not a problem because 
there is context that provides differentiation

2) Creating textual representation of a triple i.e. <subject label> 
<property label> <object label>
- Here an example could be of the form: "Catalog" "dataset" "Dataset" - 
which seems confusing.
- But properties are usually used with instances that have their own 
labels, so a better example is: "This Catalog" "dataset" "That Dataset" 
- which isn't how we speak, but better than before
- If the purpose is to generate such text, the prefixes are great, i.e. 
"This Catalog" "has dataset" "That Dataset"
- But if you're used to looking at this in JSON or whatever code form, 
it's a structure, so the prefix isn't needed, i.e. { Catalog --dataset--> 
Dataset } seems comprehensible even if I haven't used any specific 
language here because we know implicitly what graph this is representing.
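
The text-generation case in (2) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `render_triple` and its `prefix` parameter are hypothetical, not something DCAT or DPV provides.

```python
def render_triple(subject_label, property_label, object_label, prefix=""):
    """Join the three labels into one line, optionally prefixing the property label."""
    prop = f"{prefix} {property_label}" if prefix else property_label
    return f"{subject_label} {prop} {object_label}"


print(render_triple("This Catalog", "dataset", "That Dataset"))
# This Catalog dataset That Dataset
print(render_triple("This Catalog", "dataset", "That Dataset", prefix="has"))
# This Catalog has dataset That Dataset
```

Keeping the prefix as a rendering option rather than part of the stored label is one possible design: the label stays "dataset" and the "has" is added only when generating sentences.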

So this is, I think, why the conventions ended up the way they did.

If there is evidence that one or the other is considered best practice 
by the community, then we have the option of adopting that best 
practice. Otherwise the additional effort of redrafting all properties, 
breaking compatibility without any benefit, and potentially someone 
suggesting the opposite in the future are reasons not to do this.

Regards,
-- 
---
Harshvardhan J. Pandit, Ph.D
Research Fellow
ADAPT Centre, Trinity College Dublin
https://harshp.com/
Received on Thursday, 31 March 2022 15:38:45 UTC