Re: Labels separate from localnames (Was: Best Practice for Renaming OWL Vocabulary Elements) from Martin Hepp on 2011-04-22 (public-lod@w3.org from April 2011)

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Fri, 22 Apr 2011 13:36:58 +0200
To: antoine.zimmermann@insa-lyon.fr
Cc: public-lod@w3.org
Message-Id: <09F95BB0-DEB2-4EC3-A3CC-661082424CB4@ebusiness-unibw.org>
See replies inline ;-)
> Sorry to say this, but I think you are making a mistake. To say that the rdfs:label has to look like a variable name because it is for Web developers sounds to me like you are saying that the javadoc of a method should look like a piece of code because it is addressed to programmers. I refuse to believe that Web developers understand better pseudo code than natural language.

I will finally give in and use English spacing and capitalization for rdfs:labels in GoodRelations, e.g. use

   "Business entity"@en for gr:BusinessEntity etc.

But I will keep the cardinality recommendation in the rdfs:label of properties, e.g.

    serial number (0..*) for gr:serialNumber

and the class type information in ontological individuals, as in

    By bank transfer in advance (payment method) for gr:ByBankTransferInAdvance

The latter should definitely not irritate human consumers, for it provides context; the former is, in my judgment, the best way of indicating cardinality recommendations in OWL, since the OWL cardinality constructs don't cover what is needed, yet I have to be able to tell modelers the intended cardinality. It is not nonsensical, as you state; many users of GR have confirmed this.

> Moreover, Web clients most of the time display raw data (in a nice way) extracted from databases. For instance, a Wikipedia article displays a nice readable title, which is exactly the raw data that is found in a column of a database. Of course, you can decide that you won't use rdfs:label for human readable text and reserve another property for that (eg, dc:title), but you cannot decide how others will use your data and they may have a preference for the rdfs:label. As a matter of fact, rdfs:label is commonly used for showing people a nice readable piece of text in natural language.

I was stressing that SW apps that aim at real people will have to use sophisticated methods for choosing the proper label for data elements anyway; using the raw rdfs:label will not work for non-geeks in most cases. Most ordinary people cannot process data, just information.
> 
> Now, let's imagine I have a "product browser" which aggregates information about products found on the Web, leveraging the GoodRelations vocabulary and possibly other vocabularies. It may display the products in a table and have a column for "product type", which displays the class of the product. There are chances that the client will display the rdfs:label of the class as the "product type", which in the case of GoodRelations would look sibylline to a casual reader, with camel-toed text and nonsensical information about arity.

Nobody except for very specialized analysts will ever want to use a product browser that presents raw RDF data.

> 
> Moreover, with such practice, how can you provide labels in multiple languages? Paymentmethod is not even an English word!
The choice of labels for information consumers cannot be solved by the creator of the vocabulary, because that depends on the context (e.g. audience) in which the results will be displayed.
This is independent of the question of translations. A good ontology makes good (context-independent, lasting, cross-cultural) choices regarding the categories of things. The linguistic representation of these categories in a specific context is a completely different story.
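To illustrate the kind of context-dependent choice meant here, a minimal Python sketch follows. It is only an assumption of how a consuming application might pick a display label: the function names, the override table, and the fallback order are hypothetical and not part of GoodRelations.

```python
def local_name(iri):
    """Last fragment or path segment of an IRI, used as a last-resort label."""
    return iri.rstrip("/#").rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def choose_label(iri, labels, preferred_langs, overrides=None):
    """Pick a display label for one vocabulary term.

    labels:          list of (text, language tag) pairs from the vocabulary
    preferred_langs: language tags in the order the audience prefers them
    overrides:       hypothetical application-specific wording, keyed by IRI
    """
    # 1. The application knows its audience best, so its wording wins.
    if overrides and iri in overrides:
        return overrides[iri]
    # 2. Otherwise take the first vocabulary label in a preferred language.
    for lang in preferred_langs:
        for text, tag in labels:
            if tag == lang:
                return text
    # 3. Last resort: the local name of the identifier.
    return local_name(iri)


GR_BUSINESS_ENTITY = "http://purl.org/goodrelations/v1#BusinessEntity"
vocab_labels = [("Business entity", "en")]

print(choose_label(GR_BUSINESS_ENTITY, vocab_labels, ["fr", "en"]))
print(choose_label(GR_BUSINESS_ENTITY, vocab_labels, ["en"],
                   overrides={GR_BUSINESS_ENTITY: "Seller"}))
```

The vocabulary label is only one input among several; the override table stands in for whatever audience-specific wording an application maintains.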


>> But since this class is so frequently used, I want to change it to
>> simply gr:Location while remaining as much of backward compatibility
>> as possible; that is the background of the pattern I suggested.
> 
> Ouch! I'm afraid amateur Linked Data producers who are searching for terms in a SemWeb search engine will find gr:Location very appropriate for *any* location. As a consequence, it will be inferred that all locations recorded in geonames are selling something! The Semantic Web will break and bring in its downfall the World Wide Web and the Internet, then the end of the world...
> 

First, it does no harm if he or she uses gr:Location for that purpose - there is no contradiction; any place or area in the universe can be said to be an instance of gr:Location.
Second, I cannot solve the problems of
- amateur linked data producers in general and
- the unsatisfying state of search technology for ontologies and ontology elements.

The most important audience to cater for nowadays is Web developers who want to add RDFa to existing sites. Learn from Facebook and their findings re OGP.
>> Well, in my case that would mean I cannot change a)
>> gr:LocationOfSalesOrServiceProvisioning to gr:Location b)
>> gr:ProductOrServicesSomeInstancesPlaceholder to gr:SomeItems and c)
>> gr:ActualProductOrServiceInstance to gr:Individual
> 
> Those names are horribly long but they have the merit of being little ambiguous, as opposed to gr:Individual. In FOAF, the names are very short, which certainly helps getting the vocabulary adopted but creates a considerable amount of misuses (foaf:img, foaf:mbox, ...).  Moreover, these long names are easier to discover in keyword-based search engines because there is more contextual information to properly index and relate the words in the name.
> 

I would put it differently: The initial long names were important for me to develop a clean conceptual model, because other terms would have been much less generic and much more industry-specific. The fact that you can use GoodRelations across industries (jobs, restaurants, transportation, cars, books, consulting, disposal, ...) is because I did not use the quick, context-bound words for conceptual elements.

But in the three modifications I am planning, I think the gain in brevity is much more relevant than the risk of wrong usage. Keep in mind that even long names do not prevent wrong usage.

Basically, I am evaluating only three changes (not yet confirmed with important stakeholders):

gr:ActualProductOrServiceInstance --> gr:Individual
gr:ProductOrServicesSomeInstancesPlaceholder --> gr:SomeItems
gr:LocationOfSalesOrServiceProvisioning --> gr:Location

The first two are always used as additional classes, so their IDs will always be in context:

foo:myHammer a <http://www.productontology.org/id/Hammer>, gr:Individual.
foo:someHammers a <http://www.productontology.org/id/Hammer>, gr:SomeItems.
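For data that already uses the long names, the three candidate renames listed above could be applied mechanically. The following Python sketch is only an illustration under two assumptions: the renames are adopted as listed (they are not yet confirmed), and the data is Turtle text that uses the gr: prefix. The names RENAMES and shorten_names are hypothetical.

```python
import re

# Candidate renames from this message (old local name -> new local name).
RENAMES = {
    "ActualProductOrServiceInstance": "Individual",
    "ProductOrServicesSomeInstancesPlaceholder": "SomeItems",
    "LocationOfSalesOrServiceProvisioning": "Location",
}

# Match only whole prefixed names such as gr:ActualProductOrServiceInstance.
PATTERN = re.compile(r"\bgr:(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def shorten_names(turtle_text):
    """Replace the long gr: class names with the proposed short ones."""
    return PATTERN.sub(lambda m: "gr:" + RENAMES[m.group(1)], turtle_text)


old = "foo:myHammer a gr:ActualProductOrServiceInstance ."
print(shorten_names(old))
```

A plain text substitution like this does not handle full IRIs or other prefixes; a real migration would more likely work on the parsed graph.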

Even I have to look up the GoodRelations Reference for the correct syntax from time to time, so there is a real need for improvement.

>> As said, I am considering to change the formatting from camel word to
>> non-camel style but keep the cardinality and class membership info
>> for developers. The issue of several languages is, in theory, a nice
>> feature, but extremely difficult to implement in six-sigma quality
>> due to the differences in connotations and semantic granularity of
>> natural languages. Having second-class translations would do more
>> harm than good, in my opinion. The only reliable translations I could
>> provide easily would be German, but that would really not increase
>> adoption significantly - most German Web developers speak English.
> 
> You do not need to make the translations yourself. Find fluent translators or expert linguists.

I do not know whether you have ever tried to get sufficiently precise translations for rather abstract ideas.
You would need to get at least two independent translations for each language and then evaluate the differences.

BTW, I am not saying there is no need for translations, but before the translations could be part of the official spec, they would have to be extremely reliable.

It's no problem if someone on the Web publishes an RDF graph of French labels for GoodRelations, even if it were not 100% accurate.

Have a look at 30 years of terminology research (e.g. http://www.termnet.org/) or google for Eugen Wuester.

>> Snippets or Yahoo SearchMonkey will never see the vocabulary labels,
>> only the person configuring the generation of data.
> 
> Google Rich Snippets don't show the labels because it is specifically tuned for GoodRelations. But a generic tool which aggregates information from various sources using various vocabularies has to make a generic assumption on what to display. rdfs:label is what is often chosen by generic tools to be shown to people.

I doubt the interaction with RDF data on a Web scale will be a simple modification of the browser paradigm of HTML content. Pivot-style approaches are IMO pointing in the right direction, but again, you will need a hard-coded or pretty intelligent additional layer in between the human and the data, and selecting the proper name for a piece of data will be among the challenges. A simple regex on the labels from the vocabulary will be the least obstacle of all.
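As a rough illustration of how small that regex step is, here is a Python sketch. It assumes the label convention discussed in this thread, i.e. an optional trailing parenthesised annotation that is either a cardinality hint such as (0..*) or a class type such as (payment method); the function name split_label is hypothetical.

```python
import re

# Optional trailing "(...)" annotation at the end of a label.
ANNOTATION = re.compile(r"^(?P<text>.*?)\s*\((?P<note>[^()]*)\)\s*$")
# Cardinality hints look like 0..1, 1..*, 0..* and so on.
CARDINALITY = re.compile(r"^\d+\.\.(\d+|\*)$")


def split_label(label):
    """Return (display text, cardinality or None, class type or None)."""
    match = ANNOTATION.match(label)
    if match is None:
        return label.strip(), None, None
    text = match.group("text")
    note = match.group("note").strip()
    if CARDINALITY.match(note):
        return text, note, None
    return text, None, note


for raw in ("serial number (0..*)",
            "By bank transfer in advance (payment method)",
            "Business entity"):
    print(split_label(raw))
```

A generic tool could show only the first element to end users and keep the other two for developers.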

I don't think that we as LOD / SW researchers already know how to implement the larger vision, but it will for sure require a lot more sweat, more creativity, and more cross-discipline effort than many seem to assume.

Best

Martin
Received on Friday, 22 April 2011 11:37:24 UTC