
Re: Question on expressing translations of terms

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 25 Nov 2016 09:23:41 +0100
Cc: Thad Guidry <thadguidry@gmail.com>, Alexandre Bertails <bertails@apple.com>, Thomas Francart <thomas.francart@sparna.fr>, Dan Brickley <danbri@google.com>, "schema.org Mailing List" <public-schemaorg@w3.org>
Message-Id: <9A6D3C42-1229-4A59-8AB1-2108B2695F59@w3.org>
To: Richard Wallis <richard.wallis@dataliberate.com>
Dear Richard,

Thanks a lot for taking the time to describe your view in detail.

You are mostly right in your interpretation of what I am trying to achieve. The missing bit is that I am not differentiating between entities and descriptions, but that I am defining a new unit which serves as a connector between ontological information and linguistic information. I am calling this unit a term.

This notion of a term is not my invention. The localization industry has used the TBX format for years. In my demo
http://fsasaki.github.io/stuff/tekom2016/
you will see a TBX example which shows a term with the ID tid_db6_014D420D507ED411B1360060B03C6BFB. This term is linked (via its position in the XML structure) both to a linguistic description and to other types of information (here related to localization workflows). You will see that translations of linguistic expressions are not interrelated directly, but are all connected via the same term entry.
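
To make this structure concrete, here is a minimal Python sketch of the idea that translations are not linked to each other directly but all hang off the same term entry. The XML fragment is a hypothetical, heavily simplified TBX-style entry (it is not the entry from the demo), and the ID "example-entry-1" and the helper names terms_by_language and translations are assumptions made up for illustration. Real TBX files carry much more structure and metadata.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified TBX-style term entry (an assumption, not real data).
TBX_FRAGMENT = """
<termEntry id="example-entry-1">
  <langSet xml:lang="en"><tig><term>screwdriver</term></tig></langSet>
  <langSet xml:lang="de"><tig><term>Schraubendreher</term></tig></langSet>
  <langSet xml:lang="ja"><tig><term>スクリュードライバー</term></tig></langSet>
</termEntry>
"""


def terms_by_language(entry_xml: str) -> Dict[str, List[str]]:
    """Collect the terms of one entry, grouped by language tag."""
    entry = ET.fromstring(entry_xml)
    result: Dict[str, List[str]] = {}
    for lang_set in entry.findall("langSet"):
        lang = lang_set.get(XML_LANG)
        result[lang] = [term.text for term in lang_set.iter("term")]
    return result


def translations(entry_xml: str, source_lang: str, target_lang: str) -> List[str]:
    """Find target-language terms by going through the shared entry.

    There is no direct link between the source and target terms; both
    are only reachable because they sit in the same term entry.
    """
    terms = terms_by_language(entry_xml)
    if source_lang not in terms:
        return []
    return terms.get(target_lang, [])


if __name__ == "__main__":
    print(translations(TBX_FRAGMENT, "ja", "en"))
```

The lookup from Japanese to English never compares the two strings; it only relies on both belonging to the same entry, which is the property I am after.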

TBX is widely used in localization and technical documentation, in relation to formats like XLIFF, DITA or DocBook. A few years ago a community formed with the aim of representing such terminology information on the Web, using linked data principles. Their current outcome is the OntoLex model

https://www.w3.org/2016/05/ontolex/ 

People also provide conversions from TBX to OntoLex, see

http://tbx2rdf.lider-project.eu/converter/index.html

helping terminology data base owners to provide a linked data view. 

You may now argue that the OntoLex way of ontology modeling is special purpose and not relevant for a general vocabulary like Schema.org. My counterargument would be that, in my perception, Schema.org is use case and community driven, and that the use case of cross lingual information access is getting more and more attention. That may lead in the future to term specific Schema.org sub models, as happened in the area of products with GoodRelations or library catalogues with the bib extension. Please see my search engine mockup as just one possibility for implementing the use case of cross lingual access. With the ability to identify a unit (which I am calling a term) that bridges linguistic descriptions and general world knowledge (represented via Schema.org), there is a huge opportunity for many more implementations.
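
One possible way to picture such a bridging unit is sketched below, using OntoLex terms in JSON-LD built from Python. This is an illustration under stated assumptions, not something Schema.org or any search engine supports today: the example.com identifiers and the helper term_as_jsonld are hypothetical, and using ontolex:denotes to point from a term to a Schema.org-typed entity is just one modeling choice among several.

```python
import json

# Prefixes used in the sketch; the OntoLex namespace is the published one.
CONTEXT = {
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "schema": "http://schema.org/",
}

# Hypothetical identifier of the entity that the terms denote.
ENTITY = "http://example.com/products/screwdriver-xyz"


def term_as_jsonld(term_id, written_rep, lang, entity_id):
    """Describe one language-specific term that denotes a shared entity."""
    return {
        "@id": term_id,
        "@type": "ontolex:LexicalEntry",
        "ontolex:canonicalForm": {
            "@type": "ontolex:Form",
            "ontolex:writtenRep": {"@value": written_rep, "@language": lang},
        },
        "ontolex:denotes": {"@id": entity_id},
    }


# The terms carry the linguistic side; the entity carries the Schema.org side.
graph = {
    "@context": CONTEXT,
    "@graph": [
        {"@id": ENTITY, "@type": "schema:Product"},
        term_as_jsonld("http://example.com/terms/1", "screwdriver", "en", ENTITY),
        term_as_jsonld("http://example.com/terms/2", "Schraubendreher", "de", ENTITY),
    ],
}

if __name__ == "__main__":
    print(json.dumps(graph, indent=2, ensure_ascii=False))
```

Each term keeps its own identifier, so additional information (variants, terminology version, domain) could be attached to the term without touching the description of the product itself.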

I hope that this clarifies my intention. I am aware that, unlike with GoodRelations or the bib extension, there is not yet a large community asking for better support for cross lingual access in Schema.org. My hope is that it is just a question of time until more people engage.

Regards,

Felix

> On 24.11.2016 at 12:50, Richard Wallis <richard.wallis@dataliberate.com> wrote:
> 
> Felix,
> 
> What you describe is a fairly standard pattern in Schema.org and the world of authoritative linked data sets.
> 
> Several organisations/people may have their own understanding of a thing, concept, person etc. plus related supporting information, comments, local resources, etc.  They also recognise that other authoritative sources exist and use sameAs links to share the fact that they are describing the same entity as others are.
> 
> They publish their own individual identifier for the Person, Place, Product, etc. and then sameAs links to others’ descriptions of the same Person, Place, Product, etc.   
> 
> It is up to the consumers of this data, mostly the search engines, to take this data, interpret it as they wish (potentially merging it into aggregated descriptions of entities in their knowledge graphs) and use it as they feel appropriate to satisfy their users’ needs.
> 
> Where I see your proposal differing, if my understanding is correct, is that you are trying to identify an individual description of a particular entity (as against the entity itself) and then say it is the sameAs another description (by description I mean all attributes such as reviews etc.).  In this pattern you are saying that two different descriptions are sameAs each other, which they obviously are not since, as you say, they are not direct translations of each other.  What is the same is the entity being described.
> 
> As an external observer, I see the data consumers looking for multiple structured data descriptions of individual Things (entities) for them to aggregate together in knowledge graphs and then use them to guide users to appropriate views of those entities.   
> 
> By concentrating on the description identities, as against the entities they describe, I believe you are introducing confusion in the data patterns and the results may well be unpredictable.  I believe you are making assumptions about the way the search engines use this data, and are trying to push, or even game, them into operating in a specific way.  History shows us that such initiatives usually only have short term positive benefit.
> 
> Although I have great sympathy with your objectives of helping users see the appropriate description of a resource in an appropriate language, I believe a message to implementers about ensuring that the text in their descriptions is correctly language tagged would be as, or possibly even more, effective.
> 
> Forgive me if my understanding of what you are proposing is not correct.
> 
> ~Richard.
> 
>  
> 
> Richard Wallis
> Founder, Data Liberate
> http://dataliberate.com
> Linkedin: http://www.linkedin.com/in/richardwallis
> Twitter: @rjw
> 
> On 24 November 2016 at 08:08, Felix Sasaki <fsasaki@w3.org> wrote:
> Hi Thad,
> 
> we may have a misunderstanding about the motivation. Let me try to explain with a comparison. If I want to give my own ratings of a product, Schema.org allows me to do that, e.g. via https://schema.org/Rating. Ratings are then taken up in search result previews.
> 
> I can give my own translations (with your example, the one from Richard or the one from Alexandre in an earlier mail), but there is no guidance on how translations will or should be taken up. 
> 
> By „translations“ I don’t mean general translations based on world knowledge. That would be feasible, as you pointed out, with global lexical data bases.
> 
> If you look at the slides I linked to below, you will see an example. Companies have their own multilingual terminologies. They own and govern these terminologies and don’t want them to be a part of a global lexical data base. Still they want to put them on the Web, to ease cross lingual access to their data.
> 
> It would not make sense to put a highly company specific name like „Easy Graphics Framework“ and its Chinese translation „图形提供商“ into a general lexical data base. Still the companies want to have their multilingual data taken up in web search. See the attached mockup of what that could look like.
> 
> So this topic is mostly about motivations and benefits. For ratings and their uptake in web search this relation is clear. For company specific multilingual data it is not yet clear to me.
> 
> Best,
> 
> Felix 
> 
> 
> 
> <PastedGraphic-1.tiff>
> 
>> On 23.11.2016 at 17:52, Thad Guidry <thadguidry@gmail.com> wrote:
>> 
>> Felix,
>> 
>> Let the Web work for you.  Do not try to "game" Languages for SEO or even enrichment purposes.
>> Instead, to encourage enrichment, invest in improving translations themselves at sites like Wiktionary, translate.google.com, bing.com/translator, etc. 
>> 
>> To handle your use case and others, we already support translation of "name" by simply using "sameAs".
>> 
>> Use sameAs property
>> 
>> (URL of a reference Web page that unambiguously indicates the item's identity. E.g. the URL of the item's Wikipedia page, Freebase page, or official website.)
>> 
>> to point to a URL that includes more information about that name to reap the benefits of a global translation community, instead of rolling your own.  (But if you want to roll your own, then you can use sameAs as well, but there might be limited understanding from search engines, since there is already investment against the major lexical databases and wikis out there in the world from the likes of Google, Bing, Yahoo, and Yandex.  They already can handle most translations and have an understanding using those lexical databases and wikis as well as their own.)
>> 
>> Example:
>> 
>> {
>>   "@context": "http://schema.org",
>>   "@type": "ProductModel",
>>   "description": "Our extra long, elongated screwdriver allows turning even in the tightest of confined previously unreachable spaces!",
>>   "name": "Screwdriver",
>>   "sameAs": [
>>     "https://en.wikipedia.org/wiki/Screwdriver",
>>     "https://en.wiktionary.org/wiki/screwdriver"
>>   ],
>>   "image": "xyz_screwdriver-32in.jpg",
>>   "brand": "XYZ",
>>   "manufacturer": "XYZ"
>> }
>> 
>> 
>> 
>> On Wed, Nov 23, 2016 at 3:43 AM Felix Sasaki <fsasaki@w3.org> wrote:
>> I want to do what Alexandre described in his example here
>> https://lists.w3.org/Archives/Public/public-schemaorg/2016Mar/0055.html
>> In that thread, we already discussed the usage of name properties or translationOfWork. Name properties don’t allow attaching additional information to the language specific name. But that additional information is the reason why a terminology data base exists: to express name variants within one language, to express that a name belongs to a certain (company specific) terminology in a certain version, to connect the name to a topic domain (e.g. screwdriver in manufacturing processes of company XYZ) etc.
>> 
>> So to achieve this you need two separate things. But translationOfWork seems to be tailored towards CreativeWorks, which seem to mean things like books, films, pieces of music. If one subsumes terms (from terminology data bases) as creative works, there are a lot of confusing properties.
>> 
>> The whole reason for this exercise is to allow users to influence cross-lingual search. Something like the mock up on slide 14 would be nice. Search engines allow for cross lingual search, see slide 29; but a user cannot influence that with Schema.org markup.
>> 
>> - Felix
>> 
>>> On 22.11.2016 at 01:06, Richard Wallis <richard.wallis@dataliberate.com> wrote:
>>> 
>>> Scanning your slides I am not clear (in the Schema.org markup) whether you are describing two separate things, the contents of which are in different languages, or a single thing with names in different languages.  
>>> 
>>> The definition of inLanguage <http://schema.org/inLanguage> indicates “The language of the content..”
>>> 
>>> If it is the former, they are not the same thing and they probably should be related with translationOfWork <http://bib.schema.org/translationOfWork> and workTranslation <http://bib.schema.org/workTranslation>, not sameAs.
>>> 
>>> If it is the latter, surely the use of two name properties, one in each language, with language labels would suffice.
>>> 
>>> ~Richard.
>>> 
>>> Richard Wallis
>>> Founder, Data Liberate
>>> http://dataliberate.com
>>> Linkedin: http://www.linkedin.com/in/richardwallis
>>> Twitter: @rjw
>>> 
>>> On 21 November 2016 at 14:27, Felix Sasaki <fsasaki@w3.org> wrote:
>>> Hello Alexandre and all,
>>> 
>>> I had the pleasure of exploring the topic of how to express translations of terms further in a presentation at the Tekom / TCWorld conference. See the announcement and slides (including an extended abstract at the end) here
>>> 
>>> http://conferences.tekom.de/conference/tcworld16/conference-program/conference-program/program/sv_1486_IN21/
>>> http://conferences.tekom.de/fileadmin/tx_doccon/slides/1486_Summit_Meeting_Search_Meets_Terminology.pdf
>>> 
>>> The presentation was well received and it seems that there is an interest in using existing terminology assets to foster cross lingual search use cases. It would be interesting to explore this further in the context of Schema.org.
>>> 
>>> Any comments on this topic & the presentation slides are very welcome.
>>> 
>>> Kind regards,
>>> 
>>> Felix
>>> 
>>>> On 17.03.2016 at 15:35, Alexandre Bertails <bertails@apple.com> wrote:
>>>> 
>>>> Felix,
>>>> 
>>>> We are currently trying to solve a very similar problem. My plan is to use schema:sameAs for that. Applied to your example:
>>>> 
>>>> {
>>>>  "@id": "http://example.com/my-term-data-base-entry-1",
>>>>  "@type": "schema:Term",
>>>>  "schema:inLanguage": "en",
>>>>  "schema:name": "screwdriver",
>>>>  "schema:sameAs": {
>>>>    "@id": "http://example.com/my-term-data-base-entry-2",
>>>>    "schema:inLanguage": "de",
>>>>    "schema:name": "schraubendreher"
>>>>  }
>>>> }
>>>> 
>>>> Conceptually, the 2 entities really denote the same thing. Granted, our usage of schema:sameAs is not exactly what's described in https://schema.org/sameAs but there are reasons why we prefer to stay within the schema.org realm. And owl:sameAs would bring a lot of baggage with it which we are not interested in.
>>>> 
>>>> Also, I think schema:translation would be too specific. Personally, I would be happy if the definition of schema:sameAs was less about web pages.
>>>> 
>>>> Best,
>>>> Alexandre
>>>> 
>>>>> On Mar 17, 2016, at 6:22 AM, Felix Sasaki <fsasaki@w3.org> wrote:
>>>>> 
>>>>> 
>>>>>> On 17.03.2016 at 13:56, Thomas Francart <thomas.francart@sparna.fr> wrote:
>>>>>> 
>>>>>> I don't think the original question was about translating the terms of schema.org itself (classes and properties); it was about the possibility to describe terms/words, similar to what SKOS-XL proposes.
>>>>>> For me the original proposition makes sense, it would allow to state things like "this term/word A is used for a large public", "that other word/term B is used by the scientific community" "the words/terms A and B are both used to refer to concept C", "word/term A is an acronym of word/term B", "word/term D is slang, while word/term E is formal language", etc.
>>>>> 
>>>>> Yes, that was the original question. A further comment below.
>>>>> 
>>>>>> 
>>>>>> Thomas
>>>>>> 
>>>>>> 2016-03-17 13:38 GMT+01:00 Dan Brickley <danbri@google.com>:
>>>>>> Yes, I tend to agree with Chaals & Richard here: for translated labels
>>>>>> of structured data vocabulary terms (schema.org's and others), we
>>>>>> should look towards the underlying W3C standards: RDF/S and perhaps
>>>>>> sometimes SKOS, SKOS-XL. It is usual to stick to a single URL for
>>>>>> types and properties rather than proliferate them by having different
>>>>>> URLs for each language.
>>>>> 
>>>>> 
>>>>> In my use case (see below) I need to differentiate uniquely (= via URIs) between
>>>>> 
>>>>> 1) terms in language X,Y,Z
>>>>> 2) common = language agnostic concepts that they denote
>>>>> 3) domains (= topics) that they belong to
>>>>> 
>>>>> Richard wrote : 
>>>>> 
>>>>> [
>>>>> As to proposing a general purpose term definition / relationship structure such as you describe, I can see the need for such a capability but wonder if in most cases SKOS-like existing solutions would suffice for detailed description.  Whereas I would require some convincing as to the potential take up in a broad general purpose vocabulary such as Schema.org.
>>>>> ]
>>>>> 
>>>>> The use case is a Japanese buyer of items who knows how something is expressed in his language. He wants to be able to make a search for 
>>>>> スクリュードライバー
>>>>> and say: give me pages about screwdrivers that express the concept of a screwdriver in my domain and denote the concept I want to buy (= take up the information provided by 1,2,3 above). The buyer does not want to buy screwdrivers in general, and he does not want to buy everything with the label screwdriver in English; but he wants to buy a specific screwdriver in a given domain, e.g. automotive manufacturing. The buyer also wants to take variants of how terms are expressed into account, e.g. differences in spelling, abbreviations etc. 
>>>>> 
>>>>> Such searches are quite common in search of multilingual terminology data bases. In these data bases terms are uniquely identified first class citizens. More and more companies put such data bases on the web but don’t have a way yet to do that with structured HTML markup. So search for multilingual terminology, taking 1,2,3 into account, is not yet possible on the Web.
>>>>> 
>>>>> - Felix
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Here is an example btw of RDFa+RDFS definitions that do this, from
>>>>>> https://github.com/schemaorg/schemaorg/blob/sdo-deimos/data/l10n/zh-cn/schema_org_zhcn.html
>>>>>> 
>>>>>> <div typeof="rdfs:Class" resource="http://schema.org/Audience">
>>>>>> <span class="h" property="rdfs:label">Audience</span>
>>>>>> <span class="h" property="rdfs:label" xml:lang="zh-cn">听众</span>
>>>>>> <span property="rdfs:comment">Intended audience for an item, i.e. the
>>>>>> group for whom the item was created.</span>
>>>>>> <span property="rdfs:comment" xml:lang="zh-cn">听众,观众, 读者</span>
>>>>>> <span>Subclass of: <a property="rdfs:subClassOf"
>>>>>> href="http://schema.org/Intangible">Intangible</a></span>
>>>>>> </div>
>>>>>> 
>>>>>> Does this approach do what you have in mind, Felix?
>>>>>> 
>>>>>> Dan
>>>>>> 
>>>>>> On 17 March 2016 at 10:56, Richard Wallis
>>>>>> <richard.wallis@dataliberate.com> wrote:
>>>>>>> Not sure I understand your definition of a term, but the ability to handle
>>>>>>> names, or any other text based properties, of things in multiple languages
>>>>>>> is already possible:
>>>>>>> 
>>>>>>> {
>>>>>>>  "@context": "http://schema.org/",
>>>>>>>  "@id": "http://example.com/my-term-data-base-entry-1",
>>>>>>>  "@type": "schema:Thing",
>>>>>>>  "schema:name": [
>>>>>>>    {
>>>>>>>      "@language": "en",
>>>>>>>      "@value": "screwdriver"
>>>>>>>    },
>>>>>>>    {
>>>>>>>      "@language": "de",
>>>>>>>      "@value": "schraubendreher"
>>>>>>>    }
>>>>>>>  ]
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> or in RDFa:
>>>>>>> 
>>>>>>> 
>>>>>>> <div typeof="schema:Thing"
>>>>>>>      about="http://example.com/my-term-data-base-entry-1">
>>>>>>>    <div property="schema:name" xml:lang="en" content="screwdriver"></div>
>>>>>>>    <div property="schema:name" xml:lang="de" content="schraubendreher"></div>
>>>>>>> </div>
>>>>>>> 
>>>>>>> 
>>>>>>> ~Richard
>>>>>>> 
>>>>>>> Richard Wallis
>>>>>>> Founder, Data Liberate
>>>>>>> http://dataliberate.com
>>>>>>> Linkedin: http://www.linkedin.com/in/richardwallis
>>>>>>> Twitter: @rjw
>>>>>>> 
>>>>>>> On 17 March 2016 at 09:04, Felix Sasaki <fsasaki@w3.org> wrote:
>>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> It seems that schema.org as of writing does not allow expressing the
>>>>>>>> relation for terms „A is a translation of B“ or „A is an abbreviation of
>>>>>>>> B“. It is already possible to express that A is a translation of B, see
>>>>>>>> 
>>>>>>>> http://bib.schema.org/translationOfWork
>>>>>>>> 
>>>>>>>> but this is specific to works, not translated terms. Would the below make
>>>>>>>> sense? It is adapted from
>>>>>>>> https://schema.org/translator
>>>>>>>> 
>>>>>>>> Note: schema:Term and schema:translation do not exist in schema.org, I
>>>>>>>> made them up for the example.
>>>>>>>> 
>>>>>>>> {
>>>>>>>>  "@id": "http://example.com/my-term-data-base-entry-1",
>>>>>>>>  "@type": "schema:Term",
>>>>>>>>  "schema:inLanguage": "en",
>>>>>>>>  "schema:name": "screwdriver",
>>>>>>>>  "schema:translation": {
>>>>>>>>    "@id": "http://example.com/my-term-data-base-entry-2",
>>>>>>>>    "schema:inLanguage": "de",
>>>>>>>>    "schema:name": "schraubendreher"
>>>>>>>>  }
>>>>>>>> }
>>>>>>>> 
>>>>>>>> - Felix
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> 
>>>>>> Thomas Francart - SPARNA
>>>>>> Web de données | Architecture de l'information | Accès aux connaissances
>>>>>> blog : blog.sparna.fr, site : sparna.fr, linkedin : fr.linkedin.com/in/thomasfrancart
>>>>>> tel :  +33 (0)6.71.11.25.97, skype : francartthomas
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> 
Received on Friday, 25 November 2016 08:24:10 UTC
