- From: Thad Guidry <thadguidry@gmail.com>
- Date: Fri, 25 Nov 2016 17:52:12 +0000
- To: Felix Sasaki <fsasaki@w3.org>, Richard Wallis <richard.wallis@dataliberate.com>
- Cc: Alexandre Bertails <bertails@apple.com>, Thomas Francart <thomas.francart@sparna.fr>, Dan Brickley <danbri@google.com>, "schema.org Mailing List" <public-schemaorg@w3.org>
- Message-ID: <CAChbWaPpCVW7kk4_mDTUfTWT2oU+yrCN3sQ2vWi83z5JNPhQ+w@mail.gmail.com>
Felix,

Thanks for that. Now things make more sense to me, including your use case. You're not trying to game the system, but want better linked data methods.

Some of your use case is a bit premature for Schema.org, for a few reasons.

Schema.org is not a lexical database, nor does it plan to be. It is not designed or intended to construct valid lexical models, but it can and will provide linking constructs to those kinds of databases for your Things and their Types and Properties. That's the goal that I think all of us here agree will be useful eventually. Publishers want solid lexical data that is available to link to, since covering the world's languages in any one particular domain is a heroic global effort.

Towards that goal, you might be interested in tracking and participating in the Wikidata for Wiktionary project.

https://www.wikidata.org/wiki/Wikidata:Wiktionary

There are 5 sections across the top [Overview, Development Plan, How to help, FAQ, Discussion].

Here's the September 2016 status PDF:
https://commons.wikimedia.org/wiki/File:Wikidata_for_Wiktionary_announcement.pdf

This project fits your use case exactly, as well as those of others who are having problems with linking Terms and Lexical information for eventual multilingual linked data using Schema.org.

Schema.org (myself and others) has plans to provide linking constructs (Schema.org Types and Properties, or expanded Metadata that might need to be created later) in order to give Publishers the ability to provide lexical linked data information once the Wikidata for Wiktionary project begins implementation. We don't know what will really be needed (if anything) until this project begins its implementation. Right now the Wikidata for Wiktionary project is in the development and planning stages, so this is a critical time for you to get involved, now and throughout next year.

On Fri, Nov 25, 2016 at 2:23 AM Felix Sasaki <fsasaki@w3.org> wrote:

> Dear Richard,
>
> thanks a lot for taking the time to describe your view in detail.
>
> You are mostly right in your interpretation of what I am trying to achieve. The missing bit is that I am not differentiating between entities and descriptions, but that I am defining a new unit which serves as a connector between ontological information and linguistic information. I am calling this unit a term.
>
> This notion of a term is not my invention. In the localization industry, the TBX format has been around for years. In my demo
> http://fsasaki.github.io/stuff/tekom2016/
> you will see a TBX example which shows a term with ID tid_db6_014D420D507ED411B1360060B03C6BFB. This term is linked (via its position in the XML structure) both to a linguistic description and to other types of information (here related to localization workflows). You will see that translations of linguistic expressions are not interrelated directly, but are all connected via the same term entry.
>
> TBX is widely used in localization and technical documentation, in relation to formats like XLIFF, DITA or DocBook. A few years ago a community started with the aim of representing such terminology information on the Web, using linked data principles. Their current outcome is the OntoLex model
>
> https://www.w3.org/2016/05/ontolex/
>
> People are also providing conversions from TBX to OntoLex, see
>
> http://tbx2rdf.lider-project.eu/converter/index.html
>
> helping terminology data base owners to provide a linked data view.
>
> You may argue now that the OntoLex way of ontology modeling is special purpose and not relevant for a general vocabulary like Schema.org <http://schema.org>. My counterargument would be that in my perception Schema.org <http://schema.org> is use case and community driven, and that the use case of cross lingual information access is getting more and more attention.
> That may lead in the future to term specific Schema.org <http://schema.org> sub models, like in the area of products and GoodRelations or library catalogues and the bib extension. Please see my search engine mockup just as one possibility that could implement the use case of cross lingual access. With the ability to identify a unit (which I am calling a term) that bridges linguistic descriptions and general world knowledge (represented via Schema.org <http://schema.org>), there is a huge opportunity for many more implementations.
>
> I hope that this clarifies my intention. I am aware that, unlike GoodRelations or the bib extension, in the case of cross lingual access there is not yet a huge community asking for better support in Schema.org <http://schema.org>. My hope is that it is just a question of time until more people engage.
>
> Regards,
>
> Felix
>
> On 24.11.2016 at 12:50, Richard Wallis <richard.wallis@dataliberate.com> wrote:
>
> Felix,
>
> What you describe is a fairly standard pattern in Schema.org <http://schema.org> and the world of authoritative linked data sets.
>
> Several organisations/people may have their own understanding of a thing, concept, person etc. plus related supporting information, comments, local resources, etc. They also recognise that the other authoritative sources exist and use sameAs links to share the fact that they are describing the same entity as others are.
>
> They publish their own individual identifier for the Person, Place, Product, etc. and then sameAs links to others’ descriptions of the same Person, Place, Product, etc.
>
> It is up to the consumers of this data, mostly the search engines, to take this data, interpret it as they wish (potentially merging it into aggregated descriptions of entities in their knowledge graphs) and use it as they feel appropriate to satisfy their users’ needs.
>
> Where I see your proposal differing, if my understanding is correct, is that you are trying to identify an individual description about a particular entity (as against the entity itself) and then say it is the sameAs another description. (By description I mean all attributes such as reviews etc.) In this pattern you are saying that two different descriptions are sameAs each other, which, since (as you say) they are not direct translations of each other, they obviously are not. What *is* the same is the entity being described.
>
> As an external observer, I see the data consumers looking for multiple structured data descriptions of individual Things (entities) for them to aggregate together in knowledge graphs and then use them to guide users to appropriate views of those entities.
>
> By concentrating on the description identities, as against the entities they describe, I believe you are introducing confusion in the data patterns and the results may well be unpredictable. I believe you are making assumptions about the way the search engines use this data, and are trying to push, or even game, them into operating in a specific way. History shows us that such initiatives usually only have short term positive benefit.
>
> Although I have great sympathy with your objectives of helping users see the appropriate description of a resource in an appropriate language, I believe a message to implementers about ensuring that the text in their descriptions is correctly language tagged would be as, or possibly even more, effective.
>
> Forgive me if my understanding of what you are proposing is not correct.
>
> ~Richard.
>
> Richard Wallis
> Founder, Data Liberate
> http://dataliberate.com
> Linkedin: http://www.linkedin.com/in/richardwallis
> Twitter: @rjw
>
> On 24 November 2016 at 08:08, Felix Sasaki <fsasaki@w3.org> wrote:
>
> Hi Thad,
>
> we may have a misunderstanding about the motivation. Let me try to explain with a comparison.
> If I want to give my own ratings of a product, Schema.org <http://schema.org/> allows me to do that, e.g. via https://schema.org/Rating . And ratings are then taken up in search result previews.
>
> I can give my own translations (with your example, the one from Richard or the one from Alexandre in an earlier mail), but there is no guidance on how translations will or should be taken up.
>
> By "translations" I don’t mean general translations based on world knowledge. That would be feasible, as you pointed out, with global lexical data bases.
>
> If you look at the slides I linked to below, you will see an example. Companies have their own multilingual terminologies. They own and govern these terminologies and don’t want them to be a part of a global lexical data base. Still they want to put them on the Web, to ease cross lingual access to their data.
>
> It would not make sense to put a highly company specific name like "Easy Graphics Framework" and its Chinese translation "图形提供商" into a general lexical data base. Still the companies want to have their multilingual data taken up in web search. See the attached mockup of how that could look.
>
> So this topic is mostly about motivations and benefits. For ratings and their uptake in web search this relation is clear. For company specific multilingual data it is not yet clear to me.
>
> Best,
>
> Felix
>
> <PastedGraphic-1.tiff>
>
> On 23.11.2016 at 17:52, Thad Guidry <thadguidry@gmail.com> wrote:
>
> Felix,
>
> Let the Web work for you. Do not try to "game" Languages for SEO or even enrichment purposes.
> Instead, to encourage enrichment, invest in improving translations themselves at sites like Wiktionary, translate.Google.com <http://translate.google.com/>, bing.com/translator , etc.
>
> To handle your use case and others, we already support translation of "name" by simply using "sameAs".
>
> Use sameAs property
>
> (URL of a reference Web page that unambiguously indicates the item's identity. E.g. the URL of the item's Wikipedia page, Freebase page, or official website.)
>
> to point to a URL that includes more information about that name to reap the benefits of a global translation community, instead of rolling your own. (But if you want to roll your own, then you can use sameAs as well, but there might be limited understanding from search engines, since there is already investment against the major lexical databases and wikis out there in the world from the likes of Google, Bing, Yahoo, and Yandex. They already can handle most translations and have an understanding using those lexical databases and wikis as well as their own.)
>
> Example:
>
> {
>   "@context": "http://schema.org",
>   "@type": "ProductModel",
>   "description": "Our extra long, elongated screwdriver allows turning even in the tightest of confined, previously unreachable spaces!",
>   "name": "Screwdriver",
>   "sameAs": [
>     "https://en.wikipedia.org/wiki/Screwdriver",
>     "https://en.wiktionary.org/wiki/screwdriver"
>   ],
>   "image": "xyz_screwdriver-32in.jpg",
>   "brand": "XYZ",
>   "manufacturer": "XYZ"
> }
>
> On Wed, Nov 23, 2016 at 3:43 AM Felix Sasaki <fsasaki@w3.org> wrote:
>
> I want to do what Alexandre described in his example here
> https://lists.w3.org/Archives/Public/public-schemaorg/2016Mar/0055.html
> In that thread, we already discussed usage of name properties or translationOfWork. Name properties don’t allow attaching additional information to the language specific name. But that additional information is the reason why a terminology data base exists: to express name variants within one language, to express that a name belongs to a certain (company specific) terminology in a certain version, to connect the name to a topic domain (e.g. screwdriver in manufacturing processes of company XYZ) etc.
>
> So to achieve this you need two separate things. But translationOfWork seems to be tailored towards CreativeWorks, which seem to mean things like books, films, pieces of music. If one subsumes terms (from terminology data bases) as creative works, there are a lot of confusing properties.
>
> The whole reason for this exercise is to allow users to influence cross-lingual search. Something like the mock up on slide 14 would be nice. Search engines allow for cross lingual search, see slide 29; but a user cannot influence that with Schema.org <http://schema.org/> markup.
>
> - Felix
>
> On 22.11.2016 at 01:06, Richard Wallis <richard.wallis@dataliberate.com> wrote:
>
> Scanning your slides I am not clear (in the Schema.org <http://schema.org/> markup) if you are describing two separate things, the contents of which are in different languages, or a single thing with names in different languages.
>
> The definition of inLanguage <http://schema.org/inLanguage> indicates “The language of the content..”
>
> If it is the former, they are not the same thing and they probably should be related with translationOfWork <http://bib.schema.org/translationOfWork> and workTranslation <http://bib.schema.org/workTranslation> not *sameAs.*
>
> If it is the latter, surely the use of two *name* properties, one in each language, with language labels would suffice.
>
> ~Richard.
>
> Richard Wallis
> Founder, Data Liberate
> http://dataliberate.com
> Linkedin: http://www.linkedin.com/in/richardwallis
> Twitter: @rjw
>
> On 21 November 2016 at 14:27, Felix Sasaki <fsasaki@w3.org> wrote:
>
> Hello Alexandre and all,
>
> I had the pleasure to explore the topic of how to express translation of terms further in a presentation at the Tekom / TCWorld conference.
> See the announcement and slides (including an extended abstract at the end) here
>
> http://conferences.tekom.de/conference/tcworld16/conference-program/conference-program/program/sv_1486_IN21/
>
> http://conferences.tekom.de/fileadmin/tx_doccon/slides/1486_Summit_Meeting_Search_Meets_Terminology.pdf
>
> The presentation was well received and it seems that there is an interest in using existing terminology assets to foster cross lingual search use cases. It would be interesting to explore this further in the context of Schema.org <http://schema.org/>.
>
> Any comments on this topic & the presentation slides are very welcome.
>
> Kind regards,
>
> Felix
>
> On 17.03.2016 at 15:35, Alexandre Bertails <bertails@apple.com> wrote:
>
> Felix,
>
> We are currently trying to solve a very similar problem. My plan is to use schema:sameAs for that. Applied to your example:
>
> {
>   "@id": "http://example.com/my-term-data-base-entry-1",
>   "@type": "schema:Term",
>   "schema:inLanguage": "en",
>   "schema:name": "screwdriver",
>   "schema:sameAs": {
>     "@id": "http://example.com/my-term-data-base-entry-2",
>     "schema:inLanguage": "de",
>     "schema:name": "schraubendreher"
>   }
> }
>
> Conceptually, the 2 entities really denote the same thing. Granted, our usage of schema:sameAs is not exactly what's described in https://schema.org/sameAs but there are reasons why we prefer to stay within the schema.org realm. And owl:sameAs would bring a lot of baggage with it which we are not interested in.
>
> Also, I think schema:translation would be too specific. Personally, I would be happy if the definition of schema:sameAs was less about web pages.
>
> Best,
> Alexandre
>
> On Mar 17, 2016, at 6:22 AM, Felix Sasaki <fsasaki@w3.org> wrote:
>
> On 17.03.2016 at 13:56, Thomas Francart <thomas.francart@sparna.fr> wrote:
>
> I don't think the original question was about translating the terms of schema.org itself (classes and properties); it was about the possibility to describe terms/words, similar to what SKOS-XL proposes.
> For me the original proposition makes sense, it would allow stating things like "this term/word A is used for a large public", "that other word/term B is used by the scientific community", "the words/terms A and B are both used to refer to concept C", "word/term A is an acronym of word/term B", "word/term D is slang, while word/term E is formal language", etc.
>
> Yes, that was the original question. A further comment below.
>
> Thomas
>
> 2016-03-17 13:38 GMT+01:00 Dan Brickley <danbri@google.com>:
> Yes, I tend to agree with Chaals & Richard here: for translated labels of structured data vocabulary terms (schema.org's and others), we should look towards the underlying W3C standards: RDF/S and perhaps sometimes SKOS, SKOS-XL. It is usual to stick to a single URL for types and properties rather than proliferate them by having different URLs for each language.
>
> In my use case (see below) I need to differentiate uniquely (= via URIs) between
>
> 1) terms in language X,Y,Z
> 2) common = language agnostic concepts that they denote
> 3) domains (= topics) that they belong to
>
> Richard wrote:
>
> [
> As to proposing a general purpose term definition / relationship structure such as you describe, I can see the need for such a capability but wonder if in most cases SKOS-like existing solutions would suffice for detailed description. Whereas I would require some convincing as to the potential take up in a broad general purpose vocabulary such as Schema.org <http://schema.org/>.
> ]
>
> The use case is a Japanese buyer of items who knows how something is expressed in his language. He wants to be able to make a search for
> スクリュードライバー
> and say: give me pages about screwdrivers that express the concept of a screwdriver in my domain and denote the concept I want to buy (= take up the information provided by 1,2,3 above). The buyer does not want to buy screwdrivers in general, and he does not want to buy everything with the label screwdriver in English; but he wants to buy a specific screwdriver in a given domain, e.g. automotive manufacturing. The buyer also wants to take variants of how terms are expressed into account, e.g. differences in spelling, abbreviations etc.
>
> Such searches are quite common in search of multilingual terminology data bases. In these data bases terms are uniquely identified first class citizens. More and more companies put such data bases on the web but don’t have a way yet to do that with structured HTML markup. So search for multilingual terminology, taking 1,2,3 into account, is not yet possible on the Web.
>
> - Felix
>
> Here is an example btw of RDFa+RDFS definitions that do this, from
>
> https://github.com/schemaorg/schemaorg/blob/sdo-deimos/data/l10n/zh-cn/schema_org_zhcn.html
>
> <div typeof="rdfs:Class" resource="http://schema.org/Audience">
>   <span class="h" property="rdfs:label">Audience</span>
>   <span class="h" property="rdfs:label" xml:lang="zh-cn">听众</span>
>   <span property="rdfs:comment">Intended audience for an item, i.e. the group for whom the item was created.</span>
>   <span property="rdfs:comment" xml:lang="zh-cn">听众,观众, 读者</span>
>   <span>Subclass of: <a property="rdfs:subClassOf" href="http://schema.org/Intangible">Intangible</a></span>
> </div>
>
> Does this approach do what you have in mind, Felix?
>
> Dan
>
> On 17 March 2016 at 10:56, Richard Wallis <richard.wallis@dataliberate.com> wrote:
>
> Not sure I understand your definition of a term, but the ability to handle names, or any other text based properties, of things in multiple languages is already possible:
>
> {
>   "@context": "http://schema.org/",
>   "@id": "http://example.com/my-term-data-base-entry-1",
>   "@type": "schema:Thing",
>   "schema:name": [
>     {
>       "@language": "en",
>       "@value": "screwdriver"
>     },
>     {
>       "@language": "de",
>       "@value": "schraubendreher"
>     }
>   ]
> }
>
> or in RDFa:
>
> <div typeof="schema:Thing" about="http://example.com/my-term-data-base-entry-1">
>   <div property="schema:name" xml:lang="en" content="screwdriver"></div>
>   <div property="schema:name" xml:lang="de" content="schraubendreher"></div>
> </div>
>
> ~Richard
>
> Richard Wallis
> Founder, Data Liberate
> http://dataliberate.com
> Linkedin: http://www.linkedin.com/in/richardwallis
> Twitter: @rjw
>
> On 17 March 2016 at 09:04, Felix Sasaki <fsasaki@w3.org> wrote:
>
> Hi all,
>
> It seems that schema.org as of writing would not allow expressing the relation for terms "A is a translation from B" or "A is an abbreviation from B". It is already possible to express that A is a translation of B, see
>
> http://bib.schema.org/translationOfWork
>
> but this is specific to works, not translated terms. Would the below make sense? It is adapted from
> https://schema.org/translator
>
> note: schema:Term and schema:translation do not exist in schema.org, I made them up for the example.
>
> {
>   "@id": "http://example.com/my-term-data-base-entry-1",
>   "@type": "schema:Term",
>   "schema:inLanguage": "en",
>   "schema:name": "screwdriver",
>   "schema:translation": {
>     "@id": "http://example.com/my-term-data-base-entry-2",
>     "schema:inLanguage": "de",
>     "schema:name": "schraubendreher"
>   }
> }
>
> - Felix
>
> --
>
> Thomas Francart - SPARNA
> Web de données | Architecture de l'information | Accès aux connaissances
> blog : blog.sparna.fr, site : sparna.fr, linkedin : fr.linkedin.com/in/thomasfrancart
> tel : +33 (0)6.71.11.25.97, skype : francartthomas
Received on Friday, 25 November 2016 17:53:00 UTC