- From: Jarno van Driel <jarnovandriel@gmail.com>
- Date: Mon, 11 Aug 2014 23:14:04 +0200
- To: "martin.hepp@ebusiness-unibw.org" <martin.hepp@ebusiness-unibw.org>
- Cc: Dan Brickley <danbri@google.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>
- Message-ID: <CADK2AU0KM9b6utoXzXby27guCtZT_5npWUDBPRJ2Av_vzXxpjw@mail.gmail.com>
"[all of it]" +1000

2014-08-11 22:36 GMT+02:00 martin.hepp@ebusiness-unibw.org <martin.hepp@ebusiness-unibw.org>:

> Dan:
> Picking up an old thread with a new, related request: I think it would
> make the lives of ALL Web developers a LOT easier if the sponsors of
> schema.org could spend some effort to make sure that their parsers and
> consuming components treat all variants of valid schema.org equally, i.e.
> if you properly follow the spec, you should be able to assume that the
> search engines understand your information if they process the respective
> type of information. My experience is that you need a lot of insider
> knowledge to design schema.org markup in a way that maximizes the
> understanding by search engines, e.g. in the case of
>
> - syntaxes (RDFa, Microdata, JSON-LD, ...) and
> - variants and alternatives in schema.org.
>
> I know that this is difficult to implement at the level of four big
> corporations with hundreds or thousands of software components. Still, it
> would help to define schema.org-based test cases that are used for
> automated testing. I once started something similar for GoodRelations at
>
> http://www.heppnetz.de/rdfa4google/testcases.html
>
> But I think we need something like that for each major schema.org type in
> all relevant syntaxes, and for the more complex types, we will need
> variants (e.g. pricing for Offer).
>
> Again, I think this is crucial for lowering the entry barrier for
> adoption, because schema.org would then be the official guideline for
> developers. Currently, schema.org is only a starting point and you need a
> lot of additional expertise and experience to apply it properly.
>
> Best wishes / Mit freundlichen Grüßen
>
> Martin Hepp
>
> -------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
>
> e-mail: martin.hepp@unibw.de
> phone: +49-(0)89-6004-4217
> fax: +49-(0)89-6004-4620
> www: http://www.unibw.de/ebusiness/ (group)
> http://www.heppnetz.de/ (personal)
> skype: mfhepp
> twitter: mfhepp
>
> Check out GoodRelations for E-Commerce on the Web of Linked Data!
> =================================================================
> * Project Main Page: http://purl.org/goodrelations/
>
> On 20 May 2014, at 18:16, Dan Brickley <danbri@google.com> wrote:
>
> > On 20 May 2014 11:28, Jarno van Driel <jarnovandriel@gmail.com> wrote:
> >> Martin, I don't know if I completely agree about going to the product
> >> forum about this. I think I understand why you might say this, but in my
> >> thread about the working of WebPage (http://bit.ly/1jyFN0g), Jason Douglas
> >> said:
> >>
> >>> "That said, we probably do need a mechanism for indicating the "primary
> >>> entity" of a webpage when there is one. Current clients make up their own
> >>> heuristics for this, but I think it would be better to have an explicit
> >>> way of stating that."
> >>
> >> But this is not the main subject of this thread. Maybe a new thread to
> >> discuss the "primary entity" or continuation of the subject in the thread
> >> I already started is a better place.
> >
> > This is very much in scope for public-vocabs and for schema.org
> > discussions.
> >
> > There are a few pieces to the puzzle, but the basic idea is simple.
> > Schema.org allows a rich descriptive graph to be embedded in a Web
> > page, which means we often have several entities mentioned; we'd like
> > to know which one is the main one, if any.
> >
> > Consider the second example in http://schema.org/MusicEvent to give us
> > a concrete focus.
> >
> > It describes a 'MusicEvent' (a concert), whose 'location' is a
> > 'Place'. The event lists multiple associated 'offers'; each 'Offer'
> > with price/date etc. info. The event also lists two 'performer's, each
> > a 'MusicGroup'.
> >
> > There is nothing *intrinsically* primary about the event, the
> > location, the offers or the musicians. This description is all the
> > richer because it mentions multiple entities. If I were forced to pick
> > one, I'd probably guess at the MusicEvent being the 'main' entity
> > here, because the others feel slightly more like background
> > information. But there's no need to leave this to guesswork. If this
> > markup were on the homepage of the venue, that publisher might well
> > consider the Place to be the main entity. And if it were on an artist's
> > homepage, they might want to mention the gig (perhaps alongside
> > others) but indicate that the MusicGroup was the main thing.
> >
> > The above sketches this in terms of embedded structured data, but we
> > can also think of this in terms of capturing a very common pattern in
> > Web content. Often Web pages _do_ have a focus on a single entity. If
> > we add a property like mainEntity, it would give sites a way to make
> > this focus explicit.
> >
> > PROPOSAL:
> >
> > 1.
> > We already have "about", "The subject matter of the content.",
> > relating a CreativeWork to a Thing. This is enough to do what we need,
> > if we add clarification and examples.
> >
> > I suggest the description should be updated to say: "A Thing that is
> > the primary subject matter of this CreativeWork".
> >
> > 2.
> > If we want a more SKOS-like, bibliographic and nuanced notion of
> > 'subject', I suggest we adopt something like Dublin Core's 'subject'
> > to do that work.
> >
> > (DC has "The topic of the resource."/ "Typically, the subject will be
> > represented using keywords, key phrases, or classification codes.
> > Recommended best practice is to use a controlled vocabulary.", from
> > http://purl.org/dc/terms/ )
> >
> > The distinction:
> >
> > If we want to say "This document is about the entity Sweden" (i.e. the
> > thing that is sameAs http://en.wikipedia.org/wiki/Sweden
> > http://www.freebase.com/m/0d0vqn), we would use
> > http://schema.org/about ... i.e. this tells us the main thing that
> > the page is about.
> >
> > but
> >
> > If we want to say "This document's topic is 'environmental impact of
> > the decline of tin mining in Sweden in the 20th century'", we'd be
> > going beyond "about" and would want a more bibliographic subject
> > description, e.g. using DDC or UDC subject classification codes, SKOS
> > etc.
> >
> > (fictional example, I know nothing about tin mining in Sweden)
> >
> > My proposal then is that we break out these two use cases, and target
> > the 'about' more explicitly on the 'main entity' use case.
> >
> > 3. Tweak http://schema.org/mentions
> >
> > We should note that http://schema.org/mentions is a very similar
> > notion to http://schema.org/about except that it allows multiple
> > different entities to be referenced.
> >
> > "Indicates that the CreativeWork contains a reference to, but is not
> > necessarily about a concept."
> >
> > I suggest rewording this in terms of entities/things, since we don't
> > use 'concept' elsewhere:
> >
> > "Indicates that the CreativeWork contains a reference to, but is not
> > necessarily about some particular thing."
> >
> > 4. http://schema.org/mainContentOfPage
> >
> > We already have this strange-looking property. It addresses a
> > different use case:
> >
> > it relates a WebPage to a part of that WebPage,
> > "Indicates if this web page element is the main subject of the page."
> >
> > The wording is awkward. It should be something like "Indicates the
> > main element within some Web page." since the expected type is
> > WebPageElement.
> >
> > I'm not convinced that the various types we have under WebPageElement
> > ("A web page element, like a table or an image") really work, but the
> > important point here is that they address a different scenario. A
> > WebPageElement is a piece of markup, like SiteNavigationElement,
> > Table, WPAdBlock, WPFooter, WPHeader, WPSideBar. This is a different
> > idea to the problem of finding the main *entity* that all this markup
> > is describing.
> >
> > HTML already has a <main> element, see
> > https://developer.mozilla.org/en-US/docs/Web/HTML/Element/main
> >
> > "The HTML <main> element represents the main content of the <body> of
> > a document or application. The main content area consists of content
> > that is directly related to, or expands upon the central topic of a
> > document or the central functionality of an application. This content
> > should be unique to the document, excluding any content that is
> > repeated across a set of documents such as sidebars, navigation links,
> > copyright information, site logos, and search forms (unless, of
> > course, the document's main function is as a search form)."
> >
> > I believe most of the use cases for mainContentOfPage are better
> > addressed by <main>.
> >
> > However <main> does not help us pick out a single highlighted entity:
> > the main section of a Web page could still contain Microdata/RDFa or
> > JSON-LD mentioning lots of different entities.
> >
> > It is useful sometimes to know that structured data markup comes from
> > footers or boilerplate rather than the <main> section of a page, and
> > it is probably worth including some examples of this on the schema.org
> > site.
> >
> > 5. Avoiding ratholes
> >
> > If we can please discuss this without slipping into discussion of
> > http://www.w3.org/2001/tag/group/track/issues/14 I'd be happy.
> > There are places in schema.org usage where we tolerate a URL for a
> > WebPage being used in place of a URL that is more explicitly for the
> > real-world entity itself. For example in http://schema.org/Person we
> > write "<a href="http://www.xyz.edu/students/alicejones.html"
> > itemprop="colleague">Alice Jones</a>".
> >
> > Clarifying the use of 'about' as above could help such pages clarify
> > which real world entity they are 'about'. This won't solve every issue
> > around entity disambiguation, but it will improve the basic support we
> > have within schema.org for stating such distinctions when we want to.
> >
> > (Sorry this was such a long mail...)
> >
> > Finally, let's also try not to get stuck on syntax issues at this
> > stage. We'll have to find the best patterns in Microdata/RDFa and
> > JSON-LD that we can for this, and it may sometimes be tricky. Here's
> > an attempt at amending the MusicEvent example by adding a WebPage and
> > 'about' - https://gist.github.com/anonymous/cf7e24f6378b176aa010 . We
> > might want to discuss a reverse property that could be expressed on
> > the entity rather than the page, for example.
> >
> > cheers,
> >
> > Dan
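
For illustration only, here is a minimal sketch of the 'about' idea from the quoted proposal, written as Python that builds a JSON-LD document. It is not a reproduction of the linked gist. Every name and value in it (build_page, main_entity, iter_nodes, "Example Concert", the "#event"-style @id strings) is hypothetical, and resolving 'about' by matching @id values is an assumption about how a consumer might behave, not something the thread specifies.

```python
import json

# Hypothetical data: names and @id values are illustrative, not from schema.org.
EVENT = {
    "@type": "MusicEvent",
    "@id": "#event",
    "name": "Example Concert",
    "location": {"@type": "Place", "@id": "#venue", "name": "Example Hall"},
    "performer": [
        {"@type": "MusicGroup", "@id": "#band-a", "name": "Band A"},
        {"@type": "MusicGroup", "@id": "#band-b", "name": "Band B"},
    ],
}


def build_page(main_id):
    """Return a JSON-LD graph: a WebPage whose 'about' references one entity."""
    return {
        "@context": "http://schema.org",
        "@graph": [
            {"@type": "WebPage", "about": {"@id": main_id}},
            EVENT,
        ],
    }


def iter_nodes(value):
    """Yield every dict found anywhere inside a nested JSON-like value."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_nodes(child)


def main_entity(doc):
    """Return the typed node that the WebPage's 'about' points at, or None."""
    page = next(n for n in iter_nodes(doc) if n.get("@type") == "WebPage")
    target = page["about"]["@id"]
    for node in iter_nodes(doc):
        # Skip bare references such as {"@id": "#event"}; only typed nodes count.
        if node.get("@id") == target and "@type" in node:
            return node
    return None


if __name__ == "__main__":
    # The same description, with a different main entity per publisher.
    for main_id in ("#event", "#venue", "#band-a"):
        doc = build_page(main_id)
        print(main_id, "->", main_entity(doc)["@type"])
    print(json.dumps(build_page("#event"), indent=2))
```

The same MusicEvent description is reused unchanged; only the 'about' reference differs between an event page, a venue homepage, and an artist homepage, which is the distinction the proposal asks publishers to be able to state.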
Received on Monday, 11 August 2014 21:14:32 UTC