- From: Jocelyn Fournier <jocelyn.fournier@gmail.com>
- Date: Tue, 20 May 2014 23:00:36 +0200
- To: Dan Brickley <danbri@google.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>
On 20/05/2014 18:16, Dan Brickley wrote:
> On 20 May 2014 11:28, Jarno van Driel <jarnovandriel@gmail.com> wrote:
>> Martin, I don't know if I completely agree about going to the product
>> forum about this. I think I understand why you might say this, but in my
>> thread about the working of WebPage (http://bit.ly/1jyFN0g), Jason Douglas
>> said:
>>
>>> "That said, we probably do need a mechanism for indicating the "primary
>>> entity" of a webpage when there is one. Current clients make up their own
>>> heuristics for this, but I think it would be better to have an explicit way
>>> of stating that."
>>
>> But this is not the main subject of this thread. Maybe a new thread to
>> discuss the "primary entity" or continuation of the subject in the thread I
>> already started is a better place.
>
> This is very much in scope for public-vocabs and for schema.org discussions.
>
> There are a few pieces to the puzzle, but the basic idea is simple.
> Schema.org allows a rich descriptive graph to be embedded in a Web
> page, which means we often have several entities mentioned; we'd like
> to know which one is the main one, if any.
>
> Consider the second example in http://schema.org/MusicEvent to give us
> a concrete focus.
>
> It describes a 'MusicEvent' (a concert), whose 'location' is a
> 'Place'. The event lists multiple associated 'offers'; each 'Offer'
> with price/date etc. info. The event also lists two 'performer's, each
> a 'MusicGroup'.
>
> There is nothing *intrinsically* primary about the event, the
> location, the offers or the musicians. This description is all the
> richer because it mentions multiple entities. If I was forced to pick
> one, I'd probably guess at the MusicEvent being the 'main' entity
> here, because the others feel slightly more like background
> information. But there's no need to leave this to guesswork. If this
> markup was on the homepage of the venue, that publisher might well
> consider the Place to be the main entity. And if it was on an artist's
> homepage, they might want to mention the gig (perhaps alongside
> others) but indicate that the MusicGroup was the main thing.
>
> The above sketches this in terms of embedded structured data, but we
> can also think of this in terms of capturing a very common pattern in
> Web content. Often Web pages _do_ have a focus on a single entity. If
> we add a property like mainEntity, it would give sites a way to make
> this focus explicit.
>
> PROPOSAL:
>
> 1.
> We already have "about", "The subject matter of the content.",
> relating a CreativeWork to a Thing. This is enough to do what we need,
> if we add clarification and examples.
>
> I suggest the description should be updated to say: "A Thing that is
> the primary subject matter of this CreativeWork".
>
> 2.
> If we want a more SKOS-like, bibliographic and nuanced notion of
> 'subject', I suggest we adopt something like Dublin Core's 'subject'
> to do that work.
>
> (DC has "The topic of the resource."/ "Typically, the subject will be
> represented using keywords, key phrases, or classification codes.
> Recommended best practice is to use a controlled vocabulary.", from
> http://purl.org/dc/terms/ )
>
> The distinction:
>
> if we want to say "This document is about the entity Sweden" (i.e. the
> thing that is sameAs http://en.wikipedia.org/wiki/Sweden,
> http://www.freebase.com/m/0d0vqn), we would use
> http://schema.org/about ... i.e. this tells us the main thing that
> the page is about.
>
> but
>
> if we want to say, "This document's topic is “environmental impact of
> the decline of tin mining in Sweden in the 20th century”", we'd be
> going beyond "about" and would want a more bibliographic subject
> description, e.g. using DDC or UDC subject classification codes, SKOS
> etc.
>
> (fictional example, I know nothing about tin mining in Sweden)
>
> My proposal then is that we break out these two use cases, and target
> the 'about' more explicitly on the 'main entity' use case.
>
> 3. Tweak http://schema.org/mentions
>
> We should note that http://schema.org/mentions is a very similar
> notion to http://schema.org/about except that it allows multiple
> different entities to be referenced.
>
> "Indicates that the CreativeWork contains a reference to, but is not
> necessarily about a concept."
>
> I suggest rewording this in terms of entities/things, since we don't
> use 'concept' elsewhere:
>
> "Indicates that the CreativeWork contains a reference to, but is not
> necessarily about some particular thing."

Hi Dan,

With this description, it's not really easy to tell the difference from
the http://www.schema.org/citation property (BTW, I'm not sure I really
understand the difference :))

Thanks,
  Jocelyn

>
> 4. http://schema.org/mainContentOfPage
>
> We already have this strange-looking property. It addresses a
> different use case:
>
> it relates a WebPage to a part of that WebPage,
> "Indicates if this web page element is the main subject of the page."
>
> The wording is awkward. It should be something like "Indicates the
> main element within some Web page." since the expected type is
> WebPageElement.
>
> I'm not convinced that the various types we have under WebPageElement
> ("A web page element, like a table or an image") really work, but the
> important point here is that they address a different scenario. A
> WebPageElement is a piece of markup, like SiteNavigationElement,
> Table, WPAdBlock, WPFooter, WPHeader, WPSideBar. This is a different
> idea to the problem of finding the main *entity* that all this markup
> is describing.
>
> HTML already has a <main> element, see
> https://developer.mozilla.org/en-US/docs/Web/HTML/Element/main
>
> "The HTML <main> element represents the main content of the <body> of
> a document or application. The main content area consists of content
> that is directly related to, or expands upon the central topic of a
> document or the central functionality of an application. This content
> should be unique to the document, excluding any content that is
> repeated across a set of documents such as sidebars, navigation links,
> copyright information, site logos, and search forms (unless, of
> course, the document's main function is as a search form)."
>
> I believe most of the use cases for mainContentOfPage are better
> addressed by <main>.
>
> However <main> does not help us pick out a single highlighted entity:
> the main section of a Web page could still contain Microdata/RDFa or
> JSON-LD mentioning lots of different entities.
>
> It is useful sometimes to know that structured data markup comes from
> footers or boilerplate rather than the <main> section of a page, and
> it is probably worth including some examples of this on the schema.org
> site.
>
>
> 5. Avoiding ratholes
>
> If we can please discuss this without slipping into discussion of
> http://www.w3.org/2001/tag/group/track/issues/14 I'd be happy. There
> are places in schema.org usage where we tolerate a URL for a WebPage
> being used in place of a URL that is more explicitly for the
> real-world entity itself. For example in http://schema.org/Person we
> write "<a href="http://www.xyz.edu/students/alicejones.html"
> itemprop="colleague">Alice Jones</a>".
>
> Clarifying the use of 'about' as above could help such pages clarify
> which real world entity they are 'about'. This won't solve every issue
> around entity disambiguation, but it will improve the basic support we
> have within schema.org for stating such distinctions when we want to.
>
> (Sorry this was such a long mail...)
>
> Finally, let's also try not to get stuck on syntax issues at this
> stage. We'll have to find the best patterns in Microdata/RDFa and
> JSON-LD that we can for this, and it may sometimes be tricky. Here's
> an attempt at amending the MusicEvent example by adding a WebPage and
> 'about' - https://gist.github.com/anonymous/cf7e24f6378b176aa010 . We
> might want to discuss a reverse property that could be expressed on
> the entity rather than the page, for example.
>
> cheers,
>
> Dan
>
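
For illustration only, here is a minimal sketch of the "WebPage plus 'about'" pattern discussed in the quoted mail, written as Python dicts serialised to JSON-LD. It does not reproduce the linked gist: the example.com URLs, the entity names and the helper main_entity() are all hypothetical, and 'about' is read in the proposed "primary subject" sense, which is a proposal rather than settled schema.org wording.

```python
import json

# Hypothetical entities: all names and example.com URLs are made up.
place = {"@type": "Place", "@id": "http://example.com/venue#place",
         "name": "Example Hall"}
band = {"@type": "MusicGroup", "@id": "http://example.com/band#group",
        "name": "Example Band"}
event = {
    "@type": "MusicEvent",
    "@id": "http://example.com/events/1#event",
    "name": "Example concert",
    "location": place,
    "performer": [band],
}

# Event listing page: the MusicEvent is the primary subject ('about');
# the venue and band are only referenced ('mentions', by @id).
event_page = {
    "@context": "http://schema.org",
    "@type": "WebPage",
    "url": "http://example.com/events/1",
    "about": event,
    "mentions": [{"@id": place["@id"]}, {"@id": band["@id"]}],
}

# Venue homepage: same graph, but the publisher marks the Place as primary.
venue_page = {
    "@context": "http://schema.org",
    "@type": "WebPage",
    "url": "http://example.com/venue",
    "about": place,
    "mentions": [event],
}


def main_entity(webpage):
    """Return the single node named by 'about', or None if absent or ambiguous."""
    about = webpage.get("about")
    if isinstance(about, list):
        # Several 'about' values give no unambiguous main entity.
        return about[0] if len(about) == 1 else None
    return about


if __name__ == "__main__":
    print(json.dumps(event_page, indent=2))
    print(main_entity(event_page)["@type"], main_entity(venue_page)["@type"])
```

The two pages carry the same entities; only the value of 'about' changes, which is the point of stating the main entity explicitly rather than leaving it to client heuristics.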
Received on Tuesday, 20 May 2014 21:01:07 UTC