- From: Jason Douglas <jasondouglas@google.com>
- Date: Tue, 20 May 2014 16:30:50 +0000
- To: Dan Brickley <danbri@google.com>, W3C Web Schemas Task Force <public-vocabs@w3.org>
- Message-ID: <CAEiKvUA0ovLPdE2GGv0+FfSHj8H4QFz1ApOKbk-RKu40FvXJHg@mail.gmail.com>
I think you buried the lede in the github link at the bottom. :)

In terms of markup would it be reasonable to do this?

<html vocab="http://schema.org">
  <head>
    <link rel="about" href="#main_item"/>
  </head>
  <body>
    <div resource="main_item" typeof="MusicEvent">
      ...
    </div>
  </body>
</html>

and rely on an implicit WebPage subject?

On Tue May 20 2014 at 9:17:25 AM, Dan Brickley <danbri@google.com> wrote:

> On 20 May 2014 11:28, Jarno van Driel <jarnovandriel@gmail.com> wrote:
> > Martin, I don't know if I completely agree about going to the product
> > forum about this. I think I understand why you might say this, but in
> > my thread about the working of WebPage (http://bit.ly/1jyFN0g), Jason
> > Douglas said:
> >
> >> "That said, we probably do need a mechanism for indicating the
> >> "primary entity" of a webpage when there is one. Current clients make
> >> up their own heuristics for this, but I think it would be better to
> >> have an explicit way of stating that."
> >
> > But this is not the main subject of this thread. Maybe a new thread to
> > discuss the "primary entity" or continuation of the subject in the
> > thread I already started is a better place.
>
> This is very much in scope for public-vocabs and for schema.org
> discussions.
>
> There are a few pieces to the puzzle, but the basic idea is simple.
> Schema.org allows a rich descriptive graph to be embedded in a Web
> page, which means we often have several entities mentioned; we'd like
> to know which one is the main one, if any.
>
> Consider the second example in http://schema.org/MusicEvent to give us
> a concrete focus.
>
> It describes a 'MusicEvent' (a concert), whose 'location' is a
> 'Place'. The event lists multiple associated 'offers'; each 'Offer'
> with price/date etc. info. The event also lists two 'performer's, each
> a 'MusicGroup'.
>
> There is nothing *intrinsically* primary about the event, the
> location, the offers or the musicians.
> This description is all the
> richer because it mentions multiple entities. If I was forced to pick
> one, I'd probably guess at the MusicEvent being the 'main' entity
> here, because the others feel slightly more like background
> information. But there's no need to leave this to guesswork. If this
> markup was on the homepage of the venue, that publisher might well
> consider the Place to be the main entity. And if it was on an artist's
> homepage, they might want to mention the gig (perhaps alongside
> others) but indicate that the MusicGroup was the main thing.
>
> The above sketches this in terms of embedded structured data, but we
> can also think of this in terms of capturing a very common pattern in
> Web content. Often Web pages _do_ have a focus on a single entity. If
> we add a property like mainEntity, it would give sites a way to make
> this focus explicit.
>
> PROPOSAL:
>
> 1.
> We already have "about", "The subject matter of the content.",
> relating a CreativeWork to a Thing. This is enough to do what we need,
> if we add clarification and examples.
>
> I suggest the description should be updated to say: "A Thing that is
> the primary subject matter of this CreativeWork".
>
> 2.
> If we want a more SKOS-like, bibliographic and nuanced notion of
> 'subject', I suggest we adopt something like Dublin Core's 'subject'
> to do that work.
>
> (DC has "The topic of the resource."/ "Typically, the subject will be
> represented using keywords, key phrases, or classification codes.
> Recommended best practice is to use a controlled vocabulary.", from
> http://purl.org/dc/terms/ )
>
> The distinction:
>
> If we want to say "This document is about the entity Sweden" (i.e. the
> thing that is sameAs http://en.wikipedia.org/wiki/Sweden and
> http://www.freebase.com/m/0d0vqn), we would use
> http://schema.org/about ... i.e. this tells us the main thing that
> the page is about.
>
> but
>
> If we want to say, "This document's topic is “environmental impact of
> the decline of tin mining in Sweden in the 20th century”", we'd be
> going beyond "about" and would want a more bibliographic subject
> description, e.g. using DDC or UDC subject classification codes, SKOS
> etc.
>
> (fictional example, I know nothing about tin mining in Sweden)
>
> My proposal then is that we break out these two use cases, and target
> the 'about' more explicitly on the 'main entity' use case.
>
> 3. Tweak http://schema.org/mentions
>
> We should note that http://schema.org/mentions is a very similar
> notion to http://schema.org/about except that it allows multiple
> different entities to be referenced.
>
> "Indicates that the CreativeWork contains a reference to, but is not
> necessarily about a concept."
>
> I suggest rewording this in terms of entities/things, since we don't
> use 'concept' elsewhere:
>
> "Indicates that the CreativeWork contains a reference to, but is not
> necessarily about some particular thing."
>
> 4. http://schema.org/mainContentOfPage
>
> We already have this strange-looking property. It addresses a
> different use case:
>
> it relates a WebPage to a part of that WebPage,
> "Indicates if this web page element is the main subject of the page."
>
> The wording is awkward. It should be something like "Indicates the
> main element within some Web page." since the expected type is
> WebPageElement.
>
> I'm not convinced that the various types we have under WebPageElement
> ("A web page element, like a table or an image") really work, but the
> important point here is that they address a different scenario. A
> WebPageElement is a piece of markup, like SiteNavigationElement,
> Table, WPAdBlock, WPFooter, WPHeader, WPSideBar. This is a different
> idea to the problem of finding the main *entity* that all this markup
> is describing.
>
> HTML already has a <main> element, see
> https://developer.mozilla.org/en-US/docs/Web/HTML/Element/main
>
> "The HTML <main> element represents the main content of the <body> of
> a document or application. The main content area consists of content
> that is directly related to, or expands upon the central topic of a
> document or the central functionality of an application. This content
> should be unique to the document, excluding any content that is
> repeated across a set of documents such as sidebars, navigation links,
> copyright information, site logos, and search forms (unless, of
> course, the document's main function is as a search form)."
>
> I believe most of the use cases for mainContentOfPage are better
> addressed by <main>.
>
> However <main> does not help us pick out a single highlighted entity:
> the main section of a Web page could still contain microdata/rdfa or
> json-ld mentioning lots of different entities.
>
> It is useful sometimes to know that structured data markup comes from
> footers or boilerplate rather than the <main> section of a page, and
> it is probably worth including some examples of this on the schema.org
> site.
>
>
> 5. Avoiding ratholes
>
> If we can please discuss this without slipping into discussion of
> http://www.w3.org/2001/tag/group/track/issues/14 I'd be happy. There
> are places in schema.org usage where we tolerate a URL for a WebPage
> being used in place of a URL that is more explicitly for the
> real-world entity itself. For example in http://schema.org/Person we
> write "<a href="http://www.xyz.edu/students/alicejones.html"
> itemprop="colleague">Alice Jones</a>".
>
> Clarifying the use of 'about' as above could help such pages clarify
> which real world entity they are 'about'. This won't solve every issue
> around entity disambiguation, but it will improve the basic support we
> have within schema.org for stating such distinctions when we want to.
>
> (Sorry this was such a long mail...)
>
> Finally, let's also try not to get stuck on syntax issues at this
> stage. We'll have to find the best patterns in Microdata/RDFa and
> JSON-LD that we can for this, and it may sometimes be tricky. Here's
> an attempt at amending the MusicEvent example by adding a WebPage and
> 'about' - https://gist.github.com/anonymous/cf7e24f6378b176aa010 . We
> might want to discuss a reverse property that could be expressed on
> the entity rather than the page, for example.
>
> cheers,
>
> Dan
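
To make the "explicit rather than heuristic" idea concrete, here is a minimal sketch of how a consuming client might resolve the main entity once a page states it with 'about'. It assumes the page's structured data has already been parsed into a flat list of JSON-LD-style nodes; the function name find_main_entity, the node identifiers and the sample data are hypothetical illustrations and are not taken from the thread or the linked gist.

```python
from typing import Dict, List, Optional


def find_main_entity(nodes: List[Dict]) -> Optional[Dict]:
    """Return the node a WebPage's 'about' points at, or None if unstated."""
    # Index nodes by @id so that {"@id": ...} references can be resolved.
    by_id = {n["@id"]: n for n in nodes if "@id" in n}
    for node in nodes:
        if node.get("@type") != "WebPage":
            continue
        about = node.get("about")
        if isinstance(about, dict):
            # 'about' is either a reference to another node or an
            # embedded node; fall back to the embedded value itself.
            return by_id.get(about.get("@id"), about)
    return None


# Hypothetical graph loosely modelled on the MusicEvent discussion above.
graph = [
    {"@id": "#page", "@type": "WebPage", "about": {"@id": "#main_item"}},
    {"@id": "#main_item", "@type": "MusicEvent", "name": "Example concert",
     "location": {"@id": "#venue"}},
    {"@id": "#venue", "@type": "Place", "name": "Example venue"},
]

main = find_main_entity(graph)
no_page = find_main_entity([{"@id": "#venue", "@type": "Place"}])
```

With the sample graph, the client follows 'about' from the WebPage to the MusicEvent and does not have to guess between the event and the venue; when no WebPage states 'about', the function returns None and a client would be back to its own heuristics.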
Received on Tuesday, 20 May 2014 16:31:20 UTC