Re: Indicating main entity / primaryTopic - proposal to use ''

On 20/05/2014 18:16, Dan Brickley wrote:
> On 20 May 2014 11:28, Jarno van Driel <> wrote:
>> Martin, I don't know if I completely agree about going to the product
>> forum about this. I think I understand why you might say this, but in my
>> thread about the working of WebPage (, Jason Douglas
>> said:
>>> "That said, we probably do need a mechanism for indicating the "primary
>>> entity" of a webpage when there is one.  Current clients make up their own
>>> heuristics for this, but I think it would be better to have an explicit way
>>> of stating that."
>> But this is not the main subject of this thread. Maybe a new thread to
>> discuss the "primary entity" or continuation of the subject in the thread I
>> already started is a better place.
> This is very much in scope for public-vocabs and for discussions.
> There are a few pieces to the puzzle, but the basic idea is simple.
> allows a rich descriptive graph to be embedded in a Web
> page, which means we often have several entities mentioned; we'd like
> to know which one is the main one, if any.
> Consider the second example in to give us
> a concrete focus.
> It describes a 'MusicEvent' (a concert), whose 'location' is a
> 'Place'. The event lists multiple associated 'offers'; each 'Offer'
> with price/date etc. info. The event also lists two 'performer's, each
> a 'MusicGroup'.
> There is nothing *intrinsically* primary about the event, the
> location, the offers or the musicians. This description is all the
> richer because it mentions multiple entities. If I was forced to pick
> one, I'd probably guess at the MusicEvent being the 'main' entity
> here, because the others feel slightly more like background
> information. But there's no need to leave this to guesswork. If this
> markup was on the homepage of the venue, that publisher might well
> consider the Place to be the main entity. And if it was on an artist's
> homepage, they might want to mention the gig (perhaps alongside
> others) but indicate that the MusicGroup was the main thing.
> The above sketches this in terms of embedded structured data, but we
> can also think of this in terms of capturing a very common pattern in
> Web content. Often Web pages _do_ have a focus on a single entity. If
> we add a property like mainEntity, it would give sites a way to make
> this focus explicit.
> 1.
> We already have "about", "The subject matter of the content.",
> relating a CreativeWork to a Thing. This is enough to do what we need,
> if we add clarification and examples.
> I suggest the description should be updated to say: "A Thing that is
> the primary subject matter of this CreativeWork".
> 2.
> If we want a more SKOS-like, bibliographic and nuanced notion of
> 'subject', I suggest we adopt something like Dublin Core's 'subject'
> to do that work.
> (DC has "The topic of the resource."/ "Typically, the subject will be
> represented using keywords, key phrases, or classification codes.
> Recommended best practice is to use a controlled vocabulary.", from
> )
> The distinction:
> if we want to say "This document is about the entity Sweden, i.e. the
> thing that is sameAs
>, we would use
>   ... i.e. this tells us the main thing that
> the page is about.
> but
> If we want to say, "This document's topic is 'environmental impact of
> the decline of tin mining in Sweden in the 20th century'", we'd be
> going beyond "about" and would want a more bibliographic subject
> description, e.g. using DDC or UDC subject classification codes, SKOS
> etc.
> (fictional example, I know nothing about tin mining in Sweden)
> My proposal then is that we break out these two use cases, and target
> the 'about' more explicitly on the 'main entity' use case.
> 3. Tweak
> We should note that is a very similar
> notion to except that it allows multiple
> different entities to be referenced.
> "Indicates that the CreativeWork contains a reference to, but is not
> necessarily about a concept."
> I suggest rewording this in terms of entities/things, since we don't
> use 'concept' elsewhere:
> "Indicates that the CreativeWork contains a reference to, but is not
> necessarily about some particular thing."

Hi Dan,

With this description, it's not really easy to tell the difference from
the property (BTW, I'm not sure I really
understand the difference :))


> 4.
> We already have this strange-looking property. It addresses a
> different use case:
> it relates a WebPage to a part of that WebPage,
> "Indicates if this web page element is the main subject of the page."
> The wording is awkward. It should be something like "Indicates the
> main element within some Web page." since the expected type is
> WebPageElement.
> I'm not convinced that the various types we have under WebPageElement
> ("A web page element, like a table or an image") really work, but the
> important point here is that they address a different scenario. A
> WebPageElement is a piece of markup, like SiteNavigationElement,
> Table, WPAdBlock, WPFooter, WPHeader, WPSideBar. This is a different
> idea to the problem of finding the main *entity* that all this markup
> is describing.
> HTML already has a <main> element, see
> "The HTML <main> element represents the main content of the <body> of
> a document or application. The main content area consists of content
> that is directly related to, or expands upon the central topic of a
> document or the central functionality of an application. This content
> should be unique to the document, excluding any content that is
> repeated across a set of documents such as sidebars, navigation links,
> copyright information, site logos, and search forms (unless, of
> course, the document's main function is as a search form)."
> I believe most of the use cases for mainContentOfPage are better
> addressed by <main>.
> However <main> does not help us pick out a single highlighted entity:
> the main section of a Web page could still contain microdata/rdfa or
> json-ld mentioning lots of different entities.
> It is useful sometimes to know that structured data markup comes from
> footers or boilerplate rather than the <main> section of a page, and
> it is probably worth including some examples of this on the
> site.
> 5. Avoiding ratholes
> If we can please discuss this without slipping into discussion of
> I'd be happy. There
> are places in usage where we tolerate a URL for a WebPage
> being used in place of a URL that is more explicitly for the
> real-world entity itself. For example in we
> write "<a href=""
> itemprop="colleague">Alice Jones</a>".
> Clarifying the use of 'about' as above could help such pages clarify
> which real world entity they are 'about'. This won't solve every issue
> around entity disambiguation, but it will improve the basic support we
> have within for stating such distinctions when we want to.
> (Sorry this was such a long mail...)
> Finally, let's also try not to get stuck on syntax issues at this
> stage. We'll have to find the best patterns in Microdata/RDFa and
> JSON-LD that we can for this, and it may sometimes be tricky. Here's
> an attempt at amending the MusicEvent example by adding a WebPage and
> 'about' - . We
> might want to discuss a reverse property that could be expressed on
> the entity rather than the page, for example.
> cheers,
> Dan
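
To make the 'about' idea above concrete, here is a minimal sketch in Python of how a consumer might read an explicit main entity out of JSON-LD. It is not the amended example linked in the mail. The page URL, the node names, and the helper functions build_page and main_entity are all invented for illustration. Only the types (WebPage, MusicEvent, Place, MusicGroup) and the properties (about, location, performer) come from the discussion.

```python
BASE = "http://example.org/gigs/1"  # hypothetical page URL, not from the thread


def build_page(main_id):
    """Build illustrative JSON-LD for a concert page.

    The same graph is published either way; only the WebPage's 'about'
    changes, depending on what the publisher considers the main entity.
    """
    return {
        "@context": "http://schema.org",
        "@graph": [
            {"@type": "WebPage", "@id": BASE, "about": {"@id": main_id}},
            {
                "@type": "MusicEvent",
                "@id": BASE + "#event",
                "name": "Example Concert",
                "location": {"@id": BASE + "#venue"},
                "performer": [{"@id": BASE + "#band"}],
            },
            {"@type": "Place", "@id": BASE + "#venue", "name": "Example Hall"},
            {"@type": "MusicGroup", "@id": BASE + "#band", "name": "Example Band"},
        ],
    }


def main_entity(doc):
    """Return the node that the WebPage declares it is 'about', or None."""
    nodes = {n["@id"]: n for n in doc.get("@graph", []) if "@id" in n}
    for node in nodes.values():
        if node.get("@type") == "WebPage" and "about" in node:
            ref = node["about"]
            # 'about' may be a node reference ({"@id": ...}) or a bare IRI string.
            target = ref.get("@id") if isinstance(ref, dict) else ref
            return nodes.get(target)
    return None  # no explicit statement: the client is back to heuristics


# An event listing page and a venue homepage can carry the same graph
# but point 'about' at different nodes.
print(main_entity(build_page(BASE + "#event"))["@type"])  # MusicEvent
print(main_entity(build_page(BASE + "#venue"))["@type"])  # Place
```

The point of the sketch is only that the choice of main entity is stated by the publisher rather than guessed by the client. How best to express it in Microdata or RDFa is a separate question.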

Received on Tuesday, 20 May 2014 21:01:07 UTC