Re: Indicating main entity / primaryTopic - proposal to use 'schema.org/about'

>
> "[all of it]"


+1000


2014-08-11 22:36 GMT+02:00 martin.hepp@ebusiness-unibw.org <
martin.hepp@ebusiness-unibw.org>:

> Dan:
> Picking up an old thread with a new, related request: I think it would
> make the lives of ALL Web developers a LOT easier if the sponsors of
> schema.org could spend some effort to make sure that their parsers and
> consuming components treat all variants of valid schema.org equally, i.e.
> if you properly follow the spec, you should assume that the search engines
> understand your information if they process the respective type of
> information. My experience is that you need a lot of insider knowledge to
> design schema.org markup in a way that maximizes the understanding by
> search engines, e.g. in the case of
>
> - syntaxes (RDFa, Microdata, JSON-LD, ...) and
> - variants and alternatives in schema.org.
>
> I know that this is difficult to implement at the level of four big
> corporations with hundreds or thousands of software components. Still, it
> would help to define schema.org-based test-cases that are used for
> automated testing. I once started something similar for GoodRelations at
>
>     http://www.heppnetz.de/rdfa4google/testcases.html
>
> But I think we need something like that for each major schema.org type in
> all relevant syntaxes, and for the more complex types, we will need
> variants (e.g. pricing for Offer).
>
> Again, I think this is crucial for lowering the entrance barrier for
> adoption, because schema.org would be the official guideline for
> developers. Currently, schema.org is only a starting point and you need a
> lot of additional expertise and experience to apply it properly.
>
>
> Best wishes / Mit freundlichen Grüßen
>
> Martin Hepp
>
> -------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
>
> e-mail:  martin.hepp@unibw.de
> phone:   +49-(0)89-6004-4217
> fax:     +49-(0)89-6004-4620
> www:     http://www.unibw.de/ebusiness/ (group)
>          http://www.heppnetz.de/ (personal)
> skype:   mfhepp
> twitter: mfhepp
>
> Check out GoodRelations for E-Commerce on the Web of Linked Data!
> =================================================================
> * Project Main Page: http://purl.org/goodrelations/
>
>
>
>
> On 20 May 2014, at 18:16, Dan Brickley <danbri@google.com> wrote:
>
> > On 20 May 2014 11:28, Jarno van Driel <jarnovandriel@gmail.com> wrote:
> > >> Martin, I don't know if I completely agree about going to the product
> > >> forum about this. I think I understand why you might say this, but in my
> > >> thread about the working of WebPage (http://bit.ly/1jyFN0g), Jason Douglas
> > >> said:
> >>
> > >>> "That said, we probably do need a mechanism for indicating the "primary
> > >>> entity" of a webpage when there is one.  Current clients make up their own
> > >>> heuristics for this, but I think it would be better to have an explicit way
> > >>> of stating that."
> >>
> > >> But this is not the main subject of this thread. Maybe a new thread to
> > >> discuss the "primary entity" or continuation of the subject in the thread I
> > >> already started is a better place.
> >
> > > This is very much in scope for public-vocabs and for schema.org discussions.
> >
> > There are a few pieces to the puzzle, but the basic idea is simple.
> > Schema.org allows a rich descriptive graph to be embedded in a Web
> > page, which means we often have several entities mentioned; we'd like
> > to know which one is the main one, if any.
> >
> > Consider the second example in http://schema.org/MusicEvent to give us
> > a concrete focus.
> >
> > It describes a 'MusicEvent' (a concert), whose 'location' is a
> > 'Place'. The event lists multiple associated 'offers'; each 'Offer'
> > with price/date etc. info. The event also lists two 'performer's, each
> > a 'MusicGroup'.
> >
> > There is nothing *intrinsically* primary about the event, the
> > location, the offers or the musicians. This description is all the
> > richer because it mentions multiple entities. If I was forced to pick
> > one, I'd probably guess at the MusicEvent being the 'main' entity
> > here, because the others feel slightly more like background
> > information. But there's no need to leave this to guesswork. If this
> > markup was on the homepage of the venue, that publisher might well
> > consider the Place to be the main entity. And if it was on an artist's
> > homepage, they might want to mention the gig (perhaps alongside
> > others) but indicate that the MusicGroup was the main thing.
> >
> > The above sketches this in terms of embedded structured data, but we
> > can also think of this in terms of capturing a very common pattern in
> > Web content. Often Web pages _do_ have a focus on a single entity. If
> > we add a property like mainEntity, it would give sites a way to make
> > this focus explicit.
> >
> > PROPOSAL:
> >
> > 1.
> > We already have "about", "The subject matter of the content.",
> > relating a CreativeWork to a Thing. This is enough to do what we need,
> > if we add clarification and examples.
> >
> > I suggest the description should be updated to  say: "A Thing that is
> > the primary subject matter of this CreativeWork".
> >
> > 2.
> > If we want a more SKOS-like, bibliographic and nuanced notion of
> > 'subject', I suggest we adopt something like Dublin Core's 'subject'
> > to do that work.
> >
> > (DC has "The topic of the resource."/ "Typically, the subject will be
> > represented using keywords, key phrases, or classification codes.
> > Recommended best practice is to use a controlled vocabulary.", from
> > http://purl.org/dc/terms/ )
> >
> > The distinction:
> >
> > > if we want to say "This document is about the entity Sweden", i.e. the
> > > thing that is sameAs http://en.wikipedia.org/wiki/Sweden (or
> > > http://www.freebase.com/m/0d0vqn), we would use
> > > http://schema.org/about   ... i.e. this tells us the main thing that
> > > the page is about.
> >
> > but
> >
> > > If we want to say, "This document's topic is “environmental impact of
> > > the decline of tin mining in Sweden in the 20th century”", we'd be
> > going beyond "about" and would want a more bibliographic subject
> > description, e.g. using DDC or UDC subject classification codes, SKOS
> > etc.
> >
> > (fictional example, I know nothing about tin mining in Sweden)
> >
> > My proposal then is that we break out these two use cases, and target
> > the 'about' more explicitly on the 'main entity' use case.
> >
> > 3. Tweak http://schema.org/mentions
> >
> > We should note that http://schema.org/mentions is a very similar
> > notion to http://schema.org/about except that it allows multiple
> > different entities to be referenced.
> >
> > "Indicates that the CreativeWork contains a reference to, but is not
> > necessarily about a concept."
> >
> > I suggest rewording this in terms of entities/things, since we don't
> > use 'concept' elsewhere:
> >
> > "Indicates that the CreativeWork contains a reference to, but is not
> > necessarily about some particular thing."
> >
> > 4. http://schema.org/mainContentOfPage
> >
> > We already have this strange-looking property. It addresses a
> > different use case:
> >
> > it relates a WebPage to a part of that WebPage,
> > "Indicates if this web page element is the main subject of the page."
> >
> > The wording is awkward. It should be something like "Indicates the
> > main element within some Web page." since the expected type is
> > WebPageElement.
> >
> > I'm not convinced that the various types we have under WebPageElement
> > ("A web page element, like a table or an image") really work, but the
> > important point here is that they address a different scenario. A
> > WebPageElement is a piece of markup, like SiteNavigationElement,
> > Table, WPAdBlock, WPFooter, WPHeader, WPSideBar. This is a different
> > idea to the problem of finding the main *entity* that all this markup
> > is describing.
> >
> > > HTML already has a <main> element, see
> > https://developer.mozilla.org/en-US/docs/Web/HTML/Element/main
> >
> > "The HTML <main> element represents the main content of  the <body> of
> > a document or application. The main content area consists of content
> > that is directly related to, or expands upon the central topic of a
> > document or the central functionality of an application. This content
> > should be unique to the document, excluding any content that is
> > repeated across a set of documents such as sidebars, navigation links,
> > copyright information, site logos, and search forms (unless, of
> > course, the document's main function is as a search form)."
> >
> > I believe most of the use cases for mainContentOfPage are better
> > addressed by <main>.
> >
> > However <main> does not help us pick out a single highlighted entity:
> > the main section of a Web page could still contain microdata/rdfa or
> > json-ld mentioning lots of different entities.
> >
> > It is useful sometimes to know that structured data markup comes from
> > footers or boilerplate rather than the <main> section of a page, and
> > it is probably worth including some examples of this on the schema.org
> > site.
> >
> >
> > 5. Avoiding ratholes
> >
> > If we can please discuss this without slipping into discussion of
> > http://www.w3.org/2001/tag/group/track/issues/14 I'd be happy. There
> > > are places in schema.org usage where we tolerate a URL for a WebPage
> > > being used in place of a URL that is more explicitly for the
> > real-world entity itself. For example in http://schema.org/Person we
> > write "<a href="http://www.xyz.edu/students/alicejones.html"
> > itemprop="colleague">Alice Jones</a>".
> >
> > Clarifying the use of 'about' as above could help such pages clarify
> > which real world entity they are 'about'. This won't solve every issue
> > around entity disambiguation, but it will improve the basic support we
> > have within schema.org for stating such distinctions when we want to.
> >
> > (Sorry this was such a long mail...)
> >
> > Finally, let's also try not to get stuck on syntax issues at this
> > stage. We'll have to find the best patterns in Microdata/RDFa and
> > JSON-LD that we can for this, and it may sometimes be tricky. Here's
> > an attempt at amending the MusicEvent example by adding a WebPage and
> > 'about' - https://gist.github.com/anonymous/cf7e24f6378b176aa010 . We
> > might want to discuss a reverse property that could be expressed on
> > the entity rather than the page, for example.
> >
> > cheers,
> >
> > Dan
> >
>
>
>

Received on Monday, 11 August 2014 21:14:32 UTC