W3C public-vocabs@w3.org mailing list archive, April 2014

Re: working of schema.org/WebPage

From: Kevin Polley <kevin.polley@mutualadvantage.co.uk>
Date: Fri, 18 Apr 2014 00:59:53 -0000
Message-ID: <b15e772f1ae56ef26a7f03460ede1494.squirrel@213.165.245.209>
To: "Jason Douglas" <jasondouglas@google.com>, "Jarno van Driel" <jarno@quantumspork.nl>, "Jocelyn Fournier" <jocelyn.fournier@gmail.com>
Cc: "Public Vocabs" <public-vocabs@w3.org>
I'm also trying to get a better understanding of the mechanisms at play in
relation to the use/declaration of WebPage and other CreativeWork types on
the same page.

My question comes from seeing a growing number of CMS plugin and theme
developers, who want to help business owners (not developers) expose
structured data on their sites, add both itemscope
itemtype="http://schema.org/WebPage" and itemscope
itemtype="http://schema.org/Article" to the head and/or body.

My understanding was that a 'page' of content should be of one type, for
example one of WebPage, Article, Blog or AboutPage, but not several or
all of them. Can, or should, a page of content be of more than one type?

By way of example, consider the following sample:

<div itemscope itemtype="http://schema.org/WebPage">
<p itemprop="name">Web Page Title</p>

  <div itemscope itemtype="http://schema.org/Article">
  <p itemprop="description">.. content ..</p>

    <div itemscope itemtype="http://schema.org/Blog">
    <p itemprop="about"> about the blog</p>
    </div>

  </div>

</div>

What particularly intrigued me about this sample is that the Yandex
structured data validator extracted all three types, whereas the Google
SDTT failed to detect the Blog type.
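
In the HTML microdata model, an element with itemscope but no itemprop is a top-level item, so nesting the Article and Blog divs inside the WebPage div does not by itself relate them to it: the sample declares three independent top-level items, which is consistent with what the Yandex validator reported. The minimal Python sketch below illustrates this under a simplifying assumption (itemref and other corner cases of the microdata specification are ignored); TopLevelItemFinder and top_level_item_types are hypothetical helper names, not part of any validator's API.

```python
from html.parser import HTMLParser


class TopLevelItemFinder(HTMLParser):
    """Records the itemtype of every top-level microdata item.

    Simplified rule (assumption): an element is a top-level item when it
    carries an itemscope attribute and no itemprop attribute. itemref and
    other corner cases of the microdata specification are ignored.
    """

    def __init__(self):
        super().__init__()
        self.top_level_types = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare attribute such as
        # itemscope appears with the value None.
        names = dict(attrs)
        if "itemscope" in names and "itemprop" not in names:
            self.top_level_types.append(names.get("itemtype"))


def top_level_item_types(html):
    """Returns the itemtype of each top-level item, in document order."""
    finder = TopLevelItemFinder()
    finder.feed(html)
    finder.close()
    return finder.top_level_types


# The sample from this message, with the Blog itemtype value quoted.
SAMPLE = """
<div itemscope itemtype="http://schema.org/WebPage">
<p itemprop="name">Web Page Title</p>
  <div itemscope itemtype="http://schema.org/Article">
  <p itemprop="description">.. content ..</p>
    <div itemscope itemtype="http://schema.org/Blog">
    <p itemprop="about"> about the blog</p>
    </div>
  </div>
</div>
"""

if __name__ == "__main__":
    print(top_level_item_types(SAMPLE))
    # ['http://schema.org/WebPage', 'http://schema.org/Article', 'http://schema.org/Blog']
```

Hypothetically adding an itemprop (for example itemprop="about") to the Article div would make the Article a property value of the WebPage rather than a separate top-level item, which is the kind of chaining Jarno asks about further down.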

thanks

Kevin


> On 17/04/2014 20:25, Jason Douglas wrote:
>> Yup, that's messed up.  Let's fix it!
>
> Hi,
>
> Note that the mainContentOfPage examples on schema.org are also
> misleading.
> E.g.: http://schema.org/Table
> =>  <meta itemprop="mainContentOfPage" content="true"/>
>
> I would have expected
> <div itemprop="mainContentOfPage" itemscope
> itemtype="http://schema.org/Table">
>
>
> But I fully agree it would be much more useful for mainContentOfPage
> to expect a 'Thing' rather than a 'WebPageElement'.
>
>    Jocelyn
>
>>
>> On Thu Apr 17 2014 at 11:15:43 AM, Jarno van Driel
>>     <jarno@quantumspork.nl> wrote:
>>
>>     "...if a relation is declared without an explicit subject, then the
>>     subject will be assumed to be the current WebPage."
>>     Got it.
>>
>>     "It is legal for there to be multiple top-level entities." +
>>     "Current clients make up their own heuristics for this..."
>>     Brainfreeze!
>>     How am I, as a developer, to deal with this? Does this mean I have
>>     to figure out which heuristics every parser/search engine uses
>>     in order to stay in control, or do I need to chain everything
>>     together so that only one top-level entity is left?
>>
>>     And how would I do this for, say, a category page on an
>>     eCommerce site, which shows a range of Product entities on a
>>     CollectionPage? The products together form the main content, and
>>     the CollectionPage, for lack of a better term, only functions as
>>     a 'wrapper' for the list of products.
>>
>>     "we probably do need a mechanism for indicating the "primary entity"
>>     of a webpage when there is one..."
>>     One of the reasons I asked my questions is that I encounter
>>     quite a lot of markup on websites where people use
>>     @mainContentOfPage on entities like Product. Now, @mainContentOfPage
>>     has the expected type WebPageElement, but many aren't aware of this.
>>     And since there is no property to indicate which entity is the
>>     primary one, I can completely understand why they try to resolve it
>>     like this.
>>     And frankly, I'm confused as well.
>>
>>
>>
>>     On Thu, Apr 17, 2014 at 7:51 PM, Jason Douglas
>>     <jasondouglas@google.com> wrote:
>>
>>         It is legal for there to be multiple top-level entities.  That
>>         description of WebPage is not meant to imply anything about the
>>         relationship of those top-level objects... all that is saying is
>>         that if a relation is declared without an explicit subject, then
>>         the subject will be assumed to be the current WebPage.
>>
>>         That said, we probably do need a mechanism for indicating the
>>         "primary entity" of a webpage when there is one.  Current
>>         clients make up their own heuristics for this, but I think it
>>         would be better to have an explicit way of stating that.
>>
>>         -jason
>>
>>
>>         On Thu Apr 17 2014 at 10:41:47 AM, Jarno van Driel
>>         <jarno@quantumspork.nl> wrote:
>>
>>             I'm trying to understand semantic mechanisms better, but I
>>             am a bit confused about schema.org/WebPage and I'd like to
>>             know how it works.
>>
>>             Now, it could well be that I understand certain terminology
>>             wrongly, so please bear with me and kindly correct me
>>             where needed.
>>
>>             1] The description of http://schema.org/WebPage says:
>>             "Every web page is implicitly assumed to be declared to be
>>             of type WebPage, so the various properties about that
>>             webpage, such as breadcrumb may be used. We recommend
>>             explicit declaration if these properties are specified, but
>>             if they are found outside of an itemscope, they will be
>>             assumed to be about the page."
>>
>>             code example:
>>             <body itemscope itemtype="http://schema.org/WebPage">
>>                <!-- Content -->
>>             </body>
>>
>>             Now, if the WebPage is the only entity, is it considered
>>             to be the 'Subject', the 'Object', or both?
>>
>>             2] If the WebPage contains an entity, let's say a Product,
>>             without a property specified on the Product, and I check
>>             this with Google's SDTT, I see two 'root' entities, since
>>             there is no property to chain the two together. Yet I get
>>             the impression the Product is treated as the 'Object',
>>             since it's the Product that gets used for rich snippet
>>             extraction, and that the WebPage is therefore the 'Subject':
>>
>>             code example:
>>             <body itemscope itemtype="http://schema.org/WebPage">
>>                <span itemprop="name">Page title</span>
>>
>>                <div itemscope itemtype="http://schema.org/Product">
>>                  <span itemprop="name">Product name</span>
>>                  <!-- Product properties -->
>>                </div>
>>             </body>
>>
>>             Now, since "Every web page is implicitly assumed to be
>>             declared to be of type WebPage", I was wondering whether
>>             there is also a property that is 'implicitly assumed to be
>>             declared' (something like @contains) on the first entity
>>             that comes after it, the Product in this case, which
>>             indicates that the Product is the 'Object'.
>>
>>             And if not, then how does a parser 'know' which of the
>>             entities is the 'Subject' and which is the 'Object'?
>>             Shouldn't there be a predicate for this?
>>
>>             3] When a WebPage contains a bunch of 'root' entities, how
>>             does a parser make sense of this? Does the DOM have anything
>>             to do with it?
>>
>>             <body itemscope itemtype="http://schema.org/WebPage">
>>                <span itemprop="name">Page title</span>
>>
>>                <div itemscope itemtype="http://schema.org/Product">
>>                  <span itemprop="name">Product 1 name</span>
>>                  <!-- Product properties -->
>>                </div>
>>
>>                <div itemscope itemtype="http://schema.org/Product">
>>                  <span itemprop="name">Product 2 name</span>
>>                  <!-- Product properties -->
>>                </div>
>>
>>                <div itemscope itemtype="http://schema.org/LocalBusiness">
>>                  <span itemprop="name">Business name</span>
>>                  <!-- LocalBusiness properties -->
>>                </div>
>>             </body>
>>
>>             Now, the above could be full of misunderstandings because I
>>             still lack theoretical knowledge, but that's exactly the
>>             thing I'm hoping to change. Who can enlighten me?
>
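
Jocelyn's remark about the mainContentOfPage example on http://schema.org/Table can be made concrete with a second minimal Python sketch. It is a simplified microdata extractor written for illustration only, under stated assumptions: a text property's value is the element's text content, a meta element's value is its content attribute, and itemref, URL-valued properties and other details of the microdata specification are ignored. MicrodataExtractor, extract_items and the two HTML snippets are hypothetical. In the meta variant, the value "true" becomes a property of the Table item itself and the Table stays a separate top-level item; in the nested variant Jocelyn suggests, the Table becomes the mainContentOfPage value of the enclosing WebPage, leaving a single top-level item.

```python
from html.parser import HTMLParser

# Simplified, illustrative extractor (assumption: not a full implementation
# of the microdata spec). Void elements never get an end tag, so they are
# not pushed onto the open-element stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class MicrodataExtractor(HTMLParser):
    """Builds a nested dict for each top-level microdata item."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items, in document order
        self.stack = []   # one entry per open non-void element

    def _current_item(self):
        # The innermost open element that started an item scope owns new properties.
        for entry in reversed(self.stack):
            if entry["item"] is not None:
                return entry["item"]
        return None

    @staticmethod
    def _add(item, prop, value):
        item["properties"].setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        owner = self._current_item()
        prop = a.get("itemprop")
        new_item = None
        if "itemscope" in a:
            new_item = {"type": a.get("itemtype"), "properties": {}}
            if prop and owner is not None:
                self._add(owner, prop, new_item)  # nested item becomes a property value
            else:
                self.items.append(new_item)       # no itemprop (or no owner): top-level
            prop = None
        elif prop and tag == "meta":
            if owner is not None:
                self._add(owner, prop, a.get("content", ""))
            prop = None
        if tag in VOID_ELEMENTS:
            return
        self.stack.append({"tag": tag, "item": new_item, "prop": prop,
                           "owner": owner, "text": []})

    def handle_data(self, data):
        # Text counts toward every open element that is collecting a text value.
        for entry in self.stack:
            if entry["prop"]:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or all(e["tag"] != tag for e in self.stack):
            return
        while self.stack:
            entry = self.stack.pop()
            if entry["prop"] and entry["owner"] is not None:
                self._add(entry["owner"], entry["prop"], "".join(entry["text"]).strip())
            if entry["tag"] == tag:
                break


def extract_items(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


META_VARIANT = """
<div itemscope itemtype="http://schema.org/WebPage">
  <div itemscope itemtype="http://schema.org/Table">
    <meta itemprop="mainContentOfPage" content="true"/>
    <span itemprop="name">Table caption</span>
  </div>
</div>
"""

NESTED_VARIANT = """
<div itemscope itemtype="http://schema.org/WebPage">
  <div itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/Table">
    <span itemprop="name">Table caption</span>
  </div>
</div>
"""

if __name__ == "__main__":
    print([item["type"] for item in extract_items(META_VARIANT)])
    # ['http://schema.org/WebPage', 'http://schema.org/Table']
    print([item["type"] for item in extract_items(NESTED_VARIANT)])
    # ['http://schema.org/WebPage']
```
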
Received on Friday, 18 April 2014 00:57:06 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:29:39 UTC