Re: working of schema.org/WebPage

Yup, that's messed up.  Let's fix it!

On Thu Apr 17 2014 at 11:15:43 AM, Jarno van Driel <jarno@quantumspork.nl>
wrote:

> "...if a relation is declared without an explicit subject, then the
> subject will be assumed to be the current WebPage."
> Got it.
>
> "It is legal for there to be multiple top-level entities." + "Current
> clients make up their own heuristics for this..."
> Brainfreeze!
> How am I, as a developer, to deal with this? Does this mean I have to
> somehow figure out which heuristics every parser/search engine uses, to be
> able to have control or do I need to try to chain everything together such
> that only one top-level entity is left?
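>
> To make the second option concrete, here is a minimal sketch of what I
> mean by chaining. Using @about as the connecting property is purely my
> assumption; I don't know whether any parser treats it as "this is the
> primary entity":
>
> <body itemscope itemtype="http://schema.org/WebPage">
>   <span itemprop="name">Page title</span>
>
>   <!-- @about nests the Product under the WebPage, leaving one top-level entity -->
>   <div itemprop="about" itemscope itemtype="http://schema.org/Product">
>     <span itemprop="name">Product name</span>
>     <!-- Product properties -->
>   </div>
> </body>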
>
> And how would I do this for a category page on, for example, an eCommerce
> site, which shows a range of Product entities on a CollectionPage? Together
> the products form the main content, and the CollectionPage, for lack of a
> better term, only functions as a 'wrapper' for the list of products.
>
> "we probably do need a mechanism for indicating the "primary entity" of a
> webpage when there is one..."
> One of the reasons why I asked my questions is that I encounter quite a
> lot of markup on websites where people use @mainContentOfPage on entities
> like Product. Now @mainContentOfPage has the expected type WebPageElement,
> but many aren't aware of this. And since there is no property to indicate
> which entity is the primary one, I can completely understand why they
> try to resolve it like this. And frankly, I'm confused as well.
>
>
>
> On Thu, Apr 17, 2014 at 7:51 PM, Jason Douglas <jasondouglas@google.com> wrote:
>
>> It is legal for there to be multiple top-level entities.  That
>> description of WebPage is not meant to imply anything about the
>> relationship of those top-level objects... all that is saying is that if a
>> relation is declared without an explicit subject, then the subject will be
>> assumed to be the current WebPage.
>>
>> That said, we probably do need a mechanism for indicating the "primary
>> entity" of a webpage when there is one.  Current clients make up their own
>> heuristics for this, but I think it would be better to have an explicit way
>> of stating that.
>>
>> -jason
>>
>>
>> On Thu Apr 17 2014 at 10:41:47 AM, Jarno van Driel <jarno@quantumspork.nl>
>> wrote:
>>
>>> I'm trying to understand semantic mechanisms better but am a bit
>>> confused about schema.org/WebPage and I'd like to know how it works.
>>>
>>> Now it could well be that I misunderstand certain terminology, so please
>>> bear with me and be so kind as to correct me when needed.
>>>
>>> 1] The description of http://schema.org/WebPage says:
>>> "Every web page is implicitly assumed to be declared to be of type
>>> WebPage, so the various properties about that webpage, such as breadcrumb
>>> may be used. We recommend explicit declaration if these properties are
>>> specified, but if they are found outside of an itemscope, they will be
>>> assumed to be about the page."
>>>
>>> code example:
>>> <body itemscope itemtype="http://schema.org/WebPage">
>>>   <!-- Content -->
>>> </body>
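>>>
>>> And the implicit case, as I read that description (a sketch based on my
>>> own reading, not something I have verified with a parser):
>>> <body>
>>>   <!-- No itemscope: @breadcrumb should be assumed to be about the page -->
>>>   <div itemprop="breadcrumb">Home &gt; Products</div>
>>> </body>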
>>>
>>> Now if the WebPage is the only entity, is it then considered to be the
>>> 'Subject', the 'Object' or both?
>>>
>>> 2] If the WebPage contains an entity, let's say a Product, without
>>> specifying a property on the Product and I check this with Google's SDTT, I
>>> see 2 'root' entities, since there is no property to chain the two
>>> together. Yet I get the impression the Product gets treated as the
>>> 'Object', since it's the Product that gets used for Rich snippet
>>> extraction, and that therefore the WebPage is the 'Subject' :
>>>
>>> code example:
>>> <body itemscope itemtype="http://schema.org/WebPage">
>>>   <span itemprop="name">Page title</span>
>>>
>>>   <div itemscope itemtype="http://schema.org/Product">
>>>     <span itemprop="name">Product name</span>
>>>     <!-- Product properties -->
>>>   </div>
>>> </body>
>>>
>>> Now since "Every web page is implicitly assumed to be declared to be of
>>> type WebPage" I was wondering if there also is a property that is
>>> 'implicitly assumed to be declared' (something like @contains) on the first
>>> entity that comes after it, like Product in this case, which indicates that
>>> the Product is the 'Object'?
>>>
>>> And if not, then how does a parser 'know' which of the entities is the
>>> 'Subject' and which is the 'Object'? Shouldn't there be a predicate for
>>> this?
>>>
>>> 3] When a WebPage contains a bunch of 'root' entities, how does a parser
>>> make sense of this? Does the DOM have anything to do with it?
>>>
>>> <body itemscope itemtype="http://schema.org/WebPage">
>>>   <span itemprop="name">Page title</span>
>>>
>>>   <div itemscope itemtype="http://schema.org/Product">
>>>     <span itemprop="name">Product 1 name</span>
>>>     <!-- Product properties -->
>>>   </div>
>>>
>>>   <div itemscope itemtype="http://schema.org/Product">
>>>     <span itemprop="name">Product 2 name</span>
>>>     <!-- Product properties -->
>>>   </div>
>>>
>>>   <div itemscope itemtype="http://schema.org/LocalBusiness">
>>>     <span itemprop="name">Business name</span>
>>>     <!-- LocalBusiness properties -->
>>>   </div>
>>> </body>
>>>
>>> Now the above could be full of misunderstandings because I still lack
>>> theoretical knowledge, but that's exactly the thing I'm hoping to
>>> change. Who can enlighten me?
>>>
>

Received on Thursday, 17 April 2014 18:25:50 UTC