
Re: working of schema.org/WebPage

From: Thad Guidry <thadguidry@gmail.com>
Date: Sun, 20 Apr 2014 10:10:09 -0500
Message-ID: <CAChbWaOKY+g-rcMnASXU8UP15CT0AA7KuBK4Wij1gqkvVtJOdQ@mail.gmail.com>
To: "Wallis,Richard" <Richard.Wallis@oclc.org>
Cc: Jarno van Driel <jarno@quantumspork.nl>, Jocelyn Fournier <jocelyn.fournier@gmail.com>, Jason Douglas <jasondouglas@google.com>, "<public-vocabs@w3.org>" <public-vocabs@w3.org>
Lest we forget...

A webpage IS AN HTML document.
It can contain multiple sections about multiple subjects.
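
As a rough illustration (the types and names below are placeholders, not taken from any real site), a single HTML document declared as a WebPage might hold two sections about unrelated subjects:

```html
<body itemscope itemtype="http://schema.org/WebPage">
  <h1 itemprop="name">Page title</h1>

  <!-- First section: describes a person (placeholder content) -->
  <section itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Person name</span>
  </section>

  <!-- Second section: describes a place (placeholder content) -->
  <section itemscope itemtype="http://schema.org/Place">
    <span itemprop="name">Place name</span>
  </section>
</body>
```

Because neither section carries an itemprop, each one is its own top-level item next to the WebPage, and nothing in this markup says which of them, if any, is the primary subject.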

Reminder FYI.
Thanks for asking these kinds of questions, Jarno.  Keep 'em coming!



On Sun, Apr 20, 2014 at 4:13 AM, Wallis,Richard <Richard.Wallis@oclc.org> wrote:

>  I would suggest that it should read “… then the subject will be assumed
> to be the current *webpage*”.
>
>  The subject of that webpage, the thing being described, can be of any
> Type - Person, Place, Organization, WebPage or one of its more focused
> subtypes.
>
>  By defining the type, you are doing the right thing; otherwise some
> process or person parsing the data would just have to assume that it
> is ‘just a WebPage’.
>
>  ~Richard
>
>  On 20 Apr 2014, at 02:05, Jarno van Driel <jarno@quantumspork.nl> wrote:
>
>  "all that is saying is that if a relation is declared without an
> explicit subject, then the subject will be assumed to be the current
> WebPage"
>
>  So if the WebPage is only there to function as a sort of 'catch all',
> then why does it even have subClasses?
>
>  I actually use WebPage and its subClasses in an attempt to do the right
> thing, but is there a point in doing this? Does it actually matter if I
> declare the page type?
>
>
> On Thu, Apr 17, 2014 at 9:03 PM, Jocelyn Fournier <
> jocelyn.fournier@gmail.com> wrote:
>
>> On 17/04/2014 at 20:25, Jason Douglas wrote:
>>
>>  Yup, that's messed up.  Let's fix it!
>>>
>>
>>  Hi,
>>
>> Note that examples regarding mainContentOfPage on schema.org are also
>> misleading.
>> E.g. : http://schema.org/Table
>> =>  <meta itemprop="mainContentOfPage" content="true"/>
>>
>> I would have expected
>> <div itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/Table">
>>
>>
>> But I fully agree it would be much more useful to have mainContentOfPage
>> as a 'Thing' rather than a 'WebPageElement'.
>>
>>   Jocelyn
>>
>>
>>> On Thu Apr 17 2014 at 11:15:43 AM, Jarno van Driel
>>>  <jarno@quantumspork.nl <mailto:jarno@quantumspork.nl>> wrote:
>>>
>>>     "...if a relation is declared without an explicit subject, then the
>>>     subject will be assumed to be the current WebPage."
>>>     Got it.
>>>
>>>     "It is legal for there to be multiple top-level entities." +
>>>     "Current clients make up their own heuristics for this..."
>>>     Brainfreeze!
>>>     How am I, as a developer, to deal with this? Does this mean I have
>>>     to somehow figure out which heuristics every parser/search engine
>>>     uses in order to have control, or do I need to try to chain
>>>     everything together such that only one top-level entity is left?
>>>
>>>     And how would I do this for a category page on, for example, an
>>>     eCommerce site, which shows a range of Product entities on a
>>>     CollectionPage? Together the products form the main content, and the
>>>     CollectionPage, for lack of a better term, only functions as a
>>>     'wrapper' for the list of products.
>>>
>>>     "we probably do need a mechanism for indicating the "primary entity"
>>>     of a webpage when there is one..."
>>>     One of the reasons why I asked my questions is that I encounter
>>>     quite a lot of markup on websites where people use @mainContentOfPage
>>>     on entities like Product. Now @mainContentOfPage has the expected
>>>     type WebPageElement, but many aren't aware of this. And since there
>>>     is no property to indicate which entity is the primary one, I
>>>     can completely understand why they try to resolve it like this.
>>>     And frankly, I'm confused as well.
>>>
>>>
>>>
>>>     On Thu, Apr 17, 2014 at 7:51 PM, Jason Douglas
>>>      <jasondouglas@google.com <mailto:jasondouglas@google.com>> wrote:
>>>
>>>         It is legal for there to be multiple top-level entities.  That
>>>         description of WebPage is not meant to imply anything about the
>>>         relationship of those top-level objects... all that is saying is
>>>         that if a relation is declared without an explicit subject, then
>>>         the subject will be assumed to be the current WebPage.
>>>
>>>         That said, we probably do need a mechanism for indicating the
>>>         "primary entity" of a webpage when there is one.  Current
>>>         clients make up their own heuristics for this, but I think it
>>>         would be better to have an explicit way of stating that.
>>>
>>>         -jason
>>>
>>>
>>>         On Thu Apr 17 2014 at 10:41:47 AM, Jarno van Driel
>>>          <jarno@quantumspork.nl <mailto:jarno@quantumspork.nl>> wrote:
>>>
>>>             I'm trying to understand semantic mechanisms better but am a
>>>             bit confused about schema.org/WebPage
>>>             <http://schema.org/WebPage> and I'd like to know how it works.
>>>
>>>
>>>             Now it could well be that I misunderstand certain terminology,
>>>             so please bear with me and be so kind as to correct me
>>>             when needed.
>>>
>>>             1] The description of http://schema.org/WebPage says:
>>>             "Every web page is implicitly assumed to be declared to be
>>>             of type WebPage, so the various properties about that
>>>             webpage, such as breadcrumb may be used. We recommend
>>>             explicit declaration if these properties are specified, but
>>>             if they are found outside of an itemscope, they will be
>>>             assumed to be about the page."
>>>
>>>             code example:
>>>             <body itemscope itemtype="http://schema.org/WebPage">
>>>                <!-- Content -->
>>>             </body>
>>>
>>>             Now if the WebPage is the only entity, is it then considered
>>>             to be the 'Subject', the 'Object' or both?
>>>
>>>             2] If the WebPage contains an entity, let's say a Product,
>>>             without specifying a property on the Product and I check
>>>             this with Google's SDTT, I see 2 'root' entities, since
>>>             there is no property to chain the two together. Yet I get
>>>             the impression the Product gets treated as the 'Object',
>>>             since it's the Product that gets used for Rich snippet
>>>             extraction, and that therefore the WebPage is the 'Subject' :
>>>
>>>             code example:
>>>             <body itemscope itemtype="http://schema.org/WebPage">
>>>                <span itemprop="name">Page title</span>
>>>
>>>                <div itemscope itemtype="http://schema.org/Product">
>>>                  <span itemprop="name">Product name</span>
>>>                  <!-- Product properties -->
>>>                </div>
>>>             </body>
>>>
>>>             Now since "Every web page is implicitly assumed to be
>>>             declared to be of type WebPage" I was wondering if there
>>>             also is a property that is 'implicitly assumed to be
>>>             declared' (something like @contains) on the first entity
>>>             that comes after it, like Product in this case, which
>>>             indicates that the Product is the 'Object'?
>>>
>>>             And if not, then how does a parser 'know' which of the
>>>             entities is the 'Subject' and which is the 'Object'?
>>>             Shouldn't there be a predicate for this?
>>>
>>>             3] When a WebPage contains a bunch of 'root' entities, how
>>>             does a parser make sense of this? Does the DOM have anything
>>>             to do with this?
>>>
>>>             <body itemscope itemtype="http://schema.org/WebPage">
>>>                <span itemprop="name">Page title</span>
>>>
>>>                <div itemscope itemtype="http://schema.org/Product">
>>>                  <span itemprop="name">Product 1 name</span>
>>>                  <!-- Product properties -->
>>>                </div>
>>>
>>>                <div itemscope itemtype="http://schema.org/Product">
>>>                  <span itemprop="name">Product 2 name</span>
>>>                  <!-- Product properties -->
>>>                </div>
>>>
>>>                <div itemscope itemtype="http://schema.org/LocalBusiness">
>>>                  <span itemprop="name">Business name</span>
>>>                  <!-- Business properties -->
>>>                </div>
>>>             </body>
>>>
>>>             Now the above could be full of misunderstandings because I
>>>             still lack theoretical knowledge, but that's exactly the
>>>             thing I'm hoping to change. Who can enlighten me?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>


-- 
-Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>
Thad on LinkedIn <http://www.linkedin.com/in/thadguidry/>
Received on Sunday, 20 April 2014 15:10:37 UTC
