Re: Referring to an element (and its markup) on another webpage (offlist)

Hi Niels,

Some [hopefully] helpful hints from my experience...

Firstly, when thinking about how a search engine sees your Schema.org
marked-up site, try to think in terms of entities, not pages or user
navigation paths through your site.

Secondly, don’t worry initially about the particular serialisation of the
data in pages (Microdata/RDFa/JSON-LD); that can come later, once you have
decided what data you are going to share.

To take your simple example, you have individual event pages, each
containing its own Event entity description and associated Offer(s):

{
  "@context": "http://schema.org",
  "@type": "Event",
  "name": "Interesting Event",
  "location": {
    "@type": "Place",
    "name": "The place",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "London",
      "addressCountry": "UK",
      "streetAddress": "7, A Street"
    }
  },
  "offers": {
    "@type": "Offer",
    "price": "13.00",
    "priceCurrency": "USD",
    "offeredBy": "http://myorganization.com"
  },
  "startDate": "2018-07-20T08:00"
}


You will also have an about page, or the home page, containing an
Organization entity description:

{
  "@context": "http://schema.org",
  "@type": "Organization",
  "name": "My Organization",
  "url": "http://myorganization.com",
  "contactPoint": {
    "@type": "ContactPoint",
    "telephone": "+44-202-123-1212",
    "contactType": "customer service"
  }
}

If you identify these pages to the search engine crawler using sitemaps,
that is all it needs in order to work out that the organisation “My
Organization” is offering tickets to the “Interesting Event”, and vice
versa: the Offer’s offeredBy value matches the Organization’s url.
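
To make the entity-centric view concrete, here is a minimal Python sketch of
how a data consumer could join the two descriptions above once it has crawled
both pages. It assumes the Offer's offeredBy URL is compared with the
Organization's "@id" or "url"; the function names are made up for
illustration, and real search engines may well do something quite different.

```python
# Trimmed copies of the two entity descriptions above, as plain dicts.
event = {
    "@context": "http://schema.org",
    "@type": "Event",
    "name": "Interesting Event",
    "offers": {
        "@type": "Offer",
        "price": "13.00",
        "priceCurrency": "USD",
        "offeredBy": "http://myorganization.com",
    },
}
organization = {
    "@context": "http://schema.org",
    "@type": "Organization",
    "name": "My Organization",
    "url": "http://myorganization.com",
}


def index_entities(entities):
    """Map each entity's "@id" (or, failing that, its "url") to the entity."""
    index = {}
    for entity in entities:
        key = entity.get("@id") or entity.get("url")
        if key:
            index[key] = entity
    return index


def offerers(event, index):
    """Return the names of the known entities offering tickets to an event."""
    offers = event.get("offers", [])
    if isinstance(offers, dict):  # a single Offer rather than a list
        offers = [offers]
    names = []
    for offer in offers:
        ref = offer.get("offeredBy")
        # A nested object may carry its own identifier instead of a bare URL.
        if isinstance(ref, dict):
            ref = ref.get("@id") or ref.get("url")
        target = index.get(ref)
        if target is not None:
            names.append(target.get("name"))
    return names


# The pages were fetched separately; the join happens on the shared URL.
index = index_entities([event, organization])
print(offerers(event, index))
```

The point of the sketch is only that the join key is the entity's URL, not the
page it happened to be published on.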

To help human visitors find a suitable event, you may want to introduce
listing pages, but from a structured data point of view such pages are not
strictly necessary.

~Richard.





Richard Wallis
Founder, Data Liberate
http://dataliberate.com
Linkedin: http://www.linkedin.com/in/richardwallis
Twitter: @rjw

On 10 July 2018 at 16:31, Niels <nielsl@xs4all.nl> wrote:

> It is however not just the overhead of duplicate data, it also means that
> for a webapp, as soon as we change any detail of information, this detail
> must be edited on a number of pages, and those pages must be rerendered,
> meaning caching of those pages is also far less effective.
>
> With microdata rather than JSON-LD it also means adding loads of meta tags
> throughout the body of a document. Which is OK, but I suspect indexers tend
> to see this as data which is provided but still hidden from visitors, and
> thus as a reason to lower the ranking of the page. (Indexers tend not to
> rely on information hidden from sight.)
>
> To me this historical way of webpage-centric thinking seems like an
> elephant in the room, which sooner or later is going to bite us if we don't
> do something about it.
>
> Even if it doesn't bite, it leaves uncertainty about proper use of
> microdata. Uncertainty is bad for business: I might invest in marking up
> our website today, but not know if I have done it right and whether I will
> have more costs in the future. (No one enters the room if the elephant is
> still just standing there; it might bite, you know. ;) )
>
> Changing the nature of webpage-centric markup overnight would also be a
> bad idea. But how about coming up with clear documentation and examples of
> ways to reference data on other pages, and discussing the options with
> indexers such as Google, Microsoft and Yandex?
> Maybe it is quite achievable to use itemref not only with an element ID on
> this page, but also with a URL to an element on another page on the same
> domain.
>
> The most important aspect is getting acknowledgement from search indexers
> of whether they would be able to support such markup, and providing clear
> documentation and examples to the public showing them this is a valid way
> of marking up your pages. (And where its limits lie, such as not attempting
> to mark up data cross-domain.)
>
> Yes, we can sustain a model of providing all data on each page, but that
> will for many use cases cause confusion (should contact info or company
> logo be on every page?!). And it will in fact mean significant overhead,
> because it interferes with the segmentation of the website into webpages,
> and means caching of pages becomes much less useful.
>
>
> Kind regards,
> Niels Lancel
>
>
> On July 10, 2018 5:04:39 PM GMT+02:00, Martin Hepp <mfhepp@gmail.com>
> wrote:
>>
>> I can't speak for the search engines, but a few general hints:
>>
>> 1. The main units of interest for search engines are Web resources identified via URIs (approximately: "Web pages"). These are what is listed in organic search results, what gets clicked, etc. - the entire ecosystem is centered on URIs / "pages".
>>
>> 2. As a consequence, most structured data is consumed to improve the selection, ranking, and presentation (e.g. with Rich Snippets) of URIs for queries.
>>
>> 3. A few new features of search engines (like Knowledge Graph panels) are based on the notion of "entities" and try to combine information from multiple URIs/pages and other sources, like Wikidata and Wikipedia. But this is only a fraction of the search engine's functionality.
>>
>> Since the majority of the ecosystem is, at least historically, page-centric, content from external pages is difficult to consider for the selection, ranking, and presentation of a single URI. The problems range from the organization of the respective data structures to questions of trust and authority (e.g. when integrating product reviews from an external site in the presentation of search engine results).
>>
>> This might change slowly, so it will make sense to include references to external entities, but I assume this will primarily help improve the relevance ranking of a URI.
>>
>> So in a nutshell:
>>
>> 1. References to other entities are good; add them if you can with moderate effort.
>> 2. Make sure you populate all required fields for an entity *on that very same page*.
>>
>> As for the issue of redundancy: In practice, it is no big deal if this means that data in JSON-LD will be redundant within your site. We once did a study that showed that even very detailed data markup would at most add a few percent to the total size of the HTML, and the effect on page speed will be even lower, as HTTP/1.1 and later use compression if configured correctly.
>>
>> Best wishes
>> Martin Hepp
>>
>>
>> ------------------------------
>>
>> martin hepp  http://www.heppnetz.de
>> mhepp@computer.org          @mfhepp
>>
>>
>>
>>
>>  On 10 Jul 2018, at 16:29, Niels <nielsl@xs4all.nl> wrote:
>>>
>>>  I have unfortunately had no further replies on this question so far.
>>>  It makes perfect sense that indexers don't want to go into any detail about which pages they fetch and what leads up to that. What I am hoping to learn is not what indexers do in practice, but what I as a web developer might do to make sure it is possible for them to fetch the information.
>>>
>>>  In the past I have expressed some frustration over this matter, because I struggle to reconcile the vocab with the manner in which I intended to present data on my websites. The Schema.org vocab, like most vocabs, is meant to describe context on a per-document basis. It can reference other documents, and it can go into details on a document, but the basic entry point is a single document.
>>>  This makes perfect sense, but in my personal opinion it has not been applied in that manner when considering websites.
>>>
>>>  When we look at schema vocab being applied to websites, it considers each webpage to be a document.
>>>  In my view, that approach introduces limitations. A webpage is a page out of a larger document. I would argue that it is not the webpage but the website that is the document.
>>>
>>>  The limitation of treating webpages as units of information is that for some (many) websites, certain pages cannot provide a detailed set of data on their own. Contact details are on another webpage, a description of the author is on yet another webpage, and the tickets for an event are sold (and thus their details provided) on another webpage again.
>>>
>>>  I would like to be able to mark up information across my website, without fear that if I do not provide the ticket sales information on the same page as the event information, it might not get indexed properly.
>>>  Using validators such as Google's markup validator, I get countless warnings that certain properties are missing and that it is advised to include them. In fact I do provide that information, just not on this very webpage. I would like a solid method of stating that more data on this itemscope is provided on another page. I can even clearly point to the ID of the element which provides that data.
>>>
>>>  So again, I hope someone can clear up for me whether it is indeed considered good use of microdata and the schema vocab to itemref a URL with the ID of an element on another page. I also ask that this question and the answer to it be provided to the public, as I think I am not the only one uncertain about the manner in which relationships between webpages should be indicated.
>>>
>>>  This is mainly intended for binding webpages of the same website, on one single domain. I argue for considering a web domain, rather than a webpage, as the scope of a set of data.
>>>
>>>  Kind regards,
>>>  Niels Lancel
>>>
>>>  On July 9, 2018 8:56:06 AM GMT+02:00, Webfeet <schema@webfeet.org> wrote:
>>>
>>>  On 08/07/18 15:10, Niels wrote:
>>>
>>>  Say I have an overview page of upcoming events. As soon as I start
>>>  describing an event that will take place next Sunday, indexers such as
>>>  Google advise me to also include an offer of the ticket, a location, a
>>>  description etc. etc. It makes perfect sense, but that information is
>>>  already provided on the event's own details page, which is a separate
>>>  webpage on the same website.
>>>
>>>
>>>  You mean, instead of having (using the JSON-LD format):
>>>
>>>  ... "offers" :
>>>  {
>>>  "@type" : "Offer",
>>>  "price" : "..." ,
>>>  "availability" : "...",
>>>  "url" : "https://some.otherpage.com/events/thisevent.html"
>>>  }
>>>
>>>  you have:
>>>
>>>  ... "offers" : "https://some.otherpage.com/events/thisevent.html#offers"
>>>
>>>  As, as per https://schema.org/docs/datamodel.html#conformance, you can
>>>  give text or a url. The URL has to be of the page holding the extra
>>>  details and the specific markup on that page will need an "@id" matching
>>>  the URL... so the "thisevent.html" page would need (?):
>>>
>>>  ... "offers" :
>>>  {
>>>  "@type" : "Offer",
>>>  "@id" : "https://some.otherpage.com/events/thisevent.html#offers",
>>>  "price" : "..." ,
>>>  "availability" : "..."
>>>  }
>>>
>>>  That, as I understand it, is the theory (from my reading and playing, so
>>>  guesswork rather than anything definitive....)
>>>
>>>  What happens in practice is a different territory: you are asking what
>>>  individual search engines do (might do...) and I've found a clear
>>>  reticence when people answer those questions - they are not going to
>>>  give away details of the internal workings of "their" search engine.
>>>
>>>  I have noticed that Google's SDTT doesn't necessarily accept a URL (it
>>>  seems to in some places, not in others). Google's Structured Data
>>>  documentation also says one event per page (which may help if people
>>>  don't add a "@id", but this is more guesswork). And if you use links you
>>>  don't get to see what connections are being made in the background...
>>>
>>>  Now, if you are writing something that looks for an event detail and
>>>  then wants to collect "further details" from linked pages, you are the
>>>  search engine and you try to follow these links...
>>>
>>>  SameAs can reference whole pages, but cannot reference a certain
>>>  element on a certain page. Itemref can reference a certain element on
>>>  this page, but cannot reference anything on other pages. I would like
>>>  to use something which allows me to point out a specific element on
>>>  another page.
>>>
>>>
>>>  As I read "sameAs", it's there to give an incontrovertible label to your
>>>  markup. Something that defines unambiguously what your markup is about.
>>>  Don't know if there's an implication a crawler should follow the link.
>>>  SameAs takes a URL though, so it should be able to cope with a link to
>>>  an "#element" (??)
>>>
>>>  Am I asking for something useful or am I misunderstanding the use of
>>>  microdata/schema vocab?
>>>
>>>
>>>  Yes, useful :-)
>>>
>>>  My suspicion is that this is all taken as "rather self evident", that it
>>>  is there but never even made it into the FAQ...
>>>
>>>  Kind regards,
>>>  Niels Lancel
>>>
>>>
>>>  Let me know if you get any other offlist answers... we are probably
>>>  trying to cope with similar things :-)
>>>
>>>  Webfeet
>>>  "roleName":"Interested Observer" ...
>>>
>>>
>>

Received on Tuesday, 10 July 2018 19:23:16 UTC