Re: Refering to an element (and its markup) on an other webpage (offlist) from Niels on 2018-07-10 (public-schemaorg@w3.org from July 2018)

From: Niels <nielsl@xs4all.nl>
Date: Tue, 10 Jul 2018 17:31:16 +0200
To: Martin Hepp <mfhepp@gmail.com>
CC: public-schemaorg@w3.org,Steve Harris <steveharrischelt@gmail.com>
Message-ID: <FEB06A23-4DF0-45DE-96DA-2CD0B19D93A6@xs4all.nl>
It is however not just the overhead of duplicate data, it also means that for a webapp, as soon as we change any detail of information, this detail must be edited on a number of pages, and those pages must be rerendered, meaning caching of those pages is also far less effective.

With microdata rather than JSON-LD it also means adding loads of meta tags throughout the body of a document. That is OK, but I suspect indexers tend to see this as data which is provided but still hidden from visitors, and thus as a reason to lower the ranking of the page. (Indexers tend not to rely on information hidden from sight.)

To me this historical, webpage-centric way of thinking seems like an elephant in the room, which sooner or later is going to bite us if we don't do something about it.

Even if it doesn't bite, it leaves uncertainty about the proper use of microdata. Uncertainty is bad for business: I might invest in marking up our website today, but not know whether I have done it right and whether I will face more costs in the future. (No one enters the room if the elephant is still just standing there; it might bite, you know. ;) )

Changing the nature of webpage-centric markup overnight would also be a bad idea. But how about coming up with clear documentation and examples of ways to reference data on other pages, and discussing the options with indexers such as Google, Microsoft and Yandex?
Maybe it is quite achievable to use itemref not only with an element ID on this page, but also with a URL pointing to an element on another page on the same domain.

The most important aspect is getting acknowledgement from search indexers on whether they would be able to support such markup, and providing clear documentation and examples to the public showing them this is a valid way of marking up your pages. (And where its limits lie, such as not attempting to mark up data cross-domain.)
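
To make concrete what such a reference would ask of a consumer, here is a minimal sketch in Python, using the JSON-LD "@id" approach Webfeet describes further down rather than itemref. Everything in it is an assumption for illustration: the example.org URLs, the sample data, and the helper names index_nodes, resolve and fetch_page are hypothetical, and nothing here reflects how any search engine actually behaves.

```python
from urllib.parse import urldefrag


def index_nodes(doc):
    """Collect every JSON-LD node that has an "@id" plus other properties."""
    found = {}

    def walk(value):
        if isinstance(value, dict):
            # Skip bare references such as {"@id": "..."}; they carry no data.
            if "@id" in value and len(value) > 1:
                found[value["@id"]] = value
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(doc)
    return found


def resolve(value, fetch_page):
    """Replace a URL-with-fragment string by the node with that "@id".

    fetch_page is a hypothetical callable that returns the parsed JSON-LD
    of a page URL, or None if the page is unknown.
    """
    if not isinstance(value, str):
        return value
    page_url, fragment = urldefrag(value)
    if not fragment:
        return value
    node = index_nodes(fetch_page(page_url)).get(value)
    return node if node is not None else value


# Hypothetical detail page, holding the full offer under an "@id".
PAGES = {
    "https://example.org/events/thisevent.html": {
        "@type": "Event",
        "name": "Sunday concert",
        "offers": {
            "@type": "Offer",
            "@id": "https://example.org/events/thisevent.html#offers",
            "price": "25.00",
            "availability": "https://schema.org/InStock",
        },
    }
}

# Hypothetical overview page, which only points at the offer.
overview = {
    "@type": "Event",
    "name": "Sunday concert",
    "offers": "https://example.org/events/thisevent.html#offers",
}

offer = resolve(overview["offers"], PAGES.get)
print(offer["@type"], offer["price"])
```

The sketch restricts nothing to a single domain; a same-domain check on page_url would be the natural place to enforce the limit mentioned above.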

Yes, we can sustain a model of providing all data on each page, but that will for many use cases cause confusion (should the contact info or company logo be on every page?!). And it will in fact mean significant overhead, because it interferes with the segmentation of the website into webpages, and means caching of pages becomes much less useful.
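
For anyone who wants to check the size side of this on their own pages, a rough sketch in Python follows. It compares a page with and without an embedded JSON-LD block, raw and gzip-compressed. The page body and markup are synthetic placeholders I made up, so the printed numbers say nothing about real sites or about the study Martin mentions, and the sketch does not address the caching concern at all.

```python
import gzip
import json


def sizes(html):
    """Return (raw_bytes, gzip_bytes) for an HTML string."""
    raw = html.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Synthetic page content (an assumption, not taken from a real site).
page_body = "<p>Upcoming events at our venue, with dates and descriptions.</p>\n" * 200

# Hypothetical markup that would be repeated on every page.
markup = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Venue",
    "url": "https://example.org/",
    "telephone": "+31 00 000 0000",
}
script = '<script type="application/ld+json">' + json.dumps(markup) + "</script>\n"

plain = "<html><body>" + page_body + "</body></html>"
annotated = "<html><body>" + script + page_body + "</body></html>"

raw_plain, gz_plain = sizes(plain)
raw_annotated, gz_annotated = sizes(annotated)

print("raw overhead (bytes):", raw_annotated - raw_plain)
print("gzip overhead (bytes):", gz_annotated - gz_plain)
```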


Kind regards,
Niels Lancel


On July 10, 2018 5:04:39 PM GMT+02:00, Martin Hepp <mfhepp@gmail.com> wrote:
>Hi, I can't speak for the search engines, but a few general hints:
>
>1. The main unit of interest for search engines are Web resources
>identified via URIs (approximately: "Web pages"). These are the results
>listed in organic search results, those are clicked, etc. - the entire
>eco-system is centered on URIs / "pages". 
>
>2. As a consequence, most of the consumption of structured data is used
>to improve the selection, ranking, and presentation (e.g. with Rich
>Snippets) of URIs for queries.
>
>3. A few new features of search engines (like Knowledge Graph panels)
>are based on the notion of "entities" and try to combine information
>from multiple URIs/pages and other sources, like Wikidata and
>Wikipedia. But this is only a fraction of the search engine's
>functionality.
>
>Since the majority of the ecosystem is, at least historically,
>page-centric, content from external pages is difficult to consider for
>the selection, ranking, and presentation of a single URI. The
>problems range from the organization of the respective data structures
>to questions of trust and authority (e.g. when integrating product
>reviews from an external site in the presentation of search engine
>results).
>
>This might change slowly, so it will make sense to include references
>to external entities, but I assume this will primarily help improve the
>relevance ranking of a URI.
>
>So in a nutshell:
>
>1. References to other entities are good; add them if you can with
>moderate effort.
>2. Make sure you populate all required fields for an entity *on that
>very same page*.
>
>As for the issue of redundancy: In practice, it is no big deal if this
>means that data in JSON-LD will be redundant within your site. We once
>did a study that showed that even very detailed data markup would at
>most add a few percent to the total size of the HTML, and that effect
>will be even lower for page speed as HTTP 1.1 and up use compression if
>configured correctly.
>
>Best wishes
>Martin Hepp
>
>
>-----------------------------------
>martin hepp  http://www.heppnetz.de
>mhepp@computer.org          @mfhepp
>
>
>
>
>> On 10 Jul 2018, at 16:29, Niels <nielsl@xs4all.nl> wrote:
>> 
>> I have unfortunately had no further replies on this question so far.
>> It makes perfect sense that indexers don't want to go into any detail
>> about which pages they fetch and what leads up to that. What I am
>> hoping to learn is not what indexers do in practice, but what I as a
>> web developer might do to make sure it is possible for them to fetch
>> the information.
>> 
>> In the past I have expressed some frustration over this matter,
>> because I struggle to reconcile the vocab with the manner in which I
>> intended to present data on my websites. The Schema.org vocab, like
>> most vocabs, is meant to describe context on a per-document basis. It
>> can reference other documents, and it can go into details on a
>> document, but the basic entry point is a single document.
>> This makes perfect sense, but in my personal opinion it has not been
>> applied in that manner when considering websites.
>> 
>> When we look at schema vocab being applied to websites, it considers
>> each webpage to be a document.
>> In my view, that approach introduces limitations. A webpage is a page
>> out of a larger document. I would argue that it is not the webpage
>> that is a document, but the website.
>> 
>> The limitation in treating webpages as units of information is that
>> for some (many) websites, certain pages cannot provide a detailed set
>> of data on their own. Contact details are on another webpage, a
>> description of the author is on yet another webpage, the tickets for an
>> event are again sold on (and thus their details provided on) another
>> webpage.
>> 
>> I would like to be able to mark up information across my website,
>> without fear that if I do not provide the ticket sales information on
>> the same page as the event information, it might not get indexed
>> properly.
>> Using validators such as the markup validator of Google, I get
>> countless warnings that certain properties are missing, but that it is
>> advised to include them. In fact I do provide that information, just
>> not on this very webpage. I would like a solid method of stating that
>> more data on this itemscope is provided on another page. I can even
>> clearly point to the ID of the element which provides that data.
>> 
>> So again, I hope someone can clear up for me whether it is indeed
>> considered good use of microdata and schema vocab to itemref a URL with
>> the ID of an element on another page. I also ask that this question and
>> the answer to it are provided to the public, as I think I am not the
>> only one uncertain about the manner in which relationships between
>> webpages should be indicated.
>> 
>> This is mainly intended for binding webpages of the same website, on
>> one single domain. I argue for considering a web domain as the scope of
>> a set of data, rather than a webpage.
>> 
>> Kind regards,
>> Niels Lancel
>> 
>> On July 9, 2018 8:56:06 AM GMT+02:00, Webfeet <schema@webfeet.org>
>wrote:
>> 
>> On 08/07/18 15:10, Niels wrote:
>> 
>> Say I have an overview page of upcoming events. As soon as I start
>> describing that an event will take place next Sunday, indexers such as
>> Google advise me to also include an offer of the ticket, a location,
>> a description, etc. It makes perfect sense, but that information
>> is already provided on the event's own details page, which is a
>> separate webpage on the same website.
>> 
>> 
>> You mean, instead of having (using the JSON-LD format):
>> 
>> ... "offers" :
>> {
>> "@type" : "Offer",
>> "price" : "..." ,
>> "availability" : "...",
>> "url" : "https://some.otherpage.com/events/thisevent.html"
>> }
>> 
>> you have:
>> 
>> ... "offers" : "https://some.otherpage.com/events/thisevent.html#offers"
>> 
>> As, as per https://schema.org/docs/datamodel.html#conformance, you can
>> give text or a URL. The URL has to be that of the page holding the extra
>> details, and the specific markup on that page will need an "@id" matching
>> the URL... so the "thisevent.html" page would need (?):
>> 
>> ... "offers" :
>> {
>> "@type" : "Offer",
>> "@id" : "https://some.otherpage.com/events/thisevent.html#offers",
>> "price" : "..." ,
>> "availability" : "..."
>> }
>> 
>> That, as I understand it, is the theory (from my reading and playing,
>> so guesswork rather than anything definitive....)
>> 
>> What happens in practice is a different territory: you are asking what
>> individual search engines do (might do...) and I've found a clear
>> reticence when people answer those questions - they are not going to
>> give away details of the internal workings of "their" search engine.
>> 
>> I have noticed that Google's SDTT doesn't necessarily accept a URL (it
>> seems to in some places, not in others). Google's Structured Data
>> documentation also says one event per page (which may help if people
>> don't add a "@id", but this is more guesswork). And if you use links you
>> don't get to see what connections are being made in the background...
>> 
>> Now, if you are writing something that looks for an event detail and
>> then wants to collect "further details" from linked pages, you are the
>> search engine and you try to follow these links...
>> 
>> SameAs can reference whole pages, but cannot reference a certain
>> element on a certain page. Itemref can reference a certain element on
>> this page, but cannot reference anything on other pages. I would like
>> to use something which allows me to point out a specific element on
>> another page.
>> 
>> 
>> As I read "sameAs", it's there to give an incontrovertible label to your
>> markup. Something that defines unambiguously what your markup is about.
>> Don't know if there's an implication a crawler should follow the link.
>> SameAs takes a URL though, so it should be able to cope with a link to
>> an "#element" (??)
>> 
>> Am I asking for something useful, or am I misunderstanding the use of
>> microdata/schema vocab?
>> 
>> 
>> Yes, useful :-)
>> 
>> My suspicion is that this is all taken as "rather self evident", that it
>> is there but never even made it into the FAQ...
>> 
>> Kind regards,
>> Niels Lancel
>> 
>> 
>> Let me know if you get any other offlist answers... we are probably 
>> trying to cope with similar things :-)
>> 
>> Webfeet
>> "roleName":"Interested Observer" ...
>>
Received on Tuesday, 10 July 2018 15:31:45 UTC