Re: Referring to an element (and its markup) on another webpage (offlist)

Hi, I can't speak for the search engines, but here are a few general hints:

1. The main unit of interest for search engines is the Web resource identified by a URI (approximately: a "Web page"). These are the results listed in organic search, these are what gets clicked, etc. The entire ecosystem is centered on URIs / "pages".

2. As a consequence, most structured data is consumed to improve the selection, ranking, and presentation (e.g. with Rich Snippets) of URIs for queries.

3. A few new features of search engines (like Knowledge Graph panels) are based on the notion of "entities" and try to combine information from multiple URIs/pages and other sources, like Wikidata and Wikipedia. But this is only a fraction of a search engine's functionality.

Since the majority of the ecosystem is, at least historically, page-centric, content from external pages is difficult to take into account for the selection, ranking, and presentation of a single URI. The problems range from the organization of the respective data structures to questions of trust and authority (e.g. when integrating product reviews from an external site in the presentation of search engine results).

This might change slowly, so it will make sense to include references to external entities, but I assume this will primarily help improve the relevance ranking of a URI.

So in a nutshell:

1. References to other entities are good; add them if you can with moderate effort.
2. Make sure you populate all required fields for an entity *on that very same page*.
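
To illustrate the two points above, here is a minimal Python sketch, not a statement of what any search engine actually consumes. It builds JSON-LD for one event on an overview page: the descriptive properties are filled in on the page itself, and the "@id" values additionally point at the entity on the detail page. The helper name build_event_markup, the example.org URLs, the "#event" fragment, and all values are made up for illustration; the "#offers" fragment mirrors the one used in the quoted discussion below.

```python
import json


def build_event_markup(detail_url, name, start_date, place, price, currency):
    """Return schema.org Event markup as a dict (hypothetical helper).

    All descriptive properties are repeated on the current page; the "@id"
    values only add a reference to the entity described on the detail page.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "@id": detail_url + "#event",
        "name": name,
        "startDate": start_date,
        "location": {"@type": "Place", "name": place},
        "offers": {
            "@type": "Offer",
            "@id": detail_url + "#offers",
            "price": price,
            "priceCurrency": currency,
            "url": detail_url,
        },
    }


markup = build_event_markup(
    "https://example.org/events/thisevent.html",
    "Sunday Concert", "2018-07-15T20:00", "Town Hall", "12.50", "EUR",
)
# Embed the result in a <script type="application/ld+json"> element.
print(json.dumps(markup, indent=2))
```

If the detail page uses the same "@id" values for its own markup, a consumer that does merge data across pages can connect the two descriptions, while a page-centric consumer still finds everything it needs locally.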

As for the issue of redundancy: In practice, it is no big deal if this means that data in JSON-LD will be redundant within your site. We once did a study that showed that even very detailed data markup would at most add a few percent to the total size of the HTML, and the effect on page speed will be even lower, as HTTP/1.1 and up support compression if the server is configured correctly.
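
A rough way to check this for your own pages is to compare sizes with and without the markup, before and after compression. The sketch below uses Python's standard gzip module on a made-up page and a made-up JSON-LD block, so the numbers it prints say nothing about the study mentioned above; substitute your own HTML to get meaningful figures.

```python
import gzip
import json

# Made-up page body and markup, purely for illustration.
html_body = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>\n" * 200
events = [
    {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": f"Concert {i}",
        "startDate": f"2018-08-{i:02d}T20:00",
        "location": {"@type": "Place", "name": "Town Hall"},
        "offers": {"@type": "Offer", "price": "12.50", "priceCurrency": "EUR"},
    }
    for i in range(1, 6)
]
json_ld = '<script type="application/ld+json">' + json.dumps(events) + "</script>\n"


def sizes(page):
    """Return (raw_bytes, gzip_bytes) for an HTML string."""
    data = page.encode("utf-8")
    return len(data), len(gzip.compress(data))


raw_plain, gz_plain = sizes(html_body)
raw_marked, gz_marked = sizes(html_body + json_ld)

# Bytes the markup adds to the page, before and after compression.
raw_added = raw_marked - raw_plain
gz_added = gz_marked - gz_plain
print(f"markup adds {raw_added} bytes uncompressed, {gz_added} bytes gzip-compressed")
```

Because JSON-LD repeats the same keys and type names, it tends to compress well, so the bytes actually transferred usually grow by less than the raw size of the markup.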

Best wishes
Martin Hepp


-----------------------------------
martin hepp  http://www.heppnetz.de
mhepp@computer.org          @mfhepp




> On 10 Jul 2018, at 16:29, Niels <nielsl@xs4all.nl> wrote:
> 
> I have unfortunately had no further replies on this question so far.
> It makes perfect sense that indexers don't want to go into any detail about which pages they fetch and what leads up to that. What I am hoping to learn is not what indexers do in practice, but what I as a web developer might do to make sure it is possible for them to fetch the information.
> 
> In the past I have expressed some frustration over this matter, because I struggle to reconcile the vocab with the manner in which I intended to present data on my websites. The Schema.org vocab, like most vocabs, is meant to describe context on a per-document basis. It can reference other documents, and it can go into details on a document, but the basic entry point is a single document.
> This makes perfect sense, but in my personal opinion it has not been applied in that manner when considering websites.
> 
> When we look at schema vocab being applied to websites, it considers each webpage to be a document.
> In my view, that approach introduces limitations. A webpage is a page out of a larger document. I would argue that it is not the webpage but the website that is the document.
> 
> The limitation of treating webpages as units of information is that for some (many) websites, certain pages cannot provide a detailed set of data on their own. Contact details are on another webpage, a description of the author is on yet another webpage, and the tickets for an event are again sold (and thus their details provided) on another webpage.
> 
> I would like to be able to mark up information across my website, without fear that if I do not provide the ticket sales information on the same page as the event information, it might not get indexed properly.
> Using validators such as Google's markup validator, I get countless warnings that certain properties are missing and that it is advised to include them. In fact I do provide that information, just not on this very webpage. I would like a solid method of stating that more data on this itemscope is provided on another page. I can even clearly point to the ID of the element which provides that data.
> 
> So again, I hope someone can clear up for me whether it is indeed considered good use of microdata and the schema vocab to itemref a URL with the ID of an element on another page. I also ask that this question and the answer to it be made public, as I think I am not the only one uncertain about the manner in which relationships between webpages should be indicated.
> 
> This is mainly intended for binding webpages of the same website, on one single domain. I argue for considering a web domain, rather than a webpage, as the scope of a set of data.
> 
> Kind regards,
> Niels Lancel
> 
> On July 9, 2018 8:56:06 AM GMT+02:00, Webfeet <schema@webfeet.org> wrote:
> 
> On 08/07/18 15:10, Niels wrote:
> 
> Say I have an overview page of upcoming events. As soon as I start 
> describing an event that will take place next Sunday, indexers such as 
> Google advise me to also include an offer for the ticket, a location, a 
> description etc. It makes perfect sense, but that information is 
> already provided on the event's own details page, which is a separate 
> webpage on the same website.
> 
> 
> You mean, instead of having (using the JSON-LD format):
> 
> ... "offers" :
> {
> "@type" : "Offer",
> "price" : "..." ,
> "availability" : "...",
> "url" : "https://some.otherpage.com/events/thisevent.html"
> }
> 
> you have:
> 
> ... "offers" : "https://some.otherpage.com/events/thisevent.html#offers"
> 
> As, as per https://schema.org/docs/datamodel.html#conformance, you can 
> give text or a url. The URL has to be of the page holding the extra 
> details and the specific markup on that page will need an "@id" matching 
> the URL... so the "thisevent.html" page would need (?):
> 
> ... "offers" :
> {
> "@type" : "Offer",
> "@id" : "https://some.otherpage.com/events/thisevent.html#offers",
> "price" : "..." ,
> "availability" : "..."
> }
> 
> That, as I understand it, is the theory (from my reading and playing, so 
> guesswork rather than anything definitive....)
> 
> What happens in practice is different territory: you are asking what 
> individual search engines do (might do...) and I've found a clear 
> reticence when people answer those questions - they are not going to 
> give away details of the internal workings of "their" search engine.
> 
> I have noticed that Google's SDTT doesn't necessarily accept a URL (it 
> seems to in some places, not in others). Google's Structured Data 
> documentation also says one event per page (which may help if people 
> don't add a "@id", but this is more guesswork). And if you use links you 
> don't get to see what connections are being made in the background...
> 
> Now, if you are writing something that looks for an event detail and 
> then wants to collect "further details" from linked pages, you are the 
> search engine and you try to follow these links...
> 
> SameAs can reference whole pages, but can not reference a certain 
> element on a certain page. Itemref can reference a certain element on 
> this page, but can not reference anything on other pages. I would like 
> to use something which allows me to point out a specific element on 
> another page.
> 
> 
> As I read "sameAs", it's there to give an incontrovertible label to your 
> markup. Something that defines unambiguously what your markup is about. 
> Don't know if there's an implication a crawler should follow the link. 
> SameAs takes a URL though, so it should be able to cope with a link to 
> an "#element" (??)
> 
> Am I asking for something useful or am I misunderstanding the use of 
> microdata/schema vocab?
> 
> 
> Yes, useful :-)
> 
> My suspicion is that this is all taken as "rather self evident", that it 
> is there but never even made it into the FAQ...
> 
> Kind regards,
> Niels Lancel
> 
> 
> Let me know if you get any other offlist answers... we are probably 
> trying to cope with similar things :-)
> 
> Webfeet
> "roleName":"Interested Observer" ...
> 

Received on Tuesday, 10 July 2018 15:05:05 UTC