- From: Aaron Bradley <aaranged@gmail.com>
- Date: Fri, 7 Aug 2015 11:23:12 -0700
- To: "schema.org Mailing List" <public-schemaorg@w3.org>
- Message-ID: <CAMbipBubrL_ZMya4Gm-Mj5bi0Vk+e5AqT-3403Y=8B++amrNaw@mail.gmail.com>
This issue is relevant to two recently opened issues and a Google+ discussion, which is why I'm posting to the list rather than adding a comment to one or the other on GitHub.

schema.org URLs should resolve the HTTPS form
https://github.com/schemaorg/schemaorg/issues/716

Extension pages should give canonical URL explicitly
https://github.com/schemaorg/schemaorg/issues/717

Ganymede goes live as schema.org v 2.1 [see comments]
https://plus.google.com/106943062990152739506/posts/eT7SjF22rhy

There seems to be a nascent consensus forming around "standard" methods of canonicalizing schema.org URLs:

- 301 redirect from the HTTP form of a URL to the HTTPS form
- 301 redirect from www.schema.org to schema.org

Note that this consensus is around the desired end state, and that, as Dan Brickley noted in a comment on 716, there are challenges that need to be overcome before these measures can be rolled out.

The third "standard" canonicalization method, in this case specifically for search engines, is:

- Declare the preferred URL using rel="canonical"

This protocol has long been supported by all the sponsors [1,2,3], and the thrust of the declaration is fairly straightforward: "hey search engine, thanks for visiting this URL, but the one you really should be indexing and serving up in response to queries is the value provided by the href attribute of rel='canonical'."

As per 717 and Richard Wallis' work on that, we now have a statement about the "Canonical URL" on types and properties that reside in a reviewed/hosted extension. E.g. on:

https://bib.schema.org/Thesis

We have:

Canonical URL: http://schema.org/Thesis

On one hand, this is problematic only insofar as this use of "canonical" is at variance with the use of "canonical" in rel="canonical". That is, for that declaration the canonical would be exactly the opposite of the one provided in the on-page text.
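A minimal sketch of how the two proposed 301 rules compose, assuming a server that computes a single redirect target per request. The function `redirect_target` and the constant `PREFERRED_HOST` are hypothetical names for illustration, and the sketch reflects the desired end state described above, not how schema.org currently behaves.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical: the bare host treated as preferred under the proposed rules.
PREFERRED_HOST = "schema.org"


def redirect_target(url: str) -> Optional[str]:
    """Return the URL a 301 should point to, or None if `url` is already canonical.

    Applies both proposed rules at once:
      1. HTTP -> HTTPS
      2. www.schema.org -> schema.org
    Note: this sketch keeps only the hostname, so any port or userinfo is dropped.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lowercased by urlsplit

    # Rule 2: fold the www. form onto the bare host (schema.org itself only).
    if host == "www." + PREFERRED_HOST:
        host = PREFERRED_HOST

    # Rule 1: always prefer the HTTPS scheme.
    canonical = urlunsplit(("https", host, parts.path, parts.query, parts.fragment))

    # No redirect is needed if the request was already in canonical form.
    return None if canonical == url else canonical
```

Folding both rules into one computed target means `http://www.schema.org/Thesis` goes straight to `https://schema.org/Thesis` in a single hop instead of through a chain of two redirects. Extension hosts such as bib.schema.org only get the HTTPS rule in this sketch.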
If one had to tell the search engines which URL to index, it would absolutely be https://bib.schema.org/Thesis, because that's where the information about https://bib.schema.org/Thesis resides. If the search engines were directed to use http://schema.org/Thesis in preference to https://bib.schema.org/Thesis, and actually did so, none of the descriptions (no information aside from the term itself) would ever be indexed for an extension, as the only data on that page is a stub. That is, the only thing http://schema.org/Thesis provides is a link to the place where that type is fully described.

All fine and well if the variance between the two uses is merely a matter of semantics (ha), that is, if no actual rel="canonical" value were provided for either page, allowing both to be indexed. Still, just to reduce confusion, it would be nice if the descriptive and declarative terms were used in the same way.

On the other hand, I don't actually understand what "Canonical URL" means on those pages. This is the URL that ... which human or data consumer should use? If I wanted to find out information (either as a human or as a data consumer) about bib.schema.org/Thesis, I'm not going to find it at schema.org/Thesis, as the latter only points to the former, which seems circular. Color me confused; I'd appreciate any elucidation on what "Canonical URL" is meant to convey on these extension pages (sorry, Dan, I was unable to come to a clear understanding of its use despite your detailed comment on the subject).

While it might indeed turn out that this is correct from a data-handling point of view, it could readily be misconstrued by webmasters (who are mostly accustomed to its use in the rel="canonical" sense) to mean that they should be using the schema.org URL, rather than the bib.schema.org URL, in their declarations. Judging by the example on the bib page, maybe this isn't a misstep at all, but exactly how extension URLs are supposed to work?
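In the search-engine sense of the word, what counts is the href a crawler reads from a `<link rel="canonical">` element. Below is a rough sketch using Python's standard-library `html.parser`. The `find_canonical` helper is hypothetical, and the sample markup is invented to show what the http://schema.org/Thesis stub would carry if it pointed search engines at the extension page, as argued above. Real crawlers apply more rules than this (for example, HTTP `Link` headers and conflicting declarations).

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; self-closing <link .../>
        # also reaches this method via the default handle_startendtag.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list, compared case-insensitively.
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```

For example, `find_canonical('<link rel="canonical" href="https://bib.schema.org/Thesis">')` returns `"https://bib.schema.org/Thesis"`, which is the URL a search engine honoring the declaration would index in place of the stub.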
<div itemscope itemtype="http://schema.org/Thesis">

Again, all fine and well if that's the case, except that, again, we have this use of "canonical" to mean "use this URL if you're marking up code". That is contrary to the HTML declaration, which, applied here, would mean "use this URL if you're a search engine", and that would be very much wrong for the reasons described above. In this scenario it might be worthwhile to come up with an alternative to the on-page term "Canonical URL" to avoid confusion.

I'll note, too, that there's the possibility a type or property could be duplicated by different extensions unless they're kept locked down:

https://furniture.schema.org/chair
https://meetings.schema.org/chair

And even if extension terms were locked down so that schema.org/[term] could only refer to its use by the core or by a single extension, as Dan has noted this is further complicated by external extensions.

FWIW, in my view of a perfect world, bib.schema.org/Thesis would be the canonical in both the colloquial and the declarative sense, and it would be possible to declare it as such:

<div itemscope itemtype="http://bib.schema.org/Thesis">

However, I suspect it's precisely the impossibility of that itemtype declaration that's the sticking point, especially viewed in the context of a subsequently declared property that resides in the core vocabulary:

<span itemprop="name">A meandering dissertation on the use of "canonical" at schema.org</span>

Thanks for any feedback.

[1] https://support.google.com/webmasters/answer/139066?hl=en
[2] https://yandex.com/support/webmaster/controlling-robot/html.xml#canonical
[3] https://blogs.bing.com/webmaster/2011/10/06/managing-redirects-301s-302s-and-canonicals/ (Bing's respect for rel="canonical" currently extends to Yahoo!, as Bing fuels Yahoo results.)

(Yes, "canonicalization" and "canonicalization" *do* generate spelling error flags. :)
Received on Friday, 7 August 2015 18:23:41 UTC