Re: Canonicalization of schema.org URLs

Thanks for this. A quickish reply for now.

I think the double confusion in this topic is (a) the status of
schema.org via https:// and (b) the appeal of using foo.schema.org vs
the simplicity of not doing so.

On that first theme: we have never really announced the site's
availability over https://, despite it having worked for some time. By
the time it worked, the http:// version was already deeply embedded in
everyday use across the Web.

Meanwhile, https:// is on the rise, and if you search for schema.org
at Google, Google will helpfully send you to the version that was
never announced, i.e. https://schema.org/

At some point, I hope soon, we'll be able to say with more confidence
that the https: version of the site is in fact the canonical (in every
sense) version. Unfortunately for now it is best considered a kind of
preview. Schema.org is special in a few ways: its URLs are embedded
as data in software and documents across the Web. It is also an
open source AppEngine project whose current ability to be served as
https:// while using a raw domain (i.e. schema.org rather than
www.schema.org) is taking advantage of special AppEngine facilities
that are not currently generally available. According to
http://stackoverflow.com/questions/4910683/is-it-at-all-possible-to-use-google-app-engine-with-a-naked-domain
this is in the pipeline. So I think the direction is clear, but we
haven't yet taken the step to declare the https: versions canonical.
When we do, rel="canonical" will be fully applicable.

Regarding "I'll note too, that there's the possibility a type or
property could duplicated by different extensions unless they're kept
locked down." - there are many many possibilities for errors in
schema.org workflow; this is just one. The project's intent is clearly
that for each name ("Bank", "Agent" or whatever) there is really only
one vocabulary term within schema.org (core + hosted extensions).
However different extensions might all *mention* that term, e.g. to
add related vocabulary. We have the notion that each term is tagged as
being (at this moment) "part of" either the core or an extension.

Therefore behind the scenes, we can run scripts like this (to jump
into excessive detail)

 find data/ext/ -name '*.rdfa' -exec rdfa {} \; | grep isPartOf

And see declarations like those copied below(*). We also have some
basic checks in the codebase
(https://github.com/schemaorg/schemaorg/blob/sdo-phobos/api.py#L701
for the curious, although eventually this should be a unit test) to make sure that
each term is claimed by only one area ("layer") of schema.org.
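
To give a flavour of what such a check involves, here is a minimal
Python sketch. It is not the api.py code linked above, just an
illustration under stated assumptions: it takes the isPartOf triples as
text (as produced by the find/grep pipeline), and the function name,
the regex, and the sample data are all made up for this example. The
sample deliberately includes a conflicting claim on Thesis that does
not exist in the real data.

```python
import re
from collections import defaultdict

# Matches one triple: <term> <http://schema.org/isPartOf> <layer> .
# Whitespace between the parts may include line breaks from mail wrapping.
TRIPLE = re.compile(
    r"<([^>]+)>\s+<http://schema\.org/isPartOf>\s+<([^>]+)>\s*\."
)

def find_conflicting_claims(ntriples_text):
    """Return {term: sorted layers} for terms claimed by more than one layer."""
    layers_by_term = defaultdict(set)
    for term, layer in TRIPLE.findall(ntriples_text):
        layers_by_term[term].add(layer)
    return {
        term: sorted(layers)
        for term, layers in layers_by_term.items()
        if len(layers) > 1
    }

# Hypothetical input: the second Thesis claim is invented to trigger the check.
sample = """
<http://schema.org/Thesis> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/torque> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/Thesis> <http://schema.org/isPartOf>
<http://auto.schema.org> .
"""

print(find_conflicting_claims(sample))
```

An empty result means every term is claimed by exactly one layer; any
entry in the result names a term together with the layers competing
for it.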

On the question of whether publishers should put
http://bib.schema.org/Thesis vs http://schema.org/Thesis in their
markup, we should be guided by the general principles behind
schema.org: whenever there is doubt about who to make work for, make
work for the search engines rather than publishers. Search engines
(all of them!) regularly solve countless problems massively more
complex than figuring out whether someone has written
"health.schema.org" when they mean "healthinsurance.schema.org".
Schema.org was created in part because webmasters and publishers were
suffering from the complexity of the 100+ possible namespaces
available to them in RDF. We don't want to rebuild that chaos within
schema.org, or to leave people struggling to remember whether e.g.
nutrition-related markup was part of a health extension, a food
packaging extension, a restaurants/menus extension, or a "quantified
self" extension. At least for hosted extensions we expect (through
this community and the steering group, supported by technical /
workflow tools) to have some level of over-arching coordination, even
as things open up. There is much more to discuss on the "when do we
use subdomains" question than I want to get into for today,
particularly around JSON-LD and external extensions, but I'm glad
we've got to a milestone that makes such debate timely.
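
To make the "work for the search engines" point concrete: a consumer
could fold hosted-extension subdomains back onto schema.org before
comparing terms. The following is only a sketch, assuming that hosted
extensions always live at <name>.schema.org and that http vs https
makes no difference to term identity. The function name is
hypothetical, external extensions are not handled, and this is not a
description of what any particular search engine does.

```python
from urllib.parse import urlsplit

def core_term_url(url):
    """Map e.g. http://bib.schema.org/Thesis to http://schema.org/Thesis.

    Assumption: any host that is schema.org or ends in .schema.org is
    treated as the same vocabulary; other URLs are returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "schema.org" or host.endswith(".schema.org"):
        # Keep only the path (the term name); drop subdomain and scheme.
        return "http://schema.org" + parts.path
    return url

print(core_term_url("http://bib.schema.org/Thesis"))
print(core_term_url("https://schema.org/Thesis"))
```

Under these assumptions both calls yield the same term URL, so a
publisher who picked the "wrong" subdomain would still be understood.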

Re "FWIW in my view of a perfect world, bib.schema.org/Thesis would be
the canonical in both the colloquial and declarative sense" --- for a
purely bibliographic description, my intuition also pulls me in that
direction. However if we have a set of related and complementary
extensions (see nutrition examples above; for bib, consider also
possible extensions for museums, archives, ...), then a single
description might want to use several extensions simultaneously. At
which point the artificial task of remembering which extension came
from where, and how to write all that in Microdata/RDFa, seems more of
a burden than a benefit. This line of thought pulls me towards
thinking of the extension status of a term as a kind of "tag" that
might change over time, and towards using schema.org/Thesis even if it
is declared in bib.schema.org. From this perspective, these extension
tags serve to organize a large family of schemas (for navigation,
filtering etc.) without overloading the typing mechanism as we did
with http://schema.org/MedicalEntity. And then we get to JSON-LD, ...
[for another day]

Thanks for the discussion.

Dan




(*)

<http://schema.org/modelDate> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/emissionsCO2> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/speed> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/tongueWeight> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/engineDisplacement> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/Motorcycle> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/specialUsage> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/BusOrCoach> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/accelerationTime> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/seatingCapacity> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/MotorizedBicycle> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/enginePower> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/roofLoad> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/trailerWeight> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/payload> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/bodyType> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/acrissCode> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/meetsEmissionStandard> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/fuelCapacity> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/torque> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/weightTotal> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/wheelbase> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/engineType> <http://schema.org/isPartOf>
<http://auto.schema.org> .
<http://schema.org/Newspaper> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/workTranslation> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/Atlas> <http://schema.org/isPartOf> <http://bib.schema.org> .
<http://schema.org/translationOfWork> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/readBy> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/inSupportOf> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/abridged> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/Audiobook> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/Chapter> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/Collection> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/Thesis> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/publishedBy> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/colorist> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/GraphicNovel> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/inker> <http://schema.org/isPartOf> <http://bib.schema.org> .
<http://schema.org/ComicSeries> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/artist> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/CoverArt> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/ComicCoverArt> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/ComicIssue> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/variantCover> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/ComicStory> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/publisherImprint> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/penciler> <http://schema.org/isPartOf>
<http://bib.schema.org> .
<http://schema.org/letterer> <http://schema.org/isPartOf>
<http://bib.schema.org> .

Received on Friday, 7 August 2015 19:08:43 UTC