Schema.org considered helpful from Harry Halpin on 2011-06-16 (public-lod@w3.org from June 2011)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Thu, 16 Jun 2011 17:09:18 -0400
To: Linked Data community <public-lod@w3.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <BANLkTi=dzobzm0FsfwkHD2nn9cfYf3xw-Q@mail.gmail.com>

I've been watching the community response to schema.org for a while
now. Overall, I think we should clarify why people are upset.
First, there should be no reason to be upset that the major search
engines went off and created their own vocabularies. According to the
argument of decentralized extensibility, schema.org is *exactly* what
Google/Yahoo!/Microsoft are supposed to be doing. It's a
straightforward site that clearly shows how the average Web developer
can use structured data in markup to solve real-world use-cases, and
it provides examples. That's the entire vision of the Semantic Web:
let a thousand ontologies bloom with no central control.

The reason people are upset is that the search engines didn't use
RDFa, but instead used microdata. One *cannot* argue that Google is
ignoring open standards. RDFa and microdata are *both* Last Call W3C
Working Drafts now. RDFa 1.0 is a spec, but only for XHTML 1.0, which
is not what most of the Web uses. Microdata does have RDF parsing
bugs, but again, most developers outside the Semantic Web probably
don't care - they want JSON anyways.

From what I understand from events where the Rich Snippets team has
presented, RDFa is simply too complicated for ordinary web
developers to use. Google has been deploying Rich Snippets for two
years, claims to have user studies, and has experience with a large
user-base. This user-driven feedback should obviously be taken on
board by both relevant WGs, HTML and RDFa. Designing technology
without user feedback leads to odd results (for proof, see many of
the fun and exciting "httpRange-14" discussions), which is also why
many practical developers do not use the technology.

But realistically, it's not the RDFa WG's job to do user-studies and
build compelling user-experiences in products. They are only a few
people. Why have the *hundreds* of people in the Semantic Web community
not done such work?

The fact of the matter is that the Semantic Web academic community has
had its priorities skewed in the wrong direction. Had folks been
spending time doing usability testing and focussing on user-feedback
on common problems (such as the rather obvious "vocabulary hosting"
problem) rather than focussing on things with little to no support
in the world outside academia, then we probably would not be in the
situation we are in today. Today, major companies such as Microsoft
(OData) and Google (microdata) are jumping on the "open data"
bandwagon but finding the RDF stack unacceptable. Some of it may be a
"not invented here" syndrome, but as anyone who has actually looked at
RDF/XML can tell you, some of it is hard-to-deny technical reasoning
by companies that have decided that "open data" is a great market but
do not agree with the technical choices made by the Semantic Web
stack.

This is not to say good things can't come out of the academic
community - the *internet* came out of the academic community. But
seriously, at some point (think of the role of Netscape in getting the
Web going with the magic of images) commercial companies enter the
game. We should be happy that search engines are now seeing value in
structured data on the Web.

I would suggest the Semantic Web community take on board the
"microdata" challenge in four different ways. First of all, start
focussing on user-studies and user experience (not just visual
interfaces, the Semantic Web has more than its share of user-hostile
visual interfaces). It's harder to publish academic papers on these
topics but possible (see SIGCHI), and would help a lot with actual
deployment. Second, we should start focussing more on actual empirical
data-driven feedback, both on what parts of RDF are being used and
common mistakes. With indexes such as the Billion Triple Challenge and
Sindice's index, we can actually do that with the Semantic Web. Third,
why not actually try to get RDF - or "open data" more broadly - into
the browser in a usable manner? Tabulator may be a step in the right
direction, but the user experience needs work. Fourth, why not start a
company and try to deliver products to actual end-users and give that
feedback to the wider community and W3C WGs (and if you already work
for an actual SemWeb company, please send your feedback from user
studies to the WG before Last Call)? I believe the Semantic Web
research community - which still has tons of funding and lots of
passion - can make the Web better.

Schema.org is not a threat. It's an opportunity to step up. Good luck everyone!

           cheers,
              harry

P.S.: Note that these opinions are purely personal and held as an individual.
Received on Thursday, 16 June 2011 21:09:46 UTC