Re: Schema.org considered helpful from adasal on 2011-06-17 (public-lod@w3.org from June 2011)

From: adasal <adam.saltiel@gmail.com>
Date: Fri, 17 Jun 2011 11:22:18 +0100
To: Steve Harris <steve.harris@garlik.com>
Cc: Mischa Tuffield <mmt04r@ecs.soton.ac.uk>, Harry Halpin <hhalpin@ibiblio.org>, Linked Data community <public-lod@w3.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <BANLkTikhrJgWgA0SFQTMH+cxCWoMYrQn9g@mail.gmail.com>
I noticed Steve's comment in this very civilised discussion without seeing
his details, and was going to confirm how much this reminds me of the way
CTOs and architect groups think.
Steve mentions an 'internal project', but I think there is a degree of
confusion about the nature of the domain we are discussing.
I think there are:
1. Search engines
2. LOD
3. Enhanced HTML that may expose data such as to become (semantic) LOD
4. Internal - using semantics
5. Internal that consumes/produces LOD (i.e. a bigger internal application
than implied by 2. or 3.)

2. and 3. are considered more or less in the absence of any business use
case.
But to be realistic we should be looking at 1. and 5. as business.
To understand 1. we have to sketch in likely business benefits to those
businesses - our suppliers. A notable point about 1. is those businesses
produce extra data out of the data our usage gives them, which, potentially,
could be very interesting and most certainly is not LOD. A further point is
that the manner in which they find and serve the results is up to them, and
we, as users, must evaluate what we are consuming. For instance, relevant to
semantics, search results are very fragmented and fairly arbitrary compared
to a hypothetical rational query response along the axis of knowledge. Such
a hypothetical query is entirely different in kind to serving links to web
pages (that is web pages owned by people or organisations who may want to be
searchable).
In terms of LOD or open semantic usage that is two 'marks' against search
engines: the first, that they generate data which only they are in a position
to generate and in which we have a natural interest (and which could,
theoretically, be generated by a neutral party); the second, that the results
will never be really semantically viable. But we want them to be in the
business they are in,
that of reliably returning links to pages, don't we? Promotion of a semantic
vision and supplying a semantic search engine is essentially different to
that business.
5. above is different again. This comes back to the CTO point just made by
Steve. It is very difficult to make a business case for such a project.
I tried to do so when in Serco in relation to their BusinessLink contract
with HMRC. My approach was incomplete, but here are some of the things I would
have had to consider.
(HMRC and Serco were acting on the Varney report to create a single web
presence for the government's business oriented concerns with business and
the citizen.)
My (and my colleagues') idea was, roughly, to semantically enhance the core
engine to make it into a 'semantic switch board' that would direct traffic
and exchange data according to semantic criteria, either out or onto queues.
Bear with me or skip to final point below if you prefer -:)
Some Benefits
1. Greatly simplify the site design
2. Prevent user evaporation as they are passed to another site
3. Solve single sign on issues
4. Control and balance traffic to different disparate services
5. Build more targeted, intelligible and accessible services from the solid
foundation of a semantic core
6. Possibility of creating framework that third party suppliers could plug
their services into, further automating government business transactions
...
Some Concerns
.... costs
1. Queues would have to be picked up by third party internal suppliers at
the other end - there would be appreciable cost in this
2. Some third parties had implemented their own queues, in their own format,
so multiple formats to support - who would bear the cost here
3. Reworking of core engine - expensive to do properly, and nothing short of
this would work here
.... benefits?
4. Is government in the business of facilitating business over and above
necessary transactions with business?
5. How to demonstrate that cost per transaction is reduced significantly

This final point is obviously the deal breaker. Short of being able to
demonstrate this there could be no buy in.
How could this be demonstrated?
I don't think this is a chicken-and-egg situation, in which, were there other
similar schemes, benefits could be extrapolated.
I think this is more an aligning of the planets. It is quite possible that
in five or so years this will be undertaken. The existing infrastructure
cannot last for ever, and will begin to look unwieldy soon enough. My
thoughts are of a consortium (or just concerted pressure + ideas) of small
suppliers who might themselves benefit from having a slice of this as part
of their portfolio.

From this it can be seen how difficult a propitious alignment is. It could
never come about!
Steve's point remains: there has to be a business case.

(BTW there was a back story about the funding for this that got us into the
position of making this proposal in the first place which I won't go into
here.)

Best,

Adam

On 17 June 2011 08:52, Steve Harris <steve.harris@garlik.com> wrote:

> I'm sure that some of these points were relevant at some level, but I
> suspect that's not the key reason.
>
> At some point, the team working on the internal project would have to go to
> the divisional CTO and/or CIO in charge of operations and ask permission to
> deploy the code on the production systems. They don't give a damn how
> interesting the technology is, just want to know how much it's going to cost
> in bps of bandwidth, bytes of storage, and microseconds of CPU per page. The
> answer for RDFa is probably an order of magnitude higher than the
> schema.org format, and could equate to tens of millions of dollars per
> year of extra cost, and will show little to no extra revenue (schema.org vs
> RDFa), even in the medium term. No chance.
>
> - Steve
>
> On 2011-06-17, at 01:02, Mischa Tuffield wrote:
>
> Hello,
>
> *excuse a little top-posting before comments coming inline ...
>
> Great email Harry, I agree with your sentiment that schema.org shouldn't
> be perceived as a massive threat to the SW community. If anything I
> welcome the move; surely it will widen the audience of web-developers
> interested in creating and authoring structured data for the web? A lot of
> people write code, and work for companies who are heavily reliant on
> pleasing Search Engines - SEO is big business. Let users get on with
> building stuff with microdata/schema.org, and who knows they might even
> come round to using the various W3C SW specs when they find their needs
> change, when they find they want to interoperate with data whose primary
> focus isn't for human consumption or SEO.
>
> RDF satisfies more than one use-case, it is more than a SEO tool.
> Personally, I make daily use of RDF, http, SPARQL (to name a few) within the
> software platform we have built at Garlik (note that I have been too lazy to
> use other email address) and it makes sense to us as a business, as we make
> good use of developing software without being constrained by a database
> schema in a relational database and we can pull in data arbitrarily. In
> summary, RDF via GoodRelations in RDFa has shown that the work has made an
> impact in the world of Search Engines, RDF/SPARQL is being used to power
> applications in a number of companies big and small, RDF is being outputted
> by major commercial sales houses, non-computer scientists are using it to
> represent their scientific data, governments are using it in the shape of
> linked data/SPARQL, this is all good stuff ... more than one use-case -
> fundamentally engrained with the notion of interoperability and the
> standardised representation of data (awesome stuff!).
>
> I am not trying to have a dig here about microdata or schema.org, or the
> technology stack which builds on the aforementioned, I simply don't know
> enough about it to comment. I do know that the SW technology stack is
> growing strong though, and it is an open technology stack - being an
> optimist I feel that open stuff will prevail.
>
> <snip itemtype="http://example.com/Annotation"/>
> <!-- hehe -->
> *
> *
> On 16 Jun 2011, at 22:09, Harry Halpin wrote:
>
> I've been watching the community response to schema.org for the last
> bit of time. Overall, I think we should clarify why people are upset.
> First, there should be no reason to be upset that the major search
> engines went off and created their own vocabularies. According to the
> argument of decentralized extensibility, schema.org is *exactly* what
> Google/Yahoo!/Microsoft are supposed to be doing. It's a
> straightforward site that clearly shows how the average Web developer can
> use structured data in markup to solve real-world use-cases and
> provides examples.  That's the entire vision of the Semantic Web, let
> a thousand ontologies bloom with no central control.
>
>
> Indeed, I do feel that schema.org has been very explicit about how people
> with the given use-case can use their work to solve a real-world problem.
> Many people make work out of getting their employer some awesome search
> engine love. I went to a news related metadata talk (an rNews one -
> fantastic work by the way), and chatting to people from their industry I
> noticed how important it was to them. The use-case seemed to boil down to a
> standard way to annotate news stories/documents to please search engines to
> push eyeballs their way... this is great but I am convinced it is not the
> only contribution the SW tech stack has to give to the world. I recall
> someone had stats re: numbers of webpages vs numbers of rows in databases in
> the world...
>
>
> The reason people are upset is that they didn't use RDFa, but instead
> used microdata. One *cannot* argue that Google is ignoring open
> standards. RDFa and microdata are *both* Last Call W3C Working Drafts
> now. RDFa 1.0 is a spec but only for XHTML 1.0, which is not what most
> of the Web uses. Microdata does have RDF parsing bugs, but again, most
> developers outside the Semantic Web probably don't care - they want
> JSON anyways.
>
> From what I understand from events where the Rich Snippets team has
> presented, RDFa is simply too complicated for ordinary web
> developers to use. Google has been deploying Rich Snippets for two
> years, claims to have user studies and has experience with a large
> user-base. This user-driven feedback should be taken on board by both
> relevant WGs obviously, HTML and RDFa. Designing technology without
> user-feedback leads to odd results (for proof, see many of the fun and
> exciting "httpRange-14" discussions). Which is also why many
> practical developers do not use the technology.
>
> But realistically, it's not the RDFa WG's job to do user-studies and
> build compelling user-experiences in products. They are only a few
> people. Why have the *hundreds* of people in the Semantic Web community
> not done such work?
>
>
> I think it is probably due to the fact that no one in the Semantic Web
> community runs a search engine!
>
>
> The fact of the matter is that the Semantic Web academic community has
> had their priorities skewed to the wrong direction. Had folks been
> spending time doing usability testing and focussing on user-feedback
> on common problems (such as the rather obvious "vocabulary hosting"
> problem) rather than focussing on things with little to no support
> with the world outside academia, then we probably would not be in the
> situation we are in today. Today, major companies such as Microsoft
> (oData) and Google (microdata) are jumping on the "open data"
> bandwagon but finding the RDF stack unacceptable. Some of it may be a
> "not invented here" syndrome, but as anyone who has actually looked at
> RDF/XML can tell you, some of it is hard-to-deny technical reasoning
> by companies that have decided that "open data" is a great market but
> do not agree with the technical choices made by the Semantic Web
> stack.
>
>
> Here is where I am not sure I 100% agree with you. Lots of good work has
> come out of academia, user-studies are one thing, and agreed UX hasn't been
> a forte in our community - but I don't think this was the problem. I
> personally don't imagine that schema.org was designed like it is due to
> the fact that they have noticed our community bang on about that number 14
> for so long. I think you hinted at what the real issue was above...
>
> A lot of the SW tech stack I follow has both in the past and at the present
> enjoyed tremendous academic support. For one, Garlik (where I work) has a
> core technology team from Southampton Uni, mostly from the AKT (when I was
> ickle [1] <-- lots of familiar faces in there) an EPSRC (UK funding thing)
> project which was set out to build SW tech, it worked well, and there are
> plenty of others out there to see too, I am sure.
>
> So, my disagreement goes, yes so it could be seen that none of the search
> engines have found the RDF stack acceptable (RDFa GR seems to have struck a
> good chord), but lots of other people have, i.e. not everyone is trying to
> tackle the problem of web-search. And the big search engines all have their
> priorities and none of them boil down to sharing data. Academic output
> hasn't been focused on UI and UX in the SW field, but it has led to the
> solid, open set of standards which lots and lots of people are building on
> top of - let's not forget how much XMP there is in the world. I don't think
> it is the Search Engines using their vast usability experts to design a
> standard for representing generic data, this is not their core business,
> they built something which would suit their use-case: making it easy for
> web-developers (probably with HTML/CSS/JS/UI skills) to add in metadata to
> their pages, so that the search engines can best serve their users.
>
>
> This is not to say good things can't come out of the academic
> community - the *internet* came out of the academic community. But
> seriously, at some point (think of the role of Netscape in getting the
> Web going with the magic of images) commercial companies enter the
> game. We should be happy now search engines are seeing value in
> structured data on the Web.
>
>
> Yes, and trust that our technology stack is built on solid foundations, has
> a great vision, and is being built by lots of lovely people, and companies
> have been involved for a while (he says...)
>
>
> I would suggest the Semantic Web community take on-board the
> "microdata" challenge in a few different ways. First of all, start
> focussing on user-studies and user experience (not just visual
> interfaces, the Semantic Web has more than its share of user-hostile
> visual interfaces). It's harder to publish academic papers on these
> topics but possible (see SIGCHI), and would help a lot with actual
> deployment. Second, we should start focussing more on actual empirical
> data-driven feedback, both on what parts of RDF are being used and
> common mistakes. With indexes such as the Billion Triple Challenge and
> Sindice's index, we can actually do that with the Semantic Web. Third,
> why not actually try to get RDF - or "open data" more broadly - into the
> browser in a usable manner? Tabulator may be a step in the right
> direction, but the user experience needs work. Fourth, why not start a
> company and try to deliver products to actual end-users and give that
> feedback to the wider community and W3C WGs (and if you already work
> for an actual SemWeb company, please send your feedback from user
> studies to the WG before Last Call)? I believe the Semantic Web
> research community - which still has tons of funding and lots of
> passion - can make the Web better.
>
>
> Use-cases are the key, and I am sure there are plenty of them kicking about
> as otherwise there wouldn't be so many people working so hard to ensure we
> have this open-technology stack in place.
>
> Indeed Harry, you are making the Web better, I know it, good on you! But so
> is the rest of the SW community; if anything I have enjoyed seeing how
> passionate people are about open standards.
>
> Good night all,
>
> Mischa
>
> P.S. All views posted here are of my own personal opinion.
>
> [1] http://www.aktors.org/people/students/
>
>
>
> Schema.org is not a threat. It's an opportunity to step up. Good luck
> everyone!
>
>           cheers,
>              harry
>
> P.S.: Note these opinions are purely personal and held as an individual.
>
>
>
> --
> Steve Harris, CTO, Garlik Limited
> 1-3 Halford Road, Richmond, TW10 6AW, UK
> +44 20 8439 8203  http://www.garlik.com/
> Registered in England and Wales 535 7233 VAT # 849 0517 11
> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
>
>
Received on Friday, 17 June 2011 10:22:49 UTC