Re: The Complexity Argument from Manu Sporny on 2009-09-22 (public-html@w3.org from September 2009)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Tue, 22 Sep 2009 01:53:08 -0400
CC: HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <4AB86644.5090805@digitalbazaar.com>

Ian Hickson wrote:
> On Fri, 18 Sep 2009, Manu Sporny wrote:
>> RDFa is more complex than Microformats and Microdata. It is more complex
>> because the set of use cases are more complex. Follow-your-nose,
>> vocabulary validation, data typing, and inferencing are just a few of
>> the design goals for RDFa, based on the requirements in the use cases.
> 
> What are these use cases? I took into account every use case that was 
> mentioned anywhere I could find, for microdata, and microdata handles 
> every one of those that RDFa handles -- the only ones I didn't solve are 
> also not solved by RDFa.

I'm going to pick one and be terse because I don't have the time to make
an exhaustive list right now. Microdata doesn't support this use case:

http://rdfa.info/wiki/rdfa-use-cases#Publishing_an_RDF_Vocabulary_and_Validating_Usage

Specifically, it doesn't support datatyping... you can't do something
like this in Microdata:

<span xmlns:measure="http://example.org/measure#"
      about="#patient" property="measure:weight"
      datatype="measure:kilograms">72</span>

which would generate this triple:

<#patient> measure:weight "72"^^measure:kilograms
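
To make that mapping concrete, here is a minimal Python sketch of how a
consumer might turn the markup above into a typed literal. It is not a
conforming RDFa processor: it ignores most of the RDFa processing rules
(chaining, @content, @rel, nested elements), and the names
TypedLiteralExtractor and extract_triples are made up for illustration.

```python
from html.parser import HTMLParser


class TypedLiteralExtractor(HTMLParser):
    """Collects (subject, predicate, literal, datatype) tuples.

    Illustrative only: handles a single, non-nested element carrying
    about/property/datatype attributes, nothing more.
    """

    def __init__(self):
        super().__init__()
        self.prefixes = {}    # prefix -> namespace IRI, from xmlns:* attributes
        self.pending = None   # attributes of the element currently being read
        self.text = []        # character data seen inside that element
        self.triples = []

    def expand(self, curie):
        # Expand "prefix:local" to a full IRI; leave unknown prefixes untouched.
        prefix, _, local = curie.partition(":")
        namespace = self.prefixes.get(prefix)
        return namespace + local if namespace else curie

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value
        if "property" in attrs:
            self.pending = attrs
            self.text = []

    def handle_data(self, data):
        if self.pending is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.pending is None:
            return
        a = self.pending
        datatype = self.expand(a["datatype"]) if a.get("datatype") else None
        self.triples.append(
            (a.get("about", ""), self.expand(a["property"]),
             "".join(self.text), datatype)
        )
        self.pending = None


def extract_triples(markup):
    parser = TypedLiteralExtractor()
    parser.feed(markup)
    parser.close()
    return parser.triples
```

Fed the span above, extract_triples returns one tuple whose last element
is the expanded datatype IRI, http://example.org/measure#kilograms. That
last element is the piece of information that has no Microdata equivalent.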

There are two important pieces of functionality here:

1. The author can specify a datatype for the object literal because they
   want to be very specific about the data. This is important in most
   scientific fields, such as medicine. As an anesthesiologist, am I
   setting up the operation for a 72-pound patient or a 159-pound
   (72-kilogram) patient?
2. Tools can verify that the data is valid by validating it
   against a vocabulary specification. For example, if the vocabulary
   restricts all "measure:weight" values to be in "measure:pounds",
   then the validator can report vocabulary usage errors to
   authors who don't provide a datatype for the measure as well
   as to those who use the wrong datatype for the measure. Applications
   consuming the data may also flag the data as erroneous, since it
   doesn't validate against the vocabulary specification.
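
The check described in point 2 can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the vocabulary
rule is modeled as a plain dictionary, triples are
(subject, predicate, value, datatype) tuples with datatype set to None
when the author gave none, and the names REQUIRED_DATATYPE and
check_usage are hypothetical rather than part of any RDFa tool.

```python
MEASURE = "http://example.org/measure#"

# Hypothetical vocabulary rule: every measure:weight value must be
# typed as measure:pounds.
REQUIRED_DATATYPE = {MEASURE + "weight": MEASURE + "pounds"}


def check_usage(triples, required=REQUIRED_DATATYPE):
    """Return a list of vocabulary usage errors for the given triples."""
    errors = []
    for subject, predicate, value, datatype in triples:
        expected = required.get(predicate)
        if expected is None:
            continue  # the vocabulary places no restriction on this property
        if datatype is None:
            errors.append(
                f"{subject}: {predicate} value {value!r} has no datatype; "
                f"expected {expected}"
            )
        elif datatype != expected:
            errors.append(
                f"{subject}: {predicate} value {value!r} is typed {datatype}; "
                f"expected {expected}"
            )
    return errors
```

Under that rule, a weight typed as measure:kilograms and a weight with
no datatype at all would each produce one error, while a weight typed as
measure:pounds would pass.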

Unless I'm missing something, Microdata doesn't support #1 or #2. So,
there are use cases that are solved by RDFa that are not solved by
Microdata. Did I make a mistake in understanding datatyping in Microdata?

>> It's fairly clear that RDFa is more complex than Microformats and
>> Microdata, and I would say that is true because it solves a larger set
>> of problems.
> 
> What problems? Could you list the concrete user problems that RDFa 
> addresses that make it more complex, and which microdata doesn't address?

I've outlined one such problem above, but there are others.

>> To look at this another way, one could claim that HTML5, Javascript,
>> canvas, or SVG is too complex for regular web authors.
> 
> Yeah, they are. We've spent a huge part of the effort on HTML5 trying to 
> simplify as much as we could while still being compatible with the 
> trillion or so deployed HTML pages. If you have any suggestions for ways 
> we could further simplify the language, please let me know or file a bug.

I'm going to focus on writing specifications and proposing them to this
working group, as that seems to be the most effective way of getting
concrete ideas into the HTML5 specification. There are a number of
things in HTML5 that I disagree with, and the best way to correct those
is to provide alternate spec text and have the HTML WG form consensus
around the best path forward.

>> If you think RDFa is too complex, please propose an alternative or
>> propose alternatives to the way RDFa works that are backwards compatible
>> with XHTML+RDFa 1.0. We have some fairly large field deployments of RDFa
>> and the number is growing, not shrinking.
> 
> What is your biggest deployment, in terms of volume of content processed 
> or number of users affected?

I honestly don't know exact numbers, as the data changes daily and I
tend to not focus too much on keeping score. Here's what I know: if by
"biggest" you mean "biggest company", then Google, Yahoo!, or the UK
Government. If by "biggest" you mean "content processed"... I don't know
if Google or Yahoo! are processing RDFa from the pages that they're
indexing yet, and they probably wouldn't tell us if they were. If by
"biggest" you mean "amount of content on a single website that is marked
up as RDFa", then Wikipedia (via DBpedia) has 274,000,000+ triples
generated via RDFa. If by "biggest" you mean "number of users affected
by a single software application", then that will probably be Drupal 7
(which is currently in code freeze and has RDFa support built in) -- it
will affect 200,000+ websites in the next 9 months.

Why do you ask?

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: The Pirate Bay and Building an Equitable Culture
http://blog.digitalbazaar.com/2009/08/30/equitable-culture/
Received on Tuesday, 22 September 2009 05:53:52 UTC