a reply to many posts from Nathan on 2011-06-17 (semantic-web@w3.org from June 2011)

From: Nathan <nathan@webr3.org>
Date: Fri, 17 Jun 2011 21:17:56 +0100
To: Semantic Web <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>
Message-ID: <4DFBB674.7070308@webr3.org>
I have been AFK/ill for a while, and after catching up today I've noticed
there's been quite a bit going on over the last month, with lots of nice
meaty posts to the list. It seems like a good chance to embrace some things
and see what can be done moving forwards, so here are a few questions and
comments:

schema.org / microdata:

RDFa is to RDF as Microdata is to [____]?

It seems conceivable that people may start using these common schemas
from schema.org as the data schemas behind the public interfaces of data
stores; it would be nice to have a framework in place to save people a
few years of repeating the same code and work.

If we extract the microdata from an HTML document, what is the data
model? How does one save it? How does one transfer it out of context as
raw data?
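One possible answer to the data-model question, offered as a sketch rather than a storage format anyone has agreed on: the HTML microdata specification defines a JSON conversion in which each item has an optional list of types, an optional global identifier, and a map from property names to lists of values (strings or nested items). The `Item` class and `serialize` function below are hypothetical names that reproduce that shape; URL values are left unresolved for simplicity.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Item:
    """A microdata item: its types, optional global id, and multi-valued properties."""
    types: list[str] = field(default_factory=list)
    id: Optional[str] = None
    # An itemprop may repeat, so every property maps to a list of values.
    properties: dict[str, list[Union[str, Item]]] = field(default_factory=dict)

    def add(self, name: str, value: Union[str, Item]) -> None:
        self.properties.setdefault(name, []).append(value)

    def to_json(self) -> dict:
        """Return a dict shaped like the HTML spec's microdata JSON conversion."""
        out: dict = {}
        if self.types:
            out["type"] = list(self.types)
        if self.id is not None:
            out["id"] = self.id
        out["properties"] = {
            name: [v.to_json() if isinstance(v, Item) else v for v in values]
            for name, values in self.properties.items()
        }
        return out


def serialize(items: list[Item]) -> str:
    """Serialize top-level items as raw data, independent of any HTML page."""
    return json.dumps({"items": [item.to_json() for item in items]}, indent=2)


# The Bob Smith example from this mail, as data.
bob = Item(types=["http://schema.org/Person"])
bob.add("url", "bob.html")
print(serialize([bob]))
```

Once the items are plain JSON like this, saving them or moving them out of the page is ordinary data handling.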

OWL/RDFS is to RDF as [___] is to Microdata?

OWL/RDFS for microdata? How does one validate data, for example
domains, ranges and enumerated values?
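A minimal sketch of what RDFS-style checks could look like over items in the JSON-like shape the HTML microdata spec uses (a `type` list plus a `properties` map of value lists). The `PropertySpec` class, the `validate` function and the tiny `SCHEMA` are hypothetical: the constraints are made up for illustration, are not schema.org's actual definitions, and use bare local type names instead of full URLs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertySpec:
    """Hypothetical per-property constraints, loosely modelled on RDFS."""
    domain: frozenset[str]                  # types allowed to carry the property
    range: Optional[frozenset[str]] = None  # types allowed for nested item values
    enum: Optional[frozenset[str]] = None   # allowed literal values, if enumerated


# Illustrative only; NOT the real schema.org definitions.
SCHEMA = {
    "genre": PropertySpec(domain=frozenset({"Movie"})),
    "director": PropertySpec(domain=frozenset({"Movie"}), range=frozenset({"Person"})),
    "rating": PropertySpec(domain=frozenset({"Movie"}), enum=frozenset({"G", "PG", "R"})),
}


def validate(item: dict, schema: dict[str, PropertySpec] = SCHEMA) -> list[str]:
    """Return human-readable problems; an empty list means the item passed."""
    errors: list[str] = []
    types = set(item.get("type", []))
    for name, values in item.get("properties", {}).items():
        spec = schema.get(name)
        if spec is None:
            errors.append(f"unknown property {name!r}")
            continue
        if not types & spec.domain:
            errors.append(f"{name!r} not allowed on types {sorted(types)}")
        for value in values:
            if isinstance(value, dict):  # nested item: check range, then recurse
                if spec.range is None or not set(value.get("type", [])) & spec.range:
                    errors.append(f"{name!r} does not accept this nested item type")
                errors.extend(validate(value, schema))
            else:  # literal value: check range and enumeration
                if spec.range is not None:
                    errors.append(f"{name!r} expects an item, got {value!r}")
                if spec.enum is not None and value not in spec.enum:
                    errors.append(f"{name!r} value {value!r} not in {sorted(spec.enum)}")
    return errors


movie = {
    "type": ["Movie"],
    "properties": {
        "genre": ["Science fiction"],
        "rating": ["PG-13"],
        "director": [{"type": ["Person"], "properties": {}}],
    },
}
print(validate(movie))
```

This treats domain and range as closed-world constraints, which is how most programmers expect validation to behave; in RDFS proper they license inferences about types rather than rejecting data.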

Microdata is RDF merged with RDFa, cut down to size and given the
classes-and-objects style found in many programming languages; this needs
to be webized and abstracted out of HTML. Somewhat complementary to
Harry's comments, there's more to be gleaned from this than surface
syntax and SEO needs. To me at least, the most interesting details are in
how the schemas are constructed, and there are many things to be noted
and considered here: it's a common approach that most programmers will be
familiar with, and, as above, it needs to be webized.

Microdata @itemref: doesn't this look like a blank node in a surface syntax?
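To make the blank-node comparison concrete, here is a minimal sketch that turns microdata items (in the spec's JSON-like shape: `type`, optional `id`, `properties`) into N-Triples. An item with an `id` gets that IRI as its subject; an item without one becomes a blank node (`_:b0`, `_:b1`, ...). The vocabulary rule (strip the last path segment of the first type, so `http://schema.org/Person` yields `http://schema.org/`) is an assumed simplification that suits schema.org, not the full W3C Microdata to RDF mapping; all non-item values become plain literals, and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import count
from typing import Iterator, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def vocab_base(type_uri: str) -> str:
    """Assumed rule: http://schema.org/Person -> http://schema.org/."""
    return type_uri.rsplit("/", 1)[0] + "/"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal (backslashes, quotes, line breaks escaped)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def item_triples(item: dict, out: list[str], blank_ids: Iterator[int],
                 vocab: Optional[str] = None) -> str:
    """Append triples for item (and nested items) to out; return its subject term."""
    subject = f"<{item['id']}>" if "id" in item else f"_:b{next(blank_ids)}"
    types = item.get("type", [])
    if types:
        # Typed items set their own vocabulary; untyped nested items keep the parent's.
        vocab = vocab_base(types[0])
        for t in types:
            out.append(f"{subject} <{RDF_TYPE}> <{t}> .")
    for name, values in item.get("properties", {}).items():
        if ":" in name:            # already an absolute URL
            predicate = name
        elif vocab is not None:
            predicate = vocab + name
        else:                      # no vocabulary to resolve against: skipped in this sketch
            continue
        for value in values:
            if isinstance(value, dict):
                obj = item_triples(value, out, blank_ids, vocab)
            else:
                obj = literal(value)
            out.append(f"{subject} <{predicate}> {obj} .")
    return subject


def to_ntriples(items: list[dict]) -> str:
    out: list[str] = []
    blank_ids = count()
    for item in items:
        item_triples(item, out, blank_ids)
    return "\n".join(out)


bob = {"type": ["http://schema.org/Person"], "properties": {"name": ["Bob Smith"]}}
print(to_ntriples([bob]))
```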

@itemid and @id will probably lead to confusion. If RDFa and microdata
were both to leverage @id (as HTML and JavaScript already do), then we'd
have a low-cost, highly unifying approach that would suit the ever so
common fragment-based pattern of describing many things on one page.
Consistency would be good in this department.
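A sketch of the kind of unification rule suggested here. It is purely hypothetical (neither microdata nor RDFa defines it), and `item_identifier` is an illustrative name: use @itemid when present, otherwise mint an identifier from the page URL plus the element's @id as a fragment, and otherwise fall back to a blank node (represented as `None`).

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin


def item_identifier(page_url: str, itemid: Optional[str] = None,
                    html_id: Optional[str] = None) -> Optional[str]:
    """Hypothetical rule: @itemid, else page URL + '#' + @id, else None (blank node)."""
    if itemid:
        return urljoin(page_url, itemid)       # resolve a relative @itemid against the page
    if html_id:
        base, _fragment = urldefrag(page_url)  # drop any fragment already on the page URL
        return f"{base}#{html_id}"
    return None


print(item_identifier("http://example.org/post", html_id="comment-7"))
```

With a rule like this, a `#comment-7` link, a `getElementById("comment-7")` call and the data identifier would all point at the same thing.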

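@itemprop has a clever algorithm, but it will probably lead to unexpected
behaviour because it's inconsistent: the value is taken from the text
content for some elements and from an attribute for others, e.g.: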

   <span itemprop="genre">Science fiction</span>
   <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>

Lots of wasted data; consider "Bob Smith" in the following, where the
value of url is the href and the link text is dropped:

   <div itemscope itemtype="http://schema.org/Person">
     <a href="bob.html" itemprop="url">Bob Smith</a>
   </div>

The same goes for alt and title, useful data that isn't utilized:

   <div itemscope>
    <img itemprop="image" src="google-logo.png" alt="Google">
   </div>
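The rule behind these examples, as the HTML microdata spec defines it, takes the value from a different place depending on the element: a handful of elements use an attribute, everything else uses its text. The sketch below (a hypothetical `property_value` function that covers only elements without @itemscope and does not resolve URLs to absolute form) reproduces that table and applies it to the snippets above.

```python
from typing import Mapping

# Elements whose property value comes from an attribute; all others use text content.
ATTRIBUTE_FOR = {
    **dict.fromkeys(("audio", "embed", "iframe", "img", "source", "track", "video"), "src"),
    **dict.fromkeys(("a", "area", "link"), "href"),
    "object": "data",
    "meta": "content",
    "data": "value",
    "meter": "value",
}


def property_value(tag: str, attrs: Mapping[str, str], text: str) -> str:
    """Return the microdata value of a non-itemscope element carrying @itemprop."""
    tag = tag.lower()
    if tag == "time":
        return attrs.get("datetime", text)        # falls back to the element's text
    if tag in ATTRIBUTE_FOR:
        return attrs.get(ATTRIBUTE_FOR[tag], "")  # missing attribute -> empty string
    return text


snippets = [
    ("span", {"itemprop": "genre"}, "Science fiction"),
    ("a", {"href": "../movies/avatar-theatrical-trailer.html", "itemprop": "trailer"}, "Trailer"),
    ("a", {"href": "bob.html", "itemprop": "url"}, "Bob Smith"),
    ("img", {"itemprop": "image", "src": "google-logo.png", "alt": "Google"}, ""),
]
for tag, attrs, text in snippets:
    print(attrs["itemprop"], "->", property_value(tag, attrs, text))
```

Running it yields the text for genre but the href for trailer and url, and the src for image: the link text "Bob Smith" and the alt text "Google" never make it into the data.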

range-14:

The "ambiguous 'like'" and the "comments on a post" are two very good
cases to focus on. In the first case, the question of "what is being
liked?" is very interesting; some sites have already taken a far more
fine-grained approach to this. If you look at sites like reddit, with
karma for both posts and comments, you can see a clear need, and it's a
good use case to focus on. Further, if one were to consider an
HTTP-friendly read/write API for those comments, then I'd argue it
quickly becomes clear that each comment would potentially need two
identifiers: a hash one for microdata and for linking to the comment in
context, and another for CRUDing the comment and linking to it out of
context. There must be an easy-to-create pattern between something like
/post#comment-id and /comment/id.
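One possible shape for that pattern, as a sketch under assumed conventions: the URL layout (`/post/42#comment-7` in context, `/comment/7` out of context), the `comment-` fragment prefix and the function names are all hypothetical. Going from the comment back into context needs the owning post, which a real store would keep alongside the comment.

```python
from urllib.parse import urlsplit, urlunsplit

FRAGMENT_PREFIX = "comment-"  # hypothetical fragment naming convention


def out_of_context(in_context_uri: str) -> str:
    """http://host/post/42#comment-7 -> http://host/comment/7 (the CRUD resource)."""
    parts = urlsplit(in_context_uri)
    if not parts.fragment.startswith(FRAGMENT_PREFIX):
        raise ValueError(f"not a comment fragment: {in_context_uri}")
    comment_id = parts.fragment[len(FRAGMENT_PREFIX):]
    return urlunsplit((parts.scheme, parts.netloc, f"/comment/{comment_id}", "", ""))


def in_context(out_of_context_uri: str, post_path: str) -> str:
    """http://host/comment/7 plus /post/42 -> http://host/post/42#comment-7."""
    parts = urlsplit(out_of_context_uri)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2 or segments[0] != "comment":
        raise ValueError(f"not a comment resource: {out_of_context_uri}")
    return urlunsplit((parts.scheme, parts.netloc, post_path, "", FRAGMENT_PREFIX + segments[1]))


uri = "http://example.org/post/42#comment-7"
crud = out_of_context(uri)
print(crud)
print(in_context(crud, "/post/42"))
```

The hash URI stays a pointer into the page (what microdata would use), while the slash URI is a resource of its own that can be GET, PUT or DELETEd.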

The old arguments are well played out, and multiple approaches are
available (see the ISSUE-57 document from JAR), but the two practical use
cases above will probably give the most long-term benefit when addressed.

scruffiness and diversity:

David (Wood) recently mentioned "Neat vs Scruffy", which was a great
point; Kingsley has long encouraged us to embrace the diversity of the
web and look to translate from one format to another; and, as Tim says,
the web is an open platform that anybody can build on top of. Thus we
have an architecture that encourages diversity and innovation. Again, it
appears to me that with schema.org and microdata we're being pointed in
the direction people want to go, and this raises many (potentially easy
to address) questions, as I've listed above. Yes, we can look to push RDF
more, convert microdata to RDF, and try to get RDF and OWL up there to
understand schema.org; there's no harm in that. But, as I've mentioned
before, some form of (simple) universal data could easily be designed on
the back of this, and RDF, to offer something practical to the web
masses that anybody can use and build on top of, saving lots of work
around the globe.

Quite some time ago the RDFa/microdata issue was noted, and it was clear
then that the two needed to be merged before too much weight went behind
microdata and legacy issues made it hard to change. That time is past
now. I'm aware that some efforts may happen in this department, but
really I'd be looking to see what lessons can be learned for RDF here,
before the same thing happens there too. There's been a huge investment
in the sem web stack, and a large corporate trio could easily rip the
ground out from underneath this relatively small community. It may not
happen, but there's a risk of it, and there does appear to be a strong
long-term message of "RDF is too complicated, we like to do things like
this [x]". The same message can be found in microdata, and ignoring it
could be risky.

Apologies for being a bit quiet of late, and for the scruffiness of this
mail; I've not been too well and am trying to slowly get back into things
at the minute. Hope to be back on form soon.

best, nathan
Received on Friday, 17 June 2011 20:18:56 UTC