Re: ISSUE-1: Status of RDFa Profiles from Mark Birbeck on 2010-03-14 (public-rdfa-wg@w3.org from March 2010)

From: Mark Birbeck <mark.birbeck@webbackplane.com>
Date: Sun, 14 Mar 2010 11:19:34 +0000
To: Ivan Herman <ivan@w3.org>
Cc: Ben Adida <ben@adida.net>, Manu Sporny <msporny@digitalbazaar.com>, RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <640dd5061003140419p2624c1beidd3ad47a03014ade@mail.gmail.com>

Hi Ivan,

I think we might be getting to some pretty fundamental discussions
here, about the future of the semantic web and who our audience should
be. (Which is good. :))


> On 2010-3-12 11:46 , Mark Birbeck wrote:
>> Hi Ben,
>>
> [skip]
>>
>> In fact, I believe that we would have failed if we have not (a)
>> provided them with a way to write things simply, like @rel="knows",
>> and (b) ensured that they *never* need to use another prefix again
>> (unless they want to).
>>
>
> While I fully agree with (a), I actually disagree with (b).
>
> My feeling is that we are forgetting about one of the major advantages
> of RDFa over, say, microdata (and whether we like it or not, I consider
> microdata as being here to stay). And that is that RDFa scales gracefully
> over vocabularies while microdata (or microformats) do not. And that is
> due to the prefix mechanism.

RDFa scales because it took into account URIs -- the building-block of
the semantic web.

How you reflect those URIs is another thing entirely. You could use a
single token ("name"), integers ("7344589"), or a prefix mechanism
("foaf:name"); whatever you use, as long as it maps to a URI, it
scales.
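
As a rough illustration of that point, here is a minimal Python sketch in
which a bare token, an integer and a prefixed name all resolve to the same
URI. The lookup tables and the resolve() function are made up for this
example and are not part of any RDFa draft; only the FOAF namespace URI is
real.

```python
# Illustrative only: these tables are assumptions invented for the example.
FOAF = "http://xmlns.com/foaf/0.1/"

PREFIXES = {"foaf": FOAF}             # prefix -> base URI
TOKENS = {"name": FOAF + "name"}      # bare token -> full URI
NUMBERS = {"7344589": FOAF + "name"}  # opaque integer -> full URI


def resolve(term: str) -> str:
    """Map an authored term to a full URI, whichever style it uses."""
    if ":" in term:
        prefix, local = term.split(":", 1)
        if prefix in PREFIXES:
            # Prefix mechanism: append the local part to the base URI.
            return PREFIXES[prefix] + local
    for table in (TOKENS, NUMBERS):
        if term in table:
            return table[term]
    raise KeyError(f"no mapping for {term!r}")


for term in ("name", "7344589", "foaf:name"):
    print(term, "->", resolve(term))
```

Whatever the surface syntax, the consumer only ever sees the URI, which is
why the choice of syntax does not affect how well the approach scales.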

I realise people might say "GRDDL", but the problem is that
Microformats were not designed to map -- that was a layer of
interpretation applied by others. This means that when parsing you can
never be certain of the author's intent, so you could never be
definitive in your mapping to URIs.

And I realise you might also say, that


PREFIXES

Prefixes are a way of saying 'here is a vocabulary and you can use any
term from it, simply by appending the term to a base URI'.

However, it has become clear to me in my ontology work (and I'm not
alone in this), that what you invariably end up doing is bringing
together a collection of terms for a particular purpose -- terms that
come from different vocabularies.

I'll give you two examples, one from my own work, and one being
Google's vocabularies.


ARGOTS

A year or so ago I was commissioned to create an ontology for
government consultations. After looking at it for a while, I concluded
that a consultation is effectively a foaf:Document, with a dc:type of
'consultation':

  _:a a foaf:Document .
  _:a dc:type xyz:Consultation .

I was off and running, and so far I had only had to invent one term --
xyz:Consultation.

However, whilst I hadn't had to invent many terms, I did need some way
to say to people that they should use a combination of
'foaf:Document', 'dc:type' and 'xyz:Consultation'. I came up with the
idea of an 'argot', which is a combination of terms from various
vocabularies, and defined the ontology accordingly [1]. (You'll see
that whilst there are 15 or so terms, only one of them was created by
me -- the rest come from FOAF and Dublin Core.)

Interestingly enough, I then heard that Dublin Core had been thinking
along the same lines, creating something they called 'application
profiles' [2].


GOOGLE

Now, consider Google. They want to create an easy way for people to
mark up documents that contain information about products for sale --
a situation much like the consultation example.

However, they don't want people to have to start their documents with
a whole bunch of namespace declarations, so they decide not to simply
use 'as is' the various vocabularies that are around, and instead
create all of the terms they need in their own, single, namespace.

Rightly or wrongly (recall that they got a lot of flak for their
decision), they are trying to make it easy for authors, but we now
have a tension between language design and ease of authoring.

I believe that modern vocabulary design should really involve the
creation of argots, rather than the creation of brand new
vocabularies, and ease of authoring requires a single namespace.

So to combine the two (an argot that uses terms from many
vocabularies, and simple authoring) we simply remove the prefixes and
go to tokens.
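
A minimal sketch of what that could look like for the consultation example,
in Python. This is illustrative only: the xyz namespace URI is a placeholder,
the choice of the Dublin Core terms namespace for 'dc' is an assumption (the
message does not say which DC namespace is meant), and the function names are
invented here.

```python
# Hypothetical argot: one flat set of tokens drawn from several vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/terms/"   # assumption: 'dc' means DCMI terms
XYZ = "http://example.org/xyz#"    # placeholder for the 'xyz' vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

CONSULTATION_ARGOT = {
    "Document": FOAF + "Document",
    "type": DC + "type",
    "Consultation": XYZ + "Consultation",
}


def expand(token, argot):
    """Look up an unprefixed token in the argot and return its URI."""
    return argot[token]


def consultation_triples(subject, argot):
    """Build the two triples from the example using tokens only."""
    return [
        (subject, RDF_TYPE, expand("Document", argot)),
        (subject, expand("type", argot), expand("Consultation", argot)),
    ]


for triple in consultation_triples("_:a", CONSULTATION_ARGOT):
    print(triple)
```

The author writes three plain tokens, yet the resulting triples still use
URIs from three different namespaces.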


> Ie, I would not go out of my way to hide
> this or to relegate it to some sort of RDF geek/specialist corner of the
> RDFa community.

I'm a little surprised by this...you don't think that the RDF
community is under threat, do you? :)

All we're talking about here is prioritising the end-user.

The key driver for RDFa has always been to make it easy for authors to
add semantics to their documents in a way that the RDF community can
benefit from. We shouldn't lose sight of that. The RDF community is
the beneficiary of RDFa, but it is not the target audience of RDFa.

True, we can also argue that RDFa is more convenient than RDF/XML,
alongside many other benefits, and so is of interest directly to the
RDF community. But the key thing remains that if we make it painless
to publish RDF, then we can get the rest of the world to publish
semantics, and we can then do clever stuff with the output.


> As I said, the keywords mechanism is fine and essential. Of course, CC
> can provide a profile document if they wish so, and people will use it;
> it is not very complicated because the CC vocabulary is relatively
> small. For FOAF I begin to doubt; and if I look at the bibliography
> ontology, or the music ontology, I just do not believe that it is
> realistic to expect that anybody would come up with a keyword vocabulary
> for those, ie, a vocabulary file that would list each individual term
> in those URIs to avoid using bibo: or mo: or foaf:.

You've lost me...how is the token "name" any more difficult to come up
with than "foaf:name"?

You might be saying that we can't be sure that "name" doesn't exist in
two different profiles, but that's just a case of us agreeing on how
to decide which wins. Microformats never had such a mechanism, which
is what made it difficult. I've already proposed a few techniques, and
no doubt others will suggest further ones.

But this kind of namespacing has been working in Java and C++ for
years; you import a library, and then you can use it unprefixed
throughout your code. If two terms conflict, you have to sort it out.
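
To show that "deciding which wins" can be a small, explicit rule, here is a
hedged Python sketch of merging token sets from several profiles. The three
policies are examples chosen for illustration, not the techniques proposed on
the list, and the profile contents (including the example.org URI) are
hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical profiles: each is just a token -> URI map.
PEOPLE = {"name": FOAF + "name", "knows": FOAF + "knows"}
PRODUCTS = {"name": "http://example.org/products#name"}


def merge_profiles(profiles, on_conflict="error"):
    """Merge token maps in order, applying one agreed conflict rule.

    "error": refuse ambiguous tokens, so the author must sort it out
    "first": the earliest profile to define a token wins
    "last":  the latest profile to define a token wins
    """
    if on_conflict not in ("error", "first", "last"):
        raise ValueError(f"unknown policy {on_conflict!r}")
    merged = {}
    for profile in profiles:
        for token, uri in profile.items():
            if token in merged and merged[token] != uri:
                if on_conflict == "error":
                    raise ValueError(
                        f"token {token!r} maps to both {merged[token]} and {uri}"
                    )
                if on_conflict == "first":
                    continue  # keep the earlier definition
            merged[token] = uri
    return merged


print(merge_profiles([PEOPLE, PRODUCTS], on_conflict="first")["name"])
print(merge_profiles([PEOPLE, PRODUCTS], on_conflict="last")["name"])
```

The "error" policy is the closest analogue to an ambiguous import in Java or
C++: nothing is guessed, and the author has to disambiguate, for instance by
falling back to a prefix.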


> It is also
> unrealistic to expect an individual author to take the time and energy
> to form a separate vocabulary file for, say, the subset of bibo he/she
> would use locally.

This is a crucial point, I think.

First, I don't know why you think that prefixes would be removed in my
proposal; tokens and prefixes co-exist. So they can still use "bibo:"
if they want.

But more importantly, people don't mark up documents in a vacuum.
"Individual authors", in the sense of the HTML author, will almost
certainly use the profile that most suits the work they are trying to
do. If they're concerned to get their page indexed by Google, then
they'll probably use Google's 'profile', when such a thing exists. If
they work for a library or are a scientist, then they will probably
use a profile that's been agreed at some international conference or
other. (The application profiles idea from Dublin Core [2] is also
relevant here.)

Any author that starts to form their own vocabulary is in my view
moving into the more techie end of the spectrum, rather than being
somehow 'everyday', and different criteria will apply.


> On the contrary, we should loudly encourage users to
> use a prefix mechanism when and if they need it...

I really have to disagree strongly!

:)

The mechanism is there, for people who need it -- we don't need to be
loud about it. What we should be loud about is that "everyday authors"
can now write documents without having to use namespace prefixes, and
Google can publish vocabularies that use standard terms, but still
under their own control, without incurring the wrath of the 'RDF
community'.


> ... and we should make it
> easier than using just a load of @xmlns attributes. Ie, to paraphrase
> you I think we will also fail if we have not provided authors with a way
> to write prefixes simply.

You mean 'declare prefix-mappings simply', of course -- since the
writing of prefixes doesn't change.

We would be kidding ourselves if we thought that there would be
dancing in the streets when we add a way to declare a bunch of
prefixes. :)

We should be more ambitious.


> And, I am sorry Mark to disagree with you again, I maintain that a
> separate prefix and keyword mechanism are simpler concepts to grasp for
> non-experts than mixing the two together...

So to recap on why I disagree with you disagreeing with me disagreeing
with you:

1. For those who have never used namespaces before -- which should be
the bulk of our audience now -- telling them that they can use a token
on its own ("blah") or a token followed by a suffix ("blah:foo") is so
simple I'm shocked that you think people won't get it. (Ok...not
shocked...that was for effect...mildly surprised.)

2. Given that people will be using tokens in a context, they will
invariably use the token collection (profile) that is most useful to
their task in hand. Since profiles can include other profiles, it's
quite easy for profile creators to get the terms structured
properly.
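
Point 2 can be sketched as follows: a profile that includes another profile
is flattened into a single token set. This is a Python illustration under
stated assumptions; the registry layout, the profile names and the rule that
a profile's own terms override included ones are all invented for the
example, and 'dc' is again assumed to mean the DCMI terms namespace.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/terms/"  # assumption: DCMI terms namespace

# Hypothetical registry: each profile lists the profiles it includes
# plus the tokens it defines itself.
PROFILES = {
    "people": {
        "includes": [],
        "terms": {"name": FOAF + "name", "knows": FOAF + "knows"},
    },
    "library": {
        "includes": ["people"],
        "terms": {"title": DC + "title"},
    },
}


def flatten(name, registry, _seen=None):
    """Return every token visible from a profile, following includes."""
    seen = set() if _seen is None else _seen
    if name in seen:
        return {}  # already visited: guards against include cycles
    seen.add(name)
    profile = registry[name]
    terms = {}
    for included in profile["includes"]:
        terms.update(flatten(included, registry, seen))
    # Assumed rule: a profile's own terms override included ones.
    terms.update(profile["terms"])
    return terms


print(sorted(flatten("library", PROFILES)))
```

An author who picks the "library" profile gets "name" and "knows" as well as
"title", without declaring anything further.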

Regards,

Mark

[1] <http://code.google.com/p/argot-hub/wiki/ArgotConsultation>
[2] <http://efoundations.typepad.com/efoundations/2009/11/coi-guidance-on-use-of-rdfa.html>

--
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)
Received on Sunday, 14 March 2010 11:20:14 UTC