thoughts on the profile issue

Wow:-) Many things have happened while I was on vacation...

This mail is a set of slightly random thoughts on the profile discussion. Instead of answering individual mails, I would rather gather my thoughts in one place; it may also help trigger new discussions. I am sorry it is fairly long... Bear with me!

1. I fully understand the difficulties of implementing the @profile feature. Been there, done that: the @profile implementation (which includes a local caching mechanism) is, for example, the most complex part of the new version of my pyRdfa.

Of course, you can look at this from different points of view. I do not believe we should optimize for what is easy for implementers, as long as the feature is implementable (and it is); what we are looking for are features that are useful for users. I.e., I take the _implementation_ complexity argument with a pinch of salt. But I appreciate the issue. (And yes, Henri's comment gave me more food for thought, although I agree with Shane that the RDFa processing model is really a one-pass, fairly deterministic process. However, Web Application practice might be different; that is not my strong point, so I cannot really comment on it...)

2. Shane asked for a use case. Let me be very specific. At SemTech2011 I attended a presentation on rNews. You probably know what it is and, as somebody mentioned in the thread, one of the issues many people have with rNews is that its vocabulary reinvents the wheel. I.e., many of the terms it uses are actually identical to terms in Dublin Core, in FOAF, in you-name-it. That is bad if rNews items are to be integrated with other data: this integration is the very essence of the Semantic Web, and reusing shared, non-application-specific vocabulary URIs is the key to it.

I had a chat with the presenter to find out why. The answer was absolutely clear: the rNews community cannot expect its users to write a load of prefix declarations (with whatever syntax). So they use only their own terms. On the other hand, the rNews committee seems to be _very_ interested in RDFa 1.1 profiles, because profiles may give them the best of both worlds: one single declaration for rNews somewhere, mapping to other vocabularies, and a set of simple terms for their users.
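To make the use case concrete, here is a minimal Python sketch of what a profile would buy rNews authors: the community publishes one mapping from simple terms to URIs in existing vocabularies, and the processor expands the terms users write. The profile content is hypothetical (it is not what rNews actually publishes); the Dublin Core and FOAF URIs are the real ones, and the `example.org` subject is a placeholder.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical profile, published once by the rNews community: simple terms
# that authors write, mapped onto URIs of existing vocabularies.
RNEWS_PROFILE: Dict[str, str] = {
    "headline": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
    "name": "http://xmlns.com/foaf/0.1/name",
}


def expand_terms(subject: str, props: List[Tuple[str, str]],
                 profile: Dict[str, str]) -> List[Triple]:
    """Turn (term, value) pairs into triples using the profile's term map.

    Terms the profile does not define produce no triple in this sketch.
    """
    triples: List[Triple] = []
    for term, value in props:
        iri = profile.get(term)
        if iri is None:
            continue  # undefined term: silently skipped
        triples.append((subject, iri, value))
    return triples


# The author writes property="headline" without declaring any prefix.
example = expand_terms("http://example.org/news/1",
                       [("headline", "RDFa 1.1 ships"), ("kicker", "ignored")],
                       RNEWS_PROFILE)
```

The author only ever sees `headline`; the integration with Dublin Core happens in the single, shared profile.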

(As an aside: for rNews it would of course be important to publish their profile and let others use it. Somebody in the thread proposed restricting @profile files to those published by the author of the RDFa file: I do not think that would fly...)

If we fail to provide our users with a means to do that in RDFa _somehow_, then in practice we will encourage what I would call the 'single vocabulary' approach, which is what schema.org, rNews, and others do: define your _own_ vocabulary, with _one_ namespace, even if the terms are identical to others. As a result, integrating, say, rNews data with schema.org events becomes much more difficult.

3. I looked at Niklas' mail on his three alternative proposals: microsyntax, RDFa-Sem (let me call it that), and GRDDL. I will come back to RDFa-Sem below. I am sorry Niklas, but I do not think the microsyntax proposal solves my use case, for example. I am not saying we should not consider the microsyntax approach (we had this discussion before, and I am not sure how and why it stalled), but not in the current thread. As for GRDDL: there is very, very little uptake of GRDDL out there. Whether that is because, in practice, it is bound to XSLT or for some other reason, I do not know, but I do not think we could rely on it (besides, GRDDL is bound to XML, i.e., it would not work with HTML5...)

4. The RDFa-Sem alternative _is_ interesting. What Niklas is saying (if my understanding is correct) is that a URI used for @vocab _may_ refer to an RDFS vocabulary; an RDFa processor may then pick up all the RDFS vocabularies referenced in the file, merge those graphs, and perform RDFS reasoning on the merged graph. Just follow the RDF Semantics document! In this sense, using map:ProxyProperty is actually superfluous: by virtue of the RDFS semantics, rdfs:subPropertyOf, for example, should suffice. There are some details to handle (e.g., which variant of RDFS reasoning to use), but that can be done.
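As a rough illustration of the RDFa-Sem idea, here is a minimal Python sketch, assuming triples are plain string tuples and that the vocabulary graphs have already been fetched and parsed. It merges them with the page's triples by plain set union (a real merge would also keep blank nodes from different sources apart) and applies only the subPropertyOf/subClassOf rules of the RDF Semantics document (rdfs5, rdfs7, rdfs9, rdfs11) until nothing new is derived. The `example.org` rNews-like URIs are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUB_PROPERTY = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
SUB_CLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def rdfs_sub_closure(page: Iterable[Triple],
                     *vocabularies: Iterable[Triple]) -> Set[Triple]:
    """Merge page and vocabulary graphs, then forward-chain to a fixpoint."""
    graph: Set[Triple] = set(page)
    for vocab in vocabularies:
        graph |= set(vocab)  # naive merge: no blank-node renaming
    while True:
        sub_props = {(s, o) for s, p, o in graph if p == SUB_PROPERTY}
        sub_classes = {(s, o) for s, p, o in graph if p == SUB_CLASS}
        new: Set[Triple] = set()
        for a, b in sub_props:
            for c, d in sub_props:
                if b == c:
                    new.add((a, SUB_PROPERTY, d))   # rdfs5: transitivity
        for a, b in sub_classes:
            for c, d in sub_classes:
                if b == c:
                    new.add((a, SUB_CLASS, d))      # rdfs11: transitivity
        for s, p, o in graph:
            for a, b in sub_props:
                if p == a:
                    new.add((s, b, o))              # rdfs7: super-property
            if p == RDF_TYPE:
                for a, b in sub_classes:
                    if o == a:
                        new.add((s, RDF_TYPE, b))   # rdfs9: super-class
        if new <= graph:
            return graph  # fixpoint reached
        graph |= new


RNEWS = "http://example.org/rnews#"  # hypothetical namespace
vocab = {
    (RNEWS + "headline", SUB_PROPERTY, "http://purl.org/dc/terms/title"),
    (RNEWS + "Article", SUB_CLASS, "http://xmlns.com/foaf/0.1/Document"),
}
page = {
    ("http://example.org/news/1", RDF_TYPE, RNEWS + "Article"),
    ("http://example.org/news/1", RNEWS + "headline", "RDFa 1.1 ships"),
}
closure = rdfs_sub_closure(page, vocab)
```

With these two vocabulary triples, the page's `rnews` statements also show up as a Dublin Core title and a FOAF document, without any `map:ProxyProperty`.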

Note that Niklas had a very reduced RDFS handling in mind, essentially exploiting only subPropertyOf and subClassOf. But why stop there? Why not also exploit, for example, range and domain statements? (OK, I may be asking too much here:-)
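Range and domain add two more rules (rdfs2 and rdfs3 in the RDF Semantics document). Below is a standalone Python sketch of just those two rules, assuming string-tuple triples and a hypothetical `Literal` marker class. It deliberately refuses to type literal objects, since rdfs3 would otherwise put a literal in subject position, which is exactly where RDFS reasoners get tricky. The `example.org` vocabulary is hypothetical.

```python
from typing import Iterable, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RANGE = "http://www.w3.org/2000/01/rdf-schema#range"

Triple = Tuple[str, str, str]


class Literal(str):
    """Marks a string as an RDF literal rather than a URI (sketch only)."""


def domain_range_closure(triples: Iterable[Triple]) -> Set[Triple]:
    """Apply rdfs2 (domain) and rdfs3 (range) until nothing new appears."""
    graph: Set[Triple] = set(triples)
    while True:
        domains = {(s, o) for s, p, o in graph if p == DOMAIN}
        ranges = {(s, o) for s, p, o in graph if p == RANGE}
        new: Set[Triple] = set()
        for s, p, o in graph:
            for prop, cls in domains:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))      # rdfs2
            for prop, cls in ranges:
                if p == prop and not isinstance(o, Literal):
                    new.add((o, RDF_TYPE, cls))      # rdfs3, URI objects only
        if new <= graph:
            return graph
        graph |= new


EX = "http://example.org/rnews#"  # hypothetical namespace
g = domain_range_closure({
    (EX + "author", DOMAIN, EX + "Article"),
    (EX + "author", RANGE, "http://xmlns.com/foaf/0.1/Person"),
    (EX + "headline", RANGE, "http://www.w3.org/2000/01/rdf-schema#Literal"),
    ("http://example.org/news/1", EX + "author", "http://example.org/people/alice"),
    ("http://example.org/news/1", EX + "headline", Literal("RDFa 1.1 ships")),
})
```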

So yes, that is an interesting line. Of course... implementing full RDFS, though possible, is comparable in complexity to managing profiles (although, given the expected size of an RDF graph in an RDFa file, a very simple, straightforward forward-chaining reasoner would do the trick; handling blank nodes and literals in an RDFS reasoner is still a bit tricky, though). I am wondering: how many implementations would produce not only the basic RDF graph but the extended one as well? (We have several implementers around!) Note that a caching mechanism similar to the one discussed for profiles would be necessary to really do a good job.
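The caching point applies equally to profiles and to RDFS vocabularies. This is a minimal Python sketch, not how pyRdfa does it: the `fetch` callable (HTTP retrieval plus parsing) and the one-day default lifetime are assumptions, and it chooses to serve a stale copy when a refresh fails, which is one way to keep the probability of missing triples low.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class VocabularyCache:
    """Caches parsed vocabulary/profile graphs by URI with a time-to-live."""

    def __init__(self, fetch: Callable[[str], Iterable[Triple]],
                 ttl_seconds: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical: retrieves and parses a URI
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Graph]] = {}

    def get(self, uri: str) -> Graph:
        now = self._clock()
        entry: Optional[Tuple[float, Graph]] = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # fresh enough: no network access
        try:
            graph = frozenset(self._fetch(uri))
        except OSError:
            if entry is not None:
                return entry[1]  # refresh failed: fall back to stale copy
            raise                # never fetched successfully: caller decides
        self._entries[uri] = (now, graph)
        return graph
```

Serving stale data is a trade-off: a processor will not lose triples during an outage, but it may keep using a vocabulary that has since changed.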

(I would do it. OK, I have the advantage of having implemented an RDFS reasoner in Python in the past, so...)

5. If I am a user doing, say, SPARQL on the output of an RDFa processor, what would I query?

- If RDFa uses @profile, would my query rely on all the terms and prefixes/URIs defined through @profile? I think yes. Indeed, the question here is: what is the probability of failing to retrieve a profile file and therefore missing triples? Here comes Shane's argument: the probability is, in fact, very small. There won't be many profiles around, and kosher RDFa processors would cache them anyway, so in the majority of cases we could rely on all the expanded terms and URIs.
- If RDFa uses RDFa-Sem, would my query rely not only on the core terms with the @vocab value but _also_ on the RDFS-expanded terms? Well... not unless RDFS reasoning is mandatory for RDFa processors! Of course, if it is, then the same arguments apply as for profiles: there won't be many @vocab-s with RDFS statements out there, kosher RDFa-Sem processors would cache them anyway, etc.
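The difference between the two cases shows up directly in query results. A minimal Python sketch, with a trivial pattern match standing in for a SPARQL query and a single application of the subPropertyOf rule (rdfs7) standing in for an RDFa-Sem processor; the rNews-like URIs are hypothetical. A query for Dublin Core titles finds nothing in the output of a processor that skips the reasoning step.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

SUB_PROPERTY = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_TITLE = "http://purl.org/dc/terms/title"
HEADLINE = "http://example.org/rnews#headline"  # hypothetical term URI

# Vocabulary reachable through @vocab, and the triples found in the page.
VOCAB: Set[Triple] = {(HEADLINE, SUB_PROPERTY, DC_TITLE)}
PAGE: Set[Triple] = {("http://example.org/news/1", HEADLINE, "RDFa 1.1 ships")}


def apply_subproperties(page: Set[Triple], vocab: Set[Triple]) -> Set[Triple]:
    """One application of rdfs7; enough here because there are no chains."""
    supers = {(s, o) for s, p, o in vocab if p == SUB_PROPERTY}
    inferred = {(s, sup, o) for s, p, o in page for sub, sup in supers if p == sub}
    return page | inferred


def select_pairs(graph: Set[Triple], predicate: str) -> List[Tuple[str, str]]:
    """Rough stand-in for SELECT ?s ?o WHERE { ?s <predicate> ?o }."""
    return sorted((s, o) for s, p, o in graph if p == predicate)


basic_output = PAGE                                 # reasoning optional, skipped
expanded_output = apply_subproperties(PAGE, VOCAB)  # reasoning mandatory
```

A query written against `dcterms:title` is only portable across processors if every processor produces `expanded_output`.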

So: is RDFa-Sem mandatory? If not, users can rely on those terms only if they use their own RDFa processors, or environments that have RDFS processing built in. And here is the catch: unfortunately, at the moment, not many RDF environments have RDFS processing built in out of the box (e.g., RDFLib does not).

6. Niklas proposed dropping the term definitions from the default profiles. I am not a big fan of those (they really pollute a lot of RDF files in the RDFa 1.0 case), but can we really do that? E.g., Creative Commons puts a huge emphasis on the fact that rel="copyright" can be used out of the box... On the other hand, Nathan has said many times that many of the @rel values are semantically bogus... I am torn. (There is also the question of backward compatibility with RDFa 1.0, if that is still a consideration.)

7. We could also think about a profile registration mechanism. I would actually hate it:-) but we may still give it some thought: an RDFa processor would have a set of profiles it accepts (and therefore caches), and those would be centrally registered. Much like the default profile right now, except that users would have to declare the ones they use explicitly. I see all kinds of trouble with that, but I thought it was worth raising.
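For discussion's sake, the registration idea could look roughly like this Python sketch. The registry content and the profile URIs are hypothetical; the point is that the processor never dereferences a profile: it only uses term maps it already ships with, and it reports any declared profile it does not know.

```python
from typing import Dict, List, Tuple

TermMap = Dict[str, str]

# Hypothetical central registry, bundled with (and cached by) the processor.
REGISTERED_PROFILES: Dict[str, TermMap] = {
    "http://example.org/profiles/rnews": {
        "headline": "http://purl.org/dc/terms/title",
    },
    "http://example.org/profiles/people": {
        "name": "http://xmlns.com/foaf/0.1/name",
    },
}


def active_terms(declared: List[str],
                 registry: Dict[str, TermMap]) -> Tuple[TermMap, List[str]]:
    """Combine the term maps of explicitly declared, registered profiles.

    Later declarations override earlier ones for the same term; unregistered
    profile URIs are returned separately instead of being fetched.
    """
    terms: TermMap = {}
    unknown: List[str] = []
    for uri in declared:
        mapping = registry.get(uri)
        if mapping is None:
            unknown.append(uri)
        else:
            terms.update(mapping)
    return terms, unknown
```

The troubles are visible even in the sketch: someone has to maintain the registry, and every processor's copy of it must be kept in sync.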

8. For prefix usage, we could also consider using a centralized service like prefix.cc (Richard Cyganiak just commented to that effect on Google+[2] in another thread; sorry if you do not have access to the G+ link...). I am not saying it is good but, again, I thought it was worth raising.
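A minimal Python sketch of how a processor might use such a service as a fallback, assuming it works from a snapshot of prefix.cc mappings rather than querying the service live; the snapshot below is hand-written and illustrative. Prefixes declared in the document always win, so a change on the central service cannot silently alter a page that declares its own prefixes, while a page that declares nothing depends on whatever the service says.

```python
from typing import Dict, Optional

# Illustrative, hand-written snapshot of central prefix -> namespace mappings
# (hypothetical copy; a real processor would need a refresh policy).
CENTRAL_PREFIXES: Dict[str, str] = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def resolve_curie(curie: str, local: Dict[str, str],
                  central: Dict[str, str] = CENTRAL_PREFIXES) -> Optional[str]:
    """Expand prefix:reference, preferring prefixes declared in the document.

    Absolute IRIs are not distinguished from CURIEs in this sketch.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep:
        return None  # no colon: not a CURIE
    namespace = local.get(prefix)
    if namespace is None:
        namespace = central.get(prefix)  # fall back to the central service
    if namespace is None:
        return None  # unknown prefix: no triple in this sketch
    return namespace + reference
```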

Bottom line: at the moment, I find Niklas' RDFa-Sem proposal appealing, and it might be considered a possible improvement of @vocab that could make @profile unnecessary. Actually, it might also build a bridge to the microdata discussion; after all, the mechanism would be an extension of what an RDF mapping of microdata does, and that might be good... But it is still unclear to me whether it is realistic to go down that route in practice. If it does not work, though, then I would be fairly uneasy about dropping profiles.

Sorry for this looooong mail

BTW: I think fully using RDFa-Sem this way would really require community feedback. I wonder whether somebody (Niklas? Manu?) could write, say, a Google+ post or a blog entry somewhere with the explicit goal of asking for feedback (Google+ seems to be the most active place for community discussion these days...)

Ivan


[1] http://lists.w3.org/Archives/Public/public-rdfa-wg/2011Jul/0048.html
[2] https://plus.google.com/u/0/112095156983892490612/posts/aUqGQSLzDPv


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Tuesday, 2 August 2011 13:48:11 UTC