Re: Ongoing objection to RDFa Profiles format (as XHTML+RDFa) from Mark Birbeck on 2010-10-08 (public-rdfa-wg@w3.org from October 2010)

From: Mark Birbeck <mark.birbeck@webbackplane.com>
Date: Fri, 8 Oct 2010 11:21:59 +0100
To: Ivan Herman <ivan@w3.org>
Cc: nathan@webr3.org, Manu Sporny <msporny@digitalbazaar.com>, RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <AANLkTi=ZbwCd=D4_hm6zak5tOQHMBiF81HQw9urS2=ZN@mail.gmail.com>

Hi Ivan/Nathan,

I just want to chime in with a big +1 on Ivan's points.

I'd also like to build on his point that in practice there will
almost certainly be some caching mechanism in play, which is key to
the whole concept.

In fact, I think a better way to understand profiles is to
regard the URI for the profile as a 'key' that makes some tokens
available to the author. Note that this is how profiles are already
defined in HTML4, and my mental model of profiles in RDFa has always
been that we are /extending/ the HTML4 functionality, rather than
inventing something new. In other words, the fact that we can actually
'follow our noses' and obtain profiles on the fly is an /enhancement/
to the basic functionality, but we shouldn't lose sight of the fact
that it is not /itself/ the basic functionality.

For example, let's say that Google create a Rich Snippets profile, and
place it at (roughly) the same URL as their vocabulary; I can now
write this at the top of my document:

  <head profile="http://rdf.data-vocabulary.org/2010/#">

and I can create mark-up like this (a modified version of Google's sample):

<div typeof="Person">
   My name is <span property="name">Bob Smith</span>,
   but people call me <span property="nickname">Smithy</span>.
   Here is my homepage:
   <a href="http://www.example.com" rel="url">www.example.com</a>.
   I live in
   <span rel="address">
      <span typeof="Address">
         <span property="locality">Albuquerque</span>,
         <span property="region">NM</span>
      </span>
   </span>
   and work as an <span property="title">engineer</span>
   at <span property="affiliation">ACME Corp</span>.
   My friends:
   <a href="http://darryl-blog.example.com" rel="friend">Darryl</a>,
   <a href="http://edna-blog.example.com" rel="friend">Edna</a>
</div>

Now, anyone writing an RDFa parser is free to add a couple of lines of
code that make the Rich Snippets tokens (Person, friend, region, etc.)
available to the author, whenever the particular profile URI is
detected. In other words, the parser writer is under no obligation to
retrieve the actual profile at parse time.
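
As a minimal sketch of the "couple of lines of code" idea, assuming a
parser written in Python: the table below hard-codes the tokens for one
known profile URI, so nothing is fetched at parse time. The vocabulary
base URI, the term list and the names KNOWN_PROFILES and expand_term are
illustrative assumptions, not part of any spec or of Google's actual
profile.

```python
from typing import Dict, Optional

# The profile URI is the one used in the example above.
RICH_SNIPPETS_PROFILE = "http://rdf.data-vocabulary.org/2010/#"
# Assumed vocabulary base that the tokens expand to (illustrative only).
RICH_SNIPPETS_VOCAB = "http://rdf.data-vocabulary.org/#"

# Hard-coded token table, keyed by profile URI. Because the URI carries a
# date, the table can be treated as frozen for that URI.
KNOWN_PROFILES: Dict[str, Dict[str, str]] = {
    RICH_SNIPPETS_PROFILE: {
        term: RICH_SNIPPETS_VOCAB + term
        for term in (
            "Person", "Address", "name", "nickname", "url", "address",
            "locality", "region", "title", "affiliation", "friend",
        )
    },
}


def expand_term(profile_uri: str, term: str) -> Optional[str]:
    """Return the full URI for a token, or None if the profile or token is unknown."""
    terms = KNOWN_PROFILES.get(profile_uri)
    if terms is None:
        return None  # not a profile this parser has hard-coded
    return terms.get(term)
```

With this in place, a parser that sees the profile URI in the head can
expand typeof="Person" or rel="friend" without any network access.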

The only fly in the ointment is that the URL for the profile should
include a date or version number, so that parser writers know that
they can safely hard-code the tokens -- that's why I added "2010" to
the Rich Snippets URI above.

This way of looking at things is very close to the way profiles are
currently viewed in HTML4, in that the spec says that some kind of
semantic information is obtained, but it doesn't specify how that
happens.

All we're doing with RDFa is saying that in addition a parser can try
to retrieve the document referenced by the URI, and use any data
obtained. I.e., we are enhancing the basic HTML4 functionality.

If you view it this way around then you can see that the system is far
less 'flaky' than you think it is, because as Ivan stressed, it's
almost certain that people building parsers will either (a) detect
profile URIs that are of interest to them, and hard-code the values,
and/or (b) cache retrieved profiles.

(And that's not even taking into account that profiles can be stored
anywhere, so the techniques used by Google and Yahoo! to host YUI,
jQuery, Dojo, etc. for general use can also be applied to profiles.)
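
Options (a) and (b) can be combined in one lookup path. The following is
only a sketch under stated assumptions: the class name ProfileResolver,
its method terms_for and the injected fetch callable are all
hypothetical, and how a retrieved profile document is turned into a term
map is deliberately left out.

```python
from typing import Callable, Dict, Optional

TermMap = Dict[str, str]


class ProfileResolver:
    """Resolve a profile URI to its term map: hard-coded first, then cache, then fetch."""

    def __init__(
        self,
        hard_coded: Dict[str, TermMap],
        fetch: Callable[[str], Optional[TermMap]],
    ) -> None:
        self._hard_coded = hard_coded
        # Hypothetical retrieval hook: returns None when the profile
        # cannot be retrieved or parsed.
        self._fetch = fetch
        self._cache: Dict[str, TermMap] = {}

    def terms_for(self, profile_uri: str) -> TermMap:
        # (a) Profiles the parser writer chose to hard-code.
        if profile_uri in self._hard_coded:
            return self._hard_coded[profile_uri]
        # (b) Profiles retrieved earlier and kept in memory.
        if profile_uri in self._cache:
            return self._cache[profile_uri]
        fetched = self._fetch(profile_uri)
        if fetched is None:
            # Retrieval failed: expose no tokens, and do not cache the
            # failure so that a later attempt can still succeed.
            return {}
        self._cache[profile_uri] = fetched
        return fetched
```

Passing the retrieval step in as a callable keeps the sketch independent
of any particular protocol or host, which also fits the point that
profiles can be stored anywhere.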

Regards,

Mark


On Fri, Oct 8, 2010 at 10:16 AM, Ivan Herman <ivan@w3.org> wrote:
> Nathan,
>
> Let us try to keep focussed:-) I have a lot of sympathy for what you write about RDFS 3.0 (whatever the name is), but we should not solve that issue in this working group.
>
> There are two different issues that we are discussing:
>
> #1: whether we have profiles in the first place
> #2: if the answer to #1 is 'True', what is the format of that file (at present, it is RDF, others would prefer to use another format)
>
> This thread is, essentially, on #2, whereas your arguments are on #1 (well, against #1).
>
> So I will comment on #1 a bit, but keep in mind that this is not the issue this thread is discussing, so we may have to take it 'elsewhere', so to say.
>
> - I _do_ see the dangers you are referring to. But I expect, in practice, far fewer problems with those than the way you describe. Indeed, I do not expect a proliferation of profiles all over the place; I would rather expect a small number of profiles published by organizations like Google, Creative Commons, Facebook, the New York Times, etc. These profiles will not really change frequently. I would also expect implementations to have some sort of caching mechanism, at least for the best-known profiles, so that the profile URI would be used as an identifier for those well-known profiles (I know my implementation has that; a complete implementation would probably need an extra crontab job to refresh those profiles once a day, but that is easy to set up).
>
> No, this is not a full solution, and there are many corner cases where it can go wrong. I know that. But all this is not really different from, say, the way GRDDL is defined or, to take a somewhat more remote analogy, the way XML Schemas or DTDs are used in practice.
>
> - There is one aspect of profiles that you seem to miss. Profiles can be used for prefix mapping but also for term mapping. Although I can see how the prefix mapping could be handled by some sort of OWL-based reasoning (though I do not see that happening soon), term mapping is a different issue. And, in many ways, the real value of profiles is the term mapping. It is the fact that ordinary HTML authors can simply write
>
> <div profile="google-profile-uri">
>    <span property="googleterm1" content="blabla"/>
>    <span property="googleterm2" content="yepyepyep"/>
> </div>
>
> and still generate proper RDF. If you look at the long (sorry, looooooong) discussion in the HTML5 group, for example, on whether or not to use namespaces or anything remotely resembling them, or if you look at the arguments around microdata vs. RDFa, in many cases that is what it boils down to. To make it a little bit more extreme: if RDFa cannot provide some solution for these term mappings, then it will be used only for complex cases where the microdata model breaks down (essentially, when using lots of different and sometimes esoteric vocabularies). By providing term mappings, RDFa can be used for simple cases, too.
>
> That is why my feeling is (and, I believe, the feeling of the working group) that the very real downsides that you describe in your mail are acceptable dangers in view of the overall gains...
>
> Ivan
>
>
>
> On Oct 8, 2010, at 05:15 , Nathan wrote:
>
>> Manu, Mark, All,
>>
>> I hate to say it, but I also don't support RDFa Profiles (not just the format; I don't support RDFa Profiles at all). That said, I can't see the approach I feel is the correct one (which I'll outline in a moment) being supported in tooling for quite some time, sadly (although it is in some tools, and I could support it myself easily enough).
>>
>> My Reasons against:
>>
>> - increased network load, several resources may be required to process a single resource, decreasing network efficiency, adding in additional interactions and reducing user-perceived performance (this is quite major..)
>>
>> - constrains an RDFa document to be protocol bound, without profiles one could retrieve a RDFa document via FTP, P2P, SCP or any other means and still extract the graph serialized, with profiles it's entirely likely that an HTTP (or other) agent would also be required to process the document.
>>
>> - exponentially increases the overhead involved in batch processing RDFa documents, if you consider recursively wget'ing a website and then batch extracting the RDF into a triple store.
>>
>> - an RDFa document does not contain the graph serialized within it when processed offline, or when one or more RDFa Profiles cannot be resolved / successfully retrieved (partial messages can be a real problem, especially when the missing information is critical, signatures, trust metrics, assertions of falsehood, update streams etc also [1])
>>
>> - temporal issues, if I serialize an RDF graph in an RDFa document today, I'd quite like to be able to get that graph back out in 10 years time without having to pray the profiles are still on-line.
>>
>> [1] Manu, this one may resonate with you - let's move down the line a couple of years and say a large chunk of the ecommerce world is using good relations and the RDFa profile for it, consider what happens if http://www.heppnetz.de/grprofile/ goes down for a day.. or perhaps it's hacked and those prefix mappings are changed.. could be quite a big problem?
>>
>> I do fully understand why RDFa Profiles were introduced, I just feel it's the wrong approach, as per the above, the approach I feel is the correct one, is as follows:
>>
>>
>> Alternative Solution:
>>
>> Leverage OWL to create proxy ontologies using owl:equivalentClass and owl:equivalentProperty
>>
>> If Martin Hepp's Good Relations ontology requires the use of a couple of geo: properties, some from dcterms and so forth, he could simply define aliases to them in the good relations ontology and assert their equivalence.
>>
>> Likewise if I was to create a blogger type platform I could simply assert that x:title owl:equivalentProperty dcterms:title, rdfs:label . and so forth.
>>
>> The point being that there's no reason a full RDF(a) document could not use a single schema which aliases / proxies multiple different schemas. I'm sure you all follow without needing to go into too much detail :)
>>
>> Manu's comments on OWL leads me back to why I fully agreed with Jim Hendler's proposal on RDFS 3.0 [2] at the next steps workshop, and indeed why I'm surprised it wasn't considered more, there's a general sentiment of being ontology/schema shy around the linked data camps, preferring to use out of band knowledge about classes and properties rather than having a basic schema awareness within tooling, but this is an issue engrained deeper within the community, and which is a fundamental issue behind many of the more discussed RDFa WG issues at present - for instance when I mentioned pulling the range of properties to work out whether a string uri in an RDFa document is a resource or not - would be nice if it was under our remit to encourage and promote good form in this respect (arguable?!).
>>
>> On this note I also feel there's an 80/20 rule for general usage of RDF on the web, but it's more like 95/5, where if you defined RDFS 3.0, then took the common properties used in say 10k personal profiles and made a proxy ontology, then did the same for 10k blog posts/micro blogs/comments/forum posts/articles, and 10k ecommerce sites then you'd have covered most of the common usage on the web in just 3-4 schemas, leaving the rest down to people who are more familiar with RDF and don't really make the mistakes we're trying to cater for (copy and paste, fear of the prefix etc).
>>
>> [2] http://www.w3.org/2009/12/rdf-ws/papers/ws31
>>
>> To summarise in general, even in the all too likely case where what I've suggested doesn't happen any time soon, I still think that adding RDFa Profiles is leading to internet scale problems which could last a generation, and far better to have those relatively short lived copy-paste and a little bit harder to write problems coupled with a "let's teach them" approach than to introduce something we later (perhaps much later) regret.
>>
>> Of course I could be wildly wrong, regardless I'll pass on my regards and hope that this mail finds you all well :)
>>
>> Best,
>>
>> Nathan
>>
>> Manu Sporny wrote:
>>> On 09/08/2010 02:30 PM, Mark Birbeck wrote:
>>>> On Wed, Sep 8, 2010 at 3:08 PM, Ivan Herman <ivan@w3.org> wrote:
>>>>> [snip]
>>>>> I am sorry but these things have already been discussed, and the WG has
>>>>> decided to go along the lines it has now. I do not see any new information
>>>>> here, ie, no argument that has not been discussed before. Reopening a closed
>>>>> issue is really not a good way forward.
>>>> As you rightly say the issue was resolved by the WG some months ago.
>>>> However, I never supported the original resolution:
>>>>
>>>>  <http://www.w3.org/2010/02/rdfa/meetings/2010-04-15#resolution_3>
>>>>
>>>> and I'm afraid I can't support it now. I'm not really sure what people
>>>> expect me to do, since I didn't say I could live with this -- I said I
>>>> oppose it.
>>>>
>>>> For me this is particularly compounded by the fact that I've yet to
>>>> see a decent argument in favour of using RDF to express the prefix
>>>> mappings (as opposed to name/value pairs as is done in N3, SPARQL,
>>>> RDF/XML, and so on); you say that "these things have already been
>>>> discussed", but I don't feel the discussion really nailed this.
>>> Hi Mark,
>>>
>>> Sorry for taking so long to respond. I had promised you a follow-up to
>>> this e-mail at some point in the past month. I've seen the "why are we
>>> using RDF to express prefix mappings?" question raised by you several
>>> times with no in-depth answer from the list, so here's my attempt at
>>> summarizing the conversation over the past several months with the
>>> various parties involved.
>>>
>>> RDFa Vocabulary/Profile Orthogonality
>>> -------------------------------------
>>>
>>> I think the short answer is that we're using RDF to express the
>>> information because we expect that many of the RDFa Profile documents
>>> will be most useful to people as human-readable documents that just
>>> happen to contain machine-readable RDFa. Take the FOAF vocabulary for
>>> example - it's human-readable:
>>> http://xmlns.com/foaf/spec/
>>> but it is also machine-readable via XHTML+RDFa, here are the triples:
>>> http://check.rdfa.info/check?url=http://xmlns.com/foaf/spec/&version=1.0
>>> We are expecting the RDFa Profile documents to be marked up in the same
>>> way, for example, here is the Good Relations RDFa Profile document:
>>> http://www.heppnetz.de/grprofile/
>>> and here's the machine-readable triples from the document:
>>> http://check.rdfa.info/check?url=http://www.heppnetz.de/grprofile/&version=1.1
>>> Note that the same document is used to provide both the human-readable
>>> (HTML) and machine readable (RDFa) information for the FOAF Vocabulary.
>>> Also note that the identical XHTML+RDFa mechanism was used to generate
>>> the Good Relations RDFa Profile.
>>> This orthogonality is very important, and is one of the main driving
>>> reasons to mark up RDFa Profiles in RDFa.
>>>
>>> There is no difference between how one goes about creating an ideal
>>> vocabulary document and an ideal profile document for use with RDFa. So,
>>> if someone understands how to write XHTML+RDFa, the likelihood that they
>>> will be able to write an RDFa Vocabulary document and an RDFa Profile
>>> document is higher if we don't switch the underlying format on them.
>>> Now, let's take a few of the suggestions that you have made - flat text
>>> files with key-value pairs, JSON, and using a @prefix-based mechanism.
>>> Each one of these approaches requires someone that already knows
>>> XHTML+RDFa to understand that RDFa Profiles operate differently than the
>>> rest of XHTML+RDFa. That is, one must write XHTML+RDFa Vocabulary
>>> documents in one way, and RDFa Profile documents in another way.
>>>
>>> Human-readability of RDFa Profiles
>>> ----------------------------------
>>>
>>> In the case of flat text files with key-value pairs, we don't have any
>>> human-readable aspect to the files - just machine readable data. The
>>> same problem exists with JSON (which may or may not be understood by the
>>> person writing HTML+RDFa). To understand why this is such a bad idea,
>>> one can look at the OWL vocabulary:
>>> http://www.w3.org/2002/07/owl
>>> Trying to understand how to use OWL by just looking at the
>>> machine-readable vocabulary is painful and error prone. Given the choice
>>> between the way FOAF describes how their vocabulary can be used, and the
>>> way that OWL does the same thing - the choice is pretty clear from a
>>> human-readability point of view.
>>> So, advocating a mechanism to express RDFa Profiles in a way that is not
>>> human-readable is a non-starter as far as I'm concerned.
>>> You had also mentioned that we could perhaps re-use @prefix in an RDFa
>>> Profile document to express all of the prefixes and terms for the RDFa
>>> Profile. This would allow us to express the document in a human and
>>> machine readable way. However, the major drawback to this approach is
>>> that all of the prefix/term settings would be shoved into one attribute.
>>>
>>> Future Proofing RDFa Profiles
>>> -----------------------------
>>>
>>> If we wanted to modify RDFa Processor behavior by decorating prefix/term
>>> mappings via another mechanism, we'd be blocked in doing so due to the
>>> nature of the fairly simple @prefix syntax.
>>> For example, if we wanted to define terms in the future that generated 4
>>> triples every time a term was found, we couldn't do so via the @prefix
>>> mechanism. That is, if we wanted to generate dc:title, foo:title,
>>> bar:title and zurg:title when the term "title" was used like so:
>>> <span property="title">Zorgon The Emphatic</span>
>>> we'd have to invent a new backwards-compatible syntax for @prefix as
>>> used in RDFa Profile documents.
>>> Alternatively, the RDFa Profiles mechanism that used RDFa markup to
>>> express the profile terms and prefixes would just add another triple
>>> that states the other triples that must be generated when "title" is
>>> found in the markup. In other words, we're also using RDF to express the
>>> RDFa Profile documents because it is extensible.
>>>
>>> Concerns
>>> --------
>>>
>>> That is not to say that I don't agree with your notion that we're mixing
>>> the layers a bit here, but at the end of the day, authors rarely care
>>> about that. Language designers care about that kind of thing and I can't
>>> think of how it may bite us later down the line at the moment. What
>>> usability issues do you see with this approach? What technical issues do
>>> you see with this approach? I think I understand the design issues
>>> you're raising, but even I have to admit that they are a bit purist.
>>> What is the worst-case here? If an SVG+RDFa processor must implement an
>>> XML-compatible+RDFa processor (which it has to do anyway) to read RDFa
>>> Profiles, what is the down-side to that? We have between 18-22 RDFa
>>> processors at present, do we think that we'll only have a handful of
>>> RDFa 1.1 processors due to this design decision?
>>> To put it another way, your reaction to this seems to be fairly strong,
>>> Mark. Perhaps all of us that don't see it as a big issue are missing
>>> something, but I can't understand what that something must be.
>>> I think the best way forward at this point is for you to submit a solid
>>> alternative proposal. You've mentioned several ways forward, could you
>>> pick the winner as far as you see it and we can discuss that in order to
>>> make the conversation a bit more concrete?
>>>
>>> -- manu
>>
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 8 October 2010 10:22:57 UTC