Re: Identification of RDFa content from Mark Birbeck on 2006-12-07 (public-rdf-in-xhtml-tf@w3.org from December 2006)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Thu, 7 Dec 2006 14:24:14 +0000
To: "Ivan Herman" <ivan@w3.org>
Cc: "Ben Adida" <ben@mit.edu>, "public-rdf-in-xhtml task force" <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <640dd5060612070624o7d5ccb2ap125574f24abecb61@mail.gmail.com>

Hi Ivan,

I wonder if we're posing the main question the wrong way. We agree
that an HTML document contains metadata, whether we invented RDFa or
not; it might contain information about stylesheets that are linked to
the document, the next and previous documents in a chain, and so on.
And we also agree that RDFa was cunningly crafted so that 'normal'
HTML metadata features are part of RDFa, even though RDFa of course
introduces all sorts of extra features if you need them.

Now, the question that is currently being asked is, how do we signify
that the document has RDFa in it? And we've many times come back to
the idea that it's a non-question since nearly all HTML documents
contain RDFa anyway.

But what if we change the question to be, how do we tell if a document
has any *useful* RDFa in it? In the example you gave, the very fact
that someone has put 'see also' should mean that there is something
useful at the other end of the 'see also', should it not? In which
case it's worth parsing it.

But I wonder if we could go further and generalise this; is metadata
about the document not really of interest, whilst metadata about
something else--Ivan, Mark, my car, your house--is the real stuff we
are after? Now, if that is the case, why not use a URL that points to
the item in question? Why not use:

  <http://www.w3.org/People/Ivan/foaf.html#me>

for example?

Let's use Ben's updated FOAF example to illustrate; he gave the
following URL in his email:

  <http://ben.adida.net/card.xhtml>

This gives us a nice web-page with all sorts of information about Ben,
and this could be used by *any* processor, such as a browser. However,
if you look at the mark-up you'll see that we have metadata about the
document:

  <> xh:stylesheet </includes/stlye.css>

and then we have metadata about Ben himself:

  <#i> rdf:type foaf:Person
  <#i> foaf:givenname "Ben"

and so on. The 'interesting' RDFa is therefore actually at:

  <http://ben.adida.net/card.xhtml#i>
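
To make that split concrete, here is a minimal Python sketch. It is only
an illustration, not anything defined by RDFa: the triples are copied by
hand from the examples above rather than parsed out of Ben's page, and
partition_by_subject is a hypothetical helper name.

```python
from urllib.parse import urljoin

DOC = "http://ben.adida.net/card.xhtml"

# Triples transcribed by hand from the examples above: (subject, predicate, object).
# Subjects are relative references, exactly as written in the examples.
RAW_TRIPLES = [
    ("", "xh:stylesheet", "/includes/stlye.css"),
    ("#i", "rdf:type", "foaf:Person"),
    ("#i", "foaf:givenname", '"Ben"'),
]

def partition_by_subject(doc_url, triples):
    """Split triples into those about the document and those about other resources."""
    about_doc, about_other = [], []
    for s, p, o in triples:
        # Resolve the subject against the document URL; "" resolves to the document itself.
        subject = urljoin(doc_url, s)
        if subject == doc_url:
            about_doc.append((subject, p, o))
        else:
            about_other.append((subject, p, o))
    return about_doc, about_other

about_doc, about_other = partition_by_subject(DOC, RAW_TRIPLES)
```

Here about_doc holds the stylesheet statement, while about_other holds
the two statements whose subject is the #i URL.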

With this approach you really can use the same URL for both your
semantic and clickable web presences. Just to make that part clear,
take a look at Ben's mark-up for his (sadly small... ;) list of
friends:

    <h3>People I know</h3>

    <ul>
        <li> <a rel="foaf:knows"
href="http://www.w3.org/People/djweitzner/foaf#DJW">Danny
Weitzner</a></li>
        <li> <a rel="foaf:knows"
href="http://www.w3.org/People/Berners-Lee/card#i">TimBL</a></li>
        <li> <a rel="foaf:knows"
href="http://www.w3.org/People/EM/contact#me">Eric Miller</a></li>
    </ul>

This is fine, but if you follow any of those links you get an RDF/XML
file. However, if each of these files had been expressed as RDFa we
could have used the same URL in both the HTML+RDFa version, and the
RDF/XML version:

  <rdfs:seeAlso rdf:resource="http://ben.adida.net/card.xhtml#i"/>

There's an important sub-plot here; @rdfs:seeAlso is supposed to refer
to a resource. Nothing wrong with that, except that in the example
you've given, Ivan, the 'see also' obviously points to an HTML document
to be retrieved, and so it is referring to an 'information resource'.
Two important things come from this:

 * some *other* metadata elsewhere should really be saying what type
   of document this is, and what rules should be followed to govern
   its retrieval--i.e., we need some statements 'about' that
   'information resource';

 * the fact that it is an 'information resource' means that it can't
   be a 'person' as well.

This last point has been discussed in many places, and I've also added
to the 'noise' around the question with my own summary of the ideas
[1]. So if you want to actually 'see also' another profile of Ivan as
opposed to a web-based document, then the URL should be something like
this:

  <http://www.w3.org/People/Ivan/foaf.html#me>

With a URL like this you are saying:

  'see also' another person

not:

  'see also' some document and parse its contents.

Note that this is just the same as in RDF/XML; when you 'see also' one
RDF/XML document from another, you are not saying anything about the
'carrier' of the metadata (the document); put another way, you are
pointing to a resource, and *not* an information resource. The
carrier is therefore transparent to the process, and the same should
be true of RDFa, which is achieved by pointing to a resource (via a
fragment identifier) rather than a document.
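
In processing terms, a client following such a link might behave roughly
as in the Python sketch below: the fragment is dropped to find the
document to retrieve, but the full URL is kept as the resource the link
is about. This is an assumption about how a processor could work, not a
description of any existing tool; split_see_also is a hypothetical name,
and no retrieval or RDFa parsing is attempted.

```python
from urllib.parse import urldefrag

def split_see_also(target):
    """Return (document to retrieve, resource the link is about)."""
    document, fragment = urldefrag(target)
    # Without a fragment, the link is about the document itself,
    # i.e. an 'information resource'.
    about = target if fragment else document
    return document, about

doc, about = split_see_also("http://www.w3.org/People/Ivan/foaf.html#me")
plain_doc, plain_about = split_see_also("http://www.w3.org/People/Ivan/")
```

In the first call the document to fetch and the thing described differ;
in the second they are one and the same URL, which is exactly the case
where the link can only be about a document.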


This makes me think that we might be able to find a solution here. I
won't venture any concrete proposals just yet, but I'll summarise some
of my findings made whilst looking into this:

 * although all HTML documents could be said to contain RDFa, there
   is a difference between RDFa about the document itself, and RDFa
   about other resources;

 * the presence of RDFa about other resources is most often what
   we're trying to point out to a parser;

 * if a parser follows a link via @rdfs:seeAlso then there is most likely
   useful information there;

 * since a resource should not be both an 'information resource' and
   a non-information resource at the same time, then most of our URLs
   will be referring to the 'useful' RDFa within a document, and not
   the document itself.

Regards,

Mark

[1] <http://internet-apps.blogspot.com/2006/05/information-resource-debate-and-rdfa.html>

On 25/11/06, Ivan Herman <ivan@w3.org> wrote:
 > Hi Mark,
>
 > your points are well taken and no disagreement anywhere. Let us say: we
 > are in wild agreement in the fundamental approach, ie, that a (X)HTML
> document *is* RDFa!
 >
> For me the issue is really a question of efficiency rather than anything
> else. The use case below may be a bit artificial, but nevertheless. Let
 > us say I have a foaf file (in good old RDF/XML format), which includes
 > the following statement:
>
 > <rdfs:seeAlso rdf:resource="http://www.w3.org/People/Ivan/"/>
 >
> And I feed this into one of those programs that try to include all
 > seeAlso-s into its final triple store. Tabulator is an example or, say,
 > the server set by Chris Bizer in Berlin is another one (I will use
> 'tabulator' as a generic example)
>
> What would a tabulator do? It would then
>
> - look at that URI, see from the return header that it is not
> application/rdf+xml or n3 (I am not sure there is a mime type for that
 > one...). It sees that it is some form of HTML (let us not go into the
 > details of which one)
>
 > - It has then the choice of:
>
 >     1. look at the start of the document (essentially, sniff it) to see
> if there is a GRDDL profile; in this case it will GRDDL it and add the
> result to its triple store
 >     2. *parse* the whole document with an RDFa parser to, possibly, get
> an empty triple store
 >     3. if both of these steps yield an empty document, then it will give
 > up extending its triple store and add an external link to its user
> interface.
 >
> My issue is simply the *efficiency* of step #2. A tool like tabulator
> already suffers from efficiency (o.k, it is a student project at MIT,
 > and stuff like that, but we should not underestimate the issue). If it
 > wants to be a tool recognizing RDFa content, I do not think it can do
> anything else than #2, but that may lead to a large amount of
> unnecessary parsing [there is a whole discussion on whether seeAlso is
 > the best tool for such effects, but let us leave that aside].
 >
> That is why I think a non-obligatory 'marker' is or may be useful. At
 > the moment, I do not see any other solution than a profile.
> Alternatively, people will use the GRDDL mechanism to retrieve RDFa and
> that is all what a tabulator would accept...
 >
> *I see the danger*, do not take me wrong! That would mean that, though
 > the marker is conditional, people would feel it necessary to have a tool
 > like tabulator work properly. But one could also have an optimization
> flag in tabulator, ie, it would do #2 above, but would switch to
> sniffing the profile first.
 >
> Clearly, we are brainstorming here, and I am trying to find a solution.
 > My immediate concern, as I guess is yours, too, is to provide for a
 > rapid mechanism for the usage of RDFa...
>
 > Cheers
>
 > Ivan
>
 > Mark Birbeck wrote:
> > Hi Ivan,
 > >
> > This is an interesting issue, and the problem has nothing to do with
 > > modules, XHTML 1.2, XHTML 2, or anything like that. The key difficulty
 > > is that RDFa has been specifically designed to beef up the metadata
> > features that HTML already has, and as a consequence, all HTML
> > documents are already RDFa-compliant.
 > >
> > Take something like this (in HTML):
 > >
> >  <head>
 > >    <title>My site</title>
> >    <link rel="next" href="...">
> >    <link rel="previous" href="...">
 > >  </head>
> >
 > > This tells us that the current document has 'next' and 'previous'
 > > documents, and is simple, standard, HTML. Now, there's no reason at
> > all why some processing software shouldn't store the following
> > information about that document:
 > >
> >  <> h:next <...> .
 > >  <> h:previous <...> .
> >
 > > Now we can use SPARQL to find all documents that refer to some other
> > document, and even documents that are the last in a chain. The fact
> > that this document 'contains' RDFa is down to the processor, and not
 > > down to the author--it's in the eye of the beholder :).
> >
> > I've said this before, and it's generally been met with the claim that
 > > we're 'hijacking' people's data. Hopefully, an example like this shows
 > > that we're certainly not 'forcing' documents to be RDFa when people
> > don't want them to be; what we're doing is saying that HTML documents
> > already have metadata, and RDFa defines some rules about how to treat
 > > that metadata from an RDF standpoint. The fact that these rules are
 > > entry-level RDFa is of course fortuitous, since it means that you can
> > also add far richer metadata later on.
> >
> > So, what I'm interested to hear is a use case for something that
> > indicates the presence of RDFa in a document. It's pointless having it
 > > *in* the document, since as I've shown, an RDFa parser is not going to
 > > 'fail' if it processes an HTML document with limited metadata, so you
> > could just process all documents that way.
> >
> > But you could say that we don't want to process such documents, and
> > therefore the indicator would need to be outside, since you want to
 > > save the cost of retrieval. (If you have to retrieve it to find out
 > > whether to process it, as Ben says you might as well just go ahead and
> > process it.) But then you're into the problem that FoaF has--how do
> > you bootstrap the whole thing? Do we maintain a list of
 > > RDFa-conformant documents?
> >
 > > To put this another way--indicating that a document 'is' RDFa is
 > > pointless, since all documents 'just are', which means that any
> > indicator we devise is only playing the role of pointing out that some
> > document was intended to be part of some community of
 > > specially-prepared documents that have been crafted to contain useful
 > > metadata. That may or may not be useful--I couldn't say, but I just
> > wanted to clarify that some stamp of approval is not the same as
> > indicating the class of the document (the latter being unnecessary).
 > >
> > All the best,
 > >
> > Mark
 > >
> >
 > > On 23/11/06, Ivan Herman <ivan@w3.org> wrote:
 > >
> >> Hm. If we want a quick usage and spread of RDFa, then this may not be
 > >> fully satisfactory at least in my view. Nobody knows when XHTML 1.2 will
 > >> be published as a Rec, let alone XHTML 2.0 (the group's charter has just
> >> been sent to the AC, ie, there is no group yet!). What happens in the
> >> meantime?
 > >>
> >> My hope is that the XHTML1.x RDFa module, as well as the final technical
 > >> spec, will be published way before the full XHTML1.2, and that we can
 > >> start using RDFa big time and quickly. Using a (possibly optional)
> >> profile tag might help that.
> >>
 > >> Of course, we could rely on GRDDL and, say, Fabien's XSLT script [as an
> >> aside: we should have a clear test set; Fabien's script, for example,
> >> does not produce the same result as Elias' one, I think there are
 > >> missing features...]. However, if I take an environment like Redland,
 > >> that means that it would have to go and execute an 'outsider' script
> >> every time it wants to retrieve RDFa content (which also means that it
> >> would not work off-line) whereas if it knew via a profile that this is
 > >> RDFa, it could parse the file right away and locally.
 > >>
> >> Bottomline: I am still not convinced:-(; and I do not see harm in
 > >> declaring a separate profile...
> >>
 > >> Ivan
> >>
 > >> Ben Adida wrote:
> >> > Ivan,
 > >> >
> >> > Sorry for the delayed response here.
> >> >
 > >> > RDFa is meant to be a natural part of XHTML. In other words,
> >> declaring a
 > >> > document to be XHTML 1.2 or 2.0 is enough to make a parser look for
 > >> > RDFa. This may be done by specifying a GRDDL profile in the XHTML 1.2
> >> > and 2.0 namespace documents.
> >> >
 > >> > Of course, parsers may choose to be more promiscuous than that and look
> >> > inside XHTML  1.1 and 1.0 if they so choose...
> >> >
 > >> > -Ben
> >> >
 > >> > Ivan Herman wrote:
> >> >
 > >> >>This may have been discussed before, in which case apologies. I have
> >> not
 > >> >>seen a reference to it in the latest draft.
> >> >>
> >> >>The question: how does one discover that an XHTML file is 'RDFa-d'? The
 > >> >>issue struck me as a result of some discussions lately around the
 > >> >>Tabulator[1] and Chris Bizer's announcement[2]. In both cases one can
> >> >>see engines that are able to make an indirect step, so to say; ie, they
> >> >>get a URI to a traditional site, but they can deduce the presence of a
 > >> >>corresponding RDF data which they can add to their graph they build and
 > >> >>explore. Examples are the <link references to RDF data, or the GRDDL
> >> >>profile.
> >> >>
 > >> >>Hence the question again: how does an automatic procedure 'know'
> >> that an
 > >> >>XHTML file contains RDFa encoded extra RDF data? Of course, a processor
 > >> >>could RDFa process *all* XHTML files it gets hold of, but it may be
> >> worth
 > >> >>adding some standard notification. Also, if such identification was
 > >> >>around, the same URI could be used both for human consumption and
> >> for an
 > >> >>RDFa-aware RDF environment.
> >> >>
> >> >>One would think of a profile attribute or is some sort of a special and
 > >> >>predefined <link>... whichever. Something would be good.
 > >> >>
> >> >>Any thoughts?
 > >> >>
> >> >>Ivan
 > >> >>
> >> >>
 > >> >>[1] http://dig.csail.mit.edu/breadcrumbs/node/165
 > >> >>[2] http://lists.w3.org/Archives/Public/semantic-web/2006Oct/0065.html
 > >> >>
> >> >
 > >> >
> >>
 > >> --
> >>
 > >> Ivan Herman, W3C Semantic Web Activity Lead
> >> URL:  http://www.w3.org/People/Ivan/
> >> PGP Key:  http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
> >> FOAF: http://www.ivan-herman.net/foaf.rdf
> >>
 > >>
> >>
 > >
> >
 >
> --
 >
> Ivan Herman, W3C Semantic Web Activity Lead
 > URL: http://www.w3.org/People/Ivan/
> PGP Key:  http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
>
>


--
Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
w: http://www.formsPlayer.com/
b: http://internet-apps.blogspot.com/

Download our XForms processor from
http://www.formsPlayer.com/
Received on Thursday, 7 December 2006 14:24:34 UTC