Re: HTML in RDF

Hi Michael,

> You mean something like http://esw.w3.org/topic/HtmlToRdf ?
>
> If so you may find useful stuff at
>
> http://simile.mit.edu/wiki/RDFizers
>
> (or even bug the SIMILE guys to write one for you ;)

I still don't think this captures it. Obviously I don't want to impose
my own views onto Bent's, but I have a feeling that we are actually
talking about the same thing. (Correct me if I'm wrong, Bent.)

To convert HTML to RDF would not simply be to take out the content in
the way that we have done with RDFa, or that SIMILE does via XSLT.
Instead, it would mean to actually convert the document's structure
_itself_ to RDF. For example, at the moment if an RDFa parser sees
this:

  <img src="http://...holiday.png" />

it will not generate anything. If it sees this:

  <img rel="foaf:depiction" src="http://...holiday.png" />

then it will generate a triple that establishes a relationship
between some object and a resource, stating that the resource is a
'depiction' of the object:

  <> foaf:depiction <http://...holiday.png> .

However, you could not round-trip the document from this information.
I.e., you could not take the triples and reconstruct the HTML
document, or construct some other document in a different language
(say DocBook) based on the higher-level metadata you've stored. For
that you would need to store triples that relate to the elements in
the mark-up. In other words, even the first example, with no @rel
value, would need to generate some triples, even if it's as simple as:

  <> xh:img <http://...holiday.png> .
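
To make the difference concrete, here is a rough Python sketch of the
idea. It is only an illustration under my own assumptions, not how any
RDFa parser behaves: the class and function names are made up, "<>"
stands for the current document, and I assume the structural predicate
is simply "xh:" plus the element name. Where @rel is present the
sketch uses it as the predicate; a fuller conversion might well emit
the structural triple too.

```python
from html.parser import HTMLParser


class StructureToTriples(HTMLParser):
    """Hypothetical converter: one triple per <img>, with or without @rel."""

    def __init__(self, subject="<>"):
        super().__init__()
        self.subject = subject  # "<>" means "this document"
        self.triples = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src is None:
            return
        # An explicit @rel supplies a content-level predicate; otherwise
        # fall back to a structural predicate named after the element
        # (assumed here to live under the xh: prefix).
        predicate = attributes.get("rel") or "xh:" + tag
        self.triples.append((self.subject, predicate, "<" + src + ">"))


def html_to_triples(markup):
    """Parse a fragment of mark-up and return the collected triples."""
    parser = StructureToTriples()
    parser.feed(markup)
    parser.close()
    return parser.triples


for s, p, o in html_to_triples('<img src="http://...holiday.png" />'):
    print(s, p, o, ".")
```

The point is only that the element without @rel still yields a
statement about the document, which is what you would need in order to
rebuild the mark-up later.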

Obviously this is a discussion for the future, but I believe this to
be an important scenario for the 'next generation web'...whatever that
is :). In short, if you could store the 'intent' of your mark-up
rather than the mark-up itself, then you would be insulated from
changes to the rendering language, it would be much easier to be
device-independent, you could deliver different versions of a document
to different audiences, and so on.

(Which is incidentally why I was keen to keep @role and @id out of the
_data_ level of RDFa, since they allow us to talk about the document
itself, separately from the document _content_.)

Regards,

Mark

-- 
  Mark Birbeck, formsPlayer

  mark.birbeck@formsPlayer.com | +44 (0) 20 7689 9232
  http://www.formsPlayer.com | http://internet-apps.blogspot.com

  standards. innovation.

Received on Thursday, 25 October 2007 08:55:15 UTC