Re: HTML in RDF

Hi Bent,

Although we're going in similar directions, I was suggesting something
different to what you have here. For me it's not so much about
preserving the entire document as RDF--although there are many
advantages to doing that--but about obtaining the 'meaning' of the
document so that it can be used in a number of ways. I think the
examples become clearer when you bring @role in, although you can do
the same thing without it.

The whole purpose of @role is to give an indication of the _purpose_
of some mark-up. So your example might be recast:

  <html>
    <head>
      <title>unification</title>
      <script role="my:welcome">alert('unified, at last')</script>
    </head>
    <body>
      <p>It has to be possible!</p>
      <p role="menu">It just
        <a href="imperative.html">
          <em>has</em>
        </a>
        to be!
      </p>
    </body>
  </html>

This tells me that we have a document with a title, a script whose
purpose is to show a welcome message, and two blocks of information:
the main body of text and the main menu.
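To make that reading concrete, here is a rough Python sketch (standard
library only) that pulls the role-bearing parts out of the mark-up
above. The helper names extract_roles and text_of are made up for
illustration; this is one possible way to get at the 'meaning', not a
defined algorithm.

```python
import xml.etree.ElementTree as ET

# The example document from above, with the link written on one line.
DOC = """<html>
  <head>
    <title>unification</title>
    <script role="my:welcome">alert('unified, at last')</script>
  </head>
  <body>
    <p>It has to be possible!</p>
    <p role="menu">It just
      <a href="imperative.html"><em>has</em></a>
      to be!
    </p>
  </body>
</html>"""


def extract_roles(markup):
    """Map each role value to the element that carries it (hypothetical helper)."""
    root = ET.fromstring(markup)
    return {el.get("role"): el for el in root.iter() if el.get("role")}


def text_of(el):
    """Flatten an element's text content and collapse runs of whitespace."""
    return " ".join("".join(el.itertext()).split())


roles = extract_roles(DOC)
title = ET.fromstring(DOC).findtext("head/title")

for role, el in roles.items():
    print(role, "->", el.tag, ":", text_of(el))
```

The point is only that the roles, not the element positions, are what
we look up.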

Now, if we began with that metadata, rather than beginning with the
mark-up, we could end up with many different serialisations without
changing the end result. For example, since we know which part of the
mark-up is the main menu, we no longer care in what order the document
is serialised: the menu could come before or after the "It has to be
possible" paragraph.
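As a sketch of what I mean by order not mattering, the Python below
starts from a small metadata dictionary and serialises it with the menu
either last or first, then reads the 'meaning' back out of both. The
dictionary keys and the function names are hypothetical, and the menu is
simplified to a single link.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata; the keys are illustrative, not a vocabulary.
META = {
    "title": "unification",
    "main": "It has to be possible!",
    "menu": ("It just has to be!", "imperative.html"),
}


def serialise(meta, menu_first=False):
    """Build one possible HTML serialisation of the metadata."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = meta["title"]
    body = ET.SubElement(html, "body")

    def add_main():
        ET.SubElement(body, "p").text = meta["main"]

    def add_menu():
        text, href = meta["menu"]
        p = ET.SubElement(body, "p", {"role": "menu"})
        ET.SubElement(p, "a", {"href": href}).text = text

    steps = (add_menu, add_main) if menu_first else (add_main, add_menu)
    for step in steps:
        step()
    return ET.tostring(html, encoding="unicode")


def meaning(markup):
    """Recover the metadata from a serialisation, ignoring document order."""
    root = ET.fromstring(markup)
    menu = root.find(".//p[@role='menu']")
    main = [p for p in root.iter("p") if p.get("role") is None][0]
    link = menu.find("a")
    return {
        "title": root.findtext("head/title"),
        "main": main.text,
        "menu": (link.text, link.get("href")),
    }


menu_last = serialise(META)
menu_first = serialise(META, menu_first=True)
```

The two strings differ, but meaning() gives the same dictionary for
both.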

Another example: since we know the purpose of the script (it is a
welcome message), we know that when targeting a device that supports
XForms, for example, we could send this instead:

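  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:xf="http://www.w3.org/2002/xforms"
        xmlns:ev="http://www.w3.org/2001/xml-events">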
    <head>
      <title>unification</title>
      <xf:model>
        <xf:message ev:event="xforms-ready" role="my:welcome">
          unified, at last
        </xf:message>
      </xf:model>
    </head>
    .
    .
    .
  </html>

As you can see, we haven't changed the 'meaning' of our original
metadata, but the resulting mark-up is very different.
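A minimal Python sketch of that choice, assuming a hypothetical
welcome_markup function and a simple yes/no flag for XForms support
(a real system would need something richer to describe the target):

```python
import json
from xml.sax.saxutils import escape


def welcome_markup(text, supports_xforms):
    """Serialise the 'my:welcome' intent for one kind of target device.

    Hypothetical helper; assumes a plain-text message.
    """
    if supports_xforms:
        return (
            "<xf:model>\n"
            '  <xf:message ev:event="xforms-ready" role="my:welcome">'
            + escape(text)
            + "</xf:message>\n"
            "</xf:model>"
        )
    # json.dumps yields a double-quoted JavaScript string literal; the
    # replace stops a "</script>" inside the text from ending the element.
    literal = json.dumps(text).replace("</", "<\\/")
    return '<script role="my:welcome">alert(' + literal + ")</script>"


print(welcome_markup("unified, at last", supports_xforms=False))
print(welcome_markup("unified, at last", supports_xforms=True))
```

Both outputs carry the same role, so the purpose survives whichever
mark-up is chosen.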

Another example would be to consider a very small device that cannot
show both the main document and the menu at the same time, so instead
we have to split it into two documents:

main.html:

  <html>
    <head>
      <title>unification</title>
      <script role="my:welcome">alert('unified, at last')</script>
    </head>
    <body>
      <p>It has to be possible!</p>
      [<a href="menu.html">MENU</a>]
    </body>
  </html>

menu.html:

  <html>
    <head>
      <title>Menu</title>
    </head>
    <body>
      <p>It just
        <a href="imperative.html">
          <em>has</em>
        </a>
        to be!
      </p>
      [<a href="main.html">BACK</a>]
    </body>
  </html>

And so on.
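The split itself could be sketched like this in Python. The function
names are hypothetical, and the main and menu blocks are passed in as
ready-made mark-up fragments to keep the example short:

```python
from xml.sax.saxutils import escape


def page(title, body_parts, head_extra=""):
    """Wrap mark-up fragments in a bare HTML page (hypothetical helper)."""
    lines = ["<html>", "  <head>", "    <title>%s</title>" % escape(title)]
    if head_extra:
        lines.append("    " + head_extra)
    lines += ["  </head>", "  <body>"]
    lines += ["    " + part for part in body_parts]
    lines += ["  </body>", "</html>"]
    return "\n".join(lines)


def split_for_small_screen(title, main_block, menu_block, head_extra=""):
    """Return {filename: markup}, with the menu moved to its own page."""
    return {
        "main.html": page(
            title, [main_block, '[<a href="menu.html">MENU</a>]'], head_extra
        ),
        "menu.html": page(
            "Menu", [menu_block, '[<a href="main.html">BACK</a>]']
        ),
    }


pages = split_for_small_screen(
    "unification",
    "<p>It has to be possible!</p>",
    '<p>It just <a href="imperative.html"><em>has</em></a> to be!</p>',
    head_extra="<script role=\"my:welcome\">alert('unified, at last')</script>",
)
for name, markup in pages.items():
    print(name + ":\n" + markup + "\n")
```

This only works because we know which block is the menu; position in
the source document tells us nothing.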

The key point is that providing an exact serialisation is less useful,
I think, than capturing the 'intent' of the original document so that
it can be reused in different circumstances.

This also means that we need to store far less. To illustrate, let's
reduce the document to a minimum:

  <html>
    <head>
      <title>unification</title>
    </head>
    <body>
    </body>
  </html>

The only thing we need to store here is this:

  <> dc:title "unification" .

The 'knowledge' of the hierarchy isn't needed until you get to the
process of reconstruction, which is where the <head> and <body> parts
of HTML come back in. Note also that getting from this metadata to
DocBook, or even SVG, would be just as easy.
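Here is a small Python sketch of that reconstruction step, assuming the
triple is held as a plain tuple with "" standing in for <>. The function
names are made up, and the DocBook and SVG outputs are bare skeletons
rather than complete, valid documents:

```python
from xml.sax.saxutils import escape

DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# The single stored triple; "" stands in for <> (this document).
TRIPLES = [("", DC_TITLE, "unification")]


def title_of(triples, subject=""):
    """Return the dc:title of the subject, or None if there isn't one."""
    for s, p, o in triples:
        if s == subject and p == DC_TITLE:
            return o
    return None


def to_html(title):
    # The <head>/<body> hierarchy only appears here, at reconstruction time.
    return ("<html><head><title>%s</title></head><body></body></html>"
            % escape(title))


def to_docbook(title):
    return "<article><title>%s</title></article>" % escape(title)


def to_svg(title):
    return ('<svg xmlns="http://www.w3.org/2000/svg"><title>%s</title></svg>'
            % escape(title))


t = title_of(TRIPLES)
for render in (to_html, to_docbook, to_svg):
    print(render(t))
```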

Of course, you could ask why we don't just store the document as XML
and then use XSLT to create the other serialisations, and that is
certainly a possibility. But getting the document into RDF is exciting
because you can then apply an enormous range of tools to your
information, and start to analyse and process your documents in many
different ways. (Not to mention the fact that with triples you don't
even need to have such rigid notions of where a document begins and
ends.)
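As a toy illustration of that last point, the Python below pools the
triples from two documents into one set and asks a question across
both. Everything here is an assumption for the sake of the example: the
example.org URIs, the second document, and the role predicate are all
invented, and match() is a stand-in for a real query language.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
ROLE = "urn:example:role"  # hypothetical predicate, not a real vocabulary


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Two invented documents, each reduced to a handful of triples.
doc_a = {
    ("http://example.org/main", DC_TITLE, "unification"),
    ("http://example.org/main#menu", ROLE, "menu"),
}
doc_b = {
    ("http://example.org/other", DC_TITLE, "imperative"),
    ("http://example.org/other#nav", ROLE, "menu"),
}

# Once merged, nothing in the graph marks where one document ends.
graph = doc_a | doc_b

menus = sorted(t[0] for t in match(graph, p=ROLE, o="menu"))
titles = sorted(t[2] for t in match(graph, p=DC_TITLE))
print(menus)
print(titles)
```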

Regards,

Mark


On 28/10/2007, Bent Rasmussen <incredibleshrinkingsphere@gmail.com> wrote:
>
>
> All right, I wasn't quite aware of the Turtle/N3 semantics and made some
> assumptions, so I'll take a second attempt at the problem after having read
> up a bit. Hopefully this is more in line with the semantics.
>
> First, the example HTML document.
>
> <html>
> <head>
> <title>unification</title>
> <script>alert('unified, at last')</script>
> </head>
> <body><p>It has to be possible!</p><p>It just <a href="imperative.html"><em>has</em></a> to be!</p></body>
> </html>
>
> Then, the Turtle interpretation
>
> @prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix h:      <urn:ns:html#> .
>
> _:html
>   rdf:type h:html ;
>   rdf:value (
>     [ rdf:type h:head ;
>       rdf:value (
>         [ rdf:type h:htitle ;
>           rdf:value (
>             "unification"
>           )
>         ]
>         [ rdf:type h:script ;
>           rdf:value (
>             "alert('unified, at last')"
>           )
>         ]
>       )
>     ]
>     [ rdf:type h:body ;
>       rdf:value (
>         [ rdf:type h:p ;
>           rdf:value (
>             "It has to be possible!"
>           )
>         ]
>         [ rdf:type h:p ;
>           rdf:value (
>             "It just "
>             [ rdf:type h:a ;
>               h:href "imperative.html" ;
>               rdf:value (
>                 [ rdf:type h:em ;
>                   rdf:value "has"
>                 ]
>               )
>             ]
>             " to be!"
>           )
>         ]
>       )
>     ]
>   ) .
>
> And the raw triples
>
> _:html <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:ns:html#html> .
> _:bnode0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:ns:html#head> .
> _:bnode1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:ns:html#htitle> .
> _:bnode2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "unification" .
> _:bnode2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
> _:bnode1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> _:bnode2 .
> _:bnode3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode1 .
> _:bnode4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:ns:html#script> .
> _:bnode5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "alert('unified, at last')" .
> _:bnode5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
> _:bnode4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> _:bnode5 .
> _:bnode3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:bnode6 .
> _:bnode6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode4 .
> _:bnode6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
> _:bnode0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> _:bnode3 .
> _:bnode7 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode0 .
> _:bnode8 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:ns:html#body> .
> _:bnode9 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:ns:html#p> .
> _:bnode10 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "It has to be possible!" .
> _:bnode10 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
> _:bnode9 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> _:bnode10 .
> _:bnode11 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode9 .
> _:bnode12 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:ns:html#p> .
> _:bnode13 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "It just " .
> _:bnode14 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:ns:html#a> .
> _:bnode14 <urn:ns:html#href> "imperative.html" .
> _:bnode15 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:ns:html#em> .
> _:bnode15 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "has" .
> _:bnode16 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode15 .
> _:bnode16 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
> _:bnode14 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> _:bnode16 .
> _:bnode13 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:bnode17 .
> _:bnode17 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode14 .
> _:bnode17 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:bnode18 .
> _:bnode18 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> " to be!" .
> _:bnode18 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
> _:bnode12 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> _:bnode13 .
> _:bnode11 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:bnode19 .
> _:bnode19 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode12 .
> _:bnode19 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
> _:bnode8 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> _:bnode11 .
> _:bnode7 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:bnode20 .
> _:bnode20 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode8 .
> _:bnode20 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
> _:html <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> _:bnode7 .
>
> Thanks to Joshua Tauberer's RDF/XML/N3 validation service.
>
> I'm not so sure how optimal this model is, but at least it appears to me to
> be a feasible model. The structure being
>
> - rdf:value for element content
> - rdf:List for node ordering
> - rdf:Property subproperty instances for attributes
>
> Somehow I suspect the mechanism could be generalized for XML.
>
> I'm not sure how easy this is to deal with (query). It's certainly verbose,
> but I'm not that concerned with verbosity, I'm more interested in
> expressiveness and how easy it is to query it - lists and such. Next step...
>
> Comments?
>
> Regards
>
> Bent
>
>


-- 
  Mark Birbeck, formsPlayer

  mark.birbeck@formsPlayer.com | +44 (0) 20 7689 9232
  http://www.formsPlayer.com | http://internet-apps.blogspot.com

  standards. innovation.

Received on Monday, 29 October 2007 10:57:45 UTC