RDF-In-XHTML; A "New" Approach

Abstract: This is an approach for embedding page metadata in HTML in such a
way as to be backwards-compatible and specification-compliant, but so that
there is an unambiguous mapping into RDF, with tools (in XSLT) to do so.

It occurred to me that <meta/> elements can be thought of as triples with
the page URI as a subject, the "name" attribute as a predicate, and the
"content" attribute as a literal object. Likewise, <link/> can be thought
of as a triple with the page as a subject, "rel" as a predicate, and "href"
as the URI Reference for the object.

Of course @name and @rel aren't grounded in URI space... but the HTML 4.01
specification does mention that the profile attribute on the <head> element
can be used to set out unique profiles:-

[[[
The profile attribute of the HEAD specifies the location of a meta data
profile. The value of the profile attribute is a URI.
]]] - http://www.w3.org/html401/struct/global#profiles

It doesn't mention that properties are formed by directly concatenating
the profile with the meta @name and link @rel attributes... but it doesn't
forbid it, either. In other words, it does not constrain one particularly
as to how the profile attribute should be used (it doesn't rule out the
mechanism that I am "proposing").

Also, note that "URI" in the quote above is a typo for "URI Reference" (the
HTML 4.01 specification does that a *lot*; it doesn't even officially allow
URI References in hyperlinks!), so there's no problem there. @@ Is this in
the HTML 4.01 errata? It should be.

If we used this mechanism, it would quite easily allow us to derive RDF
from XHTML. For example:-

   [...]
   <head profile="http://example.org/#">
   <meta name="myProp" content="My Object"/>
   <link rel="myOtherProp" href="http://myuri.net/"/>
   </head>
   [...]

gets converted into:-

   @prefix : <http://example.org/#> .
   this :myProp "My Object" .
   this :myOtherProp <http://myuri.net/> .

or, rather:-

   @prefix : <http://example.org/#> .
   <mypage> :myProp "My Object" .
   <mypage> :myOtherProp <http://myuri.net/> .
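
As a rough illustration of the mapping above (separate from the XSLT further
down), here is a minimal Python sketch using only the standard library. The
function name head_to_triples and the page URI http://example.org/mypage are
assumptions made for the example; the sample page uses the "content"
attribute on <meta/>, as described at the top of this message. Pages without
a profile attribute yield no triples.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def head_to_triples(xhtml_text, page_uri):
    """Return (subject, predicate, object, is_literal) tuples for <head>.

    Hypothetical helper: predicates are formed by concatenating the
    profile URI with the meta @name or link @rel value.
    """
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML + "head")
    if head is None or head.get("profile") is None:
        return []  # no profile declared, so no mapping is attempted
    profile = head.get("profile")
    triples = []
    for meta in head.iter(XHTML + "meta"):
        if meta.get("name") is not None:
            triples.append((page_uri, profile + meta.get("name"),
                            meta.get("content", ""), True))
    for link in head.iter(XHTML + "link"):
        if link.get("rel") is not None:
            triples.append((page_uri, profile + link.get("rel"),
                            link.get("href", ""), False))
    return triples


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://example.org/#">
<title>Example</title>
<meta name="myProp" content="My Object"/>
<link rel="myOtherProp" href="http://myuri.net/"/>
</head><body/></html>"""

# Print the triples in an N3-like form (sketch only: literals are not escaped).
for s, p, o, is_literal in head_to_triples(SAMPLE, "http://example.org/mypage"):
    obj = '"%s"' % o if is_literal else "<%s>" % o
    print("<%s> <%s> %s ." % (s, p, obj))
```

The sketch takes the first whole rel value as-is, so space-separated rel
values are not handled here.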

O.K., so what about when no profile is set? What is the default profile for
XHTML? As mentioned before, the HTML specification is very unrestrictive
about what the profile attribute is used for, so the lack of a profile
attribute simply falls under the set of cases whereby we cannot map
directly onto RDF. In other words, this mechanism has to be explicitly set
up; e.g. the XSLT stylesheet that I came up with (below) only works for
pages which have their profile attribute explicitly declared.

As far as multiple rel attribute values go (they are space separated
NMTOKENS), I decided that "alternate stylesheet" would be converted to
"alternateStylesheet", and that apart from that, the stylesheet only
processes the first value. This is a limitation of the stylesheet... if
someone could modify it to somehow grok multiple rel values, then I'd be
grateful. Note that the value "alternate" has a special meaning, which is
what makes it a little difficult.
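
One possible way to handle multiple rel values, sketched in Python rather
than XSLT: split on whitespace, fold the "alternate" + "stylesheet" pair into
"alternateStylesheet", and treat every remaining token as its own property.
This goes beyond what the stylesheet in this message does (it only looks at
the first value), and the function name rel_properties is hypothetical.

```python
def rel_properties(rel):
    """Split a space-separated rel value into property local names.

    Assumption for this sketch: "alternate" and "stylesheet" appearing
    together are merged into the single name "alternateStylesheet";
    any other tokens are kept as separate properties.
    """
    tokens = rel.split()
    if "alternate" in tokens and "stylesheet" in tokens:
        rest = [t for t in tokens if t not in ("alternate", "stylesheet")]
        return ["alternateStylesheet"] + rest
    return tokens


for value in ("stylesheet", "alternate stylesheet", "next chapter"):
    print(value, "=>", rel_properties(value))
```

Each returned name would then be concatenated with the profile URI in the
same way as a single rel value.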

Of course, this presumes that XHTML link types carry on over even when new
profiles are set. The specification doesn't mention anything about this...
For example, it doesn't mention if:-

   [...]<head>
   <link rel="stylesheet" href="style.css"/>[...]

is the same as:-

   [...]<head profile="http://example.org/#">
   <link rel="stylesheet" href="style.css"/>[...]

I'll bet that all of the XHTML implementations treat it that way:
experiments with IE and Netscape confirm this. Thankfully, the stylesheet is
neutral towards this issue, but XHTML instances won't be. Perhaps names
reserved by the HTML specification should be mapped into the XHTML
namespace? Dunno.

The next issue is what to do about type attributes on <link/>. That's an
easy one to solve:-

   <link rel="stylesheet" type="text/css" href="style.css"/>

goes to:-

   <mypage> :stylesheet <style.css> .
   <style.css> dc:format "text/css" .
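
The same rule can be sketched in Python. The helper name link_to_triples and
the page URI are assumptions for illustration; resolving the relative href
against the page URI with urljoin is also a choice made for this sketch (the
stylesheet below emits the href unchanged).

```python
from urllib.parse import urljoin

# Dublin Core "format" property, written out as a full URI.
DC_FORMAT = "http://purl.org/dc/elements/1.1/format"


def link_to_triples(page_uri, profile, rel, href, media_type=None):
    """Map one <link/> to triples; add dc:format when @type is present."""
    target = urljoin(page_uri, href)  # resolve relative references
    triples = [(page_uri, profile + rel, target)]
    if media_type is not None:
        triples.append((target, DC_FORMAT, media_type))
    return triples


for triple in link_to_triples("http://example.org/mypage",
                              "http://example.org/#",
                              "stylesheet", "style.css", "text/css"):
    print(triple)
```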

There's also an issue with the "scheme" attribute on <meta/>, but I'll skip
that for now. Here's what I have for the stylesheet so far:-

<t:stylesheet
    xmlns="http://www.w3.org/1999/xhtml" version="1.0"
    xmlns:t="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" >

<!-- This converts XHTML into RDF
   <meta name="a" content="b"/> => this :a "b" .
   <link rel="x" type="y" href="z"/> => this :x <z> . <z> dc:format "y" .
-->

<t:param name="xmlfile"/>

<t:template match="/">
   <rdf:RDF>
       <t:apply-templates/>
   </rdf:RDF>
</t:template>

<t:template match="/h:html/h:head[@profile]">
<rdf:Description rdf:about="{$xmlfile}">
<t:variable name="profile" select="(@profile)"/>

<t:for-each select=".//h:meta[@name]">
 <t:element name="{@name}" namespace="{$profile}">
   <t:value-of select="(@content)"/>
 </t:element>
</t:for-each>

<t:for-each select=".//h:link[@rel]">
<!-- "rel" is bound inside each branch below; binding it here as well would
     shadow a local variable, which XSLT 1.0 treats as an error -->
<t:choose>
 <t:when test="contains(@rel,' ')">
   <t:choose>
     <t:when test="(@rel) = 'alternate stylesheet'">
      <t:variable name="rel" select="'alternateStylesheet'"/>
         <t:call-template name="outputrel"><t:with-param
          name="rel" select="$rel"/><t:with-param
          name="profile" select="$profile"/></t:call-template>
     </t:when>
     <t:otherwise>
      <t:variable name="rel" select="substring-before(@rel,' ')"/>
        <t:call-template name="outputrel"><t:with-param
          name="rel" select="$rel"/><t:with-param
          name="profile" select="$profile"/></t:call-template>
     </t:otherwise>
   </t:choose>
 </t:when>
 <t:otherwise>
   <t:variable name="rel" select="(@rel)"/>
   <t:call-template name="outputrel"><t:with-param
     name="rel" select="$rel"/><t:with-param
     name="profile" select="$profile"/></t:call-template>
   <!-- Heh, so much like outputrel(rel, profile) in Python -->
  </t:otherwise>
</t:choose>
</t:for-each>
</rdf:Description>

<t:for-each select=".//h:link">
<t:if test="(@type)">
 <rdf:Description rdf:about="{@href}" dc:format="{@type}"/>
</t:if>
</t:for-each>
</t:template>

<t:template name="outputrel">
<t:param name="rel"/>
<t:param name="profile"/>
<t:element name="{$rel}" namespace="{$profile}">
 <t:attribute name="rdf:resource"><t:value-of
   select="(@href)"/></t:attribute>
</t:element>
</t:template>

<!-- Don't pass text through -->
<t:template match="text()|@*">
</t:template>
</t:stylesheet>

If I write this experiment up, I shall do so at [1]. For now, I'll just
make it redirect to this email, or something. I'll save the above
stylesheet to [2], so that people can play around with it using the W3C's
XSLT service [3].

Summary: the HTML specification does not go into great detail about the
"profile" mechanism, and hence does not constrain how one uses it. Because
of that, it is fine to say that for certain profiles, one can use a
mechanism (the one outlined above) to map directly to RDF. There can
obviously be no repository of the profiles that use this mechanism, so
you'll have to sniff for that somehow. I'm guessing that this can be used
more as something from the author's point of view (by linking to a
conversion), than the user (by trying to apply conversions to every site
that one visits). Therefore, this can be looked upon as a tool that enables
one to add backwards-compatible and specification-compliant metadata to
XHTML that maps unambiguously to RDF.

P.S. I know that things *like* this have been proposed before, but I don't
think that there has ever been a mechanism exactly like this. My apologies
if there has been.

P.P.S. +BCC to www-rdf-logic because this is in "response" to the "Re: XML
Serialization" thread. I felt that this subject was more appropriate for
www-rdf-interest, however, so I have sent the mail there.

[1] http://infomesh.net/2001/08/rdfinxhtml/
[2] http://infomesh.net/2001/08/rdfinxhtml/test.xsl
[3] http://www.w3.org/2001/05/xslt

--
Kindest Regards,
Sean B. Palmer
@prefix : <http://webns.net/roughterms/> .
:Sean :hasHomepage <http://purl.org/net/sbp/> .

Received on Thursday, 30 August 2001 22:17:37 UTC