Re: HTML 4 Profile for RDFa from Tom Morris on 2009-05-14 (public-html@w3.org from May 2009)

From: Tom Morris <tom@tommorris.org>
Date: Thu, 14 May 2009 22:02:07 +0100
To: Philip Taylor <pjt47@cam.ac.uk>
Cc: Sam Ruby <rubys@intertwingly.net>, Shane McCarron <shane@aptest.com>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <d375f00f0905141402g27d9ecfh8eda00d0e1874da7@mail.gmail.com>

On Thu, May 14, 2009 at 21:11, Philip Taylor <pjt47@cam.ac.uk> wrote:
> "If the object of a triple would be an XMLLiteral, and the input to the
> processor is not well-formed [XML]" - I don't understand what that means in
> an HTML context. Is it meant to mean something like "the bytes in the HTML
> file that correspond to the contents of the relevant element could be parsed
> as well-formed XML (modulo various namespace declaration issues)"? If so,
> that seems impossible to implement. The input to the RDFa processor will
> most likely be a DOM, possibly manipulated by the DOM APIs rather than
> coming straight from an HTML parser, so it may never have had a byte
> representation at all.
>
> Even without scripting, there isn't always a contiguous sequence of bytes
> corresponding to the content of an element. E.g. if the HTML input is:
>  <table>
>    <tr some-attributes-to-say-this-element-outputs-an-XMLLiteral>
>      <td> This text goes inside the table </td>
>      This text gets parsed to *outside* the table
>      <td> This text goes inside the table </td>
>    </tr>
>  </table>
> then (according to the HTML 5 parsing algorithm, and implemented in (at
> least) Firefox) the content of the <tr> element includes the first and third
> lines of text, but not the second. How would you decide whether the content
> is well-formed XML?
>
> For this to make sense in real HTML implementations, the definition should
> be in terms of the document layer rather than the byte layer. (The
> XMLLiteral should be an XML-fragment serialisation of the element, and some
> error handling (like ignoring the triple) would occur if it's impossible to
> serialise as XML, similar to the requirements in
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/the-xhtml-syntax.html#serializing-xhtml-fragments>)
>

Speaking as someone who has written an RDF/XML parser: an XMLLiteral
is stored by RDF libraries as a string that's just marked as an XML
literal.

So, if we had <span [...property/subject declaration...]
datatype="rdf:XMLLiteral"><a href="http://example.org/">Bla bla
bla</a></span>

This would become, in N3 and Turtle (N-Triples is the same idea, but
with full URIs and an escaped single-line string):
_:somesubject ex:someproperty """<a href="http://example.org/">Bla
bla bla</a>"""^^rdf:XMLLiteral .
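
To make the "string with a marker" point concrete, here is a minimal
Python sketch of how a library might hold such a value. The class name
Literal and the method to_turtle are made up for illustration; this is
not the API of any particular RDF library.

```python
from dataclasses import dataclass
from typing import Optional

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: a lexical string plus an optional datatype URI."""
    lexical: str
    datatype: Optional[str] = None

    def to_turtle(self) -> str:
        # Escape backslashes and double quotes so the long-string form
        # stays valid even if the markup itself ends with a quote.
        escaped = self.lexical.replace("\\", "\\\\").replace('"', '\\"')
        out = '"""' + escaped + '"""'
        if self.datatype:
            out += "^^<" + self.datatype + ">"
        return out


lit = Literal('<a href="http://example.org/">Bla bla bla</a>', RDF_XMLLITERAL)
print(lit.to_turtle())
```

Nothing in the object knows about XML structure: the markup is kept
as an opaque string and only the datatype says "this is XML".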

The point of XML Literals is basically that RDF/XML needn't escape
XML-like data. So, you can have constructions like:
<rdf:Description rdf:about="http://flickr.com/photos/blabla/1234">
  <xhtml_representation xmlns="http://example.org/" rdf:parseType="Literal">
    <img xmlns="http://www.w3.org/1999/xhtml"
src="http://flickr.com/blabla/1234.jpg" alt="Pretty picture of
somewhere nice" />
  </xhtml_representation>
</rdf:Description>

Without rdf:parseType="Literal" (which is what produces an
XMLLiteral), you'd basically have to encode it as a string (lots of
ampersand signs and lt, gt etc.) or use a CDATA section. An XMLLiteral
just has to be well-formed XML - that is, if you stuck
<?xml version="1.0" ?> before it, it'd load into an XML DOM.
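
For a feel of what the "encode it as a string" alternative looks like,
here is a small Python sketch using the standard library's
xml.sax.saxutils. The fragment is the img element from the example
above, joined onto one line; the variable names are just for
illustration.

```python
from xml.sax.saxutils import escape, unescape

fragment = (
    '<img xmlns="http://www.w3.org/1999/xhtml" '
    'src="http://flickr.com/blabla/1234.jpg" '
    'alt="Pretty picture of somewhere nice" />'
)

# What you would have to put inside a plain (non-XMLLiteral) property:
# every "<", ">" and "&" is replaced by an entity reference.
escaped = escape(fragment)
print(escaped)

# A consumer has to undo the escaping before it can treat it as markup.
restored = unescape(escaped)
```

The escaped form round-trips, but it is just text as far as the RDF
graph is concerned; nothing says it was ever markup.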

By "could be parsed as XML", what is meant is that a little chunk of
it could be inserted into an RDF/XML document in a property
declaration using rdf:parseType="Literal" without anything breaking. So
<property><br></property> breaks, but <property><br /></property>
doesn't (where property is short for a namespaced property declaration
with rdf:parseType="Literal").
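
That test can be sketched in a few lines of Python with the standard
library's ElementTree. The function name is_well_formed_fragment and
the un-namespaced wrapper element are my own illustrative choices; a
fragment that relies on namespace prefixes or named entities declared
elsewhere in the document would fail this naive check.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(fragment: str) -> bool:
    """Return True if the fragment parses when wrapped in a single element."""
    try:
        # The wrapper plays the role of the property element.
        ET.fromstring("<property>" + fragment + "</property>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed_fragment("<br>"))    # False
print(is_well_formed_fragment("<br />"))  # True
```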

Since HTML5 as text/html isn't XML, the simple solution is to
basically specify that RDFa in HTML doesn't use XMLLiterals. Or say
that authors may specify XMLLiteral datatypes but if the document
fragment doesn't smell like it could become an XMLLiteral very easily,
the RDFa parser should just treat it as an ordinary Literal string.
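
A rough Python sketch of that second option, assuming the processor
already has the element's content as a markup string. The function
name literal_for and the (string, datatype) return shape are
hypothetical, and keeping the raw markup as the plain literal's value
(rather than, say, the element's text content) is my assumption, not
something any spec says.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def literal_for(fragment: str,
                declared_datatype: Optional[str]) -> Tuple[str, Optional[str]]:
    """Pick the literal to emit: keep XMLLiteral only if the content parses."""
    if declared_datatype == RDF_XMLLITERAL:
        try:
            ET.fromstring("<wrapper>" + fragment + "</wrapper>")
        except ET.ParseError:
            # Not well-formed: fall back to an ordinary plain literal.
            return (fragment, None)
    return (fragment, declared_datatype)


print(literal_for("<br />", RDF_XMLLITERAL))
print(literal_for("<br>", RDF_XMLLITERAL))
```

The first call keeps the XMLLiteral datatype; the second drops it
because <br> on its own isn't well-formed.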

RDFa minus XML Literals is probably no bad thing. I can't see a
compelling use for XML Literals in RDFa anyhow. There's a reason why
there's a TODO line in my RDF library that says "the thought of XML
literals makes me want to wretch".

HTML WG members may wish to peruse this page regarding RDFa
and XML Literals:
http://www.w3.org/2006/07/SWD/wiki/RDFa/XMLLiteral

And also look at the RDFa test suite:
http://www.w3.org/2006/07/SWD/RDFa/testsuite/

-- 
Tom Morris
http://tommorris.org/

Received on Thursday, 14 May 2009 21:02:51 UTC