HTML datatype proposal (ISSUE-63) from Richard Cyganiak on 2012-05-10 (public-rdf-wg@w3.org from May 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 10 May 2012 01:32:47 +0100
To: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <1677B1B3-C8D2-4D5F-8186-A84BC89F7F0F@cyganiak.de>

See below for a proposal for an HTML datatype. The lexical space is all Unicode strings (HTML5 explains how to parse any gunk into a DOM tree). The value space is normalized DOM DocumentFragments, as in the new rdf:XMLLiteral. The lexical-to-value (L2V) mapping is HTML5's “fragment parsing algorithm”. The canonical mapping (from values to canonical lexical forms) is HTML5's “fragment serialization algorithm”.

Like the new rdf:XMLLiteral (and all other datatypes), this datatype is entirely optional.

Best,
Richard


== The rdf:HTML Datatype ==

RDF provides for HTML content as a possible literal value. This allows markup in literal values. Such content is indicated in an RDF graph using a literal whose datatype is the special built-in datatype rdf:HTML.

rdf:HTML is defined as follows.

=== An IRI denoting this datatype ===
is http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML.

=== The lexical space ===
is the set of Unicode strings.

=== The value space ===
is a set of DOM DocumentFragment nodes [DOM4:1]. Two DocumentFragment nodes A and B are considered equal if and only if the DOM method A.isEqualNode(B) [DOM4:2] returns true.
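
To illustrate the equality rule, here is a minimal Python sketch of an isEqualNode-style comparison. It is not the DOM4 definition: the Text and Element classes and the functions is_equal_node and fragments_equal are hypothetical names made up for this example, and namespaces, comments and other node types are left out. A fragment is modelled simply as a list of child nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Text:
    data: str


@dataclass
class Element:
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


# Hypothetical, heavily reduced node model (no namespaces, no comments).
Node = Union[Text, Element]


def is_equal_node(a: Node, b: Node) -> bool:
    """Structural comparison in the spirit of DOM's isEqualNode."""
    if type(a) is not type(b):
        return False
    if isinstance(a, Text):
        return a.data == b.data
    # Attributes are compared as a set of name/value pairs, so order is irrelevant.
    if a.name != b.name or a.attrs != b.attrs:
        return False
    # Children must match pairwise and in order.
    if len(a.children) != len(b.children):
        return False
    return all(is_equal_node(x, y) for x, y in zip(a.children, b.children))


def fragments_equal(a: List[Node], b: List[Node]) -> bool:
    """Compare two fragments, each given as its list of child nodes."""
    return len(a) == len(b) and all(is_equal_node(x, y) for x, y in zip(a, b))
```

In this sketch, two fragments that differ only in the order in which attributes were written compare as equal, while any difference in text content or child order makes them unequal.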

=== The lexical-to-value mapping ===
is defined as:

1. Let domnodes be the list of DOM nodes [DOM4:3] that result from applying the HTML fragment parsing algorithm [HTML5:1] to the literal's lexical form, without a context element.
2. Let domfrag be a DOM DocumentFragment [DOM4:1] whose childNodes attribute is equal to domnodes.
3. Call domfrag.normalize() [DOM4:4] and return domfrag.
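
The three steps can be sketched roughly as follows in Python. This is only an approximation under loud assumptions: the standard library's html.parser is NOT an implementation of the HTML5 fragment parsing algorithm (no error recovery rules, no implied tags, comments are dropped), and xml.dom.minidom stands in for a DOM4 implementation. The names lexical_to_value, _FragmentBuilder and VOID are hypothetical.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# Elements that never have content or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _FragmentBuilder(HTMLParser):
    """Simplified stand-in for the HTML5 fragment parser (assumption)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.doc = Document()
        self.fragment = self.doc.createDocumentFragment()
        self.stack = [self.fragment]  # open elements; index 0 is the fragment

    def handle_starttag(self, tag, attrs):
        element = self.doc.createElement(tag)
        for name, value in attrs:
            element.setAttribute(name, value or "")
        self.stack[-1].appendChild(element)
        if tag not in VOID:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].appendChild(self.doc.createTextNode(data))


def lexical_to_value(lexical_form: str):
    # Steps 1 and 2: parse the lexical form and collect the nodes in a fragment.
    builder = _FragmentBuilder()
    builder.feed(lexical_form)
    builder.close()
    fragment = builder.fragment
    # Step 3: merge adjacent text nodes and drop empty ones (in place).
    fragment.normalize()
    return fragment
```

Because every Unicode string is in the lexical space, the function accepts any input: a string without markup yields a fragment with a single text node, and the empty string yields an empty fragment.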

=== The canonical mapping ===
defines a canonical lexical form [XMLSCHEMA11-2:1] for each member of the value space. The rdf:HTML canonical mapping is the HTML fragment serialization algorithm [HTML5:2].
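
As a rough illustration of the canonical mapping, the sketch below serializes the children of a fragment back to a string. It is a simplified approximation, not the HTML5 fragment serialization algorithm itself: it handles only element and text nodes, and it skips the special cases for raw-text elements such as script and style. It assumes an xml.dom.minidom fragment as input; serialize_children, _escape and VOID are hypothetical names.

```python
from xml.dom.minidom import Document, Node

# Elements serialized without content or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def _escape(text: str, attribute: bool = False) -> str:
    # "&" must be replaced first so later replacements are not double-escaped.
    text = text.replace("&", "&amp;").replace("\u00a0", "&nbsp;")
    if attribute:
        return text.replace('"', "&quot;")
    return text.replace("<", "&lt;").replace(">", "&gt;")


def serialize_children(parent) -> str:
    """Serialize the child nodes of a fragment or element (simplified)."""
    out = []
    for node in parent.childNodes:
        if node.nodeType == Node.TEXT_NODE:
            out.append(_escape(node.data))
        elif node.nodeType == Node.ELEMENT_NODE:
            attrs = "".join(
                f' {name}="{_escape(value, attribute=True)}"'
                for name, value in node.attributes.items()
            )
            out.append(f"<{node.tagName}{attrs}>")
            if node.tagName not in VOID:
                out.append(serialize_children(node))
                out.append(f"</{node.tagName}>")
        # Other node types (comments etc.) are ignored in this sketch.
    return "".join(out)
```

The point of a canonical form is that lexical forms which differ only in surface syntax (for example, in how characters are escaped) but map to the same value share one canonical serialization.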

NOTE: Any language annotation desired in the HTML content must be included explicitly in the HTML literal (@lang="…").

NOTE: RDF applications may use additional equivalence relations, such as that which relates an xsd:string with an rdf:HTML literal corresponding to a single text node of the same string.
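
One such optional equivalence check could look like the following Python sketch. It assumes the rdf:HTML value is available as an xml.dom.minidom fragment that has already been normalized; equivalent_to_string is a hypothetical name, and nothing in the proposal requires applications to implement this.

```python
from xml.dom.minidom import Document, Node


def equivalent_to_string(fragment, text: str) -> bool:
    """True if the fragment consists of exactly one text node equal to text.

    Limitation of this sketch: a normalized fragment never contains an
    empty text node, so the empty string is never matched here.
    """
    children = fragment.childNodes
    return (
        len(children) == 1
        and children[0].nodeType == Node.TEXT_NODE
        and children[0].data == text
    )
```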

== References ==
[DOM4:1] http://www.w3.org/TR/dom/#interface-documentfragment
[DOM4:2] http://www.w3.org/TR/dom/#dom-node-isequalnode
[DOM4:3] http://www.w3.org/TR/dom/#node
[HTML5:1] http://www.w3.org/TR/html5/the-end.html#parsing-html-fragments
[DOM4:4] http://www.w3.org/TR/dom/#dom-node-normalize
[HTML5:2] http://www.w3.org/TR/html5/the-end.html#serializing-html-fragments
[XMLSCHEMA11-2:1] http://www.w3.org/TR/xmlschema11-2/#dt-canonical-mapping

Received on Thursday, 10 May 2012 00:33:18 UTC