Re: HTML datatype proposal (ISSUE-63)

Ivan,

On 10 May 2012, at 09:28, Ivan Herman wrote:
> http://www.w3.org/TR/html5/the-end.html#serializing-html-fragments
> 
> does not seem to be a 100% canonicalization algorithm. What I see right away is that it does not say anything about the order in which attributes should appear in the element (C14N requires them to be in alphabetical order).

Good catch. Sorry, I should have noticed that myself.

> As a consequence, I would propose not to define any canonical form at all in this case, perhaps adding a note that if the HTML5 community resolves that issue, this datatype might adopt their solution.

That's ok with me.

Best,
Richard



> 
> Ivan
> 
> 
> On May 10, 2012, at 02:32, Richard Cyganiak wrote:
> 
>> See below for a proposal for an HTML datatype. The lexical space is all Unicode strings (HTML5 explains how to parse any gunk into a DOM tree). The value space is normalized DOM DocumentFragments, like in the new rdf:XMLLiteral. The L2V mapping is HTML5's “fragment parsing algorithm”. The canonical mapping (from values to canonical lexical forms) is HTML5's “fragment serialization algorithm”.
>> 
>> Like the new rdf:XMLLiteral (and all other datatypes), this datatype is entirely optional.
>> 
>> Best,
>> Richard
>> 
>> 
>> == The rdf:HTML Datatype ==
>> 
>> RDF provides for HTML content as a possible literal value. This allows markup in literal values. Such content is indicated in an RDF graph using a literal whose datatype is a special built-in datatype rdf:HTML.
>> 
>> rdf:HTML is defined as follows.
>> 
>> === An IRI denoting this datatype ===
>> is http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML.
>> 
>> === The lexical space ===
>> is the set of Unicode strings.
>> 
>> === The value space ===
>> is a set of DOM DocumentFragment nodes [DOM4:1]. Two DocumentFragment nodes A and B are considered equal if and only if the DOM method A.isEqualNode(B) [DOM4:2] returns true.
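Below is a minimal Python sketch of the equality test above. It does not use a real DOM. It assumes a hypothetical, simplified node model made of Text, Comment, Element and Fragment dataclasses. It follows the isEqualNode rules as we read DOM4, with simplifications: namespace prefixes are ignored, and so are node types an HTML fragment would not normally contain.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical, simplified node model; not a real DOM implementation.
@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class Element:
    local_name: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)  # (name, value) in source order
    children: list = field(default_factory=list)
    namespace: str = "http://www.w3.org/1999/xhtml"


@dataclass
class Fragment:
    children: list = field(default_factory=list)


def is_equal_node(a, b) -> bool:
    """Simplified isEqualNode: same node kind, same kind-specific data,
    same attributes regardless of order, and pairwise-equal children."""
    if type(a) is not type(b):
        return False
    if isinstance(a, (Text, Comment)):
        return a.data == b.data
    if isinstance(a, Element):
        if (a.namespace, a.local_name) != (b.namespace, b.local_name):
            return False
        # Attribute names are unique per element, so comparing sets is enough
        # and deliberately ignores attribute order.
        if set(a.attributes) != set(b.attributes):
            return False
    return len(a.children) == len(b.children) and all(
        is_equal_node(x, y) for x, y in zip(a.children, b.children)
    )
```

Two consequences are worth noting. First, attribute order does not affect equality. Second, a fragment whose text is split across adjacent Text nodes is not equal to one that holds the same text in a single node. This second point is why the lexical-to-value mapping normalizes the fragment.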
>> 
>> === The lexical-to-value mapping ===
>> is defined as:
>> 
>> 1. Let domnodes be the list of DOM nodes [DOM4:3] that result from applying the HTML fragment parsing algorithm [HTML5:1] to the literal's lexical form, without a context element.
>> 2. Let domfrag be a DOM DocumentFragment [DOM4:1] whose childNodes attribute is equal to domnodes.
>> 3. Call domfrag.normalize() [DOM4:4], then return domfrag.
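The three steps could be sketched in Python as shown below. Treat this as a stand-in with several assumptions.

- The standard library's html.parser does not implement the HTML5 fragment parsing algorithm. It has no implied end tags, foster parenting or other tree-construction rules, so it only approximates step 1, and only for well-formed input.
- The node classes and the helper names are hypothetical.
- normalize() follows DOM4's rule: drop empty Text nodes and merge adjacent ones. Like the DOM method, it works in place and returns nothing.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Tuple


# Hypothetical, simplified node model; not a real DOM implementation.
@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class Element:
    local_name: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)
    children: list = field(default_factory=list)


@dataclass
class Fragment:
    children: list = field(default_factory=list)


VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class FragmentBuilder(HTMLParser):
    """Approximates step 1 (no context element) and collects the nodes
    under a Fragment (step 2). Not the HTML5 tree-construction algorithm."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.root = Fragment()
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        el = Element(tag, [(k, "" if v is None else v) for k, v in attrs])
        self.stack[-1].children.append(el)
        if tag not in VOID:  # void elements never have children
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].local_name == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(Text(data))

    def handle_comment(self, data):
        self.stack[-1].children.append(Comment(data))


def normalize(node) -> None:
    """DOM-style normalize(): drop empty Text nodes, merge adjacent ones."""
    merged = []
    for child in node.children:
        if isinstance(child, Text):
            if not child.data:
                continue
            if merged and isinstance(merged[-1], Text):
                merged[-1] = Text(merged[-1].data + child.data)
                continue
        elif isinstance(child, Element):
            normalize(child)
        merged.append(child)
    node.children = merged


def lexical_to_value(lexical: str) -> Fragment:
    builder = FragmentBuilder()
    builder.feed(lexical)
    builder.close()
    frag = builder.root  # steps 1 and 2
    normalize(frag)      # step 3: works in place and returns nothing
    return frag
```

For example, lexical_to_value('<p class="a">x &amp; y<br>z</p>') yields a fragment with one p element. That element's children are a Text node "x & y", a br element and a Text node "z".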
>> 
>> === The canonical mapping ===
>> defines a canonical lexical form [XMLSCHEMA11-2:1] for each member of the value space. The rdf:HTML canonical mapping is the HTML fragment serialization algorithm [HTML5:2].
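Here is a rough Python sketch of the serialization side. It again uses a hypothetical, simplified node model. It covers only the escaping and void-element rules of HTML5 fragment serialization as we understand them:

- & and U+00A0 are always escaped.
- " is escaped in attribute values.
- < and > are escaped in text.

It omits special cases such as raw-text elements like script and style. It also writes attributes in whatever order they are stored.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical, simplified node model; not a real DOM implementation.
@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class Element:
    local_name: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)
    children: list = field(default_factory=list)


@dataclass
class Fragment:
    children: list = field(default_factory=list)


VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


def escape(s: str, attribute_mode: bool) -> str:
    # "&" must be replaced first so later entities are not double-escaped.
    s = s.replace("&", "&amp;").replace("\u00a0", "&nbsp;")
    if attribute_mode:
        return s.replace('"', "&quot;")
    return s.replace("<", "&lt;").replace(">", "&gt;")


def serialize(node) -> str:
    """Serialize the children of a Fragment or Element (simplified)."""
    out = []
    for child in node.children:
        if isinstance(child, Element):
            # Attributes are emitted in stored order; nothing sorts them.
            attrs = "".join(f' {n}="{escape(v, True)}"' for n, v in child.attributes)
            out.append(f"<{child.local_name}{attrs}>")
            if child.local_name in VOID:
                continue  # void elements get no content and no end tag
            out.append(serialize(child))
            out.append(f"</{child.local_name}>")
        elif isinstance(child, Text):
            out.append(escape(child.data, attribute_mode=False))
        elif isinstance(child, Comment):
            out.append(f"<!--{child.data}-->")
    return "".join(out)
```

Consider a p element with class="x" id="y" and another with id="y" class="x". They are equal under isEqualNode, yet this serializer produces two different strings for them. That is the gap Ivan points out. Sorting attributes by name, as C14N does, would remove this particular difference.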
>> 
>> NOTE: Any language annotation desired in the HTML content must be included explicitly in the HTML literal (@lang="…").
>> 
>> NOTE: RDF applications may use additional equivalence relations, such as one that relates an xsd:string to an rdf:HTML literal whose value is a single text node with the same string.
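One possible reading of this equivalence is sketched below in Python, over a hypothetical simplified node model. An xsd:string s is related to an rdf:HTML value when the value's only child is a Text node whose data is s. The helper name equivalent_to_string is invented for illustration.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified node model; only what this check needs.
@dataclass
class Text:
    data: str


@dataclass
class Element:
    local_name: str
    children: list = field(default_factory=list)


@dataclass
class Fragment:
    children: list = field(default_factory=list)


def equivalent_to_string(value: Fragment, s: str) -> bool:
    """True if the fragment consists of a single Text node whose data equals s."""
    return (
        len(value.children) == 1
        and isinstance(value.children[0], Text)
        and value.children[0].data == s
    )
```

The comparison is made on the value, not the lexical form. So the lexical form a &amp; b, which parses to the text a & b, would relate to the xsd:string "a & b" and not to "a &amp; b".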
>> 
>> == References ==
>> [DOM4:1] http://www.w3.org/TR/dom/#interface-documentfragment
>> [DOM4:2] http://www.w3.org/TR/dom/#dom-node-isequalnode
>> [DOM4:3] http://www.w3.org/TR/dom/#node
>> [HTML5:1] http://www.w3.org/TR/html5/the-end.html#parsing-html-fragments
>> [DOM4:4] http://www.w3.org/TR/dom/#dom-node-normalize
>> [HTML5:2] http://www.w3.org/TR/html5/the-end.html#serializing-html-fragments
>> [XMLSCHEMA11-2:1] http://www.w3.org/TR/xmlschema11-2/#dt-canonical-mapping
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Thursday, 10 May 2012 12:37:37 UTC