Re: HTML datatype proposal (ISSUE-63)

I'm unsure what the use case is for (semi-)canonicalised equality.

Can someone give me an example?

Without that, it's hard to know whether we should be aiming for full C14N, plain lexical comparison, or something in between.

- Steve

On 10 May 2012, at 01:28, Ivan Herman wrote:

> Richard,
> 
> I think this is the right approach, and I am in favour of doing it. However...
> 
> http://www.w3.org/TR/html5/the-end.html#serializing-html-fragments
> 
> does not seem to be a complete canonicalization algorithm. What I see right away is that it says nothing about the order in which attributes should appear in an element (C14N requires them to be in alphabetical order). I have not checked all the details, but that alone is enough to say that the canonical forms could not be compared as strings to test for equality. If so, what is the purpose of having it?
> 
> Since, as far as I know, no work is currently happening in HTML5 land towards HTML5 signatures, I do not expect exact canonicalization to be on the agenda in the coming months or years. I would therefore propose not to define any canonical form at all in this case, perhaps adding a note that this datatype might adopt one once the HTML5 community has solved that issue.
> 
> Ivan
> 
> 
> On May 10, 2012, at 02:32 , Richard Cyganiak wrote:
> 
>> See below for a proposal for an HTML datatype. The lexical space is all Unicode strings (HTML5 explains how to parse any gunk into a DOM tree). The value space is normalized DOM DocumentFragments, like in the new rdf:XMLLiteral. The L2V mapping is HTML5's “fragment parsing algorithm”. The canonical mapping (from values to canonical lexical forms) is HTML5's “fragment serialization algorithm”.
>> 
>> Like the new rdf:XMLLiteral (and all other datatypes), this datatype is entirely optional.
>> 
>> Best,
>> Richard
>> 
>> 
>> == The rdf:HTML Datatype ==
>> 
>> RDF provides for HTML content as a possible literal value. This allows markup in literal values. Such content is indicated in an RDF graph using a literal whose datatype is a special built-in datatype rdf:HTML.
>> 
>> rdf:HTML is defined as follows.
>> 
>> === An IRI denoting this datatype ===
>> is http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML.
>> 
>> === The lexical space ===
>> is the set of Unicode strings.
>> 
>> === The value space ===
>> is a set of DOM DocumentFragment nodes [DOM4:1]. Two DocumentFragment nodes A and B are considered equal if and only if the DOM method A.isEqualNode(B) [DOM4:2] returns true.
>> 
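To make the equality rule above concrete, here is a rough sketch of an isEqualNode-style comparison. It is only an approximation under stated assumptions: Python's xml.dom.minidom stands in for a DOM4 implementation, is_equal_node and make_fragment are hypothetical helpers, and namespaces, prefixes and doctypes are ignored.

```python
from xml.dom import Node
from xml.dom.minidom import Document


def is_equal_node(a, b):
    """Approximate DOM isEqualNode: same type, name, data, attributes, children."""
    if a.nodeType != b.nodeType or a.nodeName != b.nodeName:
        return False
    if a.nodeType in (Node.TEXT_NODE, Node.COMMENT_NODE):
        return a.data == b.data
    if a.nodeType == Node.ELEMENT_NODE:
        # Attributes are compared as an unordered name -> value mapping.
        if dict(a.attributes.items()) != dict(b.attributes.items()):
            return False
    if len(a.childNodes) != len(b.childNodes):
        return False
    # Children must match pairwise and in the same order.
    return all(is_equal_node(x, y) for x, y in zip(a.childNodes, b.childNodes))


doc = Document()


def make_fragment(attr_pairs, text):
    """Hypothetical helper: a fragment holding one <a> element with one text child."""
    frag = doc.createDocumentFragment()
    a = doc.createElement("a")
    for name, value in attr_pairs:
        a.setAttribute(name, value)
    a.appendChild(doc.createTextNode(text))
    frag.appendChild(a)
    return frag


f1 = make_fragment([("href", "x"), ("title", "t")], "link")
f2 = make_fragment([("title", "t"), ("href", "x")], "link")
f3 = make_fragment([("href", "x"), ("title", "t")], "Link")

print(is_equal_node(f1, f2))  # True: attribute order is irrelevant
print(is_equal_node(f1, f3))  # False: text content differs
```

Under this reading, value equality is insensitive to attribute order but sensitive to the order of children and to character data.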
>> === The lexical-to-value mapping ===
>> is defined as:
>> 
>> 1. Let domnodes be the list of DOM nodes [DOM4:3] that result from applying the HTML fragment parsing algorithm [HTML5:1] to the literal's lexical form, without a context element.
>> 2. Let domfrag be a DOM DocumentFragment [DOM4:1] whose childNodes attribute is equal to domnodes.
>> 3. Call domfrag.normalize() [DOM4:4] and return domfrag.
>> 
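The three steps above could be sketched roughly as follows. This is an illustration only, with loud assumptions: Python's html.parser is NOT the HTML5 fragment parsing algorithm (it does no error recovery or implied-tag handling), xml.dom.minidom stands in for DOM4, and FragmentBuilder, VOID and lexical_to_value are hypothetical names.

```python
import html
from html.parser import HTMLParser
from xml.dom.minidom import Document

# Assumption: a small subset of void elements is enough for the sketch.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class FragmentBuilder(HTMLParser):
    """Builds a minidom DocumentFragment from parser events (steps 1 and 2)."""

    def __init__(self):
        # convert_charrefs=False makes references arrive as separate events,
        # so adjacent text nodes appear and normalize() has work to do.
        super().__init__(convert_charrefs=False)
        self.doc = Document()
        self.frag = self.doc.createDocumentFragment()
        self.stack = [self.frag]

    def handle_starttag(self, tag, attrs):
        el = self.doc.createElement(tag)
        for name, value in attrs:
            el.setAttribute(name, value or "")
        self.stack[-1].appendChild(el)
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].appendChild(self.doc.createTextNode(data))

    def handle_entityref(self, name):
        self.handle_data(html.unescape("&%s;" % name))

    def handle_charref(self, name):
        self.handle_data(html.unescape("&#%s;" % name))


def lexical_to_value(lexical_form):
    builder = FragmentBuilder()
    builder.feed(lexical_form)
    builder.close()
    frag = builder.frag
    frag.normalize()  # step 3: merge adjacent text nodes, drop empty ones
    return frag


value = lexical_to_value("<p>Tom &amp; Jerry</p>")
p = value.firstChild
print(len(p.childNodes))  # 1
print(p.firstChild.data)  # Tom & Jerry
```

Note that normalize() mutates the fragment in place, so the sketch returns the fragment itself after calling it.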
>> === The canonical mapping ===
>> defines a canonical lexical form [XMLSCHEMA11-2:1] for each member of the value space. The rdf:HTML canonical mapping is the HTML fragment serialization algorithm [HTML5:2].
>> 
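For illustration, a much-simplified serializer in the spirit of the fragment serialization algorithm might look like the sketch below. The assumptions are significant: it runs over xml.dom.minidom rather than a real HTML DOM, handles only elements, text and comments, ignores raw-text elements such as script and style, and serialize_children, escape and VOID_ELEMENTS are hypothetical names. Attributes are written in whatever order the DOM stores them, not in a sorted order.

```python
from xml.dom import Node
from xml.dom.minidom import Document

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


def escape(s, attribute_mode=False):
    """Escape text or attribute values roughly as HTML serialization does."""
    s = s.replace("&", "&amp;").replace("\u00a0", "&nbsp;")
    if attribute_mode:
        return s.replace('"', "&quot;")
    return s.replace("<", "&lt;").replace(">", "&gt;")


def serialize_children(node):
    out = []
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            out.append("<" + child.tagName)
            # Attributes are emitted in stored order; nothing sorts them.
            for name, value in child.attributes.items():
                out.append(' %s="%s"' % (name, escape(value, True)))
            out.append(">")
            if child.tagName not in VOID_ELEMENTS:
                out.append(serialize_children(child))
                out.append("</%s>" % child.tagName)
        elif child.nodeType == Node.TEXT_NODE:
            out.append(escape(child.data))
        elif child.nodeType == Node.COMMENT_NODE:
            out.append("<!--%s-->" % child.data)
    return "".join(out)


doc = Document()


def make_fragment(attr_pairs):
    """Hypothetical helper: a fragment holding <a ...>R&D</a>."""
    frag = doc.createDocumentFragment()
    a = doc.createElement("a")
    for name, value in attr_pairs:
        a.setAttribute(name, value)
    a.appendChild(doc.createTextNode("R&D"))
    frag.appendChild(a)
    return frag


s1 = serialize_children(make_fragment([("href", "x"), ("title", "t")]))
s2 = serialize_children(make_fragment([("title", "t"), ("href", "x")]))
print(s1)
print(s2)
print(s1 == s2)  # False
```

In this sketch, two fragments that differ only in the order their attributes were set produce different strings, so string comparison of the serialized forms would not coincide with node equality.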
>> NOTE: Any language annotation desired in the HTML content must be included explicitly in the HTML literal (@lang="…").
>> 
>> NOTE: RDF applications may use additional equivalence relations, such as one that relates an xsd:string to an rdf:HTML literal corresponding to a single text node with the same string.
>> 
>> == References ==
>> [DOM4:1] http://www.w3.org/TR/dom/#interface-documentfragment
>> [DOM4:2] http://www.w3.org/TR/dom/#dom-node-isequalnode
>> [DOM4:3] http://www.w3.org/TR/dom/#node
>> [DOM4:4] http://www.w3.org/TR/dom/#dom-node-normalize
>> [HTML5:1] http://www.w3.org/TR/html5/the-end.html#parsing-html-fragments
>> [HTML5:2] http://www.w3.org/TR/html5/the-end.html#serializing-html-fragments
>> [XMLSCHEMA11-2:1] http://www.w3.org/TR/xmlschema11-2/#dt-canonical-mapping
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 

-- 
Steve Harris, CTO
Garlik, a part of Experian 
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 653331 VAT # 887 1335 93
Registered office: Landmark House, Experian Way, NG2 Business Park, Nottingham, Nottinghamshire, England NG80 1ZZ

Received on Thursday, 10 May 2012 14:23:56 UTC