Re: Adding a datatype for HTML literals to RDF (ISSUE-63) from Steve Harris on 2012-05-08 (public-rdf-wg@w3.org from May 2012)

From: Steve Harris <steve.harris@garlik.com>
Date: Tue, 8 May 2012 10:17:32 -0700
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: Richard Cyganiak <richard@cyganiak.de>, public-rdf-wg@w3.org
Message-Id: <1BBC7FB7-8760-4627-A2BA-508D72214129@garlik.com>
+1. My guess is that a more complex value space would mean there are not very many conforming implementations, and an HTML datatype is useful even without equality, for storing and displaying content, e.g. in a CMS.

- Steve

On 3 May 2012, at 02:27, Andy Seaborne wrote:

> 
> 
> On 03/05/12 09:19, Richard Cyganiak wrote:
>> Hi Andy,
>> 
>> It sounds like you'd prefer an HTML datatype with a simple 1:1
>> correspondence between lexical space and value space.
> 
> I think that's a viable approach, yes.
> 
>> Your objection seems to be that something more complex isn't really
>> needed. Which might be true, but do you think that something more
>> complex would actually do any harm, and would be worse?
> 
> I'm not objecting.
> 
> I'm simply putting forward a case because I felt that the conversation was heading to infoset-value without much consideration of usage.
> 
> The primary use case is passing around display fragments, e.g. a better dc:title.
> 
> One (implementation) argument is that some systems only have DOM access.
> Another is that other systems don't have an HTML5 parser at all.
> 
> Given experience with rdf:XMLLiteral, and not just the fact that it is hard-wired into RDF, it is not obvious, to me at least, that a complex scheme is a good idea.
> 
>> And is this preference for a simpler scheme from an implementer's
>> point of view, or is it from a WG resources/spec complexity point of
>> view, or something else?
> 
> Yes (implementation generally).
> 
> If people in the WG want to spend time on infoset-value, that's fine.
> 
> 	Andy
> 
>> 
>> Thanks, Richard
>> 
>> 
>> On 2 May 2012, at 21:47, Andy Seaborne wrote:
>>> On 02/05/12 20:29, Richard Cyganiak wrote:
>>>> On 2 May 2012, at 19:15, Andy Seaborne wrote:
>>>>> I think I'm saying, start simple, prove a need for more
>>>>> complicated.
>>>>> 
>>>>> We can define a value space that is all character sequences
>>>>> (and is disjoint from xsd:string).  Do we need to be more
>>>>> complicated? What's the use case?
>>>> 
>>>> One use case might be RDFa parsers with HTML literal support.
>>>> 
>>>> Let's say you have @datatype="rdf:HTMLLiteral" on some element,
>>>> and the element contains text with markup, and the desire is that
>>>> the resulting HTML literal contains the text with markup intact.
>>>> 
>>>> Now the RDFa parser may not have access to the actual HTML
>>>> string, but only to a representation that has already been parsed
>>>> into a DOM tree.
>>>> 
>>>> So the parser may have to serialize the DOM into a string, which
>>>> would probably be different from the original string.
>>> 
>>> Certainly something to consider.
>>> 
>>> Thought: if the original string isn't available, does it matter?
>>> Will it be available to anyone else?
>>> 
>>>> 
>>>> (Or is this nonsense and the parser could always just do
>>>> myDOMElement.innerHTML to get the original HTML?)
>>> 
>>> I'm insufficiently up with the tool space to know.  (gavin?)
>>> 
>>>> 
>>>> Anyways, the advantage of having a value space that is isomorphic
>>>> to the DOM is that you can parse and re-serialize the HTML and
>>>> still get the same value.
>>>> 
>>>>> (Not all RDF systems have access to info set support code now
>>>>> that we are standardising Turtle and N-triples.)
>>>> 
>>>> Yeah and that's why we're trying to change rdf:XMLLiteral to make
>>>> it optional and to relax its lexical space.
>>>> 
>>>> I imagine that rdf:HTMLLiteral would be optional too, and the
>>>> lexical space should certainly be as unrestrictive as possible.
>>>> 
>>>> Only those who want to compare HTML literals, or those who *need*
>>>> to parse and re-serialize HTML literals, need to care what the
>>>> value space is. (And yeah, if we can't come up with evidence that
>>>> some systems need to do one of those, then there's little point
>>>> in defining anything more complicated than a 1:1 L2V mapping.)
>>> 
>>> Comparison may be done in another system - these literals are
>>> published and ingested by another system that might be asked if two
>>> literals are the same.  e.g. a reasoner or a SPARQL engine.
>>> Whether the ability to value-equals two literals with different
>>> lexical forms is sufficiently important, I can't say.
>>> 
>>> I feel that this isn't that likely - HTML5 literals are display
>>> material to be passed about.  For that, equality processing is
>>> unlikely, and the fragments go in and come out on some generated
>>> HTML.
>>> 
>>> Andy
>>> 
>>> 
>>>> 
>>>> Best, Richard
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> Andy
>>>>> 
>>>>>> 
>>>>>> Ivan
>>>>>> 
>>>>>>> Best, Richard
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>>> And I guess in theory, DOMs and XML Infosets should be
>>>>>>>>> isomorphic, no?
>>>>>>>> 
>>>>>>>> In theory:-) To be checked. There may be corner cases.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Between all these transformations, there should be
>>>>>>>>> something that works for us. The devil is in the
>>>>>>>>> details of course.
>>>>>>>> 
>>>>>>>> Exactly...
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Or we could just avoid all of that trouble and simply
>>>>>>>>> define the value space of the HTML datatype as
>>>>>>>>> identical to the lexical space.
>>>>>>>> 
>>>>>>>> And then we are back to the same issue as we had with
>>>>>>>> XML Literals. Except that... there is no such thing as a
>>>>>>>> formal canonical HTML5
>>>>>>>> 
>>>>>>>> Ivan
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best, Richard
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Just some food for thought...
>>>>>>>>>> 
>>>>>>>>>> Ivan
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On May 1, 2012, at 18:41 , Gavin Carothers wrote:
>>>>>>>>>> 
>>>>>>>>>>> On Tue, May 1, 2012 at 6:46 AM, Richard
>>>>>>>>>>> Cyganiak <richard@cyganiak.de> wrote:
>>>>>>>>>>>> All,
>>>>>>>>>>>> 
>>>>>>>>>>>> The 2004 WG worked under the assumption that the
>>>>>>>>>>>> future of HTML was XHTML, and that the use case
>>>>>>>>>>>> of shipping HTML markup fragments as RDF payloads
>>>>>>>>>>>> would be addressed by rdf:XMLLiteral. But in
>>>>>>>>>>>> 2012, shipping HTML fragments really means HTML5.
>>>>>>>>>>>> Is rdf:XMLLiteral still adequate for this task?
>>>>>>>>>>>> Is a new datatype with a lexical space consisting
>>>>>>>>>>>> of HTML5 fragments needed? This question is
>>>>>>>>>>>> ISSUE-63.
>>>>>>>>>>>> 
>>>>>>>>>>>> I think it would be useful to have a straw poll
>>>>>>>>>>>> sometime soon on this question:
>>>>>>>>>>>> 
>>>>>>>>>>>> PROPOSAL: RDF-WG will work on an HTML datatype
>>>>>>>>>>>> that would be defined in RDF Concepts.
>>>>>>>>>>> 
>>>>>>>>>>> +1, and for internationalization it should be a
>>>>>>>>>>> required datatype. It might also have a simple syntax
>>>>>>>>>>> in Turtle (though that would likely require a new last
>>>>>>>>>>> call, but a Web format that doesn't understand
>>>>>>>>>>> HTML doesn't seem like much of a web format).
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> If there is general support for this, then we
>>>>>>>>>>>> could start work on the details of the datatype
>>>>>>>>>>>> definition (lexical space, value space, L2V
>>>>>>>>>>>> mapping and so on).
>>>>>>>>>>>> 
>>>>>>>>>>>> All the best, Richard
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> ---- Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>>>>> Home: http://www.w3.org/People/Ivan/ mobile:
>>>>>>>>>> +31-641044153 FOAF:
>>>>>>>>>> http://www.ivan-herman.net/foaf.rdf
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
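
As a rough illustration of the trade-off discussed in the quoted thread (a sketch only, not something proposed by anyone in it), the snippet below compares two candidate lexical-to-value (L2V) mappings for an HTML literal. The names identity_value, parsed_value and TokenCollector are hypothetical. Python's html.parser is a lenient tokenizer, not an HTML5 parser, so the token list here merely stands in for a DOM-like value space.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collects a normalized token list as a crude stand-in for a DOM."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and strips quotes;
        # sorting by attribute name makes attribute order irrelevant.
        ordered = tuple(sorted(attrs, key=lambda pair: pair[0]))
        self.tokens.append(("start", tag, ordered))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Merge adjacent text chunks so chunking does not affect the value.
        if self.tokens and self.tokens[-1][0] == "text":
            self.tokens[-1] = ("text", self.tokens[-1][1] + data)
        else:
            self.tokens.append(("text", data))


def identity_value(lexical):
    """1:1 L2V mapping: the value is the character sequence itself."""
    return lexical


def parsed_value(lexical):
    """Parse-based L2V mapping (assumed): the value is the token sequence."""
    collector = TokenCollector()
    collector.feed(lexical)
    collector.close()
    return tuple(collector.tokens)


# Two lexical forms that differ only in case and attribute quoting.
a = '<b class="x">Hello</b> world'
b = "<B CLASS='x'>Hello</B> world"

print(identity_value(a) == identity_value(b))  # False
print(parsed_value(a) == parsed_value(b))      # True
```

Under the 1:1 mapping no parser is needed, but a fragment that has been parsed and re-serialized may no longer compare equal to the original. The parse-based mapping survives that round trip at the cost of requiring every conforming system to ship a parser.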

-- 
Steve Harris, CTO
Garlik, a part of Experian 
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 653331 VAT # 887 1335 93
Registered office: Landmark House, Experian Way, NG2 Business Park, Nottingham, Nottinghamshire, England NG80 1ZZ
Received on Tuesday, 8 May 2012 17:18:01 UTC