Re: Adding a datatype for HTML literals to RDF (ISSUE-63) from Andy Seaborne on 2012-05-03 (public-rdf-wg@w3.org from May 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Thu, 03 May 2012 10:27:33 +0100
To: Richard Cyganiak <richard@cyganiak.de>
CC: public-rdf-wg@w3.org
Message-ID: <4FA24F85.2030509@epimorphics.com>
On 03/05/12 09:19, Richard Cyganiak wrote:
> Hi Andy,
>
> It sounds like you'd prefer an HTML datatype with a simple 1:1
> correspondence between lexical space and value space.

I think that's a viable approach, yes.
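For concreteness, here is a minimal sketch of what the simple approach amounts to. It assumes Python and uses a hypothetical `HTMLLiteralValue` class and `l2v` function, neither of which is defined by any spec. The lexical-to-value mapping is the identity, and values get their own type so that the value space stays disjoint from xsd:string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HTMLLiteralValue:
    """Hypothetical value of an HTML literal under a 1:1 L2V mapping.

    The value is just the lexical form. Because it is a distinct type, it
    never compares equal to a plain str, mirroring a value space that is
    disjoint from xsd:string.
    """
    lexical: str


def l2v(lexical_form: str) -> HTMLLiteralValue:
    # Identity mapping: every character sequence is accepted as-is.
    return HTMLLiteralValue(lexical_form)


a = l2v('<b>Hello</b>')
b = l2v('<b>Hello</b>')
c = l2v('<B>Hello</B>')

print(a == b)               # True: same lexical form, same value
print(a == c)               # False: different lexical forms, different values
print(a == '<b>Hello</b>')  # False: not equal to a plain string
```

Under this scheme, two fragments that differ only in markup spelling (`<b>` vs `<B>`) are different values. That is the cost Richard points to when a DOM is re-serialized.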

> Your objection seems to be that something more complex isn't really
> needed. That might be true, but do you think that something more
> complex would actually do any harm, and would be worse?

I'm not objecting.

I'm simply putting forward a case because I felt that the conversation 
was heading towards infoset-value without much consideration of usage.

The primary use case is passing around display fragments: a better dc:title.

One (implementation) argument is that some systems only have DOM access.
Another is that other systems don't have an HTML5 parser at all.
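For contrast, here is a rough sketch of the more complex alternative: derive the value from the parsed structure, so that fragments which differ only in serialization details compare equal. It assumes Python's standard `html.parser`, which is *not* an HTML5 tree builder (no implied tags, no HTML5 error recovery), so this only approximates a DOM- or infoset-based value space. `tree_value` is a hypothetical name. The sketch also shows the implementation point above: comparing values this way needs a parser, which a system with only string handling would lack.

```python
from html.parser import HTMLParser


class _TokenCollector(HTMLParser):
    """Flattens an HTML fragment into a normalized token list (not a real DOM)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; and &#38; both become "&"
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; sorting the
        # attributes makes their order irrelevant. Valueless attributes get "".
        norm_attrs = tuple(sorted((name, value or "") for name, value in attrs))
        self.tokens.append(("start", tag, norm_attrs))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Merge adjacent text chunks so parser chunking does not affect the value.
        if self.tokens and self.tokens[-1][0] == "text":
            self.tokens[-1] = ("text", self.tokens[-1][1] + data)
        else:
            self.tokens.append(("text", data))


def tree_value(fragment: str) -> tuple:
    """Hypothetical structure-based value: equal for equivalent serializations."""
    collector = _TokenCollector()
    collector.feed(fragment)
    collector.close()
    return tuple(collector.tokens)


x = '<p class="a" id="t">Fish &amp; chips</p>'
y = "<P id=t class='a'>Fish &#38; chips</P>"

print(x == y)                          # False: different lexical forms
print(tree_value(x) == tree_value(y))  # True: same parsed structure
```

The two strings differ in tag case, attribute order, attribute quoting and character reference, yet produce the same token tuple. Under a 1:1 mapping they would be distinct values.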

Given our experience with rdf:XMLLiteral, not just the fact that it is 
hard-wired into RDF, it is not obvious, to me at least, that a complex 
scheme is a good idea.

> And is this preference for a simpler scheme from an implementer's
> point of view, or is it from a WG resources/spec complexity point of
> view, or something else?

Yes (implementation, generally).

If people in the WG want to spend time on infoset-value, that's fine.

	Andy

>
> Thanks, Richard
>
>
> On 2 May 2012, at 21:47, Andy Seaborne wrote:
>> On 02/05/12 20:29, Richard Cyganiak wrote:
>>> On 2 May 2012, at 19:15, Andy Seaborne wrote:
>>>> I think I'm saying, start simple, prove a need for more
>>>> complicated.
>>>>
>>>> We can define a value space that is all character sequences
>>>> (and is disjoint from xsd:string).  Do we need to be more
>>>> complicated? What's the use case?
>>>
>>> One use case might be RDFa parsers with HTML literal support.
>>>
>>> Let's say you have @datatype="rdf:HTMLLiteral" on some element,
>>> and the element contains text with markup, and the desire is that
>>> the resulting HTML literal contains the text with markup intact.
>>>
>>> Now the RDFa parser may not have access to the actual HTML
>>> string, but only to a representation that has already been parsed
>>> into a DOM tree.
>>>
>>> So the parser may have to serialize the DOM into a string, which
>>> would probably be different from the original string.
>>
>> Certainly something to consider.
>>
>> Thought: if the original string isn't available, does it matter?
>> Will it be available to anyone else?
>>
>>>
>>> (Or is this nonsense and the parser could always just do
>>> myDOMElement.innerHTML to get the original HTML?)
>>
>> I'm not sufficiently up to date with the tool space to know. (Gavin?)
>>
>>>
>>> Anyways, the advantage of having a value space that is isomorphic
>>> to the DOM is that you can parse and re-serialize the HTML and
>>> still get the same value.
>>>
>>>> (Not all RDF systems have access to infoset support code now
>>>> that we are standardising Turtle and N-Triples.)
>>>
>>> Yeah and that's why we're trying to change rdf:XMLLiteral to make
>>> it optional and to relax its lexical space.
>>>
>>> I imagine that rdf:HTMLLiteral would be optional too, and the
>>> lexical space should certainly be as unrestrictive as possible.
>>>
>>> Only those who want to compare HTML literals, or those who *need*
>>> to parse and re-serialize HTML literals, need to care what the
>>> value space is. (And yeah, if we can't come up with evidence that
>>> some systems need to do one of those, then there's little point
>>> in defining anything more complicated than a 1:1 L2V mapping.)
>>
>> Comparison may be done in another system: these literals are
>> published and ingested by another system, e.g. a reasoner or a
>> SPARQL engine, that might be asked whether two literals are the
>> same. Whether the ability to test two literals with different
>> lexical forms for value equality is sufficiently important, I can't say.
>>
>> I feel that this isn't that likely: HTML5 literals are display
>> material to be passed about. For that, equality processing is
>> unlikely, and the fragments go in and come out in some generated
>> HTML.
>>
>> Andy
>>
>>
>>>
>>> Best, Richard
>>>
>>>
>>>
>>>>
>>>> Andy
>>>>
>>>>>
>>>>> Ivan
>>>>>
>>>>>> Best, Richard
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> And I guess in theory, DOMs and XML Infosets should be
>>>>>>>> isomorphic, no?
>>>>>>>
>>>>>>> In theory:-) To be checked. There may be corner cases.
>>>>>>>
>>>>>>>>
>>>>>>>> Between all these transformations, there should be
>>>>>>>> something that works for us. The devil is in the
>>>>>>>> details of course.
>>>>>>>
>>>>>>> Exactly...
>>>>>>>
>>>>>>>>
>>>>>>>> Or we could just avoid all of that trouble and simply
>>>>>>>> define the value space of the HTML datatype as
>>>>>>>> identical to the lexical space.
>>>>>>>
>>>>>>> And then we are back to the same issue as we had with
>>>>>>> XML Literals. Except that... there is no such thing as a
>>>>>>> formally defined canonical form of HTML5.
>>>>>>>
>>>>>>> Ivan
>>>>>>>
>>>>>>>>
>>>>>>>> Best, Richard
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Just some food for thought...
>>>>>>>>>
>>>>>>>>> Ivan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On May 1, 2012, at 18:41 , Gavin Carothers wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, May 1, 2012 at 6:46 AM, Richard
>>>>>>>>>> Cyganiak <richard@cyganiak.de> wrote:
>>>>>>>>>>> All,
>>>>>>>>>>>
>>>>>>>>>>> The 2004 WG worked under the assumption that the
>>>>>>>>>>> future of HTML was XHTML, and that the use case
>>>>>>>>>>> of shipping HTML markup fragments as RDF payloads
>>>>>>>>>>> would be addressed by rdf:XMLLiteral. But in
>>>>>>>>>>> 2012, shipping HTML fragments really means HTML5.
>>>>>>>>>>> Is rdf:XMLLiteral still adequate for this task?
>>>>>>>>>>> Is a new datatype with a lexical space consisting
>>>>>>>>>>> of HTML5 fragments needed? This question is
>>>>>>>>>>> ISSUE-63.
>>>>>>>>>>>
>>>>>>>>>>> I think it would be useful to have a straw poll
>>>>>>>>>>> sometime soon on this question:
>>>>>>>>>>>
>>>>>>>>>>> PROPOSAL: RDF-WG will work on an HTML datatype
>>>>>>>>>>> that would be defined in RDF Concepts.
>>>>>>>>>>
>>>>>>>>>> +1, and for internationalization it should be a
>>>>>>>>>> required datatype. It might also have a simple syntax
>>>>>>>>>> in Turtle (though that would likely require a new Last
>>>>>>>>>> Call; but a Web format that doesn't understand
>>>>>>>>>> HTML doesn't seem like much of a Web format).
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If there is general support for this, then we
>>>>>>>>>>> could start work on the details of the datatype
>>>>>>>>>>> definition (lexical space, value space, L2V
>>>>>>>>>>> mapping and so on).
>>>>>>>>>>>
>>>>>>>>>>> All the best, Richard
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---- Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>>>> Home: http://www.w3.org/People/Ivan/ mobile:
>>>>>>>>> +31-641044153 FOAF:
>>>>>>>>> http://www.ivan-herman.net/foaf.rdf
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Received on Thursday, 3 May 2012 09:28:09 UTC