Re: Adding a datatype for HTML literals to RDF (ISSUE-63) from Andy Seaborne on 2012-05-02 (public-rdf-wg@w3.org from May 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Wed, 02 May 2012 21:47:47 +0100
To: Richard Cyganiak <richard@cyganiak.de>
CC: public-rdf-wg@w3.org
Message-ID: <4FA19D73.3020809@epimorphics.com>
On 02/05/12 20:29, Richard Cyganiak wrote:
> On 2 May 2012, at 19:15, Andy Seaborne wrote:
>> I think I'm saying, start simple, prove a need for more
>> complicated.
>>
>> We can define a value space that is all character sequences (and is
>> disjoint from xsd:string).  Do we need to be more complicated?
>> What's the use case?
>
> One use case might be RDFa parsers with HTML literal support.
>
> Let's say you have @datatype="rdf:HTMLLiteral" on some element, and
> the element contains text with markup, and the desire is that the
> resulting HTML literal contains the text with markup intact.
>
> Now the RDFa parser may not have access to the actual HTML string,
> but only to a representation that has already been parsed into a DOM
> tree.
>
> So the parser may have to serialize the DOM into a string, which
> would probably be different from the original string.

Certainly something to consider.

Thought: if the original string isn't available, does it matter? Will
it be available to anyone else?

>
> (Or is this nonsense and the parser could always just do
> myDOMElement.innerHTML to get the original HTML?)

I'm not familiar enough with the tooling to know. (Gavin?)

>
> Anyways, the advantage of having a value space that is isomorphic to
> the DOM is that you can parse and re-serialize the HTML and still get
> the same value.
>
>> (Not all RDF systems have access to info set support code now that
>> we are standardising Turtle and N-triples.)
>
> Yeah and that's why we're trying to change rdf:XMLLiteral to make it
> optional and to relax its lexical space.
>
> I imagine that rdf:HTMLLiteral would be optional too, and the lexical
> space should certainly be as unrestrictive as possible.
>
> Only those who want to compare HTML literals, or those who *need* to
> parse and re-serialize HTML literals, need to care what the value
> space is. (And yeah, if we can't come up with evidence that some
> systems need to do one of those, then there's little point in
> defining anything more complicated than a 1:1 L2V mapping.)

Comparison may be done in another system: these literals are published
and then ingested by a system, e.g. a reasoner or a SPARQL engine, that
might be asked whether two literals are the same. Whether the ability to
value-equals two literals with different lexical forms is sufficiently
important, I can't say.
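
To make "value-equals with different lexical forms" concrete, here is a
rough sketch in Python. Everything in it is an assumption for illustration
and not something proposed in this thread: the helper name html_value is
made up, the standard-library html.parser tokenizer stands in for a real
HTML5 parser (it builds no DOM and applies no HTML5 error recovery), and
the "value" is simply a normalised sequence of parse events.

```python
from html.parser import HTMLParser


class _EventCollector(HTMLParser):
    """Collects a normalised list of parse events for an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns &amp; and &#38; into the same character.
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names;
        # sorting by name makes attribute order irrelevant.
        ordered = tuple(sorted(attrs, key=lambda pair: pair[0]))
        self.events.append(("start", tag, ordered))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Merge adjacent text chunks so chunking does not affect the value.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + data)
        else:
            self.events.append(("text", data))


def html_value(lexical_form):
    """Hypothetical L2V mapping: lexical form -> normalised event tuple."""
    parser = _EventCollector()
    parser.feed(lexical_form)
    parser.close()
    return tuple(parser.events)


a = '<B CLASS="x" id="y">hi &amp; bye</B>'
b = "<b id='y' class='x'>hi &#38; bye</b>"

print(a == b)                          # lexical forms differ
print(html_value(a) == html_value(b))  # normalised values agree
```

With a 1:1 L2V mapping the two literals above are simply different; with
a parsed value space they would be the same value.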

I feel that this isn't that likely: HTML5 literals are display material
to be passed about. For that, equality processing is unlikely, and the
fragments go in and come out on some generated HTML.

	Andy


>
> Best, Richard
>
>
>
>>
>> Andy
>>
>>>
>>> Ivan
>>>
>>>> Best, Richard
>>>>
>>>>
>>>>
>>>>>> And I guess in theory, DOMs and XML Infosets should be
>>>>>> isomorphic, no?
>>>>>
>>>>> In theory:-) To be checked. There may be corner cases.
>>>>>
>>>>>>
>>>>>> Between all these transformations, there should be
>>>>>> something that works for us. The devil is in the details of
>>>>>> course.
>>>>>
>>>>> Exactly...
>>>>>
>>>>>>
>>>>>> Or we could just avoid all of that trouble and simply
>>>>>> define the value space of the HTML datatype as identical to
>>>>>> the lexical space.
>>>>>
>>>>> And then we are back to the same issue as we had with XML
>>>>> Literals. Except that... there is no such thing as a formal
>>>>> canonical HTML5.
>>>>>
>>>>> Ivan
>>>>>
>>>>>>
>>>>>> Best, Richard
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Just some food for thought...
>>>>>>>
>>>>>>> Ivan
>>>>>>>
>>>>>>>
>>>>>>> On May 1, 2012, at 18:41 , Gavin Carothers wrote:
>>>>>>>
>>>>>>>> On Tue, May 1, 2012 at 6:46 AM, Richard Cyganiak
>>>>>>>> <richard@cyganiak.de> wrote:
>>>>>>>>> All,
>>>>>>>>>
>>>>>>>>> The 2004 WG worked under the assumption that the
>>>>>>>>> future of HTML was XHTML, and that the use case of
>>>>>>>>> shipping HTML markup fragments as RDF payloads would
>>>>>>>>> be addressed by rdf:XMLLiteral. But in 2012, shipping
>>>>>>>>> HTML fragments really means HTML5. Is rdf:XMLLiteral
>>>>>>>>> still adequate for this task? Is a new datatype with
>>>>>>>>> a lexical space consisting of HTML5 fragments needed?
>>>>>>>>> This question is ISSUE-63.
>>>>>>>>>
>>>>>>>>> I think it would be useful to have a straw poll
>>>>>>>>> sometime soon on this question:
>>>>>>>>>
>>>>>>>>> PROPOSAL: RDF-WG will work on an HTML datatype that
>>>>>>>>> would be defined in RDF Concepts.
>>>>>>>>
>>>>>>>> +1, and for internationalization should be a required
>>>>>>>> datatype, might also have a simple syntax in Turtle
>>>>>>>> (though would likely require a new last call but a Web
>>>>>>>> format that doesn't understand HTML doesn't seem
>>>>>>>> like much of a web format)
>>>>>>>>
>>>>>>>>>
>>>>>>>>> If there is general support for this, then we could
>>>>>>>>> start work on the details of the datatype definition
>>>>>>>>> (lexical space, value space, L2V mapping and so on).
>>>>>>>>>
>>>>>>>>> All the best, Richard
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----
>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>> mobile: +31-641044153
>>>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
Received on Wednesday, 2 May 2012 20:48:18 UTC