Re: Adding a datatype for HTML literals to RDF (ISSUE-63)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 3 May 2012 09:19:35 +0100
Cc: public-rdf-wg@w3.org
Message-Id: <F002808A-0527-4627-9C42-82534107F5F2@cyganiak.de>
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Hi Andy,

It sounds like you'd prefer an HTML datatype with a simple 1:1 correspondence between lexical space and value space.

Your objection seems to be that something more complex isn't really needed. That might be true, but do you think something more complex would actually do any harm, or be worse?

And is this preference for a simpler scheme from an implementer's point of view, or is it from a WG resources/spec complexity point of view, or something else?

Thanks,
Richard


On 2 May 2012, at 21:47, Andy Seaborne wrote:
> On 02/05/12 20:29, Richard Cyganiak wrote:
>> On 2 May 2012, at 19:15, Andy Seaborne wrote:
>>> I think I'm saying, start simple, prove a need for more
>>> complicated.
>>> 
>>> We can define a value space that is all character sequences (and is
>>> disjoint from xsd:string).  Do we need to be more complicated?
>>> What's the use case?
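The simple scheme Andy proposes can be sketched in Python as follows. This is a minimal illustration, not a proposed definition: it assumes a literal is a (lexical form, datatype IRI) pair, uses a hypothetical datatype IRI built from the rdf:HTMLLiteral name used in this thread, and introduces a hypothetical `HtmlValue` wrapper so that HTML values never compare equal to xsd:string values.

```python
from dataclasses import dataclass

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical datatype IRI, named after the rdf:HTMLLiteral used in this thread.
HTML_LITERAL = RDF + "HTMLLiteral"


@dataclass(frozen=True)
class HtmlValue:
    """A value in the HTML value space: nothing but a character sequence.

    The wrapper type keeps the value space disjoint from xsd:string,
    whose values are modelled here as plain Python str objects.
    """
    chars: str


def l2v(lexical: str, datatype: str):
    """Lexical-to-value mapping for the two datatypes in this sketch."""
    if datatype == HTML_LITERAL:
        return HtmlValue(lexical)  # 1:1: every lexical form is its own value
    if datatype == XSD + "string":
        return lexical
    raise ValueError(f"unsupported datatype: {datatype}")


a = l2v("<b>Hi</b>", HTML_LITERAL)
b = l2v("<B>Hi</B>", HTML_LITERAL)
s = l2v("<b>Hi</b>", XSD + "string")
print(a == b)  # False: different lexical forms are different values
print(a == s)  # False: HTML values are disjoint from xsd:string values
```

Under this scheme value equality is just string equality, which any system can implement; the cost is that `<b>Hi</b>` and `<B>Hi</B>` count as different values even though an HTML parser treats them identically.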
>> 
>> One use case might be RDFa parsers with HTML literal support.
>> 
>> Let's say you have @datatype="rdf:HTMLLiteral" on some element, and
>> the element contains text with markup, and the desire is that the
>> resulting HTML literal contains the text with markup intact.
>> 
>> Now the RDFa parser may not have access to the actual HTML string,
>> but only to a representation that has already been parsed into a DOM
>> tree.
>> 
>> So the parser may have to serialize the DOM into a string, which
>> would probably be different from the original string.
> 
> Certainly something to consider.
> 
> Thought: if the original string isn't available, does it matter? Will it be available to anyone else?
> 
>> 
>> (Or is this nonsense and the parser could always just do
>> myDOMElement.innerHTML to get the original HTML?)
> 
> I'm insufficiently up with the tool space to know. (Gavin?)
> 
>> 
>> Anyways, the advantage of having a value space that is isomorphic to
>> the DOM is that you can parse and re-serialize the HTML and still get
>> the same value.
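Richard's alternative, a value space isomorphic to the DOM, can be approximated with Python's standard-library `html.parser`. The sketch below is an assumption-laden illustration, not a conforming HTML5 parser: the tuple-based tree, the hypothetical `l2v` and `serialize` helpers, and the subset of void elements are choices made here, and raw-text elements such as `<script>` are not handled. It shows the point of the argument: re-serializing a parsed fragment changes the string, but parsing the result again yields the same value.

```python
from html import escape
from html.parser import HTMLParser

# Assumed subset of HTML void elements (no children, no end tag).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _TreeBuilder(HTMLParser):
    """Builds a tiny DOM-like tree: elements are (tag, attrs, children)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = [("#fragment", (), [])]

    def _append(self, node):
        children = self.stack[-1][2]
        if isinstance(node, str) and children and isinstance(children[-1], str):
            children[-1] += node  # merge adjacent text nodes
        else:
            children.append(node)

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag/attribute names; valueless attributes get "".
        attrs = tuple((name, value or "") for name, value in attrs)
        node = (tag, attrs, [])
        self._append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop only a matching open element; stray end tags are ignored.
        if tag not in VOID and len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        self._append(data)  # character references are already decoded


def _freeze(node):
    """Turn the mutable child lists into tuples so values are comparable."""
    if isinstance(node, str):
        return node
    tag, attrs, children = node
    return (tag, attrs, tuple(_freeze(c) for c in children))


def l2v(lexical: str):
    """Hypothetical L2V mapping: parse the lexical form into a frozen tree."""
    builder = _TreeBuilder()
    builder.feed(lexical)
    builder.close()
    return _freeze(builder.stack[0])


def serialize(node) -> str:
    """Re-serialize a tree in one fixed style (lowercase names, double quotes)."""
    if isinstance(node, str):
        return escape(node, quote=False)
    tag, attrs, children = node
    inner = "".join(serialize(c) for c in children)
    if tag == "#fragment":
        return inner
    attr_text = "".join(f' {n}="{escape(v)}"' for n, v in attrs)
    if tag in VOID:
        return f"<{tag}{attr_text}>"
    return f"<{tag}{attr_text}>{inner}</{tag}>"


original = "<P CLASS=note>Fish &amp; chips<BR>to go</P>"
reserialized = serialize(l2v(original))
print(reserialized)                        # <p class="note">Fish &amp; chips<br>to go</p>
print(original == reserialized)            # False: the lexical forms differ
print(l2v(original) == l2v(reserialized))  # True: same value
```

With a value space like this, two systems that serialize the same DOM differently still agree on the value, which is what makes the parse-and-re-serialize step harmless; the price is that every consumer that compares values needs an HTML parser.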
>> 
>>> (Not all RDF systems have access to infoset support code now that
>>> we are standardising Turtle and N-Triples.)
>> 
>> Yeah and that's why we're trying to change rdf:XMLLiteral to make it
>> optional and to relax its lexical space.
>> 
>> I imagine that rdf:HTMLLiteral would be optional too, and the lexical
>> space should certainly be as unrestrictive as possible.
>> 
>> Only those who want to compare HTML literals, or those who *need* to
>> parse and re-serialize HTML literals, need to care what the value
>> space is. (And yeah, if we can't come up with evidence that some
>> systems need to do one of those, then there's little point in
>> defining anything more complicated than a 1:1 L2V mapping.)
> 
> Comparison may be done in another system: these literals are published and ingested by another system that might be asked whether two literals are the same, e.g. a reasoner or a SPARQL engine. Whether the ability to test two literals with different lexical forms for value equality is sufficiently important, I can't say.
> 
> I feel that this isn't that likely: HTML5 literals are display material to be passed about. For that, equality processing is unlikely, and the fragments go in and come out in some generated HTML.
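Andy's point about comparison in a consuming system can be made concrete with SPARQL 1.1, where `=` on two literals of a datatype the engine does not recognise falls back to RDFterm-equal: the same term gives true, while two different literals give a type error rather than false. The following Python sketch is a simplified model of that behaviour under stated assumptions: the `Literal` tuple, the `SparqlTypeError` exception, the `l2v` table of mappings the engine supports, and the HTML datatype IRI are all hypothetical.

```python
from typing import Callable, Dict, NamedTuple


class Literal(NamedTuple):
    lexical: str
    datatype: str


class SparqlTypeError(Exception):
    """Stands in for a SPARQL expression evaluation error."""


# Hypothetical datatype IRI, named after the rdf:HTMLLiteral used in this thread.
HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTMLLiteral"


def sparql_equals(a: Literal, b: Literal,
                  l2v: Dict[str, Callable[[str], object]]) -> bool:
    """Simplified model of SPARQL '=' on two literals.

    If the engine knows the shared datatype, it compares values.
    Otherwise it falls back to RDFterm-equal: identical terms are equal,
    and two different literals produce a type error.
    """
    if a.datatype == b.datatype and a.datatype in l2v:
        to_value = l2v[a.datatype]
        return to_value(a.lexical) == to_value(b.lexical)
    if a == b:
        return True
    raise SparqlTypeError("cannot decide equality of unrecognised literals")


x = Literal("<b>Hi</b>", HTML)
y = Literal("<B>Hi</B>", HTML)

unaware = {}                  # engine that does not implement the HTML datatype
aware = {HTML: lambda s: s}   # engine implementing a 1:1 L2V mapping

print(sparql_equals(x, x, unaware))  # True
print(sparql_equals(x, y, aware))    # False
try:
    sparql_equals(x, y, unaware)
except SparqlTypeError as err:
    print("type error:", err)
```

Under a DOM-based value space, the `aware` engine's mapping would parse both lexical forms and could answer true for `x` and `y`; under a 1:1 mapping it answers false. Either way, an engine that does not implement the datatype can only recognise identical terms.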
> 
> 	Andy
> 
> 
>> 
>> Best, Richard
>> 
>> 
>> 
>>> 
>>> Andy
>>> 
>>>> 
>>>> Ivan
>>>> 
>>>>> Best, Richard
>>>>> 
>>>>> 
>>>>> 
>>>>>>> And I guess in theory, DOMs and XML Infosets should be
>>>>>>> isomorphic, no?
>>>>>> 
>>>>>> In theory:-) To be checked. There may be corner cases.
>>>>>> 
>>>>>>> 
>>>>>>> Between all these transformations, there should be
>>>>>>> something that works for us. The devil is in the details of
>>>>>>> course.
>>>>>> 
>>>>>> Exactly...
>>>>>> 
>>>>>>> 
>>>>>>> Or we could just avoid all of that trouble and simply
>>>>>>> define the value space of the HTML datatype as identical to
>>>>>>> the lexical space.
>>>>>> 
>>>>>> And then we are back to the same issue as we had with XML
>>>>>> Literals. Except that... there is no such thing as a formal
>>>>>> canonical form of HTML5.
>>>>>> 
>>>>>> Ivan
>>>>>> 
>>>>>>> 
>>>>>>> Best, Richard
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Just some food for thought...
>>>>>>>> 
>>>>>>>> Ivan
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On May 1, 2012, at 18:41 , Gavin Carothers wrote:
>>>>>>>> 
>>>>>>>>> On Tue, May 1, 2012 at 6:46 AM, Richard
>>>>>>>>> Cyganiak <richard@cyganiak.de> wrote:
>>>>>>>>>> All,
>>>>>>>>>> 
>>>>>>>>>> The 2004 WG worked under the assumption that the
>>>>>>>>>> future of HTML was XHTML, and that the use case of
>>>>>>>>>> shipping HTML markup fragments as RDF payloads would
>>>>>>>>>> be addressed by rdf:XMLLiteral. But in 2012, shipping
>>>>>>>>>> HTML fragments really means HTML5. Is rdf:XMLLiteral
>>>>>>>>>> still adequate for this task? Is a new datatype with
>>>>>>>>>> a lexical space consisting of HTML5 fragments needed?
>>>>>>>>>> This question is ISSUE-63.
>>>>>>>>>> 
>>>>>>>>>> I think it would be useful to have a straw poll
>>>>>>>>>> sometime soon on this question:
>>>>>>>>>> 
>>>>>>>>>> PROPOSAL: RDF-WG will work on an HTML datatype that
>>>>>>>>>> would be defined in RDF Concepts.
>>>>>>>>> 
>>>>>>>>> +1. For internationalization it should be a required
>>>>>>>>> datatype, and it might also have a simple syntax in Turtle
>>>>>>>>> (though that would likely require a new Last Call; but a Web
>>>>>>>>> format that doesn't understand HTML doesn't seem like much
>>>>>>>>> of a web format).
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> If there is general support for this, then we could
>>>>>>>>>> start work on the details of the datatype definition
>>>>>>>>>> (lexical space, value space, L2V mapping and so on).
>>>>>>>>>> 
>>>>>>>>>> All the best, Richard
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ----
>>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>> mobile: +31-641044153
>>>>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
Received on Thursday, 3 May 2012 08:20:11 GMT
