- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 03 May 2012 10:27:33 +0100
- To: Richard Cyganiak <richard@cyganiak.de>
- CC: public-rdf-wg@w3.org
On 03/05/12 09:19, Richard Cyganiak wrote:
> Hi Andy,
>
> It sounds like you'd rather prefer an HTML datatype with a simple 1:1
> correspondence between lexical space and value space.

I think that's a viable approach, yes.

> Your objection seems to be that something more complex isn't really
> needed. Which might be true, but do you think that something more
> complex would actually do any harm, and would be worse?

I'm not objecting. I'm simply putting forward a case because I felt that the conversation was heading to infoset-value without much consideration of usage.

The primary UC is passing around display fragments. Better dc:title.

One (implementation) argument is that some systems only have DOM access. Another is that other systems don't have an HTML5 parser at all.

Given experiences of rdf:XMLLiterals, not just the fact they are hard-wired into RDF, it is not obvious, to me at least, that a complex scheme is a good idea.

> And is this preference for a simpler scheme from an implementer's
> point of view, or is it from a WG resources/spec complexity point of
> view, or something else?

Yes (implementation generally).

If people in the WG want to spend time on infoset-value, that's fine.

Andy

>
> Thanks, Richard
>
> On 2 May 2012, at 21:47, Andy Seaborne wrote:
>> On 02/05/12 20:29, Richard Cyganiak wrote:
>>> On 2 May 2012, at 19:15, Andy Seaborne wrote:
>>>> I think I'm saying, start simple, prove a need for more
>>>> complicated.
>>>>
>>>> We can define a value space that is all character sequences
>>>> (and is disjoint from xsd:string). Do we need to be more
>>>> complicated? What's the use case?
>>>
>>> One use case might be RDFa parsers with HTML literal support.
>>>
>>> Let's say you have @datatype="rdf:HTMLLiteral" on some element,
>>> and the element contains text with markup, and the desire is that
>>> the resulting HTML literal contains the text with markup intact.
>>>
>>> Now the RDFa parser may not have access to the actual HTML
>>> string, but only to a representation that has already been parsed
>>> into a DOM tree.
>>>
>>> So the parser may have to serialize the DOM into a string, which
>>> would probably be different from the original string.
>>
>> Certainly something to consider.
>>
>> Thought: if the original string isn't available, does it matter?
>> Will it be available to anyone else?
>>
>>>
>>> (Or is this nonsense and the parser could always just do
>>> myDOMElement.innerHTML to get the original HTML?)
>>
>> I'm insufficiently up with the tool space to know. (gavin?)
>>
>>>
>>> Anyways, the advantage of having a value space that is isomorphic
>>> to the DOM is that you can parse and re-serialize the HTML and
>>> still get the same value.
>>>
>>>> (Not all RDF systems have access to info set support code now
>>>> that we are standardising Turtle and N-triples.)
>>>
>>> Yeah and that's why we're trying to change rdf:XMLLiteral to make
>>> it optional and to relax its lexical space.
>>>
>>> I imagine that rdf:HTMLLiteral would be optional too, and the
>>> lexical space should certainly be as unrestrictive as possible.
>>>
>>> Only those who want to compare HTML literals, or those who *need*
>>> to parse and re-serialize HTML literals, need to care what the
>>> value space is. (And yeah, if we can't come up with evidence that
>>> some systems need to do one of those, then there's little point
>>> in defining anything more complicated than a 1:1 L2V mapping.)
>>
>> Comparison may be done in another system - these literals are
>> published and ingested by another system that might be asked if two
>> literals are the same. e.g. a reasoner or a SPARQL engine.
>> Whether the ability to value-equals two literals with different
>> lexical forms is sufficiently important, I can't say.
>>
>> I feel that this isn't that likely - HTML5 literals are display
>> material to be passed about.
>> For that, equality processing is unlikely, and the fragments go in
>> and come out on some generated HTML.
>>
>> Andy
>>
>>>
>>> Best, Richard
>>>
>>>>
>>>> Andy
>>>>
>>>>>
>>>>> Ivan
>>>>>
>>>>>> Best, Richard
>>>>>>
>>>>>>>> And I guess in theory, DOMs and XML Infosets should be
>>>>>>>> isomorphic, no?
>>>>>>>
>>>>>>> In theory:-) To be checked. There may be corner cases.
>>>>>>>
>>>>>>>>
>>>>>>>> Between all these transformations, there should be
>>>>>>>> something that works for us. The devil is in the
>>>>>>>> details of course.
>>>>>>>
>>>>>>> Exactly...
>>>>>>>
>>>>>>>>
>>>>>>>> Or we could just avoid all of that trouble and simply
>>>>>>>> define the value space of the HTML datatype as
>>>>>>>> identical to the lexical space.
>>>>>>>
>>>>>>> And then we are back to the same issue as we had with
>>>>>>> XML Literals. Except that... there is no such thing as a
>>>>>>> formal canonical HTML5.
>>>>>>>
>>>>>>> Ivan
>>>>>>>
>>>>>>>>
>>>>>>>> Best, Richard
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Just some food for thought...
>>>>>>>>>
>>>>>>>>> Ivan
>>>>>>>>>
>>>>>>>>> On May 1, 2012, at 18:41, Gavin Carothers wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, May 1, 2012 at 6:46 AM, Richard
>>>>>>>>>> Cyganiak <richard@cyganiak.de> wrote:
>>>>>>>>>>> All,
>>>>>>>>>>>
>>>>>>>>>>> The 2004 WG worked under the assumption that the
>>>>>>>>>>> future of HTML was XHTML, and that the use case
>>>>>>>>>>> of shipping HTML markup fragments as RDF payloads
>>>>>>>>>>> would be addressed by rdf:XMLLiteral. But in
>>>>>>>>>>> 2012, shipping HTML fragments really means HTML5.
>>>>>>>>>>> Is rdf:XMLLiteral still adequate for this task?
>>>>>>>>>>> Is a new datatype with a lexical space consisting
>>>>>>>>>>> of HTML5 fragments needed? This question is
>>>>>>>>>>> ISSUE-63.
>>>>>>>>>>>
>>>>>>>>>>> I think it would be useful to have a straw poll
>>>>>>>>>>> sometime soon on this question:
>>>>>>>>>>>
>>>>>>>>>>> PROPOSAL: RDF-WG will work on an HTML datatype
>>>>>>>>>>> that would be defined in RDF Concepts.
>>>>>>>>>>
>>>>>>>>>> +1, and for internationalization should be a
>>>>>>>>>> required datatype, might also have a simple syntax
>>>>>>>>>> in Turtle (though would likely require a new last
>>>>>>>>>> call but a Web format that doesn't understand
>>>>>>>>>> HTML doesn't seem like much of a web format)
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If there is general support for this, then we
>>>>>>>>>>> could start work on the details of the datatype
>>>>>>>>>>> definition (lexical space, value space, L2V
>>>>>>>>>>> mapping and so on).
>>>>>>>>>>>
>>>>>>>>>>> All the best, Richard
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ----
>>>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>>> mobile: +31-641044153
>>>>>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
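The thread weighs two candidate value spaces for an HTML literal datatype: a 1:1 L2V mapping, where the value is the character string itself, and a value space isomorphic to the DOM, where parsing and re-serializing a fragment leaves its value unchanged. The Python sketch below contrasts the two for the kind of equality question a reasoner or SPARQL engine might be asked. It is illustrative only: `identity_value` and `tree_value` are hypothetical names, not part of any RDF specification, and the standard library's `html.parser` is a tokenizer rather than the HTML5 tree-construction algorithm, so the tree it builds only approximates a DOM for well-nested fragments.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _TreeBuilder(HTMLParser):
    """Builds a (tag, attrs, children) tree from an HTML fragment.

    Simplification (assumption): markup is well nested; no implied end
    tags or other HTML5 tree-construction rules are applied.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # e.g. &eacute; becomes é in text
        self.stack = [("#fragment", (), [])]

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; sorting the attributes
        # makes their source order irrelevant to the value.
        node = (tag, tuple(sorted((k, v or "") for k, v in attrs)), [])
        self.stack[-1][2].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop only a matching open element; stray end tags are ignored.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        children = self.stack[-1][2]
        if children and isinstance(children[-1], str):
            children[-1] += data  # merge adjacent text runs
        else:
            children.append(data)


def _freeze(node):
    """Convert the mutable tree into nested tuples so values compare with ==."""
    if isinstance(node, str):
        return node
    tag, attrs, children = node
    return (tag, attrs, tuple(_freeze(c) for c in children))


def identity_value(lexical: str) -> str:
    """1:1 L2V mapping: the value is the character sequence itself."""
    return lexical


def tree_value(lexical: str):
    """DOM-like L2V mapping (approximation): the value is the parsed tree."""
    builder = _TreeBuilder()
    builder.feed(lexical)
    builder.close()
    return _freeze(builder.stack[0])


a = '<b class="x">caf&eacute;</b>'
b = "<B CLASS='x'>café</B>"
print(identity_value(a) == identity_value(b))  # False
print(tree_value(a) == tree_value(b))          # True
```

Under the identity mapping the two spellings are different values; under the tree mapping they compare equal, which is the round-trip property Richard describes. The cost is that every consumer needs an HTML parser to compute values, which is exactly the burden Andy points out some systems may not be able to carry.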
Received on Thursday, 3 May 2012 09:28:09 UTC