- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 9 Aug 2012 13:30:40 +0200
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Cc: Jirka Kosek <jirka@kosek.cz>, public-multilingualweb-lt@w3.org
- Message-ID: <CAL58czrfu55tEA-1_Rv+1=B2FG3KToSOxw1DJ_Eqrz+V7+-OUw@mail.gmail.com>
Hi Sebastian, all,

I tried to create the NIF output (since we need two implementations) for

<html xmlns:its="http://www.w3.org/2005/11/its">
  <body>
    <h2 its:translate="yes">Welcome to <span its:translate="no" >Dublin</span> in <b its:translate="no">Ireland</b>! </h2>
  </body>
</html>

(I used XML input here, but otherwise this is the same as your example in the wiki.)

Does the output below make sense? I am sure that the UUID is wrong, but I don't know how to generate one.

[
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#>.
@prefix str: <http://nlp2rdf.lod2.eu/schema/string/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.

<http://example.com/exampledoc.html#offset_0_50>
    str:referenceContext <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50>;
    a <str:String>;
    itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.

<http://example.com/exampledoc.html#offset_14_44>
    str:referenceContext <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#14_44>;
    a <str:String>;
    itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.

<http://example.com/exampledoc.html#offset_25_31>
    str:referenceContext <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31>;
    a <str:String>;
    itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.

<http://example.com/exampledoc.html#offset_25_32>
    str:referenceContext <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32>;
    a <str:String>;
    itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.

<http://example.com/exampledoc.html#offset_5_49>
    str:referenceContext <urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#5_49>;
    a <str:String>;
    itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.

<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50>
    str:isString "\r\n \r\n Welcome to Dublin in Ireland! \r\n \r\n";
    str:occursIn <http://example.com/exampledoc.html>;
    a <str:Context>.

<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#14_44>
    str:isString "Welcome to Dublin in Ireland! 
";
    str:occursIn <http://example.com/exampledoc.html>;
    a <str:Context>.

<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31>
    str:isString "Dublin";
    str:occursIn <http://example.com/exampledoc.html>;
    a <str:Context>.

<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32>
    str:isString "Ireland";
    str:occursIn <http://example.com/exampledoc.html>;
    a <str:Context>.

<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#5_49>
    str:isString "\r\n Welcome to Dublin in Ireland! \r\n ";
    str:occursIn <http://example.com/exampledoc.html>;
    a <str:Context>.
]

Thanks,
Felix

2012/8/9 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>

> Hi Jirka,
> thanks for your feedback. I thought it was a requirement that the DOM
> should not be touched. I really never had any whitespace problems in any
> RDF serialization formats, so this was new to me. By the way, I now
> understand what your problem with the bloated mapping is. We really
> don't need to serialize it. Actually, it can be kept in memory, which is
> more efficient. I made serialization optional. I also made an XML
> version, because XML is much better suited for transferring this kind
> of data. (Is the XML alright?) I made all the changes you suggested; the
> new version is online here:
> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=622#Example
>
> all the best,
> Sebastian
>
>
> On 09.08.2012 11:59, Jirka Kosek wrote:
>
>  On 9.8.2012 11:47, Sebastian Hellmann wrote:
>>
>> you found an interesting point.
>>>
>>> I wrote some notes on the optimization:
>>> http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS#Notes_on_optional_optimizations
>>> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=614#Notes_on_optional_optimizations
>>>
>>> I think it generally depends on the use case whether you would
>>> optimize. Do you think we should specify/limit which optimizations are
>>> possible?
>>> It might be easier to explain the implications to help developers,
>>> but leave the implementation under-specified.
>>> Do you think I should remove them from the algorithm description and
>>> move them to a completely different section? Would this help the
>>> structure of the document?
>>>
>> I think the NIF mapping is already so unnatural that optimization can
>> make it really messy. If the goal of optimization was to create a less
>> complex RDF representation without blank text nodes and with trimmed
>> versions of text nodes that contain a lot of whitespace, I think an
>> easier and workable approach would be to:
>>
>> - remove all whitespace optimization from the mapping algorithm
>>
>> - say that the algorithm can produce a lot of "phantom" predicates from
>> excessive whitespace
>>
>> - recommend normalizing whitespace in the input XML/HTML/DOM in
>> order to minimize such phantom predicates
>>
>> This way each user/application can create custom whitespace
>> normalization based on the nature of the input data, and we don't have
>> to care about it.
>>
>> For example, for your sample document it is safe (knowing the HTML
>> whitespace handling rules) to normalize it to
>>
>> <html><body><h2 translate="yes">Welcome to <span its-disambig-ident-ref="http://dbpedia.org/resource/Dublin" translate="no">Dublin</span> in <b translate="no">Ireland</b>!</h2></body></html>
>>
>> (Actually, one line with no excessive whitespace.)
>>
>> Does this sound reasonable to my SemWeb-educated friends?
>>
>> Jirka
>>
>>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Events:
> * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
> * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
> Projects: http://nlp2rdf.org , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
>
>
>

--
Felix Sasaki
DFKI / W3C Fellow
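Felix's question about generating the UUID can be illustrated with a minimal Python sketch. It assumes that a random (version 4) UUID is acceptable for the `urn:uuid:` context URI, and it only mirrors the URI patterns visible in the NIF output in this message (`#offset_<begin>_<end>` on the document URI, `#<begin>_<end>` on the context URI). The helper names `new_context_uri`, `span` and `nif_uris` are hypothetical and not part of NIF or of any library.

```python
import uuid


def new_context_uri() -> str:
    # Random (version 4) UUID in URN form, e.g. "urn:uuid:xxxxxxxx-xxxx-4xxx-...".
    # Python emits the hex digits in lower case.
    return uuid.uuid4().urn


def span(text: str, substring: str, start: int = 0) -> tuple[int, int]:
    # Hypothetical helper: character offsets [begin, end) of the first
    # occurrence of `substring` at or after `start` (ValueError if absent).
    begin = text.index(substring, start)
    return begin, begin + len(substring)


def nif_uris(doc_uri: str, context_uri: str, begin: int, end: int) -> tuple[str, str]:
    # Mirrors the patterns seen in the output in this message:
    #   <doc_uri>#offset_<begin>_<end>   and   <context_uri>#<begin>_<end>
    if not 0 <= begin <= end:
        raise ValueError("expected 0 <= begin <= end")
    return f"{doc_uri}#offset_{begin}_{end}", f"{context_uri}#{begin}_{end}"


if __name__ == "__main__":
    text = "Welcome to Dublin in Ireland!"
    context = new_context_uri()
    for word in ("Dublin", "Ireland"):
        begin, end = span(text, word)
        print(word, nif_uris("http://example.com/exampledoc.html", context, begin, end))
```

For the string `Welcome to Dublin in Ireland!` this gives offsets 11_17 for "Dublin" and 21_28 for "Ireland". Offsets are only meaningful relative to the exact string they were computed on, including every whitespace character in it.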
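Jirka's suggestion to normalize whitespace in the input before the NIF mapping could look roughly like the minimal Python sketch below, using only the standard library's `xml.etree.ElementTree`. It is an illustration, not a normalization anyone in the thread specified, and it rests on a stated assumption: whitespace-only text nodes are insignificant (true for the sample, where they only sit between block-level elements such as `<body>` and `<h2>`), while every other run of HTML whitespace (space, tab, CR, LF, form feed) collapses to a single space. The names `normalize_whitespace` and `SAMPLE` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
HTML_WS = " \t\n\r\f"                  # the characters HTML treats as whitespace
HTML_WS_RUN = re.compile(r"[ \t\n\r\f]+")


def normalize_whitespace(root: ET.Element) -> ET.Element:
    # Assumption: whitespace-only text/tail content is insignificant and is
    # dropped; any other run of HTML whitespace collapses to one space.
    # Non-breaking spaces (U+00A0) are deliberately left untouched.
    for node in root.iter():
        for attr in ("text", "tail"):
            value = getattr(node, attr)
            if value is None:
                continue
            if value.strip(HTML_WS):
                setattr(node, attr, HTML_WS_RUN.sub(" ", value))
            else:
                setattr(node, attr, None)
    return root


SAMPLE = """<html xmlns:its="http://www.w3.org/2005/11/its">
  <body>
    <h2 its:translate="yes">Welcome to <span its:translate="no">Dublin</span> in <b its:translate="no">Ireland</b>! </h2>
  </body>
</html>"""

if __name__ == "__main__":
    ET.register_namespace("its", ITS_NS)  # keep the "its" prefix when serializing
    root = normalize_whitespace(ET.fromstring(SAMPLE))
    print(ET.tostring(root, encoding="unicode"))
    print(repr("".join(root.itertext())))
```

On the sample, the extracted text becomes `"Welcome to Dublin in Ireland! "`. Unlike Jirka's hand-normalized version, the sketch keeps the single space before `</h2>`, because it only collapses whitespace inside non-blank text and never trims it.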
Received on Thursday, 9 August 2012 11:31:18 UTC