- From: Jirka Kosek <jirka@kosek.cz>
- Date: Thu, 09 Aug 2012 11:59:17 +0200
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- CC: public-multilingualweb-lt@w3.org
- Message-ID: <502389F5.20107@kosek.cz>
On 9.8.2012 11:47, Sebastian Hellmann wrote:

> you found an interesting point.
>
> I wrote some notes on the optimization:
> http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS#Notes_on_optional_optimizations
> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=614#Notes_on_optional_optimizations
>
> I think, it generally depends on the use case, whether you would
> optimize. Do you think we should specify/limit what optimizations are
> possible?
> It might be easier to explain implications to help developers,
> but leave the implementation under-specified.
> Do you think I should remove them from the algorithm description and
> move them to a completely different section? Would this help the
> structure of the document?

I think the NIF mapping is already so unnatural that optimization can
make it really messy. If the goal of the optimization is a less complex
RDF representation, with no blank text nodes and with whitespace-heavy
text nodes trimmed, I think an easier and workable approach would be to:

- remove all whitespace optimization from the mapping algorithm

- say that the algorithm can produce a lot of "phantom" predicates from
  excessive whitespace

- recommend normalizing whitespace in the input XML/HTML/DOM in order
  to minimize such phantom predicates

This way each user/application can create a custom whitespace
normalization based on the nature of its input data, and we don't have
to care about it. For example, for your sample document it is safe
(knowing the HTML whitespace handling rules) to normalize it to

<html><body><h2 translate = "yes">Welcome to <span its-disambig-ident-ref = "http://dbpedia.org/resource/Dublin" translate = "no">Dublin</span> in <b translate="no">Ireland</b>!</h2></body></html>

(That is, a single line with no excess whitespace.)

Does this sound reasonable to my SemWeb-educated friends?
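To make the normalization idea concrete, here is a minimal sketch in Python of what an application-side normalizer might look like. It assumes well-formed XML/XHTML input and uses only the standard library. The CONTAINERS and BLOCKS sets and the function names are illustrative assumptions, not part of the mapping algorithm or of any specification; a real application would choose them according to its own input.

```python
import re
import xml.etree.ElementTree as ET

# Assumed element classes; tune these for the actual input vocabulary.
CONTAINERS = {"html", "head", "body"}   # whitespace directly inside is dropped
BLOCKS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "div", "li"}


def collapse(s):
    """Collapse every run of whitespace to one space; None/empty stays as is."""
    return re.sub(r"\s+", " ", s) if s else s


def normalize(elem):
    """Normalize whitespace in elem and its descendants, in place."""
    elem.text = collapse(elem.text)
    for child in elem:
        normalize(child)
        child.tail = collapse(child.tail)

    children = list(elem)
    if elem.tag in CONTAINERS:
        # Drop whitespace-only text nodes between structural children.
        if elem.text and not elem.text.strip():
            elem.text = None
        for child in children:
            if child.tail and not child.tail.strip():
                child.tail = None
    elif elem.tag in BLOCKS:
        # Trim the leading and trailing edge of a block's content.
        if elem.text:
            elem.text = elem.text.lstrip()
        if children:
            if children[-1].tail:
                children[-1].tail = children[-1].tail.rstrip()
        elif elem.text:
            elem.text = elem.text.rstrip()


def normalize_whitespace(xml_source):
    """Parse well-formed XML, normalize whitespace, return it serialized."""
    root = ET.fromstring(xml_source)
    normalize(root)
    return ET.tostring(root, encoding="unicode")
```

Single spaces between inline elements are kept, since HTML renders them; only whitespace that the assumed element classes mark as insignificant is removed. Running the normalizer before the mapping step keeps the mapping algorithm itself free of whitespace special cases.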
Jirka

-- 
------------------------------------------------------------------
  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
------------------------------------------------------------------
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
------------------------------------------------------------------
Received on Thursday, 9 August 2012 09:59:42 UTC