Re: [ISSUE-29][ACTION-164] ITS2NIF2ITS - RDF roundtrip

Hi Jirka,
thanks for your feedback. I thought it was a requirement that the DOM 
should not be touched. I have never had any whitespace problems with any 
RDF serialization format, so this was new to me. By the way, I now 
understand what your problem with the bloated mapping is. We really 
don't need to serialize it. Actually, it can be kept in memory, which is 
more efficient. I made serialization optional. I also made an XML 
version, because XML is much better suited for transferring this kind of 
data. (Is the XML alright?) I made all the changes you suggested; the 
new version is online here:

all the best,

Am 09.08.2012 11:59, schrieb Jirka Kosek:
> On 9.8.2012 11:47, Sebastian Hellmann wrote:
>> you found an interesting point.
>> I wrote some notes on the optimization:
>> I think it generally depends on the use case whether you would
>> optimize. Do you think we should specify or limit which optimizations
>> are possible?
>> It might be easier to explain implications to help developers,
>> but leave the implementation under-specified.
>> Do you think I should remove them from the algorithm description and
>> move them to a completely different section? Would this help the
>> structure of the document?
> I think the NIF mapping is already so unnatural that optimization can
> make it really messy. If the goal of optimization was to create a less
> complex RDF representation, without blank text nodes and with text
> nodes trimmed of excessive whitespace, I think an easier and workable
> approach would be to:
> - remove all whitespace optimization from the mapping algorithm
> - say that the algorithm can produce a lot of "phantom" predicates from
> excessive whitespace
> - recommend normalizing whitespace in the input XML/HTML/DOM in
> order to minimize such phantom predicates
> This way each user/application can create a custom whitespace
> normalization based on the nature of the input data, and we don't have
> to care about it.
> For example, for your sample document it is safe (knowing the HTML
> whitespace handling rules) to normalize it to
> <html><body><h2 translate = "yes" >Welcome to <span
> its-disambig-ident-ref = "” translate
> = "no">Dublin</span> in <b translate="no">Ireland</b>!</h2></body></html>
> (Actually one line with no excessive whitespace.)
> Does this sound reasonable to my SemWeb-educated friends?
>    Jirka
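
Jirka's suggestion of normalizing whitespace in the input before the NIF mapping can be sketched roughly as below. This is a minimal sketch, assuming the input is well-formed XHTML that Python's `xml.etree.ElementTree` can parse. The `BLOCK` set is a hypothetical, deliberately incomplete stand-in for the real HTML whitespace rules, and `normalize` / `normalize_markup` are illustrative names, not part of any ITS2NIF or NIF tooling. It collapses whitespace runs to one space everywhere, trims leading/trailing whitespace inside block elements, and drops whitespace-only gaps between block elements, while keeping single spaces around inline elements such as `span` and `b`, where they are significant.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, deliberately incomplete set of block-level elements whose
# leading/trailing whitespace is insignificant when rendered as HTML.
BLOCK = {"html", "body", "h2", "p", "div"}

_WS = re.compile(r"\s+")


def _collapse(s):
    """Collapse every run of whitespace to a single space (None stays None)."""
    return _WS.sub(" ", s) if s else s


def normalize(elem):
    """Normalize whitespace in place so fewer whitespace-only text nodes remain."""
    elem.text = _collapse(elem.text)
    for child in elem:
        normalize(child)
        child.tail = _collapse(child.tail)
    if elem.tag not in BLOCK:
        return  # inline content: single spaces are significant, keep them
    if elem.text:
        elem.text = elem.text.lstrip()  # leading whitespace inside a block
    if len(elem):
        last = elem[-1]
        if last.tail:
            last.tail = last.tail.rstrip()  # trailing whitespace inside a block
    elif elem.text:
        elem.text = elem.text.rstrip()
    for child in elem:
        # whitespace between two block elements carries no content
        if child.tag in BLOCK and child.tail and not child.tail.strip():
            child.tail = ""


def normalize_markup(markup):
    """Parse, normalize and re-serialize a well-formed XHTML string."""
    root = ET.fromstring(markup)
    normalize(root)
    return ET.tostring(root, encoding="unicode")
```

Applied to an indented copy of the sample document (with the `its-disambig-ident-ref` attribute left out, since its value is not preserved in this archive), `normalize_markup` returns the one-line form quoted above. A real implementation would need the full HTML rules (e.g. `pre`, CSS `white-space`), which is exactly why leaving normalization to each application seems attractive.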

Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig

Received on Thursday, 9 August 2012 11:06:32 UTC