RE: issue-73 (Re: Comment on ITS 2.0 WD-its20-20121206 - NLP Interchange Format (NIF) - 1.0/2.0, Canonical XML, Unicode Normalization Forms)

Hi Felix,

Thanks for the effort you put into this.

The work/solutions to which you refer in your mail resolve my three comments.

Cheers,
Christian

From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Friday, 22 February 2013 14:49
To: Lieske, Christian
Cc: public-multilingualweb-lt-comments@w3.org
Subject: issue-73 (Re: Comment on ITS 2.0 WD-its20-20121206 - NLP Interchange Format (NIF) - 1.0/2.0, Canonical XML, Unicode Normalization Forms)

Hi Christian, all,

I have spoken to Sebastian Hellmann, see below. The text below also includes some earlier replies of mine for which I haven't received feedback from Christian yet (or I may have missed it).

On 10.01.13 10:50, Lieske, Christian wrote:
Hi,

Please find below comments/observations/questions/ideas concerning the ITS 2.0 working draft dated December 6, 2012 (http://www.w3.org/TR/2012/WD-its20-20121206/).  Please feel free to contact me for clarifications if anything is unclear.

The objectives of the NLP Interchange Format (NIF) - such as interoperability between Natural Language Processing (NLP) tools, language resources and annotations, and easy conversion to Resource Description Framework (RDF) - are important ones from my point of view. Accordingly, relating ITS 2.0 - with its direction to move ITS 1.0 closer to NLP - to NIF may help to realize synergies.

While looking at the relation between ITS 2.0 and NIF in the current Working Draft (WD), I have come up with the observations/questions below. I apologize in advance if replying to this comment requires summarizing discussions that presumably already took place.

1. Does the WD refer to NIF 1.0, or 2.0? NIF 2.0 already seems to be under development.



2. I am a bit unsure about the approval procedure, the official status, and the organizational home of NIF 1.0 (and NIF 2.0). My assumption is that the LOD2 Consortium declared NIF 1.0 as finished, and hasn't handed it over to an accredited standardization organization such as ISO.


Sebastian will provide NIF 2.0 under a stable CC BY license and will have it hosted, with a persistence policy, by the University of Leipzig. This is not hosting by a standards body, but the licensing will allow re-use of NIF, and the hosting will provide stability.

Not 100% related, but FYI: Sebastian will provide - in addition to my implementation of
http://www.w3.org/TR/2012/WD-its20-20121206/#conversion-to-nif
an additional implementation of the conversion. So we won't have to declare this as a feature at risk.





3. Wouldn't the ITS2NIF mapping benefit from/need the following as prerequisites?



a. Input and output have to be Canonical XML (for XML-based formats)



b. Input and output have to consider Unicode Normalization Forms/Unicode Equivalence (e.g. so that the algorithm does produce identical results for sentences that contain "Äffin" and "A\u0308ffin")
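To make points a and b concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: it uses the standard-library `unicodedata` module and `xml.etree.ElementTree.canonicalize` (Python 3.8+, which implements C14N 2.0 and so only approximates whatever Canonical XML version would be chosen). The helper names `nfc` and `same_after_c14n` and the sample markup are hypothetical and not taken from the WD.

```python
import unicodedata
from xml.etree.ElementTree import canonicalize

# Two encodings of the same visible word (example from point b).
PRECOMPOSED = "\u00C4ffin"   # "Ä" as one code point
DECOMPOSED = "A\u0308ffin"   # "A" followed by COMBINING DIAERESIS


def nfc(text: str) -> str:
    """Return the text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def same_after_c14n(xml_a: str, xml_b: str) -> bool:
    """Compare two XML strings after canonicalization (C14N 2.0)."""
    return canonicalize(xml_a) == canonicalize(xml_b)


# Without normalization the strings differ in content and length,
# which would matter for any comparison or offset computation.
print(PRECOMPOSED == DECOMPOSED)            # raw comparison
print(len(PRECOMPOSED), len(DECOMPOSED))    # code point counts
print(nfc(PRECOMPOSED) == nfc(DECOMPOSED))  # comparison after NFC

# Hypothetical sample: same element, different attribute order and quoting.
print(same_after_c14n('<p b="2" a="1">Text</p>', "<p a='1' b='2'>Text</p>"))
```

Note that canonicalization and Unicode normalization are independent steps: canonicalizing XML does not by itself make the two spellings of "Äffin" equal.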

A few weeks ago I provided an answer on normalization (which I would like to extend to Canonical XML) - taken from
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0210.html

[

<trackbot> Created ACTION-430 - Draft text explaining
    importance of Unicode normalization and best practices on
    ISSUE-67 [on Shaun McCance - due 2013-02-04].

>FS: Christian, would such a BP note also help with your concerns about the NIF conversion? See your comment 3a at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0101.html
in that mail you say "have to consider". But you don't say "require", "make a testable assertion", "provide tests" etc. Can you clarify whether a note would be sufficient?
Also as a reply to issue-85, 3b: if your answer to my question is "require", "make a testable assertion": why? We of course want to be good citizens with regard to normalization, but why require more than XQuery, XPath, HTML5, SPARQL ...?

]

One additional thought on this: from my implementation experience, normalization and canonicalization are not the problem; whitespace handling is. And for this we already have a note:
http://www.w3.org/TR/2012/WD-its20-20121206/#conversion-to-nif
"It is recommended to normalize whitespace in the input XML/HTML/DOM in order to minimize such phantom predicates."
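As a rough illustration of what such whitespace normalization could look like before the conversion, here is a minimal Python sketch. It is an assumption, not the procedure defined in the WD: it simply collapses runs of XML whitespace (space, tab, CR, LF) in text content to a single space, and the function names `collapse` and `normalize_whitespace` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# XML whitespace only (space, tab, CR, LF); non-breaking spaces are kept.
_XML_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse(text):
    """Collapse each run of XML whitespace to one space; pass None through."""
    return _XML_WS_RUN.sub(" ", text) if text else text


def normalize_whitespace(xml_source: str) -> str:
    """Parse the XML, collapse whitespace in all text nodes, and re-serialize."""
    root = ET.fromstring(xml_source)
    for element in root.iter():
        element.text = collapse(element.text)  # text before the first child
        element.tail = collapse(element.tail)  # text after the closing tag
    return ET.tostring(root, encoding="unicode")


sample = "<p>Hello   \n  <b>big</b>\n\n world</p>"
print(normalize_whitespace(sample))
```

A real converter would probably also need to decide how to treat whitespace-only text nodes and elements where whitespace is significant (for example `xml:space="preserve"` or HTML `pre`); the sketch deliberately ignores those cases.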

Christian, would the above resolve the three comments?

Best,

Felix

Received on Monday, 25 February 2013 11:11:07 UTC