Comment on ITS 2.0 WD-its20-20121206 - NLP Interchange Format (NIF) - 1.0/2.0, Canonical XML, Unicode Normalization Forms


Please find below comments, observations, questions, and ideas concerning the ITS 2.0 working draft dated December 6, 2012. Please feel free to contact me if anything is unclear.

The objectives of the NLP Interchange Format (NIF), such as interoperability between Natural Language Processing (NLP) tools, language resources and annotations, and easy conversion to the Resource Description Framework (RDF), are important ones from my point of view. Accordingly, relating ITS 2.0 (with its direction to move ITS 1.0 closer to NLP) to NIF may help to realize synergies.

While looking at the relation between ITS 2.0 and NIF in the current Working Draft (WD), I came up with the observations and questions below. I apologize in advance if replying to this comment requires summarizing discussions that have presumably already taken place.

1. Does the WD refer to NIF 1.0 or to NIF 2.0? NIF 2.0 already seems to be under development.

2. I am a bit unsure about the approval procedure, the official status, and the organizational home of NIF 1.0 (and NIF 2.0). My assumption is that the LOD2 Consortium declared NIF 1.0 finished and has not handed it over to an accredited standardization organization such as ISO.

3. Wouldn't the ITS2NIF mapping benefit from/need the following as prerequisites?

a. Input and output have to be Canonical XML (for XML-based formats)

b. Input and output have to consider Unicode Normalization Forms/Unicode equivalence (e.g. so that the algorithm produces identical results for sentences that contain "Äffin" and "A\u0308ffin")
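
Both prerequisites can be illustrated with a minimal Python sketch. It is only an illustration, not part of the ITS2NIF algorithm. It assumes Python 3.8 or later (for `xml.etree.ElementTree.canonicalize`, which implements Canonical XML 2.0) and picks NFC as the normalization form purely as an example. The element `p` and the attributes `a` and `b` are made up. The sketch shows that the two spellings of "Äffin" differ in length until normalized, which would matter for any mapping that derives identifiers from character offsets, and that canonicalization removes serialization differences but does not by itself perform Unicode normalization.

```python
import unicodedata
from xml.etree.ElementTree import canonicalize  # Canonical XML 2.0, Python 3.8+

precomposed = "\u00C4ffin"  # "Ä" as the single code point U+00C4
decomposed = "A\u0308ffin"  # "A" followed by U+0308 COMBINING DIAERESIS


def nfc(text: str) -> str:
    """Normalize text to Unicode Normalization Form C (an assumed choice)."""
    return unicodedata.normalize("NFC", text)


# b. Without normalization the strings differ in length, so character
#    offsets computed over them would differ as well.
lengths_before = (len(precomposed), len(decomposed))
equal_after_nfc = nfc(precomposed) == nfc(decomposed)

# a. Two serializations of the same (hypothetical) element: attribute order,
#    quote style, and whitespace inside the start tag differ.
xml_a = '<p b="2" a="1">' + precomposed + "</p>"
xml_b = "<p a='1'   b='2'>" + precomposed + "</p>"
same_canonical_form = canonicalize(xml_a) == canonicalize(xml_b)

# Canonicalization alone does not reconcile the two Unicode spellings.
xml_c = '<p a="1" b="2">' + decomposed + "</p>"
c14n_hides_unicode_difference = canonicalize(xml_a) == canonicalize(xml_c)

print(lengths_before, equal_after_nfc)
print(same_canonical_form, c14n_hides_unicode_difference)
```

The last comparison is the reason for listing a. and b. as separate prerequisites: each addresses a source of variation that the other leaves untouched.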

Best regards,


Received on Thursday, 10 January 2013 09:51:15 UTC