
issue-73 (Re: Comment on ITS 2.0 WD-its20-20121206 - NLP Interchange Format (NIF) - 1.0/2.0, Canonical XML, Unicode Normalization Forms)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 22 Feb 2013 14:48:47 +0100
Message-ID: <5127773F.2010704@w3.org>
To: "Lieske, Christian" <christian.lieske@sap.com>
CC: "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>
Hi Christian, all,

I have spoken to Sebastian Hellmann; see below. The message below also 
includes some additional replies that I had sent earlier but for which I 
haven't received feedback from Christian yet (or I may have missed it).

Am 10.01.13 10:50, schrieb Lieske, Christian:
>
> Hi,
>
> Please find below comments/observations/questions/ideas concerning the 
> ITS 2.0 working draft dated December 6, 2012 
> (http://www.w3.org/TR/2012/WD-its20-20121206/). Please feel free to 
> contact me for clarifications if anything is unclear.
>
> The objectives of the NLP Interchange Format (NIF) -- such as 
> interoperability between Natural Language Processing (NLP) tools, 
> language resources and annotations, and easy conversion to Resource 
> Description Framework (RDF) -- from my point of view are important ones. 
> Accordingly, relating ITS 2.0  - with its direction to move ITS 1.0 
> closer to Natural Language Processing (NLP) - to NIF may help to 
> realize synergies.
>
> While looking at the relation between ITS 2.0 and NIF in the current 
> Working Draft (WD), I have come up with the observations/questions 
> below. I apologize in advance if a reply to this comment may require 
> that discussions which presumably already took place may have to be 
> summarized.
>
> 1. Does the WD refer to NIF 1.0, or 2.0? NIF 2.0 already seems to be 
> under development.
>
> 2. I am a bit unsure about the approval procedure, the official 
> status, and the organizational home of NIF 1.0 (and NIF 2.0). My 
> assumption is that the LOD2 Consortium declared NIF 1.0 as finished, 
> and hasn't handed it over to an accredited standardization 
> organization such as ISO.
>


Sebastian will provide NIF 2.0 under a stable CC BY license and will have 
it hosted with a persistence policy by the University of Leipzig. It will 
not be hosted by a standards body, but the licensing will allow re-use 
of NIF, and the hosting will provide stability.

Not 100% related, but FYI: in addition to my implementation of
http://www.w3.org/TR/2012/WD-its20-20121206/#conversion-to-nif
Sebastian will provide a second implementation of the conversion. So we 
won't have to declare this as a feature at risk.

> 3. Wouldn't the ITS2NIF mapping benefit from/need the following as 
> prerequisites?
>
> a. Input and output have to be Canonical XML (for XML-based formats)
>

> b. Input and output have to consider Unicode Normalization 
> Forms/Unicode Equivalence (e.g. so that the algorithm does produce 
> identical results for sentences that contain "Äffin" and "A\u0308ffin")
>
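
To make 3a and 3b concrete, here is a minimal sketch in Python (standard 
library only). `same_text` is a hypothetical helper name, not something 
from ITS 2.0 or NIF. The sketch assumes NFC as the normalization form, 
which is just one possible choice, and it uses 
`xml.etree.ElementTree.canonicalize` (Python 3.8 or later) as a stand-in 
for a Canonical XML step:

```python
import unicodedata
from xml.etree.ElementTree import canonicalize


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after applying the same Unicode normalization form.

    Hypothetical helper; NFC is an assumption, not an ITS 2.0 requirement.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# 3b: the two spellings from the comment above.
precomposed = "\u00C4ffin"   # "Ä" as a single code point
decomposed = "A\u0308ffin"   # "A" followed by COMBINING DIAERESIS

print(precomposed == decomposed)           # False: different code point sequences
print(same_text(precomposed, decomposed))  # True: equal once both are in NFC
print(len(precomposed), len(decomposed))   # 5 6

# 3a: two serializations of the same element (attribute order and quoting differ).
xml_a = '<p b="2" a="1">Text</p>'
xml_b = "<p a='1'   b='2'>Text</p>"

print(canonicalize(xml_a) == canonicalize(xml_b))  # True
```

The two spellings differ by one code point in length, so any 
offset-based addressing of the string would differ unless both sides 
agree on a normalization form first.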

A few weeks ago I had provided an answer to normalization (which I would 
like to extend to Canonical XML) - taken from
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0210.html

[

<trackbot> Created ACTION-430 - Draft text explaining importance of 
Unicode normalization and best practices on ISSUE-67 [on Shaun McCance - 
due 2013-02-04].

 >FS: Christian, would such a BP note also help with your concerns about 
the NIF conversion? See your comment 3a at

http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0101.html

In that mail you write "have to consider", but you don't say "require", 
"make a testable assertion", "provide tests", etc. Can you clarify 
whether a note would be sufficient?

Also, as a reply to issue-85, 3b: if your answer to my question is 
"require" or "make a testable assertion": why? We of course want to be 
good citizens with regard to normalization, but why require more than 
XQuery, XPath, HTML5, SPARQL ...?


]

One additional thought on this: from my implementation experience, 
normalization and canonicalization are not the problem. Whitespace 
handling is. And for this we already have a note at
http://www.w3.org/TR/2012/WD-its20-20121206/#conversion-to-nif
"It is recommended to normalize whitespace in the input XML/HTML/DOM in 
order to minimize such phantom predicates."
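
As an illustration of what such whitespace normalization could look 
like, here is a minimal sketch in Python. It is not the conversion 
algorithm from the draft; `collapse_whitespace` is a hypothetical helper, 
and collapsing every run of XML whitespace to a single space is only one 
possible policy:

```python
import re
import xml.etree.ElementTree as ET

# XML whitespace only: space, tab, carriage return, line feed.
_XML_WS = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(root: ET.Element) -> ET.Element:
    """Collapse runs of XML whitespace in all text nodes to one space.

    Hypothetical helper; it modifies the tree in place and returns it.
    """
    for element in root.iter():
        if element.text:
            element.text = _XML_WS.sub(" ", element.text)
        if element.tail:
            element.tail = _XML_WS.sub(" ", element.tail)
    return root


doc = ET.fromstring("<p>Hello\n    <b>big</b>\n\n  world</p>")
collapse_whitespace(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)  # <p>Hello <b>big</b> world</p>
```

Without this step, the indentation and line breaks from pretty-printed 
markup end up in the extracted text and shift every character offset 
computed from it.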

Christian, would the above resolve the three comments?

Best,

Felix
Received on Friday, 22 February 2013 13:49:16 GMT
