
ANN: Finto AI, automated subject indexing service based on Annif

From: Osma Suominen <osma.suominen@helsinki.fi>
Date: Thu, 28 May 2020 17:02:13 +0300
To: public-esw-thes@w3.org
Message-ID: <37b80b8d-64ca-273d-7666-5b193968050b@helsinki.fi>
Dear all,

I'm delighted to report the launch of Finto AI, an automated subject 
indexing service developed by the National Library of Finland. The 
service is based on the Annif tool, which is an open source toolkit we 
have developed over the last three years. I believe this is the first 
mention of Annif on this list, so I will introduce both. The 
relationship between Finto AI and Annif is roughly similar to the 
relationship between Finto and Skosmos: one is the service, the other 
is the open source software that can also be used for other purposes.

Finto AI

Finto has launched an automatic subject indexing service called Finto 
AI. It is currently available in three languages: Finnish, English and 
Swedish. You can find Finto AI, and more information about it, at 
ai.finto.fi. The web page has a form into which text can be submitted 
for analysis. Finto AI also has an open API that enables integration 
with existing systems.
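As a rough illustration of what such an integration might look like, the following Python sketch builds a request for subject suggestions and parses the reply. It is a sketch under stated assumptions, not documented usage: it assumes Finto AI exposes Annif's REST API under `https://ai.finto.fi/v1`, with a `POST /projects/{project_id}/suggest` endpoint that accepts the form fields `text` and `limit` and returns JSON containing a `results` list of `uri`/`label`/`score` entries. The project ID `yso-en` is also an assumption; check the API documentation at ai.finto.fi for the actual names.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumption: illustrative base URL, not a confirmed value.
BASE_URL = "https://ai.finto.fi/v1"


def build_suggest_request(project_id: str, text: str, limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for subject suggestions."""
    form = urllib.parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    url = f"{BASE_URL}/projects/{urllib.parse.quote(project_id)}/suggest"
    # urllib sends a bytes body as application/x-www-form-urlencoded by default.
    return urllib.request.Request(url, data=form, method="POST")


def parse_suggestions(body: bytes) -> list[tuple[str, str, float]]:
    """Return (uri, label, score) triples from an assumed JSON response shape:
    {"results": [{"uri": ..., "label": ..., "score": ...}, ...]}"""
    payload = json.loads(body)
    return [(r["uri"], r["label"], float(r["score"])) for r in payload.get("results", [])]


def fetch_suggestions(project_id: str, text: str, limit: int = 10) -> list[tuple[str, str, float]]:
    """Send the request over the network and parse the reply."""
    request = build_suggest_request(project_id, text, limit)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_suggestions(response.read())
```

Keeping request construction and response parsing separate from the network call makes each piece easy to check offline; `fetch_suggestions("yso-en", some_text)` would then return the top-scoring subjects, provided the endpoint behaves as assumed above.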

For over six years Finto has offered thesauri and ontologies to support 
subject indexing. Now, Finto AI brings machine learning and language 
technology solutions to aid in the work as well. Finto AI is based on 
Annif, an automatic indexing tool that has been developed at the 
National Library of Finland.

Annif has been developed and offered as an experimental service for some 
three years, and many users have already found it. For example, at the 
University of Jyväskylä, students submitting their Master’s theses to 
the JYX repository get suggestions from Annif that they can use or 
modify, and a librarian then does a final check. A similar approach is 
being piloted at the University of Vaasa.

In the course of developing Annif we have explored and tested several 
algorithms, and selected the best-performing combination so far for 
Finto AI: an ensemble model based on neural networks, trained with data 
from the Finna discovery service. Development of Annif is ongoing, and 
we will update and improve Finto AI accordingly.
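To illustrate what combining several algorithms means in practice, here is a deliberately simple Python sketch that merges per-algorithm subject scores with a weighted average. This is not the neural network ensemble used in Finto AI, which learns from training data how to combine the scores; the plain average is swapped in only to show the idea, and the algorithm names and weights in the example are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def average_ensemble(
    suggestions: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Combine per-algorithm subject scores with a weighted average.

    suggestions maps an algorithm name to {subject URI: score in [0, 1]};
    weights maps the same names to non-negative weights.
    Returns (uri, combined score) pairs, best first.
    """
    total = sum(weights[name] for name in suggestions)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    combined: dict[str, float] = defaultdict(float)
    for name, scores in suggestions.items():
        share = weights[name] / total
        for uri, score in scores.items():
            # A subject an algorithm did not suggest counts as score 0 for it.
            combined[uri] += share * score
    # Sort by descending score; ties are broken by URI for a stable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

With two equally weighted algorithms, a subject suggested by both (scores 0.8 and 0.6) ends up at 0.7, ahead of subjects that only one of them proposed; a learned ensemble can instead discover which algorithm to trust for which kinds of documents.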

More information about Finto AI: https://www.kiwi.fi/x/DYDbCQ

Annif

Annif (annif.org) is a Python-based open source tool for automated 
subject indexing using a controlled vocabulary such as a thesaurus or 
classification. It integrates many natural language processing 
techniques and machine learning algorithms. It is designed to be 
multilingual and it can support any subject vocabulary (in SKOS or a 
simple TSV format). It can be used either via a command-line interface 
or a microservice-style REST API.
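The simple TSV vocabulary format can be read with a few lines of Python. This sketch assumes one subject per line, consisting of a URI in angle brackets, a tab, and a label; the Annif wiki is the authoritative description of the format, which may allow additional columns (ignored here).

```python
from __future__ import annotations


def parse_tsv_vocabulary(text: str) -> dict[str, str]:
    """Map subject URIs to labels from a simple tab-separated vocabulary.

    Assumed layout: one subject per line, "<URI><TAB>label".
    Extra columns, if present, are ignored by this sketch.
    """
    vocab: dict[str, str] = {}
    for line in text.splitlines():
        columns = line.split("\t")
        if len(columns) < 2 or not columns[0].strip():
            continue  # skip blank or malformed lines
        uri, label = columns[0].strip(), columns[1].strip()
        if uri.startswith("<") and uri.endswith(">"):
            uri = uri[1:-1]  # drop the angle brackets around the URI
        vocab[uri] = label
    return vocab
```

A flat URI-to-label table like this is enough for suggesting subjects; richer structure such as broader/narrower relations is what the SKOS option is for.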

Annif is being developed on GitHub [1]. The GitHub site includes 
extensive usage documentation in the wiki section.

In addition, we have produced a tutorial [2] for getting started with 
Annif. The first Annif tutorial was organized at the SWIB19 conference. 
The materials are also suitable for self-study. At the moment, we are 
planning a virtual Annif tutorial at the DCMI Virtual event in 
September, based on the same materials.


[1] https://github.com/NatLibFi/Annif

[2] https://github.com/NatLibFi/Annif-tutorial

Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
Tel. +358 50 3199529