SKOS Computability Levels? XML Serialization?

In the discussion about "validation" (including different KQIs (Key 
Quality Indicators) or exception listings), one aspect is very important 
for me as an implementor: computability...

I see that SKOS, compared to ISO 25964 or zThes, is very extensible.
But will it remain computable in PRACTICE for the big thesauri 
(AGROVOC, MeSH including Substances...) we MUST manage?

To parse SKOS expressed in RDFS, with its possibilities of (sub-)class 
and (sub-)property definitions, you need OWL artillery:
Jena, Sesame/Elmo/AliBaba, the Manchester OWL API / Protégé SKOSEd, others?

I have not tested everything, but I am still unaware of an OWL framework 
able to handle BIG thesauri linked with BIG information databases
(with reasonable hardware and response times: my applications are used in 
medical emergencies).

As a (less, but still) flexible alternative, I see XSLT as a tool to 
serialize a SKOS file into an XML representation of its data.
For instance, while testing an XSLT that makes a nice presentation of a 
SKOS file (http://www.askosi.org/xcss/skosrdf2html.xslt),
a serialization in HTML rather than XML, I noticed that it is easy to 
write a transformation for one RDF flavour (usage pattern) but not for 
all of them.

XSLT itself is not very good with very big data files unless you can 
split the data into chunks (transform concept by concept).
A specialized parser would do better.
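To illustrate what such a specialized parser can look like, here is a minimal streaming sketch (the class and method names are my own, and it assumes the common "one skos:Concept element per concept" RDF/XML pattern, so like the XSLT it will not handle every RDF flavour):

```java
import java.io.Reader;
import java.util.*;
import javax.xml.stream.*;

public class SkosStreamer {
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";
    static final String RDF  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Streams an RDF/XML file concept by concept (constant memory per
    // concept), collecting the prefLabels attached to each concept URI.
    public static Map<String, List<String>> read(Reader in) throws XMLStreamException {
        Map<String, List<String>> concepts = new LinkedHashMap<>();
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        String current = null; // URI of the concept being read, if any
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT && SKOS.equals(r.getNamespaceURI())) {
                if ("Concept".equals(r.getLocalName())) {
                    current = r.getAttributeValue(RDF, "about");
                    concepts.put(current, new ArrayList<>());
                } else if (current != null && "prefLabel".equals(r.getLocalName())) {
                    concepts.get(current).add(r.getElementText());
                }
            } else if (ev == XMLStreamConstants.END_ELEMENT
                    && SKOS.equals(r.getNamespaceURI())
                    && "Concept".equals(r.getLocalName())) {
                current = null;
            }
        }
        return concepts;
    }
}
```

Because it never builds a triple store, a parser like this processes one concept at a time and forgets it, which is what keeps memory flat on a file the size of AGROVOC.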

My proposal: to define "computability levels" for SKOS files (like the 
levels that exist for OWL)
1) linear: an XML serialization (ISO 25964, zThes or a SKOS XSD to be 
standardized) is possible in a linear way (by applying simple 
replacements based on easy pattern matching)
2) serializable but not linear: the whole SKOS file must be read into 
memory to access the data necessary for XML serialization. A generic 
XSLT program is able to do the transformation.
3) limited inference: a specialized XSLT program (adapted to the 
sub-classes and sub-properties defined in the SKOS file) is able to 
produce an adequate and faithful serialization.
4) OWL Lite
5) OWL DL
6) OWL Full
and to implement a tool to check the computability level of any given 
SKOS file.
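For the lower levels, such a checking tool could be a simple streaming scan of the vocabularies actually used in the file. A rough sketch (the class name and the heuristics are mine; the real boundaries between the six levels would of course need a precise definition):

```java
import java.io.Reader;
import javax.xml.stream.*;

public class ComputabilityLevel {
    static final String RDF  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RDFS = "http://www.w3.org/2000/01/rdf-schema#";
    static final String OWL  = "http://www.w3.org/2002/07/owl#";

    // Crude classification: OWL vocabulary pushes the file to level 4 (at
    // least), sub-class/sub-property definitions to level 3, blank-node
    // cross-references (rdf:nodeID) to level 2; otherwise level 1.
    public static int check(Reader in) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int level = 1;
        while (r.hasNext()) {
            if (r.next() != XMLStreamConstants.START_ELEMENT) continue;
            String ns = r.getNamespaceURI();
            String ln = r.getLocalName();
            if (OWL.equals(ns)) {
                level = Math.max(level, 4);
            } else if (RDFS.equals(ns)
                    && ("subClassOf".equals(ln) || "subPropertyOf".equals(ln))) {
                level = Math.max(level, 3);
            } else if (r.getAttributeValue(RDF, "nodeID") != null) {
                level = Math.max(level, 2);
            }
        }
        return level;
    }
}
```

Distinguishing levels 4 to 6 would need a real OWL species validator, but the point is that a publisher (or a validation service) could stamp each SKOS file with the cheapest machinery sufficient to process it.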

My opinion is that SKOS is for humans who have to build efficient 
(Search) User Interfaces.
OWL is for humans who have to model data to automate (subtle) processes.

Computability is IMHO an important issue for SKOS: when you restart an 
information server, you want it to be ready to serve in seconds, not hours.
My Java application loads a faithful and complete AGROVOC XML 
serialization (all languages) into an appropriate memory structure to 
serve users in 30 seconds.
Can we hope to do that if a reasoner has to infer relations from an 
unsorted bag of RDF definitions?
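The memory structure itself has nothing exotic: once the XML serialization is parsed, serving users means plain hash-map lookups, with no reasoning at query time. A hypothetical sketch (all the names here are mine, not taken from my actual application):

```java
import java.util.*;

// Hypothetical in-memory thesaurus index: after loading, every query a
// (search) user interface needs is an O(1) hash-map access.
public class ThesaurusIndex {
    private final Map<String, Map<String, String>> prefLabels = new HashMap<>(); // uri -> lang -> label
    private final Map<String, Set<String>> narrower = new HashMap<>();           // uri -> narrower uris

    public void addLabel(String uri, String lang, String label) {
        prefLabels.computeIfAbsent(uri, k -> new HashMap<>()).put(lang, label);
    }

    public void addNarrower(String broaderUri, String narrowerUri) {
        narrower.computeIfAbsent(broaderUri, k -> new LinkedHashSet<>()).add(narrowerUri);
    }

    public String label(String uri, String lang) {
        Map<String, String> byLang = prefLabels.get(uri);
        return byLang == null ? null : byLang.get(lang);
    }

    public Set<String> narrowerOf(String uri) {
        return narrower.getOrDefault(uri, Collections.emptySet());
    }
}
```

Filling such maps while streaming the serialization is what makes a restart a matter of seconds: all the "inference" was done once, when the XML serialization was produced.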

Should a SKOS validation process (optionally) generate an XML 
serialization of the SKOS definitions for faster processing?
Please find here my proposal for an XML Schema (XSD) definition for 
SKOS serialization:
http://www.askosi.org/ConceptScheme.xsd
A readable version is produced using an XSLT from the XS3P project:
http://www.askosi.org/example/ConceptScheme.xml

This XSLT was very hard to find, but the effort was well rewarded:
http://sourceforge.net/projects/xs3p/
If only we could reach the same quality when displaying a SKOS file!

I would be very happy to hear your suggestions!

Wishing you all a very nice day!

Christophe

Received on Friday, 16 July 2010 11:02:01 UTC