Issue with unbounded alphabets in datatypes

Hi all,

I just wanted to raise a discussion about the currently proposed
assumption that the alphabet of the String-based datatypes is
unbounded. For implementors, it would be quite convenient if standard
packages for regular expressions and automata could be used to support
the pattern facet of String-based datatypes. To the best of my
knowledge, however, existing implementations usually assume a bounded
input alphabet (for which complementation and determinisation of the
automaton are well understood), which means that OWL 2 reasoner
implementors would have to write their own package to handle regular
expressions over unbounded alphabets. In the current version of
HermiT we use, for example, the implementation from
www.brics.dk/automaton, which supports the current Unicode standard
but is not conformant with the proposed OWL 2 standard because its
alphabet is bounded.
My search for results on finite automata over unbounded alphabets
did not turn up much, and it would be good to know whether there
are existing implementations of automata over unbounded alphabets
and, if not, how involved such an implementation would be, which
theoretical results are available, etc.
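
To make the question more concrete, here is a minimal sketch of one
possible way to represent a deterministic automaton without fixing the
alphabet: each state lists finitely many explicit transitions and has
one default target for every other symbol. Complementation is then
just a flip of the accepting states, because the default transitions
make the automaton complete. The class name SymbolicDFA and the
example automaton are hypothetical and purely illustrative; this is
not how www.brics.dk/automaton or HermiT work, and it leaves out
determinisation and the translation from regular expressions.

```python
from dataclasses import dataclass


@dataclass
class SymbolicDFA:
    """Complete DFA over an arbitrary (possibly unbounded) alphabet.

    Hypothetical illustration: every state has finitely many explicit
    transitions plus one default target for all symbols not listed.
    """
    start: int
    accepting: frozenset
    explicit: dict  # state -> {symbol: next_state}
    default: dict   # state -> next_state for every unlisted symbol

    def accepts(self, word):
        state = self.start
        for symbol in word:
            # Fall back to the default target for unlisted symbols.
            state = self.explicit[state].get(symbol, self.default[state])
        return state in self.accepting

    def complement(self):
        # The automaton is complete, so flipping the accepting states
        # yields the complement language.
        all_states = frozenset(self.default)
        return SymbolicDFA(self.start, all_states - self.accepting,
                           self.explicit, self.default)


# Automaton for the pattern "ab*": state 1 accepts, state 2 is a sink.
ab_star = SymbolicDFA(
    start=0,
    accepting=frozenset({1}),
    explicit={0: {"a": 1}, 1: {"b": 1}, 2: {}},
    default={0: 2, 1: 2, 2: 2},
)
not_ab_star = ab_star.complement()
```

Symbols only need to be hashable, so a word such as ["a", 10**9] can
contain a "character" far outside any current Unicode range; ab_star
rejects it through the default transition and not_ab_star accepts it.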

If an unbounded alphabet turns out to be a huge burden for
implementors, it might be worth thinking about viable alternatives.
One could, for example, fix the alphabet either to some current
Unicode standard or to a size that allows for several future
extensions of Unicode. The disadvantage is that, at some point in
the future, the chosen bound could be exceeded, which would
lead to some ontologies being decidable in one OWL standard but not in
the other.

Summing up, I just wanted to initiate a broader discussion about this
and to make sure that implementors in particular are aware of what the
OWL 2 standard expects of conforming implementations.

Best regards,
Birte

-- 
Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529

Received on Monday, 20 October 2008 20:50:23 UTC