notation3.py language tags

Hi,

The N3 parser in notation3.py cannot handle all language tags described by
BCP 47 [1]. The current langcode pattern accepts at most one subtag after
the primary language subtag, so longer tags are not matched in full.
The parser only checks well-formedness, not validity.
I therefore suggest applying the attached patch, which works the same way
but also handles longer language tag constructs [2].

Examples:
@zh-min-nan
@en-GB-boont-r-extended-sequence-x-private
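
To illustrate the difference on these two examples, here is a minimal
sketch that runs both patterns from the patch below against them. It
assumes the parser applies langcode with match() at the start of the tag;
the names OLD_LANGCODE, NEW_LANGCODE and matched_prefix are only for this
illustration and do not appear in notation3.py.

```python
import re

# Patterns copied from the patch: the current one and the proposed one.
OLD_LANGCODE = re.compile(r'[a-zA-Z0-9]+(-[a-zA-Z0-9]+)?')
NEW_LANGCODE = re.compile(r'[a-zA-Z]+(-[a-zA-Z0-9]+){0,7}')


def matched_prefix(pattern, tag):
    """Return the leading part of `tag` consumed by `pattern` ('' if none)."""
    m = pattern.match(tag)
    return m.group(0) if m else ''


EXAMPLES = ['zh-min-nan', 'en-GB-boont-r-extended-sequence-x-private']

for tag in EXAMPLES:
    print('tag: %s' % tag)
    print('  old pattern consumes: %s' % matched_prefix(OLD_LANGCODE, tag))
    print('  new pattern consumes: %s' % matched_prefix(NEW_LANGCODE, tag))
```

With the old pattern only "zh-min" and "en-GB" are consumed, so the rest of
the tag is left over; the new pattern consumes both tags completely.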

General syntax for language tags:
language-extlang-script-region-variant-extension-privateuse
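
For reference, the general syntax above could be checked more strictly than
the patch does. The following is only a sketch under assumptions, not part
of the patch: it is a simplified reading of the langtag production in
RFC 5646 (part of BCP 47), it ignores grandfathered and private-use-only
tags, it does not consult the subtag registry, and it needs Python 3.4 or
later for fullmatch(). LANGTAG and parse_langtag are hypothetical names.

```python
import re

# Simplified structural pattern for
# language-extlang-script-region-variant-extension-privateuse.
LANGTAG = re.compile(r'''
    (?P<language>
        [a-zA-Z]{2,3}(?:-[a-zA-Z]{3}){0,3}   # 2-3 letters, up to 3 extlang subtags
      | [a-zA-Z]{4,8}                        # or one 4-8 letter subtag
    )
    (?:-(?P<script>[a-zA-Z]{4}))?
    (?:-(?P<region>[a-zA-Z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-zA-Z0-9]{5,8}|[0-9][a-zA-Z0-9]{3}))*)
    (?P<extensions>(?:-[0-9a-wy-zA-WY-Z](?:-[a-zA-Z0-9]{2,8})+)*)
    (?:-(?P<privateuse>[xX](?:-[a-zA-Z0-9]{1,8})+))?
''', re.VERBOSE)


def parse_langtag(tag):
    """Return a dict of the named parts, or None if `tag` does not fit."""
    m = LANGTAG.fullmatch(tag)
    return m.groupdict() if m else None


print(parse_langtag('en-GB-boont-r-extended-sequence-x-private'))
print(parse_langtag('zh-min-nan'))
```

The variants and extensions groups keep their leading hyphen, because each
of them may hold several subtags.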


Patch for notation3.py (rev 1.201) from CVS, tested with cwm 1.197 (from
cwm-1.2.1):

--- notation3.py.orig   2011-12-09 11:56:44.000000000 +0100
+++ notation3.py        2011-12-09 15:02:54.000000000 +0100
@@ -99,7 +99,7 @@
 number_syntax = re.compile(r'(?P<integer>[-+]?[0-9]+)(?P<decimal>\.[0-9]+)?(?P<exponent>e[-+]?[0-9]+)?')
 digitstring = re.compile(r'[0-9]+')             # Unsigned integer
 interesting = re.compile(r'[\\\r\n\"]')
-langcode = re.compile(r'[a-zA-Z0-9]+(-[a-zA-Z0-9]+)?')
+langcode = re.compile(r'[a-zA-Z]+(-[a-zA-Z0-9]+){0,7}')
 #"


Best regards,
Andreas Radinger

[1] http://www.rfc-editor.org/rfc/bcp/bcp47.txt
[2] http://www.w3.org/International/articles/language-tags/

-- 
Dipl.-Ing. Andreas Radinger
Professur für Allgemeine BWL, insbesondere E-Business
e-business & web science research group
Universität der Bundeswehr München
 
e-mail: andreas.radinger@unibw.de
www:    http://www.unibw.de/ebusiness/

Received on Monday, 12 December 2011 12:32:03 UTC