Re: On constraining/validating datatypes

At 13:42 22/5/97 -0400, Steven J. DeRose wrote:

>a) Do nothing: have no types for #PCDATA, and only the existing attribute
>declared values.

Not an option - users are screaming for datatyping

>b) Define a small, fixed number of atomic types.

Too limiting

>c) Define a language for defining datatypes: regex (say, per POSIX), or
>perhaps HyLex. 

A possibility

>d) Define a way to access *any* programming, scripting, or other language at
>all.

The SGML solution - but could be dangerous in a Web environment unless
supported by a starter-set of useful notations

>I wouldn't mind (d), so long as we require support for regexes as an
>interoperable choice.

This I like

>a) Associate datatypes with data via attributes.

This ignores the real problem - how to datatype attribute values.

>b) State the relationships between datatypes and attributes or content right
>with the definitions, for example in header elements that apply for the rest
>of the document. This reduces clutter:
>
>   <datatype-def name="integer"    applies-to="P #PCDATA" 
>                 expr="[0-9]+"     notation="regex">
>   <datatype-def name="letdig"     applies-to="P TYPE" 
>                 expr="[a-z][0-9]" notation="regex">
>   ...
>   <P TYPE="p3">31415926535</P>

The element would have to be one with an XML reserved name, such as
XML-datatype. In your examples they are presumably empty tags, as the
expressions are entered as attribute values, but for notations such as
JavaScript and other programming-type languages you would probably want the
expression as content.
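
As a rough sketch of what a regex-backed check for the two quoted
datatype-def entries might look like: Python's re module stands in for POSIX
regexes here (an assumption; the two dialects agree for patterns this
simple), and the DATATYPES table and check() function are hypothetical names,
not part of any proposal.

```python
import re

# Hypothetical table mirroring the two quoted datatype-def entries.
# Key: (element, target), where target is "#PCDATA" or an attribute name.
DATATYPES = {
    ("P", "#PCDATA"): ("integer", r"[0-9]+"),
    ("P", "TYPE"): ("letdig", r"[a-z][0-9]"),
}


def check(element, target, value):
    """Return True if value matches the whole expression declared for target."""
    _name, expr = DATATYPES[(element, target)]
    # fullmatch, not search: the entire value must conform to the datatype.
    return re.fullmatch(expr, value) is not None
```

With this table, check("P", "TYPE", "p3") and
check("P", "#PCDATA", "31415926535") both succeed for the quoted example
element, while a TYPE value such as "p33" is rejected.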

>c) In the DTD itself, via an amendment.

Why use an amendment - why not use the existing lextype attribute from the
SGML Extended Facilities annex?

>
><!NOTATION    REGEX      PUBLIC "+//ISBN 0-123-45678-9//POSIX regexes//EN">
>
><!DATATYPE    integer    "[0-9]+"     REGEX>
><!DATATYPE    letdig     "[a-z][0-9]" REGEX>
><!-- like entity dcls, one could allow the value to be an external ID, not
>just a literal -->
>
>i) A new DATATYPE declaration patterned after HyTime's lexical type
>definition AF (this does not introduce any broad dependency on HyTime, since
>the lexical typing is well modularized).
>
You don't need a new markup declaration for datatypes - we can use LEXTYPE.
The SGML Extended Facilities annex already shows how lexical types can be
declared in a reusable document instance that can be referred to by many DTDs
using a <?IS10744 LEXUSE processing instruction. (XML should not invent its
own set of cogs - it should find more efficient ways of using existing SGML
ones!)

>ii) An optional (lextype-name) suffix on attribute declared values (at least
>CDATA) and on the keyword #PCDATA. I believe there is no syntax conflict
>with () in either place; if I missed one, some other delimiter could of
>course be substituted. The declared value name and/or #PCDATA keywords could
>of course be replaced rather than suffixed by the lexical type name, for
>example by #DATATYPE(name).

You don't need a new construct for identifying the lextype of attributes in
declared values (this would require a change to SGML before becoming valid).
Simply adopt the lextype attribute of HyTime. This allows you to associate
different lexical types with different attributes and with content, e.g. 

<!ATTLIST element-x 
a CDATA #REQUIRED
b CDATA #IMPLIED
c CDATA "default"
lextype CDATA #FIXED
"#CONTENT     lextype-x
 a                      lextype-a
 b                      lextype-b
 c                      lextype-c">
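
A processor would need to split that fixed value into its pairs. A minimal
sketch, assuming the value is a whitespace-separated list of name/type pairs
exactly as in the declaration above (parse_lextype is a hypothetical helper,
not anything defined by HyTime):

```python
def parse_lextype(value):
    """Map each attribute name (or #CONTENT) to its lexical type name."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("lextype value must consist of name/type pairs")
    # Even positions are names, odd positions are the lexical types.
    return dict(zip(tokens[0::2], tokens[1::2]))


# The fixed value from the ATTLIST above.
LEXTYPE_VALUE = """#CONTENT     lextype-x
 a                      lextype-a
 b                      lextype-b
 c                      lextype-c"""

mapping = parse_lextype(LEXTYPE_VALUE)
```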

You also need a way for users to override the default lextype in certain
circumstances. Hence my earlier suggestion to allow PIs carrying a notation
(or lexical type) name inside attribute values, e.g.
<element-x a="<?XML-LEX my-lex?>value">.
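
To illustrate how such an override could be resolved, here is a sketch that
assumes the PI appears only as a prefix of the attribute value and carries a
single lexical type name; split_override and the exact PI syntax it accepts
are hypothetical.

```python
import re

# Assumed form: <?XML-LEX name?> at the very start of the attribute value.
_OVERRIDE = re.compile(r"<\?XML-LEX\s+(\S+?)\s*\?>(.*)", re.DOTALL)


def split_override(raw, default_lextype):
    """Return (lextype, value), honouring a leading XML-LEX override PI."""
    m = _OVERRIDE.fullmatch(raw)
    if m:
        return m.group(1), m.group(2)
    # No override present: fall back to the lextype declared in the DTD.
    return default_lextype, raw
```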
----
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK 
Phone/Fax: +44 1452 714029   WWW home page: http://www.sgml.u-net.com/

Received on Friday, 23 May 1997 06:31:01 UTC