Re: samples directory (action 20220215-01)

Norm Tovey-Walsh writes:

> "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> writes:
>> In fulfillment of the action I took today, I have created an
>> ixml/samples directory (the action said ixml/grammars but I don't think
>> anyone cared much about the name, and 'samples' seemed better when I
>> thought about it.)

> Cool. I have a suggestion, however. Instead of putting all the grammars
> in the samples directory, I suggest that each sample have its own
> directory. That means it can (and should) have its own README and
> possibly a sample input or two.

Good idea.  Done.  (I'll add the grammar-specific README files later.)

>> For that matter, I found writing an ixml grammar to require the correct
>> check digit also more challenging than I had expected.  If time allows,
>> I expect to add ISBN-10 and ISSN to the grammar as well.)

> I wonder how hard it would be to add logical assertions as annotations
> in the grammar. “Reject this if the following extra-grammatical
> condition doesn’t hold…”

Possible with an attribute-grammar system -- those kinds of logical
constraints are well represented in what I've read about attribute
grammars.

In ixml, possible with pragmas.  Being able to experiment with that kind
of thing in ixml was one of the reasons I wanted ixml to have a
well-defined, usable system of pragmas.

> Writing grammars such that the check digits are enforced by the grammar
> may be amusing, but it doesn’t strike me as exceptionally practical in
> the general case.

It may depend on how general the general case needs to be.  I think the
ISBN example illustrates a way in which finite state automata, and thus
regular grammars, are easier to use than regular expressions.  (And it
also illustrates that regular languages are more powerful than one might
suspect.  The principle of least power says not to use a mechanism more
powerful than necessary to solve a problem -- that encourages me to
think it's worthwhile showing that pure regular grammars can solve
problems like check digits or date validation, for which I suspect most
people would reach immediately for a Turing-complete programming
language.)
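
To make the finite-state point concrete, here is a minimal sketch in
Python (not ixml, and not the grammar in the samples directory) of a
deterministic finite automaton that accepts exactly the 13-digit strings
with a correct ISBN-13 check digit.  The state is the pair (position,
weighted sum mod 10), so there are only finitely many states.  The names
build_isbn13_dfa and accepts are hypothetical, and the sketch assumes
the input is bare digits with no hyphens or spaces.

```python
from itertools import product

DIGITS = "0123456789"
LENGTH = 13  # ISBN-13 has twelve data digits plus one check digit


def build_isbn13_dfa():
    """Build an explicit transition table for ISBN-13 validation.

    A state is (position, running weighted sum mod 10). Digits at even
    positions (0-based) have weight 1, digits at odd positions weight 3.
    """
    transitions = {}
    for pos, total in product(range(LENGTH), range(10)):
        weight = 1 if pos % 2 == 0 else 3
        for ch in DIGITS:
            new_total = (total + weight * int(ch)) % 10
            transitions[(pos, total), ch] = (pos + 1, new_total)
    start = (0, 0)
    # Accept only after exactly 13 digits whose weighted sum is 0 mod 10.
    accepting = {(LENGTH, 0)}
    return start, accepting, transitions


def accepts(dfa, text):
    """Run the automaton over text; no arithmetic happens at this point."""
    start, accepting, transitions = dfa
    state = start
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:  # non-digit character, or more than 13 digits
            return False
    return state in accepting


ISBN13_DFA = build_isbn13_dfa()
```

Once the table is built, checking a string is pure table lookup, which
is what a regular grammar with one nonterminal per state would encode.
For example, accepts(ISBN13_DFA, "9780306406157") is true, and changing
the final digit makes it false.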

>> What do people think about
>>
>>   - Mail headers (RFC 822 and successors)
>>   
>>   - ...
>>
>> Are those worth trying to find grammars for and/or create ixml grammars
>> for?
>
> Absolutely!
>
> Are library MARC (MARK?) records a possibility?

The overall structure of MARC records is fairly simple; describing the
rules for the use of specific subfield indicators (or possibly even just
checking that a particular MARC field uses only the subfield labels
defined for that field) might be challenging.  Getting access to MARC
records in a format that could be parsed might be practically
challenging.  Worth looking into, especially if we find that there are
sources of MARC records that don't already have an XML export function
built in.

Maybe BibTeX data would work, too.

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Wednesday, 16 February 2022 13:46:35 UTC