Re: Re-expressing our formalisation of Language from Henry S. Thompson on 2006-09-13 (www-tag@w3.org from September 2006)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Wed, 13 Sep 2006 14:28:29 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: www-tag@w3.org
Message-ID: <f5bhczbx5du.fsf@erasmus.inf.ed.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Pat Hayes writes:

>>A Language is a n-tuple, consisting of
>>  TextSet, a set of strings
>>  InformationSet, a set of infons (intentionally vague)
>
> Too damn vague. We already have no idea what a 'resource' is supposed
> to be or what it means to 'identify' one. Could the Tag please make an
> effort to avoid speaking in tongues, when matters get basic? At least
> give some guidance, or point to some kind of supporting theory or
> external account. Is an 'infon' something like a chunk of information
> (about something? About what?) or is it something more like a part of
> a world or a possible interpretation? Or could it be something like an
> topic, or a thing that some information is about?

Sin in haste, repent at leisure :-)

This entire exercise is contextualised by an interest in the
versioning of languages for the web.  Such languages, and indeed
computationally realised languages in general, present a simpler task
for formalisation than do human languages and logics, in that at one
useful level of abstraction _both_ their syntax _and_ their semantics
are concrete, that is, instantiated in/processable by computational
devices.

For the questions under discussion, then, we're thinking of examples
such as HTML, where a member of TextSet might be

 "<html><body><p>Take <i>that</i>, you brute!</p></body></html>"

and the corresponding member of InformationSet would be

 [an implementation's embodiment of] a box with top margin of ??px and
 ??px of padding, containing the concatenation of 20 boxes drawn from
 ?? font, ?? points.

or a member of TextSet might be

 "<xs:schema xmlns='http://www.w3.org/2001/XMLSchema'>
   <xs:element name='foo'/>
  </xs:schema>"

and the corresponding member of InformationSet would be

 [an implementation's embodiment of] a Schema component with one
 member of the [element declarations] property, itself an Element
 Declaration component with [local name] the string "foo", ...

or a member of TextSet might be

 "(gcd 63 42)"

and the corresponding member of InformationSet would be

  [an implementation's embodiment of] the application of the 'gcd'
  function to integer arguments 63 and 42.

Note crucially that the member-of-InformationSet is _not_ "A vivid
injunction" or "the set of XML information items with [local
name]=='foo'" or "21" or "the greatest common divisor of 63 and 42".

That is, we're looking at situations where someone has more-or-less
formally defined a computationally-realised language, and you can
derive a more-or-less detailed story about what the _concrete_,
_computational_ correspondents of the strings of that language are.

We haven't been too specific about the kinds of things a language's
information set can consist of, indeed, we hadn't even given its
members a name.  I utterly repent of and repudiate the 'infon'
suggestion, as it has clearly produced massive confusion :-(.

>>  Interpret, a functional mapping from TextSet to InformationSet,
>>    i.e. a subset of TextSet X InformationSet such that if a,b and c,d
>>    are in Interpret, then a==c implies b==d
>
> Why do you call this 'interpret'? Is this supposed to imply something
> to the effect that 'infons' are interpretations?

Not at all (in the formal logic sense of 'interpretation').

> Main question: Why is this *functional* ?? 

Because almost all (can't actually think of _any_ counter-examples at
the moment) computer languages are unambiguous _in the terms I've
defined above_.  For example, in any given computer language, strings
of numerals map unambiguously to (implementations of) computational
representations of numbers.  Other strings map unambiguously to
(implementations of) representations of symbols.  The fact that those
_symbols_ do not have rigid real-world denotations is _not_ relevant
to the very simplistic formalisation exercise we're engaged in here.
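
As a minimal sketch of the 'functional' condition (Python, with every
name below invented for illustration, and Python ints standing in for
an implementation's representation of numbers), the check that
a==c implies b==d over a finite set of pairs could look like this:

```python
def is_functional(pairs):
    """Return True if no text is paired with two different informations."""
    seen = {}
    for text, info in pairs:
        if text in seen and seen[text] != info:
            return False  # same text, two different informations
        seen[text] = info
    return True


# Hypothetical toy relation: numeral strings paired with ints.
interpret = {("63", 63), ("42", 42), ("042", 42)}

# Adding a second pairing for "63" makes the relation ambiguous.
ambiguous = interpret | {("63", 36)}
```

Note that "42" and "042" both mapping to 42 is fine: the condition
rules out one string with two correspondents, not two strings with the
same one.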

> The most standard notion of meaning that we have, [passionate
> summary of the history of model theory elided]
> . . .
> It is hard to think of a more successful or more widely accepted
> general view of meaning and semantics. So, why are you defining
> terms which not only ignore all this established, successful,
> absolutely standard science, but seem to be actively at odds with
> it?

Because we're not talking about (this rich and powerful sense of)
'meaning' at all!  The word 'meaning' appears nowhere in my post, or
in the diagram it refers to.  There are labels in the diagram, and
terms in my post, which do have definitions in model theory, and that
could be confusing, and we should work to eliminate it.

Or, as Dan Connolly suggested in a subsequent post, we could embrace
that terminology, and at least talk about *interpretation structures*
where I have *InformationSet* and *denotation mapping* where I have
*Mapping*, but given the limitation to computational languages I
discussed above, I'm not sure that a mathematical logic is really what
we're building here. . .  I'll try to come back to this question in a
subsequent post.

>>If Function is a class with three properties, namely Domain, Range and
>>Mapping, then Language<Function, with TextSet<Domain,
>>InformationSet<Range and Interpret<Mapping.
>>
>>I think it's useful to _also_ say that a Language has zero or more
>>Grammars, which are, informally, expressions of characteristic
>>functions for the TextSet, using e.g. regexps, BNFs, schemas, . . .
>>
>>And that there are zero or more Interpreters, which are, informally,
>>effective computations from members of TextSet to members of
>>InformationSet.
>
> Interpreters *compute* infons? What on earth does that *mean*? Take a
> programming language, for example. LISP interpreters compute functions
> on Sexpressions.

My (terminological) bad again.  What's a better term for the
computation _from_ "(gcd 63 42)" (a string) to an (internal
representation of an) s-expression?
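
To make the computation I have in mind concrete, here is a rough
sketch in Python of going from the string to an internal structure.
The Symbol class, the function names and the use of Python lists and
ints are all assumptions for illustration, not any particular LISP
implementation's reader:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for an implementation's symbol object."""
    name: str


def tokenize(text):
    """Split a string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Consume tokens from the front and return one s-expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)      # numerals become ints
    except ValueError:
        return Symbol(token)   # everything else becomes a symbol


def read_string(text):
    """Map a member of TextSet to its structural correspondent."""
    return read(tokenize(text))
```

Here read_string("(gcd 63 42)") yields a three-element structure
holding the symbol gcd and the integers 63 and 42.  Nothing is
evaluated, so the result is not 21.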

>>Likewise, finally, zero or more Models, which are, informally,
>>expressions of characteristic functions for the InformationSet.
>
> I have no idea what any of this means, and I strongly suspect that it
> does not mean anything at all. Apparently you do not mean 'Model' in
> the sense of "model theory", so what is the terminology supposed to
> suggest? What is a 'characteristic function' (even informally) for an
> 'infon' ?

Again, terminology suggestions welcome, but what I have in mind is a
finite statement of the membership conditions for the InformationSet.
So for Scheme we would have a formal constructive definition along the
lines of

 unicode strings are in IS
 ints are in IS
 for s a string in IS, (symbol)s is in IS
 for a, d in IS, cons(a,d) is in IS
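
Those four clauses can be read as a recursive membership test.  A
sketch in Python, where Symbol, Cons and in_information_set are names
made up for illustration and Python's str and int stand in for
unicode strings and integers:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical symbol, built from a string."""
    name: object


@dataclass(frozen=True)
class Cons:
    """Hypothetical pair with a head (car) and a tail (cdr)."""
    car: object
    cdr: object


def in_information_set(x):
    """Characteristic function following the four clauses literally."""
    if isinstance(x, bool):
        # Assumption: bool is a subclass of int in Python, so it is
        # excluded explicitly to keep 'int' meaning integers only.
        return False
    if isinstance(x, (str, int)):
        return True
    if isinstance(x, Symbol):
        return isinstance(x.name, str)
    if isinstance(x, Cons):
        return in_information_set(x.car) and in_information_set(x.cdr)
    return False
```

Since the clauses as written have no empty list, neither does the
sketch; a fuller definition would presumably add one.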

For HTML there's an informal constructive definition of the so-called
box model.

For W3C XML Schema, there's a near-formal definition in the spirit of
Entity-Relation modelling.

>>Note that the 'expressions of characteristic functions' may be formal,
>>or informal, or a mixture of the two (e.g. "[1-9][0-9]*" plus "the
>>corresponding number per the standard decimal numeral interpretation
>>is prime"
>
> OK, take that last one. What *function* is indicated by that English
> phrase? It seems to be talking about a number, not a function. And
> what kind of 'infon' does it apply to? (Is a number an infon? Are all
> numbers infons? Are all infons numbers?)

That example was indeed confused, but what it was trying to be was an
informal statement of a function from strings of numerals to (should
have been "an implementation's computational embodiment of") the prime
numbers.
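
Spelled out as a sketch (Python, with the function names invented
here, Python ints standing in for the computational embodiment, and
None used to mark strings outside the TextSet), the mixed
formal/informal statement amounts to:

```python
import re

# The formal part: the regexp from the original example.
NUMERAL = re.compile(r"[1-9][0-9]*")


def is_prime(n):
    """Trial division; adequate for a small illustration."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def interpret_prime(text):
    """Return the int for a numeral string denoting a prime, else None."""
    if NUMERAL.fullmatch(text) is None:
        return None  # not a well-formed numeral
    n = int(text)    # the standard decimal numeral interpretation
    return n if is_prime(n) else None
```

The regexp alone over-generates, so the primality test is what
narrows the TextSet to the numerals whose correspondents are prime.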

With this brush-clearing out of the way, I'll turn to replies to
subsequent messages in this thread.

ht
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFFCAd9kjnJixAXWBoRAglWAJ9/z//KP04oPkhi1D+eAkGoHOAlewCbBVM/
ZC8bR+wff7welPkL+F4ctQc=
=7Jux
-----END PGP SIGNATURE-----
Received on Wednesday, 13 September 2006 13:28:40 UTC