- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 4 Sep 2006 22:09:33 -0400
- To: "David Orchard" <dorchard@bea.com>
- Cc: www-tag@w3.org
Dave Orchard writes:

> I'm not persuaded that a language doesn't include constraints on the
> language. I think the key part is that the set of texts may be
> determined by the constraints.

I think you're still missing my point. I very much want to turn that around and say: it's the set of texts that's fundamental. The constraints are just a convenient shorthand for letting you know what's in the set. I hope the examples below will explain why.

========

Example:

Here's an example to illustrate the difference between the two approaches. Let's say that the language is the set of prime numbers less than 1 million, expressed in the obvious way as character strings: "2", "3", "5", "7", "11", and so on. Your definition of language is:

"A Language consists of a set of texts, any syntactic constraints on the texts, a set of information, any semantic constraints on the information, and the mapping between texts and information."

Let's apply that to this prime-number example. I think you intend something like:

The Prime number language is (based on Dave's approach):

* Set of texts: strings (note that this is only a loose bound on the set of texts actually in the primes language)
* Syntactic constraints: digits only
* Set of information: an integer resulting from the mapping given below
* Semantic constraints: each item in the information must have no divisors other than 1 and itself, and each item must be < 1,000,000
* The mapping: the obvious atoi() mapping, such as defined by XML Schema for integer lexical-to-value mappings

Note that what you're calling the set of texts isn't really the texts in the language; it's just a starting point so that the syntactic constraints can be a bit smaller. That seems messy. If you want to talk about the set of texts that are really in the language, then you need to intersect all the constraints.
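The layered, constraint-based test can be sketched in Python. This is only an illustrative rendering of the bullets above (the function name and the trial-division primality check are mine, not anything from the finding):

```python
def in_primes_language_constraints(text: str) -> bool:
    """Constraint-based membership test: syntax, then mapping, then semantics."""
    # Syntactic constraint: digits only
    if not text.isdigit():
        return False
    # The mapping: lexical string -> integer value (atoi-style)
    n = int(text)
    # Semantic constraints: n must be < 1,000,000 and prime
    if n >= 1_000_000 or n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

print(in_primes_language_constraints("11"))      # True
print(in_primes_language_constraints("12"))      # False
print(in_primes_language_constraints("abc"))     # False (fails the syntactic constraint)
```

Note how membership requires walking all three layers in order; that layering is exactly what the set-based formulation below avoids.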
For reasons explained below, I think it's much cleaner and simpler to leave out all that mechanism and just go with:

The Prime number language is (Noah's preferred formulation):

* Set of texts: "2", "3", "5", "7", "11" ... "999983" (i.e., it's really a set, and the set contains exactly the strings in the language, no more and no less. Whether I can conveniently enumerate it is a separate question.)
* The mapping: the obvious atoi() mapping, such as defined by XML Schema for integer lexical-to-value mappings (same as for Dave's)

In my preferred formulation, the intensional constraint ("no member of the language may be divisible by any number other than 1 or itself") is just a shorthand for setting out the legal set of texts. The constraints are not part of the language, but they may be used as a convenient shorthand for specifying which texts are in the language.

==========

Why my strong preference? Given this formulation, the test for membership of a text is merely a set-membership test. With Dave's, you need to test set membership, then syntactic legality, then map, then test semantics. It has to be harder to reason about all that.

Similarly, in my approach, the test for language compatibility can be a simple superset relation on the texts, along with a test that the information mapping is consistent. For example, let's say I want to test whether another language L2 is compatible with the Primes language. L2 is the odd numbers between 2 and 8, again represented as character strings.

Dave's definition of L2 would likely be:

* Loose bound on set of texts: character strings
* Syntactic constraint: digits only
* Set of information: an integer resulting from the mapping given below
* Semantic constraints: 2 < x < 8; (x mod 2) == 1
* Mapping: atoi()

My definition of L2:

* Set of texts: "3", "5", "7"
* Mapping: atoi()

With my definition, there's a clear answer: L2 is a sublanguage of the Prime Number language and is in that sense compatible.
Every text in L2 is in Primes, and each maps to the same information in both languages. Done.

I'm not sure how we cleanly express compatibility using the formulation with constraints. The constraints for the two languages are expressed in totally different ways. Did I write "no divisors" in English? Did I write a loop in Java? Now I have to intersect that with (x mod 2 == 1). We have to start asking all kinds of complicated questions about what it means when we try to compare constraints expressed in this way. I think it's a big mistake to tangle those specification layers into the core definition of what the language is. We're trying to build a simple, firm foundation for determining which languages can safely interoperate.

> When a processor determines whether a text is in the language, it
> doesn't generate all the texts "in hand" and then compare, it will
> look at the constraints and evaluate without having all the texts in
> hand. I think any constraints are fundamentally part of the language.

You're making a big leap from the mathematical characteristics of languages, the intersection of their texts, etc. to how in practice a processor would be built. I don't think we want to tangle those. Whether two languages have texts in common and have compatible interpretations is not fundamentally a statement about how you build processing software. It's a characteristic of the languages, whether interpreted by software or not.

I think your point is that for certain languages, enumeration is not the most practical way of defining membership. Agreed. Indeed, if I eliminated the < 1,000,000 in the primes example, enumeration would be impossible. So a processor will indeed likely use some encoding of the constraint, and when I tell you in an email what the language is, I will not list the infinite set of its members. I don't think the core notion of language needs to be something that fits into a finite specification in a computer.
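The set-based formulation and its two-part compatibility test (subset of texts, plus agreement of the mappings) can be sketched in Python. The sieve and the helper names are illustrative assumptions of mine; the substance — a language as (text set, mapping) — is exactly the formulation above:

```python
def primes_below(limit: int) -> set[str]:
    """Sieve of Eratosthenes; returns the prime *texts* as strings."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            is_prime[n * n::n] = [False] * len(range(n * n, limit, n))
    return {str(n) for n in range(limit) if is_prime[n]}

# The Primes language, written extensionally: exactly the prime strings
# below 1,000,000, plus the atoi-style mapping (Python's int()).
primes_texts = primes_below(1_000_000)
primes_mapping = int

# L2: the odd numbers strictly between 2 and 8, as strings.
l2_texts = {"3", "5", "7"}
l2_mapping = int

def is_sublanguage(texts_a, map_a, texts_b, map_b) -> bool:
    # A is a sublanguage of B if every text of A is a text of B, and
    # both languages map each shared text to the same information.
    return all(t in texts_b and map_a(t) == map_b(t) for t in texts_a)

print(is_sublanguage(l2_texts, l2_mapping, primes_texts, primes_mapping))  # True
```

Membership is a bare `in` test, and compatibility is a subset check plus mapping agreement; no intersection of differently-expressed constraints is needed.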
As noted above, I am perfectly happy to consider the language of primes to be the infinite set of strings that happen to meet the primeness test, along with their mappings to abstract integers. My whole point is that we can get a lot of mileage from very clean mathematical abstractions like sets of texts and information mappings.

I do think we need a chapter on the sort of constraint languages that are used to define such languages for use by software, and on the techniques used to evolve the specifications of languages using those constraint-based systems. That's where one would talk about how tools use constraint languages like XML Schema to help tailor processing software, and to aid in checking instances that are to be processed. That's where you'd talk about what a processor would likely "have in hand", I think.

> Now I could flip this around and suggest we should go the opposite way
> and suggest removing Text Set and Information Set: languages have
> semantics, syntax and texts are in a language if they meet the syntax.
> Languages also have a mapping between any individual text and an
> individual information "item".

I'm sorry, but that doesn't parse for me. What do "languages have", and what "are in a language"? I will say that I think sets are a fine mathematical formalism for comparing which texts are in which languages, and also for considering which information is conveyed compatibly in two or more languages. I would be reluctant to drop the set formalism.

Thanks for the careful response to my concerns! I'll look at the rest of your comments later.

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"David Orchard" <dorchard@bea.com>
09/04/2006 08:17 PM

To: <noah_mendelsohn@us.ibm.com>
cc: <www-tag@w3.org>
Subject: RE: Noah Mendelsohn Comments on July 26 Draft of TAG Versioning Finding

Noah,

Part 1 of ? Parts. I've gone through your comments.
Thanks again for doing some extensive reviewing. The comments that you made did not substantially conflict with much of the work that I had done, which is goodness. I'm going to respond by quoting sections that you wrote, followed by my comments. I hope that's the best way to work through the comments.

NM>>> The finding claims that constraints are part of the language. I'm not convinced that's a good formulation, since the constraints are embodied in the set of texts & mappings. Stated differently, I think we're confusing a "language" with "the specification of a language", and those are very different. So, I think a language should be a set of texts and their interpretation as information, and I am very happy with the way you present that much. I think we should have separate sections that talk about managing the specifications for languages as they evolve, and certainly constraint languages like XML Schema are among the good tools for writing specifications. It's OK to talk about keeping a language and its specification in sync, and to talk about constraint-language features that facilitate versioning. I don't think the constraints are the language. I think they are emergent properties of the language that can sometimes be usefully set down in mathematical and/or machine-readable notation such as regexes or XML Schemas. This is an important distinction on which I disagree with the finding as drafted. <<

I'm not persuaded that a language doesn't include constraints on the language. I think the key part is that the set of texts may be determined by the constraints. Using one of your favourite examples: if I create a language that has Red, Green, Blue, there we have listed the texts. But one of my favourite examples is the Name language, which has given and family names, and those are simply strings. Whether Aaaaa and Aaaa and Aaa and Aa are part of the language didn't even occur to me until I wrote this.
When a processor determines whether a text is in the language, it doesn't generate all the texts "in hand" and then compare; it will look at the constraints and evaluate without having all the texts in hand. I think any constraints are fundamentally part of the language. It seems to me that for some languages, membership is determined by having the set of texts, and in others the set of texts can be generated from the constraints. So, can we come up with a modelling mechanism that allows a language to refer to one thing, rather than the two that it currently does (texts and constraints)? Perhaps this was what the "membership" bucket was an attempt to model.

Now I could flip this around and suggest we should go the opposite way and suggest removing Text Set and Information Set: languages have semantics and syntax, and texts are in a language if they meet the syntax. Languages also have a mapping between any individual text and an individual information "item".

I thought about breaking the relationship between language and syntax, leaving syntax connected just to the text set. If I squint hard enough, I can see that could work. But I think that doesn't pass the common view of language, which is that language is directly related to syntax rather than indirectly via the text set; see my name-syntax example.

NM>> I think we can and should do better in telling a story about whether a particular text is compatible as interpreted in L1 or L2, vs. the senses in which languages L1 and L2 as a whole are compatible. I think the story I would tell would be along the lines of:

Of a particular text written per language L1 and interpreted per language L2: "Let I1 be the information conveyed by text T per language L1. Text T is "fully compatible" with language L2 if and only if, when interpreted per language L2 to yield I2, I1 is the same as I2. Text T is "incompatible" if any of the information in I2 is wrong (i.e.,
was not present in I1, or replaces a value in I1 with a different one. This rule disallows additional information, because only the information in I1 is what the sender thought they were conveying, so anything else is at best accidentally correct.) There are also intermediate notions of compatibility: e.g., it may be that all of the information in I2 is correct, but that I2 is a subset of I1. [Not sure whether we should name some of these intermediate flavors, but if we do, they should be defined precisely.]

Of languages L1 and L2: We say that language L2 is "fully backward compatible" with L1 if every text in L1 is fully compatible with L2. We say that language L1 is "backwards incompatible" with L2 if any text in L1 is incompatible with L2. We say that language L1 is "fully forwards compatible" with L2 if every text in L2 is fully compatible with L1. We say that L2 is "forwards incompatible" with L1 if any text in L2 is incompatible with L1. As with texts, there may be intermediate notions of language compatibility for which we do not [or maybe we should?] provide names here.

That all seems pretty simple and clean to me, and I think it's a firm foundation for much of the rest of the analysis. Notice that it seems natural to leave out discussion of the constraints in this layer; the story gets simpler without them. The current draft seems to me a bit loose in both talking about and defining issues for languages as a whole vs. for individual texts. <<

I have been moving towards this space as well in my examination of partial understanding. It is yet another example of the "class" vs. "instance" distinction that seems to always come up in modeling and system design. But I still disagree with the removal of syntax. I could easily suggest somewhat alternate wording that makes use of syntax and makes sense to me: "Of languages L1 and L2: We say that language L2 is "fully backward compatible" with L1 if every text valid under L1's constraints is fully compatible with L2."
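The compatibility definitions quoted above can be read as predicates over languages modeled as (text set, mapping) pairs. This is an illustrative sketch of that reading — the function names and the tiny stand-in text sets are mine, not wording from the finding:

```python
# A "language" here is a pair: (set of texts, mapping from text to information).

def fully_compatible(text, lang1, lang2) -> bool:
    # Text T, written per L1, is fully compatible with L2 iff L2 accepts T
    # and interpreting T per L2 yields the same information (I2 == I1).
    _texts1, map1 = lang1
    texts2, map2 = lang2
    return text in texts2 and map1(text) == map2(text)

def fully_backward_compatible(lang2, lang1) -> bool:
    # L2 is fully backward compatible with L1 iff every text in L1
    # is fully compatible with L2.
    texts1, _ = lang1
    return all(fully_compatible(t, lang1, lang2) for t in texts1)

primes = ({"2", "3", "5", "7", "11"}, int)   # a tiny stand-in for the Primes language
odds   = ({"3", "5", "7"}, int)              # L2 from the earlier example

print(fully_backward_compatible(primes, odds))  # True: every text of odds is fully compatible with primes
print(fully_backward_compatible(odds, primes))  # False: "2" is not a text of odds
```

The forward-compatibility definitions fall out by swapping the roles of the two languages, just as in the prose.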
I could even push it further and define the syntactic constraints S1 and S2, then rephrase as: "We say that language L2 is "fully backward compatible" with L1 if every text valid under S1 ..."

NM>> Clarify focus on texts vs. documents ... <<

I agree. I've inserted part of one of your paragraphs.

My comments on your comments end here.

> -----Original Message-----
> From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
> Sent: Monday, August 28, 2006 4:22 PM
> To: David Orchard
> Cc: www-tag@w3.org
> Subject: Noah Mendelsohn Comments on July 26 Draft of TAG
> Versioning Finding
>
> First of all, thanks again to Dave for the truly heroic work
> on the versioning finding. This problem is as tough as they
> get IMO, and I think the drafts are making really steady
> progress. Still, as I've mentioned on a number of
> teleconferences, I have a number of concerns regarding the
> conceptual layering in the draft versioning finding, and some
> suggestions that I think will make it cleaner and more
> effective. Dan Connolly made the very good point that it is
> really only appropriate to raise such concerns in the context
> of a detailed review of what has already been drafted.
> So, I've tried to do that.
>
> A copy of my annotated version of the July 26 draft is
> attached. I've taken quite a bit of trouble over these
> comments, which are quite extensive, and while I'm sure that
> they will prove to be only partly on the right track, I hope
> they will get a detailed review not just from Dave
> but also from other concerned TAG members. Anyway, what
> I've done is to take Dave's July 26th draft and add comments
> marked up using CSS highlighting. These are in two main groups:
>
> 1) An introductory section sets out some of the main
> architectural issues and ideas that I've been trying to
> convey.
> I don't expect these will seem entirely justified
> until you read the rest of the comments (if then), but I
> think it's important to collect the significant ideas, and to
> separate them from the smaller editorial suggestions.
>
> 2) I've gone through about the first third of Dave's draft,
> inserting detailed comments. Some of these are purely
> editorial, but most of them are aimed at motivating and
> highlighting the concerns that led me to propose the major
> points in that introductory chapter. Indeed, I've tried to
> hyperlink back from the running comments to the larger
> points, as I think that helps to motivate them.
>
> No editor working on a large draft entirely welcomes
> voluminous comments, especially ones that have structural
> implications. Dave: I truly hope this is ultimately useful,
> and I look forward to working with you on it.
> Where possible, I've tried to suggest text fragments you can
> steal if you like them. I actually am fairly excited,
> because working through Dave's draft has helped me to
> crystallize a number of things about versioning in my own
> mind. I think we're well along to telling a story that's
> very clean, very nicely layered, and perhaps a bit simpler
> and shorter than the current draft suggests. I don't think
> it involves throwing out vast swaths of what Dave has
> drafted, so much as cleaning up and very carefully relayering
> some concepts.
>
> BTW: I will be around on and off until about Wed. afternoon,
> then gone until after US Labor Day weekend. Thanks again, Dave.
> Really nice work!
>
> Noah
>
> [1] http://www.w3.org/2001/tag/doc/versioning-20060726.html
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
Received on Tuesday, 5 September 2006 02:09:57 UTC