- From: Frank Manola <fmanola@mitre.org>
- Date: Tue, 29 Jul 2003 12:25:26 -0400
- To: Graham Klyne <GK-lists@ninebynine.org>
- CC: pat hayes <phayes@ihmc.us>, Martin Duerst <duerst@w3.org>, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org
Graham--

I understand your point (and I looked at what CHARMOD had to say about
octets before I replied). My point about context, though, was that we're
using the term "octet" in the context of a discussion that uses a set of
different terms ("octet" being only one of them; "character", "text",
"string" being others) in order to make specific distinctions between the
concepts those terms represent. We may get a clearer definition of "octet"
from somewhere else, but it may not be making the same distinction vis a
vis the other terms we're using that CHARMOD (which I thought we
considered normative) does.

--Frank

Graham Klyne wrote:
>
> Frank,
>
> You make a fair point, except that CHARMOD seems to not offer a clear
> definition.
>
> The best I could find is:
>
> [[
> 3.1.6 Units of storage
>
> Computer storage and communication rely on units of physical storage and
> information interchange, such as bits and bytes (also known as octets, as
> nowadays the word bytes is generally considered to mean 8-bit bytes). A
> frequent error in specifications and implementations is the equating of
> characters with units of physical storage. The mapping between characters
> and such units of storage is actually quite complex, and is discussed in
> the next section, 3.2 Digital Encoding of Characters.
>
> [S] [I] Specifications and software MUST NOT assume a one-to-one
> relationship between characters and units of physical storage.
> ]]
> -- http://www.w3.org/TR/charmod/#sec-Storage
>
> Anyway, I didn't mean to get into a protracted debate here, just wanted to
> try and reduce the number of imponderables.
>
> #g
> --
>
> At 11:21 29/07/03 -0400, Frank Manola wrote:
> >It seems to me that, given the context of this discussion, we might try
> >to stick to the terminology (and distinctions made between the terms) in
> >CHARMOD if we can. At least that's a single document...
> >
> >--Frank
> >
> >Graham Klyne wrote:
> > >
> > > At 00:46 29/07/03 -0500, pat hayes wrote:
> > >
> > > >>Are 'binary octets' different from 'octets'?
> > > >
> > > >I have absolutely no idea. :-)
> > >
> > > Noticing that we're bandying around this term 'octets', apparently without
> > > understanding what they are, I thought I'd dig over some definitions...
> > >
> > > I see an octet as a sequence of 8 bits, where a bit is one of {0,1}. Octet
> > > instances are often described by a number in the range 0..255, with the
> > > common relationship between binary numbers and bits, subject to agreeing
> > > most significant first or least significant first. In either case, the
> > > relationship is 1:1.
> > >
> > > The UTF-8 spec avoids the bit ordering issue by simply talking about "high
> > > order" to "low order" bits, which establishes a single direct relationship
> > > between the individual bits and the numbers 0..255.
> > >
> > > [[
> > > In UTF-8, characters are encoded using sequences of 1 to 6 octets. The only
> > > octet of a "sequence" of one has the higher-order bit set to 0, the
> > > remaining 7 bits being used to encode the character value. In a sequence of
> > > n octets, n>1, the initial octet has the n higher-order bits set to 1,
> > > followed by a bit set to 0. The remaining bit(s) of that octet contain bits
> > > from the value of the character to be encoded. The following octet(s) all
> > > have the higher-order bit set to 1 and the following bit set to 0, leaving
> > > 6 bits in each to contain bits from the character to be encoded.
> > > ]]
> > > -- http://www.rfc-editor.org/rfc/rfc2279.txt
> > >
> > > The UTF-8 spec generally presents octet values as hexadecimal numerals.
> > >
> > > Dan Connolly offers a slightly different form of definition:
> > > [[
> > > octet
> > > an element of the set {0, 1, 2, ..., 255}
> > > ]]
> > > http://www.w3.org/MarkUp/html-spec/charset-harmful.html
> > >
> > > Some others:
> > >
> > > [[
> > > octet: A byte of eight binary digits usually operated upon as an entity.
> > > ]]
> > > -- http://glossary.its.bldrdoc.gov/fs-1037/dir-025/_3631.htm
> > > -- http://www.atis.org/tg2k/_octet.html
> > >
> > > [[
> > > Definition for: octet
> > >
> > > Eight bits. Octet is sometimes used instead of the term byte to avoid
> > > confusion, because not all computer systems use bytes that are eight
> > > bits long.
> > > ]]
> > > -- http://www.computeruser.com/resources/dictionary/definition.html?lookup=3442
> > >
> > > Google for "octet definition" shows up plenty more
> > >
> > > Looking for definitions of "binary octet" doesn't show up anything
> > > especially useful, but the pattern of its use suggests one of two things:
> > > (a) octet values represented as 8 bits (as opposed to, say, a number)
> > > (b) octets used to encode binary data (as opposed to textual data).
> > >
> > > Anyway, returning to the original question (Are 'binary octets' different
> > > from 'octets'?), I think the answer is: not for any meaningful purpose as
> > > far as RDF is concerned.
> > >
> > > #g
> > >
> > > -------------------
> > > Graham Klyne
> > > <GK@NineByNine.org>
> > > PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E
> >
> >--
> >Frank Manola                   The MITRE Corporation
> >202 Burlington Road, MS A345   Bedford, MA 01730-1420
> >mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-875
>
> -------------------
> Graham Klyne
> <GK@NineByNine.org>
> PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E

--
Frank Manola                   The MITRE Corporation
202 Burlington Road, MS A345   Bedford, MA 01730-1420
mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-875
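
To make the distinctions in this thread concrete, here is a minimal Python sketch. It is an illustration only, not part of the original messages. It shows the 1:1 relationship between an octet as 8 bits (high-order bit first) and as a number in 0..255, the lead-octet bit pattern from the RFC 2279 text quoted above, and the CHARMOD point that characters and octets are not one-to-one. The function names (octet_to_bits, bits_to_octet, utf8_sequence_length) are hypothetical, chosen for this example. Note also that Python's own UTF-8 codec follows the later RFC 3629, which limits sequences to 4 octets; the lead-octet pattern is the same for those lengths.

```python
def octet_to_bits(value: int) -> str:
    """Render an octet (a number in 0..255) as 8 bits, high-order bit first."""
    if not 0 <= value <= 255:
        raise ValueError("an octet is an element of the set {0, ..., 255}")
    return format(value, "08b")


def bits_to_octet(bits: str) -> int:
    """Inverse of octet_to_bits: 8 bits, high-order first, back to 0..255."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 characters, each '0' or '1'")
    return int(bits, 2)


def utf8_sequence_length(lead: int) -> int:
    """Sequence length announced by a lead octet, per the quoted RFC 2279 text.

    A lead octet starting with 0 stands alone; one starting with n ones
    (2 <= n <= 6) followed by a 0 opens a sequence of n octets.
    """
    bits = octet_to_bits(lead)
    ones = len(bits) - len(bits.lstrip("1"))  # count of leading 1 bits
    if ones == 0:
        return 1
    if ones == 1:
        raise ValueError("10xxxxxx is a continuation octet, not a lead octet")
    if ones > 6:
        raise ValueError("no sequence starts with more than six 1 bits")
    return ones


if __name__ == "__main__":
    # The bits <-> number mapping is 1:1 over all 256 octet values.
    assert all(bits_to_octet(octet_to_bits(v)) == v for v in range(256))

    # Two characters (U+00E9, U+20AC) occupy five octets in UTF-8.
    text = "\u00e9\u20ac"
    encoded = text.encode("utf-8")
    print(len(text), "characters,", len(encoded), "octets")
    for octet in encoded:
        print(format(octet, "02X"), octet_to_bits(octet))
```

Either view of an octet (bit sequence or number) carries the same information, which is consistent with the conclusion above that "binary octets" and "octets" do not differ in any way that matters here.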
Received on Tuesday, 29 July 2003 12:25:44 UTC