Re: pfps-04 (why the thread is germane to pfps-04)

From: Graham Klyne <GK-lists@ninebynine.org>
Date: Tue, 29 Jul 2003 09:50:58 +0100
Message-Id: <5.1.0.14.2.20030729091516.0267ec98@127.0.0.1>
To: pat hayes <phayes@ihmc.us>, Martin Duerst <duerst@w3.org>
Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org

At 00:46 29/07/03 -0500, pat hayes wrote:

>>Are 'binary octets' different from 'octets'?
>
>I have absolutely no idea. :-)

Noticing that we're bandying around this term 'octets', apparently without 
understanding what they are, I thought I'd dig over some definitions...

I see an octet as a sequence of 8 bits, where a bit is one of {0,1}.  Octet 
instances are often described by a number in the range 0..255, using the 
usual relationship between binary numbers and bits, subject to agreeing 
whether the most significant bit comes first or last.  In either case, the 
relationship is 1:1.
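
As a rough illustration of that 1:1 relationship, here is a minimal Python sketch. The function names bits_to_octet and octet_to_bits and the msb_first flag are made up for this example; they do not come from any spec.

```python
def bits_to_octet(bits, msb_first=True):
    """Map a sequence of 8 bits to a number in 0..255 (hypothetical helper)."""
    if len(bits) != 8 or any(b not in (0, 1) for b in bits):
        raise ValueError("an octet is exactly 8 bits, each 0 or 1")
    # Normalise to most-significant-bit-first before accumulating.
    ordered = bits if msb_first else bits[::-1]
    value = 0
    for b in ordered:
        value = (value << 1) | b
    return value


def octet_to_bits(value, msb_first=True):
    """Inverse mapping: a number in 0..255 back to a list of 8 bits."""
    if not 0 <= value <= 255:
        raise ValueError("octet values lie in the range 0..255")
    bits = [(value >> shift) & 1 for shift in range(7, -1, -1)]
    return bits if msb_first else bits[::-1]
```

Under either ordering convention, every value in 0..255 round-trips through its 8 bits and back, which is all that "1:1" means here. The same bit sequence read under the other convention generally names a different number, so the convention has to be agreed on.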

The UTF-8 spec avoids the bit ordering issue by simply talking about "high 
order" to "low order" bits, which establishes a single direct relationship 
between the individual bits and the numbers 0..255.

[[
In UTF-8, characters are encoded using sequences of 1 to 6 octets. The only 
octet of a "sequence" of one has the higher-order bit set to 0, the 
remaining 7 bits being used to encode the character value. In a sequence of 
n octets, n>1, the initial octet has the n higher-order bits set to 1, 
followed by a bit set to 0. The remaining bit(s) of that octet contain bits 
from the value of the character to be encoded. The following octet(s) all 
have the higher-order bit set to 1 and the following bit set to 0, leaving 
6 bits in each to contain bits from the character to be encoded.
]]
-- http://www.rfc-editor.org/rfc/rfc2279.txt
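
The rule quoted above can be sketched in Python as follows. This is only an illustration of the quoted text: the function name utf8_encode_rfc2279 is invented here, the thresholds are derived from the bit counts the quoted rule implies (7 bits for one octet, then 11, 16, 21, 26 and 31 bits), and the sketch does not reject surrogate code points. The later RFC 3629 restricts UTF-8 to sequences of at most 4 octets, so the 5- and 6-octet forms apply only to the RFC 2279 definition.

```python
def utf8_encode_rfc2279(code):
    """Encode one character value as 1 to 6 octets, per the quoted rule."""
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("character value must fit in 31 bits")
    if code < 0x80:
        # Sequence of one: high-order bit 0, remaining 7 bits hold the value.
        return bytes([code])
    # Pick the smallest n whose payload bits can hold the value.
    for n, limit in ((2, 0x800), (3, 0x10000), (4, 0x200000),
                     (5, 0x4000000), (6, 0x80000000)):
        if code < limit:
            break
    # Following octets: bits '10' followed by 6 bits of the value each.
    trailing = []
    for _ in range(n - 1):
        trailing.append(0x80 | (code & 0x3F))
        code >>= 6
    # Initial octet: n high-order bits set to 1, then a 0, then the rest.
    lead_marker = (0xFF << (8 - n)) & 0xFF
    return bytes([lead_marker | code] + trailing[::-1])
```

For example, utf8_encode_rfc2279(0x20AC) yields the three octets E2 82 AC, written as hexadecimal numerals in the way the spec presents octet values.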

The UTF-8 spec generally presents octet values as hexadecimal numerals.

Dan Connolly offers a slightly different form of definition:
[[
octet
     an element of the set {0, 1, 2, ..., 255}
]]
http://www.w3.org/MarkUp/html-spec/charset-harmful.html

Some others:

[[
octet: A byte of eight binary digits usually operated upon as an entity.
]]
-- http://glossary.its.bldrdoc.gov/fs-1037/dir-025/_3631.htm
-- http://www.atis.org/tg2k/_octet.html

[[
Definition for: octet

Eight bits. Octet is sometimes used instead of the term byte to avoid 
confusion, because not all computer systems use bytes that are eight bits long.
]]
-- http://www.computeruser.com/resources/dictionary/definition.html?lookup=3442

Googling for "octet definition" turns up plenty more.

Looking for definitions of "binary octet" doesn't show up anything 
especially useful, but the pattern of its use suggests one of two things:
(a) octet values represented as 8 bits (as opposed to, say, a number)
(b) octets used to encode binary data (as opposed to textual data).

Anyway, returning to the original question (Are 'binary octets' different 
from 'octets'?), I think the answer is:  not for any meaningful purpose as 
far as RDF is concerned.

#g


-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Tuesday, 29 July 2003 09:43:43 GMT
