Choice of coding format for Senior Online

I am involved in the EU-funded project Senior Online
(http://cmc.dsv.su.se/sol/), which aims at making the
Internet more suitable for elderly people. As part of this
project, we are to develop groupware systems and
portal/directory systems. We are just now at the stage
where we are to start specifying the protocols to be used
(a) between two groupware systems, (b) between groupware
systems and portal/directory systems.

The protocols that we define may in the future be
submitted as standards proposals to the IETF. We will start
implementing them based on two groupware systems under
development, KOM 2000 (http://cmc.dsv.su.se/KOM2000/) and
Web for Groups (http://www.webforus.com/). We are
interested in getting other groupware vendors to try our
protocols, for example First Class
(http://www.business.softarc.com/works/index.shtml) and
Lotus Notes/Domino (http://www.lotus.com/).

One important issue when starting application-area
protocol development is the choice of coding method. I
have written an overview of the alternatives with their
advantages and drawbacks. Please send comments on the
tables below and on which choices to recommend.

Here is the overview:

--- --- cut here --- ---

Choice of Coding Format for Senior Online
=========================================

An important, and difficult, choice is the selection of coding
format and base web protocol for the communication between servers
in Senior Online. Two major such types of communication are
envisaged:

(1) The protocol for communication between groupware servers

(2) The protocol for communication between groupware servers and
portal servers

Here is a short description of the choices:

Coding format choices:
---------------------

Note: Some of these formats can be combined. For example, the e-mail
formats can be used for coding of the actual text of messages,
combined with the other formats for other information.

Format                        Description
------                        -----------

MIME                          The standard format for complex e-mail messages,
                              where the body can be split recursively into
                              multiple body parts.

MFORM = Multipart/form-data   Variant of MIME, one of the formats used when a
                              web user fills in a form on a web page and pushes
                              the SEND button.

MHTML =                       Variant of MIME, the commonly used format for
Multipart/alternative,        sending HTML-formatted messages via e-mail. Used
Text/html and                 by KOM 2000 when sending messages from KOM 2000
Multipart/related             to e-mail. (? Web for Groups probably also uses
                              this format in communication with e-mail?)

XML                           A currently very popular format, strongly
                              supported by IBM and Microsoft, for sending
                              structured information on the Internet. Good for
                              complex structures, not so good for binary
                              information (like pictures or attachments)

ASN.1                         A complex and powerful binary format, used by
                              LDAP.

LDAP                          The currently most popular format for
                              communication with directory systems. Good for
                              complex structures and for distributed directory
                              data bases. Uses ASN.1.

LDIF                          A variant of LDAP with textual, instead of binary,
                              encoding.

RFC822 header format          A simple format common in many protocols,
                              including e-mail headers and HTTP headers.

Corba                         A "remote procedure call" protocol for
                              communication between program modules on
                              different servers, written in common
                              programming languages.

As an aid in selecting this format, here is a table of choices and
their pros and cons. Question marks indicate that I do not know or
am not sure.

Format:       MFORM    MIME      XML      LDAP     LDIF      RFC822   Corba
------        -----    ----      ---      ----     ----      ------   -----

Easy to       Very     Yes (4)   Yes (4)  Bad (1)  Yes (4)   Very     Yes (4)
produce       much                                           much
manually and  (5)                                            (5)
debug

Ease of       OK (3)   OK (3)    OK (3)   Diffi    OK (3)    Easy     Very
coding                                    cult               (4)      easy
                                          (1)                         (5)

Portability   Good     Good      Good     Good     Good      Good     Bad (1)
              (4)      (4)       (4)      (4)      (4)       (4)

Binary data   Good     Good      No (1)   ? (3)    ? (3)     No (3)   Yes?
              (4)      (4)                                            (4)

Acceptabi     Good     Good      Very     Very     Good      Good     Bad (1)
lity as a     (4)      (4)       good     good     (4)       (4)
future                           (5)      (5)
standard

Ease of       OK (3)   OK (3)    Good     Good     Good      Good     Good?
specifica                        (4)      (4)      (4)       (4)      (4)
tion

Total score   23       22        21       18       22        24       19
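
As a cross-check of the total row, the per-criterion scores in the
table can be summed column by column. This is only a minimal sketch in
Python; the scores are copied from the table above, and the variable
names are illustrative.

```python
# Per-criterion scores copied from the table, in row order:
# manual production/debugging, ease of coding, portability,
# binary data, acceptability as a standard, ease of specification.
scores = {
    "MFORM":  [5, 3, 4, 4, 4, 3],
    "MIME":   [4, 3, 4, 4, 4, 3],
    "XML":    [4, 3, 4, 1, 5, 4],
    "LDAP":   [1, 1, 4, 3, 5, 4],
    "LDIF":   [4, 3, 4, 3, 4, 4],
    "RFC822": [5, 4, 4, 3, 4, 4],
    "Corba":  [4, 5, 1, 4, 1, 4],
}

# Unweighted sum per format; every criterion counts equally.
totals = {name: sum(values) for name, values in scores.items()}
best = max(totals, key=totals.get)

for name, total in totals.items():
    print(f"{name:8}{total}")
print("Highest:", best)
```

The sum is unweighted, so a criterion such as portability counts no
more than ease of coding. Weighting the rows differently could change
the ranking.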

Recommendation: I suggest that we start with the RFC822 header format,
combined with MIME for the format of messages.
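
To make this combination concrete, here is a rough sketch in Python of
what such a unit could look like, using the standard "email" package.
The "X-SOL-*" header field names and their values are invented for
this illustration and are not part of any specified protocol.

```python
from email import message_from_string, policy
from email.message import EmailMessage

# Control information travels as RFC822-style header fields.
# The "X-SOL-*" field names and values are made up for this sketch.
msg = EmailMessage()
msg["X-SOL-Operation"] = "post-message"
msg["X-SOL-Conference"] = "gardening"
msg["Subject"] = "Meeting on Friday"

# The message text itself is coded with MIME: a plain-text part plus
# an HTML alternative, which yields a multipart/alternative body.
msg.set_content("See you on Friday.", charset="iso-8859-1")
msg.add_alternative("<p>See you on <b>Friday</b>.</p>",
                    subtype="html", charset="iso-8859-1")

wire_text = msg.as_string()
print(wire_text)

# The receiving server parses the same text back into fields and parts.
parsed = message_from_string(wire_text, policy=policy.default)
plain_part = parsed.get_body(preferencelist=("plain",))
print(parsed["X-SOL-Operation"])
print(parsed.get_content_type())
print(plain_part.get_content())
```

The printed text can be read and written by hand, which is the main
reason the header format scores well on manual production and
debugging.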

Protocol format choice:
----------------------

Possible choices for the protocol (to be extended for our needs):

Choice            SMTP               HTTP               Corba

Description       The Internet e-    The WWW protocol,  A remote
                  mail format,       based on direct    procedure call
                  based on store-    connections,       method, popular
                  and-forward of     popular as a base  in the telecom
                  messages.          for new            industry.
                                     protocols.

Advantage         Good for sending   Easy to use,       Easy to use.
                  messages, we have  popular.
                  to implement it
                  anyway in order
                  to handle e-mail
                  connectivity,
                  built-in queuing
                  and resending
                  facility when the
                  destination
                  server is down.

Disadvantage      Store and forward  Complex, but you   Limited platform
                  means that you     can choose a       availability, not
                  get no direct      subset suitable    acceptable for a
                  responses to       for your needs.    standard
                  queries.                              protocol.

Recommendation: I suggest we use HTTP for all communication except
the sending of messages. For the sending of messages, I am not sure
whether to recommend SMTP or HTTP.
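
The trade-off between the two styles can be sketched with a small
in-memory simulation in Python. It uses no real SMTP or HTTP; the
classes Server and StoreAndForwardQueue and the function send_direct
are hypothetical stand-ins that only model the behaviour described in
the table above.

```python
from collections import deque


class Server:
    """Stand-in for a remote groupware server (hypothetical, in-memory)."""

    def __init__(self):
        self.up = True
        self.inbox = []

    def handle(self, message):
        if not self.up:
            raise ConnectionError("destination server is down")
        self.inbox.append(message)
        return "200 OK"


def send_direct(server, message):
    """HTTP-style: one direct connection, immediate reply or failure."""
    return server.handle(message)


class StoreAndForwardQueue:
    """SMTP-style: accept now, deliver when the destination is reachable."""

    def __init__(self, server):
        self.server = server
        self.pending = deque()

    def submit(self, message):
        # The sender gets no reply from the destination itself.
        self.pending.append(message)
        self.flush()

    def flush(self):
        while self.pending:
            try:
                self.server.handle(self.pending[0])
            except ConnectionError:
                return  # keep the message queued for a later retry
            self.pending.popleft()


remote = Server()
queue = StoreAndForwardQueue(remote)

remote.up = False
queue.submit("message 1")  # queued, not lost
try:
    send_direct(remote, "query 1")
    direct_result = "delivered"
except ConnectionError:
    direct_result = "failed"  # the caller must arrange its own retry

remote.up = True
queue.flush()  # the queued message is delivered on retry
reply = send_direct(remote, "query 2")
print(direct_result, remote.inbox, reply)
```

If HTTP is chosen also for the sending of messages, something like the
queue above would have to be built on top of it.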

Character set format choices
----------------------------

Choice          ISO Latin 1           Charset                UTF-7, UTF-8
------          -----------           -------                ------------

Description     ISO 8859-1            Several, with          UTF-7, UTF-8
                standard              charset parameter      encodings of
                                                             Unicode/ISO 10646

Advantage       Easy to use           The format used        Expected to be what
                                      today in web and       all computers use
                                      e-mail                 in the future, but
                                                             not yet well
                                                             supported by all
                                                             platforms

Disadvantage    Only good for         Difficult to           Some debugging
                Western European      implement,             problems because it
                languages (not, for   especially for the     is not well
                example, Polish,      search engine          supported by
                Hungarian, Cyrillic,                         existing protocol
                Arabic, Hebrew,                              debugging software
                Asian languages)                             like telnet and
                                                             text editors

UTF-7 and UTF-8 are encodings of the future character set standards
Unicode and ISO 10646. These encodings of Unicode/ISO 10646 are
especially suitable for Internet protocols, because the basic Latin
letters (A-Z, a-z), the digits and some common punctuation characters
are the same as in ASCII. IETF recommends UTF-8. The only advantage of
UTF-7 is that it can be sent without further encoding in e-mail.
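
The differences between the three encodings can be seen with a few
lines of Python, which has codecs for all of them. The sample strings
are arbitrary and chosen only for illustration.

```python
ascii_text = "Senior Online 1999"
swedish = "Välkommen"   # fits in ISO Latin 1
polish = "Łódź"         # "Ł" and "ź" are outside ISO Latin 1

# Plain ASCII text is byte-for-byte identical in ISO Latin 1 and UTF-8.
same_as_ascii = (ascii_text.encode("ascii")
                 == ascii_text.encode("iso-8859-1")
                 == ascii_text.encode("utf-8"))

# The same Swedish word gives different bytes in each encoding.
for name in ("iso-8859-1", "utf-8", "utf-7"):
    print(name, swedish.encode(name))

# UTF-7 output uses only 7-bit bytes, UTF-8 output does not.
utf7_is_seven_bit = all(byte < 128 for byte in swedish.encode("utf-7"))
utf8_is_seven_bit = all(byte < 128 for byte in swedish.encode("utf-8"))

# ISO Latin 1 cannot represent the Polish sample at all.
try:
    polish.encode("iso-8859-1")
    latin1_handles_polish = True
except UnicodeEncodeError:
    latin1_handles_polish = False

print(same_as_ascii, utf7_is_seven_bit, utf8_is_seven_bit,
      latin1_handles_polish, polish.encode("utf-8"))
```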

Recommendation: I recommend that we start with the Charset choice,
but only using one charset, ISO Latin 1. This can in the future be
extended to either full Charset or Charset with a choice between
ISO Latin 1 and UTF-7 or UTF-8.
------------------------------------------------------------------------
Jacob Palme <jpalme@dsv.su.se> (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/~jpalme

Received on Sunday, 4 April 1999 12:08:57 UTC