SGML and MARC and URCs

Larry Masinter (masinter@parc.xerox.com)
Wed, 11 Jan 1995 01:15:40 PST


To: uri@bunyip.com
Subject: SGML and MARC and URCs
From: Larry Masinter <masinter@parc.xerox.com>
Message-Id: <95Jan11.011544pst.2760@golden.parc.xerox.com>
Date: Wed, 11 Jan 1995 01:15:40 PST

I've taken the liberty of excerpting two messages that came forwarded
to me (via Vicky Reich at Stanford); I think they're relevant to the
issue of using SGML syntax to represent metadata in URCs.

================================================================
Date:    Tue, 10 Jan 1995 13:30:15 CST
From:    Susan Hockey <HOCKEY@zodiac.rutgers.edu>
Subject: SGML and MARC

SGML, the Standard Generalized Markup Language, is a very different
thing from MARC. It is a metalanguage for defining encoding schemes.
As such, it provides a syntax within which structures can be defined
to describe information in electronic form. It is most often used
for electronic text (full-text), but can also be used to describe
images, sound and almost anything else in electronic form. By contrast,
MARC is a particular structure designed for storing one particular type
of electronic information.

Therefore it is possible to define an SGML structure (document type
definition or DTD) which holds the information in a MARC record.
I believe that several such DTDs already exist. Conversion from
this SGML format to MARC (and back) could therefore easily be done
by program.
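
To illustrate the kind of conversion program described above, here is a
minimal Python sketch. Everything in it is an assumption made for
illustration: the element names (marc, field, sub), the attribute names
(tag, ind, code), and the sample record are hypothetical, not taken from
any existing MARC DTD, and a real MARC record also carries a leader and
control fields that are omitted here.

```python
import re
from html import escape, unescape

# Toy MARC-like record: (tag, indicators, [(subfield code, value), ...]).
# The data is invented for illustration only.
record = [
    ("100", "1 ", [("a", "Example, Author")]),
    ("245", "10", [("a", "SGML & MARC"), ("b", "a comparison")]),
]

def marc_to_sgml(fields):
    """Serialize the toy record as SGML-style markup (hypothetical element names)."""
    lines = ["<marc>"]
    for tag, ind, subfields in fields:
        lines.append(f'<field tag="{tag}" ind="{ind}">')
        for code, value in subfields:
            # Escape markup characters so values cannot be mistaken for tags.
            lines.append(f'<sub code="{code}">{escape(value)}</sub>')
        lines.append("</field>")
    lines.append("</marc>")
    return "\n".join(lines)

# These patterns only understand the exact output of marc_to_sgml above;
# a real converter would use an SGML parser driven by the DTD.
FIELD_RE = re.compile(r'<field tag="(\d{3})" ind="(..)">(.*?)</field>', re.S)
SUB_RE = re.compile(r'<sub code="(\w)">(.*?)</sub>', re.S)

def sgml_to_marc(text):
    """Rebuild the toy record from the markup produced by marc_to_sgml."""
    fields = []
    for tag, ind, body in FIELD_RE.findall(text):
        subs = [(code, unescape(value)) for code, value in SUB_RE.findall(body)]
        fields.append((tag, ind, subs))
    return fields

if __name__ == "__main__":
    markup = marc_to_sgml(record)
    print(markup)
    print(sgml_to_marc(markup) == record)
```

The point of the sketch is only that the mapping is mechanical in both
directions once the structure is fixed: each field and subfield becomes
one element, so the round trip loses nothing.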

For electronic text, SGML has many advantages.  It consists of plain
ASCII.  It is independent of any particular hardware or software.
It allows multiple views to be encoded within the same text.  It
thus provides an archival form of the data, ensuring longevity which
is very important for the digital library.  Generic SGML software
can be used with any SGML-encoded material, since the software first
reads the DTD and derives information about the structure of the
material from it.  The DTD is thus used to validate the information.
This DTD mechanism also means that it is very easy to change the
structure of SGML-encoded information, if you come across something
which doesn't easily fit into the existing definition.  You simply
modify the DTD.
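
The validation idea above can be sketched in Python as follows. This is
a deliberately simplified stand-in, not real DTD processing: the table
named dtd, the validate function, and the element names are all
hypothetical, and real DTD content models also express order and
repetition, which are ignored here.

```python
# Toy stand-in for a DTD: each element name maps to the set of children
# it may contain. "#text" marks character data.
dtd = {
    "header": {"title", "author"},
    "title": {"#text"},
    "author": {"#text"},
}

def validate(node, dtd):
    """Return a list of errors; node is (name, children), children are nodes or strings."""
    name, children = node
    if name not in dtd:
        return [f"undeclared element: {name}"]
    errors = []
    for child in children:
        child_name = "#text" if isinstance(child, str) else child[0]
        if child_name not in dtd[name]:
            errors.append(f"{child_name} not allowed in {name}")
        elif not isinstance(child, str):
            errors.extend(validate(child, dtd))
    return errors

# A document containing an element the original definition does not allow.
doc = ("header", [("title", ["A text"]), ("revision", ["1995-01-10"])])

# Changing the structure means changing only the definition, not the software.
extended = {**dtd, "header": dtd["header"] | {"revision"}, "revision": {"#text"}}

if __name__ == "__main__":
    print(validate(doc, dtd))       # one error about "revision"
    print(validate(doc, extended))  # no errors
```

The same generic validate function handles both definitions, which is
the property claimed for generic SGML software: the structure lives in
the definition, so accommodating new material is a change to the
definition alone.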

The Text Encoding Initiative (TEI) has developed a set of SGML DTDs
for electronic text which include encoding not only for the text
itself, but for metadata which is stored in a header to the file.
The header contains SGML tags for bibliographic description, which
are very close to some MARC fields.  It also contains information
which a user of an electronic text needs to know, but which does not
easily fit into MARC.  This includes encoding principles (how the
encoder has treated hyphens, quotation marks, illegible material,
foreign words, etc.), and a revision history.

At the Center for Electronic Texts in the Humanities (CETH), we are
interested in the relationship between MARC and SGML, and
particularly in how the TEI header can be used by librarians,
scholars and, ultimately, by computer software which will process the
text.  Therefore we organized a workshop last May on this topic -
possibly the meeting referred to in Misha Schutt's posting.  A
report of this meeting is available from CETH ($15 for a printed
version - contact ceth@zodiac.rutgers.edu) or as a PostScript file
techrpt2.ps by ftp from ceth.princeton.edu.

We'd be interested to hear from anyone else who is working in this area,
especially if they are storing metadata as SGML and generating MARC
records (or even a relational database) from it.

Susan Hockey
Director, Center for Electronic Texts in the Humanities
Rutgers and Princeton Universities
hockey@zodiac.rutgers.edu

================================================================

From: jcort@lib.ua.ac.be ()
The mapping of the MARC record structure in an SGML DTD has been the
subject of two articles published in Library Resources and Technical
Services 38(4):
 - The Documentation of Electronic Texts Using Text Encoding
   Initiative Headers: An Introduction / Richard Giordano, p. 389-401
 - Cataloging Electronic Texts: The University of Virginia Library
   Experience / Edward Gaynor, p. 403-413

The Text Encoding Initiative (TEI) has developed a DTD that
includes a Header element.  This element describes general aspects
of the electronic text.  It also includes specific elements for the
coding of bibliographic elements.  The encoding scheme used is based
on elements that can also be found in the MARC record.  I personally
fully agree with Edward Gaynor:  libraries should try to integrate
electronic text cataloging into the traditional technical services
operations.  One of the conclusions in his article states:
"librarians should consider the usefulness of developing a MARC
document type definition (DTD) in cooperation with the TEI.  A full
blown MARC DTD could make data conversion and interchange a
relatively simple matter of programming." Developing a MARC DTD is
complex but feasible, because both standards (MARC and SGML) are
quite rigid.

The University of Antwerp is using SGML not only to create
electronic versions of library manuals and guides on the Web
(http://www.ua.ac.be/index.html), but also for the purpose of record
exchange (for example in our SDI-service).  If you are interested
you may also have a look at the CCB DTD developed by the
universities of Ghent and Leuven for the creation of the Belgian
Union Catalogue on CD-ROM.  Although this DTD is far from perfect
and does not have the ambition to support the full MARC record
structure, it might nevertheless inspire you.  This DTD is available
through FTP.  The URL is:  ftp://lib.ua.ac.be/pub/ccb/ccb.dtd.

Kind regards,
Jan Corthouts
Deputy Librarian
UIA-Library
PB13
2610 Antwerp
jcort@lib.uia.ac.be