- From: Larry Masinter <masinter@parc.xerox.com>
- Date: Wed, 11 Jan 1995 01:15:40 PST
- To: uri@bunyip.com
I've taken the liberty of excerpting two messages that were forwarded to me (via Vicky Reich at Stanford); I think they're relevant to the issue of using SGML syntax to represent metadata in URCs.

================================================================
Date: Tue, 10 Jan 1995 13:30:15 CST
From: Susan Hockey <HOCKEY@zodiac.rutgers.edu>
Subject: SGML and MARC

SGML, the Standard Generalized Markup Language, is a very different thing from MARC. It is a metalanguage for defining encoding schemes. As such, it provides a syntax within which structures can be defined to describe information in electronic form. It is most often used for electronic text (full-text), but can also be used to describe images, sound and almost anything else in electronic form.

By contrast, MARC is a particular structure designed for storing one particular type of electronic information. Therefore it is possible to define an SGML structure (document type definition or DTD) which holds the information in a MARC record. I believe that several such DTDs already exist. Conversion from this SGML format to MARC (and back) could therefore easily be done by program.

For electronic text, SGML has many advantages. It consists of plain ASCII. It is independent of any particular hardware or software. It allows multiple views to be encoded within the same text. It thus provides an archival form of the data, ensuring longevity, which is very important for the digital library.

Generic SGML software can be used with any SGML-encoded material, since the software first reads the DTD and derives information about the structure of the material from it. The DTD is thus used to validate the information. This DTD mechanism also means that it is very easy to change the structure of SGML-encoded information if you come across something which doesn't easily fit into the existing definition. You simply modify the DTD.
The Text Encoding Initiative (TEI) has developed a set of SGML DTDs for electronic text which include encoding not only for the text itself, but also for metadata which is stored in a header to the file. The header contains SGML tags for bibliographic description, which are very close to some MARC fields. It also contains information which a user of an electronic text needs to know, but which does not easily fit into MARC. This includes encoding principles (how the encoder has treated hyphens, quotation marks, illegible material, foreign words, etc.) and a revision history.

At the Center for Electronic Texts in the Humanities (CETH), we are interested in the relationship between MARC and SGML, and particularly in how the TEI header can be used by librarians, scholars and, ultimately, by computer software which will process the text. Therefore we organized a workshop last May on this topic, possibly the meeting referred to in Misha Schutt's posting. A report of this meeting is available from CETH ($15 for a printed version; contact ceth@zodiac.rutgers.edu) or as a PostScript file techrpt2.ps by ftp from ceth.princeton.edu.

We'd be interested to hear from anyone else who is working in this area, especially if they are storing metadata as SGML and generating MARC records (or even a relational database) from it.

Susan Hockey
Director, Center for Electronic Texts in the Humanities
Rutgers and Princeton Universities
hockey@zodiac.rutgers.edu

================================================================
From: jcort@lib.ua.ac.be ()

The mapping of the MARC record structure in an SGML DTD has been the subject of two articles published in Library Resources and Technical Services 38(4):

- The Documentation of Electronic Texts Using Text Encoding Initiative Headers: An Introduction / Richard Giordano, p. 389-401
- Cataloging Electronic Texts: The University of Virginia Library Experience / Edward Gaynor, p. 403-413
The Text Encoding Initiative (TEI) has developed a DTD that includes a Header element. This element describes general aspects of the electronic text. It also includes specific elements for the coding of bibliographic elements. The encoding scheme used is based on elements that can also be found in the MARC record.

I personally fully agree with Edward Gaynor: libraries should try to integrate electronic text cataloging into the traditional technical services operations. One of the conclusions in his article states: "librarians should consider the usefulness of developing a MARC document type definition (DTD) in cooperation with the TEI. A full blown MARC DTD could make data conversion and interchange a relatively simple matter of programming."

Developing a MARC DTD is complex but feasible, because both standards (MARC and SGML) are quite rigid.

The University of Antwerp is using SGML not only to create electronic versions of library manuals and guides on the Web (http://www.ua.ac.be/index.html), but also for the purpose of record exchange (for example in our SDI-service).

If you are interested, you may also have a look at the CCB DTD developed by the universities of Ghent and Leuven for the creation of the Belgian Union Catalogue on CD-ROM. Although this DTD is far from perfect and does not have the ambition to support the full MARC record structure, it might nevertheless inspire you. This DTD is available through FTP. The URL is: ftp://lib.ua.ac.be/pub/ccb/ccb.dtd

Kind regards,

Jan Corthouts
Deputy Librarian
UIA-Library
PB13 2610 Antwerp
jcort@lib.uia.ac.be
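
Both messages describe conversion between MARC and an SGML encoding as something that could be done by program. A minimal sketch of that round trip follows. Everything in it is an assumption made for illustration, not taken from any of the DTDs mentioned: the element names (marc, field, sub), the attribute names (tag, ind, code), the sample record, and the in-memory tuple layout are all hypothetical. The markup is kept to an XML-compatible subset of SGML syntax so that Python's standard xml.etree module can parse it; a real SGML system would validate against a DTD instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of a MARC-like record: a list of
# (tag, indicators, [(subfield_code, value), ...]) tuples.
# The values are placeholders, not a real catalogue record.
record = [
    ("100", "1 ", [("a", "Example, Author")]),
    ("245", "10", [("a", "An example title"), ("b", "a subtitle")]),
]


def marc_to_markup(fields):
    """Serialize the field list as tagged markup (hypothetical element names)."""
    root = ET.Element("marc")
    for tag, ind, subfields in fields:
        field = ET.SubElement(root, "field", tag=tag, ind=ind)
        for code, value in subfields:
            sub = ET.SubElement(field, "sub", code=code)
            sub.text = value
    return ET.tostring(root, encoding="unicode")


def markup_to_marc(text):
    """Parse the markup back into the same list-of-tuples structure."""
    root = ET.fromstring(text)
    fields = []
    for field in root.findall("field"):
        # An empty subfield element has text None; normalize it to "".
        subfields = [(s.get("code"), s.text or "") for s in field.findall("sub")]
        fields.append((field.get("tag"), field.get("ind"), subfields))
    return fields


markup = marc_to_markup(record)
print(markup)
```

Calling markup_to_marc(markup) should give back a structure equal to record, which is the "and back" half of the conversion. Repeatable fields and subfields survive because order is preserved in both directions; control fields, the leader, and character-set handling are deliberately left out of this sketch.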
Received on Wednesday, 11 January 1995 04:15:58 UTC