
Re: Special characters in Bath Profile XML/Dublin Core

From: Misha Wolf <misha.wolf@reuters.com>
Date: Wed, 09 Aug 2000 12:42:25 +0000 (GMT)
Message-Id: <B0006040469@euvig1.dtc.lon.ime.reuters.com>
To: dc-general@mailbase.ac.uk
Cc: www-rdf-interest@w3.org
[I'm copying the www-rdf-interest@w3.org list]

Ann,

> The Bath Profile specifies that, for levels 1 & 2, XML record syntax 
> is used with a DTD for Basic Dublin Core (ie. version 1.1). The DTD 
> is from the CIMI project, and allows a <record-list> of <dc-record>s.
> 
> This DTD does not include reference to any standard SGML 
> character entity sets. So I assume the only character entities 
> available are the in-built XML ones: &amp;, &lt;, &gt;, &apos;, &quot;. I am wondering 
> how to include non-keyboard characters. Should they be in 
> Unicode, or should they be in plain text, eg. &eacute; becomes 'e', 
> &alpha; becomes 'alpha'? I assume that if I were to include any 
> character entity sets within a DTD it would no longer be 
> interoperable.

If you have difficulties using Unicode directly, I suggest you use 
Numeric Character References (NCRs).  In both HTML and XML, NCRs always 
refer to the Unicode Standard.  Both decimal and hexadecimal versions 
are supported.  For example, all of these mean the same thing:

   A
   &#65;
   &#x41;
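
As a minimal sketch of the equivalence above (Python is an assumption here, since the thread names no language, and the helper name `to_ncr` is hypothetical), the following parses all three forms with a standard XML parser and shows one way to produce decimal NCRs for characters outside ASCII:

```python
import xml.etree.ElementTree as ET


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal NCR.

    Hypothetical helper for illustration. It does not escape the
    XML-special characters &, < and >; that is a separate step.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# The literal character, the decimal NCR and the hexadecimal NCR
# all parse to the same character.
element = ET.fromstring("<title>A&#65;&#x41;</title>")
parsed = element.text

# A title containing e-acute (U+00E9), written with a Python escape.
encoded = to_ncr("caf\u00e9")

# Round trip: an XML parser turns the NCR back into the character.
decoded = ET.fromstring("<title>" + encoded + "</title>").text

print(parsed, encoded, decoded == "caf\u00e9")
```

Because the NCR form is plain ASCII, a record written this way should survive any transport that handles ASCII, and no extra entity declarations are needed in the DTD.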

> Following on from this, I'm wondering how to encode superscripts 
> and subscripts. I don't think I can do this in Unicode for the general 
> case, though Unicode may include simple ones like 
> <SUP>2</SUP>. Again, if I include extra tags in text like <SUP> 
> and <SUB> the DTD will no longer be interoperable. The same 
> applies to formatting tags like italic and bold, but I expect one can 
> live without these - the content being more important than the 
> format. Super/subscripts are quite likely to occur in scientific article 
> titles. The only solution I have so far thought of is to include them 
> as plain text, eg. ^2^ for a superscript and ~2~ for a subscript. 
> Does anyone know if there is a 'standard' convention for this.

This is more of a problem.  When I was involved in the W3C RDF WG and the 
DC Datamodel WG, my hope was that arbitrary markup could be included, and 
be "passed through" transparently.  This is obviously needed for titles 
of mathematical papers etc.  I've rather lost touch, and don't know what 
the current situation is.
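
For the simple cases mentioned in the question, Unicode does define superscript and subscript digits, so those can be written as NCRs without changing the DTD. The sketch below assumes Python, and the names `SUPERSCRIPT_DIGITS`, `SUBSCRIPT_DIGITS` and `to_hex_ncr` are hypothetical. It covers digits only; arbitrary superscripted or subscripted text would still need markup of some kind.

```python
# Code points from the Unicode Standard: superscript 1, 2 and 3 are
# U+00B9, U+00B2 and U+00B3; the remaining superscript digits are
# U+2070 and U+2074..U+2079; subscript digits are U+2080..U+2089.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)


def to_hex_ncr(text: str) -> str:
    """Write every non-ASCII character as a hexadecimal NCR."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text
    )


# "E = mc" followed by SUPERSCRIPT TWO.
title = "E = mc" + "2".translate(SUPERSCRIPT_DIGITS)

# "H", SUBSCRIPT TWO, "O".
water = "H" + "2".translate(SUBSCRIPT_DIGITS) + "O"

print(to_hex_ncr(title))
print(to_hex_ncr(water))
```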

> Thanks for any help.
> 	Ann
> 
> --------------------------------------------------------------------------
> Mrs. Ann Apps. Electronic Publishing @ MIMAS. Manchester Computing,
>      University of Manchester, Oxford Road, Manchester, M13 9PL, UK
> Tel: +44 (0) 161 275 6039    Fax: +44 (0) 0161 275 6040
> Email: ann.apps@man.ac.uk  WWW: http://epub.mimas.ac.uk/ann.html
> --------------------------------------------------------------------------

Misha

[This mail was written using voice recognition software]


Received on Wednesday, 9 August 2000 07:41:16 GMT
