XML for Structured Data from Jean Paoli on 1997-05-16 (w3c-sgml-wg@w3.org from May 1997)

From: Jean Paoli <jeanpa@microsoft.com>
Date: Thu, 15 May 1997 21:14:10 -0700
To: "'w3c-sgml-wg@w3.org'" <w3c-sgml-wg@w3.org>
Message-ID: <78DFE33066ABD0118B9200805FD431BA05919C@RED-16-MSG.dns.microsoft.com>
As Jon said in a previous e-mail, the XML ERB has decided to open the
namespace debate. Andrew Layman from Microsoft and I have been
discussing, with the approval of the XML ERB, the subject of using XML
for Structured Data with a number of groups in the W3C and in the IETF.

The mail below describes the issues we have been discussing with these
groups (Digital Signature, DAV, Labels) and the set of limitations of
XML that they have clearly identified.

At this time many of these groups have expressed interest in using XML
for structured data, and I personally think it will be a huge win for
XML if they do choose to use it: it would bring a wide range of
distributed applications and protocols to XML on the Web.
This is why the XML ERB agreed to officially open these subjects to
discussion and asked me to post 5 questions to this mailing list about
revising the XML syntax.

This mail starts with an introduction to the problem and will be
followed by 5 e-mails numbered SD1, SD2, SD3, SD4, SD5 (SD for
Structured Data), which correspond to the 5 revisions that we would
like to discuss.

It is better to read the questions in order.

At the end of this e-mail you will also find a paper written by the
people at W3C who are in charge of the PICS-NG Metadata Model and Label
Syntax Group (the author is Ora Lassila).
They currently have 2 competing syntaxes (a Lisp proposal and an XML
proposal), and I really hope they will choose XML.


-Jean Paoli

------------------------------------------------------------------------
XML for Structured Data 

Andrew Layman (AndrewL@microsoft.com)
Jean Paoli (JeanPa@microsoft.com)

Introduction:

This discussion proposes that we use XML for structured data and lists a
number of 
issues related to such use. These have come up recently in many contexts
and 
under many names such as "web collections," "metadata," "schemas,"
"ontologies," 
"profiles," "lexicons" etc. XML can handle these needs, but it needs
some 
extensions to do so.

We have discussed this subject extensively with two groups in the W3C:
the Digital Signatures Initiative Group (designers of PICS and, more
generally, electronic signatures on assertions about documents) and
W3C-Labels (representation of structured data about web resources, for
instance electronic libraries and their indices). Correspondents include
Ralph Swick, Ora Lassila, Philip DesAutels and Eric Miller. We are also
particularly indebted to two individuals: Tim Bray and Henry Thompson.
This discussion lists several known issues for XML, and often makes
concrete proposals both to illustrate the issue and to give us something
to start with and to criticize. These proposals are not meant to be read
as final thoughts but as beginning ones.

A word on terminology: We here use "schema" to denote the information
that must be known before a document can be read (e.g. a DTD). We
specifically avoid all use of the term "meta-data" since it currently
has at least three widely different meanings depending on the speaker.

XML is a format for tagging text so that the structure of its semantic
parts is 
identified in an easily-parsed way. The document's structure, that is,
the 
relationship of elements in the document can be mechanically discovered
without 
reference to any outside information beyond the XML specification. (See 
http://www.w3.org/pub/WWW/TR/WD-xml-961114.html). For example, the
following is 
a valid XML document:

<ORDERS>
	<LINEITEM XML-ID="L1">
		<NAME>Number, the Language of Science</NAME>
		<AUTHOR>Dantzig, Tobias</AUTHOR>
		<PRICE>5.95</PRICE>
	</LINEITEM>
	<LINEITEM XML-ID="L2">
		<NAME>Tales of Grandpa Cat</NAME>
		<AUTHOR>Wardlaw, Lee</AUTHOR>
		<PRICE>6.58</PRICE>
	</LINEITEM>
</ORDERS>
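
As a rough illustration of "mechanically discovered", here is a minimal
Python sketch that recovers the element structure of the document above
with no DTD or other outside information. It assumes a present-day XML
parser (Python's standard xml.etree.ElementTree, which postdates this
mail); the helper name outline is ours and purely illustrative.

```python
import xml.etree.ElementTree as ET

# The sample document from the text, embedded as a string.
ORDERS = """<ORDERS>
  <LINEITEM XML-ID="L1">
    <NAME>Number, the Language of Science</NAME>
    <AUTHOR>Dantzig, Tobias</AUTHOR>
    <PRICE>5.95</PRICE>
  </LINEITEM>
  <LINEITEM XML-ID="L2">
    <NAME>Tales of Grandpa Cat</NAME>
    <AUTHOR>Wardlaw, Lee</AUTHOR>
    <PRICE>6.58</PRICE>
  </LINEITEM>
</ORDERS>"""


def outline(element, depth=0):
    """Return the containment structure as indented element names (hypothetical helper)."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(ORDERS)
structure = outline(root)  # containment only, discovered without any schema
titles = [item.findtext("NAME") for item in root.iter("LINEITEM")]

print("\n".join(structure))
```

Note that the parser can report containment and text, but nothing tells
it that PRICE holds a number: that is exactly the kind of information a
schema would have to supply.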

The base level of XML is only enough to express containment and text. A
number of issues immediately arise: how to represent more complex
structures; how attributes of elements should be represented; how to
identify a namespace (schema) used in a document; how to present a
schema in a machine-readable form; how to create an open, extensible
mechanism for adding and extending schemas; and how to integrate
multiple schemas in one document.
We will describe each of these in the following e-mails.

****End of the Introduction of XML for Structured Data *********

------------------------------------------------------------------------
This is the paper written by the people at W3C who are in charge of
the PICS-NG Metadata Model and Label Syntax Group (the author is Ora
Lassila).
This paper uses the proposed modifications to the XML syntax but states
that these extensions are proposed and have yet to be discussed by the
XML working group.
------------------------------------------------------------------------

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
   <TITLE>PICS-NG Metadata Model and Label Syntax</TITLE>
   <META NAME="Author" CONTENT="Ora Lassila">
   </HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B"
ALINK="#FF0000">

<P><A HREF="../.."><IMG SRC="../../Icons/WWW/w3c_home" ALT="W3C"
BORDER=0 HEIGHT=48 WIDTH=72 ALIGN=LEFT></A>
</P>

<H3 ALIGN=RIGHT>WD-pics-ng-metadata-970514.html</H3></DIV>

<P><BR>
<BR>
</P>

<H1>PICS-NG Metadata Model and Label Syntax</H1>

<P><B>Ora Lassila</B>, <A
HREF="mailto:lassila@w3.org">lassila@w3.org</A>,
Nokia Research Center (currently visiting W3C)</P>

<P>Version 3.5, 5/14/97</P>

<P>This document supersedes an earlier document titled &quot;<A
HREF="http://www.w3.org/pub/WWW/PICS/draft-lassila-pics-ng-label-syntax.html">PICS-NG
Label Syntax Proposal</A>&quot; version 1 dated 2/20/97 [Lassila
97].</P>

<H2>Acknowledgements</H2>

<P>This document would not have been possible without substantial
contributions
and support from Ralph Swick (W3C), as well as contributions and
comments
from Eric Miller (OCLC), Jim Miller (W3C), Paul Resnick (AT&amp;T) and
Bob Schloss (IBM). The author is indebted to all these people for their
continuing moral support.</P>

<H2>Status of this document</H2>

<P>This document is being submitted simultaneously to the W3C (PICS)
Label
Working Group and the W3C DSig Collections Working Group for
consideration
as the basis of a converged web resource description framework. It
represents
the discussion of W3C staff and has not yet undergone review by either
of those groups. Note also that Section 5.2 is currently missing and
will
be provided in an update very shortly.</P>

<P>
<HR WIDTH="100%"></P>

<H2>Table of Contents</H2>

<OL>
<LI><A HREF="#intro">Introduction</A></LI>

<LI><A HREF="#model">Metadata Object Model</A></LI>

<LI><A HREF="#inheritance">Sharing Fragments of Metadata</A></LI>


<LI><A HREF="#schemata">Schemata</A></LI>

<LI><A HREF="#syntax">Syntax of PICS-NG</A></LI>

<LI><A HREF="#examples">Examples</A></LI>

<LI><A HREF="#issues">Open Issues</A></LI>

<LI><A HREF="#literature">Literature</A></LI>

<LI><A HREF="#appendix-xml">Appendix A: Correspondence to the XML Web
Collection
Proposal</A></LI>
</OL>

<P>
<HR WIDTH="100%"></P>

<H2><A NAME="intro"></A>1. Introduction</H2>

<P>The first question to ask is: what is metadata? Metadata is
&quot;data
about data&quot;, or specifically in our present context, &quot;data
about
web resources.&quot;</P>

<BLOCKQUOTE>
<P><I>The broad goal is to define a metadata mechanism which makes no
assumptions
about a particular application domain, nor defines the semantics of any
application domain. The definition of the mechanism should be domain
neutral,
yet the mechanism should be suitable for describing information about
any
domain.</I></P>
</BLOCKQUOTE>

<P>Metadata can be used in a variety of application areas; for
example:&nbsp;in
<I>resource discovery</I> to provide better search engine capabilities,
in <I>cataloging</I> for describing the content available at a
particular
web site or page, by <I>intelligent software agents</I> to facilitate
knowledge
sharing and exchange, in <I>digital signatures</I>, in <I>content
rating</I>,
and in many others (for example, metadata can be used for specialized
tasks
such as organizing a group of web pages for purposes of printing them as
a single unit, or for producing a visualization of the link
relationships
between them).</P>

<P>This document introduces a model for representing metadata, and a
syntax for expressing and transporting metadata based on this model. In
a way, this is a new version of the PICS content rating label mechanism
and motivates its use as a general metadata description formalism. The
new PICS - which we shall here call &quot;PICS-NG&quot; (for &quot;Next
Generation&quot;) - is based on a conceptual object model for metadata,
suitable for expressing information about web resources as well as other
PICS-NG formulations. The model is highly extensible, and also more
general than the implied model behind <A
HREF="http://www.w3.org/pub/WWW/TR/REC-PICS-labels.html">PICS
version 1.1</A> [Krauskopf 96]; hence this document will first describe
the model in general and then proceed to give a specialization for
implementing content rating labels.</P>

<P>A mechanism is needed to permit encoding and transport of web
metadata
in a manner that maximizes the interoperability of independently
developed
web servers and clients. Specific applications are free - and indeed
encouraged
- to impose additional semantics on a subset of the metadata above that
required by the model described in this document.</P>

<H2><A NAME="model"></A>2. Metadata Object Model</H2>

<P>The <I>metadata object model</I> defines a conceptual framework for
objects called <I>labels</I>. Labels are collections of <I>attributes
</I>and
their corresponding <I>values</I>. The domain of values consists of
instances
of a small set of <I>primitive types</I>, other labels, as well as
<I>lists</I>.
The primitive types are: <I>strings</I>, <I>numbers </I>(both integers
and floats) and <I>booleans</I>. By definition, an attribute/value pair
contained in a label makes a <I>statement</I>. Using labels it is
possible
to make statements about <I>resources</I> (which have a URL) as well as
about other labels.</P>

<P>The set of attributes for a given label, as well as any
characteristics
or restrictions of the values themselves, are defined by a
<I>schema</I>,
referred to by the label using a URL. This URL&nbsp;may be treated
merely
as an identifier or it may refer to a machine-readable description of
the
schema. A label may have more than one schema, and similarly a schema
may
be defined in terms of any number of other schemata. By definition, an
application that understands a particular schema used by a label
understands
the semantics of each of the attribute statements contained in that
label.
An application that has no knowledge of the particular schema will
minimally
be able to parse the label into the attribute and value components and
will be able to transport the label intact (e.g. to a cache or to
another
application). In the presence of multiple schemata, an application may
choose (in a left-to-right order) the first schema it has knowledge of,
and interpret the label using that schema [<B>Note: </B>see the Open
Issues
section for a discussion on multiple inheritance].</P>
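
<P>The left-to-right choice among multiple schemata can be sketched as
follows. This is only an illustration in Python: the function name
<TT>choose_schema</TT> and the <TT>example.org</TT> URLs are hypothetical
and are not defined by this document.</P>

```python
def choose_schema(schema_urls, known_schemas):
    """Return the first schema URL, in left-to-right order, that the application knows.

    Returns None when no listed schema is known; the application can then still
    parse the label into attributes and values and pass it on intact.
    """
    for url in schema_urls:
        if url in known_schemas:
            return url
    return None


# Hypothetical schema URLs, in the order in which the label lists them.
label_schemas = [
    "http://example.org/schemas/extended",
    "http://example.org/schemas/basic",
]
known = {"http://example.org/schemas/basic"}

chosen = choose_schema(label_schemas, known)
print(chosen)
```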

<P>An actual machine-readable description of a schema may be accessed
through
content negotiation by dereferencing the schema URL contained in the
label.
If the schema is machine-readable it may be possible for an application
to learn the semantics of the schema on demand. How the learning happens
is beyond the scope of this document; furthermore, no claim is made that
it is always feasible to encode the full semantics in a machine-readable
schema. The URL&nbsp;referring to a schema may actually refer to a file
containing definitions for several schemata (i.e. a library of
schemata).
In this case, embedded labels may refer to any of the contained schemata
definitions using URL&nbsp;fragment identifiers.</P>
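
<P>As a small sketch of how a reference into a schema library might be
taken apart, the following Python fragment separates the library URL from
the fragment identifier using the standard <TT>urllib.parse.urldefrag</TT>
function. The helper name and the <TT>example.org</TT> URL are
hypothetical.</P>

```python
from urllib.parse import urldefrag


def split_schema_reference(schema_url):
    """Split a schema URL into (library document URL, fragment or None)."""
    document, fragment = urldefrag(schema_url)
    return document, (fragment or None)


# Hypothetical library file holding several schemata; "rating" names one of them.
library, name = split_schema_reference("http://example.org/schemas/library#rating")
print(library, name)
```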

<P>A <I>type</I> is an identifier designated by a schema to name a
component
of a <I>type system</I>. The basic type system of PICS-NG contains the
following types (many of these types are not unlike those found in
various
Lisp systems):</P>

<CENTER><TABLE BORDER=1 CELLSPACING=0 CELLPADDING=3 >
<TR ALIGN=LEFT VALIGN=TOP>
<TH>Type</TH>

<TH>Description</TH>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>string</TD>

<TD>A sequence of characters [<B>Note: </B>a discussion of character
sets
will be included in a future version of this document]. Syntactically
strings
are case-sensitive and may contain whitespace.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>symbol</TD>

<TD>A sequence of characters acting as a unique identifier.
Syntactically
symbols are case-insensitive. The particular syntax used for metadata
restricts
the set of characters allowed in a symbol. Furthermore, symbols may not
contain whitespace.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>integer</TD>

<TD>An integer.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>float</TD>

<TD>A floating-point approximation of a real number.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>range</TD>

<TD>A tuple of two numbers, representing lower and upper bounds of an
interval.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>number</TD>

<TD>Either an integer, a float, or a range.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>boolean</TD>

<TD>A boolean value. The names of the two possible values are
<TT>true</TT>
and <TT>false</TT>.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>list</TD>

<TD>An ordered sequence of values (of any type).</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>label</TD>

<TD>A PICS-NG metadata label.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>URL</TD>

<TD>A Uniform Resource Locator. Syntactically this is a string, but only
those characters are allowed which are legal as specified in the <A
HREF="http://ds.internic.net/rfc/rfc1738.txt">URL&nbsp;specification</A>
[Berners-Lee 94].</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>ISODate</TD>

<TD>Objects of this type represent points in time. Syntactically the
type
looks like a string (with additional restrictions on their contents as
defined in the syntax section below), internally it may be represented
in any way the implementation sees fit. Syntactically this type is
currently
defined as <I>quoted-ISO-date</I> in the <A
HREF="http://www.w3.org/pub/WWW/TR/REC-PICS-labels.html">PICS
version 1.1</A> specification.</TD>
</TR>
</TABLE></CENTER>

<P>In addition, the type called <I>any</I> is understood to denote the
set of all of the above types.</P>

<P>In certain applications it may be desirable for some attributes to
hold
multiple values simultaneously. In this case the order of the values is
significant, that is, an application is required to preserve the
ordering
(please note that the order of <I>attributes</I> is not significant). To
assign multiple values to an attribute the list type is used (in other
words, an attribute with multiple values is an attribute with a single
list value).</P>
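
<P>One way to picture these ordering rules, assuming labels are held in
memory as Python dictionaries (an implementation choice that this document
does not prescribe): dictionary comparison ignores the order of
attributes, while list comparison respects the order of values. The
<TT>keywords</TT> attribute is hypothetical.</P>

```python
# Hypothetical labels modelled as plain dicts: attribute name -> value.
label_a = {
    "*for": "http://www.w3.org/People/Lassila/",
    "keywords": ["metadata", "labels", "schemata"],
}

# Same statements, attributes written in a different order.
label_b = {
    "keywords": ["metadata", "labels", "schemata"],
    "*for": "http://www.w3.org/People/Lassila/",
}

# Same attributes, but the multiple values appear in a different order.
label_c = {
    "*for": "http://www.w3.org/People/Lassila/",
    "keywords": ["schemata", "labels", "metadata"],
}

attribute_order_ignored = label_a == label_b   # attribute order is not significant
value_order_preserved = label_a != label_c     # value order within a list is significant
```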

<P>A label is a collection of statements (attribute/value pairs). These
statements are being made about an object called the <I>referent</I>. We
can identify three different types of referents:</P>

<UL>
<LI><B>Referent Value: </B>the statements apply to the object named by
the referent; if the referent is a label describing a set, the
statements
are about the set as a whole.</LI>

<LI><B>Indirect Referent Value: </B>the statements apply to the referent
of the referent; if the referent is a label describing a set, the
statements
are about each of the items in that set.</LI>

<LI><B>Immediate Value: </B>the statements apply to the referent object;
if the referent is a label, the statements are about the label object
itself.</LI>
</UL>

<P>If the referent is a list it is understood that the statements are
being
made of each of the items of the list. The following table clarifies the
differences between the three cases based on the type of the referent
object:</P>

<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=3 >
<TR>
<TD></TD>

<TH>Referent Value</TH>

<TH>Indirect Referent Value</TH>

<TH NOWRAP>Immediate Value</TH>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD NOWRAP>boolean</TD>

<TD>N/A</TD>

<TD>N/A</TD>

<TD ROWSPAN="2">The referent label makes statements about the object
(e.g.
unit of a numerical value)</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>symbol, number</TD>

<TD>The symbol or number is the name of another label which is
considered
the actual referent (see below)</TD>

<TD>The symbol or number is the name of another label which is
considered
the actual referent (see below)</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>string</TD>

<TD>String is a URL, and the statements apply to the resource at that
URL</TD>

<TD>String is a URL referring to a label describing a set; statements
apply
to the items of the set</TD>

<TD>Statements are made about the string (e.g. type of string,
language)</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>label</TD>

<TD>If the label defines a set, the statements apply to the set as a
whole</TD>

<TD>If the label defines a set, the statements apply to the elements of
the set individually</TD>

<TD>Statements apply to the referent label object itself</TD>
</TR>
</TABLE>

<P>If <I>x</I> is the referent of <I>y</I>, then we will say that
<I>y</I>
is the <I>parent</I> of <I>x</I>. In general, the immediately enclosing
label of any label is called the <I>parent</I> (regardless of what
attribute
holds the label as its value).</P>

<P>When label naming is used (in the above table, when the referent is
a symbol or a number), the scope of name visibility is within the peers
of a named label as well as within the locally (lexically) enclosing
labels.
References through URLs do not transmit name visibility. In addition, a
label is not allowed to make <I>forward references</I> (only labels
introduced
lexically before the referring label can be referred to), nor is a label
allowed to refer to itself.</P>

<H2><A NAME="inheritance"></A>3. Sharing Fragments of Metadata</H2>

<P>In order to avoid needless proliferation of metadata a mechanism is
introduced which allows the sharing of common fragments of metadata
among
several labels. This feature is inspired by various <I>inheritance
mechanisms</I>
found in object-oriented programming systems as well as various
knowledge
representation systems.</P>

<P>Attributes and their (optional) default values are <I>inherited</I>
from a schema to a label, and values may be inherited from one label to
another. Using inheritance statements can be made of groups of objects
without having to repeat the statement individually for each object. The
following algorithm defines the exact mechanism of inheritance: given a
label <I>lab</I> and an attribute <I>att</I>, the value of <I>att</I>
for
<I>lab</I>, as given by the function <B>AttributeValue(</B><I>lab</I>,
<I>att</I><B>)</B>, is</P>

<OL>
<LI>The local value of <I>att</I> for <I>lab</I>, if <I>att</I> is a
local
attribute of <I>lab</I>.</LI>

<LI><B>AttributeValue(</B>schema of <I>lab</I>, <I>att</I><B>)</B><I>,
</I>if <I>lab</I> has a schema explicitly defined.</LI>

<LI><B>AttributeValue(</B>parent of <I>lab</I>, <I>att</I><B>)</B>, if
<I>lab</I> has a parent.</LI>

<LI>Unspecified, since a label always has either a schema definition or
is enclosed by another label.</LI>
</OL>
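
<P>The lookup above can be sketched in Python as follows. This is one
possible reading, not a normative definition: it assumes that the steps
form a fallback chain (if the schema yields no value, the parent is
consulted next), and it models a schema as a node that carries default
values. The names <TT>Node</TT>, <TT>attribute_value</TT> and
<TT>UNSPECIFIED</TT> are hypothetical.</P>

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

UNSPECIFIED = object()  # sentinel for "no value found anywhere"


@dataclass
class Node:
    """A label, or a schema modelled as a holder of default values (assumption)."""
    attrs: Dict[str, Any] = field(default_factory=dict)
    schema: Optional["Node"] = None
    parent: Optional["Node"] = None


def attribute_value(lab, att):
    # Step 1: a local value always wins.
    if att in lab.attrs:
        return lab.attrs[att]
    # Step 2: consult the schema, if the label names one explicitly.
    if lab.schema is not None:
        value = attribute_value(lab.schema, att)
        if value is not UNSPECIFIED:
            return value
    # Step 3: fall back to the enclosing (parent) label.
    if lab.parent is not None:
        return attribute_value(lab.parent, att)
    # Step 4: nothing found.
    return UNSPECIFIED


pics_schema = Node(attrs={"generic": False})              # schema default
outer = Node(attrs={"by": "John Doe"}, schema=pics_schema)
inner = Node(attrs={"on": "1994.11.05T08:15-0500"}, parent=outer)

print(attribute_value(inner, "on"), attribute_value(inner, "by"),
      attribute_value(inner, "generic"))
```

<P>In this sketch the inner label states <TT>on</TT> itself, obtains
<TT>by</TT> from its parent, and obtains the default for
<TT>generic</TT> from the parent's schema.</P>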

<P>As stated in the model definition section, in the presence of
multiple
schemata an application may choose the first schema it has knowledge of,
and interpret the label using that schema.</P>

<H2><A NAME="schemata"></A>4. Schemata</H2>

<P>A label refers to schema(ta) for the purpose of grounding the terms
used by the label, to provide semantics for the statements the label
makes.
It is our intention that the PICS-NG metadata formalism be extremely
simple,
yet powerful via extensibility. It is expected that metadata
implementors
will define new schemata to introduce additional semantics for metadata
expressions. We assume a formalism will exist for defining schemata, but
this formalism is not described in this document (possibly the same
formalism
is used for schemata as is used for metadata instances). For maximal
extensibility,
the schema definition mechanism may take on features of <I>metaobject
protocols</I>.</P>

<P>For the purposes of &quot;bootstrapping&quot; the model, it will be
necessary to define a small set of attributes which are available in all
labels (and which conceivably could be used by any label). These
attributes
cannot be redefined or overridden by new schemata (to indicate the fixed
nature of the definition of these attributes their names start with the
* character; the use of the character *&nbsp;is reserved for this
purpose
and no schema should use it as the first letter of an attribute name).
The common core attributes of labels are:</P>

<CENTER><TABLE BORDER=1 CELLSPACING=0 CELLPADDING=3 >
<TR ALIGN=LEFT VALIGN=TOP>
<TH>Attribute name</TH>

<TH>Type</TH>

<TH>Description</TH>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>*schema</TD>

<TD>URL, list</TD>

<TD>Contains a reference to the schema of the label. A list value is
understood
to be a list of URLs.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD NOWRAP>*for<BR>
*for-indirect<BR>
*for-immediate</TD>

<TD>any (see table in Section 2)</TD>

<TD>Contains the <I>referent</I> of the label, i.e. the object about
which
the statements in the label are being made. See the explanation at the
end of section 2 describing the three different kinds of referents: a
<I>referent</I>
value (for), an <I>immediate</I> value (for-immediate) and an
<I>indirect
referent</I> value (for-indirect). Note: Specialized schemata may define
other attributes the values of which can also be considered
referents.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>*id</TD>

<TD>symbol, number</TD>

<TD>Names the label. Named labels can be referred to by just using their
name (see explanation on referents at the end of section 2). The scope
of the names is the lexical context of the label (everything within the
outermost lexically enclosing label).</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>*dsig</TD>

<TD>label</TD>

<TD>This attribute holds a digital signature which signs the label. The
digital signature is a label itself and thus conforms to this
specification.
The actual schemata and semantics of digital signatures will be specified
later.</TD>
</TR>
</TABLE></CENTER>

<H3>4.1. Basic Content Rating Schema (the &quot;PICS Schema&quot;)</H3>

<P>In order to implement an extension of <A
HREF="http://www.w3.org/pub/WWW/TR/REC-PICS-labels.html">PICS
version 1.1</A> using PICS-NG, a schema has to be defined to introduce
the old &quot;options&quot; as label attributes. We will call this
schema
the &quot;PICS&nbsp;2.0 Schema.&quot; A&nbsp;PICS 2.0 rating label is
expressed
as a single label. A label-list is a label whose referent is a list of
labels. The attributes of the PICS 2.0 schema are:</P>

<CENTER><TABLE BORDER=1 CELLSPACING=0 CELLPADDING=3 >
<TR ALIGN=LEFT VALIGN=TOP>
<TH>Attribute</TH>

<TH>Default value</TH>

<TH>Type</TH>

<TH>Description</TH>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>at</TD>

<TD NOWRAP>no default</TD>

<TD>ISODate</TD>

<TD>The last modification date of the item to which this rating applies,
at the time the rating was assigned.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>by</TD>

<TD NOWRAP>no default</TD>

<TD>string</TD>

<TD>An identifier for the person or entity within the rating service who
was responsible for creating this particular rating label.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>generic</TD>

<TD NOWRAP><TT>false</TT></TD>

<TD>boolean</TD>

<TD>If this option is set to true, the rating label can be applied to
any
URL starting with the prefix given in the for option. This is used to
supply
ratings for entire sites or any subparts of sites.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>on</TD>

<TD>no default</TD>

<TD>ISODate</TD>

<TD>The date on which this rating was issued.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD>until</TD>

<TD>no default</TD>

<TD>ISODate</TD>

<TD>The date on which this rating expires.</TD>
</TR>
</TABLE></CENTER>
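
<P>The effect of the <TT>generic</TT> attribute can be sketched as
follows, assuming a label is held as a Python dictionary and its referent
is a single URL string. The function name <TT>label_applies</TT> and the
sample labels are hypothetical.</P>

```python
def label_applies(label, url):
    """Decide whether a rating label covers the given URL."""
    target = label["*for"]
    # The schema default for "generic" is false.
    if label.get("generic", False):
        return url.startswith(target)   # prefix match: whole sites or subparts
    return url == target                # exact match only


site_label = {"*for": "http://w3.org/PICS/", "generic": True}
page_label = {"*for": "http://w3.org/PICS/Overview.html"}

print(label_applies(site_label, "http://w3.org/PICS/Overview.html"))
```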

<H2><A NAME="syntax"></A>5. Syntax of PICS-NG</H2>

<P>The PICS-NG metadata object model provides an abstract, conceptual
framework
for defining and using metadata. A concrete syntax is also needed for
the
purposes of authoring and exchanging metadata. Several syntaxes are
obviously
possible, and we may not have to limit ourselves to a single syntax.
There
are, however, certain goals to keep in mind when designing the
syntax:</P>

<OL>
<LI><B>Brevity: </B>over-the-wire characteristics are important despite
advances in telecommunications technology.</LI>

<LI><B>Ease of parsing: </B>to make metadata efficient to use, parsers
have to be simple and fast; to promote widespread interoperability,
parsers
have to be easy to write.</LI>

<LI><B>Suitability to direct human authoring and comprehension: </B>this
is probably less important than the previous goals, yet we should avoid
unnecessary verbosity or anything else that needlessly complicates
authoring
(the ability to reliably author metadata with the Windows Notepad editor
is <B>not </B>a goal).</LI>
</OL>

<P>This document defines an s-expression syntax for PICS-NG. This syntax
satisfies the above requirements.</P>

<H3>5.1. S-Expression Syntax</H3>

<P>The syntax of PICS-NG is greatly simplified from that of PICS version
1.1. PICS-NG syntax consists simply of s-expressions, with additional
restrictions placed on the types of values of certain elements of the
s-expression structures. PICS-NG parsing is a multi-step process.
Parsing of a single label happens as follows:</P>

<OL>
<LI>A simple s-expression parser is used to parse (and verify) the
overall
syntactic structure (given below in the form of a BNF definition). This
is the only step necessary if one is not interested in any semantic
interpretation
of the label (if the data is only passed through, if the parsing agent
has no knowledge of the schemata used, etc.).</LI>

<LI>Information from each of the schemata of a label is used to verify
that attribute values have legal values.</LI>

<LI>Any other information from the schemata is used for semantic
interpretation
of the label.</LI>
</OL>

<P>This syntax has been chosen because it is simple to parse, provides
a straightforward correspondence between the model and the syntactic
form of the data, is brief (good &quot;over-the-wire&quot;
characteristics), and (by not being too verbose) is easy for humans to
read and write. A BNF definition of the overall syntactic structure is
given below (despite the fact that BNF&nbsp;lends itself rather poorly
to describing s-expressions):</P>

<TABLE CELLSPACING=0 CELLPADDING=4 >
<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>Manifest</I></B></TD>

<TD>::</TD>

<TD>'<TT>(</TT>' <I>Version Label*</I> '<TT>)</TT>'</TD>

<TD></TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>Version</I></B></TD>

<TD>::</TD>

<TD><I>Symbol</I></TD>

<TD>Possibly the version symbol in this version is
<TT>pics-2.0</TT></TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>Label</I></B></TD>

<TD>::</TD>

<TD>'<TT>(</TT>' '<TT>label</TT>' <I>Attribute</I>* '<TT>)</TT>'</TD>

<TD></TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>Attribute</I></B></TD>

<TD>::</TD>

<TD><I>AttributeName Value</I></TD>

<TD></TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>AttributeName</I></B></TD>

<TD>::</TD>

<TD><I>Symbol</I> | <I>URL</I></TD>

<TD></TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>Value</I></B></TD>

<TD>::</TD>

<TD><I>Atom</I> | <I>List</I> | <I>Label</I></TD>

<TD></TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>Atom</I></B></TD>

<TD>::</TD>

<TD><I>String</I> | <I>Symbol</I> | <I>Number</I> | <I>Range</I> |
<I>Boolean</I></TD>

<TD></TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>List</I></B></TD>

<TD>::</TD>

<TD>'<TT>(</TT>' '<TT>list</TT>' <I>Value*</I> '<TT>)</TT>'</TD>

<TD></TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>Range</I></B></TD>

<TD>::</TD>

<TD>'<TT>(</TT>' '<TT>range</TT>' <I>Number Number</I> '<TT>)</TT>'</TD>

<TD><B>Note: </B>is this really a general thing or a content rating
thing?</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>Boolean</I></B></TD>

<TD>::</TD>

<TD>'<TT>true</TT>' | '<TT>false</TT>'</TD>

<TD></TD>
</TR>
</TABLE>

<P>Here are definitions for the &quot;literal&quot; entities of the
syntax:&nbsp;</P>

<TABLE CELLSPACING=0 CELLPADDING=4 >
<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>Symbol</I></B></TD>

<TD>any sequence of characters not containing whitespace nor any of the
following characters: <TT>( ) &quot;</TT></TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>String</I></B></TD>

<TD>defined as <B><I>quotedname</I> </B>in the <A
HREF="http://www.w3.org/pub/WWW/TR/REC-PICS-labels.html">PICS
1.1 specification</A>. Basically anything limited by doublequotes.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>Number</I></B></TD>

<TD>defined as [ '<TT>+</TT>' | '<TT>-</TT>' ] <I>DigitCharacter</I>* [
'<TT>.</TT>' <I>DigitCharacter</I>+ ] where <I>DigitCharacter</I> is any
of the characters '<TT>0</TT>'...'<TT>9</TT>'.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>URL</I></B></TD>

<TD>similar to <I>String</I>, but with contents identifying a Uniform
Resource
Locator, as defined in the <A
HREF="http://www.w3.org/pub/WWW/TR/REC-PICS-services-961031.html">PICS
Rating Services and Rating Systems</A> [Miller 96] and <A
HREF="http://ds.internic.net/rfc/rfc1738.txt">RFC
1738</A> [Berners-Lee 94].</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><B><I>ISODate</I></B></TD>

<TD>a <I>String</I>, representing a date but restricted from the
ISO&nbsp;standard,
as described by the <A
HREF="http://www.w3.org/pub/WWW/TR/REC-PICS-labels.html">PICS
1.1 specification</A>.</TD>
</TR>
</TABLE>
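
<P>To give a feel for the first parsing step (structure only, no schema
knowledge), here is a minimal Python sketch of a parser for the grammar
above. It is an illustration under stated assumptions rather than a
reference implementation: strings are taken to be anything between
doublequotes with no escape handling, numbers must contain at least one
digit before any decimal point, labels are returned as dictionaries, and
all function names are hypothetical.</P>

```python
import re

# One token: a parenthesis, a double-quoted string, or a bare symbol/number.
TOKEN = re.compile(r'\s*(\(|\)|"[^"]*"|[^\s()"]+)')
NUMBER = re.compile(r'[+-]?\d+(\.\d+)?')


class Symbol(str):
    """A symbol, kept distinct from a quoted string."""


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos != len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_atom(token):
    if token.startswith('"'):
        return token[1:-1]                       # String (URLs and dates look like strings)
    if token in ("true", "false"):
        return token == "true"                   # Boolean
    if NUMBER.fullmatch(token):
        return float(token) if "." in token else int(token)
    return Symbol(token.lower())                 # symbols are case-insensitive


def parse_value(tokens, i):
    """Parse one Value starting at tokens[i]; return (value, next index)."""
    token = tokens[i]
    if token == ")":
        raise SyntaxError("value expected")
    if token != "(":
        return parse_atom(token), i + 1
    head = tokens[i + 1]
    if head == "label":
        return parse_label(tokens, i)
    if head == "list":
        items, i = [], i + 2
        while tokens[i] != ")":
            value, i = parse_value(tokens, i)
            items.append(value)
        return items, i + 1
    if head == "range":
        low, high = tokens[i + 2], tokens[i + 3]
        if not (NUMBER.fullmatch(low) and NUMBER.fullmatch(high)) or tokens[i + 4] != ")":
            raise SyntaxError("range takes exactly two numbers")
        return (parse_atom(low), parse_atom(high)), i + 5
    raise SyntaxError("expected label, list or range, got %r" % head)


def parse_label(tokens, i):
    if tokens[i:i + 2] != ["(", "label"]:
        raise SyntaxError("label expected")
    attrs, i = {}, i + 2
    while tokens[i] != ")":
        name = tokens[i]
        if name == "(":
            raise SyntaxError("attribute name expected")
        # An attribute name is a symbol or a (quoted) URL.
        name = name[1:-1] if name.startswith('"') else name.lower()
        attrs[name], i = parse_value(tokens, i + 1)
    return attrs, i + 1


def parse_manifest(text):
    """Parse '(' Version Label* ')' and return (version, list of labels)."""
    tokens = tokenize(text)
    try:
        if tokens[0] != "(" or tokens[1] in ("(", ")"):
            raise SyntaxError("manifest must start with '(' and a version symbol")
        version, labels, i = tokens[1], [], 2
        while tokens[i] != ")":
            label, i = parse_label(tokens, i)
            labels.append(label)
    except IndexError:
        raise SyntaxError("unexpected end of input") from None
    if i != len(tokens) - 1:
        raise SyntaxError("trailing input after manifest")
    return version, labels


EXAMPLE = '''(pics-2.0
  (label *schema "http://www.w3.org/authors-and-stuff"
         *for "http://www.w3.org/People/Lassila/"
         author "Ora Lassila"))'''

version, labels = parse_manifest(EXAMPLE)
print(version, labels)
```

<P>Steps 2 and 3 of the parsing process (checking values against the
schemata and interpreting them) are deliberately left out, since they
depend on a schema formalism that this document does not define.</P>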

<H3>5.2. XML Syntax</H3>

<P>Another possible approach to metadata syntax is to use XML (the
Extensible
Markup Language). This language is attractive because of its political
appeal and the fact that it may find other uses in the Internet arena.
The full definition of an XML&nbsp;syntax for PICS-NG will be included
in a future version of this document. See <A
HREF="#appendix-xml">Appendix
A</A> for a discussion of PICS-NG compared with Microsoft's XML
Collections
proposal.</P>

<P>Section 6.2 illustrates a proposed XML syntax [contributed by Andrew
Layman, <A
HREF="mailto:andrewl@microsoft.com">andrewl@microsoft.com</A>].
This recommendation relies on several <I>proposed</I> features of XML
which
are described in a separate document:</P>


<DL>
<DT>Structured attributes</DT>

<DD>Tags beginning &quot;&lt;*&quot; identify attributes (in the SGML
sense)
of the enclosing element, not content.
E.g. &lt;x a=&quot;b&quot;&gt;&lt;/x&gt; is the same as
&lt;x&gt;&lt;*a&gt;b&lt;/a&gt;&lt;/x&gt;.
With the second form, b can contain XML tags.</DD>

<DT>Namespaces</DT>

<DD>Element names can optionally be qualified by the name of the
defining
schema. Any element can have an <I>xml-schema</I> attribute, which
introduces the schema, gives it a shortname, and makes the schema usable
within the element.</DD>

<DT>Local namespaces</DT>

<DD>Elements set the namespace for their contents. The default namespace
within any element (the one that can be used without qualification) is
the namespace in which the element is defined.</DD>

<DT>Short end tags</DT>

<DD>Element names can be omitted from closing tags. E.g.
&lt;x&gt;b&lt;/x&gt;
is the same as &lt;x&gt;b&lt;/&gt;.</DD>

<DT>Empty elements</DT>

<DD>A bodiless element is the same as an empty element. E.g.
&lt;foo/&gt;
is the same as &lt;foo&gt;&lt;/foo&gt;.</DD>
</DL>
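
<P>As a rough illustration of the short end tag proposal, the following
Python sketch rewrites nameless end tags (&lt;/&gt;) as explicit ones by
tracking open elements on a stack. The function name
<TT>expand_short_end_tags</TT> and the regular expression are assumptions
made for this example and are not part of any proposal; the sketch ignores
comments, processing instructions and unquoted attribute values, and it
fails on unbalanced input.</P>

```python
import re

# Matches a start tag, an end tag (possibly nameless), or an empty-element tag.
TAG = re.compile(r"<(/?)(\*?[^\s<>/]*)([^<>]*?)(/?)>")


def expand_short_end_tags(text):
    """Rewrite nameless end tags '</>' as explicit '</name>' end tags."""
    stack = []  # names of the currently open elements, innermost last
    out = []
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])  # copy the text between tags
        pos = m.end()
        closing, name, _rest, empty = m.groups()
        if closing:
            open_name = stack.pop()
            # Keep an explicit name if one was given, else use the open element.
            out.append("</" + (name or open_name) + ">")
        else:
            if not empty:
                # A structured attribute '<*a>' is closed by '</a>'.
                stack.append(name.lstrip("*"))
            out.append(m.group(0))
    out.append(text[pos:])
    return "".join(out)
```

<P>Under these assumptions, the input &lt;x&gt;&lt;*a&gt;b&lt;/&gt;&lt;/&gt;
is rewritten as &lt;x&gt;&lt;*a&gt;b&lt;/a&gt;&lt;/x&gt;, and empty elements
such as &lt;foo/&gt; are left untouched.</P>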

<P>With the above syntax proposals, the XML encoding can be nearly a
transliteration
of the s-expression encoding. The suggestion has been made to eliminate
some of the special tokens (e.g. &quot;<TT>label</TT>&quot;) and use a
&quot;<TT>reference=</TT>&quot; attribute on <TT>for</TT> rather than
three
separate <TT>for</TT>, <TT>for-immediate</TT>, and <TT>for-indirect</TT>
tokens. These ideas are illustrated in section 6.2.</P>

<H2><A NAME="examples"></A>6. Examples</H2>

<P>The following examples show how PICS-NG&nbsp;is used to make certain
kinds of statements. Some of the examples are drawn from the
PICS&nbsp;1.1
specification.</P>

<P><I>[Note: these examples are slanted toward content filtering; we will
rewrite
them in a future draft to show other uses.]</I></P>

<H3><B>6.1 Examples in S-expression Encoding</B></H3>

<P>Some statement about a single document (URL):</P>

<UL>

<PRE>(pics-2.0
  (label *schema &quot;http://www.w3.org/authors-and-stuff&quot;
         *for &quot;http://www.w3.org/People/Lassila/&quot;
         author &quot;Ora Lassila&quot;))</PRE>
</UL>
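
<P>To make the structure of such a label concrete, here is a minimal
Python sketch of a reader for this s-expression encoding. It is only an
illustration under stated assumptions: the names <TT>parse</TT> and
<TT>label_to_dict</TT> are invented for this example, the input is the
s-expression text with HTML entities already decoded, strings contain no
escaped quotes, and a label is taken to be a flat sequence of
attribute/value pairs.</P>

```python
import re

# Tokens: parentheses, double-quoted strings, or bare symbols/numbers.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(text):
    """Parse one s-expression into nested Python lists, strings and numbers."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing parenthesis
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # quoted string with the quotes removed
        try:
            return float(tok)  # numeric value such as 0.5
        except ValueError:
            return tok  # bare symbol such as label or *for

    return read()


def label_to_dict(label):
    """Turn ['label', name, value, name, value, ...] into a dict."""
    assert label[0] == "label"
    return dict(zip(label[1::2], label[2::2]))


doc = parse('(pics-2.0 (label *schema "http://www.w3.org/authors-and-stuff" '
            '*for "http://www.w3.org/People/Lassila/" author "Ora Lassila"))')
first_label = label_to_dict(doc[1])
```

<P>For the example above, <TT>first_label</TT> maps <TT>*schema</TT>,
<TT>*for</TT> and <TT>author</TT> to their string values.</P>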

<P>Some statement about two documents:</P>

<UL>
<PRE>(pics-2.0
  (label *schema &quot;http://www.w3.org/authors-and-stuff&quot;
         author &quot;Ora Lassila&quot;
         *for (list &quot;http://www.w3.org/People/Lassila/&quot;
                    &quot;http://www.w3.org/PICS/draft-lassila-pics-ng-metadata.html&quot;)))</PRE>
</UL>

<P>Here are some of the examples from the <A
HREF="http://www.w3.org/pub/WWW/TR/REC-PICS-labels.html">PICS
1.1 document</A>, modified for the new syntax:</P>

<UL>
<PRE>(pics-2.0
  (label *schema &quot;http://www.gcf.org/v2.5&quot;
         by &quot;John Doe&quot;
         *for-indirect (list (label *for &quot;http://w3.org/PICS/Overview.html&quot;
                                    on &quot;1994.11.05T08:15-0500&quot;
                                    until &quot;1995.12.31T23:59-0000&quot;
                                    suds 0.5
                                    density 0
                                    color/hue 1)
                             (label *for &quot;http://w3.org/PICS/Underview.html&quot;
                                    by &quot;Jane Doe&quot;
                                    subject 2
                                    density 1
                                    color/hue 1))))</PRE>
</UL>

<UL>
<PRE>(pics-2.0
  (label *schema &quot;http://www.gcf.org/v2.5&quot;
         *for-indirect (list (label suds 0.5 density 0 color/hue 1)
                             (label subject 2 density 1 color/hue 1))))</PRE>
</UL>

<P>A PICS label rating a statement about a URL (that is, the ratings
apply
to the statement, not the document):</P>

<UL>
<PRE>(pics-2.0
  (label *schema &quot;http://www.gcf.org/v.2.5&quot;
         *for-immediate (label *schema &quot;http://www.w3.org/authors-and-stuff&quot;
                               *for &quot;http://www.w3.org/soap.html&quot;
                               author &quot;Ora Lassila&quot;)
         suds 1
         density 0
         color/hue 0))</PRE>
</UL>

<P>The same example as above, except that label naming is used instead
of an explicit containment hierarchy:</P>

<UL>
<PRE>(pics-2.0
  (label *schema &quot;http://www.w3.org/authors-and-stuff&quot;
         *id foo
         *for &quot;http://www.w3.org/soap.html&quot;
         author &quot;Ora Lassila&quot;)
  (label *schema &quot;http://www.gcf.org/v.2.5&quot;
         *for foo
         suds 1
         density 0
         color/hue 0))</PRE>
</UL>

<P>A label making use of multiple values and metadata attached to
attribute
values:</P>

<UL>
<PRE>(pics-2.0
  (label *schema &quot;http://purl.org/Schemas/description1&quot;
         *for &quot;http://purl.color.org/document.html&quot;
         title &quot;Light and Dark: A study of color&quot;
         subject (label *schema &quot;http://purl.org/Schemas/LCSH&quot;
                        *for-immediate &quot;Color and Color Palettes&quot;)
         author (list (label *schema &quot;http://www.foo.com/author&quot;
                             name &quot;John Smith&quot;
                             affiliation &quot;thedarkside&quot;
                             email &quot;john@thedarkside&quot;)
                      (label *schema &quot;http://www.foo.com/author&quot;
                             name &quot;Smith, Jane Q.&quot;
                             affiliation &quot;thelightregion&quot;
                             email &quot;jane@thelightregion&quot;))))</PRE>
</UL>

<P>An example demonstrating how common data can be shared by several
labels:</P>

<UL>
<PRE>(pics-2.0
  (label *schema &quot;http://www.gcf.org/v.2.5&quot;
         *for-indirect (label *schema &quot;http://purl.org/Schemas/description1&quot;
                              author &quot;Ora Lassila&quot;
                              subject (label *schema &quot;http://purl.org/Schemas/LCSH&quot;
                                             *for-immediate &quot;Color and Color Palettes&quot;)
                              *for (list (label *for &quot;http://www.w3.org/foo&quot;
                                                author &quot;Ralph Swick&quot;
                                                title &quot;Fundamentals of Foos&quot;)
                                         (label *for &quot;http://www.w3.org/bar&quot;
                                                title &quot;Fundamentals of Bars&quot;)
                                         (label *for &quot;http://www.w3.org/foobar&quot;
                                                title &quot;Foos vs. Bars&quot;)))
         by &quot;Jim Miller&quot;
         suds 1.0
         density 0.5
         hue/color 0.0))</PRE>
</UL>

<P>The labels of the previous example written out so that each of them
stands alone (i.e. no sharing of fragments of metadata):</P>

<UL>
<PRE>(pics-2.0
  (label *schema &quot;http://www.gcf.org/v.2.5&quot;
         *for (label *schema &quot;http://purl.org/Schemas/description1&quot;
                     *for &quot;http://www.w3.org/foo&quot;
                     subject (label *schema &quot;http://purl.org/Schemas/LCSH&quot;
                                    *for-immediate &quot;Color and Color Palettes&quot;)
                     author &quot;Ralph Swick&quot;
                     title &quot;Fundamentals of Foos&quot;)

         by &quot;Jim Miller&quot;
         suds 1.0
         density 0.5
         hue/color 0.0)
  (label *schema &quot;http://www.gcf.org/v.2.5&quot;
         *for (label *schema &quot;http://purl.org/Schemas/description1&quot;
                     *for &quot;http://www.w3.org/bar&quot;
                     subject (label *schema &quot;http://purl.org/Schemas/LCSH&quot;
                                    *for-immediate &quot;Color and Color Palettes&quot;)
                     author &quot;Ora Lassila&quot;
                     title &quot;Fundamentals of Bars&quot;)
         by &quot;Jim Miller&quot;
         suds 1.0
         density 0.5
         hue/color 0.0)
  (label *schema &quot;http://www.gcf.org/v.2.5&quot;
         *for (label *schema &quot;http://purl.org/Schemas/description1&quot;
                     *for &quot;http://www.w3.org/foobar&quot;
                     subject (label *schema &quot;http://purl.org/Schemas/LCSH&quot;
                                    *for-immediate &quot;Color and Color Palettes&quot;)
                     author &quot;Ora Lassila&quot;
                     title &quot;Foos vs. Bars&quot;)
         by &quot;Jim Miller&quot;
         suds 1.0
         density 0.5
         hue/color 0.0))</PRE>
</UL>
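
<P>The relationship between the shared form and the stand-alone form can
be sketched in Python as follows. This is an illustration only: labels are
modelled as plain dictionaries, the function name <TT>expand_shared</TT> is
invented here, and the rule that a value given on a child label takes
precedence over the inherited one is an assumption inferred from the
examples (the label for <TT>http://www.w3.org/foo</TT> keeps its own
author).</P>

```python
def expand_shared(parent):
    """Return one stand-alone dict per child listed under the parent's '*for'."""
    # Everything except the list of children is shared metadata.
    shared = {k: v for k, v in parent.items() if k != "*for"}
    standalone = []
    for child in parent["*for"]:
        merged = dict(shared)  # start from the inherited values
        merged.update(child)   # the child's own values take precedence
        standalone.append(merged)
    return standalone


description = {
    "*schema": "http://purl.org/Schemas/description1",
    "author": "Ora Lassila",
    "*for": [
        {"*for": "http://www.w3.org/foo",
         "author": "Ralph Swick",
         "title": "Fundamentals of Foos"},
        {"*for": "http://www.w3.org/bar",
         "title": "Fundamentals of Bars"},
        {"*for": "http://www.w3.org/foobar",
         "title": "Foos vs. Bars"},
    ],
}
labels = expand_shared(description)
```

<P>Each resulting dictionary carries the schema and an author, so it can be
read without reference to the enclosing label.</P>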

<H3><B><FONT SIZE=+1>6.2 Examples in XML Encoding</FONT></B></H3>

<P>[This section contributed by Andrew Layman, &lt;<A
HREF="mailto:andrewl@microsoft.com">andrewl@microsoft.com</A>&gt;
with some minor editing by Ralph Swick. The examples are equivalent to
those in section 6.1. As stated above in section 5.2, these examples rely
on several <I>proposed</I> features of XML which are described
elsewhere.]</P>

<P><I>Andrew's comments on the examples:</I></P>

<P>In the main, names and other characteristics of section 6.1 are used
here to make comparison with the s-expression syntax easier, since our
main goal is to verify that XML is able to express the same statements
as s-expressions can.</P>

<P>Schema shortnames are illustrated in these examples. The shortnames
are chosen according to the Java conventions for package names. It is
overkill
for these examples, but shows how one can absolutely avoid any
possibility
of name conflicts, even as schemas evolve. </P>

<P>The s-expression examples use an element called &quot;label.&quot;
Obviously, a <I>label</I> is meant to be the root type of all elements:
Devoid of any particular properties or attributes, it can be subclassed
to become anything, with subclassing effected by the &quot;*schema&quot;
attribute. That is, all labels are really particular kinds of things,
identified
by their &quot;*schema&quot; attribute. Each schema evidently describes
one kind of object. In contrast, in the XML proposal, all elements are
explicitly of some particular type, drawn from the namespace of an
xml-schema
attribute of a parent element. For instance, the first s-expression
example
has a label of type &quot;http://www.w3.org/authors-and-stuff&quot; (in
the s-expression model, element types are URIs). The first XML example
introduces an &quot;http://www.w3.org&quot; schema, then draws from it
a particular element type, &quot;authors-and-stuff&quot;. (This really
should be a name meaning &quot;thing with author and other
attributes&quot;
but I have not changed the names in these examples.) </P>
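
<P>The shortname convention used in the examples below can be sketched in
Python as follows. The helper name <TT>schema_shortname</TT> is invented
for this illustration, and deriving the shortname from the host part of the
schema URL alone is an assumption based on the examples, not a rule stated
by the proposal.</P>

```python
from urllib.parse import urlsplit


def schema_shortname(schema_url):
    """Reverse the host name of a schema URL, Java package style."""
    host = urlsplit(schema_url).hostname  # e.g. "www.w3.org"
    return ".".join(reversed(host.split(".")))


w3_shortname = schema_shortname("http://www.w3.org")
```

<P>With this helper, <TT>http://www.w3.org</TT> yields
<TT>org.w3.www</TT> and <TT>http://purl.org/Schemas</TT> yields
<TT>org.purl</TT>, matching the shortnames used in the examples.</P>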

<P>Some statement about a single document (URL): </P>

<PRE>&lt;*xml-schema ref=&quot;http://www.w3.org&quot; /&gt;
&lt;authors-and-stuff
    for=&quot;http://www.w3.org/People/Lassila&quot;
    author=&quot;Ora Lassila&quot; /&gt;</PRE>

<P>Some statement about two documents: </P>

<PRE>&lt;*xml-schema ref=&quot;http://www.w3.org&quot;
as=&quot;org.w3.www&quot; /&gt;
&lt;org.w3.www.authors-and-stuff 
    author=&quot;Ora Lassila&quot; &gt;
    &lt;*for&gt;  &lt;thing&gt;http://www.w3.org/People/Lassila&lt;/&gt;
            &lt;thing&gt;http://www.w3.org/pics/draft-lassila-pics-ng-metadata.html&lt;/&gt;
            &lt;/for&gt;
    &lt;/authors-and-stuff&gt;</PRE>

<P>Here are some examples from the PICS 1.1 Document modified for the
new
syntax: </P>

<PRE>&lt;*xml-schema ref=&quot;http://www.gcf.org&quot;
as=&quot;org.gcf.www&quot; /&gt;
&lt;org.gcf.www.v2:5       
    by=&quot;John Doe&quot; &gt;
    &lt;*for reference=&quot;indirect&quot;&gt;
        &lt;thing
            for=&quot;http://w3.org/PICS/Overview.html&quot;
            on=&quot;1994.11.05T08:15-0500&quot;
            until=&quot;1995.12.31T23:59-0000&quot;
            suds=&quot;0.5&quot;
            density=&quot;0&quot;
            color-hue=&quot;1&quot; /&gt;
        &lt;thing
            for=&quot;http://w3.org/PICS/Underview.html&quot;
            by=&quot;Jane Doe&quot;
            subject=&quot;2&quot;
            density=&quot;1&quot;
            color-hue=&quot;1&quot; /&gt;
        &lt;/for&gt; &lt;/v2:5&gt;</PRE>

<PRE>&lt;org.gcf.www.v2:5&gt;      
    &lt;*for reference=&quot;indirect&quot;&gt;
        &lt;thing&gt; &lt;v2:5   suds=&quot;0.5&quot;
density=&quot;0&quot; color-hue=&quot;1&quot; /&gt; &lt;/thing&gt;
        &lt;thing&gt; &lt;v2:5   subject=&quot;2&quot;
density=&quot;1&quot; color-hue=&quot;1&quot; /&gt; &lt;/thing&gt;
        &lt;/for&gt; &lt;/v2:5&gt;</PRE>

<P>A PICS label rating a statement about a URL (that is, the ratings
apply
to the statement, not the referenced document): </P>

<PRE>&lt;*xml-schema ref=&quot;http://www.gcf.org&quot;
as=&quot;org.gcf.www&quot; /&gt;
&lt;org.gcf.www.v2:5&gt;
    &lt;*for reference=&quot;immediate&quot;&gt;
        &lt;*xml-schema ref=&quot;http://www.w3.org&quot;
as=&quot;org.w3.www&quot; /&gt;
        &lt;org.w3.www.authors-and-stuff
            for=&quot;http://www.w3.org/soap.html&quot;
            author=&quot;Ora Lassila&quot; /&gt; &lt;/for&gt; 
    &lt;*suds&gt;1&lt;/&gt;
    &lt;*density&gt;0&lt;/&gt;
    &lt;*color hue=&quot;0&quot; /&gt;  &lt;/v2:5&gt;
</PRE>

<P>The same example as above, except that label naming is used instead
of containment hierarchy: </P>

<PRE>&lt;*xml-schema ref=&quot;http://www.gcf.org&quot;
as=&quot;org.gcf.www&quot;/&gt;
&lt;*xml-schema ref=&quot;http://www.w3.org&quot;
as=&quot;org.w3.www&quot;/&gt;
&lt;org.w3.www.authors-and-stuff
    id=&quot;foo&quot;
    for=&quot;http://www.w3.org/soap.html&quot;
    author=&quot;Ora Lassila&quot; /&gt; 
&lt;org.gcf.www.v2:5 &gt;
    &lt;*for&gt;#foo&lt;/&gt;
    &lt;*suds&gt;1&lt;/&gt;
    &lt;*density&gt;0&lt;/&gt;
    &lt;*color hue=&quot;0&quot; /&gt; &lt;/v2:5&gt; 
</PRE>

<P>A label making use of multiple values and metadata attached to
attribute
values: </P>

<PRE>&lt;*xml-schema ref=&quot;http://purl.org/Schemas&quot;
as=&quot;org.purl&quot; /&gt;
&lt;org.purl.description1
    for=&quot;http://purl.color.org/document.html&quot;
    title=&quot;Light and Dark: A study of color&quot; &gt;
    &lt;*subject&gt;
        &lt;lcsh&gt;  &lt;*for reference=&quot;immediate&quot;&gt;Color
and Color Palettes&lt;/&gt;&lt;/&gt;
        &lt;/subject&gt;
    &lt;*author&gt;
        &lt;*xml-schema ref=&quot;http://www.foo.com&quot;
as=&quot;com.foo.www&quot; /&gt;
        &lt;com.foo.www.author
            name=&quot;John Smith&quot;
            affiliation=&quot;thedarkside&quot;
            email=&quot;john@thedarkside&quot; /&gt;
        &lt;com.foo.www.author
            name=&quot;Smith, Jane Q.&quot;
            affiliation=&quot;thelightregion&quot;
            email=&quot;jane@thelightregion&quot; /&gt;
        &lt;/author&gt;
&lt;/description1&gt;
</PRE>

<P>An example demonstrating how common data can be shared by several
labels.
(Note: Evidently in this metadata application, attributes of a parent
are
attributed to each child. Such behavior is probably reasonable for this
example and the particular attributes used in it, but would need to be
controlled carefully in applications using either default values or
subclassing.)
</P>

<PRE>&lt;*xml-schema ref=&quot;http://www.gcf.org&quot;
as=&quot;org.gcf.www&quot; /&gt;
&lt;org.gcf.www.v2:5&gt;
    &lt;*for reference=&quot;indirect&quot;&gt;
    &lt;*xml-schema ref=&quot;http://purl.org/Schemas&quot;
as=&quot;org.purl&quot; /&gt;
    &lt;org.purl.description1
        author=&quot;Ora Lassila&quot;&gt;
        &lt;*subject&gt; 
            &lt;lcsh&gt; &lt;*for
reference=&quot;immediate&quot;&gt;Color and Color Palettes&lt;/&gt;
                   &lt;/lcsh&gt;&lt;/subject&gt;
        &lt;*for&gt;
            &lt;description1
                for =&quot;http://www.w3.org/foo&quot;
                author=&quot;Ralph Swick&quot;
                title=&quot;Fundamentals of Foos&quot; /&gt;
            &lt;description1
                for =&quot;http://www.w3.org/bar&quot;
                title=&quot;Fundamentals of Bars&quot; /&gt;
            &lt;description1
                for =&quot;http://www.w3.org/foobar&quot;
                title=&quot;Foos vs. Bars&quot; /&gt; &lt;/for&gt;
        &lt;/description1&gt; &lt;/for&gt;
    &lt;*by&gt;Jim Miller&lt;/&gt;
    &lt;*suds&gt;1.0&lt;/&gt;
    &lt;*density&gt;0.5&lt;/&gt;
    &lt;*color hue=&quot;0&quot; /&gt; &lt;/v2:5&gt; </PRE>

<P>The labels of the preceding example written out so that each of them
stands alone (i.e. no sharing of fragments of metadata): </P>

<PRE>&lt;*xml-schema ref=&quot;http://www.gcf.org&quot;
as=&quot;org.gcf.www&quot; /&gt;
&lt;org.gcf.www.v2:5&gt;
    &lt;*for&gt;
        &lt;*xml-schema ref=&quot;http://purl.org/Schemas&quot;
as=&quot;org.purl&quot; /&gt;
        &lt;org.purl.description1
            for=&quot;http://www.w3.org/foo&quot; &gt;
            &lt;*subject&gt; 
                &lt;lcsh&gt; &lt;*for
reference=&quot;immediate&quot;&gt;Color and Color Palettes
                       &lt;/for&gt;
                &lt;/lcsh&gt;&lt;/subject&gt;
            &lt;*author&gt;Ralph Swick&lt;/&gt;
            &lt;*title&gt;Fundamentals of Foos&lt;/&gt;
            &lt;/description1&gt; &lt;/for&gt;
    &lt;*by&gt;Jim Miller&lt;/&gt;
    &lt;*suds&gt;1.0&lt;/&gt;
    &lt;*density&gt;0.5&lt;/&gt;
    &lt;*color hue=&quot;0&quot; /&gt; &lt;/v2:5&gt;
&lt;org.gcf.www.v2:5&gt;
    &lt;*for&gt;
        &lt;*xml-schema ref=&quot;http://purl.org/Schemas&quot;
as=&quot;org.purl&quot; /&gt;
        &lt;org.purl.description1
            for=&quot;http://www.w3.org/bar&quot; &gt;
            &lt;*subject&gt; 
                &lt;lcsh&gt; &lt;*for
reference=&quot;immediate&quot;&gt;Color and Color Palettes
                       &lt;/for&gt;
                &lt;/lcsh&gt;&lt;/subject&gt;
            &lt;*author&gt;Ora Lassila&lt;/&gt;
            &lt;*title&gt;Fundamentals of Bars&lt;/&gt;
&lt;/description1&gt; &lt;/for&gt;
    &lt;*by&gt;Jim Miller&lt;/&gt;
    &lt;*suds&gt;1.0&lt;/&gt;
    &lt;*density&gt;0.5&lt;/&gt;
    &lt;*color hue=&quot;0&quot; /&gt;
&lt;/v2:5&gt;&lt;org.gcf.www.v2:5&gt;
    &lt;*for&gt;
        &lt;*xml-schema ref=&quot;http://purl.org/Schemas&quot;
as=&quot;org.purl&quot; /&gt;
        &lt;org.purl.description1
            for=&quot;http://www.w3.org/foobar&quot; &gt;
            &lt;*subject&gt; 
                &lt;lcsh&gt; &lt;*for
reference=&quot;immediate&quot;&gt;Color and Color Palettes
                       &lt;/for&gt;
                &lt;/lcsh&gt;&lt;/subject&gt;
            &lt;*author&gt;Ora Lassila&lt;/&gt;
            &lt;*title&gt;Foos vs. Bars&lt;/&gt; &lt;/description1&gt;
&lt;/for&gt;
    &lt;*by&gt;Jim Miller&lt;/&gt;
    &lt;*suds&gt;1.0&lt;/&gt;
    &lt;*density&gt;0.5&lt;/&gt;
    &lt;*color hue=&quot;0&quot; /&gt; &lt;/v2:5&gt;</PRE>

<H2><A NAME="issues"></A>7. Open Issues</H2>

<H3>7.1. Multiple Values, Attribute Order, etc.</H3>

<P>As specified, attribute order is not significant, but value order
(for
multiple values) is. Some syntactic approaches to multiple values may
allow
the same attribute to be specified multiple times (see, for example, the
<A
HREF="http://www.w3.org/pub/WWW/Member/9703/XMLsubmit.html">XML&nbsp;Collections
proposal</A> [Hopmann 97]). In this case the relative order of the
occurrences of the <I>same</I> attribute
is significant.</P>
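
<P>The ordering rule can be illustrated with a short Python sketch. It
assumes a label is given as a sequence of (attribute, value) pairs in which
an attribute may be repeated; the names <TT>normalize</TT> and
<TT>same_label</TT> are invented for this example and do not come from
either proposal.</P>

```python
from collections import defaultdict


def normalize(pairs):
    """Group values by attribute name, keeping the order of repeated values."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)  # order within one attribute is preserved
    return dict(grouped)


def same_label(a, b):
    """Compare two labels: attribute order is ignored, value order is not."""
    return normalize(a) == normalize(b)


reordered_attributes = same_label(
    [("author", "Ora Lassila"), ("title", "Foos vs. Bars")],
    [("title", "Foos vs. Bars"), ("author", "Ora Lassila")],
)
reordered_values = same_label(
    [("author", "John Smith"), ("author", "Smith, Jane Q.")],
    [("author", "Smith, Jane Q."), ("author", "John Smith")],
)
```

<P>Here <TT>reordered_attributes</TT> is true, because only the order of
different attributes changed, while <TT>reordered_values</TT> is false,
because the order of two values of the same attribute changed.</P>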

<P>To allow for conjunctive as well as disjunctive sets of multiple
values,
the sequence operator &quot;<TT>list</TT>&quot; may in the future be
replaced
by the operators &quot;<TT>and</TT>&quot; and &quot;<TT>or</TT>&quot;.
The actual ramifications of this to the model and possible
implementations
are at this point unclear.</P>

<H3>7.2. Inheritance</H3>

<P>Inheritance takes place in a hierarchy of lexically enclosed labels.
Propagating inherited values is simple if one &quot;sees&quot; the entire
hierarchy. From an individual label's standpoint, however, inheritance
works using unidirectional links the label has no knowledge of. This is
confusing: since a label does not know of all the links pointing to it,
it cannot by itself determine the values it inherits (this is the reason
why
we do not allow inheritance over links expressed as URLs).</P>

<P>As currently defined in this document, the multiple schemata
mechanism
does not allow for the use of &quot;mixin&quot; schemata. For flexible
means of extending metadata, a full multiple inheritance mechanism may
be necessary.</P>

<H3>7.3. PICS 1.1 Error Tokens and Extensions</H3>

<P>Error tokens defined by PICS version 1.1 as well as the former
version
of the PICS-NG proposal are not included in this document. There are two
ways to introduce error tokens and other similar constructs: errors
could
be represented by labels (referring to a special error schema, defined
together with the other basic PICS&nbsp;schemata), or by allowing
additional
prefix operators (such as <TT>error</TT>) in addition to the ones
defined
by this document (i.e., <TT>label</TT>, <TT>list</TT> and
<TT>range</TT>).</P>

<P>Since the PICS-NG metadata architecture is easily extensible, the
old
extension mechanism of PICS 1.1 is no longer needed. The multiple
schemata
approach can be used for &quot;optional&quot; extensions; a single new
schema should be used for &quot;mandatory&quot; ones.</P>

<H3>7.4. PICS Ratings vs. PICS Options</H3>

<P>Some people have expressed concerns about the fact that old PICS
options
are now mixed with the transmit names of ratings. Technically this is
<B>not
a problem</B> because we have a way of determining which attributes are
which, but from a metadata author's standpoint this can be confusing. A
possible solution is to put all options into a separate label and make
that label a value of a new attribute (called, say,
&quot;label-attributes&quot;).
The options-schema can be defined in the same definition file as the
basic
content rating schema, and referred to using the fragment identifier
syntax
(say, &quot;#options&quot;). Inheritance of individual label options
becomes
difficult if they are placed in a separate label.</P>

<H3>7.5. Canonical Form of Syntax</H3>

<P>A minimal, canonical form of the syntax used has to be defined, for
purposes of signing PICS-NG labels and for mechanically producing label
representations.</P>

<H2><A NAME="literature"></A>8. Literature</H2>

<P>[Berners-Lee 94] Berners-Lee, Tim et al, 1994. <I>Uniform Resource
Locators
(URL)</I>. RFC 1738, CERN (et al). Available as <A
HREF="http://ds.internic.net/rfc/rfc1738.txt">http://ds.internic.net/rfc/rfc1738.txt</A>.</P>

<P>[Hopmann 97] Alex Hopmann et al, 1997. <I>Web Collections using
XML</I>.
Proposal (submitted to W3C), Microsoft Corporation. Available as <A
HREF="http://www.w3.org/pub/WWW/Member/9703/XMLsubmit.html">http://www.w3.org/pub/WWW/Member/9703/XMLsubmit.html</A>.</P>

<P>[Krauskopf 96] Krauskopf, Tim et al, 1996. <I>PICS Label Distribution
Label Syntax and Communication Protocols, Version 1.1</I>. W3C
Recommendation
31-October-96. Available as <A
HREF="http://www.w3.org/pub/WWW/TR/REC-PICS-labels.html">http://www.w3.org/pub/WWW/TR/REC-PICS-labels.html</A>.</P>

<P>[Lassila 97] Lassila, Ora, 1997. <I>PICS-NG Label Syntax
Proposal</I>.
Unpublished working paper, W3C. Available as <A
HREF="http://www.w3.org/pub/WWW/PICS/draft-lassila-pics-ng-label-syntax.html">http://www.w3.org/pub/WWW/PICS/draft-lassila-pics-ng-label-syntax.html</A>.</P>

<P>[Miller 96] Miller, Jim et al, 1996. <I>Rating Services and Rating
Systems
(and Their Machine Readable Descriptions), Version 1.1</I>. W3C
Recommendation
31-October-96. Available as <A
HREF="http://www.w3.org/pub/WWW/TR/REC-PICS-services-961031.html">http://www.w3.org/pub/WWW/TR/REC-PICS-services-961031.html</A>.</P>

<H2><A NAME="appendix-xml"></A>Appendix A: Correspondence to the XML Web
Collection Proposal</H2>

<P>In this section, we compare the above metadata object model to the
model
defined in &quot;<A
HREF="http://www.w3.org/pub/WWW/Member/9703/XMLsubmit.html">Web
Collections using XML</A>&quot; [Hopmann 97]. The text below in the
column
titled &quot;XML&nbsp;model&quot; is quoted directly from section 2.2
&quot;The
Web Collection model.&quot; Commentary also includes information
acquired
in private discussions with Alex Hopmann.</P>

<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=4 >
<TR ALIGN=LEFT VALIGN=TOP>
<TD>
<H3>XML model</H3>
</TD>

<TD>
<H3>Commentary</H3>
</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><FONT SIZE=-1>Web Collections provide a hierarchical structure for
storing properties that describe objects. A collection is simply an
association
of field names to values. The meanings of these field names are defined
by the profile is specified for the given collection. </FONT></TD>

<TD>In this respect the two models are identical. The word
&quot;profile&quot;
is used in the same meaning as our term &quot;schema&quot;, and the word
&quot;property&quot; is used in lieu of &quot;attribute.&quot;</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><FONT SIZE=-1>A collection is not required to contain properties
correlating
to each field in its profile. Similarly, a collection may contain
properties
that do not correspond to any field in its profile. A collection may
also
contain more than one property that correlates to a single field in its
profile.</FONT> </TD>

<TD>Unknown attributes are permitted if an application is not concerned
with the semantics of a label. Lists take the place of multiple
occurrences
of a property.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><FONT SIZE=-1>The order of properties in a collection can be
significant
in specific applications but is not necessarily significant in all
applications.
Likewise, applications will determine the meaning of multivalued
properties,
missing properties, and properties that do not correspond to fields in
the profile; applications may deem a collection invalid if does not
contain
appropriate information. However applications MUST be able to at a
minimum
gracefully ignore additional properties that they do not understand.
</FONT></TD>

<TD>Schemata define all semantics. See the Open Issues section regarding
ordering.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><FONT SIZE=-1>A primary collection must explicitly refer to its
profile.
Secondary collections usually have implied profiles (such as the profile
of the collection which encapsulates them), though they may explicitly
refer to a profile. </FONT></TD>

<TD>The label model has a loosely similar inheritance mechanism. The XML
Collection model does not specify inheritance very clearly.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><FONT SIZE=-1>Web Collections support aggregate profiles. This is
the
ability to specify that a given collection has a properties from a first
profile, and furthermore additional properties from other profiles.
</FONT></TD>

<TD>See the Open Issues section regarding inheritance.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><FONT SIZE=-1>This Web Collection specification draws a sharp line
between the Web Collections syntax and the semantics implied by a
particular
application. A computer program must be able to parse and manipulate the
Web Collection data without understanding the specific application. It
need not however be able to do anything with the data unless it
understands
that specific profile. </FONT></TD>

<TD>Without knowledge of semantics, we do not believe any useful
manipulation
is feasible.</TD>
</TR>

<TR ALIGN=LEFT VALIGN=TOP>
<TD><FONT SIZE=-1>Web Collections draw a distinction between two types
of URIs. This distinction is based on the needs of a syntax parser. A
URI
can be used to point to some other resource (behaving like a link) in
which
case it is just normal data in the collection (a value), or a URI might
be used to include some other resource within the collection (an inline
reference). A Web Collection parser might use this information to
determine
whether to encapsulate additional resources with the Web Collection.
</FONT></TD>

<TD>Inclusion by reference will be a barrier to adoption by firewall
vendors.
This feature should be excluded from the model. However, the label model
allows a referent to be a label. The semantics of a schema determine
whether
the value of an attribute is <I>interpreted</I> as a reference.</TD>
</TR>
</TABLE>

<P>
<HR WIDTH="100%"></P>

<ADDRESS><FONT SIZE=-1>Ora Lassila &lt;<A
HREF="mailto:lassila@w3.org">lassila@w3.org</A>&gt;</FONT></ADDRESS>

<P><FONT SIZE=-1>Revision History:<BR>
05-May-97 [swick] Add W3C logo and doc title at top<BR>
14-May-97 [swick] Add Sections 5.2 and 6.2 from&nbsp;Andrew Layman.
Correct
usage of &quot;for-immediate&quot; in the examples in Section 6.1
(formerly
just Section 6). This version published as &quot;version 3.5&quot; with
a new date code.</FONT></P>

</BODY>
</HTML>
------------------------------------------------------------------------
--------------------------------------------------------------
Received on Friday, 16 May 1997 00:14:19 UTC