- From: dehora <dehora@eircom.net>
- Date: Sat, 25 Aug 2001 03:24:09 +0100
- To: <w3c-rdfcore-wg@w3.org>
[a review of what the M&S says about literals]

see also:
<http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Jul/0434.html>

RDF 1.0 syntax: the XML BNF as per section 6.
RDF 1.0 model: the formal model as per section 5.

Quick overview of what the M&S says about literals:

  Literals MUST be well-formed XML with respect to the RDF 1.0 syntax.
  Some constraints SHALL apply to literals containing XML markup with respect to the RDF 1.0 syntax.
  Such constraints SHOULD NOT apply to other serializations.
  Literals MAY be other than well-formed XML with respect to the RDF 1.0 model.
  The RDF 1.0 model is NOT dependent on XML in its conception of a literal.
  Literals are NOT defined for the RDF model.

(at least that's what I've divined it says...)

Commentary:

My current take is that the RDF 1.0 model is not restricted to using well-formed XML for the inscription of literals. I think the WG should decide if this is the case.

I've been under the happy illusion of late that literals in RDF are structured XML. This seems to be the case only when RDF is serialized according to the RDF 1.0 XML syntax. Section 2.1 says

  "In RDF terms, a literal may have content that is XML markup but is not further evaluated by the RDF processor."

Is the intent to say that a literal may optionally be XML markup (certainly there's more than one parse tree for that sentence)? If you combine such an interpretation with the reference to "string literals" in 2.1.1, the pictorial examples (which do not include XML tags in the literals employed) and the glossary definition of a literal, there's enough to allow the interpretation that literals may be other than well-formed XML, with respect to the RDF model. However, it seems possible to interpret the document either way.

[Note however that the discussion of literals with respect to the model does not offer or recommend a character encoding for literals (that is, literals do not have a canonical form in the M&S).
Looking to the MT for help, I see that it defers to n-triples encoding for the literals. I'm guessing that you can drop any encoding scheme+mapping function to the set of literals LV into the MT and it would not affect what are true statements in RDF (help?).]

In sections 2.2.1 and 6 it is clear that literals MUST be well-formed XML with respect to the BNF: but is this the case for any RDF syntax, or indeed for the model? The M&S doesn't actually say. Contrariwise, it's clear that a URI used in RDF is the same across syntaxes, that is, it's clear that the RDF 1.0 Model depends on the URI syntax. All in all the M&S could be more explicit on literal encoding: I'm honestly not sure what the authors' intent was (help?).

There are some desirable outcomes of not insisting that literals are XML:

- any formal model of RDF will not come to be dependent on XML for the serialization of literals; this is consistent with the notion of having a syntax independent model.
- other serializations can avoid being dependent on XML for literals. For example n-triples 1.8 is not constrained to use well-formed XML for literals.

There is at least one undesirable outcome:

- transcoding literals between syntaxes, such as between n-triples and RDF 1.0 syntax, might have lossy corner cases, and possibly wrt namespaces.

Some proposals follow as a result of all this.

Proposal 1: add a serialization independent definition of a Literal early on in the document.

Forces:
- Literals are not clearly defined for the RDF 1.0 model;
- Literals and their representation in XML are not clearly distinguished in the M&S;

So: edit section 2.1 along the lines of:

" 2.1. Basic RDF Model
[...]
The basic data model consists of four object types:

Resources
  [...]
Properties
  [...]
Literals
  The most primitive value type represented in RDF, typically a string of characters [XXX: 7-bit US-ASCII? as per n-triple/model theory?].
  Literals are distinguished from Resources in that the RDF model does not permit literals to be the subject of a statement. For the XML serialization syntax described in this document, there are syntactic restrictions on how XML markup in literals can be expressed; see Section 2.2.1.
Statements
  A specific resource together with a named property plus the value of that property for that resource is an RDF statement. These three individual parts of a statement are called, respectively, the subject, the predicate, and the object. The object of a statement (the property's value) can be another resource or it can be a literal."

Proposal 2: state clearly that the constraints placed on literals for the serialization syntax SHOULD NOT apply to other or future syntaxes.

Forces:
- prevent dependencies on XML by other syntaxes as well as the RDF formal model or future MT.

So: edit section 2.2.1 along the lines of:

"Section 2.2.1
[...]
Within a propertyElt, the resource attribute specifies that some other resource is the value of this property; that is, the object of the statement is another resource identified by URI rather than a literal. The resource identifier of the object is obtained by resolving the resource attribute URI-reference in the same manner as given above for the about attribute. Strings must be well-formed XML; the usual XML content quoting and escaping mechanisms may be used if the string contains character sequences (e.g. "<" and "&") that violate the well-formedness rules or that otherwise might look like markup. See Section 6 for additional syntax to specify a property value with well-formed XML content containing markup such that the markup is not interpreted as RDF. Note that such syntactic constraints on RDF literals are a result of the use of XML and might not apply to other or future RDF serializations.
[...]"

Proposal 3: parseType attribute values beginning with 'rdf:' are reserved for use by the RDFCore/W3C.
Proposal 3.1: move the 'Literal' and 'Resource' parseTypes to 'rdf:Literal' and 'rdf:Resource' respectively.

Forces:
- parseType is a useful and used extension mechanism for RDF application developers and modelers;
  <http://lists.w3.org/Archives/Public/www-rdf-comments/2001AprJun/0127.html>;
- "The RDF Model and Syntax Working Group acknowledges that the parseType='Literal' mechanism is a minimum-level solution to the requirement to express an RDF statement with a value that has XML markup. Additional complexities of XML such as canonicalization of whitespace are not yet well defined. Future work of the W3C is expected to resolve such issues in a uniform manner for all applications based on XML. Future versions of RDF will inherit this work and may extend it as we gain insight from further application experience.";
- "The parseType attribute changes the interpretation of the element content. The parseType attribute should have one of the values 'Literal' or 'Resource'. The value is case-sensitive. The value 'Literal' specifies that the element content is to be treated as an RDF/XML literal; that is, the content must not be interpreted by an RDF processor. The value 'Resource' specifies that the element content must be treated as if it were the content of a Description element. Other values of parseType are reserved for future specification by RDF. With RDF 1.0 other values must be treated as identical to 'Literal'.";
- the mandate of reserving future parseTypes and the mandate that parseTypes in the wild 'should' be either Literal or Resource are not wholly consistent with each other.

So: amend section 6 BNF and prose to fully reserve parseTypes starting with 'rdf:'. Informatively indicate that alternate parseTypes are a useful extension mechanism. Deprecate 'Literal' and 'Resource' in favor of 'rdf:Literal' and 'rdf:Resource'. Mandate that unrecognized parseTypes are treated as rdf:Literal.

Bill

--
InterX
bdehora@interx.com
dehora@acm.org
+44(0)20-8817-4039
www.interx.com
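
For illustration only, here is a minimal Python sketch of how a parser might apply the parseType rules in Proposals 3 and 3.1 above. The function name classify_parse_type, the shape of its return value, and the example value 'ex:Custom' are assumptions made for this sketch; none of them comes from the M&S or from any WG decision.

```python
# Hypothetical helper: nothing here is defined by the M&S. It only
# restates Proposals 3 and 3.1 as executable rules.

RESERVED_PREFIX = "rdf:"

# RDF 1.0 spellings that Proposal 3.1 would deprecate, mapped to
# their proposed replacements.
DEPRECATED = {"Literal": "rdf:Literal", "Resource": "rdf:Resource"}

RECOGNIZED = frozenset(DEPRECATED.values())


def classify_parse_type(value):
    """Return (treatment, is_deprecated, is_reserved) for a parseType value.

    Matching is case-sensitive, as the M&S requires for parseType.
    """
    is_reserved = value.startswith(RESERVED_PREFIX)
    if value in DEPRECATED:
        return DEPRECATED[value], True, False
    if value in RECOGNIZED:
        return value, False, True
    # Unrecognized values, whether application extensions or
    # not-yet-specified 'rdf:' values, are treated as rdf:Literal.
    return "rdf:Literal", False, is_reserved
```

Under these assumptions, classify_parse_type('Literal') reports the deprecated spelling and maps it to 'rdf:Literal', while an application extension such as 'ex:Custom' falls through to literal treatment without being flagged as reserved.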
Received on Friday, 24 August 2001 22:25:10 UTC