This is an attempt to have a consistent story around SML references. (sml:ref="true"; not including sml:keyrefs.) This covers SML Bugzillas bugs 4658, 4673, 4682, 4683, 4780/4795, 4834, 4865, 4884, 4976.
Some of concepts and suggestions are already implemented in the first public working draft of the specifications. They are included here to make this proposal complete. Resolutions to Bugzilla bugs are marked with the corresponding bug number.
An element information item in an SML instance document is an SML reference if and only if it has an attribute information item whose {name} is "ref" and whose {namespace} is <sml namespace> and whose {normalized value}, after whitespace normalization using "collapse" following schema rules, is either "true" or "1".
Note: this doesn't depend on having a schema, and will not be affected if a schema is used to produce a PSVI.
Rationale: schemas may not always exist. People need to understand SML documents with SML semantics (just like people can process XML documents without a schema; perform XSLT without a schema.)
If element R is a non-null SML reference, then for reference-handling purposes, R is processed as a unit by SML consumers (who know how to resolve such reference; including their implementations of the deref() function). Note that the above only applies to reference-handling. Often, R may also have non-reference-scheme content and/or attributes (collectively: information items) that may be used in other contexts for other purposes, in which case it doesn¡¯t need to be viewed as a unit.
Producers
decide which reference scheme(s) to use; in order to be useful, at least one reference
scheme used by the producer of an SML-IF document should be understood by each
consumer of the document. Each
consumer decides which reference scheme(s) to recognize, and this choice MUST be
consistent between the consumer¡¯s implementation of every
reference scheme the consumer
recognizes mainline and
deref()
components, which may be separate implementations.
Every consumer must follow a set of rules when processing a reference, but there is a minor difference between the processing required of validators and that required more generally of all consumers. This happens when a single reference R contains multiple schemes recognized by the consumer or the recognized scheme(s) resolve(s) to more than one target. Since a particular consumer may only use some of the SML-defined semantics, it may choose to short-circuit the reference resolution process after finding one scheme it recognizes that resolves successfully; a validator, since it is expected to implement *all* the SML-defined semantics, must evaluate all reference schemes that it recognizes. It is perfectly acceptable for a general consumer to follow exactly the same rules required for a validator, e.g. to facilitate code re-use between validator and non-validating consumers.
The general form for handling non-null references is shown below. Note: this assumes that all attempts to resolve a reference will always return 0, 1 or many targets. That is, the attempt never fails (but resolution may fail) and all model documents are always reachable.
1. If the consumer recognizes no scheme used in the reference, then R is unresolved
2. Assume the consumer recognizes R as using N schemes supported by the consumer, then (answers bug [4976]):
2.1. The validator consumer MUST attempt to resolve R using all N schemes, and
2.1.1. If none
of the recognized schemes resolves, then R is unresolved.
2.1.2. If at
least one of the recognized schemes resolves to more than one target element,
then report an errorthe model is invalid.
2.1.3. If one
scheme resolves to a target that¡¯s different from the target resolved by
another scheme, then report an errorthe model is invalid.
2.1.4. If one
scheme resolves and another doesn¡¯t, then report an errorthe
model is invalid.
2.1.5. If none of the above is true (that is, all recognized schemes resolve to the same one and only one target element, call it T), then R is resolved to T.
2.2. Non-validator consumers are not required to attempt to resolve all N schemes. Their behavior in this case is implementation-defined, but it has to satisfy the following constraints:
2.2.1. It either treats R as unresolved, or resolves R to a single target T.
2.2.2. If none of the recognized schemes resolves, then R is unresolved.
2.2.3. If R is treated as unresolved, then at
least the condition of one the above
2.1.1~2.1.3 4 is true for those schemes that are attempted by the
consumer.
2.2.4. If R resolves to T, then T MUST be among the target(s) resolved by at least one recognized scheme.
In summary, the consumer has to obey the ¡°at most one target¡± rule, and it can¡¯t lie about the result. The above rules allow a general consumer to behave in (and not limited to) any of the following ways or any combination of them:
¡¤ Only use the first scheme it recognizes, and ignore other schemes.
¡¤ Try the recognized schemes one by one until one resolves.
¡¤ If a scheme or multiple schemes resolve to more than one target, make R unresolved.
¡¤ If a scheme or multiple schemes resolve to more than one target, pick one of them as the target for R.
Note:
implementation-defined means a consumer MUST clearly *document* its behavior.
Note: 2.2.3 above indicates non-validator consumers are *not* required to attempt to resolve any recognized schemes.
Note: only validators are required to report ¡°multiple target¡± errors. This means the current ¡°3.1.2.1 At Most One Target¡± constraint needs to be updated to only cover generic consumers. Validator behavior needs to be moved to a different section.
Any attempt to resolve a reference is limited to the *current* model. That is, target MUST be within the current model.
So any given reference falls into one of 3 categories: null, resolved, and unresolved. (More about ¡°null¡± in section 4¡±.) Suggest we delete the definition of ¡°dangling reference¡± and use ¡°unresolved reference¡± instead, because the former doesn¡¯t carry more information than the latter. For unresolved references, it¡¯s not always possible to tell whether it¡¯s because the reference is wrongly specified or whether the target is outside of the model. (But ¡°dangling¡± would imply the latter case.)
Implementations
of the deref() function should be governed by the same rules as generic
non-validator consumers. A deref()
implementation should not report errors (which isn¡¯t allowed by XPath 1.0
anyway). For a given reference element, it MAY either return nothing
(unresolved) or one target. It MUST NOT return more than 1 target for each
reference element. (Answers bug [4683].)
Rationale:
it's
the reference is a unit, because we
should not constrain how schemes are defined. e.g.
maybe my scheme uses multiple <my:uri>
elements, and I specify "the reference resolves to the first <my:uri> that actually resolves". Then even though
there could be many <my:uri>
sub-elements, they are *not* multiple instances of the same scheme and don't
constitute an error if more than one resolves.
There is a question about "when no scheme is recognized, should it be viewed as a null or unresolved reference?" The above gives the answer "unresolved", because this case is distinct from ¡°this reference doesn¡¯t have a value (null)¡±.
Because all schemes used by the same reference element are meant to target the same element, it should then be an error for cases described in 2.3, which should be reported by validators. But for reasons like performance, general consumers may not want to try to resolve all schemes, hence the implementation-defined behavior.
Again because references are units, we don't need to get into details in terms of how new schemes should be defined. In particular:
- Schemes may overlap. That is, the same sub-element/attribute can be used by multiple schemes.
- Schemes can use attributes. (More on this follows.)
The only questions a scheme definition needs to and MUST answer are: (Answers bug [4865].)
¡¤ How is a scheme recognized. That is, there is a set of rules that, when satisfied, identify elements with sml:ref="true" as instances of this scheme. (e.g. for the "uri/iri" scheme, the rule can include: there is one and only one sub-element named <sml:uri>, whose value is xs:anyURI.) That is, either a reference element is an instance of (or an example of, or uses) a scheme, or it is not. ¡°How many instances¡± is never a question.
¡¤ How is a scheme de-referenced. That is, for an element R using this scheme, a deterministic approach is defined to resolve R to a set of target element nodes. This MUST cover all the reference elements recognized as an instance of this scheme.
Because
rules are specified in terms of the reference elements, we don't need to deal
with "what if there are 2 instances of the same scheme in a single
reference R". (Answers bug [4658].)
We may want to add a note to suggest that scheme definitions SHOULD NOT use overlapping elements/attributes without good reasons.
Rationale: again, we don't want to constrain how people want to define their schemes, especially for documents that pre-date SML. All we can require is that given an SML reference element R (with sml:ref="true"), a scheme must be defined in a way that a) consumers can tell whether R uses such a scheme, and b) if so how to resolve R using the scheme to a set of nodes. (The set may contain more than 1 node, which will result in an error because SML references must resolve to a single node, but no one else cares about why there were more than 1.)
¡°Null¡± references are distinct from unresolved references. A null reference is an explicit declaration of intent by the producer that the reference itself does not exist, and a processing directive (NOT a hint) to consumers not to search the reference for reference scheme information items.
Propose
to introduce another attribute, say, sml:nilref,
with a boolean value (following the same rules as sml:ref). This attribute is
only allowed when sml:ref="true"; if sml:nilref 's value is "true", then the reference
is viewed as "null". This is the only case where an reference
element denotes a ¡°null¡± reference. (Answers bug [4884].) In the SML schema,
define a global sml:nilref
attribute with xs:boolean type.
When an reference element is recognized as ¡°null¡±, then consumers MUST not attempt to resolve it. Any scheme-related content (including attributes) recognized by the consumer is ignored for reference-related purposes. The question of whether a null reference is resolved or not is undefined; it is an ill-formed question.
¡°targetRequired¡± should also apply to ¡°null¡± references. That is, it¡¯s an error if targetRequired=true is specified and the corresponding reference element R has sml:nilref=true. (Answers bug [4780/4795].)
Rationale: Null references are different from empty references (e.g. content of the reference element is intentionally empty; this is an explicit declaration of intent from the producer). Empty references are just normal references (whose value happens to be empty).
Using the university example, if each course has a ¡°primary teacher¡± and an ¡°assistant teacher¡±, wouldn¡¯t it be nice for the consumer/application to know ¡°this course doesn¡¯t have an assistant teacher¡± rather than ¡°I (the consumer) don¡¯t understand the way you (the producer) specified the assistant teacher, and I don¡¯t know who (s)he is¡±?
As for recognizing ¡°null¡± references, xsi:nil doesn't work unless we constrain scheme definitions to not use attributes (i.e., to use element content only). Also when schema validation is used, xsi:nil disallows any sub-element or textual content; but there may be cases where a null reference still has non-reference-related content.
This also answers the question "do we support schemes involving attributes"? Because whether a reference is null or not is controlled by sml:nilref, it can cover attributes well (whereas xsi:nil can't). (Answers bug [4682].)
¡°Null¡± should not be a special case to ¡°targetRequired¡±. If we view ¡°nilref¡± as similar to ¡°xsi:nil¡± in schema, then ¡°targetRequired=true¡± is like the opposite of ¡°nillable¡±.
There
are 5 classes of (complex)
types with regard to how they interact with the sml:ref attribute.
1.
Those who are derived from the current sml:refType.
2.
Those who have an explicit reference to the global
attribute declaration with use=required and fixed=true
3.
Those who have an explicit reference to the global
attribute declaration, with either true or false value
4.
Those who allow sml:ref
via an attribute wildcard or an explicit reference to the global attribute
declaration, with either true or false value
5.
Any complex type / any element declaration, including
those who don¡¯t allow sml:ref
with a ¡°true¡± value
SML
processors or applications may choose to do different things for these
different classes (e.g. static analysis, GUI-based tooling, etc.).
There
is no need to define ¡°SML reference type¡±. The only thing SML needs
to define is, for which class(s) of types are sml:acyclic and sml:targetXXX
allowed to be specified.
Discussed during 2007-08 F2F. The WG decided to choose class #5.
Note:
saying that a schema type is an ¡°SML reference type¡± implies that instances of
the type are expected to hold SML references, and reference-specific
constraints can be applied to them.
Options,
from least to most flexible:
6.1.sml:refType
and only types derived from it are viewed as SML reference types
7.5.An XML
Schema complex type is an SML reference type if and only if it includes a
reference to the sml:ref global attribute declaration in its {attribute uses},
with {required} = true and {value constraint} = fixed + ¡°true¡±. This says that for an element
information item to be valid with respect to an SML reference type, it MUST
have sml:ref present with value ¡°true¡±.
That is, it MUST be a reference element.
8.5.all
complex types are SML reference types; whether they really are used as
references is determined at runtime by whether the instance has
sml:ref="true"
Proposal:
option 2 (Answers bug [4673].)
With
option 2 adopted, sml:acyclic and sml:targetXXX attributes can only be
specified on SML reference types or element declarations whose types are SML
reference types. (Answers bug [4834].)
If they are used elsewhere, it is an error. In particular, if a complex type has a
reference to the global sml:ref attribute declaration but the reference either
is optional or doesn¡¯t have a fixed value ¡°true¡±, then it¡¯s not an SML
reference type, and it¡¯s an error to use sml:acyclic or sml:targetXXX with this
complex type.
(Similar to schema disallowing defining new xsi: attributes, see [1]) SML can disallow defining sml: attributes (by user schemas). (This removes the possibility for a schema to define a local sml:ref, which would confuse consumers.)
[1] http://www.w3.org/TR/xmlschema-1/#no-xsi
Suggest to remove the special sml:refType (as it has no special meaning any more).
Rationale:
- Advantages
of #2/#3 over #1: allows you to use your existing types for references (as
schema don't support multiple inheritance or ¡°multi-typing¡± as John¡¯s email
called it)
- Advantages
of #1/#2 over #3: makes it possible to identify reference types at compile time
(a good property for static schema/grammar analysis)
-
Advantages of #1/#2 over #3: removes the possibility where an instance without
an sml:ref attribute or with sml:ref=¡±false¡± and either acyclic or targetXXX
specified. The spec would then have the problem of defining what that means
semantically.
- Advantages
of #2 over #3: it's very unlikely for a schema author to specify
acyclic/targetXXX on a complex type but still not know whether the
corresponding instance will be an SML reference.
- Advantages
of #1/#2 over #3: being able to know which types are "real" reference
types also helps, e.g., GUI-based tooling.Class 1 makes it
impossible to use existing schemas and make existing complex type reference
types (because schema doesn¡¯t support multiple-inheritance). Classes 2~4 all make it hard for the
scenario where there is a base (possibly
abstract) type with SML constraints specified, and sml:ref attribute is only
allowed by types derived from this base type.
When an
element is not a reference element (i.e. doesn¡¯t have sml:ref=true), how should it be handled in
reference-related cases? There are 3 scenarios:
1.
targetXXX
constraints
Discussed during 2007-08
F2F; WG decided to make
them trivially ¡°satisfied¡± for non-references. That
is, they are enforced when and only when the elements have sml:ref=true.
2.
deref() (e.g. in
identity constraints)
Should those non-refs be
passed to deref()? Or an error should be
issue and deref() is not called?
During the 2007-08 F2F, WG
decided to answer ¡°no error; deref() returns unresolved¡±.
3.
Schema default value for sml:ref
Then it¡¯s possible that
in the input infoset there is no sml:ref,
but in PSVI there is sml:ref=true. Should
it be viewed as a reference element?
During the 2007-08 F2F,
some members were worried about potential inter-op problems between performing
and not performing schema assessment; others are not as worried, because such problem
already exists between processors who support different reference schemes. Some other members also see the value in
viewing defaulted sml:ref=true
as references. This makes it
possible to transform existing XML data into SML models,
without having to modify all the instances and add sml:ref attributes to them.
This issue was not
resolved during the F2F.
Rationale:
to expand a little on item #1. Each element in the
instance (whose corresponding element declaration has any one of the above
constraints) can be in one of 4 categories:
¡¤
non-reference
¡¤
null reference
¡¤
unresolved reference
¡¤
resolved reference
And for
the 4 constraints, we have the following 4 x 4
matrix applies:
Reference\Constraint |
Acyclic |
targetRequired |
targetElement |
targetType |
Non-reference |
Satisfied |
|
|
|
Null |
Satisfied |
Violated |
Violated |
Violated |
Unresolved |
Satisfied |
Violated |
Violated |
Violated |
Resolved |
Check |
Satisfied |
Check |
Check |
There
are 2 options for the ¡°non-reference¡± row for the targetXXX
constraints:
a)
¡°violated¡±, because targetXXX means ¡°it must be a reference and its target must
¡¡±
b)
¡°satisfied¡±, because targetXXX means ¡°if it is a refernce
then its target must ¡¡±.
WG picked answer
(b), because for those who want #1
answer, they could have had use=required and fixed=true.