Re: XInclude in XML Calabash

"Henry S. Thompson" writes:
> Norman Walsh writes:
>> Correct behavior for XInclude is that nested XInclude elements are expanded
>> before evaluating fragids. So you can say
>>   <xi:include href="book.xml" xpointer="id(chapter1)"/>
>> and it works even if the xml:id ‘chapter1’ is “behind” an xi:include
>> element in book.xml.
> Yes, and our old stand-off markup approach to managing overlapping
> markup for annotated linguistic data depends on this.

Indeed. That’s clearly the expected behavior and my implementation
just got it wrong.

>> Unfortunately, fixing that bug has a consequence. Attribute types
>> defined by DTD validation are lost in the expanded document.
> Why?  [attribute type] is an infoset property, should be preserved in
> the transcluded bit, shouldn't it?

The answer to why is simply that the Saxonica APIs don’t make it
available.

>> I don’t know how to fix this. It’s clear from the Saxonica API docs
>> that NodeInfo.getSchemaType() doesn’t return types declared with DTD
>> fragments.
> Is there no getAttributeType()?

AFAICT, the closest thing to getAttributeType() is getSchemaType().
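
For comparison, here is a minimal sketch of how the [attribute type]
property shows up at the parser level. It uses the JDK's built-in SAX
parser, not the Saxon API, so it says nothing about what NodeInfo can
report; the class name, method name, and sample document are all made
up for illustration. It assumes the default (non-validating) JDK parser
reads the internal DTD subset, which is what lets it see the ATTLIST
declaration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: records the SAX-reported type of every attribute. */
final class DtdAttributeTypes {

    // Illustrative document: xml:id is declared as ID, label is undeclared.
    static final String SAMPLE =
        "<!DOCTYPE book [\n"
      + "  <!ATTLIST chapter xml:id ID #IMPLIED>\n"
      + "]>\n"
      + "<book><chapter xml:id=\"chapter1\" label=\"One\"/></book>";

    /** Maps "element/@attribute" to the attribute type the parser reports. */
    static Map<String, String> attributeTypes(String xml) throws Exception {
        Map<String, String> types = new LinkedHashMap<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    for (int i = 0; i < atts.getLength(); i++) {
                        // getType() is "CDATA" unless a declaration was read.
                        types.put(qName + "/@" + atts.getQName(i), atts.getType(i));
                    }
                }
            });
        return types;
    }

    public static void main(String[] args) throws Exception {
        attributeTypes(SAMPLE).forEach((name, type) ->
            System.out.println(name + " : " + type));
    }
}
```

The point is only that the type is available when the document is
parsed; whether it survives being copied into a new Saxon tree is the
open question.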

>> In theory, I could use NodeInfo.getTypeAnnotation() to find
>> out, but the values I get back from that API don’t have the
>> IS_DTD_TYPE bit set even when the type comes from the DTD.
> Do you mean, so getTypeAnnotation() _should_ work?

Well…the JavaDoc says:

 * Get the type annotation of this node, if any. The type
 * annotation is represented as an integer; this is the
 * fingerprint of the name of the type, as defined in the name
 * pool. Anonymous types are given a system-defined name. The
 * value of the type annotation can be used to retrieve the actual
 * schema type definition using the method {@link
 * Configuration#getSchemaType}. 
 * The bit IS_DTD_TYPE (1<<30) will be set in the case of an attribute
 * node if the type annotation is one of ID, IDREF, or IDREFS and this
 * is derived from DTD rather than schema validation.
 * @return the type annotation of the node, under the mask
 *         NamePool.FP_MASK, and optionally the bit setting
 *         IS_DTD_TYPE in the case of a DTD-derived ID or IDREF/S
 *         type (which is treated as untypedAtomic for the
 *         purposes of obtaining the typed value).
 *         For elements and attributes, this is the type annotation as
 *         defined in XDM. For document nodes, it should be one of
 *         XS_UNTYPED if the document has not been validated, or
 *         XS_ANY_TYPE if validation has taken place (that is, if any
 *         node in the document has an annotation other than Untyped
 *         or UntypedAtomic).
 * @since 8.4. Refinement for document nodes introduced in 9.2

I’m not sure how useful it actually is to know that an attribute was
of type ID or IDREF(S) but not which one. In any event, when I poked
at this with the debugger, the IS_DTD_TYPE bit was not set.
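
For what it's worth, the check described in that JavaDoc amounts to
some bit masking. Here is a minimal sketch of it, with the constants
defined locally rather than taken from Saxon: IS_DTD_TYPE is the
(1<<30) quoted above, but the FP_MASK value is my assumption about what
NamePool.FP_MASK holds and should be checked against the Saxon version
in use. The class name and the fingerprint value are made up.

```java
/** Hypothetical helper: splits a Saxon-style type annotation into its parts. */
final class TypeAnnotationBits {

    // From the JavaDoc quoted above: flags a DTD-derived ID, IDREF, or IDREFS.
    static final int IS_DTD_TYPE = 1 << 30;

    // ASSUMPTION: stands in for NamePool.FP_MASK; verify the real value.
    static final int FP_MASK = 0xfffff;

    /** True if the annotation carries the DTD-derived marker bit. */
    static boolean isDtdDerived(int typeAnnotation) {
        return (typeAnnotation & IS_DTD_TYPE) != 0;
    }

    /** The name-pool fingerprint with the marker bit masked off. */
    static int fingerprint(int typeAnnotation) {
        return typeAnnotation & FP_MASK;
    }

    public static void main(String[] args) {
        int plain = 1234;                   // made-up fingerprint, not a real type
        int flagged = plain | IS_DTD_TYPE;  // what a DTD-derived ID should look like

        System.out.println(isDtdDerived(plain));
        System.out.println(isDtdDerived(flagged));
        System.out.println(fingerprint(flagged) == plain);
    }
}
```

In the debugger, the values coming back from getTypeAnnotation()
looked like the "plain" case: the marker bit was simply absent.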

>> (All of this despite the fact that in the parsed document, before I
>> copy it, the XPath id() function does work.)
> So we need to ask Michael what that's exploiting?

I suppose, since you’re making me feel guilty for trying to just
ignore it :-)

>> I suppose I should try to construct a test case and report the bug but
>> that’s not going to be useful today. And even if I could get the DTD
>> type, it’s entirely unclear that I could construct a new tree with
>> that type, so I’m not sure it’d help.
> But the DTD type should travel with the transcluded bit, shouldn't it?

Yes it should. Whether it *can* or not in the Saxon APIs is a
different question. A question compounded by the fact that my code
uses the 9.6 APIs and not (yet) the 9.7 APIs.

>> On the whole, I think the best choice is to fix the “nested includes”
>> bug and just accept that DTD-based ID attribute types won’t work.
>> But how painful is that going to be for users, I wonder?
> In principle, quite serious.  I don't know how much use that approach to
> overlap is getting these days in practice.

Right. So I’ll start with a simple message to the Saxonica list to see
if they believe that what I need to do is possible in principle with
the APIs I have available.

                                        Be seeing you,

Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676

Received on Sunday, 17 April 2016 19:31:38 UTC