Re: XInclude in XML Calabash

"Henry S. Thompson" <ht@inf.ed.ac.uk> writes:
> Norman Walsh <ndw@nwalsh.com> writes:
>> Correct behavior for XInclude is that nested XInclude elements are expanded
>> before evaluating fragids. So you can say
>>
>>   <xi:include href="book.xml" xpointer="id(chapter1)"/>
>>
>> and it works even if the xml:id ‘chapter1’ is “behind” an xi:include
>> element in book.xml.
>
> Yes, and our old stand-off markup approach to managing overlapping
> markup for annotated linguistic data depends on this.

Indeed. That’s clearly the expected behavior and my implementation
just got it wrong.

>> Unfortunately, fixing that bug has a consequence. Attribute types
>> defined by DTD validation are lost in the expanded document.
>
> Why?  [attribute type] is an infoset property, should be preserved in
> the transcluded bit, shouldn't it?

The answer to why is simply that the Saxonica APIs don’t make it
easy.

>> I don’t know how to fix this. It’s clear from the Saxonica API docs
>> that NodeInfo.getSchemaType() doesn’t return types declared with DTD
>> fragments.
>
> Is there no getAttributeType()?

AFAICT, the closest thing to getAttributeType() is getSchemaType().

>> In theory, I could use NodeInfo.getTypeAnnotation() to find
>> out, but the values I get back from that API don’t have the
>> IS_DTD_TYPE bit set even when the type comes from the DTD.
>
> Do you mean, so getTypeAnnotation() _should_ work?

Well…the JavaDoc says:

/**
 * Get the type annotation of this node, if any. The type
 * annotation is represented as an integer; this is the
 * fingerprint of the name of the type, as defined in the name
 * pool. Anonymous types are given a system-defined name. The
 * value of the type annotation can be used to retrieve the actual
 * schema type definition using the method {@link
 * Configuration#getSchemaType}. 
 *
 * The bit IS_DTD_TYPE (1<<30) will be set in the case of an attribute
 * node if the type annotation is one of ID, IDREF, or IDREFS and this
 * is derived from DTD rather than schema validation.
 *
 * @return the type annotation of the node, under the mask
 *         NamePool.FP_MASK, and optionally the bit setting
 *         IS_DTD_TYPE in the case of a DTD-derived ID or IDREF/S
 *         type (which is treated as untypedAtomic for the
 *         purposes of obtaining the typed value).
 *
 *         For elements and attributes, this is the type annotation as
 *         defined in XDM. For document nodes, it should be one of
 *         XS_UNTYPED if the document has not been validated, or
 *         XS_ANY_TYPE if validation has taken place (that is, if any
 *         node in the document has an annotation other than Untyped
 *         or UntypedAtomic).
 *
 * @since 8.4. Refinement for document nodes introduced in 9.2
 */

I’m not sure how useful it actually is to know that an attribute was
of type ID or IDREF(S) but not which one. In any event, when I poked
at this with the debugger, the IS_DTD_TYPE bit was not set.
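
For anyone who wants to repeat that check, here is a minimal sketch of
the bit test the JavaDoc describes. It is plain Java with no Saxon
dependency. IS_DTD_TYPE is the (1<<30) value quoted above; the FP_MASK
value is my assumption about what NamePool.FP_MASK holds, and the class
and method names are made up for illustration.

```java
public class TypeAnnotationBits {
    // Value quoted in the JavaDoc above: set for DTD-derived ID/IDREF/IDREFS.
    static final int IS_DTD_TYPE = 1 << 30;

    // ASSUMPTION: stands in for NamePool.FP_MASK (low 20 bits); check your Saxon version.
    static final int FP_MASK = 0xfffff;

    /** True if the annotation carries the DTD-derived marker bit. */
    static boolean isDtdDerived(int typeAnnotation) {
        return (typeAnnotation & IS_DTD_TYPE) != 0;
    }

    /** Strips flag bits, leaving the fingerprint of the type name. */
    static int fingerprint(int typeAnnotation) {
        return typeAnnotation & FP_MASK;
    }

    public static void main(String[] args) {
        // Hypothetical fingerprint, not a real Saxon constant.
        int fp = 1234;
        int annotated = fp | IS_DTD_TYPE;
        System.out.println(isDtdDerived(annotated));
        System.out.println(fingerprint(annotated) == fp);
    }
}
```

In the real code the integer would come from NodeInfo.getTypeAnnotation()
on the attribute node; in my debugger session the first test came back
false.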

>> (All of this despite the fact that in the parsed document, before I
>> copy it, the XPath id() function does work.)
>
> So we need to ask Michael what that's exploiting?

I suppose, since you’re making me feel guilty for trying to just
ignore it :-)
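
For comparison only, the same infoset property is easy to observe
outside Saxon. The sketch below uses the JDK's built-in DOM parser, not
the Saxon APIs and not anything XML Calabash does; the class name and
the tiny document are invented for illustration. It parses a document
whose internal DTD subset declares an ID attribute and then asks the
DOM whether that attribute is known to be an ID.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DtdIdDemo {
    // Hypothetical document: the internal subset declares chapter/@id as type ID.
    static final String XML =
          "<!DOCTYPE book [\n"
        + "  <!ELEMENT book (chapter)>\n"
        + "  <!ELEMENT chapter EMPTY>\n"
        + "  <!ATTLIST chapter id ID #REQUIRED>\n"
        + "]>\n"
        + "<book><chapter id=\"chapter1\"/></book>";

    /** Parses XML and reports whether the DTD-declared ID is visible in the DOM. */
    static boolean idIsVisible() {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            // Lookup by ID only succeeds if the parser recorded the attribute type.
            Element chapter = doc.getElementById("chapter1");
            if (chapter == null) {
                return false;
            }
            Attr id = chapter.getAttributeNode("id");
            return id != null && id.isId();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(idIsVisible());
    }
}
```

This says nothing about how Saxon's id() finds the element; it only
shows the attribute type surviving a parse in another API, which is the
property that would need to survive the copy.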

>> I suppose I should try to construct a test case and report the bug but
>> that’s not going to be useful today. And even if I could get the DTD
>> type, it’s entirely unclear that I could construct a new tree with
>> that type, so I’m not sure it’d help.
>
> But the DTD type should travel with the transcluded bit, shouldn't it?

Yes it should. Whether it *can* or not in the Saxon APIs is a
different question. A question compounded by the fact that my code
uses the 9.6 APIs and not (yet) the 9.7 APIs.

>> On the whole, I think the best choice is to fix the “nested includes”
>> bug and just accept that DTD-based ID attribute types won’t work.
>> But how painful is that going to be for users, I wonder?
>
> In principle, quite serious.  I don't know how much use that approach to
> overlap is getting these days in practice.

Right. So I’ll start with a simple message to the Saxonica list to see
if they believe that what I need to do is possible in principle with
the APIs I have available.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676
www.marklogic.com

Received on Sunday, 17 April 2016 19:31:38 UTC