SiRPAC bugfixes

Hi,

while testing the SiRPAC parser I encountered two cases where the parser
produced weird results. These are listed below, together with suggested
bugfixes.

Case one: The parser simply swallows empty container elements:

<rdf:Alt ID="foo">
    <rdf:li/>
    <rdf:li>bar</rdf:li>
</rdf:Alt>

becomes
foo , rdf:type , rdf:Alt
foo , rdf:_2 , "bar"

without the expected
foo , rdf:_1 , ""

This can be fixed by adding a special case handler to processListItem, in
the else branch roughly at line 1760. Put the following before the while loop:

if (!e.hasMoreElements())
  addTriple (createResource(RDFMS+"_"+iCounter),
             createResource(sID),
             createLiteral(""));
 
This makes sure something is done for empty list elements too.
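
To show the intended result outside of SiRPAC, here is a minimal standalone
sketch. It is not the parser's code: the class name, the method name and the
use of null to stand for an empty <rdf:li/> are assumptions made only for this
illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EmptyListItemSketch {
    /**
     * Builds one "subject , predicate , object" line per list item.
     * A null entry stands for an empty <rdf:li/> (hypothetical convention).
     */
    static List<String> listItemTriples(String containerId, List<String> items) {
        List<String> triples = new ArrayList<>();
        int counter = 1;
        for (String item : items) {
            // An empty item still yields a triple, with "" as its value,
            // so that the rdf:_n numbering has no silent gaps.
            String value = (item == null) ? "" : item;
            triples.add(containerId + " , rdf:_" + counter + " , \"" + value + "\"");
            counter++;
        }
        return triples;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        items.add(null);   // <rdf:li/>
        items.add("bar");  // <rdf:li>bar</rdf:li>
        for (String triple : listItemTriples("foo", items)) {
            System.out.println(triple);
        }
    }
}
```

For the example above this prints the rdf:_1 triple with "" followed by the
rdf:_2 triple with "bar".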


Case two: The parser treats empty properties as resources instead of literals:

<rdf:Description ID="foo">
    <x:empty/>
    <x:empty></x:empty>
</rdf:Description>

becomes
foo , x:empty , #genid1
foo , x:empty , #genid2

while it should be
foo , x:empty , ""
foo , x:empty , ""

This one is more complicated. The problem lies in processPredicate, where the
comment at line 1566 says 

/**
* Before looping through the children, let's check
* if there are any. If not, the value of the predicate is
* an anonymous node
*/
 
I think this is wrong in most cases. This condition results in a triple
with a #genid, see line 1580:

addTriple (createResource(predicate.name()),
           createResource(sTarget),
           createResource(sObject));

If you change the createResource(sObject) to createLiteral(""), it works as
expected. However, it then also applies to this strange case:

<rdf:Description ID="foo">
    <x:empty ID="bar"/>
</rdf:Description>

While this is legitimate, I am not too sure what it is supposed to mean;
I assume a reification with an empty object or something like that. You can
identify this by checking (sStatementID != null), but as I said I am not sure
how to handle this...
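
The distinction could be sketched roughly as follows. This is a standalone
illustration, not the actual processPredicate code: the class and method names
are hypothetical, and keeping the generated resource when an ID is present is
only a placeholder assumption, since I do not know what that case should mean.

```java
class EmptyPropertySketch {
    /**
     * Chooses the object of the triple for a property element without children.
     * statementId is the value of the ID attribute, or null if there is none;
     * generatedId is the #genid the parser would otherwise use.
     */
    static String objectForEmptyProperty(String statementId, String generatedId) {
        if (statementId != null) {
            // Assumption: leave the current behaviour untouched for <x:empty ID="..."/>.
            return generatedId;
        }
        // Plain empty property: use an empty literal instead of an anonymous node.
        return "\"\"";
    }

    public static void main(String[] args) {
        System.out.println("foo , x:empty , " + objectForEmptyProperty(null, "#genid1"));
        System.out.println("foo , x:empty , " + objectForEmptyProperty("bar", "#genid2"));
    }
}
```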

Maybe the empty properties should already be recognized in an earlier parsing
state, such as treating them as parseLiteral in endElement. This actually
works if you force it as in

<rdf:Description ID="foo">
    <x:empty rdf:parseType="Literal"/>
</rdf:Description>
 
You could also change the parseLiteral method to return true if there are no
non-rdf/xml attributes on the stack, but since this method is used so often
in the code I am not sure of the implications.

I hope this was some help!

Regards,
Karsten Otto

Received on Friday, 23 February 2001 18:03:21 UTC