- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Fri, 28 Oct 2011 10:48:17 +0100
- To: public-rdf-wg@w3.org
Skolemization is certainly one way to solve the problem. It's a stronger condition than the use case is asking for (clustering within a single result set) but it's a fairly likely next request to have bNode references that can be used in a subsequent query (e.g. RDF lists) or that the sorting is the same for two separate queries.

Other solutions:

1/ Use GROUP BY ?vcard ?adr ?pred ?obj
   This clusters but does not sort. It is legal, strict SPARQL 1.1
2/ Implementations may extend "<" to define a stable ordering
3/ Implementations may extend ORDER BY to define a stable ordering
4/ (SPARQL 1.1) extend URI or STR to return something that labels the bNodes.

Apache Jena provides stable ordering of sorted results (2) - any ORDER BY is stable within and across requests. Bnodes are ordered using the internal identifier.

Jena also provides (4). (4) isn't the skolemization scheme that RDF-WG is proposing - the extension pre-dates this WG - and it can be used to round-trip bNode references.

(Steve - it's good to hear that the design works for 4Store).

The fact it changes Sesame actually makes it harder for a SPARQL change. The SPARQL-WG charter is strongly worded against making changes that alter SPARQL 1.0 queries. Has there been discussion on the Sesame lists as to a change here?

What we-all must be aware of is slipping into defining "subsets of RDF". Skolemization means that there is one approach, for SPARQL and for API use.

    Andy

On 28/10/11 09:35, Steve Harris wrote:
> It's really an inevitable consequence of the (silly IMHO) way in which
> blank nodes are defined in RDF, how can you define a stable ordering on
> existential variables?
>
> However bNode skolemisation is one solution to this issue, as it
> provides a stable URI identifier for bNodes which has an order defined
> by http://www.w3.org/TR/rdf-sparql-query/#modOrderBy
>
> Incidentally, I have (mostly) implemented bNode skolemisation in 4store,
> it was about a day's work so far, but quite complex.
> Also the skolem constant URIs are pretty unwieldy, compared to the
> non-standard hack we were using before, but I think it's a worthwhile
> cost for interoperability and safety.
>
> A typical 4store skolem URI looks like
> http://4store.org/.well-known/genid/0F1BAE7E-B38C-4556-813E-342B60693BD0/10420f0000000041
>
> It could be made shorter using e.g. base64 encoding, rather than hex,
> but the UUID in the skolem constant is the UUID of the store, which is
> externalised in other ways.
>
> Has any consensus been arrived at on how to signal that some URI is a
> skolemised bNode in RDF yet? e.g. a class that skolem constant URIs
> belong to?
>
> - Steve
>
> On 2011-10-27, at 22:39, David Wood wrote:
>
>> Hi all,
>>
>> FYI. This is a real-world use case worth considering as we discuss
>> blank nodes. Although it is mostly a SPARQL issue, I felt this group
>> should be aware of the discussion.
>>
>> Regards,
>> Dave
>>
>> Begin forwarded message:
>>
>>> *From: *James Leigh <james@3roundstones.com
>>> <mailto:james@3roundstones.com>>
>>> *Subject: **Blank Node Ordering*
>>> *Date: *October 27, 2011 10:05:30 EDT
>>> *To: *public-rdf-dawg-comments@w3.org
>>> <mailto:public-rdf-dawg-comments@w3.org>
>>> *Cc: *David Wood <david@3roundstones.com <mailto:david@3roundstones.com>>
>>>
>>> Hello,
>>>
>>> We recently ran into some unexpected behaviour that we want to bring to
>>> this group's attention regarding the ORDER BY clause.
>>>
>>> When ordering RDF literals and URIs, the same literal or the same URI
>>> will always be arranged together. However, there is no guarantee with
>>> blank nodes that the same blank nodes will be arranged together.
>>>
>>> The following SPARQL query lists all the vcard addresses in the default
>>> graph along with their properties. A single address is represented in
>>> multiple result bindings, one for each property in the data store.
>>>
>>> SELECT ?vcard ?adr ?pred ?obj {
>>> ?vcard a vcard:VCard; vcard:adr ?adr .
>>> ?adr ?pred ?obj .
>>> } ORDER BY ?vcard ?adr ?pred
>>>
>>> The (author's) expected result is to have all result bindings ordered
>>> first by the vcard they belong to and if there are multiple addresses on
>>> the vcard, each address property is ordered together.
>>>
>>> For example the following binding sets are a valid result set. Notice that
>>> the entire home address comes before any of the work address properties.
>>> This order is predictable because of the ORDER BY clause in the query
>>> above.
>>>
>>> vcard=<me>, adr=<me#home>, pred=vcard:country-name, obj="Australia"
>>> vcard=<me>, adr=<me#home>, pred=vcard:locality, obj="WonderCity"
>>> vcard=<me>, adr=<me#home>, pred=vcard:postal-code, obj="5555"
>>> vcard=<me>, adr=<me#home>, pred=vcard:street-address, obj="111 Lake Drive"
>>> vcard=<me>, adr=<me#work>, pred=vcard:country-name, obj="Australia"
>>> vcard=<me>, adr=<me#work>, pred=vcard:locality, obj="WonderCity"
>>> vcard=<me>, adr=<me#work>, pred=vcard:postal-code, obj="5555"
>>> vcard=<me>, adr=<me#work>, pred=vcard:street-address, obj="33 Enterprise Drive"
>>>
>>> However, it would be incorrect (in SPARQL 1.0 and SPARQL 1.1 draft) for
>>> the author to assume the addresses will always be ordered together like
>>> this.
>>>
>>> Consider the result set if blank nodes were used for the address node.
>>> The result might look like the one below.
>>>
>>> vcard=<me>, adr=_:b1, pred=vcard:locality, obj="WonderCity"
>>> vcard=<me>, adr=_:b1, pred=vcard:street-address, obj="111 Lake Drive"
>>> vcard=<me>, adr=_:b2, pred=vcard:street-address, obj="33 Enterprise Drive"
>>> vcard=<me>, adr=_:b2, pred=vcard:country-name, obj="Australia"
>>> vcard=<me>, adr=_:b1, pred=vcard:country-name, obj="Australia"
>>> vcard=<me>, adr=_:b2, pred=vcard:postal-code, obj="5555"
>>> vcard=<me>, adr=_:b1, pred=vcard:postal-code, obj="5555"
>>> vcard=<me>, adr=_:b2, pred=vcard:locality, obj="WonderCity"
>>>
>>> Although each result of a vcard is ordered together, because it is a
>>> URI, the ordering of the adr blank nodes looks random and is
>>> unpredictable. Sesame 2.x is implemented to appear to randomly arrange
>>> blank node results when ordering by blank nodes as shown above. When the
>>> data used contains blank nodes there is no way to control the ordering.
>>>
>>> The author would expect that _:b1 is ordered before or after _:b2, but
>>> the author would not expect that _:b1 is mixed among _:b2. Although
>>> there is no order between _:b1 and _:b2, SPARQL should provide guidance
>>> on how to arrange blank nodes.
>>>
>>> Many people still use blank nodes and this issue causes unexpected
>>> results for SPARQL users.
>>>
>>> My colleagues and I propose that the group seriously consider adding a
>>> restriction to ORDER BY in SPARQL 1.1 that will ensure ordering of any
>>> RDF term will guarantee that same terms are arranged together.
>>>
>>> Although an order among different blank nodes could not be fixed,
>>> SPARQL should fix the same RDF terms to be ordered together.
>>>
>>> Thanks,
>>> James
>>>
>>
>
> --
> Steve Harris, CTO, Garlik Limited
> 1-3 Halford Road, Richmond, TW10 6AW, UK
> +44 20 8439 8203 http://www.garlik.com/
> Registered in England and Wales 535 7233 VAT # 849 0517 11
> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
>
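
As a rough illustration of the clustering requested in the forwarded message, and of ordering bNodes by some internal identifier, here is a minimal Python sketch. The (kind, value) term representation, the KIND_RANK table and the order_by helper are assumptions made for this example only; they are not taken from Jena, 4store or Sesame. The sketch ranks blank nodes before URIs before literals and compares blank nodes by their label, so equal blank nodes always end up adjacent.

```python
# Hypothetical term model: each RDF term is a (kind, value) pair.
# Blank nodes carry an internal label; this is an assumption for illustration.
KIND_RANK = {"bnode": 0, "uri": 1, "literal": 2}


def term_key(term):
    """Sort key: rank by term kind first, then by the term's string value."""
    kind, value = term
    return (KIND_RANK[kind], value)


def order_by(rows, *variables):
    """Sort solution rows by the given variables, in order of priority."""
    return sorted(rows, key=lambda row: tuple(term_key(row[v]) for v in variables))


rows = [
    {"vcard": ("uri", "me"), "adr": ("bnode", "b1"),
     "pred": ("uri", "vcard:locality"), "obj": ("literal", "WonderCity")},
    {"vcard": ("uri", "me"), "adr": ("bnode", "b2"),
     "pred": ("uri", "vcard:street-address"), "obj": ("literal", "33 Enterprise Drive")},
    {"vcard": ("uri", "me"), "adr": ("bnode", "b1"),
     "pred": ("uri", "vcard:country-name"), "obj": ("literal", "Australia")},
    {"vcard": ("uri", "me"), "adr": ("bnode", "b2"),
     "pred": ("uri", "vcard:locality"), "obj": ("literal", "WonderCity")},
]

ordered = order_by(rows, "vcard", "adr", "pred")
for row in ordered:
    print(row["adr"][1], row["pred"][1], row["obj"][1])
```

With this key, all rows for b1 come before all rows for b2. The relative order of b1 and b2 is arbitrary (it only reflects the labels chosen here), but rows for the same blank node are never interleaved.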
Received on Friday, 28 October 2011 09:48:53 UTC