Re: Using HDF5 to store ontology instances

Hi,

Interesting case.

> The schema is roughly a Directed Acyclic Graph

This holds for the organization of objects within an HDF5 file, i.e. the datasets. The contents of each dataset, however, must conform to a tabular structure. Defining a dataset carries a noticeable overhead, so it is worth storing data as rows of a tabular structure rather than distributing individual triples over separate datasets.

There are ontologies for which both Linked Data representations and binary HDF5 representations exist. In the construction domain the recent ifcOWL (Pauwels & Terkaj 2016) is gaining traction. We have proposed a binary serialization format (Krijnen & Beetz 2016) for the same ontology (the IFC EXPRESS schema), roughly following ISO 10303-26. The hierarchical structuring and the ability to read and seek randomly (thanks to records of known, fixed length) lead to fast retrieval of data. The self-describing nature of HDF5 makes project-specific extensions with heterogeneous data quite appealing.
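To illustrate why fixed-length records make random access cheap, here is a minimal sketch in plain Python. It does not use HDF5 at all: an in-memory byte buffer stands in for the file, and the record layout (a 32-bit id and two 64-bit floats) is a hypothetical example, not the layout from our paper.

```python
import io
import struct

# Hypothetical fixed-length record: int32 id followed by two float64 values.
# "<" selects little-endian without padding, so every record is 20 bytes.
RECORD = struct.Struct("<idd")

def write_records(buf, rows):
    """Append each row to the buffer as one packed fixed-length record."""
    for row in rows:
        buf.write(RECORD.pack(*row))

def read_record(buf, index):
    """Fetch record number `index` by seeking straight to its byte offset."""
    # The offset is computable because all records have the same size,
    # so no preceding record has to be read or parsed.
    buf.seek(index * RECORD.size)
    return RECORD.unpack(buf.read(RECORD.size))

buf = io.BytesIO()
write_records(buf, [(i, i * 1.5, i * 2.0) for i in range(1000)])
record = read_record(buf, 742)
```

With variable-length records the offset of record 742 would not be known without an index or a scan, which is the cost that fixed-length rows avoid.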

My impression is that the success of your implementation in HDF5 depends to a large extent on whether your data is homogeneous and consistent enough to cluster predicates into large tabular structures. Storing data as raw triples might result in a large number of self-joins, which makes retrieval of specific subgraphs less efficient.
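The self-join issue can be sketched as follows. This is only an illustration under stated assumptions: SQLite (from the Python standard library) stands in for the storage layer instead of HDF5, and the tables, predicates and values (`walls`, `name`, `height`, `storey`) are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Layout 1: raw triples, one row per (subject, predicate, object).
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
# Layout 2: predicates of one class clustered into columns of a single table.
con.execute("CREATE TABLE walls (id TEXT, name TEXT, height REAL, storey TEXT)")

# Hypothetical instance data, written to both layouts.
rows = [("wall1", "Wall A", 2.7, "storey1"), ("wall2", "Wall B", 3.0, "storey2")]
for wid, name, height, storey in rows:
    con.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [(wid, "name", name), (wid, "height", str(height)), (wid, "storey", storey)],
    )
    con.execute("INSERT INTO walls VALUES (?, ?, ?, ?)", (wid, name, height, storey))

# Raw triples: every additional predicate costs another self-join.
from_triples = con.execute(
    """
    SELECT n.o, CAST(h.o AS REAL)
    FROM triples AS n
    JOIN triples AS h ON h.s = n.s AND h.p = 'height'
    JOIN triples AS t ON t.s = n.s AND t.p = 'storey'
    WHERE n.p = 'name' AND t.o = 'storey1'
    """
).fetchall()

# Clustered table: the same question is a single row lookup.
from_table = con.execute(
    "SELECT name, height FROM walls WHERE storey = 'storey1'"
).fetchall()
```

Both queries return the same rows, but the triple layout needs one join per predicate involved, whereas the clustered layout reads each instance as one record.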

---------------------------------

Pieter Pauwels, Walter Terkaj (2016) EXPRESS to OWL for construction industry: Towards a recommendable and usable ifcOWL ontology. Automation in Construction, Volume 63, March 2016, Pages 100-133, ISSN 0926-5805. http://dx.doi.org/10.1016/j.autcon.2015.12.003 http://www.sciencedirect.com/science/article/pii/S0926580515002435

Thomas Krijnen, Jakob Beetz (2016) Efficient binary serialization of IFC models using HDF5. In ICCCBE2016: 16th International Conference on Computing in Civil and Building Engineering, Osaka, July 6-8, 2016. https://pure.tue.nl/ws/files/28331562/icccbe_hdf5_krijnen_beetz.pdf https://speakerdeck.com/aothms/efficient-binary-serialization-of-ifc-models-using-hdf5

---------------------------------

Hope this is of some help,

Kind regards,
Thomas

Received on Monday, 10 October 2016 10:38:18 UTC