- From: Mikel Egaña Aranguren <mikel.egana.aranguren@gmail.com>
- Date: Fri, 10 Feb 2017 17:25:11 +0100
- To: public-lod <public-lod@w3.org>
- Message-ID: <CABf_9zJgjquHtwNEBirqLak5TbiGwZrge0XgkpCOrFOVS2wS2Q@mail.gmail.com>
Hi all;

We have been hired by the Basque Government [1] to help them "enter the Linked Data world" through a 90.000 EUR pilot project [2]. I have a question about multilingual content, but any thoughts on our first approach, presented below, are welcome.

The project comprises two different but overlapping areas:

*1.- Web Content*

Add schema.org data, serialized as JSON-LD, to current Web pages, describing the content of the pages. For example, add something like the following snippet (with more data) to the Web document about the President (note the different URIs; see the explanation below):

[http://gida.irekia.euskadi.eus/eu/people/8080]

    {
      "@context": "http://schema.org",
      "@type": "Person",
      "@id": "http://euskadi.eus/id/inigo-urkullu",
      "email": "gab-lehendak@euskadi.eus",
      "telephone": "945017900",
      "mainEntityOfPage": {
        "@type": "WebPage",
        "@id": "http://euskadi.eus/page/inigo-urkullu"
      }
    }

*2.- Open Data*

Convert some of the datasets of the current Open Data Portal (http://opendata.euskadi.eus) to RDF and publish them as Linked Data. These datasets may also refer to entities from the Web content: for example, the President might appear in a CSV staff list in the Open Data portal.

The overlapping part is a further requirement: migrate some of the content from the current URL (document) based system to a URI (resource) based system. Thus, for example, the current Web page for the President (http://gida.irekia.euskadi.eus/eu/people/8080) should become something like http://euskadi.eus/id/inigo-urkullu (made persistent, and following the rest of the best practices for URIs). The RDF representation of that URI will contain RDF from both the Open Data portal (e.g. staff-related data) and the JSON-LD of the Web page (e.g. email), hence the URIs in the JSON-LD above.

In terms of content negotiation, the situation is a bit more complex than the usual Linked Data setting I'm acquainted with.
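A minimal Python sketch of how the Web content snippet could be generated and embedded in a page, keeping the entity URI (`@id`, under /id/) distinct from the page URI (`mainEntityOfPage`, under /page/). The helper names and the slug-based URI pattern are assumptions for illustration, not part of any existing CMS.

```python
import json

# Hypothetical URI patterns, following the /id/ and /page/ examples in this message.
ID_BASE = "http://euskadi.eus/id/"
PAGE_BASE = "http://euskadi.eus/page/"


def person_jsonld(slug: str, email: str, telephone: str) -> dict:
    """Build a schema.org Person description for the page about `slug` (hypothetical helper)."""
    return {
        "@context": "http://schema.org",
        "@type": "Person",
        # The entity itself (the person), not the HTML document about them.
        "@id": ID_BASE + slug,
        "email": email,
        # schema.org expects Text for telephone, so it is kept as a string.
        "telephone": telephone,
        # The HTML document whose main topic is this entity.
        "mainEntityOfPage": {"@type": "WebPage", "@id": PAGE_BASE + slug},
    }


def script_tag(data: dict) -> str:
    """Serialize `data` into a JSON-LD <script> element for embedding in HTML."""
    text = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a value containing "</script>" cannot close the element early;
    # "<\/" is still valid JSON, because "\/" is a legal escape for "/".
    text = text.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{text}\n</script>'


if __name__ == "__main__":
    doc = person_jsonld("inigo-urkullu", "gab-lehendak@euskadi.eus", "945017900")
    print(script_tag(doc))
```

Because both the page and the triple store see the same `@id`, the triples extracted from the page and those converted from the Open Data portal end up about the same resource.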
For datasets from the Open Data portal that lack existing Web content, there is no problem: 303 redirects will work as usual. For example, for the URI of a sensor that measures air quality (remember that there was no prior Web page describing the sensor), we would have something like this:

http://euskadi.eus/id/sensor-1 [resource identifier of the entity "sensor 1"]
  303 (RDF requested) -> http://euskadi.eus/data/sensor-1 [RDF data about the sensor]
  303 (HTML requested) -> http://euskadi.eus/doc/sensor-1 [an HTML, "ugly" rendering of the RDF data, à la DBpedia]

For content that already existed on the Web, like the President, the process is a bit more convoluted:

http://euskadi.eus/id/inigo-urkullu [resource identifier of the entity "Iñigo Urkullu"]
  303 (RDF requested) -> http://euskadi.eus/data/inigo-urkullu [RDF data about the President, including data from both Open Data and the Web content, via JSON-LD]
  303 (HTML requested) -> http://euskadi.eus/page/inigo-urkullu [a nice HTML page containing some of the RDF data, as JSON-LD, plus other pure Web content that does not exist as data]

The page http://euskadi.eus/page/inigo-urkullu has two HTML links, with appropriate icons, pointing at:

- http://euskadi.eus/data/inigo-urkullu [RDF data about the President, including data from both Open Data and the Web content, via JSON-LD, as already described]
- http://euskadi.eus/doc/inigo-urkullu [an HTML, "ugly" rendering of all the RDF data about the President, à la DBpedia]

When an HTML representation of the ID is requested, the choice depends on the schema:mainEntityOfPage predicate: if the predicate exists (the President), the fancy Web page (/page/) is served via 303; otherwise (the sensor: there is no schema:mainEntityOfPage predicate because there was no prior Web page) the "ugly" page (/doc/) is served.

Web content (the President) is of high quality and very linkable, which is why we want to include it in the triple store, via JSON-LD: it gives us some "anchor" entities in the data, with a lot of links.
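The redirect rules for /id/ URIs can be sketched as a small Python function. This is a minimal sketch under stated assumptions: the list of RDF media types is illustrative, wildcards (`*/*`) are treated as "HTML", unknown types also fall back to HTML instead of a 406, and `has_main_entity_of_page` is assumed to come from a lookup such as `ASK { <http://euskadi.eus/id/inigo-urkullu> <http://schema.org/mainEntityOfPage> ?page }` against the triple store.

```python
# Hypothetical base for the /id/, /data/, /page/ and /doc/ URI patterns in this message.
BASE = "http://euskadi.eus"

# RDF media types answered with the /data/ document (illustrative, not exhaustive).
RDF_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/ld+json",
    "application/n-triples",
}

# Media types treated as "the client wants HTML"; wildcards included so browsers and curl get HTML.
HTML_TYPES = {"text/html", "application/xhtml+xml", "text/*", "*/*"}

SEE_OTHER = 303


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        items.append((media, q))
    # sorted() is stable, so types with equal q keep the client's order.
    return sorted(items, key=lambda item: -item[1])


def negotiate(slug: str, accept: str, has_main_entity_of_page: bool) -> tuple[int, str]:
    """Return the 303 status and Location for a request to BASE/id/<slug>."""
    for media, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media in RDF_TYPES:
            return SEE_OTHER, f"{BASE}/data/{slug}"
        if media in HTML_TYPES:
            break  # the client prefers HTML over any RDF type listed later
    # HTML (or anything unrecognized): the curated page if the entity has a
    # schema:mainEntityOfPage triple, otherwise the generic rendering of the RDF.
    kind = "page" if has_main_entity_of_page else "doc"
    return SEE_OTHER, f"{BASE}/{kind}/{slug}"


if __name__ == "__main__":
    print(negotiate("sensor-1", "text/turtle", False))
    print(negotiate("inigo-urkullu", "text/html,application/xhtml+xml,*/*;q=0.8", True))
```

The only difference between the sensor and the President is the boolean flag, so the /page/ versus /doc/ decision stays a single lookup per request.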
The JSON-LD itself is generated programmatically by the current content management software.

This is a very preliminary, sketchy architecture and thoughts on it are welcome, but my question is about the URIs themselves. There are two official languages in the Basque Country (Spanish and Basque), and the same content is usually duplicated (sometimes even triplicated, when English is included):

- Web pages:
  http://gida.irekia.euskadi.eus/eu/people/8080
  http://gida.irekia.euskadi.eus/es/people/8080
- Datasets:
  http://opendata.euskadi.eus/katalogoa/-/2015eko-igorpen-eta-jatorri-kutsagarrien-euskal-erregistroa-eper-e-prtr/
  http://opendata.euskadi.eus/catalogo/-/registro-vasco-de-emisiones-y-fuentes-contaminantes-del-2015-eper-euskadi-e-prtr/

Therefore the easiest solution would be to mint URIs per language, as DBpedia does. Thus the President would have two URIs:

http://euskadi.eus/id/es/inigo-urkullu ("Spanish" President)
http://euskadi.eus/id/eu/inigo-urkullu ("Basque" President)

Both resources would have to be related via owl:sameAs in the triple store [3]. The advantage is that we can follow the current language division when converting data to RDF.

However, my gut feeling is that I should go for a "pure Linked Data" solution: mint a single ID (http://euskadi.eus/id/inigo-urkullu) and use @es and @eu language tags on RDF literals for content in different languages. The latter solution implies that the content negotiation above would also have to include language negotiation, which I'm not sure is widespread, and it may have other side effects.

So I'm more inclined towards one URI per language, because it is the easiest, but I would still like to hear any thoughts on the "one URI, different rdfs:labels" solution before discarding it completely.

Thanks!
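For the "one URI, different rdfs:labels" option, the RDF itself arguably needs no language negotiation: the /data/ document can carry every language-tagged literal at once, and only the HTML renderings have to choose one. A minimal Python sketch under these assumptions: the dataset URI `http://euskadi.eus/id/eper-euskadi-2015` is hypothetical, the two labels are paraphrased from the catalogue URL slugs above, Basque is an arbitrarily chosen fallback, and the Accept-Language matching is a simplified primary-subtag match rather than the full RFC 4647 lookup.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical single, language-neutral URI for the emissions dataset.
DATASET = "http://euskadi.eus/id/eper-euskadi-2015"

# Labels paraphrased from the two catalogue URL slugs (illustrative only).
LABELS = {
    "eu": "2015eko igorpen eta jatorri kutsagarrien euskal erregistroa",
    "es": "Registro vasco de emisiones y fuentes contaminantes del 2015",
}


def ntriples(uri: str, labels: dict[str, str]) -> list[str]:
    """One rdfs:label triple per language, all on the same subject URI."""
    lines = []
    for lang, text in sorted(labels.items()):
        # Minimal N-Triples escaping (backslashes and double quotes only).
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{uri}> <{RDFS_LABEL}> "{escaped}"@{lang} .')
    return lines


def parse_accept_language(header: str) -> list[str]:
    """Language tags from an Accept-Language header, highest q first, q=0 dropped."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        tag = fields[0].lower()
        if not tag:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            prefs.append((tag, q))
    prefs.sort(key=lambda p: -p[1])  # stable: equal q keeps the client's order
    return [tag for tag, _ in prefs]


def pick_label(labels: dict[str, str], header: str, default: str = "eu") -> tuple[str, str]:
    """Choose the label to show on an HTML page; `default` is an assumed fallback."""
    for tag in parse_accept_language(header):
        if tag == "*":
            return default, labels[default]
        primary = tag.split("-")[0]  # "es-ES" -> "es"
        if primary in labels:
            return primary, labels[primary]
    return default, labels[default]


if __name__ == "__main__":
    print("\n".join(ntriples(DATASET, LABELS)))
    print(pick_label(LABELS, "es-ES,es;q=0.9,en;q=0.8"))
```

With this approach, a new language (e.g. English) is just one more literal on the same subject, whereas the per-language-URI approach would need a third URI plus more owl:sameAs links.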
Regards

[1] https://en.wikipedia.org/wiki/Basque_Government
[2] http://www.contratacion.euskadi.eus/w32-1084/es/contenidos/anuncio_contratacion/expx74j21656/es_doc/es_arch_expx74j21656.html
[3] Stardog and GraphDB (and probably others) implement special methods for owl:sameAs "inference" in SPARQL queries, to make such queries efficient.

--
Mikel Egaña Aranguren, Ph.D.
https://mikel-egana-aranguren.github.io
Received on Friday, 10 February 2017 16:25:47 UTC