- From: Laufer <laufer@globo.com>
- Date: Thu, 22 Jan 2015 06:53:51 -0200
- To: Antoine Isaac <aisaac@few.vu.nl>
- Cc: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
- Message-ID: <CA+pXJigzFeLfRmu_tr2+p5FKaYMXAKF6otT=tOe79exdsq+P9g@mail.gmail.com>
+1 to Antoine

Best Regards,
Laufer

On Thursday, 22 January 2015, Antoine Isaac <aisaac@few.vu.nl> wrote:

> Dear Joao Paulo, Carlos,
>
> I agree with your concerns. This has been voiced many times, and the
> 'technology neutral' focus does make it more visible but, I think, doesn't
> change much. The situation is a bit messy in the Linked Data world alone.
>
> I won't discuss data formats now, because that's not the point of 7.4 (it
> may be very useful to have the discussion for other sections though; I just
> don't have the time).
>
> My issue about what Joao Paulo describes as 'schemas' (and what was suggested
> to be called 'data models') is that it misses a part of what is called
> 'vocabularies' in the Linked Data world (and in other communities). Using
> the same point as in an earlier email: do you think that the ISO language
> codes are a schema (or a data model) of their own?
>
> In a previous group I was involved in, on Library Linked Data, we faced a
> similar problem of naming things. We ended up with 'metadata element sets'
> for schemas/ontologies and 'controlled vocabularies' for thesauri, code
> lists etc.
> http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset/
> Note that we then faced the need to be a bit more technology
> neutral: these 'controlled vocabularies' existed long before RDF
> (porting them into RDF was actually why the SKOS 'schema' was created).
>
> Now, we may decide to rule 'vocabularies that don't qualify as data
> models' (like ISO language codes) out of the best practices. I find it a bit
> of a pity, because these are valuable artefacts, as the past decade on Linked
> Data has shown. And our current best practices apply to them, too.
>
> Back to the BP document now. From my past experience, we won't have time
> to fix this in two days. There are much easier and more urgent issues to fix -
> *once* we have noted this vocabulary issue down for future resolution of
> course.
> Also, and because I've seen these discussions before, we probably won't
> find a good solution, i.e. we'll always have to exemplify the term we
> choose, as in "this section is about 'X', which gathers ontologies, schemas,
> relational models, etc".
>
> So what I suggest is to create an issue saying that the section needs
> terminological discussion and input, and maybe go as far as removing the
> 'controlled vocabularies' from the picture. Is that all right?
>
> Best,
>
> Antoine
>
> On 1/22/15 5:01 AM, Joao Paulo Almeida wrote:
>
>> Dear All,
>>
>> I think that we have reached a crucial point in the discussions around
>> the Best Practices document.
>>
>> Many have raised the concern that the term "vocabulary" may be a problem
>> in the document, in part because of its lack of precision and in part
>> because it is biased towards the RDF(S)/OWL(S) technological space.
>>
>> I completely agree with that, and we need to do our best to ensure
>> precision and to be agnostic with respect to the various technological
>> spaces.
>>
>> The problem has also appeared in the discussion surrounding the term
>> "format", which I also believe is problematic if not properly defined and
>> qualified (and also the term "schema" and the other terms used in section
>> 7.4 of the BP document).
>>
>> So, this is a call for the group to settle on some concepts (and
>> ultimately terms) that should help us to structure our discussions, give
>> us a basis to communicate and help our audience to understand us.
>>
>> I offer here a sketchy initial attempt; I'm hoping (fingers crossed) not
>> to incite a terminological debate, but a conceptual one... As long as we
>> agree on the concepts, we can always adjust the terms to make this more
>> intuitive to the majority of the people in our audience.
>>
>> Some of it is inspired by [1] to avoid re-inventing the wheel. (I wanted
>> to, but did not manage to touch upon the "metadata" and "ontology" terms.
>> I also did not manage to link OWL and SKOS into this.) And remember, this is
>> just a starting point.
>>
>> regards,
>> João Paulo
>>
>> ----
>>
>> By "data representation" we mean any convention for the arrangement of
>> symbols in such a way as to enable information to be encoded by a data
>> producer and later decoded by data consumers.
>>
>> A particular convention for data representation is often referred to as a
>> "data format".
>>
>> Adapted from [1]:
>>
>> In existing computer systems there is typically a long chain of
>> relations connecting the physical phenomena by which data are represented
>> with the data being represented. Each link in the chain connects two layers
>> of representation: each layer organizes information available at the next
>> lower level into structures at a higher (or at least different) layer of
>> abstraction, and in this way provides information used in turn by the next
>> higher level in the representation.
>>
>> For example, the representation of an email message may involve the
>> following layers:
>>
>> Physical layer: holes in cards or tape, magnetic charges, color
>> changes on optical disks or scan codes, tones on a telephone connection, or
>> similar phenomena are interpreted as representing sequences of bits.
>>
>> Bit layer: those sequences of bits may be interpreted as
>> representations of other different sequences of bits (for example five bits
>> may be written to the physical medium to represent four bits of data, in
>> such a way as to guarantee a minimum and maximum amount of space between
>> magnetic flux events in the media).
>>
>> Byte / octet layer: the sequences of bits read from the storage
>> device are grouped into octets: units of eight bits often referred to as
>> bytes.
>>
>> Character layer: an octet sequence may be interpreted as a sequence
>> of characters as defined by the appropriate character-set standard.
>>
>> Application-specific data structure layer: the email reader will read
>> the character stream and distinguish the mail header from the message body,
>> and may distinguish multiple alternative representations of the message and
>> attachments within the message body. Within the mail header, mail software
>> will distinguish important fields like date, sender, and addressee.
>>
>>
>> We assume here that this group is concerned with data representation
>> beyond the octet layer, not concerning itself with "data formats" for
>> physical, bit and byte level data representation. Data representation at
>> the lowest level in this context is thus the octet sequence.
>>
>> Different applications will almost always have different
>> application-specific data structures. The variety of applications and uses
>> of the data on the web leads to an unbounded number of data formats.
>>
>> The need to support the definition of suitable data formats for data
>> interchange on the web has led to the development of languages and
>> frameworks for families of formats, examples of which include XML, SGML,
>> JSON and RDF.
>>
>> (Here it is important to note that we should avoid saying that data is
>> represented in XML or RDF - but instead, we should say that data is
>> represented in an XML-based format, or in an RDF-based format. So XML data
>> is data represented in an XML-based format, RDF data is data represented in
>> an RDF-based format.)
>>
>> These languages and frameworks ultimately establish conventions to
>> encode data into sequences of octets. These conventions are often called
>> "serialization formats" or "serialization syntaxes" (e.g., [TURTLE],
>> [RDF11-XML], [JSON-LD]). In addition, these languages often establish a
>> "data model" or "abstract syntax" (e.g., [RDF11-CONCEPTS]) which defines the
>> structure of data independent of a particular serialization format.
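
As a minimal sketch of the distinction drawn above between a data model and its serialization formats, the following Python snippet encodes one abstract record as octets in a JSON-based format and in an XML-based format, then decodes both back. The record, its field names, the `message` element, and the helper functions are all hypothetical, chosen only for illustration; none of them comes from the BP document.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record standing in for the abstract "data model".
record = {"sender": "alice@example.org", "date": "2015-01-22"}


def to_json_octets(data):
    # JSON-based serialization, then characters -> octets via UTF-8.
    return json.dumps(data, sort_keys=True).encode("utf-8")


def from_json_octets(octets):
    # Octets -> characters -> application-level structure.
    return json.loads(octets.decode("utf-8"))


def to_xml_octets(data):
    # XML-based serialization; the "message" element name is an assumption.
    root = ET.Element("message")
    for key, value in data.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="utf-8")


def from_xml_octets(octets):
    # Parse the octets and rebuild the same flat structure.
    root = ET.fromstring(octets)
    return {child.tag: child.text for child in root}


json_octets = to_json_octets(record)
xml_octets = to_xml_octets(record)
```

The two octet sequences differ, yet both decode to the same structure, which is the sense in which a data model is independent of a particular serialization format.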
>>
>> Some of these families of formats are accompanied by languages or
>> (meta-)formats to specify a format, to enable some level of automation for
>> processing data in the format. For example, an XML-based format can be
>> specified with a "schema document" in the XML Schema Definition language,
>> enabling XML documents to be checked for conformance to the format defined
>> in the schema document [XML-SCHEMA]. Likewise, an RDF-based format can be
>> specified using RDF Schema [RDF11-SCHEMA].
>>
>> These "schemas" are often used as a means to anchor natural language
>> descriptions to guide humans in the interpretation of data produced using
>> the format. Often, labels are used in these schemas to convey intuitive
>> meaning and guide interpretation, in which case these labels serve the role
>> of "terms" in communication. The collection of terms as used in the schema
>> is then referred to as a "vocabulary".
>>
>>
>> Some requirements (adapted from [1]):
>>
>> Any data representation relied on for interoperability must have clear,
>> well written, published documentation. If the format is not documented, the
>> likelihood that the information it represents can be recovered without loss
>> is small.
>>
>> The specification documents for data formats should be controlled by
>> public bodies, preferably consensus-based organizations in the
>> international standardization system or by relevant industry consortia.
>>
>>
>> [1] C. M. Sperberg-McQueen, David Dubin. "Data Representation", DH
>> Curation Guide: a community resource guide to data curation in the digital
>> humanities, http://guide.dhcuration.org/representation/.
>>
>> [RDF11-SCHEMA]
>> Dan Brickley, R. V. Guha. RDF Schema 1.1. W3C Recommendation, 25 February
>> 2014. URL: http://www.w3.org/TR/2014/REC-rdf-schema-20140225/. The
>> latest published version is available at http://www.w3.org/TR/rdf-schema/.
>>
>> [RDF11-XML]
>> Fabien Gandon, Guus Schreiber. RDF 1.1 XML Syntax.
>> W3C Recommendation, 25 February 2014. URL:
>> http://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/. The latest
>> published version is available at http://www.w3.org/TR/rdf-syntax-grammar/.
>>
>> [TURTLE]
>> Eric Prud'hommeaux, Gavin Carothers. RDF 1.1 Turtle: Terse RDF Triple
>> Language. W3C Recommendation, 25 February 2014. URL:
>> http://www.w3.org/TR/2014/REC-turtle-20140225/. The latest edition is
>> available at http://www.w3.org/TR/turtle/.
>>
>> [OWL2-OVERVIEW]
>> W3C OWL Working Group. OWL 2 Web Ontology Language Document Overview
>> (Second Edition). W3C Recommendation, 11 December 2012. URL:
>> http://www.w3.org/TR/owl2-overview/.
>>
>> [JSON-LD]
>> Manu Sporny, Gregg Kellogg, Markus Lanthaler, Editors. JSON-LD 1.0.
>> W3C Recommendation, 16 January 2014. URL: http://www.w3.org/TR/json-ld/.
>>
>> [XML-SCHEMA]
>> World Wide Web Consortium. XML Schema Part 0: Primer Second Edition, ed.
>> Priscilla Walmsley and David C. Fallside. W3C Recommendation, 28 October
>> 2004. URL: http://www.w3.org/TR/xmlschema-0/.
>>
>

--
. . . .. . . . . . .. . .. .
Received on Thursday, 22 January 2015 08:54:24 UTC