W3C home > Mailing lists > Public > public-lod@w3.org > June 2013

Re: Fwd: Linked Data discussions require better communication

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Fri, 21 Jun 2013 11:38:22 -0400
Message-ID: <51C4736E.5030604@openlinksw.com>
To: public-lod@w3.org
On 6/21/13 11:24 AM, Stephane Fellah wrote:
>
>
>
> Kingsley,
>
> Context reminder: I am trying to clarify the definition of Linked Data 
> and argue for the need of URIs, HTTP and RDF Model to create Linked 
> Data (as defined by TBL). I am trying to illustrate what is Linked 
> Data and what is not Lined Data through the use of examples.

Context alignment: what makes [1] Linked Data, RDF, or RDF based Linked 
Data or something else? This is a very simple example. It reflect the 
fundamental point re. Linked Data and RDF.

Again, what makes my sample document *uniquely* RDF? By that I mean why 
is it certainly RDF but not some other mechanism for structured data 
representation such as EAV/CR? You also had a CSV example in the footer 
of the DBpedia page I referenced.


Links:

1. 
<http://kingsley.idehen.net/DAV/home/kidehen/Public/DropBox/Public/Linked%20Data%20Resources/linked-data-rdf-test2.ttl> 
-- a URI/URL denoting my sample document comprised of structured content 
that is 100% compatible with TimBL's linked data memes, so what makes it 
uniquely RDF?


Kingsley
> I also trying to pinpoint the dangers of drifting away from this 
> definition based on my personal experience.
>
>
> On Thu, Jun 20, 2013 at 3:54 PM, Kingsley Idehen 
> <kidehen@openlinksw.com <mailto:kidehen@openlinksw.com>> wrote:
>
>     On 6/20/13 2:16 PM, Stephane Fellah wrote:
>>     Kingsley,
>>
>>
>>
>>
>>     On Thu, Jun 20, 2013 at 1:28 PM, Kingsley Idehen
>>     <kidehen@openlinksw.com <mailto:kidehen@openlinksw.com>> wrote:
>>
>>         On 6/20/13 12:50 PM, Stephane Fellah wrote:
>>>         Hi,
>>>
>>>         I agree with Luca's viewpoint. The W3C standard RDF model
>>>         (a.k.a triple model) is one of most fundamental piece of the
>>>         technology stack defining Linked Data (along with URIs and
>>>         HTTP).
>>
>>         I am not disputing that point.
>>
>>         Here's what in dispute, and the topic of debate to me: the
>>         misconception that you MUST know anything about RDF en route
>>         to creating and publishing Linked Data. RDF is an optional
>>         implementation detail with a particular outcome in mind i.e.,
>>         the ability for humans and machines to understand the entity
>>         relationship semantics that constitute the Linked Data.
>>
>>
>>     Can you provide some examples to clarify your point here? Do you
>>     consider CSV files as Linked Data ?
>
>     Of course you can produce Linked Data content via a CSV file [1][2].
>
>
> You have not really answered my question. My question is:  Do you 
> consider CSV file as Linked Data ? Your answer should hopefully be NO 
> (it's just data without semantic). I am not asking if CSV can be 
> transformed to Linked Data.  Of course it can ! Any data can be 
> transformed to linked data.  The point I am trying to make is that CSV 
> files are not by nature Linked Data. To perform the conversion to 
> Linked Data, we could use the following steps:
> 1) Define a unique URI for each row (Subject) .
> 2) For each column, define a URI for its semantic (Property).
> 3) For each cell value, define a  literal or  URI (in case of a 
> reference to another resource).
> 4) Make the data accessible through HTTP.
>
> The end results is a set of triples that represent a directed labeled 
> graph (RDF Model). I just demonstrated to you that I use the RDF Model 
> (Directed labeled graph composed of triples with URIs).
>
>
>
>>     Do you consider RDBMS Tables ( using primary keys of the database
>>     as identifiers) as Linked data ?
>
>     Nice that you asked, I can use RDBMS keys to demonstrate different
>     kinds of Linked Data to you, for sure [3][4][5][6].
>
>
> Again, you dodged my question by showing me how you convert RDBMS to 
> Linked Data. Again the steps taken to realize your example are similar 
> to the one I described for CSV. The result is a directed label graph 
> (RDF Model).
> The original RDBMS table are not Linked Data, they are structured data 
> without any semantic. The mapping to URIs and the decomposition of the 
> E-R model to a directed label graph using URIs means that you are 
> using the RDF Model.
>
> Turning RDBMS into XML document without URIs (as for example in 
> Geographic Markup Language) is not Linked Data, as the tags have not 
> well defined semantics (XML is actually semi-structured data) and 
> entities are not decomposed into their simplest form (triples).
>
>
>
>>      Do you consider XML documents using XPointer and XLink as Linked
>>     Data (like in Geographic Markup Language GML) ?
>
>     By now, you should understand that non of these formats have
>     anything to do with RDF.
>
>
> Exactly ! That was my point: CSV, RDBMS and XML documents have nothing 
> to do with RDF and THUS ARE NOT LINKED DATA. They are just "dumb" data 
> (unstructured (documents), semi-structured (XML for example), 
> structured (RDBMS, Images etc) ) without any semantic, making it 
> impossible for machine to process automatically because you have to 
> write code to interpret the semantic of the information they convey.
>
>
>
>>     Do you consider XML documents using local identifier xml:id as
>>     Linked Data ? I personally do not consider them as Linked Data
>>     because they do not adhere to the RDF model (meaning I cannot
>>     harvest them as a set of triples using URIs). If you disagree
>>     with my point, then we should have different terminologies to
>>     distinguish RDF compliant data versus the rest.
>
>     Circa. 2013,  RDF isn't bound to any data serialization format (it
>     never really was).
>
>     RDF isn't bound to any concrete syntax for graphical expression of
>     structured data. It has an abstract syntax that outlines the
>     grammar to be used when representing entity relationships using
>     triples (or 3-tuples).
>
>     The greatest feature of RDF is that it is self-describing,
>     described, and understandable by an RDF processor sucking in RDF's
>     own vocabulary [7].
>
>
>
> You didn't answer my question. You state something that everyone 
> already knows in this mailing list.  The whole point of RDF model is 
> to decompose any piece of data into its simplest and most atomic form 
> (the triple form) and to convey meanings by the use of unique 
> identifiers (URIs). The triple model and directed label graph is not 
> the invention of RDF (as you stated in one of your previous message). 
> Similarly unique identifiers is not the invention of URIs. However it 
> is the model used for Linked Data and the W3C standard that defines it 
> is the RDF model specification. Making the claim that Linked data does 
> not need RDF is just confusing, misleading, and is counter productive 
> for the community.
>
>
>
>>
>>>         I think it is important to make understand the community
>>>         that Linked Data  can be serialized into different
>>>         representations (Turtle, RDF/XML, JSON-LD, N3, NTriples,
>>>         TrigG, and any future formats) , as long as they are
>>>         isomorphic to RDF model (meaning data can be converted to a
>>>         set of triples and identifiers are based on URIs).
>>
>>         I really don't believe that I am disputing this point.
>>         Neither do I believe the point (above) is new to anyone on
>>         this list.
>>
>>>         If the data are NOT convertible to RDF model, I do not
>>>         consider it as Linked Data.
>>
>>         And that assertion is inaccurate. It is also indefensible.
>>         The World Wide Web as it already exists is full of Linked
>>         Data for which RDF processors may or may not exist. It
>>         functions, humans and programs understand the "LinksTo"
>>         relation etc.. That's why it works and scales the way it does.
>>
>>
>>     That is where I differ with you: The World Wide Web as it already
>>     exists is full of "Data", not "Linked Data".
>
>     Well, we just disagree. I don't know what you think HTML
>     represents, or why you feel documents aren't entities worthy of
>     ambiguous denotation or structured-machine-readable description etc.
>
>
> HTML is certainly not Linked Data (contents and hyperlinks don't have 
> explicit semantic) and machine cannot interpret the information that 
> the document conveys without extra additional information (such as 
> RDFa, GRDDL).
>
>
>     In my eyes, the World Wide Web is just medium with evolving
>     resolution. As it evolves the resolution of its constituency (its
>     webby entity relations) simply increases. RDF simply provides a
>     way for us (via RDF processors) to increase the resolution of
>     web-like structured data (which includes the mesh we know as the
>     World Wide Web).
>
>
> I agree with this.
>
>>     To become Linked Data they need to be converted to RDF Model,
>>     meaIning be compliant with triple model and uses URIs and HTTP to
>>     be linkable.
>
>     "RDF Model" doesn't become meaningful by will. You sentence about
>     doesn't mention a single defining characteristic of RDF. Doesn't
>     HTML leverage HTTP and URIs?
>
>
> Please read again my sentence. It defines three characteristics of RDF 
> : the triple model, use of URIs and HTTP.  I am not sure why you 
> asking about HTML ? Do you consider HTML as Linked Data ?
>
>
>>     CSV files, XML with local identifier files, Database tables are
>>     NOT  linked data until they adhere to the Triple Model and uses
>>     URI for identification (thus being compliant with the RDF Model).
>
>     You make Linked Data by making a commitment to the following
>     during the act of creating and publishing web-like structured data:
>
>     1. dereferencable URIs as the denotation mechanism for entities
>     being described
>     2. a data model (basic entity relationship graph *OR* enhanced RDF
>     variant) for structured data representation
>     3. actual document content comprised of statements that represent
>     entity relationships (and if using RDF said relationship semantics
>     become *explicit* rather than *implicit*).
>
>
> I do not agree with point 2. There is only one model for Linked Data: 
> Directed Labeled Graph with use of URIs to denote the meaning of 
> resources and properties. Any other model E-R, Tuples model, binary 
> model (images) should be decomposed into its most atomic form (triple 
> forms) to become Linked data. Failure to do so will prevent 
> interoperability by creating new islands of interoperability based on 
> alternative models (see my anecdote below). I don't know what you mean 
> by "Enhanced RDF Variant" . Directed Labeled Graph is the simplest 
> model that can truly scale.
>
>
>>
>>
>>         Guess what, even though the World Wide Web is dominated by
>>         HTML content, it was bootstrapped on the back of a draconian
>>         mandate that everything MUST be interpretable as HTML.
>>
>>         Ironically, DBpedia most powerful deliverable was the use of
>>         HTML to expose the concept of Linked Data. We stuck RDF/XML
>>         and other formats in the footer pages of said documents.
>>
>>>         To make the system works, you need some set of standards on
>>>         which everyone agree: HTTP, URIs, RDF are fundamental to
>>>         Linked Data.
>>
>>         URIs and web-liked structured data representation are
>>         fundamental to Linked Data.
>>
>>         RDF is fundamental to Blogic.
>>
>>
>>     RDF is fundamental to build the "Global Linked Data Graph"
>>     (Directed Labeled Graph model based on URIs).  Inferencing,
>>     ontologies, SPARQL,  BLogic,  are just value-adds capabilities on
>>     top of Linked Data. You do not need BLogic for Linked Data.
>
>     If you didn't need Blogic, then why bother giving entities
>     unambiguous names. Why bother having such a concept? Why bother
>     with relationship roles like Subject, Predicate, and Object? I
>     mean, we can just rely on the mysterious magic of the literals
>     "RDF" and poof! All is understood, on this Giant Global (entity
>     relationship) Graph of Linked Data, by humans and machines.
>
>
> Unique identifier has nothing to do with blogic.  Unique identifier is 
> used to denote the meaning of something (in case of RDF a concept). 
> Unique identifiers are used in many other systems (telephone numbers, 
> social security number, ISBN numbers) and are fundamental to have a 
> scalable system. It has nothing to do with BLogic. BLogic (or any 
> other form of logics) is used to perform interpretation of the 
> information. It is orthogonal to RDF model.
>
>>
>>
>>>          Saying we do not need RDF model for Linked Data is like
>>>         saying we do not need URL or HTTP for the web of documents.
>>
>>         Again, here is what I am saying: You don't need to know
>>         anything about RDF to create and publish Linked Data. Please
>>         read my words, don't react to them.
>>
>>
>>     Based on my comments, I disagree with you on this point.
>
>     Clearly you do, but at some point, you will realize what I am
>     trying to unveil here. By the way, I wasn't born with a *unique*
>     understanding of these matters, I came to understand data
>     representation, access, integration, and management over many
>     years of learning from others, across many scenarios and projects.
>
>
> I have been working on interoperability issues over the last 15 years 
> (mainly in geospatial domain). I have been advocating the use of RDF 
> for the last 13 years after realizing that it was the only truly 
> scalable model that could solve the data integration problem. It has 
> also been many years of frustration trying to convince that RDF model 
> was the right model for data integration.
>
> To close this discussion, I wanted to share with you an anecdote  to 
> illustrate the risk of fragmentation of the web when introducing 
> alternative models (as David Booth mentioned in his excellent writing 
> at the beginning of this thread) and the importance to stick to our 
> guns with the need of RDF Model for Linked Data.
>
> Back in 2000, I started to be involved in Open Geopatial Consortium. 
> The goal of the consortium was to define a set of standards to enable 
> interoperability and integration of geospatial information and 
> services to fullfill the vision of "Geospatial Web". Geographic Markup 
> Language (GML) was proposed by one of the member of the consortium. 
> GML 1.0 was based on RDF.  It was a brillant idea and I gave my full 
> support to the effort. Unfortunately, 1 year later, GML 2.0 switched 
> from RDF to XML schema (XML and XML schema were the buzzwords at this 
> time). GML 2.0 took the RDF model and duplicated it using XML Schema. 
> They used different terms (Feature for Resource and Feature Property 
> for RDF Property). Everyone was amazed by the expressiveness of the 
> model and they started to describe every geospatial domain in GML (the 
> last spec of GML 3.0 has more than 600 pages now). The reason of the 
> switch invoked by the author of GML  was because RDFS was not 
> expressive enough to convey restrictions on data. I was pointing him 
> out to DAML+OIL effort (which was still in its infancy at the time), 
> but because it's lack of maturity, the consortium decided to use XML 
> Schema. I spent many years trying to convince people that was a 
> mistake and to go back to RDF model (which enforce the use of URIs) 
> and showing how we could express GML semantics with OWL.  I hit a 
> wall. GML became overly complicated  overtime (use of substitution, 
> schema profiles, lack of tools). GML just focus on structure of the 
> data, not on their semantic. GML was not machine interpretable.  I 
> just got a hard time to make them understand that human readable tags 
> do not have semantic.  Implementing each profile of GML has become a 
> sisyphean coding work to encode the semantic of each new GML profile 
> (CityML, SensorML, etc...)
>
> The end results of that is that OGC has created its own island of 
> interoperabilty and cannot be integrated easily with the Linked Data 
> without performing some mapping to URIs.  13 years of effort of 
> modeling has been captured in XML schema focusing mainly on 
> structured, syntax and validation. The formal semantic of all these 
> models is buried in a 600 pages documents and produced brittle systems 
> due the misinterpretation of specification by coders. Large investment 
> been done by many companies to implement GML, but the dream of 
> realizing the geospatial web is far to be fullfilled and all the 
> semantic still remain to be encoded. The irony of the story is that 
> OGC has produced GeoSPARQL but data are encoded in GML. I am glad to 
> see that after all these years, Linked Data is starting to get at last 
> some traction.
>
> The morale of this anecdote is that we have to be very careful not to 
> confuse the community and break apart with some alternative 'fancy' 
> solutions or definitions that are not well thought. Other the last 13 
> years I have been a strong believer of the Semantic Web and times has 
> proved again and again that it is the best solution to solve 
> integration problems we have today. I urge you to keep the original 
> definition of Linked Data, as defined by TBL, which mention the need 
> of RDF model and not trying to come out with other interpretations 
> that open the door for fracturing the vision of the Semantic Web,
>
>
>
> Best regards
> Stephane Fellah
>
>
>
>     In my world, every day is a new opportunity to discover and learn
>     something new. I am only afraid of the day when that doesn't happen!
>
>     Links:
>
>     1.
>     http://lists.w3.org/Archives/Public/public-lod/2013Jun/0083.html
>     -- post that started this thread (note: it includes links to a CSV
>     Browser)
>
>     2. http://bit.ly/18axeTP -- CSV Browser link that handles
>     SPARQL-FED query results returned in CSV format
>
>     3. http://bit.ly/18pGTFd -- green links demonstrating Linked Data
>     in a SQL RDBMS silo (a silo because the URNs derived from the DBMS
>     keys only resolve to relational tables based entity descriptions,
>     locally i.e., I can't copy and paste the URIs to an application
>     outside the DBMS e.g. a Web Browser)
>
>     4. http://bit.ly/11Brjz7 -- a Relation based on an relational
>     table remapped to an entity relationship model (e.g., EAV) this is
>     deliberately presented as quad so that the sources Tables aid
>     understand of the context flip
>
>     5. http://bit.ly/13fnIbr -- introducing blue links, HTTP URIs
>     replacing those DBMS specific URNs with local scope i.e.,
>     Web-scale super keys that resolve to descriptions from anywhere
>     via copy and past
>
>     6.
>     http://demo.openlinksw.com/OracleHR/employees/EMPLOYEE_ID/101#this
>     -- example of a Linked Data URI that you can click on en route to
>     seeing HTTP URI de-silo-fication in action combined with Linked
>     Data (RDF magic comes later when I seek to merge disparate data
>     across heterogeneous data sources)
>
>     7. http://bit.ly/147HINl -- RDF described in RDF and presented
>     using a Linked Data Browser page
>
>     8. http://dbpedia.org/resource/Linked_data -- go to the page
>     footer to see the variety of support formats (btw -- RDF appears
>     to be missing from the abstract, at this point in time)
>
>     9. http://bit.ly/15ZxzHo -- Vapor (Linked Data principles
>     conformance verifier) report for the DBpedia URI above (also
>     demonstrating the role formats play in this realm distinct from
>     abstract syntax) .
>
>     Kingsley
>>
>>         Kingsley
>>>
>>>         Sincerely
>>>         Stephane Fellah
>>>
>>>
>>
>>     Stephane
>>
>>>
>>>
>>>
>>>         On Thu, Jun 20, 2013 at 11:45 AM, Luca Matteis
>>>         <lmatteis@gmail.com <mailto:lmatteis@gmail.com>> wrote:
>>>
>>>
>>>             On Thu, Jun 20, 2013 at 5:02 PM, Melvin Carvalho
>>>             <melvincarvalho@gmail.com
>>>             <mailto:melvincarvalho@gmail.com>> wrote:
>>>
>>>               # Restate/reflect ideas that in other posts that are
>>>                 troubling/puzzling and ask for confirmation or
>>>                 clarification.
>>>
>>>
>>>             I am simply confused with the idea brought forward by
>>>             Kingsley that RDF is *not* part of the definition of
>>>             Linked Data. The evidence shows the contrary: the top
>>>             sites that define Linked Data, such as Wikipedia,
>>>             Linkeddata.org and Tim-BL's meme specifically mention
>>>             RDF, for example:
>>>
>>>             "It builds upon standard Web technologies such as HTTP,
>>>             RDF and URIs" - http://en.wikipedia.org/wiki/Linked_data
>>>             "connecting pieces of data, information, and knowledge
>>>             on the Semantic Web using URIs and RDF." -
>>>             http://linkeddata.org/
>>>
>>>             This is *the only thing* that I'm discussing here.
>>>             Nothing else. The current *definition* of Linked Data.
>>>
>>>               # Restate the actual subject and focus of the
>>>                 discussion; the subject line just doesn’t always cut it.
>>>
>>>
>>>             Again the subject line is the *definition* of the term
>>>             Linked Data. More specifically whether it includes (or
>>>             should include) RDF.
>>>
>>>               # Do more explication with the awareness that we might
>>>                 be talking about two (or more!) related but separate
>>>                 ideas/concepts. Or we could be using the same terms
>>>                 but with slightly different definitions.
>>>
>>>
>>>             I want to concentrate on the current definition of the
>>>             Linked Data term. Why do the main sites built from the
>>>             Linked Data community *strictly* describe RDF as one of
>>>             the main technologies that enable Linked Data?
>>>
>>>               # Define the terms inline rather than just linking
>>>                 out. One’s interpretation of an external standard or
>>>                 specification could be different from someone
>>>                 else’s, so I think it would be good to own it.
>>>
>>>
>>>             I simply think RDF is part of Linked Data's definition,
>>>             because of the evidence I have shown above. If this is
>>>             not the case, we should discuss it as a community. If we
>>>             decide that RDF is *not* part of the definition of
>>>             Linked Data, we should probably remove it from all the
>>>             top sites, otherwise it will create confusion for
>>>             newcomers.
>>>
>>>             Also we should make new Linked Data coffee mugs ;-)
>>>
>>>             Luca
>>>
>>>
>>
>>
>>         -- 
>>
>>         Regards,
>>
>>         Kingsley Idehen	
>>         Founder & CEO
>>         OpenLink Software
>>         Company Web:http://www.openlinksw.com
>>         Personal Weblog:http://www.openlinksw.com/blog/~kidehen  <http://www.openlinksw.com/blog/%7Ekidehen>
>>         Twitter/Identi.ca handle: @kidehen
>>         Google+ Profile:https://plus.google.com/112399767740508618350/about
>>         LinkedIn Profile:http://www.linkedin.com/in/kidehen
>>
>>
>>
>>
>>
>
>
>     -- 
>
>     Regards,
>
>     Kingsley Idehen	
>     Founder & CEO
>     OpenLink Software
>     Company Web:http://www.openlinksw.com
>     Personal Weblog:http://www.openlinksw.com/blog/~kidehen  <http://www.openlinksw.com/blog/%7Ekidehen>
>     Twitter/Identi.ca handle: @kidehen
>     Google+ Profile:https://plus.google.com/112399767740508618350/about
>     LinkedIn Profile:http://www.linkedin.com/in/kidehen
>
>
>
>
>
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen







Received on Friday, 21 June 2013 15:38:48 UTC

This archive was generated by hypermail 2.3.1 : Friday, 21 June 2013 15:38:49 UTC