Big Data needs Enterprise Linked Data

At the PLDN Symposium on 29 September 2015, I presented the main idea of the Enterprise Linked Data initiative. The slightly adapted version of the slides below incorporates Jan Voskuil's suggestion to include Big Data as a beneficiary of Enterprise Linked Data.


So there it is: the main conclusion is that Enterprise Linked Data will provide a far better basis for delivering operational data to BI, analytics and Big Data platforms than current practices based on classic relational schemas. “Better” here means more flexible and more precise about the meaning of the data. This should translate directly into a lower total cost of ownership (TCO) for ETL and analytics infrastructures and a shorter time to market (TTM) for new reports and new (or adapted) data sources.


But let’s go back to the slides.
In the beginning, there was data. And data integration. We are talking mid-’70s now.
Data models were based on Codd's relational model, introduced in 1970. That model was adopted worldwide in “relational” databases (RDB) and in the standard query language SQL. In those days we started using ERD and CDM techniques to integrate data within or between systems.
In the ’90s we were no longer so much concerned with data integration and focussed more on process integration. One of the reasons for this was the broad use of commercial off-the-shelf ERP software, which had its own (invisible and untouchable) data models. Process integration was done with EDI and (web) services, both of course carrying the data needed for a particular process step, but not bothering much about the (internal) representation of whole business objects. The underlying paradigm here is still the RDB model.


Now, in 2015, we’re seeing the spread of the RDF model, facilitating what we could call knowledge integration. The inherent syntactic compatibility of different RDF models and the possibility of including semantic metadata in datasets (using worldwide vocabularies) greatly enhance integration potential, not only at the analytics level but also at the level of operational interoperability. Nowadays data is being shared between otherwise unconnected organisations in unprecedented ways.
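To make the point about syntactic compatibility concrete, here is a minimal sketch in plain Python. It does not use a real RDF library: statements are modelled as (subject, predicate, object) tuples, which is only an approximation of RDF. The two datasets and all example.org identifiers are hypothetical; the predicates are borrowed from the FOAF and W3C Organization vocabularies as examples of shared, worldwide vocabularies.

```python
# Sketch only: RDF-style statements as plain (subject, predicate, object) tuples.
# The datasets and every example.org identifier below are hypothetical.

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"          # FOAF vocabulary
ORG_MEMBER_OF = "http://www.w3.org/ns/org#memberOf"   # W3C Organization ontology

PERSON = "http://example.org/person/42"

# Dataset published by a (hypothetical) CRM system.
crm_data = {
    (PERSON, FOAF_NAME, "Alice Jansen"),
}

# Dataset published by a (hypothetical) HR system.
hr_data = {
    (PERSON, FOAF_NAME, "Alice Jansen"),
    (PERSON, ORG_MEMBER_OF, "http://example.org/dept/finance"),
}


def merge(*graphs):
    """Merge triple sets by set union; no schema mapping step is needed."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect everything known about one subject as predicate -> set of values."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, set()).add(o)
    return description


combined = merge(crm_data, hr_data)
alice = describe(combined, PERSON)
```

Because both sources use the same identifier for the person and the same predicate for the name, the duplicate statement collapses on merge and the department membership simply attaches to the existing subject. With two relational schemas, the same result would first require agreeing on tables, keys and column names.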


RDF facilitates knowledge integration
And developments in this direction continue. Natural Language Processing (NLP) and AI techniques such as inference engines and reasoners will make it possible to analyse structured data and unstructured information, such as plain text and other content, in the same way.


However, in the meantime, the basic transaction processing systems in the back end of organisations and enterprises continue to be based on the “old” RDB model. Data is still being stored in an old-fashioned, proprietary, siloed way, with incompatible data models that necessitate complex, expensive and time-consuming translations for integration.


It is this anachronism that Enterprise Linked Data sets out to resolve. Enterprise Linked Data is about going back to the roots of data integration and introducing the RDF model there. The idea here is to have all interoperable systems equipped with Linked Data interfaces instead of with archaic (web)services, with or without ESB mediations, with or without CDM.


As a direct consequence, a whole new practice of ETL will evolve. The use of Linked Data in this domain will massively reduce the import and integration effort of source data into data warehouses.
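As a rough illustration of what such an ETL practice might look like during a transition, the sketch below lifts rows from two legacy sources into one shared set of triples. It is an assumption-laden example, not a description of the project's tooling: the function `rows_to_triples`, the source tables, the column names and the example.org base URI are all hypothetical, and the predicates are taken from schema.org purely as an example of a shared vocabulary.

```python
# Sketch only: a declarative column-to-predicate mapping per source replaces
# hand-written, source-specific translation code. All names are hypothetical.

SCHEMA_NAME = "http://schema.org/name"
SCHEMA_EMAIL = "http://schema.org/email"
CUSTOMER_BASE = "http://example.org/customer/"


def rows_to_triples(rows, key_column, base_uri, column_to_predicate):
    """Turn relational-style rows (dicts) into (subject, predicate, object) triples."""
    triples = set()
    for row in rows:
        subject = base_uri + str(row[key_column])
        for column, predicate in column_to_predicate.items():
            value = row.get(column)
            if value is not None:  # skip NULLs instead of storing empty values
                triples.add((subject, predicate, value))
    return triples


# Two hypothetical source systems with incompatible column names.
erp_rows = [{"CUST_NO": 7, "CUST_NM": "Acme BV", "MAIL": "info@acme.example"}]
webshop_rows = [{"id": 7, "display_name": "Acme BV", "email": None}]

warehouse = set()
warehouse |= rows_to_triples(
    erp_rows, "CUST_NO", CUSTOMER_BASE,
    {"CUST_NM": SCHEMA_NAME, "MAIL": SCHEMA_EMAIL},
)
warehouse |= rows_to_triples(
    webshop_rows, "id", CUSTOMER_BASE,
    {"display_name": SCHEMA_NAME, "email": SCHEMA_EMAIL},
)
```

Adding a third source in this sketch means adding one more mapping dictionary; the warehouse structure itself does not change. If the source systems exposed Linked Data interfaces directly, as proposed above, even this mapping step would move out of the ETL layer.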


Enterprise Linked Data as basis for economic Big Data infrastructure
And ultimately, the same flexibility and semantic clarity that Linked Data already provides for the publication and integration of information in social and scientific domains will be available for business analytics practices. In the end, Big Data too will only be economically feasible when supported by Enterprise Linked Data.


If this vision and ambition inspire you: in the Enterprise Linked Data project we try to put the ambition to work.


What do we do?
What do we need?
  • Hands-on volunteers
  • Tooling best practices
  • A real business case
