A new job leaves less time for blogging. But let me share the ideas behind the ELD project currently running at the Platform Linked Data Nederland.
Linked Data and Enterprise Environments
Linked Data (LD) is about searching, sharing and linking information from various sources and different contexts. Data originating from organisations all over the world can easily be shared over the internet with LD technology. On a smaller scale, LD can be used the same way within one and the same organisation, typically in analytics and knowledge processes.
However, the majority of enterprise business processes are administrative in nature. To support data quality, adequate staff proficiency (training, work instructions) and regulatory compliance, administrative environments rely heavily on fixed, predefined data structures. The typical application of LD technology, with flexible relations between data elements, doesn’t seem to fit well in these environments.
Nevertheless, we see added value using LD technology to support information sharing and integration within administrative enterprise business processes. We will call this Enterprise Linked Data (ELD).
While established best practices in this field, such as web services, ESBs and ETL, cover all current business needs, we think LD techniques can provide the following added value:
- Enhanced data quality of end-to-end processes (especially in consolidation and M&A situations)
- Greater flexibility of the sharing/integration infrastructure (meaning lower TCO and shorter TTM)
As mentioned above, the context of enterprise business processes differs from the setting of usual LD applications. The figure below shows our model of the ELD context.
The aim of our ELD project is to provide a proof of concept (PoC) of this model using LD techniques, taking into account the following typical enterprise requirements.
- Frequent data updates (Δ data).
- Data availability within Host 2 must be independent of availability of Host 1 and the Network.
- Data presentation (model′) has a predefined structure and is fixed (or: has its own life cycle).
On the other hand we require the implementation to provide some typical LD advantages.
- Data semantics can be defined by worldwide standards or by self-descriptive data, and can thus be supported independently of possibly unavailable, incomplete, outdated or inconsistent documentation.
- Channel is transparent for model changes (Δ model), i.e. no channel reconfiguration or deployment efforts are needed.
- Integration of additional data sources (see focus area 4) is possible without channel reconfiguration or deployment efforts.
For the sake of simplicity our model is unidirectional: certain business logic supporting some business process provides a stream of (transactional) data updates in triple format (A-box statements). This stream is communicated to another hosting environment where the data is read by a user. (Of course, this user might also generate data updates, which would then be passed in the opposite direction. We expect such an extension of our model to be straightforward, and will not discuss it further here.)
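This unidirectional flow can be sketched in a few lines of plain Python, with tuples standing in for RDF triples. The namespace, predicates and transaction data below are illustrative assumptions, not part of the project itself.

```python
# Sketch of the unidirectional ELD model: business logic on Host 1 emits
# transactional data updates as a stream of A-box statements (triples),
# and Host 2 accumulates the replicated stream in its own store.
# All names (namespace, predicates, identifiers) are illustrative assumptions.

EX = "http://example.org/erp/"  # hypothetical enterprise namespace

def business_logic_updates():
    """Yield A-box statements for two hypothetical business transactions."""
    yield (EX + "order/1001", EX + "hasCustomer", EX + "customer/42")
    yield (EX + "order/1001", EX + "orderAmount", "250.00")
    yield (EX + "order/1002", EX + "hasCustomer", EX + "customer/43")

# Host 2 simply adds each received statement to its local triple store.
triple_store_2 = set()
for triple in business_logic_updates():
    triple_store_2.add(triple)

print(len(triple_store_2))  # 3 statements received
```

Note that nothing in the channel depends on the *meaning* of the predicates: new kinds of statements travel through unchanged, which is exactly the transparency for Δ model that we require.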
Note that by the first two requirements above, implementing our ELD model will essentially have the character of – yes – a replication exercise.
Replication of LD, while not very common in typical LD environments, could be expected to greatly enhance the flexibility of enterprise information sharing and integration infrastructures. However, the subsequent transformation of this information to predefined models in the business domain (often using XML schema definitions to validate incoming data) might just move change efforts (Δ model) from the interface domain to the business logic domain.
Therefore we include in our model the transformation T of the received information into the presentation of this information to a business user – who expects information elements to be predefined and clearly identified, having known labels, identifiers and cardinality (are there zero, one or more of these items in this context?). This transformation is realised by the UI business logic’s query on the received data, in combination with “structural” information (T-box statements) that is added to the received dataset.
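A minimal sketch of transformation T, again in plain Python with tuples standing in for RDF: the received A-box data is rendered into a fixed form using labels and a cardinality annotation taken from T-box statements added on the receiving side. The vocabulary and the `max_card` annotation are assumptions for illustration only.

```python
# Sketch of transformation T: a UI query combines received A-box data with
# added "structural" T-box information (labels, cardinality) to produce a
# predefined presentation. All names are illustrative assumptions.

EX = "http://example.org/erp/"

# Received A-box statements (replicated data).
abox = [
    (EX + "order/1001", EX + "hasCustomer", EX + "customer/42"),
    (EX + "order/1001", EX + "orderAmount", "250.00"),
]

# T-box information added on the receiving side: a known label and an
# assumed cardinality bound per predicate (how many values may occur).
tbox = {
    EX + "hasCustomer": {"label": "Customer", "max_card": 1},
    EX + "orderAmount": {"label": "Amount", "max_card": 1},
}

def present(subject):
    """Render a subject as a fixed form (label -> values), checking cardinality."""
    form = {}
    for s, p, o in abox:
        if s != subject or p not in tbox:
            continue  # unknown predicates are simply not presented
        form.setdefault(tbox[p]["label"], []).append(o)
    for p, info in tbox.items():
        values = form.get(info["label"], [])
        assert len(values) <= info["max_card"], f"cardinality violated for {p}"
    return form

print(present(EX + "order/1001"))
```

The point of the sketch: the channel and stores stay generic, while the fixed structure the business user expects lives entirely in the T-box plus the UI query.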
We distinguish four focus areas.
1. Registering business transactions in an LD environment
– time dimension
– transaction management
2. LD “replication” channel
– push or pull replication: an event detector at the source or a polling query at the receiver?
– guaranteed delivery
– resynchronisation on request
3. Predefined UI structures
– LD transformation to predefined reports/forms
– as an alternative to predefined reports/forms: flexible content windows (with specific labels etc.)
– update forms (read/write)
4. Integration of multiple channels
– semantic integration
– master data management
In the PoC of the ELD project we focus on areas 2 and 3, leaving areas 1 and 4 for the moment to mainstream LD discussions.
Talking deliverables: in the ELD project we plan to provide
- a working prototype of an LD “replication” channel
- a vision on requirements and possible solutions for transforming flexible LD ontologies to predefined data structures like forms and reports
LD “replication” channel
For the prototype we’ll work on the following components.
- Triple store 1 in hosting environment 1
- Some front end to enter data in triple store 1 (A-box and T-box statements)
- Some way to simulate frequent data updates (say new A-box statements every X seconds)
- Triple store 2 in hosting environment 2
- Some front end to query the data in triple store 2
- A network channel between triple store 1 and triple store 2, using either
– some “event detector” on triple store 1, able to detect new triples, perform some filtering on them, and push them through the channel to triple store 2, or
– some “polling query” that periodically queries triple store 1 to find out which new triples have been added since the last poll. The results are then added to triple store 2.
- A “synchronisation service” that is provided by triple store 1 and can be invoked by triple store 2. This service will send the whole content of triple store 1 (after some filtering) to triple store 2.
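The polling variant of the channel, together with the synchronisation service, can be sketched in plain Python. The classes below stand in for the two triple stores and their SPARQL endpoints; the sequence-number bookkeeping is one possible way to answer “what is new since my last poll”, and all names are illustrative assumptions.

```python
# Sketch of the "polling query" replication channel plus a resynchronisation
# service. Triples get an insertion sequence number at the source, so the
# receiver can ask for everything added after its last poll.
# Plain Python stands in for the triple stores; all names are assumptions.

class TripleStore1:
    """Source store: keeps an insertion sequence number per triple."""
    def __init__(self):
        self.triples = []          # list of (seq, triple)
        self.seq = 0

    def add(self, triple):
        self.seq += 1
        self.triples.append((self.seq, triple))

    def poll(self, since_seq):
        """Polling query: return triples added after since_seq."""
        return [(n, t) for n, t in self.triples if n > since_seq]

    def full_sync(self, keep=lambda t: True):
        """Synchronisation service: whole (filtered) content on request."""
        return [(n, t) for n, t in self.triples if keep(t)]

class TripleStore2:
    """Receiver: pulls periodically and remembers the last seen sequence."""
    def __init__(self, source):
        self.source = source
        self.triples = set()
        self.last_seq = 0

    def pull(self):
        for n, t in self.source.poll(self.last_seq):
            self.triples.add(t)
            self.last_seq = max(self.last_seq, n)

    def resync(self):
        """Rebuild the local store from the source's synchronisation service."""
        self.triples.clear()
        for n, t in self.source.full_sync():
            self.triples.add(t)
            self.last_seq = max(self.last_seq, n)

store1 = TripleStore1()
store2 = TripleStore2(store1)
store1.add(("ex:order/1001", "ex:hasCustomer", "ex:customer/42"))
store2.pull()   # first poll picks up one new triple
store1.add(("ex:order/1002", "ex:hasCustomer", "ex:customer/43"))
store2.pull()   # second poll picks up only the delta
print(len(store2.triples))  # 2
```

Note that this sketch satisfies the independence requirement above: triple store 2 keeps serving its data even when the source or the network is down, and `resync` covers recovery after a prolonged outage.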
Transforming flexible LD ontologies to predefined data structures
We want to provide a vision supporting the hypothesis that changes to the business logic (and data model) on the sender side can be assimilated by merely adding adequate T-box statements and/or updating the UI queries on the receiving side. More specifically: can we demonstrate the plausibility of the following scenario?
Suppose a change were applied to our business logic. This could be a small change, such as updating the label of a single attribute, or a major business process change, including the replacement of supporting IT systems. In any case, in our model we’d expect this change to result in the business logic providing different A-box statements (Δ data) than before, and (by some external intervention) some additional T-box statements (Δ model) being added on the sender side.
By our “replication” channel, both new data and new model are pushed to the receiving end, without changes to either channel or triple stores. On the receiving end, only the transformation to the predefined data structure of the business layer/UI has to be adapted. This can be done by adding adequate T-box statements and/or changing the UI queries.
And although we assume this will take human intervention, both updating the UI query and generating the necessary T-box statements might be end-user activities with no IT involvement needed.
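As a minimal sketch of this scenario, assuming an rdfs:subPropertyOf-style mapping and hypothetical predicate names: the sender starts emitting a new predicate, and on the receiving side a single added T-box statement lets the existing UI query pick up both old and new data, with no change to channel or stores.

```python
# Sketch of assimilating a sender-side model change (Δ model) on the
# receiving side by adding one T-box statement. After the business change,
# the sender emits "clientRef" instead of "hasCustomer"; the UI query
# follows a subPropertyOf mapping from the T-box (a minimal inference step).
# All names are illustrative assumptions.

EX = "http://example.org/erp/"
SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

# A-box data replicated through the channel, before and after the change.
abox = [
    (EX + "order/1001", EX + "hasCustomer", EX + "customer/42"),  # before
    (EX + "order/2001", EX + "clientRef", EX + "customer/43"),    # after
]

# Δ model assimilated by one added T-box statement on the receiving side.
tbox = [
    (EX + "clientRef", SUBPROP, EX + "hasCustomer"),
]

def customers_of_orders():
    """UI query: (order, customer) pairs via hasCustomer, including any
    predicate the T-box declares a sub-property of it."""
    preds = {EX + "hasCustomer"}
    preds |= {p for p, rel, parent in tbox
              if rel == SUBPROP and parent == EX + "hasCustomer"}
    return sorted((s, o) for s, p, o in abox if p in preds)

print(customers_of_orders())  # both the old and the new order are found
```

In a real deployment the same effect could come from a reasoner applying the rdfs:subPropertyOf entailment, rather than the hand-rolled lookup used here for brevity.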
At Platform Linked Data Nederland we work on the above model and the targeted prototype in sparse spare time. We’d welcome your help. Please don’t hesitate to share your suggestions for improving the model and/or the project setup. If you have any ideas or suggestions as to
- the implementation of an LD replication channel, or
- a transformation of LD graphs to predefined data structures
(like links to patterns, architectures, techniques, methods, products or other projects), contact us.