Data Integration and Access Software

Data integration and access software brings together data sets for use by other software or for presentation to end users, and lets applications access databases without requiring a direct API connection. The purpose of data integration is to keep information consistent where the contents of two or more discrete systems logically overlap. To achieve a total solution, data integration software employs a wide range of technologies, including, but not limited to, data profiling; data quality; extract, transform, and load (ETL); semantic mediation; and associated metadata management. Data access is enabled by data connectivity software, which includes data connectors, connectivity drivers, and federated data access software.

Data integration software may be used in a wide variety of functions. The most common is data warehousing, but other uses include enterprise information integration, data replication, data movement, and data synchronization. Data integration may be deployed and executed as batch processes, which is typical for data warehouses, or in near-real-time modes for data synchronization or dedicated operational data stores. More challenging applications involve integrating data from disparate, distributed data sources, including flat files, relational databases, XML files, legacy applications, and the proprietary data sources associated with packaged applications from vendors such as SAP, Oracle, and Siebel.

The data integration and access software market includes the submarkets discussed in the following sections.

ETL and Database Synchronization Software

ETL software selectively draws data from source databases, transforms it into a common format, merges it according to rules governing possible collisions, and loads it into a target. This software normally runs in batch, but may also be invoked dynamically, in what vendors refer to as "real time" functionality. Such software actively moves data among correspondent databases, driven by metadata that defines the interrelationships among the data managed by those databases. The software performs transformations, routes the data to the target, and inserts it. It normally either features a run-time environment or operates by generating the program code that performs the extraction, transformation, and routing of the data and the updating of the target. The following are representative vendors and products in this submarket:

• Computer Associates Advantage Data Transformer
• ETI Solution
• Group 1 DataFlow
• IBM (WebSphere DataStage)
• Information Builders WebFOCUS ETL Manager
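
The extract, transform, merge, and load steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not how any of the listed products work: the two source databases (`crm`, `billing`), their table and column names, and the "most recently updated record wins" collision rule are all hypothetical, and in-memory SQLite databases stand in for real sources and targets.

```python
import sqlite3


def extract(conn, query):
    """Selectively draw rows from a source database."""
    return conn.execute(query).fetchall()


def transform(rows, source_name):
    """Map source rows of (id, email, timestamp) into a common record format."""
    return [
        {
            "customer_id": int(cid),
            "email": email.strip().lower(),
            "updated_at": ts,
            "source": source_name,
        }
        for cid, email, ts in rows
    ]


def merge(records):
    """Resolve collisions on customer_id: the most recently updated record wins."""
    merged = {}
    for rec in records:
        current = merged.get(rec["customer_id"])
        # ISO-8601 date strings compare correctly as plain strings.
        if current is None or rec["updated_at"] > current["updated_at"]:
            merged[rec["customer_id"]] = rec
    return sorted(merged.values(), key=lambda r: r["customer_id"])


def load(conn, records):
    """Insert the merged records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "customer_id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT, source TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer "
        "VALUES (:customer_id, :email, :updated_at, :source)",
        records,
    )
    conn.commit()


def run_etl():
    """Build two hypothetical sources, run the pipeline, and return the target rows."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE contacts (id INTEGER, email TEXT, modified TEXT)")
    crm.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?)",
        [(1, " Ann@Example.com ", "2006-01-10"), (2, "bob@example.com", "2006-02-01")],
    )

    billing = sqlite3.connect(":memory:")
    billing.execute("CREATE TABLE accounts (acct_no TEXT, mail TEXT, last_change TEXT)")
    billing.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [("1", "ann@new.example.com", "2006-03-05"), ("3", "cy@example.com", "2006-01-20")],
    )

    records = transform(
        extract(crm, "SELECT id, email, modified FROM contacts"), "crm"
    ) + transform(
        extract(billing, "SELECT acct_no, mail, last_change FROM accounts"), "billing"
    )

    target = sqlite3.connect(":memory:")
    load(target, merge(records))
    return target.execute(
        "SELECT customer_id, email, source FROM customer ORDER BY customer_id"
    ).fetchall()
```

In this sketch, customer 1 appears in both sources, so the merge rule keeps the more recent billing record. A batch deployment would run `run_etl()` on a schedule, while a dynamically invoked deployment would call the same steps per change event.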

Data Quality, Profiling, and Cleansing Software

This submarket includes products used to identify errors or inconsistencies in data, normalize data formats, infer update rules from changes in data, match data entries with known values, and perform other activities that either ensure the validity and consistency of data or uncover schematic details of data that are not recorded in the database catalog. Such activities are normally associated with data integration tasks such as data merges and federated joins, but these products may also be used to monitor the quality of data in the database. The following are representative vendors and products in this submarket:

• Axio and Athanor from Similarity Systems
• FirstLogic
• Quick Address
• Trillium
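
Three of the activities named above (normalizing formats, matching entries with known values, and identifying errors) can be illustrated with a small Python sketch. The reference list `KNOWN_STATES`, the 10-digit phone format, the similarity cutoff of 0.8, and the record layout are assumptions made for illustration. Commercial products use far richer rules and reference data.

```python
import re
from difflib import get_close_matches

# Hypothetical reference values that entries are matched against.
KNOWN_STATES = ["Massachusetts", "New York", "California"]


def normalize_phone(raw):
    """Normalize a North American phone number to NNN.NNN.NNNN, or None if invalid."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}.{digits[3:6]}.{digits[6:]}"


def match_known(value, known=KNOWN_STATES, cutoff=0.8):
    """Return the closest known value, tolerating small misspellings, or None."""
    hits = get_close_matches(value.strip().title(), known, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def profile(records):
    """Return (row index, field name) pairs for values that fail validation."""
    issues = []
    for i, rec in enumerate(records):
        if normalize_phone(rec["phone"]) is None:
            issues.append((i, "phone"))
        if match_known(rec["state"]) is None:
            issues.append((i, "state"))
    return issues
```

Run against a table before a merge, `profile` gives a list of suspect cells. Run periodically against a live table, the same function serves the monitoring role mentioned above.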

Data Connectivity Software

This software is used to establish connections between users or applications and databases without requiring an API or hard-coded database interface. It includes ODBC and JDBC drivers and database adapters. The following are representative vendors and products in this submarket:

• DataDirect subsidiary of Progress Software
• Easysoft ODBC drivers and gateways
• IDS JDBC drivers
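
The idea of reaching a database through a driver rather than a hard-coded interface can be sketched in Python. ODBC and JDBC are not shown directly here. Instead, Python's DB-API 2.0 convention (PEP 249) is used as a stand-in, since it plays a comparable role: application code is written once against a common interface, and the driver is chosen by name at run time. The helper names `open_connection` and `fetch_all` are hypothetical, and the standard-library `sqlite3` driver is used only because it requires no installation.

```python
import importlib


def open_connection(driver_name, *args, **kwargs):
    """Load a DB-API 2.0 driver module by name and open a connection with it."""
    driver = importlib.import_module(driver_name)
    return driver.connect(*args, **kwargs)


def fetch_all(conn, sql):
    """Run a query through the driver-neutral cursor interface."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


if __name__ == "__main__":
    # The driver name would normally come from configuration, not from code.
    conn = open_connection("sqlite3", ":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE product (id INTEGER, name TEXT)")
    cur.execute("INSERT INTO product VALUES (1, 'widget')")
    conn.commit()
    print(fetch_all(conn, "SELECT id, name FROM product"))
```

Switching to another database would, under these assumptions, mean changing only the driver name and connection arguments. Parameter placeholder styles differ between drivers, so the sketch avoids parameterized queries in the shared helper.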

Federated and Virtual Database Software

Federated database software permits access to multiple databases as if they were one database. Most such products are read-only, but some provide update capabilities. Virtual database products are similar, but offer full schema management coordinated with the source database schemas to create a complete database environment that sits atop multiple physical databases. The following are representative vendors and products in this submarket:

• Composite Information Server from Composite Software
• GemStone GemFire
• MetaMatrix Server
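
Accessing multiple databases as if they were one can be illustrated, in a very reduced form, with SQLite's `ATTACH DATABASE` statement, which lets a single connection join tables that live in separate databases. This is only an analogy under stated assumptions: real federated products span different database engines and locations, while this sketch uses two in-memory SQLite databases with hypothetical names (`crm`, `billing`) and tables.

```python
import sqlite3


def build_federation():
    """Open one hub connection and attach two separate databases to it."""
    hub = sqlite3.connect(":memory:")
    # Each ATTACH creates an independent database; in practice these
    # would be file paths pointing at existing databases.
    hub.execute("ATTACH DATABASE ':memory:' AS crm")
    hub.execute("ATTACH DATABASE ':memory:' AS billing")

    hub.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    hub.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    hub.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
    )
    hub.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.5)],
    )
    hub.commit()
    return hub


def total_billed(hub):
    """Join across both databases in a single query, as if they were one."""
    return hub.execute(
        "SELECT c.name, SUM(i.amount) "
        "FROM crm.customers AS c "
        "JOIN billing.invoices AS i ON i.customer_id = c.id "
        "GROUP BY c.id, c.name "
        "ORDER BY c.name"
    ).fetchall()
```

Calling `total_billed(build_federation())` returns one row per customer with the summed invoice amounts, even though the customer and invoice tables sit in different databases.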

Data Integration Suites

These products blend functionality from several of the submarkets listed above and are offered as singly priced products or product suites. They generally include data quality, data movement, and transformation (a subset of ETL), and some federated data access capability, along with design and management tools. The following are representative vendors and products in this submarket:

• IBM WebSphere Information Integrator
• Informatica Data Integration Platform family
• SAS DataFlux Integration Server

Metadata Definition and Management Software

Products in this submarket are specifically designed to model, capture, and maintain IT metadata that is associated with application development or deployment. Standalone products in this category have, as their main feature, the definition or management of metadata, which may include graphical modeling capabilities, categorization and ontological relationship mapping, search, metadata extraction from database catalogs or other such sources, and code generation capability. This technology includes metadata registry and repository software. The following are representative vendors and products in this submarket:

• Fujitsu/Software AG (CentraSite)
• IBM/Ascential (MetaStage)
• Informatica (SuperGlue)
• MetaMatrix (MetaBase)
• Pantero
• Systinet/Mercury (Business Services Registry)
• Unicorn (Unicorn System)
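
One capability mentioned above, metadata extraction from database catalogs, can be sketched against SQLite, whose catalog is exposed through the `sqlite_master` table and the `PRAGMA table_info` statement. The function name `extract_catalog` and the shape of the returned dictionary are assumptions chosen for illustration. A repository product would persist and cross-reference this information rather than just return it.

```python
import sqlite3


def extract_catalog(conn):
    """Read table and column metadata from the SQLite catalog.

    Returns {table_name: [{"name", "type", "nullable", "primary_key"}, ...]}.
    """
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    catalog = {}
    for table in tables:
        # table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [
            {
                "name": col[1],
                "type": col[2],
                "nullable": not col[3],
                "primary_key": bool(col[5]),
            }
            for col in columns
        ]
    return catalog
```

The resulting dictionary could feed the other capabilities listed above, such as search or code generation, although those steps are outside the scope of this sketch.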

 
Copyright 2006 IDC - Global Headquarters: 5 Speen Street Framingham, MA 01701 USA - P. 508.872.6200 - F. 508.935.4015 - www.idc.com