Simply defined, a data warehouse is a collection of data designed to support management
decision making. Data warehouses contain a wide variety of data that present a coherent
picture of business conditions at a single point in time. Typically, a data warehouse is
housed on an enterprise mainframe server. It is a central repository for all or
significant parts of the data that an enterprise's numerous business systems collect.
Data warehousing describes the process of defining, populating, and using a data
warehouse. Data warehousing emphasizes the capture of data from diverse sources for useful
analysis and access. Data from various transaction processing applications and other
sources is selectively extracted and organized on the data warehouse database for use by
analytical applications and user queries.
The operational systems of an enterprise (such as accounting, personnel, and payroll)
generate voluminous amounts of data. The high quality of the data is most important;
therefore, data undergoes extraction, cleansing, and transformation based on business
rules about its suitability. Then, this high quality data is loaded onto the data
warehouse database. When queried appropriately, this database generates business
intelligence -- the real power of data warehousing. Such business intelligence permits an
enterprise to discover what is going on and what is needed to improve. Learning to
systemize business processes that are effective means a better bottom line and
profitability. Data warehousing is a journey, not a destination because it initially must
be developed and then maintained as a system over time. This involves a continual process
of data retrieval, analyses, and discovery.
Data warehousing is an ongoing process, not something that is built once and left to
run. Some best practices for data warehousing consist of the following:
Build organizational commitment while managing user expectations. This
translates into selecting the right sponsor from the business or IT side, securing
funding, and having an overall project driver (usually a technical person) responsible for
results. Data warehousing is about servicing users. Users need to understand the
functionality and performance capabilities for it. They need to realize that a prototype,
not a finished product is the initial deliverable with improved releases to follow.
Think big, but deliver incrementally and often. Balance business benefits and
technical risks to reduce financial risk. Data warehousing is a process that can initially
take from 6-to-9 months for its first release. Components of the first release are: data
warehouse strategy and vision, release plan, technical architecture, requirements
analysis, design, prototype, development, deployment, and benefit assessment. The second
revision can take another 4-to-6 months and reiteratively improves upon requirements
analysis, design, prototype, development, deployment, and benefits assessment. Successive
releases executed in the same manner allow for a smooth growth in the integration of data
and subsequent incremental benefits enjoyed.
Avoid "data dumping" while modeling the business and preventing
analysis paralysis. Business intelligence is only as good as the data it is predicated
upon so ample attention needs to go into data cleansing and transformation. Bad data in
means bad data out. A dynamic business model should consider the following:
Business Goals/Objectives -- Where does the organization want or need
to be?
Business Area/Function -- What functional areas impact meeting the
organization's goals/objectives?
Improvement Opportunity -- What opportunities exist to improve the way
business is done?
Knowledge/Information -- What questions must the organization be able
to answer to improve?
Data -- What data supports answering those questions and how can it be
organized for service optimization?
Use joint application design (JAD) for building consensus-based requirements.
Cooperation and collaboration are not automatic during projects. Some effective JAD
elements for achieving consensus are:
Structured agenda forums that focus on producing deliverables
Empowered participants that are responsible for their performance and
work
Trained, objective third party facilitators who eliminate blockages
and get progress going
Self-documentation processes that officially record agreements and
responsibilities/ functions
Explicit issue management that deals specifically with a defined issue
and no other
Apply risk-based approach to data quality and integration. Populating
databases with good quality data is the key to effective data warehousing. Resolving data
quality problems is resource intensive but well worth the price -- both monetarily and in
the quality of business intelligence that results from queries. Rules need to be
established that define how choices in field nomenclature and classification are made.
Everyone needs to agree upon the common definitions for fields, what they mean, and how
they will be used. For example, fields may have the same name but different meanings or
the same meaning but different names; or have inconsistent values and identifiers or
multiple uses. Metadata, which is data that describes/explains data that is of
questionable quality, can be helpful when there is no resolution about data quality.
Focus on analytical tools and models. Choices and application of tools and
models correspond directly to what you want business intelligence to accomplish. Handling
high data volumes and complex analyses plays into the strengths of advanced technologies
and tools. Data retrieval technologies include management information systems (MIS),
executive information systems (EIS), ad-hoc query capabilities, and report generation.
Data analysis technologies include online analytical processing (OLAP), relational OLAP
(ROLAP), multi-dimensional OLAP (MOLAP), and visualization (graphical representations of
data that make trends easier to spot). Data mining technologies include rule induction,
genetic algorithms, and statistical modeling and formulas. Data mining answers questions
that you did not necessarily know to ask.
Make smart technology investments that are driven by business needs.
Businesses face concerns with:
Data source challenges such as legacy platforms, and data quality and
timeliness
End user requirements such as connectivity, usage patterns/traffic,
and geographic locations
Deployment/maintenance issues such as system performance and
availability, and database expansion