Mastering Data Warehouse Design: Chapter 1

Part One: Basic Concepts

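We have found that understanding why a particular approach is adopted helps us recognize its value and apply it. We therefore begin this part by introducing the Corporate Information Factory (CIF), a proven and stable architecture. Within this architecture, business intelligence (BI) involves two kinds of data stores, each with a specific role in the BI environment. The first is the data warehouse, whose main role is to serve as a data repository that stores data from disparate sources and makes it accessible to the other kind of data store, the data marts. In general, the most effective approach to designing the data warehouse is based on the entity-relationship data model and the normalization techniques developed for relational databases by Codd and Date from the 1970s through the 1990s.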

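The main role of the data mart is to give business users easy access to quality, integrated information. Chapter 1 describes several types of data marts. The most common type is built to support online analytical processing (OLAP), and the most effective design approach for it is the dimensional data model.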

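In Chapter 2, we continue this conceptual theme: we explain the most important relational modeling techniques, introduce the different types of models that are needed, and provide a process for building a relational model. We also explain the relationships among the business, system, and technology data models used to build a solid foundation for the enterprise, and how these models share or inherit characteristics from one another.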

Chapter 1 Introduction

Welcome to the first book that thoroughly describes the data modeling techniques used in constructing a multipurpose, stable, and sustainable data warehouse used to support business intelligence (BI). This chapter introduces the data warehouse by describing the objectives of BI and the data warehouse and by explaining how these fit into the overall Corporate Information Factory (CIF) architecture. It discusses the iterative nature of data warehouse construction and demonstrates the importance of the data warehouse data model and the justification for the type of data model format suggested in this book. We discuss why the format of the model should be based on relational design techniques, illustrating the need to maximize nonredundancy, stability, and maintainability. Another section of the chapter outlines the characteristics of a maintainable data warehouse environment. The chapter ends with a discussion of the impact of this modeling approach on the ultimate delivery of the data marts. This chapter sets up the reader to understand the rationale behind the ensuing chapters, which describe in detail how to create the data warehouse data model.

Overview of Business Intelligence

BI, in the context of the data warehouse, is the ability of an enterprise to study past behaviors and actions in order to understand where the organization has been, determine its current situation, and predict or change what will happen in the future. BI has been maturing for more than 20 years. Let’s briefly go over the past decade of this fascinating and innovative history.

You’re probably familiar with the technology adoption curve. The first companies to adopt the new technology are called innovators. The next category is known as the early adopters, then there are members of the early majority, members of the late majority, and finally the laggards. The curve is a traditional bell curve, with exponential growth in the beginning and a slowdown in market growth occurring during the late majority period. When new technology is introduced, it is usually hard to get, expensive, and imperfect. Over time, its availability, cost, and features improve to the point where just about anyone can benefit from ownership. Cell phones are a good example of this. Once, only the innovators (doctors and lawyers?) carried them. The phones were big, heavy, and expensive. The service was spotty at best, and you got “dropped” a lot. Now, there are deals where you can obtain a cell phone for about $60, the service provider throws in $25 of airtime, there are no monthly fees, and service is quite reliable.

Data warehousing is another good example of the adoption curve. In fact, if you haven’t started your first data warehouse project, there has never been a better time. Executives today expect, and often get, most of the good, timely information they need to make informed decisions to lead their companies into the next decade. But this wasn’t always the case.

Just a decade ago, these same executives sanctioned the development of executive information systems (EIS) to meet their needs. The concept behind EIS initiatives was sound: to provide executives with easily accessible key performance information in a timely manner. However, many of these systems fell short of their objectives, largely because the underlying architecture could not respond fast enough to the enterprise’s changing environment. Another significant shortcoming of the early EIS days was the enormous effort required to provide the executives with the data they desired. Data acquisition, or the extract, transform, and load (ETL) process, is a complex set of activities whose sole purpose is to attain the most accurate and integrated data possible and make it accessible to the enterprise through the data warehouse or operational data store (ODS).

The entire process began as a manually intensive set of activities. Hard-coded “data suckers” were the only means of getting data out of the operational systems for access by business analysts. This is similar to the early days of telephony, when operators on skates had to connect your phone with the one you were calling by racing back and forth and manually plugging in the appropriate cords.

Fortunately, we have come a long way from those days, and the data warehouse industry has developed a plethora of tools and technologies to support the data acquisition process. Now, progress has allowed most of this process to be automated, as it has in today’s telephony world. Also, similar to telephony advances, this process remains a difficult, if not temperamental and complicated, one. No two companies will ever have the same data acquisition activities or even the same set of problems. Today, most major corporations with significant data warehousing efforts rely heavily on their ETL tools for design, construction, and maintenance of their BI environments.

Another major change during the last decade is the introduction of tools and modeling techniques that bring the phrase “easy to use” to life. The dimensional modeling concepts developed by Dr. Ralph Kimball and others are largely responsible for the widespread use of multidimensional data marts to support online analytical processing.

In addition to multidimensional analyses, other sophisticated technologies have evolved to support data mining, statistical analysis, and exploration needs. Mature BI environments now require much more than star schemas: flat files, statistical subsets of unbiased data, and normalized data structures, in addition to star schemas, are all significant data requirements that must be supported by your data warehouse.

Of course, we shouldn’t underestimate the impact of the Internet on data warehousing. The Internet helped remove the mystique of the computer. Executives use the Internet in their daily lives and are no longer wary of touching the keyboard. The end-user tool vendors recognized the impact of the Internet, and most of them seized upon that realization: they designed their interfaces to replicate some of the look-and-feel features of the popular Internet browsers and search engines. The sophistication and simplicity of these tools have led to widespread use of BI by business analysts and executives.

Another important event taking place in the last few years is the transformation from technology chasing the business to the business demanding technology. In the early days of BI, the information technology (IT) group recognized its value and tried to sell its merits to the business community. In some unfortunate cases, the IT folks set out to build a data warehouse with the hope that the business community would use it. Today, the value of a sophisticated decision support environment is widely recognized throughout the business. As an example, an effective customer relationship management program could not exist without strategic (data warehouse with associated marts) and tactical (operational data store and oper mart) decision-making capabilities. (See Figure 1.1.)

BI Architecture

One of the most significant developments during the last 10 years has been the introduction of a widely accepted architecture to support all BI technological demands. This architecture recognized that the EIS approach had several major flaws, the most significant of which was that the EIS data structures were often fed directly from source systems, resulting in a very complex data acquisition environment that required significant human and computer resources to maintain. The Corporate Information Factory (CIF) (see Figure 1.2), the architecture used in most decision support environments today, addressed that deficiency by segregating data into five major databases (operational systems, data warehouse, operational data store, data marts, and oper marts) and incorporating processes to effectively and efficiently move data from the source systems to the business users.

(Figure, rotated 90 degrees; image not reproduced here.)

These components were further separated into two major groupings of components and processes:

■■ Getting data in consists of the processes and databases involved in acquiring data from the operational systems, integrating it, cleaning it up, and putting it into a database for easy usage. The components of the CIF that are found in this function:

■■ The operational system databases (source systems) contain the data used to run the day-to-day business of the company. These are still the major source of data for the decision support environment.

■■ The data warehouse is a collection or repository of integrated, detailed, historical data to support strategic decision making.

■■ The operational data store is a collection of integrated, detailed, current data to support tactical decision making.

■■ Data acquisition is a set of processes and programs that extracts data for the data warehouse and operational data store from the operational systems. The data acquisition programs perform the cleansing as well as the integration of the data and transformation into an enterprise format. This enterprise format reflects an integrated set of enterprise business rules that usually causes the data acquisition layer to be the most complex component in the CIF. In addition to programs that transform and clean up data, the data acquisition layer also includes audit and control processes and programs to ensure the integrity of the data as it enters the data warehouse or operational data store.

■■ Getting information out consists of the processes and databases involved in delivering BI to the ultimate business consumer or analyst. The components of the CIF that are found in this function:

■■ The data marts are derivatives from the data warehouse used to provide the business community with access to various types of strategic analysis.

■■ The oper marts are derivatives of the ODS used to provide the business community with dimensional access to current operational data.

■■ Data delivery is the process that moves data from the data warehouse into data and oper marts. Like the data acquisition layer, it manipulates the data as it moves it. In the case of data delivery, however, the origin is the data warehouse or ODS, which already contains high quality, integrated data that conforms to the enterprise business rules.
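The data acquisition bullet above (extract, cleanse, integrate, transform into an enterprise format, then audit) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two source layouts, the `CustomerRecord` enterprise format, the "country is mandatory" cleansing rule, and the "first source wins" integration rule are hypothetical and are not taken from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical enterprise-format customer row."""
    customer_id: str
    name: str
    country: str


# Hypothetical extracts from two operational systems with different conventions.
BILLING_ROWS = [
    {"cust_no": "0042", "cust_name": " alice smith ", "ctry": "us"},
    {"cust_no": "0043", "cust_name": "Bob Jones", "ctry": ""},
]
ORDER_ENTRY_ROWS = [
    {"id": 42, "fullName": "Alice Smith", "countryCode": "US"},
    {"id": 77, "fullName": "Carol Wu", "countryCode": "TW"},
]


def from_billing(row):
    """Transform a billing row into the enterprise format."""
    return CustomerRecord(
        customer_id=str(int(row["cust_no"])),   # strip leading zeros
        name=row["cust_name"].strip().title(),
        country=row["ctry"].strip().upper(),
    )


def from_order_entry(row):
    """Transform an order-entry row into the enterprise format."""
    return CustomerRecord(
        customer_id=str(row["id"]),
        name=row["fullName"].strip().title(),
        country=row["countryCode"].strip().upper(),
    )


def acquire(sources):
    """Extract, cleanse, integrate, and audit; returns (loaded, rejected, audit)."""
    integrated = {}
    rejected = []
    rows_read = 0
    for rows, transform in sources:
        for row in rows:
            rows_read += 1
            record = transform(row)
            if not record.country:              # assumed cleansing rule
                rejected.append(record)
                continue
            # Assumed integration rule: one row per customer, first source wins.
            integrated.setdefault(record.customer_id, record)
    loaded = list(integrated.values())
    audit = {
        "rows_read": rows_read,
        "rows_loaded": len(loaded),
        "rows_rejected": len(rejected),
        "rows_merged": rows_read - len(loaded) - len(rejected),
    }
    return loaded, rejected, audit


loaded, rejected, audit = acquire(
    [(BILLING_ROWS, from_billing), (ORDER_ENTRY_ROWS, from_order_entry)]
)
```

The audit counts act as a simple control total: every row read must be accounted for as loaded, rejected, or merged into an existing customer. A real acquisition layer would apply far richer business rules, which is why the text calls it the most complex component of the CIF.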

The CIF didn’t just happen. In the beginning, it consisted of the data warehouse and sets of lightly summarized and highly summarized data, initially a collection of the historical data needed to support strategic decisions. Over time, it spawned the operational data store with a focus on the tactical decision support requirements as well. The lightly and highly summarized sets of data evolved into what we now know as data marts.

Let’s look at the CIF in action. Customer Relationship Management (CRM) is a highly popular initiative that needs the components for tactical information (operational systems, operational data store, and oper marts) and for strategic information (data warehouse and various types of data marts). Certainly this technology is necessary for CRM, but CRM requires more than just the technology. It also requires alignment of the business strategy, corporate culture and organization, and customer information in addition to technology to provide long-term value to both the customer and the organization. An architecture such as that provided by the CIF fits very well within the CRM environment, and each component has a specific design and function within this architecture.

We describe each component in more detail later in this chapter. CRM is a popular application of the data warehouse and operational data store, but there are many other applications. For example, the enterprise resource planning (ERP) vendors such as SAP, Oracle, and PeopleSoft have embraced data warehousing and augmented their tool suites to provide the needed capabilities. Many software vendors are now offering various plug-ins containing generic analytical applications such as profitability or key performance indicator (KPI) analyses. We will cover the components of the CIF in far greater detail in the following sections of this chapter.

The evolution of data warehousing has been critical in helping companies better serve their customers and improve their profitability. It took a combination of technological changes and a sustainable architecture. The tools for building this environment have certainly come a long way. They are quite sophisticated and offer great benefit in the design, implementation, maintenance, and access to critical corporate data. The CIF architecture capitalizes on these technology and tool innovations. It creates an environment that segregates data into five distinct stores, each of which has a key role in providing the business community with the right information at the right time, in the right place, and in the right form. So, if you’re a data warehousing late majority or even a laggard, take heart. It was worth the wait.

What Is a Data Warehouse?

Before we get started with the actual description of the modeling techniques, we need to make sure that all of us are on the same page in terms of what we mean by a data warehouse, its role and purpose in BI, and the architectural components that support its construction and usage.

Role and Purpose of the Data Warehouse

As we saw in the first section of this chapter, the overall BI architecture has evolved considerably over the past decade. From simple reporting and EIS systems to multidimensional analyses to statistical and data mining requirements to exploration capabilities, and now the introduction of customizable analytical applications, these technologies are part of a robust and mature BI environment. See Figure 1.3 for the general timeframe for each of these technological advances.

Given these important but significantly different technologies and data format requirements, it should be obvious that a repository of quality, trusted data in a flexible, reusable format must be the starting point to support and maintain any BI environment. The data warehouse has been a part of the BI architecture from the very beginning. Different methodologies and data warehouse gurus have given this component various names such as:

A staging area. A variation on the data warehouse is the “back office” staging area where data from the operational systems is first brought together. It is an informally designed and maintained grouping of data whose only purpose is to feed multidimensional data marts.

The information warehouse. This was an early name for the data warehouse used by IBM and other vendors. It was not as clearly defined as the staging area and, in many cases, encompassed not only the repository of historical data but also the various data marts in its definition.

The data warehouse environment must align varying skill sets, functionality, and technologies. Therefore it must be designed with two ideas in mind.

First, it must be at the proper level of grain, or detail, to satisfy all the data marts. That is, it must contain the least common denominator of detailed data to supply aggregated, summarized marts as well as transaction-level exploration and mining warehouses.

Second, its design must not compromise the ability to use the various technologies for the data marts. The design must accommodate multidimensional marts as well as statistical, mining, and exploration warehouses. In addition, it must accommodate the new analytical applications being offered and be prepared to support any new technology coming down the pike. Thus the schemas it must support consist of star schemas, flat files, statistical subsets of normalized data, and whatever the future brings to BI. Given these goals, let’s look at how the data warehouse fits into a comprehensive architecture supporting this mature BI environment.
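The grain argument can be made concrete with a short Python sketch. The `SALES` rows and the `summarize` helper are hypothetical illustrations, not structures from the book. The idea is that detail kept at transaction grain can feed any summarized mart, whereas a mart that has already been summarized cannot be turned back into detail or re-cut along a dimension it dropped.

```python
from collections import defaultdict

# Hypothetical transaction-level rows held at the warehouse grain (one row per sale).
SALES = [
    {"month": "2003-01", "product": "widget", "store": "north", "amount": 20.0},
    {"month": "2003-01", "product": "widget", "store": "south", "amount": 35.0},
    {"month": "2003-01", "product": "gadget", "store": "north", "amount": 50.0},
    {"month": "2003-02", "product": "widget", "store": "north", "amount": 10.0},
]


def summarize(rows, *dimensions):
    """Aggregate sale amounts by the given dimensions, as a summarized mart would."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


# Two different marts derived from the same detailed data.
by_month_and_product = summarize(SALES, "month", "product")
by_store = summarize(SALES, "store")

# A mining or exploration warehouse can still receive the detail itself.
transaction_extract = list(SALES)
```

Note that `by_store` could not have been computed from `by_month_and_product`, because the store dimension is already gone from that summary. That is the practical reason for keeping the least common denominator of detail in the warehouse.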

The Corporate Information Factory

The Corporate Information Factory (CIF) is a widely accepted conceptual architecture that describes and categorizes the information stores used to operate and manage a successful and robust BI infrastructure. These information stores support three high-level organizational processes:

■■ Business operations are concerned with the ongoing day-to-day operations of the business. It is within this function that we find the operational transaction-processing systems and external data. These systems help run the business, and they are usually highly automated. The processes that support this function are fairly static, and they change only in quantum leaps. That is, the operational processes remain constant from day to day, and only change through a conscious effort by the company.

■■ Business intelligence is concerned with the ongoing search for a better understanding of the company, of its products, and of its customers. Whereas business operations processes are static, business intelligence includes processes that are constantly evolving, in addition to static processes. These processes can change as business analysts and knowledge workers explore the information available to them, using that information to help them develop new products, measure customer retention, evaluate potential new markets, and perform countless other tasks. The business intelligence function supports the organization’s strategic decision-making process.

■■ Business management is the function in which the knowledge and new insights developed in business intelligence are institutionalized and introduced into the daily business operations throughout the enterprise. Business management encompasses the tactical decisions that an organization makes as it carries out its strategies.

Taken as a whole, the CIF can be used to identify all of the information management activities that an organization conducts. The operational systems continue to be the backbone of the enterprise, running the day-to-day business. The data warehouse collects the integrated, historical data supporting customer analysis and segmentation, and the data marts provide the business community with the capabilities to perform these analyses. The operational data store and associated oper marts support the near-real-time capture of integrated customer information and the management of actions to provide personalized customer service. Let’s examine each component of the CIF in a bit more detail.

Operational Systems

Operational systems are the ones supporting the day-to-day activities of the enterprise. They are focused on processing transactions, ranging from order entry to billing to human resources transactions. In a typical organization, the operational systems use a wide variety of technologies and architectures, and they may include some vendor-packaged systems in addition to in-house custom-developed software. Operational systems are static by nature; they change only in response to an intentional change in business policies or processes, or for technical reasons, such as system maintenance or performance tuning.

These operational systems are the source of most of the electronically maintained data within the CIF. Because these systems support time-sensitive, real-time transaction processing, they have usually been optimized for performance and transaction throughput. Data in the operational systems environment may be duplicated across several systems, and is often not synchronized. These operational systems represent the first application of business rules to an organization’s data, and the quality of data in the operational systems has a direct impact on the quality of all other information used in the organization.

Data Acquisition

Many companies are tempted to skip the crucial step of truly integrating their data, choosing instead to deploy a series of uncoordinated, unintegrated data marts. But without the single set of business rule transformations that the data acquisition layer contains, these companies end up building isolated, user- or department-specific data marts. These marts often cannot be combined to produce valid information, and cannot be shared across the enterprise. The net effect of skipping a single, integrated data acquisition layer is to foster the uncontrolled proliferation of silos of analytical data.

Data Warehouse

The universally accepted definition of a data warehouse, developed by Bill Inmon in the 1980s, is “a subject-oriented, integrated, time variant and nonvolatile collection of data used in strategic decision making.” The data warehouse acts as the central point of data integration, the first step toward turning data into information. Due to this enterprise focus, it serves the following purposes.

First, it delivers a common view of enterprise data, regardless of how it may later be used by the consumers. Since it is the common view of data for the business consumers, it supports the flexibility in how the data is later interpreted (analyzed). The data warehouse produces a stable source of historical information that is constant, consistent, and reliable for any consumer.

Second, because the enterprise as a whole has an enormous need for historical information, the data warehouse can grow to huge proportions (20 to 100 terabytes or more!). The design is set up from the beginning to accommodate the growth of this information in the most efficient manner using the enterprise’s business rules for use throughout the enterprise.

Finally, the data warehouse is set up to supply data for any form of analytical technology within the business community. That is, many data marts can be created from the data contained in the data warehouse rather than each data mart serving as its own producer and consumer of data.

Operational Data Store

The operational data store (ODS) is used for tactical decision making, whereas the data warehouse supports strategic decisions. It has some characteristics that are similar to those of the data warehouse but is dramatically different in other aspects:

■■ It is subject oriented like a data warehouse.

■■ Its data is fully integrated like a data warehouse.

■■ Its data is current, or as current as technology will allow. This is a significant difference from the historical nature of the data warehouse. The ODS has minimal history and shows the state of the entity as close to real time as feasible.

■■ Its data is volatile or updatable. This too is a significant departure from the static data warehouse. The ODS is like a transaction-processing system in that, when new data flows into the ODS, the fields affected are overwritten or updated with the new information. Other than an audit trail, no history of the previous contents is retained.

■■ Its data is almost entirely detailed with a small amount of dynamic aggregation or summarization. The ODS is most often designed to contain the transaction-level data, that is, the lowest level of detail for the subject area.

The ODS is the source of near-real-time, accurate, integrated data about customers, products, inventory, and so on. It is accessible from anywhere in the corporation and is not application specific. There are four classes of ODS commonly used; each has distinct characteristics and usage, but the most significant difference among them is the frequency of updating, ranging from daily to almost real time (subminute latency). Unlike a data warehouse, in which very little reporting is done against the warehouse itself (reporting is pushed out to the data marts), business users frequently access an ODS directly.
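The contrast between the volatile, current-valued ODS and the nonvolatile, time-variant data warehouse can be sketched in a few lines of Python. Both classes below are hypothetical in-memory stand-ins written for illustration; they assume changes arrive in date order and use ISO date strings so that plain string comparison orders them correctly. Neither is a design taken from the book.

```python
class OperationalDataStore:
    """Current-valued and volatile: new data overwrites the fields it affects."""

    def __init__(self):
        self.current = {}

    def apply(self, key, as_of, fields):
        # as_of is ignored on purpose: only the latest state is retained.
        self.current.setdefault(key, {}).update(fields)


class DataWarehouse:
    """Time-variant and nonvolatile: every change is appended as a dated snapshot."""

    def __init__(self):
        self.history = []   # list of (key, as_of, full state) tuples

    def apply(self, key, as_of, fields):
        previous = self.state_as_of(key, as_of) or {}
        self.history.append((key, as_of, {**previous, **fields}))

    def state_as_of(self, key, date):
        """Return the latest snapshot for key dated on or before the given date."""
        matches = [s for k, d, s in self.history if k == key and d <= date]
        return dict(matches[-1]) if matches else None


ods, warehouse = OperationalDataStore(), DataWarehouse()
changes = [
    ("cust-42", "2003-01-10", {"city": "Boston", "status": "active"}),
    ("cust-42", "2003-03-01", {"city": "Denver"}),
]
for key, as_of, fields in changes:
    ods.apply(key, as_of, fields)
    warehouse.apply(key, as_of, fields)
```

After both changes, the ODS knows only that the customer is in Denver, while the warehouse can still answer where the customer was on any earlier date. A production ODS would typically also write the audit trail the text mentions, which this sketch omits.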

Data Delivery

Data delivery is generally limited to operations such as aggregation of data, filtering by specific dimensions or business requirements, reformatting data to ease end-user access or to support specific BI access software tools, and finally delivery or transmittal of data across the organization. The data delivery infrastructure remains fairly static in a mature CIF environment; however, the data requirements of the data marts evolve rapidly to keep pace with changing business information needs. This means that the data delivery layer must be flexible enough to keep pace with these demands.
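As a rough illustration of the three delivery operations named above (filtering, aggregation, and reformatting), here is a minimal Python sketch. The `WAREHOUSE_ROWS` data, the `deliver` function, and its parameters are hypothetical; the sketch assumes the rows are already clean and integrated, since the warehouse is the origin.

```python
from collections import defaultdict

# Hypothetical integrated rows already sitting in the data warehouse.
WAREHOUSE_ROWS = [
    {"region": "east", "product": "widget", "quarter": "2003Q1", "revenue": 120.0},
    {"region": "east", "product": "gadget", "quarter": "2003Q1", "revenue": 80.0},
    {"region": "west", "product": "widget", "quarter": "2003Q1", "revenue": 200.0},
    {"region": "east", "product": "widget", "quarter": "2003Q2", "revenue": 150.0},
]


def deliver(rows, region, group_by):
    """Build a mart extract: filter by region, aggregate revenue, reformat for display."""
    totals = defaultdict(float)
    for row in rows:
        if row["region"] != region:              # filtering by a business requirement
            continue
        totals[row[group_by]] += row["revenue"]  # aggregation
    # Reformatting: sorted records with revenue rendered as a fixed-point string.
    return [
        {group_by: key, "revenue": f"{total:.2f}"}
        for key, total in sorted(totals.items())
    ]


east_by_quarter = deliver(WAREHOUSE_ROWS, region="east", group_by="quarter")
```

Because the filter and the grouping dimension are parameters rather than hard-coded, a new mart requirement becomes a new call rather than a new program, which is one simple way to get the flexibility the paragraph asks for.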

Data Marts

Data marts are a subset of data warehouse data and are where most of the analytical activities in the BI environment take place. The data in each data mart is usually tailored for a particular capability or function, such as product profitability analysis, KPI analyses, customer demographic analyses, and so on. Each specific data mart is not necessarily valid for other uses. All varieties of data marts have universal and unique characteristics. The universal ones are that they contain a subset of data warehouse data, they may be physically collocated with the data warehouse or on their own separate platform, and they range in size from a few megabytes to multiple gigabytes to terabytes! To maximize your data warehousing ROI, you need to embrace and implement data warehouse architectures that enable this full spectrum of analysis.

元數據管理

元數據管理指在整個CIF體系內,收集、管理、配置元數據的過程集合。元數據分爲3類:技術元數據,描述CIF的物理結構,即移動、轉換數據的詳細過程;業務元數據描述CIF 的數據結構、數據元素、業務規則、業務用例;管理元數據描述CIF的操作,包括審計、性能矩陣、數據質量矩陣等其他統計信息。

 

Meta Data Management

Meta data management is the set of processes that collect, manage, and deploy meta data throughout the CIF. The scope of meta data managed by these processes includes three categories. Technical meta data describes the physical structures in the CIF and the detailed processes that move and transform data in the environment. Business meta data describes the data structures, data elements, business rules, and business usage of data in the CIF. Finally, administrative meta data describes the operation of the CIF, including audit trails, performance metrics, data quality metrics, and other statistical meta data.

信息反饋

信息反饋允許通過CIF獲取的智能和知識以合適的方式共享給其他存貯的共享機制,它標誌一個企業是否是一個真正的“不斷學習”的企業。例如:

■■從數據集市得來的預算,反饋到數據倉庫存貯,將用於歷史分析。

■■傳送ODS更新的數據(通過事務接口)到合適的操作型系統,使其反映最新的情況。

■■返回分析結果,如客戶片區分類與生命週期價值評分數據,到操作型系統或者ODS

 

Information Feedback

Information feedback is the sharing mechanism that allows intelligence and knowledge gathered through the usage of the Corporate Information Factory to be shared with other data stores, as appropriate. It is the use of information feedback that identifies an organization as a true “learning organization.” Examples of information feedback include:

■■ Pulling derived measures such as new budget targets from data marts and feeding them back to the data warehouse where they will be stored for historical analysis.

■■ Transmitting data that has been updated in an operational data store (through the use of a Transactional Interface) to appropriate operational systems, so that those data stores can reflect the new data.

■■ Feeding the results of analyses, such as a customer’s segment classification and life time value score, back to the operational systems or ODS.

 

信息車間

信息車間是一套工具集合,幫助企業用戶使用CIF 的資源,一個典型的功能就是提供數據和其他資源組織、分類的方法,方便用戶的查找和使用,它是一種在企業內提升共享和重用分析結果的機制。在一些企業,也叫做企業門戶,用於組織信息資源,並且讓企業用戶直接訪問。信息車間的部件分類爲:資源庫、工具箱、工作臺。

資源庫和工具箱是企業創建信息車間的第一步,資源庫提供CIF 可用的資源和數據目錄,以企業用戶容易理解的方式組織起來。這些目錄類似圖書館的藏書目錄, 按標準的分類方法分類並排序。這些分類法常常基於企業的組織結構或者更高層的企業過程。

工具箱是一些可重用的組件集合,如報表分析工具,企業用戶可以共享,以更有效的工作及分享他人的分析成果。資源庫與工具箱是信息車間的基本組成部分。

更成熟的CIF組織通過集成的信息工作臺來支持信息車間的概念。在工作臺中,元數據、數據、分析工具按業務功能和任務組織起來。工作臺不再像資源庫和工具箱那樣使用嚴格的分類法,而是面向任務和工作流,支持企業用戶的工作。

Information Workshop

The information workshop is the set of tools available to business users to help them use the resources of the Corporate Information Factory. The information workshop typically provides a way to organize and categorize the data and other resources in the CIF, so that users can find and use those resources. This is the mechanism that promotes the sharing and reuse of analysis across the organization. In some companies, this concept is manifested as an intranet portal, which organizes information resources and puts them at business users’ fingertips. We classify the components of the information workshop as the library, toolbox, and workbench.

The library and toolbox usually represent the organization’s first attempts to create an information workshop. The library component provides a directory of the resources and data available in the CIF, organized in a way that makes sense to business users. This directory is much like a library, in that there is a standard taxonomy for categorizing and ordering information components. This taxonomy is often based on organizational structures or high-level business processes.

The toolbox is the collection of reusable components (for example, analytical reports) that business users can share, in order to leverage work and analysis performed by others in the enterprise. Together, these two concepts constitute a basic version of the information workshop capability.

More mature CIF organizations support the information workshop concept through the use of integrated information workbenches. In the workbench, meta data, data, and analysis tools are organized around business functions and tasks. The workbench dispenses with the rigid taxonomy of the library and toolbox, and replaces it with a task-oriented or workflow interface that supports business users in their jobs.

 

操作與管理

對於一個不斷增長的、可持續的CIF,操作與管理是非常必要的支撐和基礎功能。在早期的CIF實現,很多公司沒有認識到這些功能的重要性,在計劃和實施時沒有考慮。操作與管理功能包括CIF數據管理、系統管理、數據獲取管理、服務管理、變更管理,每一類管理都包含一系列維護與加強這些重要進程的過程與策略。

Operations and Administration

Operation and administration include the crucial support and infrastructure functions that are necessary for a growing, sustainable Corporate Information Factory. In early CIF implementations, many companies did not recognize how important these functions were, and they were often left out during CIF planning and development. The operation and administration functions include CIF Data Management, Systems Management, Data Acquisition Management, Service Management, and Change Management. Each of these functions contains a set of procedures and policies for maintaining and enhancing these critically important processes.

數據倉庫的多用途性

現在讀者對數據倉庫在BI中扮演的角色有了很好的理解。數據倉庫不僅是操作型數據的集成點,更加是把數據提供給各企業用戶的分發點。要爲BI決策應用提供穩定的、永久的歷史數據,必須擁有以下屬性:

它必須是面向企業的。數據倉庫應該是所有數據集市和分析應用的起點,它會被多個部門甚至是多個公司或分部使用。數據倉庫設計組一個艱難而必須解決的問題就是數據元素及定義的衝突,必須要有企業業務人員參加。

The Multipurpose Nature of the Data Warehouse

Hopefully by now, you have a good understanding of the role the data warehouse plays in your BI environment. It not only serves as the integration point for your operational data, it must also serve as the distribution point of this data into the hands of the various business users. If the data warehouse is to act as a stable and permanent repository of historical data for use in your strategic BI applications, it should have the following characteristics:

It should be enterprise focused. The data warehouse should be the starting point for all data marts and analytical applications; thus, it will be used by multiple departments, maybe even multiple companies or subdivisions. A difficult but mandatory part of any data warehouse design team’s activities must be the resolution of conflicting data elements and definitions. The participation by the business community is also obligatory.

它的設計必須儘可能有彈性,滿足變化。因爲數據倉庫用來存貯多年大量的、細節的、戰略的數據,誰都不願意卸載數據,重新設計數據庫,然後再裝入數據,爲了避免這些操作,你應該考慮一下過程獨立、應用獨立、BI技術獨立的數據模型,目標是創建一個可以輕易容納新的數據元素,而不需要重新設計已存在的數據元素或模型的數據模型。

必須設計成能在很短的時間內裝入大量數據。數據倉庫的數據庫設計必須最小化冗餘,也即減少重複的屬性和實體。大多數的數據庫有批量裝入工具,包含一些屬性和功能用於優化大數據量的裝入,如並行選項、按塊裝入、內置應用程序接口(API)等。這些情況,也許要關掉索引,或者需要平文件。然而,必須注意到非常重要的一點:一個不好的、低效的數據庫,使用最好的裝入工具也不能解決問題。

Its design should be as resilient to change as possible. Since the data warehouse is used to store massive, detailed, strategic data over multiple years, it is very undesirable to unload the data, redesign the database, and then reload the data. To avoid this, you should think in terms of a process-independent, application-independent, and BI technology-independent data model. The goal is to create a data model that can easily accommodate new data elements as they are discovered and needed without having to redesign the existing data elements or data model.

It should be designed to load massive amounts of data in very short amounts of time. The data warehouse database design must be created with a minimum of redundancy or duplicated attributes or entities. Most databases have bulk load utilities that include a range of features and functions that can help optimize this process. These include parallelization options, loading data by block, and native application program interfaces (APIs). They may mean that you must turn off indexing, and they may require flat files. However, it is important to note that a poorly or ineffectively designed database cannot be overcome even with the best load utilities.
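As a rough illustration of the "load from a flat file with indexing turned off" pattern, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a warehouse DBMS. Real bulk load utilities are vendor specific and far more capable; the table, the index name, and the file layout here are assumptions made for the example.

```python
import csv
import io
import sqlite3

# A stand-in for a flat file produced by the ETL process (hypothetical layout).
flat_file = io.StringIO("1,Widget,120.0\n2,Gadget,50.0\n3,Widget,80.0\n")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_line (line_id INTEGER PRIMARY KEY, product TEXT, amount REAL)"
)

# Parse the flat file into typed tuples before touching the database.
rows = [(int(i), p, float(a)) for i, p, a in csv.reader(flat_file)]

with conn:  # one transaction for the whole batch
    # Load first, with no secondary index to maintain row by row...
    conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)", rows)
    # ...then build the index once, after the data is in place.
    conn.execute("CREATE INDEX idx_order_line_product ON order_line (product)")

loaded = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
```

Building the index after the load means it is constructed once over the full data set instead of being updated on every inserted row, which is the reason load utilities often ask for indexing to be switched off.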

 

它的設計應該使用數據交付工具進行數據抽取時最優。記住數據倉庫的目標是爲企業團體使用各種數據集市提供數據,所以數據倉庫必須非常好的文檔,以方便數據交付開發組容易地編寫交付程序,數據的質量、血統、計算或者引出、意義,必須有清晰的文檔。

數據格式必須支持任何技術下任何可能的BI分析。它應該以能夠支持所有BI技術的格式,包含最小公分母級的明細數據。它的設計必須不帶偏見,不能只考慮某個部門的用途。

It should be designed for optimal data extraction processing by the data delivery programs. Remember that the ultimate goal for the data warehouse is to feed the plethora of data marts that are then used by the business community. Therefore, the data warehouse must be well documented so that data delivery teams can easily create their data delivery programs. The quality of the data, its lineage, any calculations or derivations, and its meaning should all be clearly documented.

Its data should be in a format that supports any and all possible BI analyses in any and all technologies. It should contain the least common denominator level of detailed data in a format that supports all manner of BI technologies. And it must be designed without bias toward any particular department’s use.

 

支持的數據集市類型

今天,我們有太多的技術支持不同的分析需要——聯機分析處理(OLAP)、探索、數據挖掘、統計數據集市,還有現在定製的分析應用。從各種技術的唯一特性,就是支持每種類型的數據集市:

OLAP數據集市:數據集市設計成支持一般的多維分析,使用OLAP軟件工具,使用星型模式或者私有的“超立方”技術。星型模式或者多維數據庫管理系統(MD DBMS)極大的支持多維分析,以可以接受的響應時間、重複的報表,滿足穩定的需求、清楚的預定義查詢。這些分析包括銷售分析、產品利潤分析、人力資源貢獻跟蹤、渠道銷售分析等。

 

Types of Data Marts Supported

Today, we have a plethora of technologies supporting different analytical needs—Online Analytical Processing (OLAP), exploration, data mining and statistical data marts, and now customizable analytical applications. The unique characteristics come from the specificity of the technology supporting each type of data mart:

OLAP data mart. These data marts are designed to support generalized multidimensional analysis, using OLAP software tools. The data mart is designed using the star schema technique or proprietary “hypercube” technology. The star schema or multidimensional database management system (MD DBMS) is great for supporting multidimensional analysis in data marts that have known, stable requirements, fairly predictable queries with reasonable response times, and recurring reports. These analyses may include sales analysis, product profitability analysis, human resources headcount distribution tracking, or channel sales analysis.
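For readers who have not seen one, a star schema can be sketched as a central fact table surrounded by dimension tables. The example below is a minimal illustration using Python's `sqlite3` module; the table names (`fact_sales`, `dim_product`, `dim_region`) and the sample figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_region  (region_key  INTEGER PRIMARY KEY, region_name  TEXT);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product (product_key),
    region_key   INTEGER REFERENCES dim_region (region_key),
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_region  VALUES (1, 'Northeast'), (2, 'Southwest');
INSERT INTO fact_sales  VALUES (1, 1, 120.0), (1, 1, 80.0), (2, 1, 50.0), (1, 2, 200.0);
""")

# A typical predictable, recurring query: one join per dimension, then a roll-up.
sales_by_region = conn.execute("""
    SELECT r.region_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_region r ON r.region_key = f.region_key
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
```

The query shape stays the same whichever dimension is chosen, which suits the known, stable requirements and recurring reports described above.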

探索數據倉庫。大多數數據集市設計成支持各種具體的分析與報表,探索倉庫則用於提供探索性的,或者真正“即興”的數據導航。業務探索者做出有用的發現之後,這種分析可以通過創建另一種形式的數據集市(如OLAP)固定下來,這樣其他人能長期從中得益。新技術(如令牌、編碼矢量、位圖技術等)大大提高了探索數據的能力,也能更快更有效地創建原型。

Exploration warehouse. While most common data marts are designed to support specific types of analysis and reporting, the exploration warehouse is built to provide exploratory or true “ad hoc” navigation through data. After the business explorers make a useful discovery, that analysis may be formalized through the creation of another form of data mart (such as an OLAP one), so that others may benefit from it over time. New technologies have greatly improved the ability to explore data and to create a prototype quickly and efficiently. These include token, encoded vector, and bitmap technologies.

數據挖掘或統計倉庫。數據挖掘或統計倉庫是一個特殊的數據集市,研究人員和分析師用於發掘數據與事件已知的與未知的關係,這些關係出乎預料。這是一個安全的港口,讓人們執行查詢,使用挖掘與統計算法,不用擔心破壞生產數據倉庫或者得到有偏見的數據(在多維設計裏,只構建了已知的、已被證明的關係。)

訂製的分析應用。這種新的數據集市允許廉價的、有效的定製普通的應用,這些“預製”的應用適合每個公司大多數需要,同時也能定製滿足剩下的特殊的功能。這需要考慮靈活性與快速響應,以滿足多樣化與定製化。

Data-mining or statistical warehouse. The data-mining or statistical warehouse is a specialized data mart designed to give researchers and analysts the ability to delve into the known and unknown relationships of data and events without having preconceived notions of those relationships. It is a safe haven for people to perform queries and apply mining and statistical algorithms to data, without having to worry about disabling the production data warehouse or receiving biased data such as that contained in multidimensional designs (in which only known, documented relationships are constructed).

Customizable analytical applications. These new additions permit inexpensive and effective customization of generic applications. These “canned” applications meet a high percentage of every company’s generic needs yet can be customized for the remaining specific functionality. They require that you think in terms of variety and customization through flexibility and quick responsiveness.

 

支持的BI技術

在現實中,數據集市數據庫的結構各不相同,從規範化、到反規範化、到事務平文件。理想的情況是在需求建立之後,然而,數據庫的結構/解決方案往往是在知道具體的業務需要前選擇的。我們那些數據倉庫顧問看到,開發團隊在沒有做業務分析前就爲使用星型模式還是規範化設計爭論不休。因爲某種理由,系統架構師和數據建模員總傾向於某種具體的設計技術——也許是熟悉某項技術,或者忽視了另一項技術——並且強制所有數據集市都使用這種設計,就像使慣了錘子的人,看什麼都像釘子。

Types of BI Technologies Supported

The reality is that database structures for data marts vary across a spectrum from normalized to denormalized to flat files of transactions. The ideal situation is to choose the database structure after the requirements are established. Unfortunately, the database structure/solution is often selected before the specific business needs are known. Those of us in the data warehouse consulting business have witnessed development teams debating star versus normalized designs before even starting business analysis. For whatever reason, architects and data modelers latch onto a particular design technique (perhaps through comfort with a particular technique or ignorance of other techniques) and force all data marts to have that one type of design. This is similar to the person who is an expert with a hammer: everything he or she sees resembles a nail.

我們推薦數據集市設計者,應該基於數據的使用和信息的類型來選擇模式。當然,沒有絕對,但我們覺得支持所有類型的數據集市的最好設計應該不要預建立或預確定數據關係。這裏有一點重要的告誡:爲數據集市提供數據的數據倉庫,必須支持所有形式的分析,而不只是多維形式。

Our recommendation for data mart designs is that the schemas should be based on the usage of the data and the type of information requested. There are no absolutes, of course, but we feel that the best design to support all the types of data marts will be one that does not preestablish or predetermine the data relationships. An important caveat here is that the data warehouse that feeds the marts will be required to support any and all forms of analysis—not just multidimensional forms.

爲了決定最好的數據庫和後續數據集市設計以滿足商業需求,我們推薦開發一個簡單的矩陣,按數據揮發性、數據庫類型劃分,如圖1.4。這個矩陣給設計者、架構師、數據庫管理員看到需求與物理數據庫之間的關係,就是說,揮發性、等待時間、多主題域等,與分析工具一起提供信息(通過開發的場景),如重複交付、即興報表、產品報表、算法分析等。

To determine the best database design for your business requirements and ensuing data mart, we recommend that you develop a simple matrix that plots the volatility of the data against a type of database design required, similar to the one in Figure 1.4. Such a matrix allows designers, architects, and database administrators (DBAs) to view where the overall requirements lie in terms of the physical database drivers, that is, volatility, latency, multiple subject areas, and so on, and the analytical vehicle that will supply the information (via the scenarios that were developed), for example, repetitive delivery, ad hoc reports, production reports, algorithmic analysis, and so on.

可維護的數據倉庫環境的特徵

有了這些背景知識,我們討論一個堅固的、可維護的數據倉庫數據模型是什麼樣子呢?在設計任何數據倉庫時,不管是爲一個剛開始使用BI的公司,還是爲一個已經有了複雜的技術和用戶的公司,不管這個公司目前只使用一種BI工具還是已經有太多的BI工具使用,我們要考慮那些特徵?

Characteristics of a Maintainable Data Warehouse Environment

With this as a background, what does a solid, maintainable data warehouse data model look like? What are the characteristics that should be considered when designing any data warehouse, whether for a company just beginning its BI initiative or for a company having a sophisticated set of technologies and users, whether the company has only one BI access tool today or has a plethora of BI technologies available?

創建BI 環境的方法論具有迭代性,很幸運的,我們擁有很多優秀的書籍專門討論這種方法論(參見書末的“推薦閱讀”章節)。簡言之,包括以下步驟:

1、  首先,選擇及文檔化需要使用BI技術(某種數據集市)解決的商業問題。

2、  儘可能多的收集需求,這些需求會在下一步進行提煉。

3、  決定終端用戶使用的技術,支持何種解決方法(OLAP,數據挖掘、探索、分析應用等等)。

4、  創建數據集市原型,和企業用戶一起檢驗原型的功能,如果有必要,重新設計。

5、  基於用戶需求和商業數據模型,開發數據倉庫數據模型。

6、  把數據集市的需求映射到數據倉庫,並最終返回到操作型系統本身。

7、  編寫代碼執行ETL和數據交付過程 ,一定要包含錯誤檢測與糾正及審計過程。

8、  測試數據倉庫與數據集市創建過程,衡量數據質量參數,爲環境創建合適的元數據。

9、  可以接受之後,把數據倉庫與數據集市的首次迭代遷移到生產系統,培訓企業的業務團隊,開始計劃下一次迭代。

The methodology for building a BI environment is iterative in nature. We are fortunate today to have many excellent books devoted to describing this methodology. (See the “Recommended Reading” section at the end of this book.) In a nutshell, here are the steps:

1. First, select and document the business problem to be solved with a business intelligence capability (data mart of some sort).

2. Gather as many of the requirements as you can. These will be further refined in the next step.

3. Determine the appropriate end-user technology to support the solution (OLAP, mining, exploration, analytical application, and so on).

4. Build a prototype of the data mart to test its functionality with the business users, redesigning it as necessary.

5. Develop the data warehouse data model, based on the user requirements and the business data model.

6. Map the data mart requirements to the data warehouse data model and ultimately back to the operational systems themselves.

7. Generate the code to perform the ETL and data delivery processes. Be sure to include error detection and correction and audit trail procedures in these processes.

8. Test the data warehouse and data mart creation processes. Measure the data quality parameters and create the appropriate meta data for the environment.

9. Upon acceptance, move the first iteration of the data warehouse and the data mart into production, train the rest of the business community, and start planning for the next iteration.
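Step 7 asks for error detection, correction, and audit trail procedures inside the ETL code. The following is a minimal sketch of what that can look like, not the book's implementation: the field names, the "no negative amounts" rule, and the function `run_etl` are all assumptions made for illustration.

```python
def run_etl(source_rows):
    """Transform rows, rejecting bad ones and recording an audit trail."""
    loaded, rejected, audit = [], [], []
    for line_no, row in enumerate(source_rows, start=1):
        try:
            customer_id = int(row["customer_id"])
            amount = float(row["amount"])
            if amount < 0:  # hypothetical business rule
                raise ValueError("negative amount")
        except (KeyError, ValueError) as exc:
            # Error detection: set the row aside for later correction.
            rejected.append((line_no, row))
            audit.append(f"row {line_no}: rejected ({exc})")
            continue
        loaded.append({"customer_id": customer_id, "amount": amount})
        audit.append(f"row {line_no}: loaded")
    return loaded, rejected, audit


source = [
    {"customer_id": "17", "amount": "99.50"},
    {"customer_id": "abc", "amount": "10"},  # fails the integer check
    {"customer_id": "18", "amount": "-5"},   # fails the hypothetical rule
]
loaded, rejected, audit = run_etl(source)
```

Keeping rejected rows and the audit log as separate outputs lets step 8 measure data quality (here, two of three rows rejected) without rereading the source.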

警告

我們從不建議你在建立第一個分析能力(數據集市)之前,先建一個包含企業所需全部戰略數據的完整數據倉庫。每一個後續由另一個數據集市解決的商業問題,都會給作爲基礎的數據倉庫增加數據。最後,爲支持新的數據集市而必須加入數據倉庫的數據會越來越少,甚至可以忽略,因爲大多數數據已經存在於數據倉庫內。

WARNING

Nowhere do we recommend that you build an entire data warehouse containing all the strategic enterprise data you will ever need before building the first analytical capability (data mart). Each successive business problem solved by another data mart implementation will add to the growing set of data serving as the foundation in your data warehouse. Eventually, the amount of data that must be added to the data warehouse to support a new data mart will be negligible because most of it will already be present in the data warehouse.

既然你不知道最終加入到數據倉庫的數據量會有多大,也不知道最後會有多少BI 技術用於企業解決戰略問題,因此你必須學會假設並依此作出計劃。你可以假設數據倉庫會成爲企業內最大的數據庫。數據倉庫容量開始只有幾GB,然後迅速擴張到幾百GB,甚至TB,而現在預計爲PB的情況並不少見。因此,不管現在是在BI生命週期的那個階段,不管是剛剛開始,還是環境已經搭好了幾年,關係數據庫仍然是數據庫管理系統的最好選擇,它們有利於減少冗餘、提高數據庫設計效率,同時,在數據倉庫部署時,你可以使用關係型DBMS所有複雜的、有用的特性:

Since you will not know how large the data warehouse will ultimately be, nor do you know all of the BI technologies that will eventually be brought to bear upon strategic problems in your enterprise, you must make some educated assumptions and plan accordingly. You can assume that the warehouse will become one of the largest databases found in your enterprise. It is not unusual for the data warehouse size to start out in the low gigabyte range and then grow fairly rapidly to hundreds of gigabytes, terabytes, and some now predict petabytes. So, regardless of where you are in your BI life cycle (just starting or several years into building the environment), the relational databases are still the best choice for your database management system (DBMS). They have the advantage of being very conducive to nonredundant, efficient database design. In addition, their deployment for the data warehouse means you can use all the sophisticated and useful characteristics of a relational DBMS:

■■通過工具訪問數據(數據建模、ETL、元數據、BI訪問),所有這些都使用關係型數據庫的SQL

■■數據庫尺寸可量測性。關係型數據庫仍然在存貯大量數據方面有優勢。

■■並行處理極大提高數據處理效率。關係數據庫在並行處理方面非常優秀。

■■大量工具,如批量裝載、碎片整理、容量重排、性能監控、備份與恢復、高效索引等。

關係型數據庫是支持戰略數據的理想選擇。也許有一天多維數據庫(MOLAP)能夠在這方面與關係數據庫對抗,但目前還不行。

■■ Access to the data by most any tool (data modeling, ETL, meta data, and BI access). All use SQL on the relational database.

■■ Scalability in terms of the size of data being stored. The relational databases are still superior in terms of storing massive amounts of data.

■■ Parallelism for efficient and extremely fast processing of data. The relational databases excel at this function.

■■ Utilities such as bulk loaders, defragmentation, and reorganization capabilities, performance monitors, backup and recovery functions, and index wizards.

Again, the relational databases are ideal for supporting a repository of strategic data. There may come a time when the proprietary multidimensional databases (MOLAP) can effectively compete with their relational cousins, but that is not the situation currently.

數據倉庫數據模型

我們曾經推薦使用關係型DBMS用於數據倉庫建設,那麼,在那種結構下,數據模型會有何特徵?在繼續討論模型的特徵之前,讓我們再一次看看一些假設:

■■假設數據倉庫的核心是面向企業的。這意味着數據倉庫裏的數據不能偏向一個部門或者企業的一部份,而忽視另一部份。因此,最終BI的能力需要進一步處理(如使用數據集市)用於爲某個特定團隊“定製”,但是,最開始的材料(數據)能被所有人使用。

■■作爲上面假設的必然推論,數據倉庫裏的數據不能違反企業建立的商業規則。數據倉庫的數據模型從形式到文檔必須顯示支持這些基本規則。

■■數據倉庫必須儘快、儘可能高效地裝入新數據。批處理窗口即使還存在,也正變得越來越小。把數據送入數據倉庫的大部分工作必須在ETL過程中完成,只留下最少的時間用於裝入數據。

■■數據倉庫必須從一開始就支持多種BI技術,甚至在第一個數據集市項目還沒有建立的時候。數據倉庫建設若偏向於某種技術,如多維分析,會大大消除其他需要的能力,如數據挖掘與統計分析。

■■數據倉庫必須優雅地接納數據與數據結構的變化。既然我們一開始並不知道所有的需求,也不知道戰略數據的所有用途,可以肯定的是,在已有的數據倉庫基礎上繼續建設時,變化一定會發生。

記住這些假設,讓我們看看理想的數據倉庫數據模型的特徵。

The Data Warehouse Data Model

Given that we recommend a relational DBMS for your data warehouse, what should the characteristics of the data model for that structure look like? Again, let’s look at some assumptions before going into the characteristics of the model:

■■ The data warehouse is assumed to have an enterprise focus at its heart. This means that the data contained in it does not have a bias toward one department or one part of the enterprise over another. Therefore, the ultimate BI capabilities may require further processing (for example, the use of a data mart) to “customize” them for a specific group, but the starting material (data) can be used by all.

■■ As a corollary to the above assumption, it is assumed that the data within the data warehouse does not violate any business rules established by the enterprise. The data model for the data warehouse must demonstrate adherence to these underlying rules through its form and documentation.

■■ The data warehouse must be loaded with new data as quickly and efficiently as possible. Batch windows, if they exist at all, are becoming smaller and smaller. The bulk of the work to get data into a data warehouse must occur in the ETL process, leaving minimal time to load the data.

■■ The data warehouse must be set up from the beginning to support multiple BI technologies—even if they are not known at the time of the first data mart project. Biasing the data warehouse toward one technology, such as multidimensional analyses, effectively eliminates the ability to satisfy other needs such as mining and statistical analyses.

■■ The data warehouse must gracefully accommodate change in its data and data structures. Given that we do not have all of the requirements or known uses of the strategic data in the warehouse from the very beginning, we can be assured that changes will happen as we build onto the existing data warehouse foundation.

With these assumptions in mind, let’s look at the characteristics of the ideal data warehouse data model.

 

非冗餘

大多數數據倉庫的裝入週期有限,而數據量巨大,因此,數據模型必須包含最小冗餘。冗餘給裝入工具增加很大的負擔,也給設計者帶來負擔,他必須擔心所有冗餘的數據元素和實體在正確的時間得到正確的數據。你的數據模型引入到數據倉庫的冗餘越多,你最終“取數據入”的過程越複雜。這並不意味着在數據倉庫裏不允許有冗餘,在第4章,我們會介紹爲什麼及什麼時候引入冗餘,關鍵是深謀遠慮地控制和管理冗餘。

Nonredundant

To accommodate the limited load cycles and the massive amount of data that most data warehouses must have, the data model for the data warehouse should contain a minimum amount of redundancy. Redundancy adds a tremendous burden to the load utilities and to the designers who must worry about ensuring that all redundant data elements and entities get the correct data at the correct time. The more redundancy you introduce to your data warehouse data model, the more complex you make the ultimate process of “getting data in.” This does not mean that redundancy is not ever found in the data warehouse. In Chapter 4, we describe when and why some redundancy is introduced into the data warehouse. The key, though, is that redundancy is controlled and managed with forethought.
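A small Python sketch makes the load burden concrete. It compares one change (a customer moving to a new city) under a redundant layout and under a nonredundant one. The data structures and function names are hypothetical and exist only for this comparison.

```python
# Redundant design: the customer's city is copied onto every order row.
redundant_orders = [
    {"order_id": 1, "customer_id": 7, "customer_city": "Boston"},
    {"order_id": 2, "customer_id": 7, "customer_city": "Boston"},
]

# Nonredundant design: the city is stored once; orders only reference the customer.
customers = {7: {"city": "Boston"}}
orders = [{"order_id": 1, "customer_id": 7}, {"order_id": 2, "customer_id": 7}]


def move_customer_redundant(rows, customer_id, new_city):
    """Every copy must be found and updated; returns the number of writes."""
    writes = 0
    for row in rows:
        if row["customer_id"] == customer_id:
            row["customer_city"] = new_city
            writes += 1
    return writes


def move_customer_normalized(customer_table, customer_id, new_city):
    """A single write, no matter how many orders the customer has."""
    customer_table[customer_id]["city"] = new_city
    return 1


redundant_writes = move_customer_redundant(redundant_orders, 7, "Chicago")
normalized_writes = move_customer_normalized(customers, 7, "Chicago")
```

In the redundant layout the number of writes grows with the number of orders, and any copy that is missed leaves the data inconsistent. That is the burden on load utilities and designers described above.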

 

穩定

前面我們已經提到,我們以迭代方式構建數據倉庫,這樣能很快創建數據集市,但是也存在風險,可能遺失或者錯誤地表述一些重要的商業規則或數據元素。隨着越來越多數據集市上線,這些問題會逐漸被發現和突顯。不可避免的,數據倉庫和它的模型會發生變化。衆所周知,在一個企業變化最多的是它的流程、應用和技術。如果我們依賴這三個因素創建一個模型,當這三者之一發生變化,我們必定要做大修。因此,作爲設計者,我們必須使用數據建模技術,儘可能減輕這些問題,同時抓住企業所有重要的業務規則。減輕這些問題的最好的數據建模技術是創建一個不依賴於流程、應用、技術的數據模型。另一方面,既然變化不可避免,我們必須做好準備,當新的BI 能力和數據集市創建時,容納新的實體和屬性。數據倉庫設計者必須再次使用建模技術,以合併新的變化,而不需要重新設計已經存在的元素和實體的實現。這個模型叫做系統模型,我們會在第3章進行更詳細的描述。

 

Stable

As mentioned earlier, we build the data warehouse in an iterative fashion, which has the benefit of getting a data mart created quickly but runs the risk of missing or misstating significant business rules or data elements. These would be determined or highlighted as more and more data marts came online. It is inevitable that change will happen to the data warehouse and its data model. It is well known that what changes most often in any enterprise are its processes, applications, and technology. If we create a data model dependent upon any of these three factors, we can be assured of a major overhaul when one of the three changes. Therefore, as designers, we must use a data-modeling technique that mitigates this problem as much as possible yet captures the all-important business rules of the enterprise. The best data-modeling technique for this mitigation is to create a process-, application-, and technology-independent data model. On the other hand, since change is inevitable, we must be prepared to accommodate newly discovered entities or attributes as new BI capabilities and data marts are created. Again, the designer of the data warehouse must use a modeling technique that can easily incorporate a new change without someone’s having to redesign the existing elements and entities already implemented. This model is called a system model, and will be described in Chapter 3 in more detail.

一致

也許數據倉庫數據模型最重要的特徵,是它爲企業最重要的資產(數據)帶來的一致性。數據模型包含所有元數據(定義、物理屬性、別名、商業規則、數據所有者和管家、域、角色等等),這對於企業用戶最終理解他們分析的數據非常重要。在進行任何ETL處理或數據映射之前,數據模型創建過程必須解決懸而未決的問題、數據差異與衝突。

Consistent

Perhaps the most essential characteristic of any data warehouse data model is the consistency it brings to the business for its most important asset—its data. The data models contain all the meta data (definitions, physical characteristics, aliases, business rules, data owners and stewards, domains, roles, and so on) that is critically important to the ultimate understanding of the business users of what they are analyzing. The data model creation process must reconcile outstanding issues, data discrepancies, and conflicts before any ETL processing or data mapping can occur.

 

數據使用方面的靈活

數據倉庫最重要的目的就是作爲一個堅實的、可靠的、一致的數據基礎,支持所有BI 應用。到現在應該清楚,不管你的第一個BI能力如何,你必須爲所有的業務需求服務,不管他們使用何種技術。因此,數據倉庫的數據模型必須保持應用和技術獨立性,這樣,才能實現支持任何應用和技術的理想。

另一方面,數據模型必須支撐企業建立的業務規則,這意味着數據模型不只是簡單的平文件。平文件,雖然是創建星型模式、數據挖掘、數據探索子集的基礎,但是不能加強,甚至證明任何已知的商業規則。作爲設計者,你必須往前走一步,使用真正的特定企業規則、域、基數性、可選性等,建立一個真正的數據模型。不然,數據的後續使用可能不能掌握,違反業務規則的情況也會發生。

 

Flexible in Terms of the Ultimate Data Usage

The single most important purpose for the data warehouse is to serve as a solid, reliable, consistent foundation of data for any and all BI capabilities. It should be clear by now that, regardless of what your first BI capability is, you must be able to serve all business requirements regardless of their technologies. Therefore, the data warehouse data model must remain application and technology independent, thus making it ideal to support any application or technology.

On the other hand, the model must uphold the business rules established for the organization, and that means that the data model must be more than simply flat files. Flat files, while a useful base to create star schemas, data mining, and exploration subsets of data, do not enforce, or even document, any known business rules. As the designer, you must go one step further and create a real data model with the real business rules, domains, cardinalities, and optionalities specified. Otherwise, subsequent usage of the data could be mishandled, and violations in business rules could occur.
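The difference between a flat file and a real data model can be shown with a small schema in which the database itself enforces domains, cardinality, and optionality. The sketch below uses Python's `sqlite3` module; the tables, the allowed status values, and the helper `try_insert` are hypothetical examples, not rules taken from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL              -- optionality: a name is mandatory
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    -- cardinality: every order belongs to exactly one existing customer
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    -- domain: only these status values are allowed
    status      TEXT NOT NULL CHECK (status IN ('OPEN', 'SHIPPED', 'CLOSED'))
);
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO customer_order VALUES (100, 1, 'OPEN');
""")


def try_insert(sql):
    """Return True if the database accepts the row, False if a rule rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


orphan_ok = try_insert("INSERT INTO customer_order VALUES (101, 999, 'OPEN')")
bad_status_ok = try_insert("INSERT INTO customer_order VALUES (102, 1, 'LOST')")
valid_ok = try_insert("INSERT INTO customer_order VALUES (103, 1, 'SHIPPED')")
```

A flat file would accept all three rows without complaint. Here the order for a nonexistent customer and the order with an undefined status are both rejected, so later users of the data cannot inherit those violations.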

 

Codd 與 Date 前提

考慮上面我們提到的一個好的數據倉庫模型的特徵,我們提出,你能夠使用的最好的數據建模技術是基於原來關係數據庫設計的——由Chris Date和Ted Codd開發的實體關係圖(ERD),有簡單明瞭的構造規則,是一個已經證明了的、可靠的數據建模方法。規範化規則(將在第3章討論)產生一個穩定的、一致的數據模型,支撐企業建立的政策與規則,同時,爲後來的數據集市怎樣分析數據帶來極大的靈活性。在數據存貯和裝入方面,這樣的數據庫是最有效的。然而,它不是完美的,我們會在下一節看到。

雖然我們的確感覺這種方法非常優雅,但更重要的是,這種數據建模技術支撐我們爲可持續的、靈活的、可維護的、好理解的數據倉庫環境提出的所有特徵。這種數據倉庫的數據模型可以使用任意技術轉換成具有以下特點的數據庫設計:

The Codd and Date Premise

Given all of the above characteristics of a good data warehouse data model, we submit that the best data-modeling technique you can use is one based on the original relational database design—the entity-relationship diagram (ERD) developed by Chris Date and Ted Codd. The ERD is a proven and reliable data-modeling approach with straightforward rules of construction. The normalization rules discussed in Chapter 3 yield a stable, consistent data model that upholds the policies and rules of engagement established by the enterprise, while lending a tremendous amount of flexibility in how the data is later analyzed by the data marts. The resulting database is the most efficient in terms of storage and data loading as well. It is, however, not perfect, as we will see in the next section.

While we certainly feel that this approach is elegant in the extreme, more importantly, this data-modeling technique upholds all of the features and characteristics we specified for a sustainable, flexible, maintainable, and understandable data warehouse environment. The resultant data model for your data warehouse is translatable, using any technology, into a database design that is:

業務之間可靠。這種方式下,數據元素或實體的命名、關係、註解沒有衝突。

企業內共享。從這種數據模型實現的數據倉庫能夠被多種數據交付過程及企業內任何地方的用戶訪問。

支持多種靈活的數據集市。這種數據庫不會偏向BI環境的某個方向或另一個方向,所有的技術對你和你的公司都是可以選擇的。

業務之間的正確性。數據倉庫的數據模型能提供關於企業使用信息的方式的準確的、忠誠的表示。

變化時可調整。這樣的數據庫能夠容納新的數據元素和實體,同時保持與已有數據保持完整性。

Reliable across the business. It contains no contradictions in the way that data elements or entities are named, related to each other, or documented.

Sharable across the enterprise. The data warehouse resulting from the implementation of this data model can be accessed by multiple data delivery processes and users from anywhere in the enterprise.

Flexible in the types of data marts it supports. The resulting database will not bias your BI environment in one direction or another. All technological opportunities will still be available to you and your enterprise.

Correct across the business. The data warehouse data model will provide an accurate and faithful representation of the way information is used in the business.

Adaptable to changes. The resulting database will be able to accommodate new elements and entities, while maintaining the integrity of the implemented ones.

對創建數據集市的影響

現在我們已經描述了一個堅實的數據倉庫數據模型的特徵,並且已經推薦ERD或者叫規範化的方法,讓我們來看看關於全面的BI環境的分支。

使用數據倉庫最普遍的應用是多維分析,至少目前如此。星型模式中使用的維度大致對應主題域模型中的主題域(訂單、顧客、產品、市場分區)以及時間。“今年1月到6月我們在東北區,各種產品分別有多少訂單?”,如果我們使用數據倉庫作爲這個查詢的數據來源,回答這一類問題需要花些功夫。它需要在好幾個大表之間進行連接(訂單、訂單明細、產品、市場分區,並在SQL語句中限定時間範圍)。對不太熟悉SQL的一般企業用戶來說,這種情形既不美觀,也不受歡迎。因此,我們可以看到這樣的情形,數據倉庫的訪問受到限制,僅僅那些非常精於數據庫設計和SQL的企業用戶才能使用。如果一個企業有好的探索與挖掘技術,可能會切斷所有對數據倉庫的直接訪問,而要求所有企業用戶訪問OLAP集市,或者探索倉庫,或者數據挖掘倉庫。

Impact on Data Mart Creation

Now that we have described the characteristics of a solid data warehouse data model and have recommended an ERD or normalized (in the sense of Date and Codd) approach, let’s look at the ramifications that decision will have on our overall BI environment.

The most common applications that use the data warehouse data are multidimensional ones, at least today. The dimensions used in the star schemas correlate roughly to the subject areas developed in the subject area model (order, customer, product, market segment) and time. Answering the question “How many orders for what products did we get in the Northeast section from January to June this year?” would take a significant amount of effort if we were to use the data warehouse as the source of data for that query. It would require a rather large join across several big entities (Order, Order Line Item, Product, Market Segment, with the restriction of the timeframe in the SQL statement). This is not a pretty or particularly welcomed situation for the average business user who is only distantly familiar with SQL. So, what we can see about this situation is that data warehouse access will have to be restricted and used by only those business users who are very sophisticated in database design and SQL. If an enterprise has good exploration and mining technology, it may choose to cut off all access to the data warehouse, thus requiring all business users to access an OLAP mart, or exploration or data mining warehouse instead.
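To give a feel for the join described above, here is a minimal sketch of that question asked directly against normalized tables, using Python's `sqlite3` module. The four table definitions, the sample rows, and the year 2024 are assumptions for illustration; a real warehouse model would carry many more entities and attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE market_segment (segment_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer_order (order_id INTEGER PRIMARY KEY, segment_id INTEGER, order_date TEXT);
CREATE TABLE order_line_item (order_id INTEGER, product_id INTEGER, quantity INTEGER);

INSERT INTO market_segment VALUES (1, 'Northeast'), (2, 'Southwest');
INSERT INTO product VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO customer_order VALUES
    (100, 1, '2024-02-10'), (101, 1, '2024-05-01'),
    (102, 1, '2024-08-15'), (103, 2, '2024-03-03');
INSERT INTO order_line_item VALUES
    (100, 10, 2), (100, 20, 1), (101, 10, 5), (102, 10, 9), (103, 20, 4);
""")

# Orders per product, Northeast only, January through June.
orders_per_product = conn.execute("""
    SELECT p.name, COUNT(DISTINCT o.order_id) AS order_count
    FROM customer_order o
    JOIN order_line_item li ON li.order_id = o.order_id
    JOIN product p          ON p.product_id = li.product_id
    JOIN market_segment ms  ON ms.segment_id = o.segment_id
    WHERE ms.region = 'Northeast'
      AND o.order_date >= '2024-01-01' AND o.order_date < '2024-07-01'
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

Even in this toy form the query needs three joins, a date restriction, and a `COUNT(DISTINCT ...)` to avoid double counting orders with several line items, which suggests why direct warehouse access tends to be left to SQL-proficient users.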

這是問題嗎?其實不是。所有BI環境都必須具備某種形式的“後臺”(back room)能力。正是在後臺,我們執行集成、數據清洗、錯誤檢測與糾正、轉換,以及審計與控制機制等困難的任務,以保證戰略數據的質量。因此,所有BI環境都有這樣一個“不對公衆開放”的部分。我們只是更進一步,主張這一部分應該正式地建模、創建與維護。

Is this a problem? Not really. All BI environments must have “back room” capabilities of one sort or another. It is in the back room that we perform the difficult tasks of integration, data hygiene, error correction and detection, transformation, and the audit and control mechanisms to ensure the quality of the strategic data anyway. Therefore, all BI environments have this “closed off to the public” section of their environment. We have simply taken it one step further and said that this section should be formally modeled, created, and maintained.

在只有數據集市的世界裏,前面描述過的數據交付過程不僅要承擔在正確的時間把正確的信息交付給正確的集市的任務,還必須一遍又一遍地承擔數據獲取處理中的整套ETL任務。顯然,如果數據交付過程只需要從一致的、質量可靠的數據源(數據倉庫)中提取所需的數據,把它轉換成數據集市技術要求的格式(星型模式、平面文件、規範化子集等),再交付給數據集市環境裝載,這個過程就可以極大地簡化。基於堅實的ERD數據模型構建數據倉庫的另一個好處是:你會得到一套可重用的數據實體和數據元素。在只有數據集市的環境中,每個集市必須在自己的數據庫內保存它所需的全部明細數據。除非兩個集市共享一致性維度,否則集成兩者的數據會很困難,甚至不可能。想象一下,如果存在一個明細數據的存儲庫,數據交付過程可以從中提取數據,BI訪問工具在需要時也可以隨時訪問,而不必一遍又一遍地複製數據!這是數據倉庫帶給你的BI環境的另一個顯著好處。

In the data-mart-only world, the data delivery processes, described earlier, must take on not only the burden of ensuring the proper delivery of the information to the right mart at the right time but also the entire set of ETL tasks found in the data acquisition processing, over and over again. Given this situation, it should be obvious that the data delivery processes can be simplified greatly if all they have to worry about is extracting the data they specifically need from a consistent, quality source (the data warehouse), formatting it as required by the data mart technology (star schema, flat file, normalized subset, and so on), and delivering the data to the data mart environment for uploading. As another benefit of constructing the data warehouse from a solid, ERD-based data model, you get a very nice set of reusable data entities and elements. In a data-mart-only environment, each mart must carry all the detailed data it requires within its database. Unless two data marts share common conformed dimensions, integrating them may be difficult, or even impossible. Imagine if a repository of detailed data existed that the data delivery processes could extract from and the BI access tools could access, if they needed to, at any time without having to replicate the data over and over! That is another significant benefit the data warehouse brings to your BI environment.
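The simplified delivery step described above (extract from the warehouse, format for the mart technology, deliver for loading) can be sketched roughly as follows. This is an illustrative assumption, not the book's implementation: the function `deliver_to_mart`, the table and column names, and the sample rows are all hypothetical, and two in-memory SQLite databases stand in for the warehouse and the mart.

```python
import sqlite3

# Hypothetical normalized warehouse (source). order_date sits on the line item
# only to keep the sketch short; a real model would keep it on the order.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_line_item (
    order_id INTEGER, line_no INTEGER, product_id INTEGER,
    order_date TEXT, quantity INTEGER,
    PRIMARY KEY (order_id, line_no)
);
INSERT INTO product VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO order_line_item VALUES
    (100, 1, 10, '2024-02-15', 3),
    (100, 2, 20, '2024-02-15', 1),
    (101, 1, 10, '2024-02-15', 2);
""")

# Hypothetical star-schema mart (target): one dimension and one fact table.
mart = sqlite3.connect(":memory:")
mart.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_daily_sales (
    order_date TEXT,
    product_key INTEGER REFERENCES dim_product(product_key),
    order_count INTEGER,
    total_quantity INTEGER,
    PRIMARY KEY (order_date, product_key)
);
""")

def deliver_to_mart(source, target):
    """Extract from the warehouse, reshape to the star schema, load the mart."""
    # Extract: the warehouse data is assumed to be integrated and cleansed
    # already, so no acquisition-style ETL is repeated here.
    products = source.execute(
        "SELECT product_id, product_name FROM product").fetchall()
    # Format: aggregate detail rows to the grain of the fact table.
    facts = source.execute("""
        SELECT order_date, product_id, COUNT(DISTINCT order_id), SUM(quantity)
        FROM order_line_item
        GROUP BY order_date, product_id
    """).fetchall()
    # Deliver: load dimension and fact rows in a single transaction.
    with target:
        target.executemany("INSERT INTO dim_product VALUES (?, ?)", products)
        target.executemany(
            "INSERT INTO fact_daily_sales VALUES (?, ?, ?, ?)", facts)

deliver_to_mart(warehouse, mart)
mart_rows = mart.execute("""
    SELECT d.product_name, f.order_date, f.order_count, f.total_quantity
    FROM fact_daily_sales AS f
    JOIN dim_product AS d USING (product_key)
    ORDER BY d.product_name
""").fetchall()
print(mart_rows)
```

Because the detailed rows stay in the warehouse, a second mart could call a similar function with a different grain or format without re-running integration and cleansing against the operational sources.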

總結

有不少BI方法論和顧問會告訴你:你不需要數據倉庫;所有數據集市合在一起就構成了“數據倉庫”,或者至少是一個虛擬的數據倉庫;又或者,企業真正想要的只是一個獨立的數據集市。我們發現,所有這些方法都嚴重缺乏可持續性和成熟度。這本書採用“最佳實踐”的方法來創建數據倉庫。我們所說的最佳實踐是一套建議,告訴設計者哪些活動應該採取,哪些活動應該避免,從而讓他們的整體工作獲得最大的成功。

Summary

Several BI methodologies and consultants will tell you that you do not need a data warehouse, that the combination of all the data marts together creates the “data warehouse” (or at least a virtual one), or that all the business really wants is a standalone data mart. We find all of these approaches to be seriously lacking in sustainability and sophistication. This book takes a “best practices” approach to creating a data warehouse. The best practices we use are a set of recommendations that tell designers what actions they should take or avoid, thus maximizing the success of their overall efforts.

這些建議基於多年的實踐經驗、對許多數據倉庫項目的參與,以及對許多成功的、可維護的數據倉庫環境的觀察。顯然,沒有一個方法是完美的,也沒有一個方法應該不考慮具體情況而盲目遵從。你應該瞭解什麼方法在你的環境下最有效,然後按你認爲合適的方式應用這些規則,並在發生變化或出現新情況時調整它們。儘管有這樣的告誡,本書仍然充滿了有用的、有價值的信息、指導和提示。在接下來的章節,我們會更詳細地描述所需的數據模型,一步一步地講解數據倉庫數據模型的構建,並討論部署問題以及你在創建可持續、可維護的商業智能環境的過程中可能碰到的問題。讀完本書,你應該完全有能力開始構建你的BI環境,並掌握針對數據倉庫的最佳設計技術。

These recommendations are based on years of experience in the field, participation in many data warehouse projects, and the observation of many successful and maintainable data warehouse environments. Clearly, no one method is perfect, nor should one be followed blindly without thought being given to the specific situation. You should understand what works best in your environment and then apply these rules as you see fit, altering them as changes and new situations arise. In spite of this caveat, this book is filled with useful and valuable information, guidelines, and hints.

In the following chapters, we will describe the data models needed in more detail, go over the construction of the data warehouse data model step by step, and discuss deployment issues and problems you may encounter along the way to creating a sustainable and maintainable business intelligence environment. By the end of the book, you should be fully qualified to begin constructing your BI environment armed with the best design techniques possible for your data warehouse.

