A First Look at HPCC

HPCC (High Performance Computing Cluster) is an open-source, massively parallel processing computing platform designed to solve Big Data problems.

The HPCC Systems architecture incorporates the Thor (the Data Refinery Cluster) and Roxie (the Query Cluster) clusters, together with common middleware components, an external communications layer, client interfaces that provide both end-user services and system management tools, and auxiliary components that support monitoring and facilitate loading and storing filesystem data from external sources. An HPCC environment can include only Thor clusters, or both Thor and Roxie clusters. Each cluster type is described in more detail below the architecture diagram.

High-Level HPCC Architecture



The diagram above gives a high-level overview of the platform architecture and of how the components work together as a solution for managing Big Data. A brief description of each component follows.

Thor (the Data Refinery Cluster) is responsible for consuming vast amounts of data and for transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. A cluster can scale from a single node to thousands of nodes.


Single-threaded
Distributed parallel processing
Distributed file system
Powerful parallel processing programming language (ECL)
Optimized for Extraction, Transformation, Loading, Sorting, Indexing and Linking
Scales from one node to thousands of nodes
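
The Thor workflow described above (split the data across nodes, transform and sort each part in parallel, then build an index) can be illustrated with a small analogy. This is a minimal Python sketch, not ECL and not any HPCC API: the function names `refine` and `build_index`, the record layout, and the use of a thread pool to stand in for cluster nodes are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def refine(part):
    """Per-node work (hypothetical): clean each record, then sort locally."""
    cleaned = [{"id": r["id"], "name": r["name"].strip().lower()} for r in part]
    return sorted(cleaned, key=lambda r: r["name"])


def build_index(records, n_nodes=3):
    """Partition records, refine each partition in parallel, build an index."""
    # Round-robin partitioning: each slice stands in for one node's data.
    parts = [records[i::n_nodes] for i in range(n_nodes)]
    # Each worker stands in for one node processing its own partition.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        refined = list(pool.map(refine, parts))
    # Link records that share a cleaned name: name -> list of record ids.
    index = defaultdict(list)
    for part in refined:
        for rec in part:
            index[rec["name"]].append(rec["id"])
    return {name: sorted(ids) for name, ids in index.items()}


raw = [
    {"id": 1, "name": " Alice "},
    {"id": 2, "name": "BOB"},
    {"id": 3, "name": "alice"},
    {"id": 4, "name": "Carol"},
]
index = build_index(raw)
```

Here `index` maps each cleaned name to the ids of the records that carry it, and the result does not depend on how many partitions are used. On a real cluster this kind of job would be expressed in ECL and run across Thor nodes.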

Roxie (the Query Cluster) provides separate high-performance online query processing and data warehouse capabilities.


Multi-threaded
Distributed parallel processing
Distributed file system
Powerful parallel processing programming language (ECL)
Optimized for concurrent query processing
Scales from one node to thousands of nodes
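
Roxie's role, answering many independent queries concurrently against data that has already been prepared, can be pictured with a toy example. The sketch below is plain Python rather than ECL. The `INDEX` contents, the `run_query` and `serve` names, and the thread pool are hypothetical stand-ins, not part of HPCC.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical prebuilt, read-only index: name -> list of record ids.
INDEX = {"alice": [1, 3], "bob": [2], "carol": [4]}


def run_query(name):
    """Answer one lookup; unknown names return an empty list."""
    return INDEX.get(name.strip().lower(), [])


def serve(requests, workers=4):
    """Answer many independent requests concurrently, preserving their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_query, requests))


results = serve(["Alice", "bob", "dave"])
```

Because the index is only read during serving, the lookups need no coordination with each other, which is the property that makes a workload like this suited to concurrent query processing.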

ECL IDE is a modern integrated development environment used to code, debug and monitor ECL programs.


Access to shared source code repositories
Complete development, debugging and testing environment for developing ECL dataflow programs
Access to the ECLWatch tool is built-in, allowing developers to watch job graphs as they are executing
Access to current and historical job workunits

ESP (Enterprise Services Platform) provides an easy-to-use interface for accessing ECL queries over XML, HTTP, SOAP and REST.


Standards-based interface to access ECL functions
Supports SOAP, XML, HTTP and REST
Supports SAML and various security standards

