Training Outline:
- Motivation for Hadoop
Problems with Traditional Large-Scale Systems
Requirements for a New Approach
- Hadoop: Basic Concepts
Hadoop Distributed File System (HDFS)
MapReduce
Anatomy of a Hadoop Cluster
Other Hadoop Ecosystem Components
- Writing a MapReduce Program
MapReduce Flow
Examining a Sample MapReduce Program
Basic MapReduce API Concepts
Driver Code
Mapper
Reducer
Streaming API
Using Eclipse for Rapid Development
New MapReduce API
- Integrating Hadoop into the Workflow
Relational Database Management Systems
Storage Systems
Importing Data from a Relational Database Management System with Sqoop
Importing Real-Time Data with Flume
Accessing HDFS Using FuseDFS and Hoop
- Delving Deeper into the Hadoop API
ToolRunner
Testing with MRUnit
Reducing Intermediate Data with Combiners
The configure and close Methods for Map/Reduce Setup and Teardown
Writing Partitioners for Better Load Balancing
Directly Accessing HDFS
Using the Distributed Cache
- Common MapReduce Algorithms
Sorting and Searching
Indexing
Machine Learning with Mahout
Term Frequency
Inverse Document Frequency
Word Co-Occurrence
- Using Hive and Pig
Hive Basics
Pig Basics
- Practical Development Tips and Techniques
Debugging MapReduce Code
Using LocalJobRunner Mode for Easier Debugging
Retrieving Job Information with Counters
Logging
Splittable File Formats
Determining the Optimal Number of Reducers
Map-Only MapReduce Jobs
- Advanced MapReduce Programming
Custom Writables and WritableComparables
Saving Binary Data Using SequenceFiles and Avro Files
Creating InputFormats and OutputFormats
- Joining Data Sets in MapReduce
Map-Side Joins
Secondary Sort
Reduce-Side Joins
- Graph Manipulation in Hadoop
Graph Techniques
Representing Graphs in Hadoop
Implementing a Sample Algorithm: Single-Source Shortest Path
- Creating Workflows with Oozie
Motivation for Oozie
Workflow Definition Format
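As a taste of the Streaming API and word-count material listed above, here is a minimal sketch (not course material; the file name and phase-selection convention are assumptions for illustration) of a Hadoop Streaming mapper and reducer in Python. Hadoop Streaming passes records to these scripts over stdin/stdout as tab-separated key/value lines.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Emit one "word<TAB>1" line per token in the input.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(lines):
    # Hadoop Streaming sort-merges mapper output by key before the
    # reducer runs, so identical words arrive on consecutive lines
    # and can be summed with a single groupby pass.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    # Hypothetical convention: select the phase with an argument,
    # e.g. `python wordcount.py map` or `python wordcount.py reduce`.
    phase = mapper if sys.argv[1:] == ["map"] else reducer
    for out in phase(sys.stdin):
        print(out)
```

In a real streaming job the same script would be shipped to the cluster with `-mapper` and `-reducer` options of the `hadoop jar hadoop-streaming.jar` command; keeping both phases in one file is just a packaging convenience.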