Cloudera Administrator Training for Apache Hadoop
Course Outline:
1. Hadoop and HDFS
- Why Hadoop?
- HDFS
- MapReduce
- Hive, Pig, HBase, and Other Ecosystem Projects
2. Planning Your Hadoop Cluster
- General Planning Considerations
- Choosing the Right Hardware
- Node Topologies
- Choosing the Right Software
3. Deploying Your Cluster
- Installing Hadoop
- Using SCM Express for Easy Installation
- Typical Configuration Parameters
- Configuring Rack Awareness
- Using Configuration Management Tools
4. Managing and Scheduling Jobs
- Starting and Stopping MapReduce Jobs
- FIFO Scheduler
- Fair Scheduler
5. Cluster Maintenance
- Checking HDFS with fsck
- Copying Data with DistCp
- Rebalancing Cluster Nodes
- Adding and Removing Cluster Nodes
- Backup and Restore
- Upgrading and Migrating
- NameNode Metadata
6. Cluster Monitoring, Troubleshooting, and Optimizing
- Hadoop Log Files
- Using the NameNode and JobTracker Web UIs
- Interpreting Job Logs
- Monitoring with Ganglia
- Other Monitoring Tools
- General Optimization Tips
- Benchmarking Your Cluster
7. Populating HDFS from External Sources
- Using Sqoop
- Using Flume
- Best Practices for Data Ingestion
8. Installing and Managing Other Hadoop Projects