Friday, April 25, 2014

Chapter 4 Managing a Hadoop Cluster

Managing the HDFS Cluster

Use the following steps to check the status of an HDFS cluster with hadoop fsck:
  1. Check the status of the root filesystem with the following command:
    hadoop fsck /
  2. Check the status of all the files on HDFS with the following command:
    hadoop fsck / -files
  3. Check the locations of file blocks with the following command:
    hadoop fsck / -files -blocks -locations
  4. Check the locations of file blocks containing rack information with the following command:
    hadoop fsck / -files -blocks -racks
  5. Delete corrupted files with the following command:
    hadoop fsck / -delete
  6. Move corrupted files to /lost+found with the following command:
    hadoop fsck / -move
Use the following steps to check the status of an HDFS cluster with hadoop dfsadmin:
  1. Report the status of each slave node with the following command:
    hadoop dfsadmin -report
  2. Refresh all the DataNodes using the following command:
    hadoop dfsadmin -refreshNodes
  3. Check the safe mode status using the following command:
    hadoop dfsadmin -safemode get
  4. Manually put the NameNode into safemode using the following command:
    hadoop dfsadmin -safemode enter
  5. Make the NameNode leave safe mode using the following command:
    hadoop dfsadmin -safemode leave
  6. Wait until the NameNode leaves safe mode using the following command:
    hadoop dfsadmin -safemode wait
    This command is useful when we want to wait until HDFS finishes data block replication or until a newly commissioned DataNode is ready for service.
  7. Save the metadata of the HDFS filesystem with the following command:
    hadoop dfsadmin -metasave meta.log
    The meta.log file will be created under the directory $HADOOP_HOME/logs .

The HDFS filesystem is write protected while the NameNode is in safe mode. When an HDFS cluster is started, it enters safe mode first. The NameNode checks the replication factor of each data block. If the replica count of a data block is smaller than the configured value, which is 3 by default, the data block is marked as under-replicated. Finally, an under-replication factor, which is the percentage of under-replicated data blocks, is calculated. If this percentage is larger than the threshold value, the NameNode stays in safe mode until enough new replicas are created for the under-replicated data blocks to bring the under-replication factor below the threshold.
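In Hadoop 1.x, this threshold is governed by the dfs.safemode.threshold.pct property (0.999 by default, that is, 99.9 percent of blocks must satisfy the minimal replication defined by dfs.replication.min before the NameNode leaves safe mode automatically). A minimal sketch of tuning it in $HADOOP_HOME/conf/hdfs-site.xml:

<property>
    <name>dfs.safemode.threshold.pct</name>
    <value>0.999</value>
</property>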

Configuring SecondaryNameNode

The Hadoop NameNode is a single point of failure. By configuring SecondaryNameNode, the filesystem image and edit log can be backed up periodically. In case of a NameNode failure, the backup files can be used to recover the NameNode.
Configure SecondaryNameNode:
  1. stop-all.sh
  2. Add or change the following property in the file $HADOOP_HOME/conf/hdfs-site.xml:
    <property>
        <name>fs.checkpoint.dir</name>
        <value>/hadoop/dfs/namesecondary</value>
    </property>
    
    If this property is not set explicitly, the default checkpoint directory will be ${hadoop.tmp.dir}/dfs/namesecondary
  3. start-all.sh

To increase redundancy, we can configure the NameNode to write filesystem metadata to multiple locations. For example, we can add an NFS shared directory for backup by changing the following property in the file $HADOOP_HOME/conf/hdfs-site.xml:


<property>
    <name>dfs.name.dir</name>
    <value>/hadoop/dfs/name,/nfs/name</value>
</property>

Managing the MapReduce cluster

  1. List all the active TaskTrackers:
    hadoop job -list-active-trackers
  2. Check the status of the JobTracker safe mode:
    hadoop mradmin -safemode get
    The output tells us that the JobTracker is not in safe mode. We can submit jobs to the cluster.
  3. Manually let the JobTracker enter safe mode:
    hadoop mradmin -safemode enter
  4. Let the JobTracker leave safe mode:
    hadoop mradmin -safemode leave
  5. Wait for safe mode to exit:
    hadoop mradmin -safemode wait
  6. Reload the MapReduce queue configuration:
    hadoop mradmin -refreshQueues
  7. Reload active TaskTrackers:
    hadoop mradmin -refreshNodes
The meaning of the hadoop mradmin options is listed in the following table:
Option Description
-refreshServiceAcl Force JobTracker to reload ACL.
-refreshQueues Force JobTracker to reload queue configurations.
-refreshUserToGroupsMappings Force JobTracker to reload user group mappings.
-refreshSuperUserGroupsConfiguration Force JobTracker to reload superuser group mappings.
-refreshNodes Force JobTracker to refresh the TaskTracker hosts.

Managing TaskTracker

Hadoop maintains three lists for TaskTrackers: black list, gray list and excluded list.

TaskTracker blacklisting is a function that can blacklist a TaskTracker if it is in an unstable state or its performance has been degraded. For example, when the ratio of failed tasks for a specific job reaches a certain threshold, the TaskTracker will be blacklisted for this job. Similarly, Hadoop maintains a gray list of nodes by identifying potential problem nodes.

Sometimes, excluding certain TaskTrackers from the cluster is desirable. For example, when we debug or upgrade a slave node, we want to separate this node from the cluster in case it affects the cluster. Hadoop supports the live decommissioning of a TaskTracker from a running cluster.

List the active trackers with the following command on the master node:

hadoop job -list-active-trackers
Configure the heartbeat interval
  1. stop-mapred.sh
  2. Add or change the following property in $HADOOP_HOME/conf/mapred-site.xml:
    <property>
        <name>mapred.tasktracker.expiry.interval</name>
        <value>600000</value>
    </property>
    
  3. Copy the configuration into the slave nodes:
    for host in `cat $HADOOP_HOME/conf/slaves`; do
        echo 'Copying mapred-site.xml to slave node ' $host
        sudo scp $HADOOP_HOME/conf/mapred-site.xml hduser@$host:$HADOOP_HOME/conf
    done
  4. start-mapred.sh
Configure TaskTracker blacklisting
  1. stop-mapred.sh
  2. Set the number of task failures for a job that will blacklist a TaskTracker by adding or changing the following property in the file $HADOOP_HOME/conf/mapred-site.xml:
    <property>
        <name>mapred.max.tracker.failures</name>
        <value>10</value>
    </property>
    
  3. Set the maximum number of job-level blacklistings a TaskTracker can accumulate before it is blacklisted for the whole cluster:
    <property>
        <name>mapred.max.tracker.blacklists</name>
        <value>5</value>
    </property>
    
  4. Copy the configuration to the slave nodes:
    for host in `cat $HADOOP_HOME/conf/slaves`; do
        echo 'Copying mapred-site.xml to slave node ' $host
        sudo scp $HADOOP_HOME/conf/mapred-site.xml hduser@$host:$HADOOP_HOME/conf
    done
    
  5. start-mapred.sh
  6. List blacklisted TaskTrackers:
    hadoop job -list-blacklisted-trackers
Decommission TaskTrackers
  1. Set the TaskTracker exclude file in $HADOOP_HOME/conf/mapred-site.xml:
    <property>
        <name>mapred.hosts.exclude</name>
        <value>$HADOOP_HOME/conf/mapred-exclude</value>
    </property>
    
  2. Force the JobTracker to reload the TaskTracker list:
    hadoop mradmin -refreshNodes
  3. List all the active TaskTrackers again:
    hadoop job -list-active-trackers

TaskTrackers on the slave nodes contact the JobTracker on the master node periodically. The interval between two consecutive contacts is called the heartbeat interval. More frequent heartbeats incur a higher load on the cluster, so the value of the heartbeat property should be chosen based on the size of the cluster.

The JobTracker uses TaskTracker blacklisting to remove unstable TaskTrackers. If a TaskTracker is blacklisted, all the tasks currently running on it can still finish, and the TaskTracker keeps communicating with the JobTracker through the heartbeat mechanism, but it will not be scheduled for future tasks. If a blacklisted TaskTracker is restarted, it will be removed from the blacklist. (The number of blacklisted TaskTrackers should not exceed 50 percent of the total number of TaskTrackers in the cluster.)

Decommissioning DataNode

  1. Create the file $HADOOP_HOME/conf/dfs-exclude:
    The dfs-exclude file contains the hostnames of the DataNodes to be decommissioned, one per line (see the example after this list).
  2. Add the following property to $HADOOP_HOME/conf/hdfs-site.xml:
    <property>
        <name>dfs.hosts.exclude</name>
        <value>$HADOOP_HOME/conf/dfs-exclude</value>
    </property>
    
  3. hadoop dfsadmin -refreshNodes
  4. hadoop dfsadmin -report
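For reference, the dfs-exclude file created in step 1 simply lists one hostname per line; for example, with two hypothetical hosts to be decommissioned:

    slave4
    slave5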

Replacing a Slave Node

  1. Decommission the TaskTracker on the slave node.
  2. Decommission the DataNode on the slave node.
  3. Power off the slave node and replace it with the new hardware.
  4. Install and configure the Linux operating system on the new node: install Java and other tools, configure SSH, and prepare for the Hadoop installation.
  5. Install Hadoop on the new node.
  6. Log in to the new node and start the DataNode and TaskTracker (assuming the new node is slave2):
    ssh hduser@slave2 -C "hadoop datanode &"
    ssh hduser@slave2 -C "hadoop tasktracker &"
  7. Refresh the DataNodes:
    hadoop dfsadmin -refreshNodes
  8. Refresh the TaskTrackers:
    hadoop mradmin -refreshNodes
  9. hadoop dfsadmin -report
  10. hadoop job -list-active-trackers

Managing MapReduce jobs

Check the status of Hadoop jobs
  1. List all the running jobs:
    hadoop job -list
  2. List all the submitted jobs since the start of the cluster:
    hadoop job -list all
  3. Check the status of the default queue:
    hadoop queue -list
    Hadoop manages jobs using queues. By default, there is only one queue named default. The output of the command shows that the cluster has only the default queue, which is in the running state with no scheduling information.
  4. Check the ACLs of the queues:
    hadoop queue -showacls
  5. Show all the jobs in the default queue:
    hadoop queue -info default -showJobs
  6. Check the status of job:
    hadoop job -status JOB-ID
Changing the status of a job
  1. Set the job JOB-ID to high priority:
    hadoop job -set-priority JOB-ID HIGH
    Available priorities, in descending order, are: VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW.
  2. Kill the job:
    hadoop job -kill JOB-ID
Submit a MapReduce job
  1. Create the job configuration file, job.xml:
    <?xml version="1.0"?>
    <configuration>
        <property>
            <name>mapred.input.dir</name>
            <value>randtext</value>
        </property>
        <property>
            <name>mapred.output.dir</name>
            <value>output</value>
        </property>
        <property>
            <name>mapred.job.name</name>
            <value>wordcount</value>
        </property>
        <property>
            <name>mapred.mapper.class</name>
            <value>org.apache.hadoop.mapred.WordCount$Map</value>
        </property>
        <property>
            <name>mapred.combiner.class</name>
            <value>org.apache.hadoop.mapred.WordCount$Reduce</value>
        </property>
        <property>
            <name>mapred.reducer.class</name>
            <value>org.apache.hadoop.mapred.WordCount$Reduce</value>
        </property>
        <property>
            <name>mapred.input.format.class</name>
            <value>org.apache.hadoop.mapred.TextInputFormat</value>
        </property>
        <property>
            <name>mapred.output.format.class</name>
            <value>org.apache.hadoop.mapred.TextOutputFormat</value>
        </property>
    </configuration>
    Make sure $HADOOP_HOME/hadoop-examples*.jar is available in CLASSPATH.
    
  2. Submit the job:
    hadoop job -submit job.xml

The queue command is a wrapper command for the JobQueueClient class, and the job command is a wrapper command for the JobClient class. (page 107)
More job management commands
  1. Get the value of a counter:
    hadoop job -counter <job-id> <group-name> <counter-name>
    For example, to get the counter HDFS_BYTES_WRITTEN of the counter group FileSystemCounter for the job job_201302281451_0002:
    hadoop job -counter job_201302281451_0002 FileSystemCounter HDFS_BYTES_WRITTEN
  2. Query events of a MapReduce job:
    hadoop job -events <job-id> <from-event-#> <#-of-events>
    For example, to query the first 10 events of the job job_201302281451_0002:
    hadoop job -events job_201302281451_0002 0 10
  3. Get the job history, including job details and failed and killed task attempts:
    hadoop job -history <job-output-dir>
Managing tasks
  1. Kill a task:
    hadoop job -kill-task <task-attempt-id>
    After the task is killed, the JobTracker will restart the task on a different node.
    Hadoop JobTracker can automatically kill tasks in the following situations:
    • A task does not report progress within the configured timeout.
    • Speculative execution runs the same task on multiple nodes; once one attempt succeeds, the other attempts of that task are killed because their results are no longer needed.
    • Job/task schedulers, such as the Fair Scheduler and the Capacity Scheduler, need empty slots for other pools or queues.
  2. Set a task to fail:
    hadoop job -fail-task <task-attempt-id>
  3. List task attempts:
    hadoop job -list-attempt-ids <job-id> <task-type> <task-state>
    Available task types are map, reduce, setup, and clean; available task states are running and completed.

Checking job history from the web UI

The web UI can be automatically updated every five seconds; this interval can be modified by changing the mapreduce.client.completion.pollinterval property in the $HADOOP_HOME/conf/mapred-site.xml file.


<property>
    <name>mapreduce.client.completion.pollinterval</name>
    <value>5000</value>
</property>

Importing Data to HDFS

Use distcp (distributed copy) to copy large data to HDFS:
hadoop distcp file:///data/datafile /user/hduser/data
This command will initiate a MapReduce job with a number of mappers to run the copy task in parallel.
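The number of map tasks can be capped with the -m option of distcp; for example (hypothetical source and destination paths), copying a directory between HDFS locations with at most 20 mappers:

hadoop distcp -m 20 hdfs://master:54310/user/hduser/data hdfs://master:54310/user/hduser/data-copy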

Manipulating files on HDFS

  1. Check the content of a file:
    hadoop fs -cat file1
    This command is handy for checking the contents of small files, but it is not recommended for large files. Instead, we can use the command hadoop fs -tail file1 to check the last few lines.
    Alternatively, the hadoop fs -text file1 command shows the content of file1 in text format.
  2. Test whether file1 exists (-e), has zero length (-z), or is a directory (-d):
    hadoop fs -test -e file1
    hadoop fs -test -z file1
    hadoop fs -test -d file1
  3. Check the status of file1:
    hadoop fs -stat file1
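The -stat command also accepts a format string; as a small sketch, the following prints the name, replication factor, and modification time of file1 using the %n, %r, and %y format options:

    hadoop fs -stat "%n %r %y" file1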

Perform the following steps to manipulate files and directories on HDFS:

  1. Empty the trash:
    hadoop fs -expunge
  2. Merge the files in a directory dir and download them as one big file:
    hadoop fs -getmerge dir file1
    This command is similar to the cat command in Linux. It is very useful when we want to get the MapReduce output as one file rather than several small partitioned files.
  3. Delete file1 under the current directory:
    hadoop fs -rm file1
  4. Download file1 from HDFS:
    hadoop fs -get file1
  5. Change the group membership of a regular file:
    hadoop fs -chgrp hadoop file1
  6. Change the ownership of a regular file:
    hadoop fs -chown hduser file1
  7. Change the mode of a file:
    hadoop fs -chmod 600 file1
  8. Set the replication factor of file1 to be 3:
    hadoop fs -setrep -w 3 file1
  9. Create an empty file:
    hadoop fs -touchz 0file

Configuring the HDFS quota

Manage an HDFS quota:

  1. Set the name quota on the home directory:
    hadoop dfsadmin -setQuota 20 /user/hduser
    This command will set the name quota on the home directory to 20, which means at most 20 files, including directories, can be created under the home directory. If we reach the quota, an error message will be given.
  2. Set the space quota of the current user's home directory to be 100000000:
    hadoop dfsadmin -setSpaceQuota 100000000 /user/hduser
  3. Check the quota status:
    hadoop fs -count -q /user/hduser
    The output columns are QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and FILE_NAME. (Without the -q option, only DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and FILE_NAME are shown.)
  4. Clear the name quota:
    hadoop dfsadmin -clrQuota /user/hduser
  5. Clear the space quota:
    hadoop dfsadmin -clrSpaceQuota /user/hduser

Configuring CapacityScheduler

Hadoop CapacityScheduler is a pluggable MapReduce job scheduler. Its goal is to maximize Hadoop cluster utilization by sharing the cluster among multiple users. CapacityScheduler uses queues to guarantee the minimum share of each user. It has the features of being secure, elastic, and operable, and of supporting job priority.
Configure CapacityScheduler:
  1. Configure Hadoop to use CapacityScheduler in $HADOOP_HOME/conf/mapred-site.xml:
    <property>
        <name>mapred.jobtracker.taskScheduler</name>
        <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
    </property>
    
  2. Define a new queue, hdqueue:
    <property>
        <name>mapred.queue.names</name>
        <value>default,hdqueue</value>
    </property>
    
  3. Configure CapacityScheduler queues in $HADOOP_HOME/conf/capacity-scheduler.xml:
    <property>
        <name>mapred.capacity-scheduler.queue.hdqueue.capacity</name>
        <value>20</value>
    </property>
    <property>
        <name>mapred.capacity-scheduler.queue.default.capacity</name>
        <value>80</value>
    </property>
    <property>
        <name>mapred.capacity-scheduler.queue.hdqueue.minimum-user-limit-percent</name>
        <value>20</value>
    </property>
    <property>
        <name>mapred.capacity-scheduler.maximum-system-jobs</name>
        <value>10</value>
        <description>Maximum number of jobs in the system which can be initialized, concurrently, by the CapacityScheduler.</description>
    </property>
    <property>
        <name>mapred.capacity-scheduler.queue.hdqueue.maximum-initialized-active-tasks</name>
        <value>500</value>
        <description>The maximum number of tasks, across all jobs in the queue, which can be initialized concurrently. Once the queue's jobs exceed this limit they will be queued on disk.</description>
    </property>
    <property>
        <name>mapred.capacity-scheduler.queue.hdqueue.maximum-initialized-active-tasks-per-user</name>
        <value>100</value>
    </property>
    <property>
        <name>mapred.capacity-scheduler.queue.hdqueue.supports-priority</name>
        <value>true</value>
    </property>
    
  4. stop-mapred.sh
    start-mapred.sh
  5. Get the scheduling details of each queue by opening the JobTracker web UI URL.
  6. Test the queue configuration by submitting example wordcount job to the queue hdqueue:
    hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount -Dmapred.job.queue.name=hdqueue randtext wordcount.out

Hadoop supports access control on the queues using queue ACLs. Queue ACLs control the authorization of MapReduce job submission to a queue.
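The queue ACLs themselves are defined in the $HADOOP_HOME/conf/mapred-queue-acls.xml file and take effect when mapred.acls.enabled is set to true. A minimal sketch that allows only the user hduser to submit jobs to, and administer, the hdqueue queue (the user name is just an example):

<property>
    <name>mapred.queue.hdqueue.acl-submit-job</name>
    <value>hduser</value>
</property>
<property>
    <name>mapred.queue.hdqueue.acl-administer-jobs</name>
    <value>hduser</value>
</property>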

Configuring Fair Scheduler

Similar to CapacityScheduler, the Fair Scheduler was designed to enforce fair shares of cluster resources in a multiuser environment.
Configure Hadoop Fair Scheduler:
  1. Enable fair scheduling in $HADOOP_HOME/conf/mapred-site.xml:
    <property>
        <name>mapred.jobtracker.taskScheduler</name>
        <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>
    
  2. Create the Fair Scheduler configuration file, $HADOOP_HOME/conf/fair-scheduler.xml:
    <?xml version="1.0"?>
    <allocations>
        <!-- The pool and user names here are examples. -->
        <pool name="hduser">
            <minMaps>5</minMaps>
            <minReduces>5</minReduces>
            <maxRunningJobs>90</maxRunningJobs>
            <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
            <weight>2.0</weight>
        </pool>
        <user name="hduser">
            <maxRunningJobs>1</maxRunningJobs>
        </user>
        <poolMaxJobsDefault>3</poolMaxJobsDefault>
    </allocations>
    
  3. stop-mapred.sh
  4. start-mapred.sh

The Hadoop Fair Scheduler schedules jobs in such a way that all jobs get an equal share of computing resources. Jobs are organized into scheduling pools. A pool can be configured for each Hadoop user. If the pool for a user is not configured, the default pool will be used. A pool specifies the amount of resources a user can share on the cluster, for example the number of map slots, reduce slots, the total number of running jobs, and so on.
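By default, the Fair Scheduler maps a job to the pool named after the submitting user (controlled by mapred.fairscheduler.poolnameproperty). A job can also be directed to a pool explicitly; for example, submitting the wordcount example to the hypothetical pool hduser configured above:

hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount -Dmapred.fairscheduler.pool=hduser randtext wordcount.out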

minMaps and minReduces are used to ensure the minimum share of computing slots on the cluster for a pool. The minimum share guarantee can be useful when the required number of computing slots is larger than the number of configured slots. If the minimum share of a pool is not met, the JobTracker will kill tasks in other pools and assign the slots to the starving pool. In such cases, the JobTracker will restart the killed tasks on other nodes, and thus those jobs will take longer to finish. (I don't really follow this: if the job ends up taking longer to finish, how is this still useful? Or is the 'in case' explanation unrelated to the previous sentence?)

Besides computing slots, the Fair Scheduler can limit the number of concurrently running jobs and tasks in a pool. So, if a user submits more jobs than the configured limit, some jobs have to wait in the queue until other jobs finish. In such a case, higher priority jobs will be scheduled by the Fair Scheduler to run earlier than lower priority jobs. If all jobs in the waiting queue have the same priority, the Fair Scheduler can be configured to schedule these jobs with either fair scheduling or FIFO scheduling.

Properties supported by Fair Scheduler:
Property Value Description
minMaps integer Minimum map slots of a pool
minReduces integer Minimum reduce slots of a pool
maxMaps integer Maximum map slots of a pool
maxReduces integer Maximum reduce slots of a pool
schedulingMode Fair/FIFO Pool internal scheduling mode
maxRunningJobs integer Maximum number of concurrently running jobs for a pool. Default value is unlimited.
weight float Value to control a non-proportional share of cluster resources. The default value is 1.0.
minSharePreemptionTimeout integer Seconds to wait before killing other pools' tasks if a pool's share is under the minimum share.
poolMaxJobsDefault integer Default maximum number of concurrently running jobs for a pool.
userMaxJobsDefault integer Default maximum number of concurrently running jobs for a user.
defaultMinSharePreemptionTimeout integer Default seconds to wait before killing other pools' tasks when a pool's share is under its minimum share.
fairSharePreemptionTimeout integer Seconds to wait before preemption when a job's resources are below half of its fair share.
defaultPoolSchedulingMode Fair/FIFO Default in-pool scheduling mode

Configuring Hadoop daemon logging

  1. Check the current logging level of JobTracker:
    hadoop daemonlog -getlevel master:50030 org.apache.hadoop.mapred.JobTracker
  2. Tell Hadoop to only log error events for JobTracker:
    hadoop daemonlog -setlevel master:50030 org.apache.hadoop.mapred.JobTracker ERROR
  3. Get the log levels for the NameNode, TaskTracker, and DataNode daemons (the NameNode web UI runs on master:50070; the TaskTracker and DataNode web UIs run on the slave nodes, on ports 50060 and 50075 respectively):
    hadoop daemonlog -getlevel master:50070 org.apache.hadoop.hdfs.server.namenode.NameNode
    hadoop daemonlog -getlevel slave1:50060 org.apache.hadoop.mapred.TaskTracker
    hadoop daemonlog -getlevel slave1:50075 org.apache.hadoop.hdfs.server.datanode.DataNode

By default, Hadoop sends log messages to Log4j, which is configured in the file $HADOOP_HOME/conf/log4j.properties. This file defines both what to log and where to log it. For applications, the default root logger is "INFO,console", which logs all messages at level INFO and above to the console's stderr. Log files are named $HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-<daemon-name>-<hostname>.log
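As a small sketch, the default root logger is set near the top of $HADOOP_HOME/conf/log4j.properties and can be switched to a different level or appender (DRFA is the daily rolling file appender defined later in the same file):

# Default: log INFO and above to the console (stderr).
hadoop.root.logger=INFO,console
# Example: log DEBUG and above to a daily rolling file instead.
# hadoop.root.logger=DEBUG,DRFA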

Hadoop supports a number of log levels for different purposes. The log level should be tuned based on the purpose of the logging.

Logging levels provided by Log4j:
Log level Description
ALL The lowest logging level; all logging will be turned on.
DEBUG Logging events useful for debugging applications.
ERROR Logging error events, but applications can continue to run.
FATAL Logging very severe error events that will abort the application.
INFO Logging informational messages that indicate the progress of applications.
OFF Logging will be turned off.
TRACE Logging finer-grained events for application debugging.
TRACE_INT The integer value of the TRACE level.
WARN Logging potentially harmful events.
Configuring Hadoop logging with hadoop-env.sh
Open the file $HADOOP_HOME/conf/hadoop-env.sh and set the log directory, for example:
export HADOOP_LOG_DIR=/var/log/hadoop
Other environment variables:
Variable name Description
HADOOP_LOG_DIR Directory for log files.
HADOOP_PID_DIR Directory to store the daemon PID files.
HADOOP_ROOT_LOGGER Logging configuration for hadoop.root.logger; the default is "INFO,console".
HADOOP_SECURITY_LOGGER Logging configuration for hadoop.security.logger; the default is "INFO,NullAppender".
HADOOP_AUDIT_LOGGER Logging configuration for hdfs.audit.logger; the default is "INFO,NullAppender".
Configuring Hadoop security logging
Security logging can help Hadoop cluster administrators identify security problems. It is enabled by default. The security logging configuration is located in the file $HADOOP_HOME/conf/log4j.properties. By default, the security logging information is appended to the same file as the NameNode logging.
grep security $HADOOP_HOME/logs/hadoop-hduser-namenode-master.log

Configuring Hadoop audit logging

Audit logging might be required for data processing systems such as Hadoop. In Hadoop, audit logging has been implemented using the Log4j Java logging library at the INFO logging level. By default, Hadoop audit logging is disabled.
Configure Hadoop audit logging:
  1. Enable audit logging in the $HADOOP_HOME/conf/log4j.properties file:
    log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO
    
  2. Try making a directory on HDFS:
    hadoop fs -mkdir audittest
  3. Check the audit log messages in the NameNode log file:
    grep org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit $HADOOP_HOME/logs/hadoop-hduser-namenode-master.log
    The Hadoop NameNode is responsible for managing audit logging messages, which are forwarded to the NameNode logging facility.
  4. Separate the audit logging messages from the NameNode logging messages by configuring the file $HADOOP_HOME/conf/log4j.properties:
    # Log at INFO level, SYSLOG appenders
    log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO
    
    # Disable forwarding the audit logging message to the NameNode logger.
    log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
    
    ###########################################
    # Configure logging appender
    ###########################################
    #
    # Daily Rolling File Appender (DRFA)
    log4j.appender.DRFAAUDIT=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.DRFAAUDIT.File=$HADOOP_HOME/log/audit.log
    log4j.appender.DRFAAUDIT.DatePattern=.yyyy-MM-dd
    log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout
    log4j.appender.DRFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c:%m%n
    

Hadoop logs auditing messages for operations, such as creating, changing, or deleting files, to a configured log file. By default, the audit logger is set to WARN, which disables audit logging. To enable it, the logging level needs to be changed to INFO.

When a Hadoop cluster has many jobs to run, the log file can become large very quickly. Log file rotation is a function that periodically rotates a log file to a different name.

Upgrading Hadoop

In the process of upgrading a Hadoop cluster, we want to minimize damage to the data stored on HDFS, and this is the cause of most upgrade problems. Data damage can be caused either by human operation or by software and hardware failures. So, a backup of the data might be necessary, but the sheer size of the data on HDFS can make a full backup impractical.

A more practical way is to back up only the HDFS filesystem metadata on the master node while leaving the data blocks intact. If some data blocks are lost after the upgrade, Hadoop can automatically recover them from the remaining replicas.

  1. stop-all.sh
  2. Back up block locations of the data on HDFS:
    hadoop fsck / -files -blocks -locations > dfs.block.locations.fsck.backup
  3. Save the list of all files on the HDFS filesystem:
    hadoop dfs -lsr / > dfs.namespace.lsr.backup
  4. Save the description of each DataNode in the HDFS cluster:
    hadoop dfsadmin -report > dfs.datanodes.report.backup
  5. Copy the checkpoint files to a backup directory:
    sudo cp dfs.name.dir/edits /backup
    sudo cp dfs.name.dir/image/fsimage /backup
  6. Verify that no DataNode daemon is running:
    for node in `cat $HADOOP_HOME/conf/slaves`; do
        echo 'Checking node ' $node
        ssh $node -C "jps"
    done
  7. If any DataNode process is still running, kill it:
    ssh $node -C "jps | grep 'DataNode' | cut -d ' ' -f 1 | xargs kill -9"
  8. Decompress the new version of the Hadoop archive file.
  9. Copy the configuration files from the old configuration directory to the new one:
    sudo cp $HADOOP_HOME/conf/* /usr/local/hadoop-1.2.0/conf/
  10. Update the Hadoop symbolic link to point to the new Hadoop version:
    sudo rm -rf /usr/local/hadoop
    sudo ln -s /usr/local/hadoop-1.2.0 /usr/local/hadoop
  11. Upgrade Hadoop on the slave nodes:
    for host in `cat $HADOOP_HOME/conf/slaves`; do
        echo 'Configuring hadoop on slave node ' $host
        sudo scp -r /usr/local/hadoop-1.2.0 hduser@$host:/usr/local
        echo 'Making symbolic link for Hadoop home directory on host ' $host
        sudo ssh hduser@$host -C "ln -s /usr/local/hadoop-1.2.0 /usr/local/hadoop"
    done
    
  12. Upgrade the NameNode:
    hadoop namenode -upgrade
    This command will convert the checkpoint to the new version format. We need to wait to let it finish.
  13. start-dfs.sh
  14. Get the list of all files on HDFS and compare its difference with the backed up one:
    hadoop dfs -lsr / > dfs.namespace.lsr.new
    diff dfs.namespace.lsr.new dfs.namespace.lsr.backup
    The two files should have the same content if there is no error in the upgrade.
  15. Get a new report of each DataNode in the cluster and compare the file with the backed up one:
    hadoop dfsadmin -report > dfs.datanodes.report.new
    diff dfs.datanodes.report.new dfs.datanodes.report.backup
  16. Get the locations of all data blocks and compare the output with the previous backup:
    hadoop fsck / -files -blocks -locations > dfs.block.locations.fsck.new
    diff dfs.block.locations.fsck.new dfs.block.locations.fsck.backup
  17. start-mapred.sh

Chapter 3 Configuring a Hadoop Cluster

Use the following commands to install Hadoop on all nodes:

for host in master slave1 slave2 slave3 slave4 slave5; do
    echo 'Installing Hadoop on node: ' $host
    sudo rpm -ivh ftp://hadoop.admin/repo/hadoop-1.1.2-1.x86_64.rpm
done    

Run a sample MapReduce job with the following command:

#Option 20 is the number of tasks to run and 100000 specifies the size of the sample for each task.
hadoop jar $HADOOP_HOME/hadoop-examples*.jar pi 20 100000

Run an example teragen job to generate 10 GB of data on HDFS with the following command:

# teragen writes 100-byte rows, so $((1024 * 1024 * 1024 * 10 / 100)) rows amount to 10 GB of data in total.
hadoop jar $HADOOP_HOME/hadoop-examples*.jar teragen $((1024 * 1024 * 1024 * 10/100)) teraout

Common Problems

Can't start HDFS daemons
The NameNode on the master node has not been formatted:
hadoop namenode -format
Check that HDFS has been properly configured and the daemons are running. If the output of the jps command does not contain the NameNode and SecondaryNameNode daemons, we need to check the configuration of HDFS.
jps
Open a new terminal and monitor the NameNode logfile on the master node. Alternatively, the hadoop jobtracker command will give the same error.
tail -f $HADOOP_HOME/logs/hadoop-hduser-namenode-master.log
Cluster is missing slave nodes

Most likely, this problem is caused by hostname resolution. To confirm, check the content of the /etc/hosts file.
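A consistent /etc/hosts on every node might look like the following (example addresses matching the 10.0.0.0/24 network used elsewhere in these notes):

10.0.0.1    master
10.0.0.2    slave1
10.0.0.3    slave2
10.0.0.4    slave3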

MapReduce daemons can't be started
The following two reasons can cause this problem:
  • The HDFS daemons are not running, which causes the MapReduce daemons to keep trying to connect to the NameNode daemon at a regular interval, as illustrated by the following log output:
    13/02/16 11:32:19 INFO ipc.Client: Retrying connect to server:
    master/10.0.0.1:54310. Already tried 0
    time(s); retry
    policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
    sleepTime=1 SECONDS)
    13/02/16 11:32:20 INFO ipc.Client: Retrying connect to server:
    master/10.0.0.1:54310. Already tried 1 time(s); retry policy is
    RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
    SECONDS)
    13/02/16 11:32:21 INFO ipc.Client: Retrying connect to server:
    master/10.0.0.1:54310. Already tried 2 time(s); retry policy is
    RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
    SECONDS)
    13/02/16 11:32:22 INFO ipc.Client: Retrying connect to server:
    master/10.0.0.1:54310. Already tried 3 time(s); retry policy is
    RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
    SECONDS)
    13/02/16 11:32:23 INFO ipc.Client: Retrying connect to server:
    master/10.0.0.1:54310. Already tried 4 time(s); retry policy is
    RetryUpToMaximumCount
    
  • Configuration problems of MapReduce can also prevent the MapReduce daemons from starting. Before starting a cluster, we need to make sure that the total amount of configured task memory is smaller than the total amount of system memory.
    For example, suppose a slave host has 4 GB of memory, and we have configured 6 map slots and 6 reduce slots with 512 MB of memory for each slot. We can compute the total configured task memory with the following formula:
    6 * 512 MB + 6 * 512 MB = 6144 MB = 6 GB
    As 6 GB is larger than the system memory of 4 GB, the system will not start. To clear this problem, we can decrease the number of map slots and reduce slots from 6 to 3 (see the sketch after this list).
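As a sketch of the fix for the 4 GB example above (the property names are the standard MR1 ones; adjust the values to your hardware), the slot counts and per-task heap size are set in $HADOOP_HOME/conf/mapred-site.xml:

<property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>3</value>
</property>
<property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>3</value>
</property>
<property>
    <!-- Heap size for each forked task JVM -->
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
</property>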

Configuring ZooKeeper

Log in to the master node as hduser:
  1. sudo wget ftp://hadoop.admin/repo/zookeeper-3.4.5.tar.gz -P /usr/local
  2. cd /usr/local
    sudo tar xvf zookeeper-3.4.5.tar.gz
  3. sudo ln -s /usr/local/zookeeper-3.4.5 /usr/local/zookeeper
  4. Open ~/.bashrc file and add the following lines:
    ZK_HOME=/usr/local/zookeeper
    export PATH=$ZK_HOME/bin:$PATH
    
  5. . ~/.bashrc
  6. sudo mkdir -pv /hadoop/zookeeper/{data,log}
  7. Create Java configuration file $ZK_HOME/conf/java.env:
    JAVA_HOME=/usr/java/latest
    export PATH=$JAVA_HOME/bin:$PATH
    
  8. Create the $ZK_HOME/conf/zoo.cfg file:
    tickTime=2000
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=master:2888:3888
    server.2=slave1:2888:3888
    server.3=slave2:2888:3888
    server.4=slave3:2888:3888
    server.5=slave4:2888:3888
    server.6=slave5:2888:3888
    dataDir=/hadoop/zookeeper/data
    dataLogDir=/hadoop/zookeeper/log
    
  9. Configure ZooKeeper on all slave nodes with the following command (see the note on the myid file after this list):
    for host in `cat $HADOOP_HOME/conf/slaves`; do
        echo 'Configuring ZooKeeper on ' $host
        scp ~/.bashrc hduser@$host:~/
        sudo scp -r /usr/local/zookeeper-3.4.5 hduser@$host:/usr/local/
        echo 'Making symbolic link for ZooKeeper home directory on ' $host
        sudo ssh hduser@$host -C "ln -s /usr/local/zookeeper-3.4.5 /usr/local/zookeeper"
    done
    
  10. zkServer.sh start
  11. zkCli.sh -server master:2181
  12. zkServer.sh stop
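One detail worth noting for step 9: in replicated mode, every server in the quorum needs a myid file under its dataDir containing the server number from the configuration above (1 for master, 2 for slave1, and so on). A sketch for the master node:

    echo 1 | sudo tee /hadoop/zookeeper/data/myid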

Installing Mahout

Log in to the master node as hduser:
  1. sudo wget ftp://hadoop.admin/repo/mahout-distribution-0.7.tar.gz -P /usr/local
  2. cd /usr/local
    sudo tar xvf mahout-distribution-0.7.tar.gz
  3. sudo ln -s /usr/local/mahout-distribution-0.7 /usr/local/mahout
  4. Open the ~/.bashrc file
    export MAHOUT_HOME=/usr/local/mahout
    export PATH=$MAHOUT_HOME/bin:$PATH
    
  5. sudo yum install maven
  6. cd $MAHOUT_HOME
    sudo mvn compile
    sudo mvn install
    The install command will run all the tests by default; sudo mvn -DskipTests install command will ignore the tests.
  7. cd examples
    sudo mvn compile
Verify Mahout configuration:
  1. wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data -P ~/
  2. start-dfs.sh
    start-mapred.sh
  3. hadoop fs -mkdir testdata
    hadoop fs -put ~/synthetic_control.data testdata
  4. mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

Chapter 2 Preparing for Hadoop Installation

Generally, for a small to medium-sized cluster with up to a hundred slave nodes, the NameNode, JobTracker, and SecondaryNameNode daemons can be put on the same master machine. When the cluster grows to hundreds or even thousands of slave nodes, it becomes advisable to put these daemons on different machines.

From a safety standpoint, the NameNode and SecondaryNameNode should not be placed on the same host.

Empirically, the following configurations are recommended for a small to medium-sized Hadoop cluster.

Node type Node components Recommended specification
Master Node CPU 2 x Quad Core, 2.0 GHz
RAM 16 GB
Hard drive 2 x 1 TB SATA II 7200 RPM HDD or SSD
Network card 1 Gbps Ethernet
Slave Node CPU 2 x Quad Core
RAM 16 GB
Hard drive 4 x 1 TB HDD
Network card 1 Gbps Ethernet

On top of this baseline, adjust according to whether the main workload is compute-oriented or storage-oriented. Actually, with the master CPU clocked at only 2.0 GHz, the specification listed here is well below what I expected.


In the default configuration, the master node is a single point of failure. High-end computing hardware and secondary power supplies are suggested.


In Hadoop, each slave node simultaneously executes a number of map or reduce tasks. The maximum number of parallel map/reduce tasks is known as the number of map/reduce slots, which is configurable by a Hadoop administrator. Each slot is a computing unit consisting of CPU, memory, and disk I/O resources. When a slave node is assigned a task by the JobTracker, its TaskTracker forks a JVM for that task, allocating a preconfigured amount of computing resources. Each forked JVM also incurs a certain amount of memory overhead. Empirically, a Hadoop job can consume 1 GB to 2 GB of memory for each CPU core. Higher data throughput requirements incur more I/O operations for the majority of Hadoop jobs, which is why higher-end and parallel hard drives can help boost cluster performance. To maximize parallelism, it is advisable to assign two slots for each CPU core. For example, if a slave node has two quad-core CPUs, we can assign 2 x 4 x 2 = 16 (map only, reduce only, or both) slots in total on this node. (The first 2 stands for the number of CPUs of the slave node, the 4 represents the number of cores per CPU, and the other 2 is the number of slots per CPU core.)
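For illustration, the corresponding MR1 slot configuration for the 2 x 4-core example would go into $HADOOP_HOME/conf/mapred-site.xml roughly as follows (a sketch; how the 16 slots are split between map and reduce depends on the workload):

<property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
</property>
<property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>8</value>
</property>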

Note:

  1. In MR1, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties dictated how many map and reduce slots each TaskTracker had. These properties no longer exist in YARN. Instead, YARN uses yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, which control the amount of memory and CPU on each node, both available to maps and reduces. Essentially, YARN has no TaskTrackers, just generic NodeManagers. Hence, there is no longer a separation between map slots and reduce slots; everything depends on the amount of memory in use/demanded.
  2. YARN configuration reference: http://stackoverflow.com/questions/22069904/controling-and-monitorying-number-of-simultaneous-map-reduce-tasks-in-yarn

Designing the cluster network

Cluster nodes should be on the same network segment; connecting them through a VPN or routing is not recommended.

Nodes on the same rack can be interconnected with a 1 Gbps Ethernet switch. Cluster-level switches then connect the rack switches with faster links, such as 10 Gbps optical fiber links, and other networks such as InfiniBand. The cluster-level switches may also interconnect with other cluster-level switches or even uplink to another higher level of switching infrastructure. As the cluster grows, the network will at the same time become larger and more complex. Connection redundancy for network high availability can also increase its complexity.

Configuring the cluster administrator machine (the following commands are executed on CentOS 6.3)

  1. Log in as hddmin and change the hostname:
    sudo sed -i 's/^HOSTNAME.*$/HOSTNAME=hadoop.admin/' /etc/sysconfig/network
  2. mkdir -v ~/mnt ~/isoimages ~/repo (~/mnt as the mount point for ISO images. ~/isoimages will be used to contain the original image files. ~/repo will be used as the repository folder for network installation)
  3. sudo yum -y install dhcp
    sudo yum -y install vsftpd (install the ftp servers)
  4. rsync rsync://mirror.its.dal.ca/centos/6.3/isos/x86_64/CentOS-6.3-x86_64-netinstall.iso ~/isoimages
    or wget http://mirror.its.dal.ca/centos/6.3/isos/x86_64/CentOS-6.3-x86_64-netinstall.iso -P ~/isoimages
  5. sudo mount -o loop ~/isoimages/CentOS-6.3-x86_64-netinstall.iso ~/mnt
  6. cp -r ~/mnt/* ~/repo
  7. sudo umount ~/mnt
Configure the DHCP Server
  1. /etc/dhcp/dhcpd.conf
    # Domain name
    option domain-name  "hadoop.cluster";
    
    # DNS hostname  or IP address
    option domain-name-servers dlp.server.world;
    
    # Default lease time
    default-lease-time 600;
    
    # Maximum lease time
    max-lease-time  7200;
    
    # Declare the DHCP server to be valid.
    authoritative;
    
    # Network address and subnet mask
    subnet 10.0.0.0 netmask 255.255.255.0 {
    
    # Range of lease IP address, should be based 
        # on the size of the network
        range dynamic-bootp 10.0.0.200 10.0.0.254;
    
        # Broadcast address
        option broadcast-address 10.0.0.255;
    
        # Default gateway
        option routers 10.0.0.1;
    }
        
  2. sudo service dhcpd start
  3. sudo chkconfig --level 3 dhcpd on (make the DHCP server survive a system reboot)
Configure the FTP server
  1. /etc/vsftpd/vsftpd.conf
    # The FTP server will run in standalone mode.
    listen=YES
    
    # Use Anonymous user.
    anonymous_enable=YES
    
    # Disable change root for local users.
    chroot_local_user=NO
    
    # Disable uploading and changing files.
    write_enable=NO
    
    # Enable logging of uploads and downloads.
    xferlog_enable=YES
    
    # Enable port 20 data transfer
    connect_from_port_20=YES
    
    # Specify the directory hosting the Linux installation packages.
    anon_root=~/repo
        
  2. sudo service vsftpd start
  3. ftp hadoop.admin

Creating the kickstart file and boot media

  1. Check the filesystem type of the USB flash drive:
    blkid
  2. If the TYPE attribute is other than vfat, use the following command to clear the first few blocks of the drive:
    dd if=/dev/zero of=/dev/sdb1 bs=1M count=100
Create a kickstart file
  1. sudo yum install system-config-kickstart
  2. ks.cfg
    #! /bin/bash
    # Kickstart for CentOS 6.3 for Hadoop cluster.
    
    #Install system on the machine
    install
    
    # Use ftp as the package repository
    url --url ftp://hadoop.admin/repo
    
    # Use the text installation interface
    text
    
    # Use UTF-8 encoded USA English as the language.
    lang en_US.UTF-8
    
    # Configure time zone.
    timezone America/New_York
    
    # Use USA keyboard
    keyboard us
    
    # Set bootloader location
    bootloader --location=mbr --driveorder=sda --append="rhgb quiet"
    
    # Set root password
    rootpw --password=hadoop
    
    #############################
    # Partion the hard disk
    #############################
    # Clear the master boot record on the hard drive.
    zerombr yes
    
    # Clear existing partitions
    clearpart --all --initlabel
    
    # Clear /boot partition, size is in MB.
    part /boot --fstype ext3 --size 128
    
    # Create the / (root) partition.
    part / --fstype ext3 --size 4096 --grow --maxsize 8192
    
    # Create /var partition.
    part /var --fstype ext3 --size 4096 --grow --maxsize 8192
    
    # Create Hadoop data storage directory
    part /hadoop --fstype ext3 --grow
    
    # Create swap partition, 16GB, double size of the main memory.
    # Change size according to your hardware memory configuration.
    part swap --size 16384
    
    
    #############################
    # Configure Network device
    #############################
    
    # Use DHCP and disable IPv6
    network --onboot yes --device eth0 --bootproto dhcp --noipv6
    
    # Disable firewall.
    firewall --disabled
    
    # Put Selinux in permissive mode.
    selinux --permissive
    
    
    #############################
    # Specify packages to install
    #############################
    
    # Automatically resolve package dependencies,
    # exclude installation of documents, and ignore missing packages.
    %packages --resolvedeps --excludedocs --ignoremissing
    
    # Install core packages.
    @Base
    
    # Don't install OpenJDK.
    -java
    
    # Install wget
    wget
    
    # Install the vim text editor
    vim
    
    # Install the Emacs text editor
    emacs
    
    # Install rsync
    rsync
    
    # Install nmap network mapper.
    nmap
    
    %end
    
    
    ###################################
    # Post installation configuration.
    ###################################
    
    # Enable post process logging
    %post --log=~/install-post.log
    
    # Create Hadoop user hduser with password hduser.
    useradd -m -p hduser hduser
    
    # Create group Hadoop.
    groupadd hadoop
    
    # Change user hduser's current group to hadoop
    usermod -g hadoop hduser
    
    # Tell the nodes hostname and ip address of the admin machine.
    echo "10.0.0.1 hadoop.admin" >> /etc/hosts
    
    # Configure administrative privilege to hadoop group.
    
    # Configure the kernel settings.
    ulimit -u
    
    
    #############################
    # Startup services.
    #############################
    
    service sshd start
    chkconfig sshd on
    
    %end
    
    # Reboot after installation.
    reboot
    
    # Disable first boot configuration.
    firstboot --disable
        
  3. Put the kickstart file into the root directory of the FTP server with the command:
    cp ks.cfg ~/repo

Create a USB boot media

  1. Edit ~/isolinux/grub.conf and add the following content:
    default=0
    splashimage=@SPLASHPATH@
    timeout 0
    hiddenmenu
    title @PRODUCT@ @VERSION@
        kernel @KERNELPATH@ ks=ftp://hadoop.admin/ks.cfg
        initrd @INITRDPATH@
        
  2. Make an ISO file from the isolinux directory using the following commands:
    mkisofs -o CentOS6.3-x86_64-boot.iso \
    -b isolinux/isolinux.bin \
    -c isolinux/boot.cat \
    -no-emul-boot \
    -boot-load-size 4 \
    ~/repo
  3. Write the bootable ISO image into the USB flash drive:
    dd if=~/CentOS6.3-x86_64-boot.iso of=/dev/sdb

Thursday, April 24, 2014

Chapter 1 Big Data and Hadoop

CentOS 6.3
Oracle JDK (Java Development Kit) SE 7
Hadoop 1.1.2
HBase 0.94.5
Hive 0.9.0
Pig 0.10.1
ZooKeeper 3.4.5
Mahout 0.7


The four attributes of Big Data:
volume
velocity
variety
value

The book was written in July 2013.

Wednesday, April 23, 2014

Upgrading to 14.04

When I booted up this morning, the system asked whether to upgrade to 14.04. I chose Yes.
The process was quite fast. After the upgrade and a reboot, I opened the browser and page titles containing Chinese showed up as garbled characters. That is the only bug I have found so far.
After sudo apt-get install ttf-wqy-zenhei and a reboot, that small problem was fixed as well.

Thursday, April 17, 2014

Received a reminder email about my purchased web hosting

Received another email from 美橙; the gist is: with the current crackdown on pornography and illegal content, the company will be cleaning up its hosting space from April to September. The implication, of course, is that if a site's content involves that kind of thing, the hosting will be shut down.
In reality, this has nothing to do with me. It's just that I was already starting to dislike all these restrictions, and now I dislike them even more.

I originally bought the hosting space to make it easy to look things up for myself. Occasionally, when feeling diligent, I would also jot down notes from studying something. But in the past two years those diligent moments have become rarer than a flash in the pan. When renewal comes up in the second half of the year, I probably shouldn't renew.

Thursday, April 3, 2014

Switching systems

Officially switched to Ubuntu. It actually has nothing to do with Microsoft announcing the end of support for Windows XP; the timing just happens to coincide.