Managing the HDFS Cluster
Use the following steps to check the status of an HDFS cluster with hadoop fsck:

- Check the status of the root filesystem with the following command:
  hadoop fsck /
- Check the status of all the files on HDFS with the following command:
  hadoop fsck / -files
- Check the locations of file blocks with the following command:
  hadoop fsck / -files -blocks -locations
- Check the locations of file blocks with rack information with the following command:
  hadoop fsck / -files -blocks -racks
- Delete corrupted files with the following command:
  hadoop fsck -delete
- Move corrupted files to /lost+found with the following command:
  hadoop fsck -move
- Report the status of each slave node with the following command:
  hadoop dfsadmin -report
- Refresh all the DataNodes using the following command:
  hadoop dfsadmin -refreshNodes
- Check the safe mode status using the following command:
  hadoop dfsadmin -safemode get
- Manually put the NameNode into safe mode using the following command:
  hadoop dfsadmin -safemode enter
- Make the NameNode leave safe mode using the following command:
  hadoop dfsadmin -safemode leave
- Wait until the NameNode leaves safe mode using the following command:
  hadoop dfsadmin -safemode wait

  This command is useful when we want to wait until HDFS finishes data block replication, or until a newly commissioned DataNode is ready for service.
- Save the metadata of the HDFS filesystem with the following command:
  hadoop dfsadmin -metasave meta.log

  The meta.log file will be created under the directory $HADOOP_HOME/logs.
The HDFS filesystem will be write protected when the NameNode enters safe mode. When an HDFS cluster is started, it enters safe mode first. The NameNode checks the replication factor of each data block. If the replica count of a data block is smaller than the configured value, which is 3 by default, the data block is marked as under-replicated. Then an under-replication factor, the percentage of under-replicated data blocks, is calculated. If this percentage is larger than the threshold value, the NameNode stays in safe mode until enough new replicas have been created for the under-replicated data blocks to bring the under-replication factor below the threshold.
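A rough sketch of the decision described above; the block counts and the threshold here are made-up illustration values, not Hadoop defaults:

```shell
#!/bin/sh
# Hypothetical fsck-style block counts, for illustration only.
TOTAL_BLOCKS=1000
UNDER_REPLICATED=12

# Under-replication factor in tenths of a percent, kept in integer
# arithmetic; 12 of 1000 blocks -> 12 (that is, 1.2%).
UNDER_PCT=$((UNDER_REPLICATED * 1000 / TOTAL_BLOCKS))

# Assumed threshold: stay in safe mode while more than 0.1% of the
# blocks are under-replicated.
THRESHOLD=1

if [ "$UNDER_PCT" -gt "$THRESHOLD" ]; then
  echo "stay in safe mode"
else
  echo "leave safe mode"
fi
```

With these numbers the NameNode would stay in safe mode until replication brings the percentage under the threshold.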
Configuring SecondaryNameNode
The Hadoop NameNode is a single point of failure. By configuring SecondaryNameNode, the filesystem image and edit log can be backed up periodically, and in case of NameNode failure, the backup files can be used to recover the NameNode. Configure SecondaryNameNode as follows:
- Stop the cluster:
  stop-all.sh
- Add or change the following property in the file $HADOOP_HOME/conf/hdfs-site.xml:

  <property>
    <name>fs.checkpoint.dir</name>
    <value>/hadoop/dfs/namesecondary</value>
  </property>

  If this property is not set explicitly, the default checkpoint directory will be ${hadoop.tmp.dir}/dfs/namesecondary.
- Restart the cluster:
  start-all.sh
To increase redundancy, we can configure the NameNode to write filesystem metadata to multiple locations. For example, we can add an NFS shared directory for backup by changing the following property in the file $HADOOP_HOME/conf/hdfs-site.xml:

  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/dfs/name,/nfs/name</value>
  </property>
Managing the MapReduce cluster
- List all the active TaskTrackers:
  hadoop job -list-active-trackers
- Check the status of the JobTracker safe mode:
  hadoop mradmin -safemode get

  The output tells us whether the JobTracker is in safe mode. If it is not, we can submit jobs to the cluster.
- Manually let the JobTracker enter safe mode:
  hadoop mradmin -safemode enter
- Let the JobTracker leave safe mode:
  hadoop mradmin -safemode leave
- Wait for safe mode to exit:
  hadoop mradmin -safemode wait
- Reload the MapReduce queue configuration:
  hadoop mradmin -refreshQueues
- Reload the active TaskTrackers:
  hadoop mradmin -refreshNodes
The hadoop mradmin options are listed in the following table:
| Option | Description |
|---|---|
| -refreshServiceAcl | Force JobTracker to reload ACL. |
| -refreshQueues | Force JobTracker to reload queue configurations. |
| -refreshUserToGroupsMappings | Force JobTracker to reload user group mappings. |
| -refreshSuperUserGroupsConfiguration | Force JobTracker to reload super user group mappings. |
| -refreshNodes | Force JobTracker to refresh the TaskTracker hosts. |
Managing TaskTracker
Hadoop maintains three lists for TaskTrackers: a black list, a gray list, and an excluded list.

TaskTracker blacklisting is a function that can blacklist a TaskTracker if it is in an unstable state or its performance has degraded. For example, when the ratio of failed tasks for a specific job reaches a certain threshold, the TaskTracker will be blacklisted for this job. Similarly, Hadoop maintains a gray list of nodes by identifying potential problem nodes.

Sometimes, excluding certain TaskTrackers from the cluster is desirable. For example, when we debug or upgrade a slave node, we want to separate this node from the cluster in case it affects the cluster. Hadoop supports live decommissioning of a TaskTracker from a running cluster.
List the active trackers with the following command on the master node:
hadoop job -list-active-trackers
Configure the heartbeat interval:

- Stop the MapReduce cluster:
  stop-mapred.sh
- Add or change the following property in $HADOOP_HOME/conf/mapred-site.xml:

  <property>
    <name>mapred.tasktracker.expiry.interval</name>
    <value>600000</value>
  </property>

- Copy the configuration to the slave nodes:

  for host in `cat $HADOOP_HOME/conf/slaves`; do
    echo "Copying mapred-site.xml to slave node" $host
    sudo scp $HADOOP_HOME/conf/mapred-site.xml hduser@$host:$HADOOP_HOME/conf
  done

- Restart the MapReduce cluster:
  start-mapred.sh
Configure TaskTracker blacklisting:

- Stop the MapReduce cluster:
  stop-mapred.sh
- Set the number of task failures for a job that will blacklist a TaskTracker by adding or changing the following property in the file $HADOOP_HOME/conf/mapred-site.xml:

  <property>
    <name>mapred.max.tracker.failures</name>
    <value>10</value>
  </property>

- Set the maximum number of times a TaskTracker can be blacklisted by successful jobs before it is blacklisted cluster-wide:

  <property>
    <name>mapred.max.tracker.blacklists</name>
    <value>5</value>
  </property>

- Copy the configuration to the slave nodes:

  for host in `cat $HADOOP_HOME/conf/slaves`; do
    echo "Copying mapred-site.xml to slave node" $host
    sudo scp $HADOOP_HOME/conf/mapred-site.xml hduser@$host:$HADOOP_HOME/conf
  done

- Restart the MapReduce cluster:
  start-mapred.sh
- List blacklisted TaskTrackers:
hadoop job -list-blacklisted-trackers
Decommission TaskTrackers
- Set the TaskTracker exclude file in $HADOOP_HOME/conf/mapred-site.xml:

  <property>
    <name>mapred.hosts.exclude</name>
    <value>$HADOOP_HOME/conf/mapred-exclude</value>
  </property>

- Force the JobTracker to reload the TaskTracker list:
  hadoop mradmin -refreshNodes
- List all the active TaskTrackers again:
hadoop job -list-active-trackers
TaskTrackers on slave nodes contact the JobTracker on the master node periodically; the interval between two consecutive contacts is called a heartbeat. More frequent heartbeats incur a higher load on the cluster, so the value of the heartbeat property should be based on the size of the cluster.
The JobTracker uses TaskTracker blacklisting to remove unstable TaskTrackers. If a TaskTracker is blacklisted, the tasks currently running on it can still finish, and the TaskTracker keeps contacting the JobTracker through the heartbeat mechanism, but it will not be scheduled to run future tasks. If a blacklisted TaskTracker is restarted, it will be removed from the blacklist. (The number of blacklisted TaskTrackers should not exceed 50 percent of the total number of TaskTrackers in the cluster.)
Decommissioning DataNode
- Create the file $HADOOP_HOME/conf/dfs-exclude, containing the hostnames of the DataNodes to decommission, one per line.
- Add the following property to $HADOOP_HOME/conf/hdfs-site.xml:

  <property>
    <name>dfs.hosts.exclude</name>
    <value>$HADOOP_HOME/conf/dfs-exclude</value>
  </property>

- Force the NameNode to reload the DataNode list:
  hadoop dfsadmin -refreshNodes
- Check the decommissioning status:
  hadoop dfsadmin -report
Replacing a Slave Node
- Decommission the TaskTracker on the slave node.
- Decommission the DataNode on the slave node.
- Power off the slave node and replace it with the new hardware.
- Install and configure the Linux operating system on the new node: install Java and other tools, configure SSH, and prepare for the Hadoop installation.
- Install Hadoop on the new node.
- Log in to the new node and start the DataNode and TaskTracker (assuming the new node is slave2):

  ssh hduser@slave2 -C "hadoop datanode &"
  ssh hduser@slave2 -C "hadoop tasktracker &"

- Refresh the DataNodes:
  hadoop dfsadmin -refreshNodes
- Refresh the TaskTrackers:
  hadoop mradmin -refreshNodes
- Check the HDFS status report:
  hadoop dfsadmin -report
- List the active TaskTrackers:
  hadoop job -list-active-trackers
Managing MapReduce jobs
Check the status of Hadoop jobs
- List all the running jobs:
  hadoop job -list
- List all the jobs submitted since the start of the cluster:
  hadoop job -list all
- Check the status of the default queue:
  hadoop queue -list

  Hadoop manages jobs using queues. By default, there is only one queue, named default. The output of the command shows that the cluster has only this default queue, which is in the running state with no scheduling information.
- Check the queue ACLs:
  hadoop queue -showacls
- Show all the jobs in the default queue:
  hadoop queue -info default -showJobs
- Check the status of a job:
  hadoop job -status JOB-ID
Changing the status of a job
- Give the job JOB-ID high priority:
  hadoop job -set-priority JOB-ID HIGH

  Available priorities, in descending order, are VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW.
- Kill the job:
hadoop job -kill JOB-ID
Submit a MapReduce job
- Create the job configuration file, job.xml (make sure $HADOOP_HOME/hadoop-examples*.jar is available in CLASSPATH):

  <?xml version="1.0"?>
  <configuration>
    <property>
      <name>mapred.input.dir</name>
      <value>randtext</value>
    </property>
    <property>
      <name>mapred.output.dir</name>
      <value>output</value>
    </property>
    <property>
      <name>mapred.job.name</name>
      <value>wordcount</value>
    </property>
    <property>
      <name>mapred.mapper.class</name>
      <value>org.apache.hadoop.mapred.WordCount$Map</value>
    </property>
    <property>
      <name>mapred.combiner.class</name>
      <value>org.apache.hadoop.mapred.WordCount$Reduce</value>
    </property>
    <property>
      <name>mapred.reducer.class</name>
      <value>org.apache.hadoop.mapred.WordCount$Reduce</value>
    </property>
    <property>
      <name>mapred.input.format.class</name>
      <value>org.apache.hadoop.mapred.TextInputFormat</value>
    </property>
    <property>
      <name>mapred.output.format.class</name>
      <value>org.apache.hadoop.mapred.TextOutputFormat</value>
    </property>
  </configuration>

- Submit the job:
hadoop job -submit job.xml
The queue command is a wrapper for the JobQueueClient class, and the job command is a wrapper for the JobClient class.

More job management commands
- Get the value of a counter:
  hadoop job -counter <job-id> <group-name> <counter-name>

  For example, get the counter HDFS_BYTES_WRITTEN of the counter group FileSystemCounter for the job job_201302281451_0002:
  hadoop job -counter job_201302281451_0002 FileSystemCounter HDFS_BYTES_WRITTEN
- Query the events of a MapReduce job:
  hadoop job -events <job-id> <from-event-#> <#-of-events>

  For example, query the first 10 events of the job job_201302281451_0002:
  hadoop job -events job_201302281451_0002 0 10
- Get the job history, including job details and failed and killed jobs:
  hadoop job -history <job-output-dir>
Managing tasks
- Kill a task:
  hadoop job -kill-task <task-attempt-id>

  After the task is killed, the JobTracker will restart it on a different node.

  The Hadoop JobTracker can automatically kill tasks in the following situations:
  - A task does not report progress before the timeout.
  - Speculative execution runs one task on multiple nodes; once one of these attempts succeeds, the other attempts of the same task are killed, because their results would be useless.
  - Job/task schedulers, such as the Fair Scheduler and Capacity Scheduler, need empty slots for other pools or queues.
- Set a task to failed:
  hadoop job -fail-task <task-attempt-id>
- List task attempts:
  hadoop job -list-attempt-ids <job-id> <task-type> <task-state>

  Available task types are map, reduce, setup, and cleanup; available task states are running and completed.
Checking job history from the web UI
The web UI is automatically updated every five seconds; this interval can be modified by changing the mapreduce.client.completion.pollinterval property in the $HADOOP_HOME/conf/mapred-site.xml file:

  <property>
    <name>mapreduce.client.completion.pollinterval</name>
    <value>5000</value>
  </property>
Importing Data to HDFS
Use distributed copy to import large data into HDFS:

  hadoop distcp file:///data/datafile /user/hduser/data

This command initiates a MapReduce job with a number of mappers that run the copy tasks in parallel.
Manipulating files on HDFS
- Check the content of a file:
  hadoop fs -cat file1

  This command is handy for checking the contents of small files, but it is not recommended for large files. Instead, we can use the command hadoop fs -tail file1 to check the last few lines, or hadoop fs -text file1 to show the content of file1 in text format.
- Test whether file1 exists:
  hadoop fs -test -e file1
- Test whether file1 is zero length:
  hadoop fs -test -z file1
- Test whether file1 is a directory:
  hadoop fs -test -d file1
- Check the status of file1:
  hadoop fs -stat file1
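hadoop fs -test reports its result through the exit status rather than printed output, so it is meant for scripting. Below is a runnable sketch of the same pattern using the local POSIX test(1) command on a throwaway file; on a real cluster you would swap in hadoop fs -test -e and friends:

```shell
#!/bin/sh
# Create an empty temporary file to probe.
tmpfile=$(mktemp)

# Exists? (analogous to: hadoop fs -test -e file1)
if test -e "$tmpfile"; then
  result_exists="yes"
fi

# Zero length? (analogous to: hadoop fs -test -z file1)
if test ! -s "$tmpfile"; then
  result_zero="yes"
fi

echo "exists=$result_exists zero_length=$result_zero"
rm -f "$tmpfile"
```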
Perform the following steps to manipulate files and directories on HDFS:
- Empty the trash:
  hadoop fs -expunge
- Merge the files in a directory dir and download them as one big file:
  hadoop fs -getmerge dir file1

  This command is similar to the cat command in Linux. It is very useful when we want to get the MapReduce output as one file rather than several small partitioned files.
- Delete file1 under the current directory:
  hadoop fs -rm file1
- Download file1 from HDFS:
  hadoop fs -get file1
- Change the group membership of a regular file:
  hadoop fs -chgrp hadoop file1
- Change the ownership of a regular file:
  hadoop fs -chown hduser file1
- Change the mode of a file:
  hadoop fs -chmod 600 file1
- Set the replication factor of file1 to 3:
  hadoop fs -setrep -w 3 file1
- Create an empty file:
  hadoop fs -touchz 0file
Configuring the HDFS quota
Manage HDFS quotas:

- Set the name quota on the home directory:
  hadoop dfsadmin -setQuota 20 /user/hduser

  This command sets the name quota on the home directory to 20, which means at most 20 files, including directories, can be created under the home directory. If we reach the quota, an error message will be given.
- Set the space quota of the current user's home directory to 100000000 bytes:
  hadoop dfsadmin -setSpaceQuota 100000000 /user/hduser
- Check the quota status:
  hadoop fs -count -q /user/hduser

  The output columns are QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and FILE_NAME.
- Clear the name quota:
  hadoop dfsadmin -clrQuota /user/hduser
- Clear the space quota:
  hadoop dfsadmin -clrSpaceQuota /user/hduser
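To make the eight columns of hadoop fs -count -q easier to read in scripts, the output can be labeled with awk. The sample line below is fabricated for illustration; on a live cluster, pipe the real command into awk instead:

```shell
#!/bin/sh
# Fabricated "hadoop fs -count -q" output line, columns in order:
# QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME
sample="20 11 100000000 99999000 3 6 1000 /user/hduser"

# Attach a label to each whitespace-separated column.
labeled=$(echo "$sample" | awk '{
  printf "name_quota=%s name_remaining=%s space_quota=%s space_remaining=%s dirs=%s files=%s bytes=%s path=%s",
         $1, $2, $3, $4, $5, $6, $7, $8
}')
echo "$labeled"
```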
Configuring CapacityScheduler
Hadoop CapacityScheduler is a pluggable MapReduce job scheduler. Its goal is to maximize cluster utilization by sharing the cluster among multiple users. CapacityScheduler uses queues to guarantee a minimum share for each user, and it is secure, elastic, and operable, with support for job priorities. Configure CapacityScheduler:

- Configure Hadoop to use CapacityScheduler in $HADOOP_HOME/conf/mapred-site.xml:

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  </property>

- Define a new queue, hdqueue:

  <property>
    <name>mapred.queue.names</name>
    <value>default,hdqueue</value>
  </property>

- Configure the CapacityScheduler queues in $HADOOP_HOME/conf/capacity-scheduler.xml:

  <property>
    <name>mapred.capacity-scheduler.queue.hdqueue.capacity</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.capacity</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hdqueue.minimum-user-limit-percent</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.maximum-system-jobs</name>
    <value>10</value>
    <description>Maximum number of jobs in the system which can be initialized, concurrently, by the CapacityScheduler.</description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hdqueue.maximum-initialized-active-tasks</name>
    <value>500</value>
    <description>The maximum number of tasks, across all jobs in the queue, which can be initialized concurrently. Once the queue's jobs exceed this limit they will be queued on disk.</description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hdqueue.maximum-initialized-active-tasks-per-user</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hdqueue.supports-priority</name>
    <value>true</value>
  </property>

- Restart the MapReduce cluster:
  stop-mapred.sh
  start-mapred.sh
- Get the scheduling details of each queue by opening the JobTracker web UI.
- Test the queue configuration by submitting the example wordcount job to the queue hdqueue:
  hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount -Dmapred.job.queue.name=hdqueue randtext wordcount.out
Hadoop supports access control on queues using queue ACLs, which control the authorization of MapReduce job submission to a queue.
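The 80/20 queue capacities configured above divide the cluster's task slots proportionally. A small sketch of that arithmetic; the slot total is a made-up example figure, not from the source:

```shell
#!/bin/sh
# Made-up total number of map slots in the cluster.
TOTAL_MAP_SLOTS=200

# Queue capacities in percent, matching the configuration above.
DEFAULT_CAPACITY=80
HDQUEUE_CAPACITY=20

# Each queue's guaranteed share of the slots.
DEFAULT_SLOTS=$((TOTAL_MAP_SLOTS * DEFAULT_CAPACITY / 100))
HDQUEUE_SLOTS=$((TOTAL_MAP_SLOTS * HDQUEUE_CAPACITY / 100))

echo "default queue: $DEFAULT_SLOTS slots"
echo "hdqueue: $HDQUEUE_SLOTS slots"
```

Elasticity means a queue may temporarily use more than its share when the other queue is idle; the capacities only set the guaranteed minimum.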
Configuring Fair Scheduler
Similar to CapacityScheduler, the Fair Scheduler was designed to enforce fair shares of cluster resources in a multiuser environment. Configure the Hadoop Fair Scheduler:

- Enable fair scheduling in $HADOOP_HOME/conf/mapred-site.xml:

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>

- Create the Fair Scheduler configuration file, $HADOOP_HOME/conf/fair-scheduler.xml, for example (the element names here are illustrative; only the values 5, 5, 90, 20, 2.0, 1, and 3 survive from the original configuration):

  <?xml version="1.0"?>
  <allocations>
    <pool name="hduser">
      <minMaps>5</minMaps>
      <minReduces>5</minReduces>
      <maxMaps>90</maxMaps>
      <maxReduces>20</maxReduces>
      <weight>2.0</weight>
    </pool>
    <userMaxJobsDefault>1</userMaxJobsDefault>
    <poolMaxJobsDefault>3</poolMaxJobsDefault>
  </allocations>

- Restart the MapReduce cluster:
  stop-mapred.sh
  start-mapred.sh
The Hadoop Fair Scheduler schedules jobs in such a way that all jobs get an equal share of computing resources. Jobs are organized into scheduling pools, and a pool can be configured for each Hadoop user. If no pool is configured for a user, the default pool is used. A pool specifies the amount of resources a user can share on the cluster, for example the number of map slots, reduce slots, the total number of running jobs, and so on.
minMaps and minReduces ensure a pool's minimum share of computing slots on the cluster. This guarantee matters when the total number of requested slots exceeds the number of configured slots: if the minimum share of a pool is not met, the JobTracker kills tasks in other pools and assigns the freed slots to the starving pool. The killed tasks are restarted on other nodes, so the affected jobs take longer to finish; the trade-off is that the starving pool still receives its guaranteed share.
Besides computing slots, the Fair Scheduler can limit the number of concurrently running jobs and tasks in a pool. If a user submits more jobs than the configured limit, some jobs have to wait in the queue until other jobs finish. In that case, higher-priority jobs are scheduled to run before lower-priority jobs. If all jobs in the waiting queue have the same priority, the Fair Scheduler can be configured to schedule them with either fair scheduling or FIFO scheduling.
Properties supported by the Fair Scheduler:

| Property | Value | Description |
|---|---|---|
| minMaps | integer | Minimum map slots of a pool |
| minReduces | integer | Minimum reduce slots of a pool |
| maxMaps | integer | Maximum map slots of a pool |
| maxReduces | integer | Maximum reduce slots of a pool |
| schedulingMode | Fair/FIFO | Pool internal scheduling mode |
| maxRunningJobs | Integer | Maximum number of concurrently running jobs for a pool. Default value is unlimited. |
| weight | float | Value to control the non-proportional share of cluster resources. The default value is 1.0. |
| minSharePreemptionTimeout | integer | Seconds to wait before killing other pools' tasks if a pool's share is under its minimum share. |
| poolMaxJobsDefault | integer | Default maximum number of concurrently running jobs for a pool. |
| userMaxJobsDefault | integer | Default maximum number of concurrently running jobs for a user |
| defaultMinSharePreemptionTimeout | integer | Default seconds to wait before killing other pools' tasks when a pool's share is under its minimum share. |
| fairSharePreemptionTimeout | integer | Preemption timeout when a pool's resources are below half of its fair share. |
| defaultPoolSchedulingMode | Fair/FIFO | Default in-pool scheduling mode |
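The weight property in the table above skews a pool's share of the cluster. A sketch with two hypothetical pools (the pool names and weights are made up for illustration):

```shell
#!/bin/sh
# Two hypothetical pools sharing 100 map slots in proportion to their weights.
shares=$(awk 'BEGIN {
  slots = 100
  heavy = 2.0   # weight of the "heavy" pool
  light = 1.0   # weight of the "light" pool (1.0 is the default weight)
  total = heavy + light
  # Each pool gets slots * weight / sum-of-weights.
  printf "heavy: %d slots\n", slots * heavy / total
  printf "light: %d slots\n", slots * light / total
}')
echo "$shares"
```

Doubling a pool's weight gives it twice the share of an otherwise identical pool, which is why the table describes it as a non-proportional share control.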
Configuring Hadoop daemon logging
- Check the current logging level of the JobTracker:
  hadoop daemonlog -getlevel master:50030 org.apache.hadoop.mapred.JobTracker
- Tell Hadoop to log only error events for the JobTracker:
  hadoop daemonlog -setlevel master:50030 org.apache.hadoop.mapred.JobTracker ERROR
- Get the logging levels of the NameNode, TaskTracker, and DataNode (the TaskTracker web UI listens on port 50060 and the DataNode on port 50075):
  hadoop daemonlog -getlevel master:50070 org.apache.hadoop.dfs.NameNode
  hadoop daemonlog -getlevel master:50060 org.apache.hadoop.mapred.TaskTracker
  hadoop daemonlog -getlevel master:50075 org.apache.hadoop.dfs.DataNode
By default, Hadoop sends log messages to Log4j, which is configured in the file $HADOOP_HOME/conf/log4j.properties. This file defines both what to log and where to log it. For applications, the default root logger is "INFO,console", which logs all messages at level INFO and above to the console's stderr. Log files are named $HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-
Hadoop supports a number of log levels for different purposes, and the log level should be tuned based on the purpose of logging.
Logging levels provided by Log4j:

| Log level | Description |
|---|---|
| ALL | The lowest logging level; all logging is turned on. |
| DEBUG | Logs events useful for debugging applications. |
| ERROR | Logs error events, after which applications can continue to run. |
| FATAL | Logs very severe error events that will abort the application. |
| INFO | Logs informational messages that indicate the progress of applications. |
| OFF | Logging is turned off. |
| TRACE | Logs finer-grained events for application debugging. |
| TRACE_INT | The TRACE level expressed as an integer value. |
| WARN | Logs potentially harmful events. |
Configuring Hadoop logging with hadoop-env.sh
Open the file $HADOOP_HOME/conf/hadoop-env.sh and set, for example:

  export HADOOP_LOG_DIR=/var/log/hadoop

Other environment variables:

| Variable name | Description |
|---|---|
| HADOOP_LOG_DIR | Directory for log files. |
| HADOOP_PID_DIR | Directory to store the PID files of the daemons. |
| HADOOP_ROOT_LOGGER | Logging configuration for hadoop.root.logger. Default: "INFO,console". |
| HADOOP_SECURITY_LOGGER | Logging configuration for hadoop.security.logger. Default: "INFO,NullAppender". |
| HADOOP_AUDIT_LOGGER | Logging configuration for hdfs.audit.logger. Default: "INFO,NullAppender". |
Configuring Hadoop security logging
Security logging can help Hadoop cluster administrators identify security problems. It is enabled by default. The security logging configuration is located in the file $HADOOP_HOME/conf/log4j.properties. By default, the security logging information is appended to the same file as NameNode logging; check it with:

  grep security $HADOOP_HOME/logs/hadoop-hduser-namenode-master.log
Configuring Hadoop audit logging
Audit logging might be required for data processing systems such as Hadoop. In Hadoop, audit logging is implemented with the Log4j Java logging library at the INFO logging level, and it is disabled by default. Configure Hadoop audit logging:
- Enable audit logging in the $HADOOP_HOME/conf/log4j.properties file:
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO
- Try making a directory on HDFS:
  hadoop fs -mkdir audittest
- Check the audit log messages in the NameNode log file:
  grep org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit $HADOOP_HOME/logs/hadoop-hduser-namenode-master.log

  The Hadoop NameNode is responsible for managing audit logging messages, which are forwarded to the NameNode logging facility.
- Separate the audit logging messages from the NameNode logging messages by configuring the file $HADOOP_HOME/conf/log4j.properties:

  # Log at INFO level to the DRFAAUDIT appender
  log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO,DRFAAUDIT
  # Disable forwarding the audit logging messages to the NameNode logger.
  log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false

  ###########################################
  # Configure logging appender
  ###########################################
  # Daily Rolling File Appender (DRFA)
  log4j.appender.DRFAAUDIT=org.apache.log4j.DailyRollingFileAppender
  log4j.appender.DRFAAUDIT.File=$HADOOP_HOME/log/audit.log
  log4j.appender.DRFAAUDIT.DatePattern=.yyyy-MM-dd
  log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout
  log4j.appender.DRFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c:%m%n
Hadoop logs audit messages for operations such as creating, changing, or deleting files into a configured log file. By default, the audit logging level is set to WARN, which effectively disables audit logging. To enable it, the logging level needs to be changed to INFO.

When a Hadoop cluster runs many jobs, the log file can grow very quickly. Log file rotation periodically rolls the log file over to a different name.
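With the DailyRollingFileAppender configured above, rotation renames the log file by appending the DatePattern. A sketch of the resulting filename (the actual date depends on when it runs, so none is hard-coded here):

```shell
#!/bin/sh
# Log4j's DatePattern ".yyyy-MM-dd" corresponds to ".%Y-%m-%d" in date(1);
# the rolled file is the base name plus that suffix.
rotated="audit.log$(date +.%Y-%m-%d)"
echo "$rotated"
```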
Upgrading Hadoop
In the process of upgrading a Hadoop cluster, we want to minimize damage to the data stored on HDFS; such damage is the cause of most upgrade problems. Data damage can be caused either by human error or by software and hardware failures, so a backup of the data might be necessary. But the sheer size of the data on HDFS can make a full backup impractical.
A more practical approach is to back up only the HDFS filesystem metadata on the master node, while leaving the data blocks intact. If some data blocks are lost after the upgrade, Hadoop can automatically recover them from the remaining replicas.
- Stop the cluster:
  stop-all.sh
- Back up the block locations of the data on HDFS:
  hadoop fsck / -files -blocks -locations > dfs.block.locations.fsck.backup
- Save the list of all files on the HDFS filesystem:
  hadoop dfs -lsr / > dfs.namespace.lsr.backup
- Save the description of each DataNode in the HDFS cluster:
  hadoop dfsadmin -report > dfs.datanodes.report.backup
- Copy the checkpoint files to a backup directory (dfs.name.dir is the directory configured in hdfs-site.xml):

  sudo cp ${dfs.name.dir}/edits /backup
  sudo cp ${dfs.name.dir}/image/fsimage /backup

- Verify that no DataNode daemon is running:

  for node in `cat $HADOOP_HOME/conf/slaves`; do
    echo "Checking node" $node
    ssh $node -C "jps"
  done

- If any DataNode process is still running, kill it:
  ssh $node -C "jps | awk '/DataNode/ {print \$1}' | xargs kill -9"
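jps prints each process as "<pid> <name>" separated by a space, which is why awk (or cut with a space delimiter, not a tab) parses it reliably. A runnable sketch against fabricated jps output:

```shell
#!/bin/sh
# Fabricated jps output from a slave node, for illustration only.
jps_output="4242 DataNode
4300 TaskTracker
4111 Jps"

# Extract the DataNode PID; on a live node this value would feed
# "xargs kill -9" as in the step above.
pid=$(echo "$jps_output" | awk '/DataNode/ {print $1}')
echo "DataNode PID: $pid"
```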
- Copy the configuration files from the old configuration directory to the new one:
  sudo cp $HADOOP_HOME/conf/* /usr/local/hadoop-1.2.0/conf/
- Update the Hadoop symbolic link to point to the new version:
  sudo rm -rf /usr/local/hadoop
  sudo ln -s /usr/local/hadoop-1.2.0 /usr/local/hadoop
- Upgrade the slave nodes:

  for host in `cat $HADOOP_HOME/conf/slaves`; do
    echo "Configuring hadoop on slave node" $host
    sudo scp -r /usr/local/hadoop-1.2.0 hduser@$host:/usr/local
    echo "Making symbolic link for Hadoop home directory on host" $host
    sudo ssh hduser@$host -C "ln -s /usr/local/hadoop-1.2.0 /usr/local/hadoop"
  done

- Upgrade the NameNode:
  hadoop namenode -upgrade
  This command converts the checkpoint to the new version's format. Wait for it to finish.
- Start HDFS:
  start-dfs.sh
- Get the list of all files on HDFS and compare it with the backup:
  hadoop dfs -lsr / > dfs.namespace.lsr.new
  diff dfs.namespace.lsr.new dfs.namespace.lsr.backup

  The two files should have the same content if the upgrade completed without errors.
- Get a new report of each DataNode in the cluster and compare it with the backup:
  hadoop dfsadmin -report > dfs.datanodes.report.new
  diff dfs.datanodes.report.new dfs.datanodes.report.backup
- Get the locations of all data blocks and compare the output with the previous backup:
  hadoop fsck / -files -blocks -locations > dfs.block.locations.fsck.new
  diff dfs.block.locations.fsck.new dfs.block.locations.fsck.backup
- Start the MapReduce cluster:
  start-mapred.sh
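The before/after comparisons above rely on diff exiting with status 0 when the files are identical. A self-contained sketch of that check, using throwaway listing files with made-up paths:

```shell
#!/bin/sh
# Two throwaway namespace listings that should match after a clean upgrade.
printf '/user/hduser/a\n/user/hduser/b\n' > dfs.namespace.lsr.backup.demo
printf '/user/hduser/a\n/user/hduser/b\n' > dfs.namespace.lsr.new.demo

# diff exits 0 when the files are identical, non-zero otherwise.
if diff dfs.namespace.lsr.new.demo dfs.namespace.lsr.backup.demo > /dev/null; then
  verdict="namespace unchanged"
else
  verdict="namespace differs"
fi
echo "$verdict"

rm -f dfs.namespace.lsr.backup.demo dfs.namespace.lsr.new.demo
```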