Hadoop is a framework written in Java for distributed processing of large data sets, consisting of two main parts: MapReduce and HDFS. HDFS is a distributed file system designed to run on inexpensive hardware; it is highly fault-tolerant and provides high throughput, which gives distributed computation and storage its underlying support. MapReduce covers splitting a job into tasks and aggregating their results.
For details, see the paper MapReduce: Simplified Data Processing on Large Clusters.
Prerequisites
Sun Java 6
Hadoop requires a working Java 1.5.x (aka 5.0.x) installation; Java 1.6.x (aka 6.0.x, aka 6) is recommended. See here for installation instructions.
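You can quickly confirm which Java version is active with:
[bash]
# print the installed Java version; it should report 1.6.x
$ java -version
[/bash]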
Adding a dedicated Hadoop system user
[bash]
addgroup hadoop
adduser --ingroup hadoop hduser
[/bash]
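Optionally, you can confirm that the account and group were created (a quick check, not required by the tutorial):
[bash]
# show the new user and its group membership
$ id hduser
# show the members of the hadoop group
$ getent group hadoop
[/bash]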
Configuring SSH
Hadoop uses SSH to manage its nodes. The following assumes SSH is already running on the machine and configured to allow SSH public key authentication. The steps below generate an SSH key for the hduser user:
[bash]
su - hduser
ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@localdomain
The key's randomart image is:
[...snipp...]
[/bash]
The second command creates an RSA key pair with an empty password. Generally, an empty password is not recommended, but here we need passwordless access so that we are not prompted for a passphrase every time Hadoop interacts with its nodes.
[bash]
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
[/bash]
This allows SSH access to the machine with the newly created key.
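Depending on the distribution, SSH may refuse the key if the permissions on ~/.ssh are too open. A conservative setting (run as hduser) is:
[bash]
# restrict the key directory and the authorized_keys file
$ chmod 700 $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
[/bash]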
The final step is to test whether hduser can connect to the machine via SSH. This step also adds the machine's host key fingerprint to hduser's known_hosts file. If you have any special SSH configuration, such as a non-standard port instead of 22, you can define those options in $HOME/.ssh/config.[bash]
[hduser@localdomain ~]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
...
[hduser@localdomain ~]
[/bash]
If the SSH connection fails, the following notes may help:
- Enable debugging with ssh -vvv localhost, which shows detailed information about the error.
- Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication (which should be set to yes) and AllowUsers (if this option is active, add the hduser user to it). If you made any changes to the SSH server configuration, restart it with /etc/init.d/sshd restart.
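For example, a quick way to inspect the two options mentioned above (the file is usually world-readable) is:
[bash]
# show the relevant sshd options, if set
$ grep -E 'PubkeyAuthentication|AllowUsers' /etc/ssh/sshd_config
[/bash]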
Disabling IPv6
If you have no use for IPv6, you can disable it system-wide in /etc/sysctl.conf.[bash]
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
[/bash]
After this change, you have to reboot the machine for it to take effect.
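On many distributions the new settings can also be applied immediately (a minimal sketch; whether this fully disables IPv6 without a reboot depends on the system):
[bash]
# reload /etc/sysctl.conf so the kernel picks up the new values
$ sudo sysctl -p
[/bash]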
You can check whether IPv6 is enabled on the machine with the following command:
[bash]
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
[/bash]
A return value of 0 means IPv6 is enabled; a return value of 1 means it is disabled.
Alternatively, as described in https://issues.apache.org/jira/browse/HADOOP-3437, you can also disable IPv6 only for Hadoop by adding the following option to conf/hadoop-env.sh.[bash]
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
[/bash]
Hadoop
Installation
Download Hadoop from the Apache Download Mirrors, move it to the target directory, extract it, and change the owner of the extracted files.
[bash]
$ cd /usr/local
$ sudo tar xzf hadoop-1.0.3.tar.gz
$ sudo mv hadoop-1.0.3 hadoop
$ sudo chown -R hduser:hadoop hadoop
[/bash]
Alternatively, instead of renaming the directory in the third line, you can create a symbolic link named hadoop that points to hadoop-1.0.3: ln -s hadoop-1.0.3 hadoop.
Update $HOME/.bashrc
Add the following lines to the end of hduser's $HOME/.bashrc file. If you use a shell other than bash, update the corresponding configuration file accordingly.[bash gutter="false"]
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-sun
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
[/bash]
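After saving the file, you may want to reload it in the current shell and confirm that the hadoop command is now on the PATH (a quick sanity check, assuming the installation path above and that the exported JAVA_HOME points at a valid JDK):
[bash]
# reload the profile in the current shell
$ source $HOME/.bashrc
# print the Hadoop version to verify the binary is found
$ hadoop version
[/bash]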
Excursus: Hadoop Distributed File System (HDFS)
From The Hadoop Distributed File System: Architecture and Design:
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop project, which is part of the Apache Lucene project.
The following picture gives an overview of the most important HDFS components.
HDFS Architecture (source: http://hadoop.apache.org/core/docs/current/hdfs_design.html)
Configuration
- hadoop-env.sh
The only environment variable that has to be configured for Hadoop here is JAVA_HOME. Open the Hadoop environment configuration file /usr/local/hadoop/conf/hadoop-env.sh (assuming the installation directory is /usr/local/hadoop) and set JAVA_HOME to the directory of the Sun JDK/JRE 6.
[bash]
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/jdk-1.6.0
[/bash]
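If you are unsure where the JDK lives on your system, one way to find a candidate path (on Linux, assuming java is on the PATH and provided by the JDK) is:
[bash]
# resolve the real location of the java binary;
# JAVA_HOME is the directory two levels above the resulting .../bin/java
$ readlink -f $(which java)
[/bash]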
Note: If you are on Mac OS X 10.7, you can set JAVA_HOME as follows:
[bash]
# for our Mac users
export JAVA_HOME=`/usr/libexec/java_home`
[/bash] - conf/*-site.xml
Note: As of Hadoop 0.20.x and 1.x, the configuration settings previously found in hadoop-site.xml were moved to core-site.xml (hadoop.tmp.dir, fs.default.name), mapred-site.xml (mapred.job.tracker) and hdfs-site.xml (dfs.replication).
In this section we configure, among other things, the directory where Hadoop stores its data and the network ports it listens on. We will use HDFS, even though this "cluster" only contains a single node.
Here we set hadoop.tmp.dir to the /tmp/hadoop directory. Hadoop's default configuration uses hadoop.tmp.dir as the base for both local file system and HDFS temporary storage, so don't be surprised to find Hadoop creating its own directories under it.
[bash]
$ sudo mkdir /tmp/hadoop
$ sudo chown hduser:hadoop /tmp/hadoop
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /tmp/hadoop
[/bash]
If you forget to set the owner and permissions, you will get a java.io.IOException when formatting the NameNode.
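You can double-check ownership and permissions before moving on:
[bash]
# ownership should be hduser:hadoop, mode 750 (or 755)
$ ls -ld /tmp/hadoop
[/bash]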
Add the following snippets between the <configuration> ... </configuration> tags of the respective XML files.
File conf/core-site.xml
[bash gutter="false"]
<!-- In: conf/core-site.xml -->
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
[/bash]
File conf/mapred-site.xml
[bash gutter="false"]
<!-- In: conf/mapred-site.xml -->
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
[/bash]
File conf/hdfs-site.xml
[bash gutter="false"]
<!-- In: conf/hdfs-site.xml -->
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
[/bash]
If you have any questions about the configuration options, see Getting Started with Hadoop and the Hadoop API Overview documentation.
Formatting the HDFS filesystem via the NameNode
The first step in setting up a Hadoop cluster is to format the Hadoop file system, which is implemented on top of the local file system of the cluster.
Do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS).
Run the following command to format it (in effect, this initializes the directory specified by the dfs.name.dir option).[bash]
$ /usr/local/hadoop/bin/hadoop namenode -format
[/bash]
The output will look like this:
[bash gutter="false"]
[hduser@localdomain /usr/local/hadoop]$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localdomain/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
[hduser@localdomain /usr/local/hadoop]$
[/bash]
Starting your single-node cluster
Run the command:
[bash]
[hduser@localdomain ~]$ /usr/local/hadoop/bin/start-all.sh
[/bash]
This will start the NameNode, DataNode, JobTracker and TaskTracker on the machine.
The output will look like this:
[bash gutter="false"]
[hduser@localdomain ~]$ /usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-localdomain.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-localdomain.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-localdomain.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-jobtracker-localdomain.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-localdomain.out
[hduser@localdomain ~]$
[/bash]
A convenient tool for checking whether the expected Hadoop processes are running is jps (part of Sun's Java since v1.5.0). For more details, see How to debug MapReduce programs.[bash]
[hduser@localdomin ~]$ /usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode
[/bash]
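If one of the daemons does not show up in the jps output, its log file under the Hadoop logs/ directory is the first place to look; for example (file name pattern assumed from the start-all.sh output above):
[bash]
# inspect the most recent DataNode log entries
$ tail -n 50 /usr/local/hadoop/logs/hadoop-hduser-datanode-*.log
[/bash]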
You can also use netstat to check whether Hadoop is listening on the configured ports.[bash gutter="false"]
[hduser@localdomin ~]$ sudo netstat -plten | grep java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 9236 2471/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 9998 2628/java
tcp 0 0 0.0.0.0:48159 0.0.0.0:* LISTEN 1001 8496 2628/java
tcp 0 0 0.0.0.0:53121 0.0.0.0:* LISTEN 1001 9228 2857/java
tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 1001 8143 2471/java
tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 9230 2857/java
tcp 0 0 0.0.0.0:59305 0.0.0.0:* LISTEN 1001 8141 2471/java
tcp 0 0 0.0.0.0:50060 0.0.0.0:* LISTEN 1001 9857 3005/java
tcp 0 0 0.0.0.0:49900 0.0.0.0:* LISTEN 1001 9037 2785/java
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1001 9773 2857/java
[hduser@localdomin ~]$
[/bash]
If there are any errors, examine the log files in the /logs/ directory.
Stopping your single-node cluster
Run the command:
[bash]
$ /usr/local/hadoop/bin/stop-all.sh
[/bash]
The output will look something like this:
[bash gutter="false"]
[hduser@localdoamin /usr/local/hadoop]$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
[hduser@localdoamin /usr/local/hadoop]$
[/bash]
Running a MapReduce job
We will use the WordCount example as our first MapReduce job. It reads text files and counts how often each word occurs; both the input and the output are files. For more information, see what happens behind the scenes on the Hadoop wiki.
- Download example input data
In this example we use three ebooks from Project Gutenberg:
- The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
- The Notebooks of Leonardo Da Vinci
- Ulysses by James Joyce
Download the Plain Text UTF-8 version of each ebook and store the files in a temporary directory such as /tmp/gutenberg (YMMV).
[bash gutter="false"]
[hduser@localdomain /tmp/gutenberg]$ ls -l
-rw-r--r-- 1 hduser hadoop 674566 Aug 1 16:01 pg20417.txt
-rw-r--r-- 1 hduser hadoop 1573150 Aug 1 16:01 pg4300.txt
-rw-r--r-- 1 hduser hadoop 1423801 Aug 1 16:01 pg5000.txt
[/bash] - Restart the Hadoop cluster
If your Hadoop cluster is not running yet, start it.
[bash]
$ /usr/local/hadoop/bin/start-all.sh
[/bash] - Copy local example data to HDFS
Before running the MapReduce job, we first have to copy the files from the local file system to Hadoop's HDFS.
[bash]
[hduser@localdomin /usr/local/hadoop]$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
[hduser@localdomin /usr/local/hadoop]$ bin/hadoop dfs -ls /user/hduser
Found 1 items
drwxr-xr-x - hduser supergroup 0 2010-05-08 17:40 /user/hduser/gutenberg
[hduser@localdomin /usr/local/hadoop]$ bin/hadoop dfs -ls /user/hduser/gutenberg
Found 3 items
-rw-r--r-- 3 hduser supergroup 674566 2011-03-10 11:38 /user/hduser/gutenberg/pg20417.txt
-rw-r--r-- 3 hduser supergroup 1573112 2011-03-10 11:38 /user/hduser/gutenberg/pg4300.txt
-rw-r--r-- 3 hduser supergroup 1423801 2011-03-10 11:38 /user/hduser/gutenberg/pg5000.txt
[hduser@localdomin /usr/local/hadoop]$
[/bash] - Run the MapReduce job
Now we actually run the WordCount example job:
[bash]
[hduser@localdomin /usr/local/hadoop]$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
[/bash]
This command reads all files in the HDFS directory /user/hduser/gutenberg, processes them, and stores the result in the HDFS directory /user/hduser/gutenberg-output. Note: If you get an error like the following when running the command:
Exception in thread "main" java.io.IOException: Error opening job jar: hadoop*examples*.jar
at org.apache.hadoop.util.RunJar.main (RunJar.java: 90)
Caused by: java.util.zip.ZipException: error in opening zip file
replace hadoop*examples*.jar with the full name of the examples JAR and run the command again, e.g.: [hduser@localdomin /usr/local/hadoop]$ bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
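If you are not sure of the exact file name, you can list the examples JAR that ships with the release (path assumes the installation directory used above and the usual 1.x layout):
[bash]
# locate the examples JAR bundled with this Hadoop release
$ ls /usr/local/hadoop/hadoop-examples-*.jar
[/bash]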
Example output:
[bash gutter="false"]
[hduser@localdomin /usr/local/hadoop]$ bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
10/05/08 17:43:00 INFO input.FileInputFormat: Total input paths to process : 3
10/05/08 17:43:01 INFO mapred.JobClient: Running job: job_201005081732_0001
10/05/08 17:43:02 INFO mapred.JobClient: map 0% reduce 0%
10/05/08 17:43:14 INFO mapred.JobClient: map 66% reduce 0%
10/05/08 17:43:17 INFO mapred.JobClient: map 100% reduce 0%
10/05/08 17:43:26 INFO mapred.JobClient: map 100% reduce 100%
10/05/08 17:43:28 INFO mapred.JobClient: Job complete: job_201005081732_0001
10/05/08 17:43:28 INFO mapred.JobClient: Counters: 17
10/05/08 17:43:28 INFO mapred.JobClient: Job Counters
10/05/08 17:43:28 INFO mapred.JobClient: Launched reduce tasks=1
10/05/08 17:43:28 INFO mapred.JobClient: Launched map tasks=3
10/05/08 17:43:28 INFO mapred.JobClient: Data-local map tasks=3
10/05/08 17:43:28 INFO mapred.JobClient: FileSystemCounters
10/05/08 17:43:28 INFO mapred.JobClient: FILE_BYTES_READ=2214026
10/05/08 17:43:28 INFO mapred.JobClient: HDFS_BYTES_READ=3639512
10/05/08 17:43:28 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3687918
10/05/08 17:43:28 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=880330
10/05/08 17:43:28 INFO mapred.JobClient: Map-Reduce Framework
10/05/08 17:43:28 INFO mapred.JobClient: Reduce input groups=82290
10/05/08 17:43:28 INFO mapred.JobClient: Combine output records=102286
10/05/08 17:43:28 INFO mapred.JobClient: Map input records=77934
10/05/08 17:43:28 INFO mapred.JobClient: Reduce shuffle bytes=1473796
10/05/08 17:43:28 INFO mapred.JobClient: Reduce output records=82290
10/05/08 17:43:28 INFO mapred.JobClient: Spilled Records=255874
10/05/08 17:43:28 INFO mapred.JobClient: Map output bytes=6076267
10/05/08 17:43:28 INFO mapred.JobClient: Combine input records=629187
10/05/08 17:43:28 INFO mapred.JobClient: Map output records=629187
10/05/08 17:43:28 INFO mapred.JobClient: Reduce input records=102286
[/bash]
Check whether the result was successfully stored in the HDFS directory /user/hduser/gutenberg-output:
[bash gutter="false"]
[hduser@localdomin /usr/local/hadoop]$ bin/hadoop dfs -ls /user/hduser
Found 2 items
drwxr-xr-x - hduser supergroup 0 2010-05-08 17:40 /user/hduser/gutenberg
drwxr-xr-x - hduser supergroup 0 2010-05-08 17:43 /user/hduser/gutenberg-output
[hduser@localdomin /usr/local/hadoop]$ bin/hadoop dfs -ls /user/hduser/gutenberg-output
Found 2 items
drwxr-xr-x - hduser supergroup 0 2010-05-08 17:43 /user/hduser/gutenberg-output/_logs
-rw-r--r-- 1 hduser supergroup 880802 2010-05-08 17:43 /user/hduser/gutenberg-output/part-r-00000
[hduser@localdomin /usr/local/hadoop]$
[/bash]
If you want to modify some Hadoop settings on the fly, such as increasing the number of reduce tasks, you can use the -D option:
[bash]
[hduser@localdomin /usr/local/hadoop]$ bin/hadoop jar hadoop-examples-1.0.3.jar wordcount -D mapred.reduce.tasks=16 /user/hduser/gutenberg /user/hduser/gutenberg-output
[/bash]
An important note about mapred.map.tasks: Hadoop does not honor mapred.map.tasks beyond considering it a hint. But it accepts the user-specified mapred.reduce.tasks and doesn't manipulate that. You cannot force mapred.map.tasks but you can specify mapred.reduce.tasks. - Retrieve the job result from HDFS
To inspect the results, you can copy them from HDFS to the local file system, or view them directly with:
[bash]
[hduser@localdomain /usr/local/hadoop]$ bin/hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000
[/bash]
If instead you want to copy the files from HDFS to the local file system, you can do something like the following:
[bash]
[hduser@localdomain /usr/local/hadoop]$ mkdir /tmp/gutenberg-output
[hduser@localdomain /usr/local/hadoop]$ bin/hadoop dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output
[hduser@localdomain /usr/local/hadoop]$ head /tmp/gutenberg-output/gutenberg-output
"(Lo)cra" 1
"1490 1
"1498," 1
"35" 1
"40," 1
"A 2
"AS-IS". 1
"A_ 1
"Absoluti 1
"Alack! 1
[hduser@localdomain /usr/local/hadoop]$
[/bash]
Note the quotation marks (") in the head output above; they are not inserted by Hadoop but come from the word tokenizer used by WordCount, so they appear as part of the counted words. The fs -getmerge command will simply concatenate any files it finds in the directory you specify. This means that the merged file might (and most likely will) not be sorted.
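If you want the merged output sorted, a simple local sort works (a minimal sketch using the file name from the getmerge step above):
[bash]
# sort the merged word counts alphabetically before inspecting them
$ sort /tmp/gutenberg-output/gutenberg-output > /tmp/gutenberg-output/gutenberg-output-sorted
$ head /tmp/gutenberg-output/gutenberg-output-sorted
[/bash]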
Hadoop Web Interfaces
By default, Hadoop provides web interfaces for viewing detailed information about the cluster. The addresses are configured in conf/hadoop-default.xml:
- http://localhost:50070/ - web UI of the NameNode daemon
- http://localhost:50030/ - web UI of the JobTracker daemon
- http://localhost:50060/ - web UI of the TaskTracker daemon
- NameNode Web Interface (HDFS layer)
The NameNode web interface shows a cluster overview, including total/remaining capacity and live/dead nodes. In addition, it lets you browse the HDFS namespace and the contents of its files, and it provides access to the Hadoop log files on the local machine.
By default, it is available at http://localhost:50070/.
A screenshot of Hadoop's Name Node web interface.
- Job Tracker Web Interface (MapReduce layer)
The JobTracker web interface provides statistics about the jobs executed on the Hadoop cluster, such as running/completed/failed jobs and the job history log files. It also provides access to the Hadoop log files on the local machine.
By default, it is available at http://localhost:50030/.
A screenshot of Hadoop's Job Tracker web interface.
- TaskTracker Web Interface (MapReduce layer)
The TaskTracker web interface shows running and non-running tasks. It also provides access to the Hadoop log files on the local machine.
By default, it is available at http://localhost:50060/.
A screenshot of Hadoop's Task Tracker web interface.
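If you prefer a quick command-line check that the three web interfaces are up (a minimal sketch, assuming curl is installed and the default ports above), something like this prints the HTTP status code of each one:
[bash]
# a 200 response indicates the daemon's web server is reachable
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50030/
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50060/
[/bash]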