Hi all! If you find any errors in Hadoop in Practice that aren't
listed below, or if you find something that you think isn't well explained, please post your comments in the book's
Author Online Forum. Thanks!
All of the code examples in the book were tested on CDH 3.x and Hadoop 0.20.x.
Unfortunately, when targeting a version of Apache Hadoop newer than 0.20 (which includes
0.21, 1.x, and 2.x) or a version of CDH newer than 3.x, the examples
won't run because of a version mismatch between the Hadoop client and server.
A push to the
GitHub repo on July 15, 2013 fixed this problem by prioritizing the locally installed
Hadoop JARs ahead of the Hadoop JARs that are downloaded via the Maven build.
If this change causes problems, it can be reverted by setting an environment
variable prior to running run.sh:
The second edition of the book will have a more mature approach to Hadoop and third-party JAR management.
Some readers encountered an error running mvn package. They were able to resolve the problem
by adding the following Maven repository to their pom.xml file:
<path to your JDK bin directory>
<path to your JDK installation directory (which contains the bin directory)>
The listing at the bottom of the page, which contains the text representation of the graph depicted in
figure 7.10, is incorrect. It should be:
dee kia ali
ali dee bob joe
joe bob ali
kia bob dee
bob kia ali joe
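As a quick sanity check on the corrected listing above, the following Python sketch parses it and looks for one-sided edges. It rests on two assumptions that the erratum itself does not state: that each line is a node name followed by its whitespace-separated neighbours, and that the graph in figure 7.10 is undirected (so every edge should appear in both directions). The function names are illustrative only and do not come from the book's code.

```python
# Corrected listing from the erratum. Assumed format (not stated in the
# erratum): the first token is a node, the remaining tokens are its neighbours.
LISTING = """\
dee kia ali
ali dee bob joe
joe bob ali
kia bob dee
bob kia ali joe
"""


def parse_adjacency(text):
    """Parse whitespace-separated lines into {node: set of neighbours}."""
    graph = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        graph[tokens[0]] = set(tokens[1:])
    return graph


def asymmetric_edges(graph):
    """Return sorted (a, b) pairs where a lists b but b does not list a."""
    return sorted(
        (a, b)
        for a, neighbours in graph.items()
        for b in neighbours
        if a not in graph.get(b, set())
    )


graph = parse_adjacency(LISTING)
# An empty list means every edge is listed from both endpoints.
print(asymmetric_edges(graph))
```

If the graph is in fact directed, an asymmetric edge is not an error, and the check only tells you which edges are one-way.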
GitHub has been updated with this corrected file:
The code presented in this technique requires version 1.1 of RHadoop. The code may not work
with newer versions of RHadoop.
It is recommended to run Hadoop on Oracle JDK 1.6.
Some readers reported issues installing R on CentOS and had to install the xdg-utils and
desktop-file-utils RPMs for the R installation to go through.
The RHadoop code in this book requires version 1.1 of RHadoop. There may be issues running
code examples with newer versions of RHadoop.