It sure is easy to get the Cloudera Hadoop packages up and running on Debian- and RPM-based distributions. All you need to do is add the repositories and issue a command to your package manager!
Since ArchLinux, the distribution of choice for the wise, rollin’ and version-worry-free, has no AUR packages for installing the same, I thought I might treat you to a more manual approach to setting up Cloudera Hadoop and running the beautiful Cloudera Desktop on it. Well, actually the article caters to almost any other Linux too, but I love ArchLinux and like getting sued. Anyway, let’s get on with it.
It took me a couple of minutes to hunt down Cloudera’s archive site, which gives away source packages (not source RPMs; those wouldn’t be what we need, at least not what I need). You can find Cloudera’s CDH2 releases here. Navigate up and away for other releases if you want to grow older or to bleed till you feel like stemming it.
Download their Hadoop and Desktop archives (versions 0.20.1 and 0.3.0 respectively, as of this writing).
Unpack Cloudera-Hadoop, configure it as you like, format the namenode, and start it using the provided bin/ scripts. Configuration help may be found on Hadoop’s site.
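If you would rather script those steps than type them, here is a minimal sketch in Python. The archive name and target directory are placeholders of mine (use whatever you downloaded), and I am assuming the tarball unpacks into a directory of that name. The commands bin/hadoop namenode -format and bin/start-all.sh are what the stock Hadoop 0.20 tree ships; the configuration step still happens by hand between unpacking and formatting. By default the script only prints the commands.

```python
import subprocess
from pathlib import Path


def hadoop_setup_commands(archive, hadoop_home):
    """Return the commands for unpacking, formatting and starting Hadoop."""
    home = Path(hadoop_home)
    return [
        # Unpack the release tarball into the parent directory of hadoop_home.
        ["tar", "-xzf", str(archive), "-C", str(home.parent)],
        # Format the HDFS namenode (first run only; this wipes HDFS metadata).
        [str(home / "bin" / "hadoop"), "namenode", "-format"],
        # Start the HDFS and MapReduce daemons with the bundled script.
        [str(home / "bin" / "start-all.sh")],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file and directory names; adjust to what you downloaded.
    cmds = hadoop_setup_commands(
        "hadoop-0.20.1.tar.gz", Path.home() / "hadoop-0.20.1"
    )
    run_all(cmds)  # dry run: prints the commands without executing them
```

Edit conf/ after the unpack step, then call run_all(cmds[1:], dry_run=False) to format and start for real.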
Unpack Cloudera-Desktop, build it using make, and install it using make install (use a PREFIX if you like). Next, follow this article (from 1.5 on; the README of the Desktop package helps too) to pour the special sauce into Cloudera-Hadoop so that the Desktop integrates smoothly. Run the Desktop using cloudera-desktop/bin/supervisor. This runs a server-like process, so ensure you don’t SIGTERM it; start it within screen or with a trailing &.
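For those who want the build and the supervisor launch in one script, here is a rough Python sketch. Passing PREFIX as a make variable on the command line is my assumption about how the Desktop Makefile picks it up, and the log file name is made up. Instead of screen or &, it starts the process in its own session via start_new_session, which is one way of keeping it from being tied to the terminal you launched it from.

```python
import subprocess


def build_commands(prefix=None):
    """Return the make invocations for building and installing the Desktop."""
    install = ["make", "install"]
    if prefix:
        # Assumption: PREFIX is read as a make variable on the command line.
        install.append(f"PREFIX={prefix}")
    return [["make"], install]


def start_detached(command, log_path):
    """Start command in its own session, appending its output to log_path."""
    with open(log_path, "ab") as log:
        # The child keeps its own copy of the log descriptor, so closing
        # ours when the with-block ends does not affect it.
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # new session, detached from the terminal
        )


if __name__ == "__main__":
    # Print the build steps; run them from inside the unpacked Desktop source.
    for cmd in build_commands():
        print(" ".join(cmd))
    # To launch the Desktop afterwards (path as given in the article):
    #   start_detached(["cloudera-desktop/bin/supervisor"], "supervisor.log")
```

The returned Popen object carries the pid, in case you want to stop the supervisor later.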
Connect to your new (hopefully working, if Hadoop is) Cloudera Desktop at http://localhost:8088 and enjoy the simple Job Designer tool, amongst others.
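The supervisor may take a moment to come up, so a small check before opening the browser can save some head-scratching. The helper below is my own sketch, not part of Cloudera Desktop: it polls a URL until anything answers over HTTP, and treats even an error status as proof that the server is listening. The timeout and interval values are arbitrary.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=60.0, interval=2.0):
    """Return True once url answers over HTTP, False if timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # An HTTP error status still proves the server is listening.
            return True
        except OSError:
            # Connection refused or timed out (URLError is an OSError subclass).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # URL taken from the article; the short timeout is an arbitrary choice.
    up = wait_for_http("http://localhost:8088", timeout=10.0)
    print("Cloudera Desktop is up" if up else "No answer yet")
```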
I leave the daemon user setup and other finer cluster-related tuning to your tastes. This guide works well for a pseudo-cluster setup run from $HOME.
