Let's MapReduce with Pentaho Data Integration
I have been exploring Pentaho Data Integration for quite some time, and I have always wanted to see how it works with big data. Today I got a chance to run a simple MapReduce job with Pentaho on the Airline dataset, which I had loaded into my Cloudera Hadoop cluster (Sample Datasets).
Before running MapReduce in Pentaho, we need to configure it to work with Cloudera (by default it is configured to work with Apache Hadoop). This takes about 5 minutes if you follow this link: