org.apache.hadoop.mapred.lib.aggregate
Class ValueAggregatorJob

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJob

public class ValueAggregatorJob
extends Object

This is the main class for creating a map/reduce job using the Abacus framework. Abacus is a specialization of the map/reduce framework for performing various simple aggregations. Generally speaking, to implement an application using the map/reduce model, the developer implements map and reduce functions (and possibly a combine function). However, many applications related to counting and statistics computation have very similar characteristics. Abacus abstracts out the general patterns of these functions and implements those patterns. In particular, the package provides generic mapper/reducer/combiner classes, a set of built-in value aggregators, and a generic utility class that helps the user create map/reduce jobs using the generic classes.

The built-in aggregators include:

  - sum over numeric values
  - count the number of distinct values
  - compute the histogram of values
  - compute the minimum, maximum, median, average, and standard deviation of numeric values

A developer using Abacus needs only to provide a plugin class conforming to the following interface:

    public interface ValueAggregatorDescriptor {
        public ArrayList generateKeyValPairs(Object key, Object value);
        public void configure(JobConf job);
    }

The package also provides a base class, ValueAggregatorBaseDescriptor, implementing the above interface. The user can extend the base class and implement generateKeyValPairs accordingly.

The primary work of generateKeyValPairs is to emit one or more key/value pairs based on the input key/value pair. The key in an output key/value pair encodes two pieces of information: the aggregation type and the aggregation id. The value is aggregated onto the aggregation id according to the aggregation type.

This class offers a function to generate a map/reduce job using the Abacus framework. The function takes the following parameters:

  - input directory spec
  - input format (text or sequence file)
  - output directory
  - a file specifying the user plugin class
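The key encoding described above can be illustrated with a small standalone sketch. It does not use any Hadoop classes: the class name AbacusKeySketch, the ":" separator, and the type names "Sum" and "UniqCount" are assumptions made for illustration only, not the framework's actual constants. One method plays the role of generateKeyValPairs, emitting keys of the form type + separator + id, and another plays the role of the generic reducer, which splits each key and applies the aggregation named by its type.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Standalone illustration only; every name here is hypothetical.
class AbacusKeySketch {
    static final String SEP = ":";          // assumed type/id separator
    static final String SUM = "Sum";        // assumed aggregation type name
    static final String UNIQ = "UniqCount"; // assumed aggregation type name

    // Stand-in for generateKeyValPairs: one input record yields several
    // output pairs whose keys encode "aggregation type" + SEP + "aggregation id".
    static List<Map.Entry<String, String>> generateKeyValPairs(Object key, Object value) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String word : value.toString().trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // Add 1 to the running sum kept under the id <word>.
            out.add(new SimpleEntry<>(SUM + SEP + word, "1"));
            // Record the word under a single id that counts distinct values.
            out.add(new SimpleEntry<>(UNIQ + SEP + "distinctWords", word));
        }
        return out;
    }

    // Stand-in for the generic reducer: split each key into type and id,
    // then aggregate the value onto the id according to the type.
    static Map<String, Long> aggregate(List<Map.Entry<String, String>> pairs) {
        Map<String, Long> sums = new TreeMap<>();
        Map<String, Set<String>> distinct = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            String fullKey = pair.getKey();
            String type = fullKey.substring(0, fullKey.indexOf(SEP));
            if (type.equals(SUM)) {
                sums.merge(fullKey, Long.parseLong(pair.getValue()), Long::sum);
            } else if (type.equals(UNIQ)) {
                distinct.computeIfAbsent(fullKey, k -> new HashSet<>()).add(pair.getValue());
            } else {
                throw new IllegalArgumentException("unknown aggregation type: " + type);
            }
        }
        // Results stay keyed by the full "type:id" key so ids cannot collide.
        Map<String, Long> result = new TreeMap<>(sums);
        for (Map.Entry<String, Set<String>> e : distinct.entrySet()) {
            result.put(e.getKey(), (long) e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.addAll(generateKeyValPairs(0L, "a b a"));
        pairs.addAll(generateKeyValPairs(1L, "b c"));
        System.out.println(aggregate(pairs));
    }
}
```

In a real job the plugin would only supply the first method; the splitting and aggregation are what the generic mapper/reducer/combiner classes are described as providing.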


Constructor Summary
ValueAggregatorJob()
           
 
Method Summary
static JobConf createValueAggregatorJob(String[] args)
          Create an Abacus-based map/reduce job.
static JobControl createValueAggregatorJobs(String[] args)
           
static void main(String[] args)
          Create and run an Abacus-based map/reduce job.
static boolean runJob(JobConf job)
          Submit/run a map/reduce job.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ValueAggregatorJob

public ValueAggregatorJob()
Method Detail

createValueAggregatorJobs

public static JobControl createValueAggregatorJobs(String[] args)
                                            throws IOException
Throws:
IOException

createValueAggregatorJob

public static JobConf createValueAggregatorJob(String[] args)
                                        throws IOException
Create an Abacus-based map/reduce job.

Parameters:
args - the arguments used for job creation
Returns:
a JobConf object ready for submission.
Throws:
IOException

runJob

public static boolean runJob(JobConf job)
                      throws IOException
Submit/run a map/reduce job.

Parameters:
job - the job configuration to submit
Returns:
true for success
Throws:
IOException

main

public static void main(String[] args)
                 throws IOException
Create and run an Abacus-based map/reduce job.

Parameters:
args - the arguments used for job creation
Throws:
IOException


Copyright © 2006 The Apache Software Foundation