org.apache.hadoop.mapred.lib.aggregate
Class ValueAggregatorJob
java.lang.Object
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJob
public class ValueAggregatorJob
extends Object
This is the main class for creating a map/reduce job using the Abacus framework.
Abacus is a specialization of the map/reduce framework for performing
various simple aggregations.
Generally speaking, to implement an application using the Map/Reduce
model, the developer needs to implement Map and Reduce functions (and possibly
a Combine function). However, many applications related to counting and
statistics computing have very similar characteristics. Abacus abstracts out
the general patterns of these functions and implements those patterns. In
particular, the package provides generic mapper/reducer/combiner classes,
a set of built-in value aggregators, and a generic utility class that
helps users create map/reduce jobs using the generic classes. The built-in
aggregators include:
sum over numeric values
count the number of distinct values
compute the histogram of values
compute the minimum, maximum, median, average, and standard deviation of numeric values
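To make the list of built-in aggregators concrete, the sketch below computes each kind of aggregate over the values collected for a single aggregation id. It is plain Java with no Hadoop dependency; the class `BuiltInAggregations` and its method names are hypothetical and are not the Abacus aggregator classes. The standard deviation is the population form (dividing by n), which is an assumption, since this page does not say which form Abacus reports.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, Hadoop-free illustration of the built-in aggregation kinds.
 * Each method reduces all values collected for one aggregation id.
 */
final class BuiltInAggregations {
    private BuiltInAggregations() {}

    /** Sum over numeric values. */
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    /** Number of distinct values. */
    static int distinctCount(List<String> values) {
        return new HashSet<String>(values).size();
    }

    /** Histogram: how often each value occurs, ordered by value. */
    static Map<String, Integer> histogram(List<String> values) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Returns {min, max, median, average, standard deviation} of a non-empty list.
     * Uses the population standard deviation (divides by n), an assumption.
     */
    static double[] stats(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("stats needs at least one value");
        }
        List<Double> sorted = new ArrayList<Double>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        double median = (n % 2 == 1)
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
        double mean = 0;
        for (double v : sorted) {
            mean += v;
        }
        mean /= n;
        double variance = 0;
        for (double v : sorted) {
            variance += (v - mean) * (v - mean);
        }
        variance /= n;
        return new double[] {sorted.get(0), sorted.get(n - 1), median, mean, Math.sqrt(variance)};
    }
}
```

In a map/reduce setting the values for an id would typically arrive as a stream at a combiner or reducer rather than as an in-memory list; the list form here only keeps the arithmetic visible.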
The developer using Abacus only needs to provide a plugin class
conforming to the following interface:
public interface ValueAggregatorDescriptor {
public ArrayList generateKeyValPairs(Object key, Object value);
public void configure(JobConf job);
}
The package also provides a base class,
ValueAggregatorBaseDescriptor, implementing the above interface. The user can
extend the base class and implement generateKeyValPairs accordingly.
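A plugin typically extends ValueAggregatorBaseDescriptor and overrides generateKeyValPairs. Because this page does not document the base class's helpers, the sketch below instead defines a hypothetical stand-in interface, `KeyValPairGenerator`, with the same method shape (using typed `Map.Entry<String, String>` pairs in place of the raw `ArrayList`), and implements a word-count plugin on top of it. The aggregation type name `LongValueSum` and the `:` separator between type and id are assumptions for illustration, not names confirmed by this page.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Map;

/** Hypothetical stand-in with the same shape as generateKeyValPairs (no Hadoop types). */
interface KeyValPairGenerator {
    ArrayList<Map.Entry<String, String>> generateKeyValPairs(Object key, Object value);
}

/**
 * Word-count style plugin: for each word in the input line, emit one pair whose
 * key is "<aggregation type>:<aggregation id>" and whose value is "1".
 * The type name and the ":" separator are assumptions for illustration.
 */
class WordCountDescriptor implements KeyValPairGenerator {
    static final String TYPE_SEPARATOR = ":";
    static final String LONG_VALUE_SUM = "LongValueSum";

    @Override
    public ArrayList<Map.Entry<String, String>> generateKeyValPairs(Object key, Object value) {
        ArrayList<Map.Entry<String, String>> pairs = new ArrayList<Map.Entry<String, String>>();
        // The input key is not needed for word counting; only the line text is used.
        for (String word : value.toString().trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // a blank line splits into a single empty token
            }
            // Aggregation type = sum of longs, aggregation id = the word itself.
            String outKey = LONG_VALUE_SUM + TYPE_SEPARATOR + word;
            pairs.add(new AbstractMap.SimpleEntry<String, String>(outKey, "1"));
        }
        return pairs;
    }
}
```

For the line "to be or not to be" this emits six pairs, two of them with key `LongValueSum:be`; summing the values per key then yields the word counts.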
The primary work of generateKeyValPairs is to emit one or more key/value
pairs based on the input key/value pair. The key in an output key/value pair
encodes two pieces of information: the aggregation type and the aggregation id.
The value is aggregated onto the aggregation id according to the aggregation
type.
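The reduce-side counterpart of this key encoding can be sketched as follows: values are grouped by output key, each key is split into its aggregation type and aggregation id, and the values are combined according to the type. This is a Hadoop-free illustration; `TypedAggregation`, the type names `LongValueSum` and `UniqValueCount`, and the `:` separator are hypothetical, and only two aggregation types are handled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, Hadoop-free sketch of the reduce side of the key encoding.
 * Type names and the ":" separator are assumptions for illustration.
 */
final class TypedAggregation {
    private TypedAggregation() {}

    /** Returns aggregation id -> aggregated value (as text). */
    static Map<String, String> aggregate(List<Map.Entry<String, String>> pairs) {
        // Stand-in for the shuffle: collect every value emitted under the same key.
        Map<String, List<String>> grouped = new TreeMap<String, List<String>>();
        for (Map.Entry<String, String> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<String>()).add(p.getValue());
        }

        Map<String, String> results = new TreeMap<String, String>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            String key = group.getKey();
            int sep = key.indexOf(':');
            if (sep < 0) {
                throw new IllegalArgumentException("key has no type separator: " + key);
            }
            String type = key.substring(0, sep);
            String id = key.substring(sep + 1);
            String report;
            if (type.equals("LongValueSum")) {
                long sum = 0;
                for (String v : group.getValue()) {
                    sum += Long.parseLong(v);
                }
                report = Long.toString(sum);
            } else if (type.equals("UniqValueCount")) {
                report = Integer.toString(new HashSet<String>(group.getValue()).size());
            } else {
                throw new IllegalArgumentException("unsupported aggregation type: " + type);
            }
            // Assumes each aggregation id is used with only one type.
            results.put(id, report);
        }
        return results;
    }
}
```

Keying the result by id alone keeps the sketch short; a job that applies several aggregation types to the same id would need to keep the type in the output key as well.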
This class offers a function to generate a map/reduce job using the Abacus
framework. The function takes the following parameters:
input directory spec
input format (text or sequence file)
output directory
a file specifying the user plugin class
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ValueAggregatorJob
public ValueAggregatorJob()
createValueAggregatorJobs
public static JobControl createValueAggregatorJobs(String[] args)
throws IOException
- Throws:
IOException
createValueAggregatorJob
public static JobConf createValueAggregatorJob(String[] args)
throws IOException
- Create an Abacus-based map/reduce job.
- Parameters:
args
- the arguments used for job creation
- Returns:
- a JobConf object ready for submission.
- Throws:
IOException
runJob
public static boolean runJob(JobConf job)
throws IOException
- Submit/run a map/reduce job.
- Parameters:
job
- the job configuration to submit and run
- Returns:
- true if the job succeeded
- Throws:
IOException
main
public static void main(String[] args)
throws IOException
- Create and run an Abacus-based map/reduce job.
- Parameters:
args
- the arguments used for job creation
- Throws:
IOException
Copyright © 2006 The Apache Software Foundation