org.apache.hadoop.mapred.lib
Class FieldSelectionMapReduce

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.FieldSelectionMapReduce
All Implemented Interfaces:
Closeable, JobConfigurable, Mapper, Reducer

public class FieldSelectionMapReduce
extends Object
implements Mapper, Reducer

This class implements a mapper/reducer that performs field selection in a manner similar to the Unix cut command. The input data is treated as fields separated by a user-specified separator (the default value is "\t"). The user can specify a list of fields that form the map output keys and a list of fields that form the map output values. If the input format is TextInputFormat, the mapper ignores the key passed to the map function, and the fields are taken from the value only. Otherwise, the fields are the union of those from the key and those from the value.

The field separator is set with the attribute "mapred.data.field.separator".

The map output field list spec is set with the attribute "map.output.key.value.fields.spec". Its value is expected to have the form "keyFieldsSpec:valueFieldsSpec", where keyFieldsSpec and valueFieldsSpec are each a comma-separated (,) list of field specs: fieldSpec,fieldSpec,fieldSpec ... Each field spec can be a single number (e.g. 5) selecting one field, a range (e.g. 2-5) selecting a range of fields, or an open range (e.g. 3-) selecting all fields starting from field 3. Open ranges apply to value fields only; they have no effect on key fields.

Here is an example: "4,3,0,1:6,5,1-3,7-". It uses fields 4, 3, 0 and 1 for keys, and fields 6, 5, 1, 2, 3, 7 and above for values.

The reduce output field list spec is set with the attribute "reduce.output.key.value.fields.spec". The reducer extracts output key/value pairs in a similar manner, except that the key is never ignored.
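The field spec syntax described above can be illustrated with a small standalone sketch. It does not use Hadoop and is not this class's implementation; the class FieldSpecSketch and its methods are hypothetical. It assumes that field numbers start at 0 (as the example suggests), that fields missing from a record are skipped, and that fields matched by an open range are appended after all other selected fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of the "keyFieldsSpec:valueFieldsSpec" syntax.
class FieldSpecSketch {

    // Expands one comma-separated spec into explicit field indices.
    // Returns the start of an open range such as "7-", or -1 if none is present.
    static int expand(String spec, List<Integer> out) {
        int openStart = -1;
        for (String part : spec.split(",")) {
            if (part.isEmpty()) {
                continue;
            }
            int dash = part.indexOf('-');
            if (dash < 0) {
                // Single field, e.g. "5".
                out.add(Integer.parseInt(part));
            } else if (dash == part.length() - 1) {
                // Open range, e.g. "3-".
                openStart = Integer.parseInt(part.substring(0, dash));
            } else {
                // Closed range, e.g. "2-5" (both ends inclusive).
                int start = Integer.parseInt(part.substring(0, dash));
                int end = Integer.parseInt(part.substring(dash + 1));
                for (int i = start; i <= end; i++) {
                    out.add(i);
                }
            }
        }
        return openStart;
    }

    // Joins the selected fields; indices beyond the record length are skipped (assumption).
    static String pick(String[] fields, List<Integer> indices, int openStart, String sep) {
        List<String> picked = new ArrayList<>();
        for (int i : indices) {
            if (i >= 0 && i < fields.length) {
                picked.add(fields[i]);
            }
        }
        if (openStart >= 0) {
            for (int i = openStart; i < fields.length; i++) {
                picked.add(fields[i]);
            }
        }
        return String.join(sep, picked);
    }

    // Returns {key, value} selected from one record according to the spec.
    static String[] select(String line, String sep, String spec) {
        String[] fields = line.split(Pattern.quote(sep), -1);
        String[] halves = spec.split(":", -1);
        List<Integer> keyIdx = new ArrayList<>();
        List<Integer> valIdx = new ArrayList<>();
        expand(halves[0], keyIdx); // an open range in the key spec is ignored
        int open = halves.length > 1 ? expand(halves[1], valIdx) : -1;
        return new String[] {
            pick(fields, keyIdx, -1, sep),
            pick(fields, valIdx, open, sep)
        };
    }

    public static void main(String[] args) {
        // Nine tab-separated fields, numbered 0 through 8.
        String line = "a\tb\tc\td\te\tf\tg\th\ti";
        String[] kv = select(line, "\t", "4,3,0,1:6,5,1-3,7-");
        System.out.println(kv[0].replace('\t', ' ')); // prints "e d a b"
        System.out.println(kv[1].replace('\t', ' ')); // prints "g f b c d h i"
    }
}
```

With the example spec, the key is built from fields 4, 3, 0 and 1 in that order, and the value from fields 6, 5, 1, 2, 3 followed by field 7 and everything after it.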


Field Summary
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
FieldSelectionMapReduce()
           
 
Method Summary
 void close()
          Called after the last call to any other method on this object to free and/or flush resources.
 void configure(JobConf job)
          Initializes a new instance from a JobConf.
 void map(WritableComparable key, Writable val, OutputCollector output, Reporter reporter)
          Selects the configured key and value fields from the input pair.
 void reduce(WritableComparable key, Iterator values, OutputCollector output, Reporter reporter)
          Combines values for a given key.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

FieldSelectionMapReduce

public FieldSelectionMapReduce()
Method Detail

map

public void map(WritableComparable key,
                Writable val,
                OutputCollector output,
                Reporter reporter)
         throws IOException
Selects fields from the input key/value pair according to the map output field list spec and writes the resulting key/value pair to output.

Specified by:
map in interface Mapper
Parameters:
key - the key
val - the value
output - collects mapped keys and values
Throws:
IOException

configure

public void configure(JobConf job)
Description copied from interface: JobConfigurable
Initializes a new instance from a JobConf.

Specified by:
configure in interface JobConfigurable
Parameters:
job - the configuration
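
The attributes read from the JobConf are the ones named in the class description. A minimal configuration sketch follows, assuming the Hadoop mapred classes are on the classpath; the class FieldSelectionJobSketch and the reduce spec value "0-2:5-" are hypothetical, and the exact JobConf constructors and setters may differ between Hadoop releases.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.FieldSelectionMapReduce;

// Hypothetical helper that prepares a job using FieldSelectionMapReduce
// as both the mapper and the reducer.
class FieldSelectionJobSketch {

    static JobConf buildConf() {
        JobConf job = new JobConf();
        job.setMapperClass(FieldSelectionMapReduce.class);
        job.setReducerClass(FieldSelectionMapReduce.class);

        // Separator between fields; "\t" is the documented default.
        job.set("mapred.data.field.separator", "\t");
        // Map side: keys from fields 4,3,0,1; values from 6,5,1-3 and 7 onward.
        job.set("map.output.key.value.fields.spec", "4,3,0,1:6,5,1-3,7-");
        // Reduce side: illustrative value only, not taken from the documentation.
        job.set("reduce.output.key.value.fields.spec", "0-2:5-");
        return job;
    }
}
```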

close

public void close()
           throws IOException
Description copied from interface: Closeable
Called after the last call to any other method on this object to free and/or flush resources. Typical implementations do nothing.

Specified by:
close in interface Closeable
Throws:
IOException

reduce

public void reduce(WritableComparable key,
                   Iterator values,
                   OutputCollector output,
                   Reporter reporter)
            throws IOException
Description copied from interface: Reducer
Combines values for a given key. Output values must be of the same type as input values. Input keys must not be altered. Typically all values are combined into zero or one value. Output pairs are collected with calls to OutputCollector.collect(WritableComparable,Writable).

Specified by:
reduce in interface Reducer
Parameters:
key - the key
values - the values to combine
output - to collect combined values
Throws:
IOException


Copyright © 2006 The Apache Software Foundation