org.apache.hadoop.mapred
Class SequenceFileInputFilter

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat
      extended by org.apache.hadoop.mapred.SequenceFileInputFormat
          extended by org.apache.hadoop.mapred.SequenceFileInputFilter
All Implemented Interfaces:
InputFormat

public class SequenceFileInputFilter
extends SequenceFileInputFormat

An input format that lets a map/reduce job work on a sample of the records in sequence files. The sample is decided by the filter class set for the job.

Author:
hairong

Nested Class Summary
static interface SequenceFileInputFilter.Filter
          Filter interface.
static class SequenceFileInputFilter.FilterBase
          Base class for filters.
static class SequenceFileInputFilter.MD5Filter
          This class returns a set of records by examining the MD5 digest of each record's key against a filtering frequency f.
static class SequenceFileInputFilter.PercentFilter
          This class returns a percentage of records. The percentage is determined by a filtering frequency f using the criterion record# % f == 0.
static class SequenceFileInputFilter.RegexFilter
          Filters records by matching each key against a regular expression.
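
The three filtering rules summarized above can be illustrated with a small plain-Java sketch. It does not use the Hadoop classes: `FilterRuleSketch` and its method names are hypothetical. The percent rule follows the stated criterion record# % f == 0. The regex rule assumes the whole key must match the pattern. The MD5 rule assumes the first 8 bytes of the digest are folded into a long that must be divisible by f; this page does not specify the exact comparison, so treat that part as a guess.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

// Plain-Java illustration of the filtering rules; NOT the Hadoop filter classes.
final class FilterRuleSketch {

    // Percent rule from the summary: keep a record when record# % f == 0.
    static boolean percentAccepts(long recordNumber, int frequency) {
        requirePositive(frequency);
        return recordNumber % frequency == 0;
    }

    // Regex rule. Assumption: the entire key must match the pattern.
    static boolean regexAccepts(String key, Pattern pattern) {
        return pattern.matcher(key).matches();
    }

    // MD5 rule. Assumption: keep a record when the 64-bit prefix of the
    // key's MD5 digest is divisible by the frequency.
    static boolean md5Accepts(String key, int frequency) {
        requirePositive(frequency);
        return md5Prefix(key) % frequency == 0;
    }

    // Folds the first 8 bytes of the MD5 digest into a long (big-endian).
    static long md5Prefix(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xffL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be supported by every Java platform implementation.
            throw new IllegalStateException("MD5 digest unavailable", e);
        }
    }

    private static void requirePositive(int frequency) {
        if (frequency <= 0) {
            throw new IllegalArgumentException("frequency must be positive");
        }
    }
}
```

With f = 3, the percent rule keeps record numbers 0, 3, 6 and 9 out of 0 through 9. The MD5 rule depends only on the key, so under the stated assumption the same key is always kept or always dropped, regardless of where it appears in the file.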
 
Field Summary
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
SequenceFileInputFilter()
           
 
Method Summary
 RecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter)
          Creates a record reader for the given split.
static void setFilterClass(Configuration conf, Class filterClass)
          Sets the filter class that decides the sample.
 
Methods inherited from class org.apache.hadoop.mapred.SequenceFileInputFormat
listPaths
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
getSplits, isSplitable, setMinSplitSize, validateInput
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SequenceFileInputFilter

public SequenceFileInputFilter()
Method Detail

getRecordReader

public RecordReader getRecordReader(InputSplit split,
                                    JobConf job,
                                    Reporter reporter)
                             throws IOException
Creates a record reader for the given split.

Specified by:
getRecordReader in interface InputFormat
Overrides:
getRecordReader in class SequenceFileInputFormat
Parameters:
split - file split
job - job configuration
reporter - reporter that sends reports to the task tracker
Returns:
a RecordReader for the given split
Throws:
IOException
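
As a rough illustration of what a filtering record reader does, the following plain-Java sketch wraps a source of keys and skips every key the filter rejects. It is an assumption about the general technique, not the reader this method returns: `FilteringReaderSketch` is a hypothetical name, and `java.util.Iterator` and `java.util.function.Predicate` stand in for the Hadoop `RecordReader` and `Filter` types.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical stand-in for a record reader that only yields accepted keys.
final class FilteringReaderSketch<K> implements Iterator<K> {
    private final Iterator<K> source;   // underlying unfiltered records
    private final Predicate<K> filter;  // decides which keys are in the sample
    private K nextKey;
    private boolean hasNextKey;

    FilteringReaderSketch(Iterator<K> source, Predicate<K> filter) {
        this.source = source;
        this.filter = filter;
        advance();
    }

    // Reads ahead until an accepted key is found or the source is exhausted.
    private void advance() {
        hasNextKey = false;
        while (source.hasNext()) {
            K candidate = source.next();
            if (filter.test(candidate)) {
                nextKey = candidate;
                hasNextKey = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNextKey;
    }

    @Override
    public K next() {
        if (!hasNextKey) {
            throw new NoSuchElementException();
        }
        K result = nextKey;
        advance();
        return result;
    }

    // Convenience helper: drains the filtered view into a list.
    static <K> List<K> readAll(Iterator<K> source, Predicate<K> filter) {
        List<K> out = new ArrayList<>();
        new FilteringReaderSketch<>(source, filter).forEachRemaining(out::add);
        return out;
    }
}
```

The consumer of such a reader never sees the rejected records, which is how a job ends up working on a sample without any change to its map function.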

setFilterClass

public static void setFilterClass(Configuration conf,
                                  Class filterClass)
Sets the filter class that decides the sample.

Parameters:
conf - application configuration
filterClass - filter class
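
One common way a static setter like this can work is to record the class name in the configuration and instantiate the class reflectively when records are read. The sketch below shows that pattern in plain Java under loud assumptions: a `Map<String, String>` stands in for `Configuration`, and the key `sketch.filter.class`, the `KeyFilter` interface, and the `AcceptAll` class are all hypothetical. This page does not say how the real method stores the class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "store the class name, instantiate later".
final class FilterClassConfigSketch {
    // Hypothetical configuration key; not taken from Hadoop.
    static final String FILTER_CLASS_KEY = "sketch.filter.class";

    // Stand-in for a filter interface that accepts or rejects keys.
    interface KeyFilter {
        boolean accept(Object key);
    }

    // Trivial filter used for the demonstration.
    static final class AcceptAll implements KeyFilter {
        @Override
        public boolean accept(Object key) {
            return true;
        }
    }

    // Records the filter class by name so it survives as plain text.
    static void setFilterClass(Map<String, String> conf,
                               Class<? extends KeyFilter> filterClass) {
        conf.put(FILTER_CLASS_KEY, filterClass.getName());
    }

    // Looks the class up again and creates an instance via its no-arg constructor.
    static KeyFilter newFilter(Map<String, String> conf) {
        String name = conf.get(FILTER_CLASS_KEY);
        if (name == null) {
            throw new IllegalStateException("no filter class configured");
        }
        try {
            return Class.forName(name)
                    .asSubclass(KeyFilter.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create filter " + name, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setFilterClass(conf, AcceptAll.class);
        System.out.println(newFilter(conf).accept("some-key"));
    }
}
```

Storing a name rather than an instance keeps the configuration serializable as text, which matters when a job description is shipped to other machines.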


Copyright © 2006 The Apache Software Foundation