org.apache.hadoop.mapred
Class FileInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat
All Implemented Interfaces:
InputFormat
Direct Known Subclasses:
InputFormatBase, SequenceFileInputFormat, TextInputFormat

public abstract class FileInputFormat
extends Object
implements InputFormat

A base class for file-based InputFormat implementations.


Field Summary
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
FileInputFormat()
           
 
Method Summary
abstract  RecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter)
          Construct a RecordReader for a FileSplit.
 InputSplit[] getSplits(JobConf job, int numSplits)
          Splits the files returned by listPaths(JobConf) into InputSplits, dividing files that are too big into several splits.
protected  boolean isSplitable(FileSystem fs, Path filename)
          Can the given file be split? Usually yes, but not if the file is stream compressed.
protected  Path[] listPaths(JobConf job)
          List input directories.
protected  void setMinSplitSize(long minSplitSize)
           
 void validateInput(JobConf job)
          Checks whether the input directories are valid. The framework calls this method when a job is submitted so that it can fail early, with a useful error message, if an input directory does not exist.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

FileInputFormat

public FileInputFormat()
Method Detail

setMinSplitSize

protected void setMinSplitSize(long minSplitSize)

isSplitable

protected boolean isSplitable(FileSystem fs,
                              Path filename)
Can the given file be split? Usually yes, but not if the file is stream compressed.

Parameters:
fs - the file system that the file is on
filename - the file name to check
Returns:
true if the file can be split, false otherwise
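The stream-compression rule can be sketched in plain Java. In this hypothetical example, the class SplitabilityCheck and its suffix list are assumptions made for illustration: it judges by file name alone, whereas a real subclass would decide based on how the file is actually encoded. Plain strings stand in for Path objects so the sketch runs without Hadoop.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for isSplitable(FileSystem, Path): decides by file-name suffix only. */
final class SplitabilityCheck {
    // Assumed list of stream-compressed formats; not taken from Hadoop.
    private static final List<String> STREAM_COMPRESSED_SUFFIXES = List.of(".gz", ".z", ".deflate");

    private SplitabilityCheck() {}

    /** Returns false for names that look stream compressed, true otherwise. */
    static boolean isSplitable(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        for (String suffix : STREAM_COMPRESSED_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                // A stream-compressed file can only be decoded from its first byte.
                return false;
            }
        }
        // Uncompressed files can be cut at arbitrary byte offsets.
        return true;
    }
}
```

For example, SplitabilityCheck.isSplitable("logs/part-00000.gz") returns false, while the same name ending in ".txt" returns true.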

getRecordReader

public abstract RecordReader getRecordReader(InputSplit split,
                                             JobConf job,
                                             Reporter reporter)
                                      throws IOException
Description copied from interface: InputFormat
Construct a RecordReader for a FileSplit.

Specified by:
getRecordReader in interface InputFormat
Parameters:
split - the InputSplit
job - the job that this split belongs to
reporter - facility to report progress
Returns:
a RecordReader
Throws:
IOException
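A RecordReader turns the bytes of one split into records. The following is a minimal, hypothetical sketch of a line-oriented reader over an in-memory byte array. LineSplitReader is not a Hadoop class, and its boundary rule (a line belongs to the split that contains its first byte) is an assumption chosen so that adjacent splits neither drop nor duplicate lines.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical line reader over one byte-range split of an in-memory "file".
 * Assumed rule: a line belongs to the split that contains its first byte.
 */
final class LineSplitReader {
    private final byte[] data;
    private final long end; // exclusive end offset of the split
    private int pos;        // offset of the next line to return

    LineSplitReader(byte[] data, long start, long length) {
        this.data = data;
        this.end = Math.min(start + length, data.length);
        int p = (int) Math.min(start, data.length);
        // Unless the split begins at a line boundary, the partial first line
        // belongs to the previous split, so skip past the next newline.
        if (p > 0 && data[p - 1] != '\n') {
            while (p < data.length && data[p] != '\n') {
                p++;
            }
            p++; // step over the newline itself
        }
        this.pos = p;
    }

    /** Returns the next line without its newline, or null when the split is exhausted. */
    String next() {
        if (pos >= end) {
            return null; // lines starting at or after `end` belong to the next split
        }
        int lineStart = pos;
        while (pos < data.length && data[pos] != '\n') {
            pos++; // the last line may run past `end`; it is still read in full
        }
        String line = new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8);
        pos++; // skip the newline (or move past end of data)
        return line;
    }

    /** Drains the reader into a list. */
    List<String> readAll() {
        List<String> lines = new ArrayList<>();
        for (String line = next(); line != null; line = next()) {
            lines.add(line);
        }
        return lines;
    }
}
```

Reading every split of a file in order and concatenating the results yields each line exactly once, because each reader skips a leading partial line and finishes a trailing one past its end offset.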

listPaths

protected Path[] listPaths(JobConf job)
                    throws IOException
List input directories. Subclasses may override this method to, for example, select only files matching a regular expression.

Parameters:
job - the job to list input paths for
Returns:
array of Path objects
Throws:
IOException - if zero input paths are found.
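An override that keeps only matching files might look like the sketch below. It is a hypothetical stand-alone example: RegexPathFilter is not part of Hadoop, paths are plain strings rather than Path objects, and the pattern is matched against the final path component only. Like the documented method, it throws IOException when nothing is left to list.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of a listPaths override that filters by file name. */
final class RegexPathFilter {
    private final Pattern namePattern;

    RegexPathFilter(String regex) {
        this.namePattern = Pattern.compile(regex);
    }

    /** Returns the candidates whose final path component fully matches the pattern. */
    List<String> filter(List<String> candidates) throws IOException {
        List<String> kept = new ArrayList<>();
        for (String path : candidates) {
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (namePattern.matcher(name).matches()) {
                kept.add(path);
            }
        }
        if (kept.isEmpty()) {
            // An empty listing is treated as an error, as in the documented contract.
            throw new IOException("No input paths match " + namePattern.pattern());
        }
        return kept;
    }
}
```

With the pattern "part-\\d+", a listing of /in/part-00000, /in/_logs and /in/part-00001 keeps only the two part files.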

validateInput

public void validateInput(JobConf job)
                   throws IOException
Description copied from interface: InputFormat
Checks whether the input directories are valid. The framework calls this method when a job is submitted so that it can fail early, with a useful error message, if an input directory does not exist.

Specified by:
validateInput in interface InputFormat
Parameters:
job - the job to check
Throws:
InvalidInputException - if the job does not have valid input
IOException
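The early-failure idea can be illustrated against the local file system. This hypothetical sketch uses java.nio.file.Path rather than Hadoop's Path and throws a plain IOException instead of InvalidInputException; the class name InputValidator is an assumption. It collects every missing directory so that one error message names all of them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical local-file-system analogue of validateInput. */
final class InputValidator {
    private InputValidator() {}

    static void validateInput(List<Path> inputDirs) throws IOException {
        if (inputDirs.isEmpty()) {
            throw new IOException("No input directories specified");
        }
        List<String> problems = new ArrayList<>();
        for (Path dir : inputDirs) {
            if (!Files.isDirectory(dir)) {
                problems.add("Input directory " + dir + " does not exist");
            }
        }
        if (!problems.isEmpty()) {
            // Failing here, before any work is scheduled, is the point of early validation.
            throw new IOException(String.join("; ", problems));
        }
    }
}
```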

getSplits

public InputSplit[] getSplits(JobConf job,
                              int numSplits)
                       throws IOException
Splits the files returned by listPaths(JobConf) into InputSplits, dividing files that are too big into several splits.

Specified by:
getSplits in interface InputFormat
Parameters:
job - the job whose input files are to be split
numSplits - the desired number of splits
Returns:
the splits
Throws:
IOException
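How split boundaries might be computed can be sketched as follows. This is a hypothetical model, not this class's implementation: SplitPlanner and its nested types are illustrative names, the size rule max(minSplitSize, min(totalSize / numSplits, blockSize)) is an assumption (it resembles the rule used in later Hadoop releases), and a file that is not splitable becomes a single split covering the whole file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of getSplits: plans (file, start, length) byte ranges. */
final class SplitPlanner {
    /** Simplified description of an input file (an assumption of this sketch). */
    static final class InputFile {
        final String name;
        final long length;
        final boolean splitable;

        InputFile(String name, long length, boolean splitable) {
            this.name = name;
            this.length = length;
            this.splitable = splitable;
        }
    }

    /** One planned split: a byte range within a single file. */
    static final class Split {
        final String file;
        final long start;
        final long length;

        Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }
    }

    private SplitPlanner() {}

    static List<Split> getSplits(List<InputFile> files, int numSplits, long blockSize, long minSplitSize) {
        long totalSize = 0;
        for (InputFile f : files) {
            totalSize += f.length;
        }
        // Aim for roughly numSplits splits across all input.
        long goalSize = totalSize / Math.max(1, numSplits);
        // Assumed rule: no larger than a block, no smaller than the minimum split size.
        long splitSize = Math.max(1, Math.max(minSplitSize, Math.min(goalSize, blockSize)));

        List<Split> splits = new ArrayList<>();
        for (InputFile f : files) {
            if (!f.splitable) {
                // e.g. a stream-compressed file: one split covering the whole file
                splits.add(new Split(f.name, 0, f.length));
                continue;
            }
            // Cut the file into splitSize chunks; the last chunk may be shorter.
            // An empty splitable file produces no split in this sketch.
            for (long start = 0; start < f.length; start += splitSize) {
                splits.add(new Split(f.name, start, Math.min(splitSize, f.length - start)));
            }
        }
        return splits;
    }
}
```

With a 250-byte splitable file, a 100-byte non-splitable file, numSplits = 4 and a 100-byte block size, the goal size is 350 / 4 = 87 bytes, so the first file yields splits of 87, 87 and 76 bytes and the second stays whole. A larger minSplitSize (as set through setMinSplitSize) raises the split size and so lowers the number of splits.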


Copyright © 2006 The Apache Software Foundation