org.apache.hadoop.mapred
Interface InputFormat

All Known Implementing Classes:
FileInputFormat, InputFormatBase, KeyValueTextInputFormat, SequenceFileAsTextInputFormat, SequenceFileInputFilter, SequenceFileInputFormat, StreamInputFormat, TextInputFormat

public interface InputFormat

An input data format. Input files are stored in a FileSystem, and the processing of a single input file may be split across multiple machines. Each file is processed as a sequence of records read through a RecordReader, so files must be split on record boundaries.
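The record-boundary requirement can be illustrated with a minimal sketch. It assumes newline-delimited records held in a byte array; `LineSplitSketch` and `readSplit` are hypothetical names and are not part of Hadoop. One common convention, assumed here, is that a split owns every record that starts inside its byte range: a reader skips the partial record at the start of its range and reads past the end of its range to finish the last record.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a Hadoop class.
final class LineSplitSketch {

    /** Returns the newline-delimited records that start inside [start, start + length). */
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = Math.min(data.length, start + length);
        int pos = start;
        if (start > 0) {
            // Back up one byte, then scan to the next newline. If the split begins
            // exactly at a record start, the byte before it is a newline and the
            // record is kept; otherwise the partial record is left to the previous split.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // first byte after the newline
        }
        List<String> records = new ArrayList<>();
        // A record belongs to this split if it starts before 'end';
        // reading may continue past 'end' to complete that record.
        while (pos < end) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        // The byte ranges cut "beta" in the middle, yet each record is read exactly once.
        System.out.println(readSplit(data, 0, 8));
        System.out.println(readSplit(data, 8, 9));
    }
}
```

With this convention the split ranges themselves may be arbitrary byte offsets; the reader is what aligns them to record boundaries.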


Method Summary
 RecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter)
          Construct a RecordReader for the given InputSplit.
 InputSplit[] getSplits(JobConf job, int numSplits)
          Splits a set of input files.
 void validateInput(JobConf job)
          Are the input directories valid? This method is used to test the input directories when a job is submitted so that the framework can fail early with a useful error message when the input directory does not exist.
 

Method Detail

validateInput

void validateInput(JobConf job)
                   throws IOException
Are the input directories valid? This method is used to test the input directories when a job is submitted so that the framework can fail early with a useful error message when the input directory does not exist.

Parameters:
job - the job to check
Throws:
InvalidInputException - if the job does not have valid input
IOException
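
The fail-early check described above might look like the following sketch. It uses the local file system through `java.nio.file` instead of a Hadoop FileSystem, and `ValidateInputSketch` and `validate` are hypothetical names; an actual implementation would read the input directories from the JobConf.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical illustration only; not a Hadoop class.
final class ValidateInputSketch {

    /** Throws an IOException naming the first input directory that is missing. */
    static void validate(List<Path> inputDirs) throws IOException {
        if (inputDirs.isEmpty()) {
            throw new IOException("No input directories specified");
        }
        for (Path dir : inputDirs) {
            if (!Files.isDirectory(dir)) {
                // Name the offending path so the error is useful at submission time.
                throw new IOException("Input directory does not exist: " + dir);
            }
        }
    }
}
```

Running such a check at submission time reports a missing directory before any task is scheduled.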

getSplits

InputSplit[] getSplits(JobConf job,
                       int numSplits)
                       throws IOException
Splits a set of input files. One split is created per map task.

Parameters:
job - the job whose input files are to be split
numSplits - the desired number of splits
Returns:
the splits
Throws:
IOException
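
As a rough sketch of how a desired split count could be turned into splits, the example below divides the total input size by numSplits to get a target split size, then cuts each file into byte ranges of at most that size. `SplitPlanSketch`, `Range`, and `plan` are hypothetical names, and this sizing rule is an assumption for illustration, not necessarily what any implementing class does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a Hadoop class.
final class SplitPlanSketch {

    /** Stand-in for a split: a byte range of one file. */
    static final class Range {
        final String path;
        final long start;
        final long length;

        Range(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Cuts each file into ranges of at most (total size / numSplits) bytes. */
    static List<Range> plan(String[] paths, long[] lengths, int numSplits) {
        long total = 0;
        for (long len : lengths) {
            total += len;
        }
        // Guard against a zero divisor and a zero step size.
        long goal = Math.max(1, total / Math.max(1, numSplits));
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < paths.length; i++) {
            // A range never spans two files, so the last range of a file may be short.
            for (long off = 0; off < lengths[i]; off += goal) {
                ranges.add(new Range(paths[i], off, Math.min(goal, lengths[i] - off)));
            }
        }
        return ranges;
    }
}
```

Because ranges do not cross file boundaries, the number of ranges produced can exceed numSplits; under this sketch the parameter acts as a hint and not as a guarantee.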

getRecordReader

RecordReader getRecordReader(InputSplit split,
                             JobConf job,
                             Reporter reporter)
                             throws IOException
Construct a RecordReader for the given InputSplit.

Parameters:
split - the InputSplit
job - the job that this split belongs to
reporter - facility for reporting progress
Returns:
a RecordReader
Throws:
IOException


Copyright © 2006 The Apache Software Foundation