org.apache.hadoop.util
Class CopyFiles.CopyFilesMapper

java.lang.Object
  extended by org.apache.hadoop.mapred.MapReduceBase
      extended by org.apache.hadoop.util.CopyFiles.CopyFilesMapper
All Implemented Interfaces:
Closeable, JobConfigurable
Direct Known Subclasses:
CopyFiles.FSCopyFilesMapper, CopyFiles.HTTPCopyFilesMapper
Enclosing class:
CopyFiles

public abstract static class CopyFiles.CopyFilesMapper
extends MapReduceBase

Base class for all distcp mappers.

Author:
Arun C Murthy

Constructor Summary
CopyFiles.CopyFilesMapper()
           
 
Method Summary
abstract  void cleanup(Configuration conf, JobConf jobConf, String srcPath, String destPath)
          Interface to clean up distcp-specific resources.
 int getMapCount(int initialEstimate, long totalBytes, JobClient client)
          Calculate how many maps to run.
static Path makeRelative(Path root, Path absPath)
          Make a path relative with respect to a root path.
abstract  void setup(Configuration conf, JobConf jobConf, String[] srcPaths, String destPath, boolean ignoreReadFailures)
          Interface to initialize distcp-specific map tasks.
 
Methods inherited from class org.apache.hadoop.mapred.MapReduceBase
close, configure
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CopyFiles.CopyFilesMapper

public CopyFiles.CopyFilesMapper()
Method Detail

setup

public abstract void setup(Configuration conf,
                           JobConf jobConf,
                           String[] srcPaths,
                           String destPath,
                           boolean ignoreReadFailures)
                    throws IOException
Interface to initialize distcp-specific map tasks.

Parameters:
conf - The dfs/mapred configuration.
jobConf - The handle to the jobConf object to be initialized.
srcPaths - The source paths.
destPath - The destination path.
ignoreReadFailures - Whether to ignore read failures.
Throws:
IOException

cleanup

public abstract void cleanup(Configuration conf,
                             JobConf jobConf,
                             String srcPath,
                             String destPath)
                      throws IOException
Interface to clean up distcp-specific resources.

Parameters:
conf - The dfs/mapred configuration.
jobConf - The handle to the jobConf object.
srcPath - The source URI.
destPath - The destination URI.
Throws:
IOException

makeRelative

public static Path makeRelative(Path root,
                                Path absPath)
Make a path relative with respect to a root path. absPath is assumed to descend from root; if it does not, the returned path is null.
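
The contract described above can be illustrated with a minimal sketch. It is not the Hadoop implementation: it works on plain slash-separated strings rather than Path objects, the class name MakeRelativeSketch and the helper components are hypothetical, and returning an empty string when both paths are equal is an assumption of the sketch, since the documentation does not cover that case.

```java
import java.util.ArrayList;
import java.util.List;

class MakeRelativeSketch {

    // Split a slash-separated path into its non-empty components.
    public static List<String> components(String path) {
        List<String> parts = new ArrayList<>();
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    // Return absPath relative to root, or null if absPath does not
    // descend from root. Assumption: equal paths yield "".
    public static String makeRelative(String root, String absPath) {
        List<String> rootParts = components(root);
        List<String> pathParts = components(absPath);
        if (pathParts.size() < rootParts.size()) {
            return null;
        }
        // Every component of root must match the leading components of absPath.
        for (int i = 0; i < rootParts.size(); i++) {
            if (!rootParts.get(i).equals(pathParts.get(i))) {
                return null;
            }
        }
        // Whatever remains after the shared prefix is the relative path.
        return String.join("/", pathParts.subList(rootParts.size(), pathParts.size()));
    }

    public static void main(String[] args) {
        System.out.println(makeRelative("/user/data", "/user/data/logs/a.txt")); // logs/a.txt
        System.out.println(makeRelative("/user/data", "/tmp/a.txt"));            // null
    }
}
```

Comparing whole components rather than raw string prefixes avoids treating /user/database as a descendant of /user/data.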


getMapCount

public int getMapCount(int initialEstimate,
                       long totalBytes,
                       JobClient client)
                throws IOException
Calculate how many maps to run. Ideally there would be one map per file, if the map-launching overhead were zero. The count is limited by the jobtracker's handling capacity, say MAX_NUM_MAPS, and by MAX_MAPS_PER_NODE. For small files, it is better to determine the number of maps from the amount of data per map.

Parameters:
initialEstimate - Initial guess at number of maps (e.g. count of files).
totalBytes - Total number of bytes for the job (pass -1 if not known).
client -
Returns:
Count of maps to run.
Throws:
IOException
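
The heuristic described for getMapCount might look roughly like the sketch below. It is an illustration only, not the actual method body: the constant values are hypothetical (the documentation names MAX_NUM_MAPS and MAX_MAPS_PER_NODE but gives no values), BYTES_PER_MAP is an invented name for the per-map data target, and the node count is passed in directly on the assumption that the real method obtains it through the JobClient.

```java
class MapCountSketch {

    // Hypothetical values; the documentation names these limits only.
    static final int MAX_NUM_MAPS = 10_000;
    static final int MAX_MAPS_PER_NODE = 10;
    // Hypothetical name and value: target amount of data per map.
    static final long BYTES_PER_MAP = 256L * 1024 * 1024;

    // nodeCount stands in for whatever the real method learns from JobClient.
    public static int getMapCount(int initialEstimate, long totalBytes, int nodeCount) {
        int maps = initialEstimate;

        // When the data size is known, do not run more maps than the data justifies.
        if (totalBytes >= 0) {
            long byBytes = Math.max(1L, totalBytes / BYTES_PER_MAP);
            maps = (int) Math.min((long) maps, byBytes);
        }

        // Cap by the jobtracker's capacity and by the per-node limit.
        maps = Math.min(maps, MAX_NUM_MAPS);
        maps = Math.min(maps, nodeCount * MAX_MAPS_PER_NODE);

        // Always run at least one map.
        return Math.max(maps, 1);
    }

    public static void main(String[] args) {
        // 1000 files, unknown size, 5 nodes: capped at 5 * 10 maps.
        System.out.println(getMapCount(1000, -1, 5)); // 50
        // 1000 small files totalling 512 MB: two maps' worth of data.
        System.out.println(getMapCount(1000, 512L * 1024 * 1024, 5)); // 2
    }
}
```

Each limit is applied as a separate minimum, so whichever constraint is tightest determines the result.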


Copyright © 2006 The Apache Software Foundation