org.apache.hadoop.filecache
Class DistributedCache

java.lang.Object
  extended by org.apache.hadoop.filecache.DistributedCache

public class DistributedCache
extends Object

The DistributedCache maintains all the caching information for cached files and archives, unarchives the archives, and returns the local path of each localized cache.

Author:
Mahadev Konar

Constructor Summary
DistributedCache()
           
 
Method Summary
static void addArchiveToClassPath(Path archive, Configuration conf)
          Add an archive path to the current set of classpath entries.
static void addCacheArchive(URI uri, Configuration conf)
          Add an archive to be localized to the conf.
static void addCacheFile(URI uri, Configuration conf)
          Add a file to be localized to the conf
static void addFileToClassPath(Path file, Configuration conf)
          Add a file path to the current set of classpath entries. It adds the file to the cache as well.
static boolean checkURIs(URI[] uriFiles, URI[] uriArchives)
          This method checks if there is a conflict in the fragment names of the uris.
static void createAllSymlink(Configuration conf, File jobCacheDir, File workDir)
          This method creates symlinks in one directory for all files in a given directory.
static byte[] createMD5(URI cache, Configuration conf)
          Returns the md5 of the checksum file for a given DFS file.
static void createSymlink(Configuration conf)
          This method allows you to create symlinks in the current working directory of the task to all the cache files/archives
static Path[] getArchiveClassPaths(Configuration conf)
          Get the archive entries in classpath as an array of Path
static String[] getArchiveMd5(Configuration conf)
          Get the md5 checksums of the archives
static URI[] getCacheArchives(Configuration conf)
          Get cache archives set in the Configuration
static URI[] getCacheFiles(Configuration conf)
          Get cache files set in the Configuration
static Path[] getFileClassPaths(Configuration conf)
          Get the file entries in classpath as an array of Path
static String[] getFileMd5(Configuration conf)
          Get the md5 checksums of the files
static Path getLocalCache(URI cache, Configuration conf, Path baseDir, boolean isArchive, String md5, Path currentWorkDir)
          Localize a cache file or archive and return the path to the local copy.
static Path[] getLocalCacheArchives(Configuration conf)
          Return the path array of the localized caches
static Path[] getLocalCacheFiles(Configuration conf)
          Return the path array of the localized files
static boolean getSymlink(Configuration conf)
          This method checks whether symlinks are to be created for the localized cache files in the current working directory.
static void releaseCache(URI cache, Configuration conf)
          This is the opposite of getLocalCache.
static void setArchiveMd5(Configuration conf, String md5)
          Set the md5 checksums used to verify the archives to be localized.
static void setCacheArchives(URI[] archives, Configuration conf)
          Set the configuration with the given set of archives
static void setCacheFiles(URI[] files, Configuration conf)
          Set the configuration with the given set of files
static void setFileMd5(Configuration conf, String md5)
          Set the md5 checksums used to verify the files to be localized.
static void setLocalArchives(Configuration conf, String str)
          Set the conf to contain the location for localized archives
static void setLocalFiles(Configuration conf, String str)
          Set the conf to contain the location for localized files
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DistributedCache

public DistributedCache()
Method Detail

getLocalCache

public static Path getLocalCache(URI cache,
                                 Configuration conf,
                                 Path baseDir,
                                 boolean isArchive,
                                 String md5,
                                 Path currentWorkDir)
                          throws IOException
Parameters:
cache - the cache to be localized. This should be specified as new URI("hdfs://hostname:port/absolute_path_to_file#LINKNAME"). If no scheme or hostname:port is provided, the file is assumed to be in the filesystem being used in the Configuration.
conf - The Configuration which contains the filesystem
baseDir - The base cache directory where you want to localize the files/archives
isArchive - whether the cache is an archive or a file. If it is an archive with a .zip or .jar extension, it will be unzipped/unjarred automatically and the directory where the archive is unjarred is returned as the Path. In the case of a file, the path to the file is returned.
md5 - a checksum to verify that you are using the right cache. You need to pass the md5 of the crc file in DFS. This is matched against the one calculated by this API, and if it does not match, the cache is not localized.
currentWorkDir - this is the directory where you would want to create symlinks for the locally cached files/archives
Returns:
for an archive, the path to the directory where the archive is unjarred; for a file, the path to the local copy of the file
Throws:
IOException
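The cache parameter packs the filesystem, the file path and the symlink name into one URI. The sketch below uses only java.net.URI to show how such a URI breaks down and when an archive would be unpacked. The class CacheUriParts, the defaultFs argument and the host names are hypothetical; this is an illustration of the documented URI form, not how getLocalCache is implemented.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper (not part of Hadoop) showing how a cache URI of the form
 *  hdfs://hostname:port/absolute_path_to_file#LINKNAME breaks down. */
class CacheUriParts {
    final String fileSystem; // scheme://authority, or the default filesystem if either is missing
    final String path;       // absolute path of the cached file or archive
    final String linkName;   // URI fragment, used as the symlink name; null if absent
    final boolean unpack;    // true only for archives with a .zip or .jar extension

    CacheUriParts(URI cache, String defaultFs, boolean isArchive) {
        if (cache.getScheme() == null || cache.getAuthority() == null) {
            // No scheme or hostname:port given: fall back to the configured filesystem.
            fileSystem = defaultFs;
        } else {
            fileSystem = cache.getScheme() + "://" + cache.getAuthority();
        }
        path = cache.getPath();
        linkName = cache.getFragment();
        String lower = path.toLowerCase(Locale.ROOT);
        unpack = isArchive && (lower.endsWith(".zip") || lower.endsWith(".jar"));
    }

    public static void main(String[] args) {
        CacheUriParts p = new CacheUriParts(
            URI.create("hdfs://namenode:9000/user/alice/lookup.zip#lookup"),
            "hdfs://default:9000", true);
        // prints: hdfs://namenode:9000 /user/alice/lookup.zip lookup true
        System.out.println(p.fileSystem + " " + p.path + " " + p.linkName + " " + p.unpack);
    }
}
```

A URI such as /user/alice/dict.txt#dict, with no scheme or authority, resolves to the default filesystem in this sketch.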

releaseCache

public static void releaseCache(URI cache,
                                Configuration conf)
                         throws IOException
This is the opposite of getLocalCache. When you are done using the cache, you need to release it.

Parameters:
cache - The cache URI to be released
conf - configuration which contains the filesystem the cache is contained in.
Throws:
IOException
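releaseCache is described only as the opposite of getLocalCache. One common way to pair such acquire/release calls is reference counting, sketched below with a plain HashMap. This is an assumption for illustration: the class CacheRefCounts and its methods are hypothetical and do not describe the actual bookkeeping inside DistributedCache.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reference-counting registry (not Hadoop's implementation) that
 *  pairs every localization with exactly one release. */
class CacheRefCounts {
    private final Map<String, Integer> refs = new HashMap<>();

    /** Record one more user of a localized cache, as a getLocalCache call would. */
    synchronized void acquire(String cacheUri) {
        refs.merge(cacheUri, 1, Integer::sum);
    }

    /** Record that one user is done with the cache, as a releaseCache call would. */
    synchronized void release(String cacheUri) {
        Integer count = refs.get(cacheUri);
        if (count == null || count == 0) {
            throw new IllegalStateException("release without matching acquire: " + cacheUri);
        }
        refs.put(cacheUri, count - 1);
    }

    /** The local copy is safe to delete only when no user still holds it. */
    synchronized boolean isDeletable(String cacheUri) {
        Integer count = refs.get(cacheUri);
        return count != null && count == 0;
    }

    public static void main(String[] args) {
        CacheRefCounts counts = new CacheRefCounts();
        String uri = "hdfs://namenode:9000/user/alice/lookup.zip";
        counts.acquire(uri);
        counts.acquire(uri);
        counts.release(uri);
        System.out.println(counts.isDeletable(uri)); // false: one user remains
        counts.release(uri);
        System.out.println(counts.isDeletable(uri)); // true
    }
}
```

Failing loudly on an unmatched release keeps a double release from silently freeing a cache another task still uses.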

createMD5

public static byte[] createMD5(URI cache,
                               Configuration conf)
                        throws IOException
Returns the md5 of the checksum file for a given DFS file. This method also creates a file named filename_md5, whose existence signifies that a new cache has been loaded into DFS. So if you want to refresh the cache, you need to delete this md5 file as well.

Parameters:
cache - The cache to get the md5 checksum for
conf - configuration
Returns:
md5 of the crc of the cache parameter
Throws:
IOException
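createMD5 returns an md5 digest as a byte array. The sketch below computes the same kind of digest with the JDK's java.security.MessageDigest. It reads a local file rather than the .crc file in DFS, and the class Md5Sketch and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: md5 of a byte array or a local file via MessageDigest. */
class Md5Sketch {
    static byte[] md5Of(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    static byte[] md5OfFile(Path file) throws IOException {
        return md5Of(Files.readAllBytes(file));
    }

    /** Lower-case hex encoding of a digest. */
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("part-00000", ".crc");
        Files.write(tmp, "abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(toHex(md5OfFile(tmp))); // 900150983cd24fb0d6963f7d28e17f72
        Files.delete(tmp);
    }
}
```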

createAllSymlink

public static void createAllSymlink(Configuration conf,
                                    File jobCacheDir,
                                    File workDir)
                             throws IOException
This method creates symlinks in one directory for all files in a given directory.

Parameters:
conf - the configuration
jobCacheDir - the target directory for creating symlinks
workDir - the directory in which the symlinks are created
Throws:
IOException
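The effect described for createAllSymlink, one symlink in workDir for each entry of jobCacheDir, can be sketched with java.nio.file. The class SymlinkAll is hypothetical, it takes java.nio.file.Path instead of java.io.File, and skipping names that already exist in workDir is a choice made for this sketch rather than documented behavior.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

/** Hypothetical sketch: jobCacheDir holds the link targets, workDir receives the links. */
class SymlinkAll {
    static int linkAll(Path jobCacheDir, Path workDir) throws IOException {
        int created = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(jobCacheDir)) {
            for (Path target : entries) {
                Path link = workDir.resolve(target.getFileName());
                if (Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
                    continue; // do not overwrite anything already in the work dir
                }
                Files.createSymbolicLink(link, target.toAbsolutePath());
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        Path cache = Files.createTempDirectory("jobcache");
        Path work = Files.createTempDirectory("work");
        Files.createFile(cache.resolve("dict.txt"));
        System.out.println(linkAll(cache, work)); // 1
    }
}
```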

setCacheArchives

public static void setCacheArchives(URI[] archives,
                                    Configuration conf)
Set the configuration with the given set of archives

Parameters:
archives - The list of archives that need to be localized
conf - Configuration which will be changed

setCacheFiles

public static void setCacheFiles(URI[] files,
                                 Configuration conf)
Set the configuration with the given set of files

Parameters:
files - The list of files that need to be localized
conf - Configuration which will be changed

getCacheArchives

public static URI[] getCacheArchives(Configuration conf)
                              throws IOException
Get cache archives set in the Configuration

Parameters:
conf - The configuration which contains the archives
Returns:
A URI array of the caches set in the Configuration
Throws:
IOException

getCacheFiles

public static URI[] getCacheFiles(Configuration conf)
                           throws IOException
Get cache files set in the Configuration

Parameters:
conf - The configuration which contains the files
Returns:
A URI array of the files set in the Configuration
Throws:
IOException
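setCacheFiles and getCacheFiles store a URI array in the Configuration and read it back. A minimal sketch of one way to do that, assuming the URIs are joined into a single comma-separated value (the convention the setLocalFiles and setFileMd5 parameters use): a Map stands in for Configuration, and the key example.cache.files and the class CacheListConf are hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a URI array stored as one comma-separated property value. */
class CacheListConf {
    static final String KEY = "example.cache.files"; // hypothetical key

    static void setCacheFiles(URI[] files, Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < files.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(files[i].toString());
        }
        conf.put(KEY, sb.toString());
    }

    static URI[] getCacheFiles(Map<String, String> conf) {
        String value = conf.get(KEY);
        if (value == null || value.isEmpty()) {
            return new URI[0]; // this sketch returns an empty array when nothing is set
        }
        String[] parts = value.split(",");
        URI[] uris = new URI[parts.length];
        for (int i = 0; i < parts.length; i++) {
            uris[i] = URI.create(parts[i].trim());
        }
        return uris;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setCacheFiles(new URI[] {
            URI.create("hdfs://namenode:9000/user/alice/dict.txt#dict"),
            URI.create("/user/alice/stopwords.txt#stopwords")}, conf);
        System.out.println(conf.get(KEY));
        System.out.println(getCacheFiles(conf).length); // 2
    }
}
```

This simple encoding would break on URIs that themselves contain commas.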

getLocalCacheArchives

public static Path[] getLocalCacheArchives(Configuration conf)
                                    throws IOException
Return the path array of the localized caches

Parameters:
conf - Configuration that contains the localized archives
Returns:
A path array of localized caches
Throws:
IOException

getLocalCacheFiles

public static Path[] getLocalCacheFiles(Configuration conf)
                                 throws IOException
Return the path array of the localized files

Parameters:
conf - Configuration that contains the localized files
Returns:
A path array of localized files
Throws:
IOException

getArchiveMd5

public static String[] getArchiveMd5(Configuration conf)
                              throws IOException
Get the md5 checksums of the archives

Parameters:
conf - The configuration which stores the md5s
Returns:
a string array of md5 checksums
Throws:
IOException

getFileMd5

public static String[] getFileMd5(Configuration conf)
                           throws IOException
Get the md5 checksums of the files

Parameters:
conf - The configuration which stores the md5s
Returns:
a string array of md5 checksums
Throws:
IOException

setArchiveMd5

public static void setArchiveMd5(Configuration conf,
                                 String md5)
Set the md5 checksums used to verify the archives to be localized.

Parameters:
conf - Configuration which stores the md5's
md5 - comma-separated list of md5 checksums of the .crc files of the archives. The order should be the same as the order in which the archives are added.

setFileMd5

public static void setFileMd5(Configuration conf,
                              String md5)
Set the md5 checksums used to verify the files to be localized.

Parameters:
conf - Configuration which stores the md5's
md5 - comma-separated list of md5 checksums of the .crc files of the files. The order should be the same as the order in which the files are added.
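Because the md5 list is a single comma-separated string, each checksum can be tied to its file only by position, which is why the order must match. The hypothetical sketch below pairs the two lists by index and rejects a length mismatch; the class Md5Pairing, its methods and the placeholder checksum strings are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: match the i-th md5 in a comma-separated list to the i-th file. */
class Md5Pairing {
    static Map<String, String> pair(String[] files, String md5List) {
        String[] md5s = md5List.split(",");
        if (md5s.length != files.length) {
            throw new IllegalArgumentException(
                files.length + " files but " + md5s.length + " md5 checksums");
        }
        Map<String, String> expected = new LinkedHashMap<>();
        for (int i = 0; i < files.length; i++) {
            expected.put(files[i], md5s[i].trim()); // i-th checksum belongs to the i-th file
        }
        return expected;
    }

    /** A cache would be localized only if its computed checksum equals the expected one. */
    static boolean matches(Map<String, String> expected, String file, String computedMd5) {
        return computedMd5.equals(expected.get(file));
    }

    public static void main(String[] args) {
        Map<String, String> expected =
            pair(new String[] {"/user/alice/dict.txt", "/user/alice/lib.jar"}, "aaa111,bbb222");
        System.out.println(matches(expected, "/user/alice/lib.jar", "bbb222")); // true
    }
}
```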

setLocalArchives

public static void setLocalArchives(Configuration conf,
                                    String str)
Set the conf to contain the location for localized archives

Parameters:
conf - The conf to modify to contain the localized caches
str - a comma-separated list of local archives

setLocalFiles

public static void setLocalFiles(Configuration conf,
                                 String str)
Set the conf to contain the location for localized files

Parameters:
conf - The conf to modify to contain the localized caches
str - a comma-separated list of local files

addCacheArchive

public static void addCacheArchive(URI uri,
                                   Configuration conf)
Add an archive to be localized to the conf.

Parameters:
uri - The uri of the cache to be localized
conf - Configuration to add the cache to

addCacheFile

public static void addCacheFile(URI uri,
                                Configuration conf)
Add a file to be localized to the conf

Parameters:
uri - The uri of the cache to be localized
conf - Configuration to add the cache to

addFileToClassPath

public static void addFileToClassPath(Path file,
                                      Configuration conf)
                               throws IOException
Add a file path to the current set of classpath entries. It adds the file to the cache as well.

Parameters:
file - Path of the file to be added
conf - Configuration that contains the classpath setting
Throws:
IOException
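addFileToClassPath updates two lists at once: the classpath entries and the cache files. The sketch below models both as comma-separated values in a Map standing in for Configuration, appending in the order the entries are added, as addCacheFile would for the cache list. The keys, the class ClassPathConf and the use of plain strings in place of Path and URI are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "add to the classpath and to the cache" on comma-separated lists. */
class ClassPathConf {
    static final String CLASSPATH_KEY = "example.classpath.files"; // hypothetical key
    static final String CACHE_KEY = "example.cache.files";         // hypothetical key

    /** Append one entry, preserving the order in which entries were added. */
    static void append(Map<String, String> conf, String key, String entry) {
        String old = conf.get(key);
        conf.put(key, (old == null || old.isEmpty()) ? entry : old + "," + entry);
    }

    /** Put the file on the task classpath and also register it as a cache file. */
    static void addFileToClassPath(String file, Map<String, String> conf) {
        append(conf, CLASSPATH_KEY, file);
        append(conf, CACHE_KEY, file);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        addFileToClassPath("/user/alice/libs/parser.jar", conf);
        addFileToClassPath("/user/alice/libs/util.jar", conf);
        System.out.println(conf.get(CLASSPATH_KEY));
        System.out.println(conf.get(CACHE_KEY));
    }
}
```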

getFileClassPaths

public static Path[] getFileClassPaths(Configuration conf)
Get the file entries in classpath as an array of Path

Parameters:
conf - Configuration that contains the classpath setting

addArchiveToClassPath

public static void addArchiveToClassPath(Path archive,
                                         Configuration conf)
                                  throws IOException
Add an archive path to the current set of classpath entries. It adds the archive to cache as well.

Parameters:
archive - Path of the archive to be added
conf - Configuration that contains the classpath setting
Throws:
IOException

getArchiveClassPaths

public static Path[] getArchiveClassPaths(Configuration conf)
Get the archive entries in classpath as an array of Path

Parameters:
conf - Configuration that contains the classpath setting

createSymlink

public static void createSymlink(Configuration conf)
This method allows you to create symlinks in the current working directory of the task to all the cache files/archives

Parameters:
conf - the jobconf

getSymlink

public static boolean getSymlink(Configuration conf)
This method checks whether symlinks are to be created for the localized cache files in the current working directory.

Parameters:
conf - the jobconf
Returns:
true if symlinks are to be created, false otherwise
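createSymlink and getSymlink act as a setter and getter for a single flag. A minimal sketch, assuming the flag is stored as a string value in the configuration; the key example.create.symlink, the value "yes" and the class SymlinkFlag are hypothetical, and a Map stands in for Configuration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a symlink flag kept in a configuration map. */
class SymlinkFlag {
    static final String KEY = "example.create.symlink"; // hypothetical key

    /** Request symlinks in the task's working directory. */
    static void createSymlink(Map<String, String> conf) {
        conf.put(KEY, "yes");
    }

    /** True only if the flag was set; an unset flag means no symlinks. */
    static boolean getSymlink(Map<String, String> conf) {
        return "yes".equals(conf.get(KEY));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(getSymlink(conf)); // false
        createSymlink(conf);
        System.out.println(getSymlink(conf)); // true
    }
}
```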

checkURIs

public static boolean checkURIs(URI[] uriFiles,
                                URI[] uriArchives)
This method checks if there is a conflict in the fragment names of the URIs. It also makes sure that each URI has a fragment. It is only to be called if you want to create symlinks for the various archives and files.

Parameters:
uriFiles - The URI array of cache files
uriArchives - The URI array of cache archives
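The two conditions checkURIs describes, every URI has a fragment and no fragment name is used twice across files and archives, can be sketched with a HashSet. The class FragmentCheck is hypothetical; returning true to mean "safe to create symlinks" and treating fragments that differ only in case as a conflict are assumptions of this sketch.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of a fragment-name conflict check over files and archives. */
class FragmentCheck {
    static boolean fragmentsOk(URI[] uriFiles, URI[] uriArchives) {
        Set<String> seen = new HashSet<>();
        for (URI[] group : new URI[][] {uriFiles, uriArchives}) {
            if (group == null) {
                continue; // a missing array contributes no links
            }
            for (URI uri : group) {
                String fragment = uri.getFragment();
                if (fragment == null) {
                    return false; // no link name to use for this cache
                }
                if (!seen.add(fragment.toLowerCase(Locale.ROOT))) {
                    return false; // two symlinks would share one name
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        URI[] files = {URI.create("hdfs://namenode:9000/user/alice/dict.txt#dict")};
        URI[] archives = {URI.create("hdfs://namenode:9000/user/alice/lib.zip#lib")};
        System.out.println(fragmentsOk(files, archives)); // true
    }
}
```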


Copyright © 2006 The Apache Software Foundation