Frequently Asked Questions about lsof

**********************************************************************
| The latest release of lsof is always available via anonymous ftp   |
| from vic.cc.purdue.edu.  Look in pub/lsof.README for its location. |
**********************************************************************

______________________________________________________________________

This file contains frequently asked questions about lsof and answers
to them.

Vic Abell
March 8, 1996
______________________________________________________________________

Table of Contents:

1.0	General Concepts
1.1	Lsof -- what is it?
1.2	Where do I get lsof?
1.2.1	Are there mirror sites?
1.2.2	Are lsof executables available?
1.2.3	Why can't I extract the lsof tar files?


2.0	Lsof Ports
2.1	What ports exist?
2.2	What about a new port?
2.2.1	User-contributed Ports
2.2.2	Dell SVR4
2.3	Why isn't there an AT&T SVR4 port?


3.0	Lsof Problems
3.1	Why doesn't lsof report full path names?
3.1.1	Why do lsof -r reports show different path names?
3.1.2	Why does lsof report the wrong path names?
3.2	Does lsof have security problems?
3.3	Will lsof show remote hosts using files via NFS?

3.4	AIX Problems
3.4.1	How can I compile a working lsof for AIX 4.1?
3.4.2	What is the Stale Segment ID bug and why is -X needed?
3.4.2.1	Stale Segment ID APAR

3.5	DEC OSF/1 Problems
3.5.1	Why does lsof complain about non-existent /dev/fd entries?
3.5.2	Why does the DEC OSF/1 V3.2 ld complain about Ots* symbols?

3.6	HP-UX Problems
3.6.1	Why does an HP-UX lsof compilation get ``unknown "O" option?''

3.7	Linux Problems
3.7.1	Why doesn't lsof work (or even compile) on my Linux system?
3.7.2	Why does lsof complain about /dev/kmem?
3.7.3	Why can't lsof find kernel addresses?
3.7.4	Why does lsof have trouble reading kernel structures?
3.7.5	Where is /zSystem.map (or /System.map)?  Why doesn't it match
	my kernel?
3.7.6	Why does lsof complain about the random_fops and urandom_fops
	kernel symbols?
3.7.7	Why does lsof complain about get_kernel_syms()?

3.8	NetBSD Problems
3.8.1	Why doesn't a NetBSD 1.0A binary run on my 1.0A system?

3.9	Output Problems
3.9.1	Why does an asterisk (`*') precede some inode numbers?
3.9.2	Why does the offset have ``0t'' and ``0x'' prefixes?

3.10	SGI IRIX Problems
3.10.1	Why doesn't lsof display open IRIX 5.3 XFS files properly?
3.10.2	Where is the IRIX 5.3 <sys/vnode.h>?

3.11	Sun Problems
3.11.1	My Sun gcc-compiled lsof doesn't work -- why?
3.11.2	How can I make lsof compile with gcc under Solaris 2.4?
3.11.3	How can I make lsof compile with gcc under SunOS 4.1.x?
3.11.4	Why does the Solaris SunPRO cc complain about system header files?
3.11.5	Why doesn't lsof work under my Solaris 2.4?
3.11.6	Where are the Solaris header files?
3.11.7	Where is the Solaris /usr/src/uts/<architecture>/sys/machparam.h?


4.0	Lsof Features
4.1	Why doesn't lsof report on /proc entries on my
	system?
4.2	How do I disable the device cache file feature or alter
	its behavior?
4.2.1	What's the risk with a perverted device cache file?
4.2.2	How do I put the full host name in a personal device cache file
	path?
4.2.3	How do I put the personal device cache file in /tmp?
4.3	Why doesn't lsof know about AFS files on my favorite dialect?
4.3.1	Why doesn't lsof report node numbers for all AFS volume files,
	or how do I reveal dynamic module addresses to lsof?
______________________________________________________________________


1.0	General Concepts

1.1	Lsof -- what is it?

	Lsof is a Unix-specific tool.  Its name stands for LiSt
	Open Files, and it does just that.  It lists information
	about files that are open by the processes running on a
	Unix system.

	See the lsof man page, the 00DIST file, and the 00README
	file of the lsof distribution for more information.

1.2	Where do I get lsof?

	Lsof is available via anonymous ftp from vic.cc.purdue.edu
	(128.210.15.16).  Look in the pub/tools/unix/lsof sub-
	directory.

	Compressed and gzip'd tar files with PGP certificates are
	available.

1.2.1	Are there mirror sites?

	The lsof distribution is currently mirrored at:

	    coast.cs.purdue.edu
		pub/tools/unix/lsof
	    ftp.auscert.org.au
		/pub/mirrors/vic.cc.purdue.edu/lsof/*
	    ftp.cert.dfn.de
		/pub/tools/admin/lsof
	    ftp.ci.uminho.pt
		/pub/security/lsof/
	    ftp.ConnectDE.NET
		pub/utils/lsof
	    ftp.crc.doc.ca
		packages/lsof
	    ftp.cs.columbia.edu
		archives/lsof/
	    ftp.fu-berlin.de
		pub/unix/tools/lsof
	    ftp.gre.ac.uk
		pub/tools/lsof
	    ftp.rge.com
		/pub/lsof
	    ftp.pacbell.com
		/mirror/vic.cc.purdue.edu/lsof
	    ftp.sterling.com
		/admin-tools/lsof
	    ftp.sunet.se
		pub/unix/admin/lsof
	    ftp.tau.ac.il
		/pub/unix/admin
	    ftp.uni-mainz.de
		pub/misc/lsof
	    ftp.web.ad.jp
		/pub/UNIX/tools/lsof
	    wuarchive.wustl.edu
		/packages/security/lsof

1.2.2	Are lsof executables available?

	Some lsof executables are available in the subdirectory
	tree pub/tools/unix/lsof/binaries.  These are neither
	guaranteed to be current nor to cover every dialect and
	machine architecture.

	I don't recommend you use pre-compiled lsof binaries; I
	recommend you obtain the sources and build your own binary.
	Even if you're a Sun user without a SunPRO C compiler, you
	can use gcc to compile lsof.

	If you must use a binary file, please be conscious of the
	security implications in using an executable of unknown
	origin.  The lsof binaries are accompanied by PGP certificates.
	Please use them!

	Three additional cautions apply to executables:

	1.  Don't try to use an lsof executable, compiled for one
	    version of a Unix dialect, on another.

	2.  A SunOS lsof executable, compiled for one Sun architecture,
	    won't work on a different Sun architecture, even if both
	    systems run the same version of SunOS.

	3.  A Solaris lsof executable, compiled for one Sun
	    architecture, isn't guaranteed to work on a different
	    Sun architecture, even if both systems run the same
	    version of Solaris.

1.2.3	Why can't I extract the lsof tar files?

	I have had a report from a Solaris user that he was unable
	to extract the lsof distribution file under Solaris 2.3 or
	2.4.  I was able to duplicate his report.

	When I upgraded tar on my NeXT cube (where I generate the
	lsof distribution) to GNU tar 1.11.2 (plus some local
	fixes), I could no longer duplicate the problem.

	The Solaris user still reports that he can extract the GNU-
	tar-1.11.2-built archive, but gets warning messages from
	tar.  I get no warning messages, so we jointly suspect that
	it's possible some Sun patch has made our two tar programs
	somewhat incompatible.

	If you have problems extracting the lsof distribution,
	please try GNU tar 1.11.2 or later.  If that fails, contact
	me.


2.0	Lsof Ports

2.1	What ports exist?

	The pub/lsof.README file carries the latest port information:

	AIX 3.2.[45], 4.1, and		IBM RISC/System 6000
	    4.1.[1234]
	BSDI BSD/OS 2.0, 2.0.1, and	Intel-based systems
	    2.1-BETA
	EP/IX 2.1.1			CDC 4680
	FreeBSD 1.1.5.1, 2.0, 2.0.5,	Intel-based systems
	    and 2.1
	HP-UX 8.x, 9.x, 10		HP
	IRIX 4.0.5, 5.2, 5.3, 6.0,	SGI
	    6.0.1, 6.1, and 6.2-BETA
	Linux through 1.3.56		Intel-based systems
	NetBSD 1.0 and 1.1		Intel and SPARC-based systems
	NEXTSTEP 2.1 and 3.[0123]	all NEXTSTEP architectures
	OSF/1 2.0, 3.[02], and 4.0-BETA	DEC Alpha
	RISC/os 4.52			MIPS R2000-based systems
	SCO OpenDesktop, OpenServer	Intel-based systems
	    1.1, 3.0, and 5.0
	Sequent PTX 2.1.[156],		Sequent systems
	    4.0.[23], and 4.1.[02]
	Solaris 2.[12345]		Sun 4 and i86pc
	SunOS 4.1.3			Sun 3 and 4
	Ultrix 2.2, 4.2, 4.3, 4.4,	DEC RISC and VAX
	    and 4.5

2.2	What about a new port?

	The 00PORTING file in the distribution gives hints on doing
	a port.  I will consider doing a port in exchange for
	permanent access to a test host.  I require permanent access
	so I can test new lsof revisions, because I will not offer
	distributions of dialect ports I cannot upgrade and test.

2.2.1	User-contributed Ports

	Sometimes I receive contributions of ports of lsof to
	systems where I can't test future revisions of lsof.  Hence,
	I don't incorporate these contributions into my lsof
	distribution.

	However, I do make these contributions available in the
	directory:

		pub/tools/unix/lsof/contrib

	on my ftp server, vic.cc.purdue.edu.

	Consult the 00INDEX file in the contrib/ directory for a
	list of the available contributions.

2.2.2	Dell SVR4

	There is no lsof port for Dell SVR4.  However, Kevin Kadow
	<kadokev@rci.ripco.com> reported that he was able to use
	the Novell UnixWare version of lsof under Dell SVR4.
	Although that version of lsof is no longer distributed,
	remnants of it may be found on vic.cc.purdue.edu in:

	    pub/tools/unix/lsof/OLD/binaries
	
	and

	    pub/tools/unix/lsof/OLD/dialects

2.3	Why isn't there an AT&T SVR4 port?

	I haven't produced an AT&T SVR4 port because I haven't seen
	a Unix dialect that is strictly limited to the AT&T System
	V, Release 4 source code.  Every one I have seen is a
	derivative with vendor additions.

	The vendor additions are significant to lsof because they
	affect the internal kernel structures with which lsof does
	business.  While some vendor derivatives of SVR4 are similar,
	each one I have encountered so far has been different enough
	from its siblings to require special source code.

	If you're interested in an SVR4 version of lsof, here are
	some existing ports you can consider:

		EP/IX 2.1.1
		IRIX 5.2, 5.3, 6.0, 6.0.1, 6.1, and 6.2-BETA
		Sequent PTX 4.0.[23] and 4.1.[02]
		Solaris 2.[12345]


3.0	Lsof Problems

3.1	Why doesn't lsof report full path names?

	Lsof reports full path names in two limited cases: 1) some
	systems, e.g., some IRIX and SunOS versions, contain full
	directory path names in their user structure, so lsof
	reports those; or 2) lsof reports the full path name when
	it is specified as a search argument for open files that
	match it.

	Lsof reports some or all path name components (e.g., the
	sys and proc.h components of /usr/include/sys/proc.h) for
	all but AFS files for these dialects:

		DEC OSF/1 2.0, 3.[02], and 4.0-BETA
		EP/IX 2.1.1
		FreeBSD 1.1.5.1, 2.0, 2.0.5, and 2.1.0
		HP-UX 9.01 and 10.x
		NetBSD 1.0 and 1.1
		NEXTSTEP 3.1
		RISC/os 4.52
		SCO OpenDesktop/OpenServer
		SGI IRIX 5.3
		Solaris 2.[345]
		SunOS 4.1.[23]
		Ultrix 2.2, 4.2, and 4.3 (last component only)

	Lsof obtains the components from the kernel's name cache.
	(As far as I can determine, AFS path lookups don't share
	in kernel name cache operations.) Since the size of the
	cache is limited and the cache is in constant flux, it does
	not always contain the names of all components in an open
	file's path; sometimes it contains none of them.

	Lsof reports the file system directory name and whatever
	components of the file's path it finds in the cache, starting
	with the last component and working backwards through the
	directories that contain it.  If lsof finds no path
	components, lsof reports the file system device name instead.

	When lsof does report some path components in the NAME
	column, it prefixes them with the file system directory
	name, followed by " -- ", followed by the components --
	e.g., /usr -- sys/path.h for /usr/include/sys/path.h.  The
	" -- " is omitted when lsof finds all the path name components
	of a file's name.
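
	The NAME column rules above can be sketched in a few lines
	of Python.  This is an illustration only, not lsof's C
	source; the function name, its arguments, and the /dev/sd0g
	device name are hypothetical.

```python
def format_name(fs_dir, components, complete, device):
    """Build a NAME column value from name cache results.

    fs_dir     -- file system directory name, e.g. "/usr"
    components -- path components found in the cache, in path order
    complete   -- True when every component of the path was found
    device     -- file system device name (hypothetical example value)
    """
    if not components:
        # Nothing in the cache: fall back to the device name.
        return device
    tail = "/".join(components)
    if complete:
        # All components known: a plain path with no " -- " separator.
        return fs_dir.rstrip("/") + "/" + tail
    # Partial result: directory name, " -- ", then the known tail.
    return fs_dir + " -- " + tail


print(format_name("/usr", ["sys", "path.h"], False, "/dev/sd0g"))
```

	With only the last two components cached, the call above
	yields the /usr -- sys/path.h form shown in the text.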

	Lsof can't obtain path name components from the kernel name
	caches of the following dialects:

	    AIX                 The knlist() function won't return
				cache addresses -- some IBM wisdom
				to "protect" their customers.

	    SGI IRIX 4.0.5H	I saw no unified name cache in the
				header files.

	    SGI IRIX 5.2, 6.0,	The name cache is not visible to
		6.0.1, 6.1,	application programs.
		and 6.2-BETA

	No Unix kernel records full path names in the structures
	it records about open files; instead, kernels convert path
	names to device and node number doublets and use them for
	subsequent file references once files have been opened.

	To convert the device and node number doublet into a
	complete path name, lsof would have to start at the root
	node (root directory) of the file system on which the node
	resides, and search every branch for the node, building
	possible path names along the way.  That would be a
	time-consuming operation and require access to the raw disk
	device (usually implying setuid(root) permission).

	If the prospect of all that local disk activity doesn't
	concern you, think about the cost when the device is
	NFS-mounted.
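
	To make the cost concrete, here is a minimal Python sketch
	of such a search.  It is a stand-in for the idea, not
	something lsof does: it walks the mounted file system with
	os.walk() and os.lstat() instead of reading the raw disk
	device, and paths_for_node is a hypothetical name.

```python
import os


def paths_for_node(mount_point, dev, ino):
    """Return every path under mount_point matching (device, inode)."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(mount_point):
        # Every directory entry must be examined; there is no shortcut.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable
            if st.st_dev == dev and st.st_ino == ino:
                matches.append(path)
    return matches
```

	Every entry below the starting directory is visited once
	per lookup, and a file with several hard links produces
	several candidate paths.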

3.1.1	Why do lsof -r reports show different path names?

	When you run lsof with its repeat (``-r'') option, you may
	notice that the extent to which it reports path names for
	the same files may vary from cycle to cycle.  That happens
	because other processes are making kernel calls affecting
	the cache and causing entries to be removed from and added
	to it.

3.1.2	Why does lsof report the wrong path names?

	Under some circumstances lsof may report an incorrect path
	name component, especially for files in a rapidly changing
	directory like /tmp.  This error occurs when the kernel
	uses device and inode numbers as keys to its name cache --
	e.g., in Linux or SCO name cache implementations.

	In a rapidly changing directory, like /tmp, if the kernel
	doesn't clear the cache entry when it removes a file, a
	new file may be given the same keys and lead lsof to believe
	that the old cache entry with the same keys belongs to the
	new file.

	Lsof tries to avoid this error by purging duplicate entries
	from its copy of the kernel name cache when they have the
	same device and inode number, but different names.
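
	A rough Python sketch of that purge follows.  It assumes
	that every entry involved in a conflict is dropped; the
	text above does not say whether lsof keeps one of them.
	The tuple layout and function name are hypothetical.

```python
def purge_conflicts(entries):
    """Drop name cache entries whose key maps to more than one name.

    entries -- list of (device, inode, name) tuples copied from the
               kernel name cache (hypothetical layout)
    """
    names = {}
    for dev, ino, name in entries:
        names.setdefault((dev, ino), set()).add(name)
    # Keep an entry only when its key has exactly one distinct name.
    return [(d, i, n) for d, i, n in entries if len(names[(d, i)]) == 1]
```

	Exact duplicates (same key, same name) are harmless and
	survive; only keys with differing names are removed.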

	This error is unlikely to occur in Unix dialects where the
	keys to the name cache are node address and possibly a
	capability ID.  The BSDI, EP/IX, FreeBSD, HP-UX, NEXTSTEP,
	OSF/1, PTX, RISC/os, IRIX, SunOS, Solaris, and Ultrix dialects
	use node address.  BSDI, FreeBSD, IRIX, NetBSD, OSF/1, and
	Ultrix also use a capability ID to further identify name
	cache entries.

3.2	Does lsof have security problems?

	I don't think so.  However, lsof does usually start with
	setgid permission or the equivalent.  In some SYSV derivatives
	it has setuid(root) permission to access /proc file system
	entries.  Any program that has setgid or setuid permission,
	however briefly, should always be regarded suspiciously.

	The setuid(root) power leads to a potential security
	problem.  It could allow lsof to read a kernel name list
	or memory file via the -k and -m options.  To circumvent
	this problem lsof (revisions 3.07 and above) uses access(2)
	to determine its real UID's authority to read files declared
	with -k and -m.  My thanks to Tim Ramsey <tar@ksu.ksu.edu>
	for identifying this problem.
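
	The same kind of check can be sketched in Python, whose
	os.access() wraps access(2) and tests the real UID and GID
	by default.  This only illustrates the technique; it is not
	lsof's code, and real_user_may_read is a hypothetical name.

```python
import os


def real_user_may_read(path):
    """True when the real UID/GID may read path, per access(2)."""
    # os.access() uses the real IDs unless effective_ids=True is given,
    # so a setuid program learns what the invoking user could read.
    return os.access(path, os.R_OK)
```

	A setuid program would make this call before opening a
	file named by the user and refuse to continue when the
	result is false.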

	At revision 3.44 and above, lsof drops the setgid power,
	holding it only while it opens access to kernel memory
	devices (e.g., /dev/kmem, /dev/mem, /dev/swap).  That allows
	lsof to bypass the weaker security of access(2) in favor
	of the stronger checks the kernel makes when it examines
	the right of the lsof process to open files declared with
	-k and -m.  Lsof revision 3.44 and above also restricts
	some device cache file naming options when it senses that
	the lsof process has setuid(root) power.

	The device cache file (typically .lsof_hostname in the home
	directory of the real user ID that executes lsof) has 0600
	modes.  (The suffix, hostname, is the first component of
	the host's name returned by gethostname(2).)  However, even
	when lsof runs setuid(root), it makes sure the file's
	ownerships are changed to that of the real user and group.
	In addition, lsof checks the file carefully before using
	it (see section 4.2 for a description of the checks);
	discards the file if it fails the scrutiny; complains about
	the condition of the file; then rebuilds the file.

	See the 00DCACHE file of the lsof distribution for more
	information about device cache file handling and the risks
	associated with the file.

3.3	Will lsof show remote hosts using files via NFS?

	No.  Remember, lsof displays open files for the processes
	of the host on which it runs.  If the host on which lsof
	is running is an NFS server, the remote NFS client processes
	that are accessing files on the server leave no process
	records on the server for lsof to examine.

3.4	AIX Problems

3.4.1	How can I compile a working lsof for AIX 4.1?

	If you have updated your AIX system to 4.1, but haven't
	updated your xlc compiler, the lsof you compile may not
	work.  This is caused by the new -qlonglong or -qlongdouble
	default option to xlc; it causes the _LONG_LONG symbol to
	be defined; _LONG_LONG causes a slight change in the size
	of the user structure from <sys/user.h>; and the size of
	the user structure is important to lsof when it issues the
	undocumented getuser() call, because getuser() fails when
	the size of the user structure is stated incorrectly.

	You can tell if your compiler has been updated by using
	the xlc command without options.  Called that way xlc will
	show you the options it supports.  If -qlonglong or
	-qlongdouble aren't among them, your compiler is not
	sufficiently up to date.

	There is an easy work-around: add -D_LONG_LONG to the CFGF
	string in the Makefile.  Change

		CFGF=   -D_AIXV=4100
	to
		CFGF=   -D_AIXV=4100 -D_LONG_LONG

3.4.2	What is the Stale Segment ID bug and why is -X needed?

	Kevin Ruderman <rudi@acs.bu.edu> reports that he has been
	informed by IBM that processes using the AIX 3.2.x and
	4.1[.x] kernel's readx() function can cause other AIX
	processes to hang because of what appears to be file system
	corruption.

	This failure, known as the Stale Segment ID bug, is caused
	by an error in the AIX kernel's journalled segment memory
	handler that causes the kernel's dir_search() function
	erroneously to believe directory entries contain zeroes.
	The process using the readx() call need not be doing anything
	wrong.  Usually the system must be under such heavy load
	that the segment ID being used in the readx() call has been
	freed and then reallocated to another process since it was
	obtained from kernel memory.

	Lsof uses the readx() function to access library entry
	structures, based on the segment ID it finds in the proc
	structure of a process.  Since IBM probably will not fix
	the kernel bug in AIX 3.2.x or 4.1[.x] and may not fix it
	until some version of 4.2, I've added an AIX-specific option
	to lsof that controls its use of the readx() function.

	By default lsof readx() use is disabled; specifying the
	``-X'' option enables readx() use.  When readx() use is
	disabled, lsof will report that in the NAME column for AIX
	3.2.x and 4.1 text and loader references whose loader entry
	structures must be obtained using readx().  (Lsof won't
	report anything for AIX 4.1.x text and loader references
	when readx() use is disabled.)  If lsof encounters an AIX
	3.2.x or 4.1 loader entry that it can't read because readx()
	use is disabled, it stops reporting loader entry information,
	since loader entries are linked by pointer elements.

	If you want to change the default readx() behavior of AIX
	lsof, change the HASXOPT, HASXOPT_ROOT, and HASXOPT_VALUE
	definitions in dialects/aix/machine.h.  You can also use
	these definitions to enable or disable readx() -- consult
	the comments in machine.h.  You may want to disable readx()
	use permanently if you plan to make lsof publicly executable.

	When HASXOPT_ROOT is defined, lsof will restrict use of
	the -X option to processes whose real UID is root; if
	HASXOPT_ROOT isn't defined, any user may specify the -X
	option.  The Customize script offers the option to change
	HASXOPT_ROOT when HASXOPT is defined and HASXOPT_ROOT is
	named in any dialect's machine.h header file.

	I have never seen lsof cause this problem, but I believe
	there is some chance it could, given the right circumstances.

3.4.2.1	Stale Segment ID APAR

	Here are the details of the Stale Segment ID bug and IBM's
	response, provided by Kevin Ruderman <rudi@acs.bu.edu>.

	AIX V3
	  APAR=ix49183
	      user process hangs forever in kernel due to file
	      system corruption
	  STAT=closed prs  TID=tx2527 ISEV=2 SEV=2
	       (A "closed prs" is one closed with a Permanent
	       ReStriction.)
	  RCOMP=575603001 aix v3 for rs/6 RREL=r320

	AIX V4  (internal defect, no apar #)
	  prefix        p
	  name          175671
	  abstract      KERMP: loop for ever in dir_search()

	Problem description:

	1. Some user application -- e.g., lsof -- gets the segment
	   ID (SID) for the process private segment of a target
	   process from the process table.

	2. The target process exits, deleting the process private
	   segment.

	3. The SID is reallocated for use as a persistent segment.

	4. The user application runs again and tries to read the
	   user area structure from /dev/mem, using the SID it read
	   from the process table.

	5. The loads done by the driver for /dev/mem cause faults
	   in the directory; new blocks are allocated; the size
	   is changed; and zero pages are created.

	6. The next application that looks for a file in the affected
	   directory hangs in the kernel's dir_search() function
	   because of the zero pages.  This occurs because the
	   kernel's dir_search() function loops through the variable
	   length entries one at a time, moving from one to the
	   next by adding the length of the current entry to its
	   address to get the address of the next entry. This
	   process should end when the current pointer passes the
	   end of the known directory length.

	   However, while the directory length has increased, the
	   entry length data has not, so when dir_search() reaches
	   the zero pages, it loops forever, adding a length of
	   zero to the current pointer, never passing the end of
	   the directory length.  The application process is hung;
	   it can't be killed or stopped.

	IBM has closed the problem with a PRS code (Permanent
	ReStriction) under AIX Version 3 and has targeted a fix
	for AIX V4.2.
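
	The loop described in step 6 can be modeled with a short
	Python sketch.  It is a simulation of the described
	behavior under simplified assumptions, not AIX kernel code;
	the names and the max_steps guard are hypothetical, and the
	guard exists only so the example terminates.

```python
def walk_entries(lengths, dir_len, max_steps=1000):
    """Visit directory entries by adding each entry's length.

    lengths -- maps an entry's offset to its recorded length; an
               offset that is absent models a zero page (length 0)
    dir_len -- directory length in bytes
    Raises RuntimeError where the real loop would spin forever.
    """
    offset, visited = 0, []
    while offset < dir_len:
        if len(visited) >= max_steps:
            raise RuntimeError("no progress at offset %d" % offset)
        visited.append(offset)
        offset += lengths.get(offset, 0)  # zero length: no advance
    return visited
```

	With a consistent directory the walk ends once the offset
	passes the directory length.  If the length grows while the
	new space holds zeroes, the offset stops advancing, which
	is the hang described above.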

3.5	DEC OSF/1 Problems

3.5.1	Why does lsof complain about non-existent /dev/fd entries?

	When you run lsof for DEC OSF/1 3.0 or 3.2, lsof may
	complain:

	    lsof: can't lstat /dev/fd/xxx: No such file or directory
	    lsof: can't lstat /dev/fd/yyy: No such file or directory

	(Or it may warn about other missing /dev/fd paths.)  When
	you do an ``ls /dev/fd'' none of the missing paths are listed.

	This is caused by a bug in the DEC library function
	getdirentries().  For some reason, when /dev/fd is a file
	system mount point, getdirentries() returns an incorrect
	size for it to readdir().  (Lsof calls readdir() in its
	ddev.c readdev() function.)  Because of the incorrect size,
	readdir() goes past the end of the /dev/fd directory buffer,
	encounters random paths and returns them to lsof.  Lsof
	then attempts to lstat(2) the random paths, gets error
	replies from lstat(2), and complains about the paths.

	Duncan McEwan <duncan@Comp.VUW.AC.NZ> discovered this error
	and has reported it to DEC.  Duncan also supplied an
	alternate readdir() function as a work-around.  I've
	incorporated his readdir() in dialects/osf/ddev.c (as the
	static ReadDir() function) with some slight modifications,
	and enabled its use when the USELOCALREADDIR symbol is
	defined.

	The Configure script defines USELOCALREADDIR for DEC OSF/1
	versions 3.0 and 3.2.  If you don't want to use Duncan's
	local readdir() function, edit the Makefile and remove
	-DUSELOCALREADDIR from the CFGF string.  When DEC releases
	a corrected getdirentries() function, I'll modify the
	Configure script to stop defining USELOCALREADDIR.

3.5.2	Why does the DEC OSF/1 V3.2 ld complain about Ots* symbols?

	When you compile lsof on your DEC OSF/1 V3.2 system, ld
	may complain:

	    ld:
	    Unresolved:
	    knlist
	    _OtsRemainder32Unsigned
	    _OtsDivide64Unsigned
	    _OtsRemainder64Unsigned
	    _OtsDivide32Unsigned
	    _OtsMove
	    _OtsDivide32
	    _OtsRemainder32
	    *** Exit 1

	I'm not sure why this happens, nor do I think it happens
	on all DEC OSF/1 V3.2 systems.  However, I have had one
	report about it.

	The best work-around seems to be to remove -lmld from the
	CFGL string in the Makefile produced by Configure -- i.e.,
	change:

	    CFGL=    -lmld
	to
	    CFGL=

	According to the V3.2 man page for nlist(3), this shouldn't
	work, but my testing shows that it does.  Although I haven't
	been able to test this second work-around, you might try
	adding -lots to CFGL, rather than removing -lmld -- i.e.,
	change:

	    CFGL=    -lmld
	to
	    CFGL=    -lmld -lots

	WARNING: my testing also shows that the V2.0 nlist(3) man
	page means what it says when it calls for -lmld -- lsof
	loaded without -lmld under V2.0 can't locate the proc
	(process) table address.

	    DON'T REMOVE -LMLD FROM THE DEC OSF/1 V2.0 MAKEFILE.

	If you run into this problem, please let me know what
	problem you encountered and how you solved it.

3.6	HP-UX Problems

3.6.1	Why does an HP-UX lsof compilation get ``unknown "O" option?''

	If you only have the standard HP-UX C compiler and haven't
	purchased and installed the optional one, when you try to
	compile lsof with the Makefile that "Configure hpux"
	produces, you'll get the warning message:

		cc: error 422: unknown option "O" ignored.

	The HP-UX cc(1) man page says this:

	  "Options
	     Note that in the following list, the cc and c89 options
	     -A , -G , -g , -O , -p , -v , -y , +z , and +Z are
	     not supported by the C compiler provided as part of
	     the standard HP-UX operating system.  They are supported
	     by the C compiler sold as an optional separate product."

	If you can't install the "optional separate product," you
	can get rid of the warning message by editing the Makefile
	and removing the "-O" option from the DEBUG string -- i.e.,
	change

		DEBUG=	-O
	to
		DEBUG=

3.7	Linux Problems

3.7.1	Why doesn't lsof work (or even compile) on my Linux system?

	I test lsof on what Linux systems are available to me.
	Currently I have access to a 1.2.13 system, courtesy of
	Joseph J. Nuspl Jr. <nuspl@nvwls.cc.purdue.edu>.  Keith
	Parks <emkxp01@mtcc.demon.co.uk> does testing for me on
	the latest Linux kernel.

	If lsof doesn't even compile on your Linux system, you may
	be using a version of Linux whose header files differ from
	the ones I used.  Or you may not have installed /usr/src/linux,
	and lsof can't find header files that it needs from that
	directory.

3.7.2	Why does lsof complain about /dev/kmem?

	Lsof reads kernel information via /dev/kmem.  If you get
	this error message:

		lsof: can't open /dev/kmem

	then the permissions on /dev/kmem or the authority you have
	when using lsof aren't powerful enough to allow lsof to
	read from it.  Often /dev/kmem is owned by the kmem or
	system group and has group read permission, so lsof needs
	to run setgid kmem or system, or the login that runs it
	must be in the kmem or system group (that's the way I test
	lsof).  So, become the super user and:

	either		# chgrp kmem lsof
	or		# chgrp system lsof
	and		# chmod 2755 lsof
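
	The effect of mode 2755 can be checked with a small Python
	sketch that uses the standard stat module.  The helper
	names are hypothetical; the sketch only decodes a mode
	value and changes nothing on disk.

```python
import stat


def has_setgid(mode):
    """True when the set-group-ID bit is on in a file mode."""
    return bool(mode & stat.S_ISGID)


def describe(mode):
    """Render a regular file's permission bits the way ls -l does."""
    return stat.filemode(stat.S_IFREG | mode)


print(has_setgid(0o2755), describe(0o2755))
```

	For an installed binary, passing os.stat(path).st_mode to
	has_setgid() shows whether the chmod took effect.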

3.7.3	Why can't lsof find kernel addresses?

	The failure to read kernel addresses usually is accompanied
	by error messages like:

		lsof: can't read kernel name list from <file_name>
		lsof: missing kernel high memory definition
		lsof: missing kernel memory map definition
		lsof: missing kernel memory start definition
		lsof: no _task kernel definition
		lsof: can't read memory parameters

	These messages describe failures in obtaining addresses
	for the symbols that identify kernel structures lsof wants
	to read.  Lsof obtains kernel symbol addresses from the
	/zSystem.map file -- that will usually be the <file_name>
	argument in the "can't read kernel name list from" error
	message.  You might not have that file, or it might not be
	in that place.  (See section 3.7.5.)

	If you encounter kernel address access errors and find a
	strategy that works, please let me know and I'll add its
	description to this file.

3.7.4	Why does lsof have trouble reading kernel structures?

	Your kernel and /System.map or /zSystem.map file may not
	match.  (See the next section, 3.7.5.)

3.7.5	Where is /zSystem.map (or /System.map)?  Why doesn't it match
	my kernel?

	Lsof uses the system map file -- /zSystem.map or /System.map
	-- to locate addresses of the symbols for kernel information
	it needs to read.  Without this file, lsof cannot function.

	The system map file is installed automatically when you
	use the kernel Makefile to install a new kernel.  If you
	made a new kernel and installed it manually, you may have
	forgotten to install the system map file that matches it.

	The Configure script tries to determine the system map file
	to use -- /zSystem.map or /System.map -- when it processes
	the linux abbreviation.  If /zSystem.map exists, the
	Configure script lets lsof default to using it; if /zSystem.map
	doesn't exist, but /System.map does, the Configure script
	defines a symbol that causes lsof to use it; if neither
	exists, Configure issues a warning and lets lsof try to
	use /zSystem.map.  Garner Halloran <kheldar@felix.cc.gatech.edu>
	helped me sort this out.
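
	The selection order can be sketched in Python as follows.
	The real logic lives in the Configure shell script; this
	function, its name, and its root argument (added so the
	example can be tried on a scratch directory) are
	hypothetical.

```python
import os


def choose_system_map(root="/"):
    """Return (path, warning) using the order described above."""
    zmap = os.path.join(root, "zSystem.map")
    smap = os.path.join(root, "System.map")
    if os.path.exists(zmap):
        return zmap, None          # default choice
    if os.path.exists(smap):
        return smap, None          # fallback when zSystem.map is absent
    # Neither exists: warn, and still let lsof try zSystem.map.
    return zmap, "neither %s nor %s exists" % (zmap, smap)
```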

	Lsof revisions 3.35 and above have code, courtesy of Marty
	Leisner <leisner@sdsp.mc.xerox.com>, that tries to determine
	if the system map file and the booted kernel are a matched
	set.  The code compares the symbol names and addresses from
	the system map file to the symbol names and addresses from
	/proc/ksyms.  If any matching pair of names has different
	addresses, lsof complains and stops -- e.g.,

	    $ lsof -k ./XXX
	    lsof: kernel symbol address mismatch: do_munmap
	          /proc/ksyms value=0x122018; ./XXX value=0x12201a
	          There were 161 additional mismatches.
	          ./XXX and the booted kernel may not be a matched set.
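
	The comparison amounts to checking the names the two
	sources have in common.  Here is a minimal Python sketch,
	assuming both symbol tables have already been parsed into
	name-to-address dictionaries; the parsing step and all the
	names used here are hypothetical.

```python
def find_mismatches(map_syms, kernel_syms):
    """List symbols present in both tables with different addresses.

    map_syms    -- {name: address} from the system map file
    kernel_syms -- {name: address} reported by the running kernel
    Returns sorted (name, kernel_address, map_address) tuples.
    """
    common = set(map_syms) & set(kernel_syms)
    return sorted(
        (name, kernel_syms[name], map_syms[name])
        for name in common
        if kernel_syms[name] != map_syms[name]
    )


bad = find_mismatches({"do_munmap": 0x12201A}, {"do_munmap": 0x122018})
for name, kaddr, maddr in bad:
    print("%s: kernel=%#x map=%#x" % (name, kaddr, maddr))
```

	Symbols that appear in only one table are ignored; any
	mismatch among the shared names is grounds for suspecting
	that the map file and the kernel are not a matched set.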

3.7.6	Why does lsof complain about the random_fops and urandom_fops
	kernel symbols?

	When lsof is run on the Linux 1.3.57 through 1.3.61 kernels,
	it complains about address conflicts for two symbols,
	random_fops and urandom_fops with a message that looks like
	this:

	lsof: kernel symbol address mismatch: random_fops
	      get_kernel_syms() value is 0x100d76c;
	      /System.map value is 0x19abb0.
	      There was 1 additional mismatch.
	      /System.map and the booted kernel may not be a
	      matched set.
	
	Then lsof exits, because the address conflict on these
	symbols between /System.map and get_kernel_syms() output
	makes lsof believe that the /System.map file and the running
	kernel do not match.  (See section 3.7.5.)

	The address mismatch for these two symbols appears to be
	a kernel bug, triggered by using the mouse/psaux loadable
	module.  Keith Parks <emkxp01@mtcc.demon.co.uk> first
	reported the problem.  He discussed it with Ted Ts'o
	<tytso@MIT.EDU>, and Ted suggested a patch to the random.h
	header file that Keith reports seems to solve the problem.

	The patch became available at Linux release 1.3.62.  If
	you have a release below that, but above 1.3.56, you should
	look at the file Linux-mouse-module.patch in the subdirectory
	.../lsof*/dialects/linux/patches.  Keith's description in
	that file has more detail than appears in this 00FAQ
	section.

3.7.7	Why does lsof complain about get_kernel_syms()?

	Linux lsof may complain:

	    lsof: WARNING: get_kernel_syms() unimplemented
		  CONFIG_MODULES not defined in autoconf.h?
	    lsof: WARNING: unable to verify symbols in /System.map
	    lsof: WARNING: uncertain kernel loader format; assuming ...

	The first complaint means that the get_kernel_syms() function
	isn't implemented in the Linux kernel, probably because
	the kernel wasn't configured for module support.  If you
	look at /usr/src/linux/include/linux/autoconf.h you'll
	probably find there's no ``#define CONFIG_MODULES'' in it.

	Lsof uses the information get_kernel_syms() supplies to
	validate the information in /System.map.  Since it is easy
	to install a new Linux kernel without installing its
	/System.map file, this lsof check is an important one.
	The "WARNING: unable to verify..." message indicates that
	lsof is unable to validate the /System.map symbols.

	The last warning, "WARNING: uncertain kernel loader format..."
	indicates that, being unable to examine the output of
	get_kernel_syms(), lsof is also unable to determine if the
	Linux kernel was loaded in COFF or ELF format.  The kernel
	load format dictates whether the kernel symbols whose
	addresses lsof requires should have a leading underscore.
	(COFF kernel symbols do.)

	When lsof can't determine the kernel load format, it assumes
	and reports a default, established by the Configure script's
	analysis of autoconf.h -- i.e., the default is ELF if
	autoconf.h contains ``#define CONFIG_KERNEL_ELF''.

	I recommend you rebuild your kernel with module support
	enabled.
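
	The autoconf.h checks described in this section can be
	sketched in Python as follows.  This is an assumption-laden
	illustration, not the Configure script itself: the function
	names are hypothetical, and treating COFF as the fallback
	when CONFIG_KERNEL_ELF is absent is inferred from the text
	above.

```python
import re


def defined_macros(header_text):
    """Collect the macro names introduced by #define lines."""
    return set(re.findall(r"^\s*#\s*define\s+(\w+)", header_text, re.MULTILINE))


def has_module_support(header_text):
    """True when the header defines CONFIG_MODULES."""
    return "CONFIG_MODULES" in defined_macros(header_text)


def default_load_format(header_text):
    """ELF when CONFIG_KERNEL_ELF is defined; COFF otherwise (assumed)."""
    return "ELF" if "CONFIG_KERNEL_ELF" in defined_macros(header_text) else "COFF"


def symbol_name(name, load_format):
    """COFF kernel symbols carry a leading underscore; ELF symbols do not."""
    return "_" + name if load_format == "COFF" else name


# Hypothetical autoconf.h contents for a kernel built without modules.
sample_header = "#define CONFIG_KERNEL_ELF 1\n#undef CONFIG_MODULES\n"
sample_format = default_load_format(sample_header)
```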

3.8	NetBSD Problems

3.8.1	Why doesn't a NetBSD 1.0A binary run on my 1.0A system?

	Apparently NetBSD uname output isn't always enough to
	identify the system on which a given lsof binary will run.
	I've had trouble on an Intel system, identified as 1.0A
	before and after it was updated.  A binary generated on
	the earlier instance wouldn't run on the later one.

	If you get a pre-compiled NetBSD binary (I don't recommend
	it), and it won't run, try building your own binary from
	the sources.

3.9	Output Problems

3.9.1	Why does an asterisk (`*') precede some inode numbers?

	An asterisk (`*') prefix on an inode number marks an inode
	number that was too large for its output field.  Typically lsof
	reserves six digits for the inode number field.  If the
	inode number is larger than that, lsof prints an asterisk
	and the last five digits of the inode number.
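
	The truncation rule just described can be sketched in
	Python.  This is a minimal illustration, not the print_file()
	code; the function name is hypothetical, and right-justifying
	short numbers in the field is an assumption.

```python
def format_inode(inode, width=6):
    """Fit an inode number into a fixed-width field.

    Numbers that fit are right-justified (an assumption).  Numbers that
    are too long are shown as an asterisk followed by their last
    width - 1 digits.
    """
    digits = str(inode)
    if len(digits) <= width:
        return digits.rjust(width)
    return "*" + digits[-(width - 1):]


fits = format_inode(123456)        # six digits, fits as-is
too_long = format_inode(1234567)   # seven digits, truncated
```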

	If you have a system where inode numbers are usually larger
	than six digits, please let me know.  There are two other
	things you can consider:
	
	    1.  You can change the source code to print a larger
		inode number field -- look at the print_file()
		function in dfile.c. The print_file() function may
		come from common/prtf.frag for many dialects; check
		Mksrc and dfile.c in the dialect sub-directory to
		see if print_file() comes from prtf.frag.

	    2.  You can specify field output (with -F, -f, and -0)
		and post-process the field output to display larger
		inode numbers.  The sample awk and Perl field
		listing scripts do that.

3.9.2	Why does the offset have ``0t'' and ``0x'' prefixes?

	The offset value that appears in the SIZE/OFF column has
	``0t'' and ``0x'' prefixes to distinguish it from size values
	that may appear in the same column.

	If the offset value is less than 100,000,000, it appears
	in decimal with a ``0t'' prefix; over 99,999,999, in
	hexadecimal with a ``0x'' prefix.

	A decimal offset is handy, for example, when tracking the
	progress of an outbound ftp transfer.  When lsof reports
	on the ftp process, it will report the size of the file
	being sent with its open descriptor; it will report the
	progress of the transfer via the offset of the outbound
	open ftp data socket descriptor.
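
	The prefix rule for offsets can be written as a short
	Python sketch.  The function name is hypothetical and the
	code only mirrors the rule stated above; it is not taken
	from the lsof sources.

```python
def format_offset(offset):
    """Format a file offset the way the SIZE/OFF column is described.

    Offsets below 100,000,000 are printed in decimal with a "0t" prefix;
    larger offsets are printed in hexadecimal with a "0x" prefix.
    """
    if offset < 100_000_000:
        return "0t%d" % offset
    return "0x%x" % offset


largest_decimal = format_offset(99_999_999)
smallest_hex = format_offset(100_000_000)
```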

3.10	SGI IRIX Problems

3.10.1	Why doesn't lsof display open IRIX 5.3 XFS files properly?

	Dave Olson <olson@anchor.engr.sgi.com>, who provided the
	IRIX 5.3 changes to lsof, says he was unable to include
	support for the new XFS file system, because of completely
	different in-core data structures for XFS inodes.

3.10.2	Where is the IRIX 5.3 <sys/vnode.h>?

	According to Dave Olson of SGI, <sys/vnode.h> is shipped
	with IRIX 5.3 in eoe1.sw.unix.  However, during the XFS
	installation or the installation of some XFS patch, it is
	installed a second time.  (So far no problem.)  However,
	if XFS or the XFS patch is removed, <sys/vnode.h> is removed,
	too.

	Some possible solutions: 1) copy <sys/vnode.h> manually
	from an IRIX 5.3 source where it still exists; or 2) mount
	the IRIX 5.3 CDROM and type:

	# inst -a -f /CDROM/dist -I eoe.sw.unix -Y /usr/include/sys/vnode.h

	The second solution was suggested by John R. Vanderpool
	<fish@daacdev1.stx.com>.

3.11	Sun Problems

3.11.1	My Sun gcc-compiled lsof doesn't work -- why?

	Gcc can be used to build lsof successfully.  However, an
	improperly installed Sun gcc compiler will usually not
	produce a working lsof.

	Under SunOS 4.1.x this may happen when the gcc compiler is
	copied from one Sun architecture -- e.g., from a sun4m to
	a sun4.  The problem comes from the copying of the special
	#include header files that gcc "fixes" during installation
	to circumvent ANSI-C conflicts, especially on #else and
	#endif pre-processor declarations.  Some of the "fixed"
	header files declare kernel structures whose length varies
	with architecture type.  In particular, the size of the
	user structure (<sys/user.h>) changes with architecture
	type.  Since lsof gets command name and file pointers
	from that structure, lsof can malfunction when the
	structure's length is incorrect.

	These architecture-related structure differences do not
	seem to occur under Solaris.  Instead, the more common
	reason a gcc-compiled lsof doesn't work there is that the
	special gcc header files were not updated during the change
	from one version of Solaris to the next -- e.g., from 2.3
	to 2.4.

	If your Sun gcc-compiled lsof doesn't report anything,
	check that the gcc "fixincludes" step was run on the system
	where you're using gcc to compile lsof.

3.11.2	How can I make lsof compile with gcc under Solaris 2.4?

	Presuming your gcc-specific header files are wrong for
	Solaris, edit the lsof Configure-generated Makefile and
	change:

		CFGF=   -Dsolaris=20400
	to
		CFGF=   -Dsolaris=20400 -D__STDC__=0 -I/usr/include

	This is only a temporary work-around.  You really should
	rerun gcc's fixincludes scripts to update your gcc-specific
	header files.

3.11.3	How can I make lsof compile with gcc under SunOS 4.1.x?

	Presuming your gcc-specific header files are wrong for
	SunOS 4.1.x, edit the lsof Configure-generated Makefile
	and change:

		CFGF=   -ansi -DSUNOSV=40103
	to
		CFGF=   -DSUNOSV=40103 -I/usr/include

	This is only a temporary work-around.  You really should
	rerun gcc's fixincludes scripts to update your gcc-specific
	header files.

3.11.4	Why does the Solaris SunPRO cc complain about system header files?

	You're probably trying to use /usr/ucb/cc if you get compiler
	complaints like:

	    cc -O -Dsun -Dsolaris=20300 ...
	    "/usr/include/sys/machsig.h", line 81: macro BUS_OBJERR
	    redefines previous macro at "/usr/ucbinclude/sys/signal.h",
	    line 444

	Note the reference to "/usr/ucbinclude/sys/signal.h".  It
	reveals that the BSD Compatibility Package C compiler is
	in use.  Lsof requires the ANSI C version of the Solaris
	C compiler, usually found in /usr/opt/bin/cc or
	/opt/SUNWspro/bin/cc.

	Try adding a CC string to the lsof Makefile that points to
	the Sun ANSI C version of the SunPRO C compiler -- e.g.,

	    CC= /usr/opt/bin/cc
	or
	    CC= /opt/SUNWspro/bin/cc

3.11.5	Why doesn't lsof work under my Solaris 2.4?

	If lsof doesn't work under your Solaris 2.4 system -- e.g.,
	it produces no output, little output, or the output is
	missing command names or file descriptors -- you may have
	a pair of conflicting Sun patches installed.

	Solaris patch 101945-32 installs a kernel that was built
	with a <sys/auxv.h> header file whose NUM_*_VECTORS
	definitions don't match the ones in the <sys/auxv.h> updated
	by Solaris patch 102303-02.

	NUM_*_VECTORS in the kernel of patch 101945-32 are smaller
	than the ones in the <sys/auxv.h> of patch 102303-02.  The
	consequence is that when lsof is compiled with the <sys/auxv.h>
	whose NUM_*_VECTORS definitions are larger than the ones
	used to compile the patched kernel, lsof's user structure
	does not align with the one that the kernel employs.
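
	Why a different array size breaks alignment can be shown
	with a Python ctypes sketch.  Everything in it is
	hypothetical: the two-member structure is a drastic
	simplification of the real user structure, the field names
	are invented, and the vector counts 12 and 16 are chosen
	only for illustration.

```python
import ctypes


class AuxEntry(ctypes.Structure):
    """Hypothetical 8-byte auxiliary vector entry."""
    _fields_ = [("a_type", ctypes.c_int32), ("a_val", ctypes.c_int32)]


def make_user_struct(num_vectors):
    """Build a toy user structure whose array length is num_vectors."""
    class User(ctypes.Structure):
        _fields_ = [
            ("u_auxv", AuxEntry * num_vectors),   # size depends on the header
            ("u_comm", ctypes.c_char * 16),       # field that follows the array
        ]
    return User


def comm_offset(num_vectors):
    """Byte offset of u_comm for a given array length."""
    return make_user_struct(num_vectors).u_comm.offset


kernel_view = comm_offset(12)   # layout the kernel was built with (made up)
lsof_view = comm_offset(16)     # layout lsof was compiled with (made up)
shift = lsof_view - kernel_view
```

	Every member that follows the array moves by the same
	amount, so a program compiled with the larger count reads
	those members from the wrong place.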

	If you have these two patches installed, contact Sun and
	complain about the mis-match.

	The lsof Configure script attempts to work around the
	mis-matched patches by including a modified <sys/auxv.h>
	header file from ./dialects/sun/include/sys.  That auxv.h
	has these alternate definitions:

		    #define NUM_GEN_VECTORS 4
		    #define NUM_SUN_VECTORS 8
	
	The Configure script issues a prominent WARNING that it is
	putting this work-around into effect.  If it doesn't succeed
	for you, please contact me.

	I thank Leif Hedstrom <leif@infoseek.com> for identifying the
	offending patches.

3.11.6	Where are the Solaris header files?

	If you try to compile lsof under Solaris and get a compiler
	complaint that it can't find system header files, perhaps
	you forgot to add the header file package, SUNWhea.

3.11.7	Where is the Solaris /usr/src/uts/<architecture>/sys/machparam.h?

	When you try to Configure lsof for Solaris 2.[234] -- e.g.,
	on a `uname -m` == sun4m system -- Configure complains:

	    grep: /usr/src/uts/sun4m/sys/machparam.h:
			No such file or directory
	    grep: /usr/src/uts/sun4m/sys/machparam.h:
			No such file or directory

	And when you try to compile the configured lsof, cc or gcc
	complains:

	    dproc.c:530: `KERNELBASE' undeclared (first use this function)

	The explanation is that somehow your Solaris system doesn't
	have the header files in /usr/src/uts it should have.  Perhaps
	someone removed the directory to save space.  Perhaps you're
	using a gcc installation, copied from another system.  In any
	event, you will have to load the header files from the SUNWhea
	package of your Solaris distribution.

	KERNELBASE is an important symbol to lsof -- it keeps lsof
	from sending an illegal kernel value to kvm_read() where
	a segmentation violation might result (a bug in the kvm
	library).  Lsof can get illegal kernel values because it
	reads, relatively slowly with kvm_read() calls, kernel
	values that the kernel is changing rapidly.

	Lsof doesn't need KERNELBASE at Solaris 2.5, because it
	has a Kernelbase value whose address lsof can find with
	/dev/ksyms and whose value it can read with kvm_read().
	Under Solaris 2.5 /usr/src/uts has moved to /usr/platform.


4.0	Lsof Features

4.1	Why doesn't lsof report on /proc entries on my system?

	/proc file system support is generally available only for
	BSD, OSF, and SYSV R4 dialects.  It's also available for
	Linux.

	Even on some SYSV R4 dialects I encountered many problems
	while trying to incorporate /proc file system support.
	The chief problem is that some vendors don't distribute
	the header file that describes the /proc file system node
	-- usually called prdata.h.

	I wasn't able to figure out how to provide /proc file system
	support under EP/IX 2.1.1 for the CDC 4680, because of
	environment conflicts.  Lsof compiles in the svr3 environment,
	but some of the functions and header files it needs for
	/proc file system support come from the svr4 environment.
	I couldn't figure out how to mix the two.

4.2	How do I disable the device cache file feature or alter
	its behavior?

	To disable the device cache file feature for a dialect,
	remove the HASDCACHE definition from the dialect's machine.h
	header file.  You can also use HASDCACHE to change the
	default prefix (``.lsof'') of the device cache file.

	Be sure you consider disabling the device cache file feature
	carefully.  Having a device cache file significantly reduces
	lsof startup overhead by eliminating a full scan of /dev
	(or /devices) once the device cache file has been created.
	That full scan also overloads the kernel's name cache with
	the names of the /dev (or /devices) nodes, reducing the
	opportunity for lsof to find path name components of open
	files.

	If you're worried about the presence of mode 0600 device
	cache files in the home directories of the real user IDs
	that execute lsof, consider these checks that lsof makes
	on the file before using it:

	    1.  To read the device cache file, lsof must gain
		permission from access(2).

	    2.  The device cache file's modes must be 0600 (0644
		if lsof is reading a system-wide device cache file)
		and its size non-zero.

	    3.  The device cache file's mtime must be greater than
		the mtime and ctime of the device directory (usually
		/dev or /devices).

	    4.  There must be a correctly formatted section count
		line at the beginning of the file.

	    5.  Each section must have a header line with a count
	        that properly numbers the lines in the section.
		Legal sections are device, clone, pseudo-device,
		and CRC.

	    6.  The lines of a section must have the proper format.

	    7.  All lines are included in a 16 bit CRC, and it is
		recorded in a non-checksummed section line at the
		end of the file.

	    8.  The checksum computed when the file is read must
		match the checksum recorded when the file was
		written.

	    9.  The checksum section line must be followed by
		end-of-information.

	   10.  Lsof must be able to get matching results from
		stat(2) on a randomly chosen entry of the device
		section.
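
	Checks 1 through 3 of the list above can be approximated
	with a Python sketch.  It is not lsof's implementation:
	the function name is hypothetical, the scratch directory
	merely stands in for /dev, and the remaining checks (section
	counts, line formats, CRC, and the stat(2) spot check) are
	omitted.

```python
import os
import stat
import tempfile


def cache_file_looks_sane(cache_path, dev_dir, system_wide=False):
    """Apply the access, mode, size, and time checks to a cache file."""
    try:
        if not os.access(cache_path, os.R_OK):       # check 1: access(2)
            return False
        cache = os.stat(cache_path)
        dev = os.stat(dev_dir)
    except OSError:
        return False
    wanted_mode = 0o644 if system_wide else 0o600    # check 2: modes and size
    if stat.S_IMODE(cache.st_mode) != wanted_mode or cache.st_size == 0:
        return False
    # check 3: cache mtime must exceed the directory's mtime and ctime
    return cache.st_mtime > dev.st_mtime and cache.st_mtime > dev.st_ctime


# Demonstration in a scratch directory; ".lsof_myhost" is a made-up name.
scratch = tempfile.mkdtemp()
dev_dir = os.path.join(scratch, "dev")
os.mkdir(dev_dir)
cache_path = os.path.join(scratch, ".lsof_myhost")
with open(cache_path, "w") as handle:
    handle.write("placeholder contents\n")
os.chmod(cache_path, 0o600)
dev_stat = os.stat(dev_dir)
newer = max(dev_stat.st_mtime, dev_stat.st_ctime) + 60
os.utime(cache_path, (newer, newer))                 # make the cache "newer"

private_ok = cache_file_looks_sane(cache_path, dev_dir)
os.chmod(cache_path, 0o644)
wrong_mode_ok = cache_file_looks_sane(cache_path, dev_dir)
```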

	For more information on the device cache file, read the
	00DCACHE file of the lsof distribution.

4.2.1	What's the risk with a perverted device cache file?

	Even with the checks that lsof makes on the device cache
	file, it's conceivable that an intruder could modify it so
	it would pass lsof's tests.

	The only serious consequence of such a modification that
	I know of is the removal of a file whose major device
	number identifies a socket from some user ID's device cache
	file.  When such a device has been removed from the device
	cache file, and when lsof doesn't detect the removal, lsof
	may not be able to identify socket files when executed by
	the affected user ID.  Only certain dialects are at risk
	from this attack -- e.g., SCO and Solaris 2.x (but not
	SunOS 4.1.x).

	If you're tracking a network intruder with lsof, that could
	be important to you.  If you suspect that someone has
	corrupted the device cache file you're using, I recommend
	you use lsof's -Di option to tell it to ignore it and use
	the contents of /dev (or /devices) instead; or remove the
	device cache file (usually .lsof_hostname, where hostname
	is the first component of the host's name returned by
	gethostname(2)) from the user ID's home directory and let
	lsof create a new one for you.

4.2.2	How do I put the full host name in a personal device cache file
	path?

	Lsof constructs the personal device cache file path name
	from a format specified in the HASPERSDC #define in the
	dialect's machine.h header file.  As distributed HASPERSDC
	declares the path to be ``.lsof_'' plus the first component
	of the host name with the format ``.lsof_%L''.

	If you want to change the way lsof constructs the personal
	device cache file path name, you can change the HASPERSDC
	#define and recompile lsof.  If, for example, you #define
	HASPERSDC to be ``.lsof_%l'' (note the lower case `l'),
	Configure and remake lsof, then the personal device cache
	file path will be ``.lsof_'' plus the host name returned
	by gethostname(2).

	See the 00DCACHE file of the lsof distribution for more
	information on the formation of the personal device cache
	file path and the use of the HASPERSDC #define.

4.2.3	How do I put the personal device cache file in /tmp?

	Change the HASPERSDC definition in your dialect's machine.h
	header file.
	
	When you redefine HASPERSDC, make sure you put at least
	one user identification conversion in it to keep separate
	the device cache files for each user of lsof.  Also give
	some thought to including the ``%0'' conversion to define
	an alternate path for setuid(root) and root processes.

	Here's a definition that puts a personal device cache file
	in /tmp with the name ``.lsof_UID''.

	    #define HASPERSDC "/tmp/.lsof_%U"

	Thus the personal device cache file path for UID 548 would
	be:

	    /tmp/.lsof_548

	You can add the login name to the path with the ``%u''
	conversion; the full host name with ``%l''; and the first
	host name component with ``%L''.

	CAUTION: be careful using absolute paths like /tmp lest
	lsof processes that are setuid(root) or whose real UID is
	root be used to exploit some security weakness via /tmp.
	Elect instead to add an alternate path for those processes
	with the ``%0'' conversion.  Here's an extension of the
	previous HASPERSDC format for /tmp that declares an alternate
	path:

	    #define HASPERSDC "/tmp/.lsof_%U%0%h/.lsof_%l"

	When the lsof process is setuid(root) or its real UID is
	root, presuming root's home directory is `/' and the host's
	name is ``vic.cc.purdue.edu'', the extended format yields:

	    /.lsof_vic.cc.purdue.edu
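
	The way the conversions combine can be sketched in Python.
	This is an illustration built only from the examples in
	this section, not lsof's code.  The function name is
	hypothetical.  Treating ``%h'' as the home directory and
	collapsing a doubled slash are assumptions made so that
	the sketch reproduces the example above; see 00DCACHE for
	the authoritative rules.

```python
import re


def expand_persdc(fmt, uid, login, home, hostname, is_root):
    """Expand a HASPERSDC-style format into a device cache file path.

    The text before "%0" is used for ordinary users; the text after it
    is the alternate used for setuid(root) and root processes.
    """
    ordinary, marker, alternate = fmt.partition("%0")
    template = alternate if (is_root and marker) else ordinary
    values = {
        "U": str(uid),                  # numeric user ID
        "u": login,                     # login name
        "h": home,                      # home directory (assumed meaning)
        "l": hostname,                  # full host name
        "L": hostname.split(".")[0],    # first host name component
    }
    path = re.sub(r"%([UuhlL])", lambda m: values[m.group(1)], template)
    # Assumption: "/" + "/.lsof_..." is reduced to a single slash.
    return re.sub(r"/{2,}", "/", path)


FORMAT = "/tmp/.lsof_%U%0%h/.lsof_%l"
HOST = "vic.cc.purdue.edu"

# "abe" and "/home/abe" are made-up account details.
user_path = expand_persdc(FORMAT, 548, "abe", "/home/abe", HOST, is_root=False)
root_path = expand_persdc(FORMAT, 0, "root", "/", HOST, is_root=True)
```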

4.3	Why doesn't lsof know about AFS files on my favorite dialect?

	Lsof currently supports AFS for these dialects:

	    Solaris 2.4
	    SunOS 4.1.4

	It may recognize AFS files on other versions of these
	dialects, but I have no way to test that.  Lsof may report
	correct information for AFS files on other dialects, but
	I can't test that either.

	AFS support must be custom crafted for each Unix dialect
	and then tested.  If lsof supports your favorite dialect,
	but doesn't recognize its AFS files, probably I don't have
	access to a test system.  If you want AFS support badly
	for your dialect, consider helping me do the development
	and testing.

4.3.1	Why doesn't lsof report node numbers for all AFS volume files,
	or how do I reveal dynamic module addresses to lsof?

	When AFS is implemented via dynamic kernel modules -- e.g.,
	in NEXTSTEP or SunOS -- lsof can't obtain the addresses of
	AFS variables in the kernel that it uses to identify AFS
	vnodes.  It can guess that a vnode is assigned to an AFS
	file and it can obtain other information about AFS files,
	but it has trouble computing AFS volume node numbers.

	To determine node numbers for AFS volumes other than the
	root volume, /afs, lsof needs access to a hashed volume
	structure pointer table.  When it can't find the address
	of that table, because AFS support is implemented via
	dynamic kernel modules, lsof will return blanks in the
	INODE column for AFS volume files.  Lsof can identify the
	root volume's node number (0), and can compute the node
	numbers for all other AFS files.

	If you have a name list file that contains the addresses
	of the AFS dynamic modules -- e.g., you saved SunOS module
	symbols when you created a loadable module kernel with
	modload(8) by specifying -sym -- lsof may be able to find
	the kernel addresses it needs in that file.

	As of revision 3.59, lsof looks up AFS dynamic kernel
	addresses for two dialects at these default paths:

	    NEXTSTEP 3.2	/usr/vice/etc/afs_loadable
	    SunOS 4.1.4		/usr/vice/etc/modload/libafs

	A different path to a name list file with AFS dynamic kernel
	addresses may be specified with the -A option, provided the
	-A option description appears in lsof's -h or -? (help) output.

	If any addresses appear in the -A name list file that also
	appear in the regular kernel name list file -- e.g., /vmunix
	-- they must match, or lsof will silently ignore the -A
	addresses on the presumption that they are out of date.
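
	The matching rule for -A addresses can be sketched in
	Python.  The function and symbol names are hypothetical
	and the addresses are made up; the sketch only mirrors the
	rule stated above and is not lsof's code.

```python
def merge_afs_addresses(kernel_syms, afs_syms):
    """Combine the regular name list with the -A name list.

    If any symbol appears in both tables with different addresses, the
    -A table is presumed out of date and ignored entirely.
    """
    for name, addr in afs_syms.items():
        if name in kernel_syms and kernel_syms[name] != addr:
            return dict(kernel_syms)
    merged = dict(kernel_syms)
    merged.update(afs_syms)
    return merged


# Made-up tables: "shared_sym" appears in both name lists.
kernel = {"shared_sym": 0xF8001000}
afs_good = {"shared_sym": 0xF8001000, "afs_table": 0xFF200000}
afs_stale = {"shared_sym": 0xF8002000, "afs_table": 0xFF200000}

accepted = merge_afs_addresses(kernel, afs_good)
rejected = merge_afs_addresses(kernel, afs_stale)
```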