Chapter 13. Performance, Reliability, and Availability

Table of Contents

Introduction
Dissection and Discussion
Guidelines for Reliable Samba Operation
Name Resolution
Samba Configuration
Use and Location of BDCs
Use One Consistent Version of MS Windows Client
For Scalability, Use SAN-Based Storage on Samba Servers
Distribute Network Load with MSDFS
Replicate Data to Conserve Peak-Demand Wide-Area Bandwidth
Hardware Problems
Large Directories
Key Points Learned

Well, you have reached one of the last chapters of this book. It is customary to attempt to wrap up the theme and contents of a book in what is generally regarded as the chapter that should draw conclusions. This book is a suspense thriller, and since the plot of the stories told mostly lead you to bigger, better Samba-3 networking solutions, it is perhaps appropriate to close this book with a few pertinent comments regarding some of the things everyone can do to deliver a reliable Samba-3 network.

 

In a world so full of noise, how can the sparrow be heard?

 
 --Anonymous

Introduction

The sparrow is a small bird whose sounds are drowned out by the noise of the busy world it lives in. Likewise, the simple steps that can be taken to improve the reliability and availability of a Samba network are often drowned out by the volume of discussions about grandiose Samba clustering designs. This is not intended to suggest that clustering is not important, because clearly it is. This chapter does not devote itself to discussion of clustering because each clustering methodology uses its own custom tools and methods. Only passing comments are offered concerning these methods.

A search for “samba cluster” produced 71,600 hits. And a search for “highly available samba” and “highly available windows” produced an amazing number of references. It is clear from the resources on the Internet that Windows file and print services availability, reliability, and scalability are of vital interest to corporate network users.

So without further background, you can review a checklist of simple steps that can be taken to ensure acceptable network performance while keeping costs of ownership well under control.

Dissection and Discussion

If it is your purpose to get the best mileage out of your Samba servers, there is one rule that must be obeyed. If you want the best, keep your implementation as simple as possible. You may well be forced to introduce some complexities, but you should do so only as a last resort.

Simple solutions are likely to be easier to get right than are complex ones. They certainly make life easier for your successor. Simple implementations can be more readily audited than can complex ones.

Problems reported by users fall into three categories: configurations that do not work, those that have broken behavior, and poor performance. The term broken behavior means that the function of a particular Samba component appears to work sometimes, but not at others. The resulting intermittent operation is clearly unacceptable. An example of broken behavior known to many Windows networking users occurs when the list of Windows machines in MS Explorer changes, sometimes listing machines that are running and at other times not listing them even though the machines are in use on the network.

A significant number of reports concern problems with the smbfs file system driver that is part of the Linux kernel, not part of Samba. Users continue to interpret that smbfs is part of Samba, simply because Samba includes the front-end tools that are used to manage smbfs-based file service connections. So, just for the record, the tools smbmnt, smbmount, smbumount, and smbumnt are front-end facilities to core drivers that are supplied as part of the Linux kernel. These tools share a common infrastructure with some Samba components, but they are not maintained as part of Samba and are really foreign to it.

The new project, cifsfs, is destined to replace smbfs. It, too, is not part of Samba, even though one of the Samba Team members is a prime mover in this project.

Table 13.1 lists typical causes of:

  • Not Working (NW)

  • Broken Behavior (BB)

  • Poor Performance (PP)

Table 13.1. Effect of Common Problems

Problem

NW

BB

PP

File locking

-

X

-

Hardware problems

X

X

X

Incorrect authentication

X

X

-

Incorrect configuration

X

X

X

LDAP problems

X

X

-

Name resolution

X

X

X

Printing problems

X

X

-

Slow file transfer

-

-

X

Winbind problems

X

X

-

It is obvious to all that the first requirement (as a matter of network hygiene) is to eliminate problems that affect basic network operation. This book has provided sufficient working examples to help you to avoid all these problems.

Guidelines for Reliable Samba Operation

Your objective is to provide a network that works correctly, can grow at all times, is resilient at times of extreme demand, and can scale to meet future needs. The following subject areas provide pointers that can help you today.

Name Resolution

There are three basic current problem areas: bad hostnames, routed networks, and network collisions. These are covered in the following discussion.

Bad Hostnames

When configured as a DHCP client, a number of Linux distributions set the system hostname to localhost. If the parameter netbios name is not specified to something other than localhost, the Samba server appears in the Windows Explorer as LOCALHOST. Moreover, the entry in the /etc/hosts on the Linux server points to IP address 127.0.0.1. This means that when the Windows client obtains the IP address of the Samba server called LOCALHOST, it obtains the IP address 127.0.0.1 and then proceeds to attempt to set up a NetBIOS over TCP/IP connection to it. This cannot work, because that IP address is the local Windows machine itself. Hostnames must be valid for Windows networking to function correctly.

A few sites have tried to name Windows clients and Samba servers with a name that begins with the digits 1-9. This does not work either because it may result in the client or server attempting to use that name as an IP address.

A Samba server called FRED in a NetBIOS domain called COLLISION in a network environment that is part of the fully-qualified Internet domain namespace known as parrots.com, results in DNS name lookups for fred.parrots.com and collision.parrots.com. It is therefore a mistake to name the domain (workgroup) collision.parrots.com, since this results in DNS lookup attempts to resolve fred.parrots.com.parrots.com, which most likely fails given that you probably do not have this in your DNS namespace.

Note

An Active Directory realm called collision.parrots.com is perfectly okay, although it too must be capable of being resolved via DNS, something that functions correctly if Windows 200x ADS has been properly installed and configured.

Routed Networks

NetBIOS networks (Windows networking with NetBIOS over TCP/IP enabled) makes extensive use of UDP-based broadcast traffic, as you saw during the exercises in ???.

UDP broadcast traffic is not forwarded by routers. This means that NetBIOS broadcast-based networking cannot function across routed networks (i.e., multi-subnet networks) unless special provisions are made:

  • Either install on every Windows client an LMHOSTS file (located in the directory C:\windows\system32\drivers\etc). It is also necessary to add to the Samba server smb.conf file the parameters remote announce and remote browse sync. For more information, refer to the online manual page for the smb.conf file.

  • Or configure Samba as a WINS server, and configure all network clients to use that WINS server in their TCP/IP configuration.

Note

The use of DNS is not an acceptable substitute for WINS. DNS does not store specific information regarding NetBIOS networking particulars that get stored in the WINS name resolution database and that Windows clients require and depend on.

Network Collisions

Excessive network activity causes NetBIOS network timeouts. Timeouts may result in blue screen of death (BSOD) experiences. High collision rates may be caused by excessive UDP broadcast activity, by defective networking hardware, or through excessive network loads (another way of saying that the network is poorly designed).

The use of WINS is highly recommended to reduce network broadcast traffic, as outlined in ???.

Under no circumstances should the facility be supported by many routers, known as NetBIOS forwarding, unless you know exactly what you are doing. Inappropriate use of this facility can result in UDP broadcast storms. In one case in 1999, a university network became unusable due to NetBIOS forwarding being enabled on all routers. The problem was discovered during performance testing of a Samba server. The maximum throughput on a 100-Base-T (100 MB/sec) network was less than 15 KB/sec. After the NetBIOS forwarding was turned off, file transfer performance immediately returned to 11 MB/sec.

Samba Configuration

As a general rule, the contents of the smb.conf file should be kept as simple as possible. No parameter should be specified unless you know it is essential to operation.

Many UNIX administrators like to fully document the settings in the smb.conf file. This is a bad idea because it adds content to the file. The smb.conf file is re-read by every smbd process every time the file timestamp changes (or, on systems where this does not work, every 20 seconds or so).

As the size of the smb.conf file grows, the risk of introducing parsing errors also increases. It is recommended to keep a fully documented smb.conf file on hand, and then to operate Samba only with an optimized file.

The preferred way to maintain a documented file is to call it something like smb.conf.master. You can generate the optimized file by executing:

root#  testparm -s smb.conf.master > smb.conf

You should carefully observe all warnings issued. It is also a good practice to execute the following command to confirm correct interpretation of the smb.conf file contents:

root#  testparm
Load smb config files from /etc/samba/smb.conf
Can't find include file /etc/samba/machine.
Processing section "[homes]"
Processing section "[print$]"
Processing section "[netlogon]"
Processing section "[Profiles]"
Processing section "[printers]"
Processing section "[media]"
Processing section "[data]"
Processing section "[cdr]"
Processing section "[apps]"
Loaded services file OK.
'winbind separator = +' might cause problems with group membership.
Server role: ROLE_DOMAIN_PDC
Press enter to see a dump of your service definitions

You now, of course, press the enter key to complete the command, or else abort it by pressing Ctrl-C. The important thing to note is the noted Server role, as well as warning messages. Noted configuration conflicts must be remedied before proceeding. For example, the following error message represents a common fatal problem:

ERROR: both 'wins support = true' and 'wins server = <server list>' 
cannot be set in the smb.conf file. nmbd will abort with this setting.

There are two parameters that can cause severe network performance degradation: socket options and socket address. The socket options parameter was often necessary when Samba was used with the Linux 2.2.x kernels. Later kernels are largely self-tuning and seldom benefit from this parameter being set. Do not use either parameter unless it has been proven necessary to use them.

Another smb.conf parameter that may cause severe network performance degradation is the strict sync parameter. Do not use this at all. There is no good reason to use this with any modern Windows client. The strict sync is often used with the sync always parameter. This, too, can severely degrade network performance, so do not set it; if you must, do so with caution.

Finally, many network administrators deliberately disable opportunistic locking support. While this does not degrade Samba performance, it significantly degrades Windows client performance because this disables local file caching on Windows clients and forces every file read and written to invoke a network read or write call. If for any reason you must disable oplocks (opportunistic locking) support, do so only on the share on which it is required. That way, all other shares can provide oplock support for operations that are tolerant of it. See ??? for more information.

Use and Location of BDCs

On a network segment where there is a PDC and a BDC, the BDC carries the bulk of the network logon processing. If the BDC is a heavily loaded server, the PDC carries a greater proportion of authentication and logon processing. When a sole BDC on a routed network segment gets heavily loaded, it is possible that network logon requests and authentication requests may be directed to a BDC on a distant network segment. This significantly hinders WAN operations and is undesirable.

As a general guide, instead of adding domain member servers to a network, you would be better advised to add BDCs until there are fewer than 30 Windows clients per BDC. Beyond that ratio, you should add domain member servers. This practice ensures that there are always sufficient domain controllers to handle logon requests and authentication traffic.

Use One Consistent Version of MS Windows Client

Every network client has its own peculiarities. From a management perspective, it is easier to deal with one version of MS Windows that is maintained to a consistent update level than it is to deal with a mixture of clients.

On a number of occasions, particular Microsoft service pack updates of a Windows server or client have necessitated special handling from the Samba server end. If you want to remain sane, keep you client workstation configurations consistent.

For Scalability, Use SAN-Based Storage on Samba Servers

Many SAN-based storage systems permit more than one server to share a common data store. Use of a shared SAN data store means that you do not need to use time- and resource-hungry data synchronization techniques.

The use of a collection of relatively low-cost front-end Samba servers that are coupled to a shared backend SAN data store permits load distribution while containing costs below that of installing and managing a complex clustering facility.

Distribute Network Load with MSDFS

Microsoft DFS (distributed file system) technology has been implemented in Samba. MSDFS permits data to be accessed from a single share and yet to actually be distributed across multiple actual servers. Refer to TOSHARG2, Chapter 19, for information regarding implementation of an MSDFS installation.

The combination of multiple backend servers together with a front-end server and use of MSDFS can achieve almost the same as you would obtain with a clustered Samba server.

Replicate Data to Conserve Peak-Demand Wide-Area Bandwidth

Consider using rsync to replicate data across the WAN during times of low utilization. Users can then access the replicated data store rather than needing to do so across the WAN. This works best for read-only data, but with careful planning can be implemented so that modified files get replicated back to the point of origin. Be careful with your implementation if you choose to permit modification and return replication of the modified file; otherwise, you may inadvertently overwrite important data.

Hardware Problems

Networking hardware prices have fallen sharply over the past 5 years. A surprising number of Samba networking problems over this time have been traced to defective network interface cards (NICs) or defective HUBs, switches, and cables.

Not surprising is the fact that network administrators do not like to be shown to have made a bad decision. Money saved in buying low-cost hardware may result in high costs incurred in corrective action.

Defective NICs, HUBs, and switches may appear as intermittent network access problems, intermittent or persistent data corruption, slow network throughput, low performance, or even as BSOD problems with MS Windows clients. In one case, a company updated several workstations with newer, faster Windows client machines that triggered problems during logon as well as data integrity problems on an older PC that was unaffected so long as the new machines were kept shut down.

Defective hardware problems may take patience and persistence before the real cause can be discovered.

Networking hardware defects can significantly impact perceived Samba performance, but defective RAID controllers as well as SCSI and IDE hard disk controllers have also been known to impair Samba server operations. One business came to this realization only after replacing a Samba installation with MS Windows Server 2000 running on the same hardware. The root of the problem completely eluded the network administrator until the entire server was replaced. While you may well think that this would never happen to you, experience shows that given the right (unfortunate) circumstances, this can happen to anyone.

Large Directories

There exist applications that create or manage directories containing many thousands of files. Such applications typically generate many small files (less than 100 KB). At the best of times, under UNIX, listing of the files in a directory that contains many files is slow. By default, Windows NT, 200x, and XP Pro cause network file system directory lookups on a Samba server to be performed for both the case preserving file name as well as for the mangled (8.3) file name. This incurs a huge overhead on the Samba server that may slow down the system dramatically.

In an extreme case, the performance impact was dramatic. File transfer from the Samba server to a Windows XP Professional workstation over 1 Gigabit Ethernet for 250-500 KB files was measured at approximately 30 MB/sec. But when tranferring a directory containing 120,000 files, all from 50KB to 60KB in size, the transfer rate to the same workstation was measured at approximately 1.5 KB/sec. The net transfer was on the order of a factor of 20-fold slower.

The symptoms that will be observed on the Samba server when a large directory is accessed will be that aggregate I/O (typically blocks read) will be relatively low, yet the wait I/O times will be incredibly long while at the same time the read queue is large. Close observation will show that the hard drive that the file system is on will be thrashing wildly.

Samba-3.0.12 and later, includes new code that radically improves Samba perfomance. The secret to this is really in the case sensitive = True line. This tells smbd never to scan for case-insensitive versions of names. So if an application asks for a file called FOO, and it can not be found by a simple stat call, then smbd will return "file not found" immediately without scanning the containing directory for a version of a different case.

Canonicalize all the files in the directory to have one case, upper or lower - either will do. Then set up a new custom share for the application as follows:

	[bigshare]
			path = /data/xrayfiles/neurosurgeons/
			read only = no
			case sensitive = True
			default case = upper
			preserve case = no
			short preserve case = no
	

All files and directories under the path directory must be in the same case as specified in the smb.conf stanza. This means that smbd will not be able to find lower case filenames with these settings. Note, this is done on a per-share basis.

Key Points Learned

This chapter has touched in broad sweeps on a number of simple steps that can be taken to ensure that your Samba network is resilient, scalable, and reliable, and that it performs well.

Always keep in mind that someone is responsible to maintain and manage your design. In the long term, that may not be you. Spare a thought for your successor and give him or her an even break.

Last, but not least, you should not only keep the network design simple, but also be sure it is well documented. This book may serve as your pattern for documenting every aspect of your design, its implementation, and particularly the objects and assumptions that underlie it.