I will start with some background. Before we added the Macs, we had a stable network of three PCs. We had set these machines up over two years earlier, and since then the network had never had any significant issues. A good router was installed and the server computer was properly specced for the task. The PCs were running Windows 7 and the Macs were running Mac OS X 10.6.8 Snow Leopard.
Adding the Macs to the network did not have any immediately noticeable effect on the PCs. However, when the Macs used the PCs' file sharing, file access was slow and write operations often failed. After about two days the client showed us this message:
“There was an error connecting to the server "Server". Check the server name or IP address, and then try again. If you are unable to resolve the problem contact your system administrator”.
We quickly ruled out an infrastructure problem because the Internet connection stayed up, and stayed fast, throughout all of these problems. On investigating further we found that the drive had been formatted as NTFS, even though we had asked for it to be formatted as FAT. We backed up the client's data and reformatted the drive. This fixed the intermittent read/write errors but introduced a new set of problems.
The Macs were regularly disconnected from the PCs, and when browsing the shares, files would appear for a second, disappear, and then reappear almost two minutes later. To cap it all off, the problems were intermittent. At this point we decided to investigate the server further. The event viewer showed Error 2017: “The server was unable to allocate from the system nonpaged pool because the server reached the configured limit for nonpaged pool allocations.”
Fortunately we found Alan LaMielle’s fix. His write-up explained that these SMB problems were common and were caused by a restriction Microsoft placed in Windows 7 to encourage people to buy a server operating system. After making the two registry changes Alan recommended, we thought we had the problem fixed.
First, set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache to ‘1’.
Second, set HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\Size to ‘3’.
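If the same two changes have to be applied to more than one machine, they can be packaged as a .reg file for regedit to import. The Python sketch below generates one. The helper name render_reg and the output file name smb-tweaks.reg are hypothetical choices; the key paths and values are the two from Alan's fix. Back up the registry before importing, and expect to need a restart (of the Server service or the whole machine) before the changes take effect.

```python
"""Write the two registry tweaks from Alan LaMielle's fix to a .reg file.

A sketch, not a deployment tool: render_reg and smb-tweaks.reg are
hypothetical names. Review the file before importing it with regedit.
"""

# (key path, value name, DWORD data) for each change described in the post
TWEAKS = [
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management",
     "LargeSystemCache", 1),
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters",
     "Size", 3),
]


def render_reg(tweaks):
    """Return the text of a regedit 5.00 file that sets each DWORD value."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, name, data in tweaks:
        lines.append(f"[{key}]")
        # .reg files store DWORD data as eight hexadecimal digits
        lines.append(f'"{name}"=dword:{data:08x}')
        lines.append("")
    # Windows line endings, as regedit itself writes them
    return "\r\n".join(lines)


if __name__ == "__main__":
    # regedit writes version 5.00 files as UTF-16 with a byte-order mark;
    # newline="" stops Python from translating the \r\n we already added.
    with open("smb-tweaks.reg", "w", encoding="utf-16", newline="") as f:
        f.write(render_reg(TWEAKS))
```

The script can be run on any machine; the resulting file is then imported on the Windows 7 server by double-clicking it or with regedit /s smb-tweaks.reg from an elevated prompt.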
Unfortunately the network was still very slow and the intermittent disconnections persisted. We returned to looking at the Macs. Research revealed that we needed to adjust the Macs' network configuration, so we typed the following into Terminal:
sudo sysctl -w net.inet.tcp.delayed_ack=0
This changes when the Mac acknowledges (ACKs) the TCP packets it receives. There are four settings (from http://www.small-tree.com/kb_results.asp?id=1):
delayed_ack=0 responds after every packet (OFF)
delayed_ack=1 always employs delayed ack, 6 packets can get 1 ack
delayed_ack=2 immediate ack after 2nd packet, 2 packets per ack (Compatibility Mode)
delayed_ack=3 should auto detect when to employ delayed ack, 4 packets per ack. (DEFAULT)
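The trade-off in the table above can be put into rough numbers. The sketch below is a simplified model, assuming each mode acknowledges a fixed number of packets per ACK exactly as the table states; in reality mode 3 decides dynamically and TCP stacks also send an ACK when a delay timer expires. Turning delayed ACK off costs more ACK packets, but a commonly cited reason it helps with file sharing is that the sending machine is no longer left waiting for an ACK that the receiver is deliberately holding back. Note also that a value set with sysctl -w normally lasts only until the next reboot; on Mac OS X of this vintage it could be made permanent by adding the line net.inet.tcp.delayed_ack=0 to /etc/sysctl.conf.

```python
"""Rough ACK counts for the delayed_ack modes in the table.

Simplified model: each mode is assumed to send one ACK per fixed number
of packets, using the table's figures. Illustrative only.
"""
import math

PACKETS_PER_ACK = {
    0: 1,  # delayed ACK off: acknowledge every packet
    1: 6,  # always delay: up to 6 packets per ACK
    2: 2,  # compatibility mode: ACK after every 2nd packet
    3: 4,  # auto-detect (default): treated here as 4 packets per ACK
}


def acks_for(packets, mode):
    """Number of ACKs sent for `packets` received segments under `mode`."""
    return math.ceil(packets / PACKETS_PER_ACK[mode])


if __name__ == "__main__":
    for mode in sorted(PACKETS_PER_ACK):
        print(f"delayed_ack={mode}: {acks_for(100, mode)} ACKs for 100 packets")
```

Under this model, 100 packets cost 100 ACKs with delayed_ack=0 against 25 with the default, which is the extra traffic accepted in exchange for not stalling the sender.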
We ran this command and restarted the Macs. The connection sped up, but it remained unreliable. Further research suggested enabling Internet Connection Sharing; we did this and it made no difference at all. Further research recommended disabling IPv6. This improved the stability of the connection, but the network connections still dropped every three days, and when they did the only solution was to restart the server.
We decided to look at the server again. Further research revealed that we needed to disable SMB2 on the server, which meant changing the registry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\SMB2 was set to a DWORD value of 0 (00000000).
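To confirm the change is in place after a restart, the value can be read back. The Python sketch below assumes the SMB2 DWORD under the LanmanServer Parameters key behaves as described above: 0 means SMB2 is disabled, while 1 or no value at all means it is enabled. The function names are hypothetical, and reading the registry only works on Windows (winreg is a Windows-only standard library module); elsewhere only the interpretation function can run.

```python
"""Report whether SMB2 is disabled on a Windows 7 file server (sketch)."""
import sys

PARAMS_KEY = r"SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"


def smb2_enabled(value):
    """Interpret the SMB2 value: 0 disables SMB2; an absent value (None) or 1 leaves it on."""
    return value != 0


def read_smb2_value():
    """Return the SMB2 DWORD from HKEY_LOCAL_MACHINE, or None if it is not set."""
    import winreg  # only available on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PARAMS_KEY) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, "SMB2")
        except FileNotFoundError:
            return None  # value missing: Windows uses its default (enabled)
    return value


if __name__ == "__main__" and sys.platform == "win32":
    state = "enabled" if smb2_enabled(read_smb2_value()) else "disabled"
    print(f"SMB2 is {state} on this server")
```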
Network performance was still slow, but the network was now reliable. To speed up the network, we have configured the Server service on the Windows 7 machine to restart at the start of every day. It is not an optimal solution, but it works well. We notice that approximately every 24 hours an Error 2019 is recorded in the event viewer. This error means the server ran out of nonpaged pool memory, which usually points to a memory leak. I am not sure why these changes would cause a memory leak, but everything seems to be working well now, so we are leaving what works alone. Once the system has proved completely stable I will probably take Robert Pearman's advice and use Poolmon to find what is causing the leak.
Sometimes a computer problem can only be solved by a mixture of persistence and pragmatism, as was the case here.