One of my clients has a Windows 2000 SP4/Exchange 2000 SP3 server that I support. They know it needs to be replaced soon, but because they are already handling too many other projects, they are attempting to hold off on migrating to Exchange 2007 until summer 2008.
Everything had been running great on the server until the beginning of March 2007. All of a sudden, the inetinfo.exe process would jump to 100% utilization and lock up the server. It would literally take 25 seconds for a character to appear after you entered a command. Rebooting the machine fixed the problem temporarily, but it always returned within an hour.
To give you a little background, this is a Compaq Presario server that also runs the following items:
- Active Directory
- DNS and global catalog
- Symantec Corporate Edition for file level anti-virus
- Symantec Mail Security for Exchange anti-virus
- GFI MailEssentials 12 for spam filtering
- IIS/Outlook Web Access
- Automatic Updates/Microsoft Update
Before anyone mentions it, yes, I do know it’s not a good idea to run an Exchange or IIS server as a domain controller, but to make a long story short, that’s just the way it’s going to be for the time being.
When I saw Automatic Updates was enabled and the server was unresponsive, I initially thought it was the svchost/msi problem, but that didn’t end up being a factor.
First, I double-checked that the correct Exchange files and directories were excluded from the Symantec Corporate file level scanner. Next, I made sure the Mail Security software didn’t show any errors or warnings.
I also verified that no well-meaning administrator had attempted to ‘optimize’ the server without my knowledge.
Of course I tried stopping the SMTP service and all the MS Exchange services, but that made no difference. The server was so unresponsive that I had to use the wonderful PsService.exe from another machine in order to stop services. I verified that services unused in their environment, such as POP3/IMAP, were disabled. Even after changing the MS Exchange services, Symantec services, www, IIS admin, and SMTP service to manual start and rebooting, inetinfo.exe still slowly increased its CPU utilization to 100%.
My next step was to look in the event logs for a clue as to the culprit. The pertinent events included:
Event Source: Service Control Manager
Event Type: Error
Event ID: 7031
Description: The IIS Admin Service service terminated unexpectedly. It has done this x time(s).
Event Source: Service Control Manager
Event Type: Error
Event ID: 7034
Description: The Simple Mail Transport Protocol (SMTP) service terminated unexpectedly. It has done this x time(s).
The 7031 event came up with tons of hits on eventid.net. I’ll save you the boring details, but rest assured I tried every single suggestion posted, and nothing fixed my problem once the GFI services started back up. The same thing happened with the 7034 error: nothing I tried fixed the situation.
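If you have a long run of these Service Control Manager errors, a quick tally of which service is crashing most often can help decide where to look first. The following is only a minimal Python sketch: it assumes you have already exported the event descriptions as plain text strings (how you export them is up to you), and the function name crash_counts is my own invention, not part of any Microsoft tool.

```python
import re
from collections import Counter

# Matches the wording used by the 7031/7034 descriptions quoted above.
CRASH = re.compile(r"The (.+?) service terminated unexpectedly")

def crash_counts(descriptions):
    """Count unexpected terminations per service name.

    `descriptions` is any iterable of event description strings
    (hypothetical input: one string per exported event).
    """
    counts = Counter()
    for text in descriptions:
        match = CRASH.search(text)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling crash_counts(...).most_common() on the exported descriptions lists the noisiest service first. It will not tell you why the service died, only which one to chase.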
I was running out of things to try, so I stopped the DNS server on the Exchange box and pointed it to a different DNS server, then rebooted – still no change. I ran dcdiag.exe and netdiag.exe to look for communication errors, but none were found.
I then remembered a SysInternals utility called filemon that lets you see in real time the operations a particular process is executing. I targeted the inetinfo.exe process, and was stunned to see it was opening, reading, writing to, and closing two different files, weights.bsp and scan.log, 133 times per second each. Now I’m no programming whiz, but that seems like an awful lot of disk activity for an old server to handle.
After a quick analysis I discovered the files inetinfo.exe was accessing belonged to the GFI MailEssentials spam filtering software! I couldn’t believe I had forgotten to disable those services. I quickly did, rebooted, and crossed my fingers that inetinfo.exe wouldn’t start hogging the CPU. We went to lunch, came back, and inetinfo.exe was behaving itself. I was fairly sure I had caught the culprit, but now I had to figure out a solution to deal with it. I re-enabled all the services I had disabled (except the GFI related ones).
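For anyone who wants to put a number on that kind of disk activity rather than eyeballing a scrolling window, here is a rough Python sketch of the arithmetic. It assumes the capture has already been converted into (seconds, process, operation, path) tuples, which is a simplified format I made up for illustration and not filemon's actual log layout. The function name and the example path are hypothetical too.

```python
from collections import Counter

def access_rates(events, process_name, duration_seconds):
    """Return {path: operations per second} for one process.

    `events` is an iterable of (seconds, process, operation, path)
    tuples (an assumed, simplified capture format).
    `duration_seconds` is the length of the capture window.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    counts = Counter()
    wanted = process_name.lower()
    for _seconds, process, _operation, path in events:
        # Windows process names and paths are case-insensitive.
        if process.lower() == wanted:
            counts[path.lower()] += 1
    return {path: n / duration_seconds for path, n in counts.items()}

if __name__ == "__main__":
    # Hypothetical two-second capture with a made-up directory.
    sample = [(i / 133.0, "inetinfo.exe", "WRITE", r"C:\example\weights.bsp")
              for i in range(266)]
    for path, rate in access_rates(sample, "inetinfo.exe", 2.0).items():
        print(f"{path}: {rate:.0f} ops/sec")
```

Sorting the result by rate makes the files being hammered float to the top, which is exactly what pointed me at the spam filter.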
First thing I did was verify the server was running the most recent build of the GFI software (it was). Next was a trip to the GFI knowledge base. I found two articles relevant to inetinfo.exe. The first pointed me to a Microsoft KB article that referred to using the Exchange 2003 routing engine on an Exchange 2000 server. That didn’t sound like it applied to me, but I followed the link to the SMTP security update it said to install anyway. Since I had Automatic Updates installed, I was fairly confident I already had this critical update installed. I went into add/remove programs, but it wasn’t listed!
I remembered Windows Update doesn’t update all products, so I went to update.microsoft.com to scan for missing patches. I was greeted with the following message:
“The site cannot continue because one or more of these Windows services is not running
- Automatic Updates
- Event Log
- Background Intelligent Transfer Service (BITS)”
Now normally, I’d just go restart the offending service, but much to my surprise, all three were already running! I rebooted again, and suddenly Microsoft Update was able to scan my server. It found two missing critical security updates (even though AU is set to download and install automatically) for Office 2003 (which requires Office 2003 SP2) and Word Viewer 2003 (no Office products are installed except the Word Viewer). I let MU install both updates, then after rebooting scanned the computer with MBSA 2.0.1, which didn’t find anything earth-shatteringly wrong.
After some more querying in the Microsoft knowledge base, I found I was missing the Update Rollup for Exchange 2000 post service pack 3 version 6603.1. I installed it, then rebooted. Nothing was different with the system’s performance when GFI was enabled, but I wanted to make sure I was 100% patched in case I had to call technical support.
I was beginning to get concerned about the patching process. Neither Microsoft Update nor the most recent version of MBSA found any missing patches, but when I manually searched the MS download site I found vastly different numbers of articles depending on how I worded my query. For example:
- Exchange 2000 – 169 items
- Exchange 2000 patch – 33 items
- Exchange 2000 rollup – 15 items
- Exchange 2000 security update – 13 items
So my question is, how exactly are we supposed to know if all the critical updates are actually installed? I know Exchange 2000 is not officially supported anymore, but why can’t I still go to one place for all the updates?
Anyway, back to the Update Rollup post install. After the reboot I checked the event logs, and I now had a brand new error:
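One low-tech way to at least double-check yourself is to keep your own list of the KB numbers you expect and compare it against whatever the server reports. The Python sketch below assumes you have pasted the installed-updates listing (for example, copied out of add/remove programs) into a text string. The helper names are hypothetical, and it only catches updates that identify themselves with a KB number in that listing.

```python
import re

# KB article IDs as they appear in this post, e.g. KB890066.
KB_PATTERN = re.compile(r"KB\d{6,7}", re.IGNORECASE)

def extract_kb_ids(text):
    """Return the set of KB IDs found in free-form text, upper-cased."""
    return {match.upper() for match in KB_PATTERN.findall(text)}

def missing_patches(required, installed_text):
    """Return the required KB IDs that do not appear in installed_text."""
    installed = extract_kb_ids(installed_text)
    return sorted(kb.upper() for kb in required if kb.upper() not in installed)
```

An empty result only means every KB on your list showed up in the text. It says nothing about patches you never knew to put on the list, which is the real problem I was running into.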
Event: 1, Source: ExWin
The Exchange IFS failed to map drive <drive letter>:. Please free drive <drive letter>: to use Exchange IFS.
I went into Windows Explorer, and yep, my M: was now missing! I went through all the information at Eventid.net, and the only thing that really applied to my situation was KB305145, which tells you how to remove the M: mapping. I thought maybe it would give me an idea about what to do by reverse engineering the process, but it wasn’t until I got to the end of the article that I saw:
“You can safely ignore this error message; it will be removed in later versions of Exchange 2000.”
Later versions of Exchange 2000?
So I determined the above error was a false alarm, and I decided to take a different approach to the problem. I knew inetinfo.exe was involved in the problem, so I decided to search on IIS patches. I found KB830695, which describes increased memory usage in the inetinfo.exe process if delivery restrictions are set on the SMTP connector – but they weren’t set.
I searched the knowledge base on “inetinfo.exe update” and received 143 hits. I then searched on “inetinfo.exe update 2000”, and it returned 82 hits.
I started wading through the articles, and found KB885882, which referred me to Microsoft Security Bulletin MS04-035, which led me to KB890066, which is another missing security update for Exchange 2000! So I installed KB890066 (which wasn’t listed in add/remove programs or by netdiag) and rebooted. Again.
Here’s a summary of what I ended up manually installing:
Remember, none of these patches were found missing by Microsoft Update or MBSA.
After the reboot, I was greeted with the following message in the Event Log.
Event: 1005 Source: MSExchangeSA
“Unexpected error <<0xC1050000 – Network Problems are preventing connection to the Microsoft Exchange Server Computer. An unexpected unknown error has occurred. Microsoft Exchange Server Information Store ID no: 80040115-0514-00000006>> occurred.”
Once again, I head over to Eventid.net to look up the error code, and the following note is the first thing I see:
“Event 1005 seems to be a general event code used to indicate that something the System Attendant needs is missing or corrupt.”
By now I’m ready to throw in the towel. The last thing I need is corrupted Exchange databases; I’ve dealt with that nightmare before. So I do some more investigating, and find, much to my surprise, that all the Exchange services had started.
I read through all of eventid.net’s suggestion and the following KB articles:
and found no resolution to my problem. One thing I did find was KB242450, which describes the incredibly messed up and not user friendly keywords you need to use to find pertinent information in Microsoft’s screwed up knowledge base! I mean, if I hadn’t found this article I wouldn’t know about keywords such as kbExchange2000Serv and kbExchangeServ2003 (thanks for the consistent naming conventions!)
After installing the patches listed above, I rebooted and saw the following message:
“SCSI ID 2 failed – degraded array”
I stopped what I was doing, and ordered a new hard drive.
It took forever to boot up, and I actually thought it might be terminal. Finally I was able to log in, and saw the following services did not start:
- MS Exchange Event Log
- MS Exchange Information Store
- MS Exchange MTA stacks
Luckily, I was able to start them manually.
Based on the information I had gathered, I deduced the failing array was causing write errors (which I never saw logged, not even ftdisk errors) that surfaced as inetinfo.exe problems when the GFI spam filter was doing its thing. After the array error, I didn’t want to change anything else, and immediately started backing up the server.
I left the GFI services turned off, and my Exchange PreSubmission queue eventually emptied itself a few hours later.
One person who was working on the issue with me found an article on the GFI knowledge base that suggested this was a DNS problem, but I went back and ran all the netdiag/dcdiag dns tests, and didn’t find any errors.
I hear everything was fine after replacing the drive (I was at another client’s site the next day). I found it strange that I didn’t get hard drive warnings until the array actually failed.
To summarize what I learned in this experience:
- Don’t assume you’re fully patched, even if you use Windows Update/Microsoft Update/MBSA
- When weird shit starts happening all of a sudden, don’t wait to suspect hardware until everything else you’ve tried doesn’t work
- 100% utilization in one process may be caused by a totally different process
- Back up at the first sign of hardware failure