Access is denied when attempting to view or restore Volume Shadow Copy contents

Yesterday I set up our help desk users to restore documents on remote servers using Microsoft’s Volume Shadow Copy client. Everything worked fine for me as an administrator, and for users who owned the files, but it didn’t work for the help desk folks. I found out they didn’t have NTFS rights to the files and folders, so I assumed all I had to do was assign them change permissions, and they’d be able to do the restore.

I made the permission change, but when the help desk folks tried to view the contents of the shadow copy snapshots they received “Access Denied” errors.  I had them confirm they could UNC to the location where the snapshots were located, and they could create and delete files there.

After much Googling turned up few troubleshooting ideas, I decided to manually create a new snapshot of the same volume. I had them test again, and they were able to view the snapshot’s contents and restore files. The underlying cause was that the help desk group didn’t have permissions within the original snapshots: a snapshot preserves the NTFS permissions that were in effect when it was taken, so the change I made afterward only applied to snapshots created from then on. Hope this helps someone else out.

Fix: Groupwise webaccess hanging with “Request aborted while waiting on locked conversation” in webaccess log file

Users of our external Groupwise 6.5.6 webaccess gateway have been experiencing problems with their sessions hanging when trying to open items. Users accessing our internal webaccess gateways, which are the same version and configuration as the external gateway, have not been experiencing this problem. All Groupwise servers were running NetWare 6.5.5 and were patched up to FTF GW 6.5 post SP6 English only Agents Rev 6.

It did not matter which browser the external users were using; the same problem appeared for users of Firefox 2.x, 3.x, IE6, and IE7. They’d try to open an item, and their browser would appear to hang for anywhere from a few seconds to a few minutes. Rebooting the webaccess server did not have any effect on the problem.

The following message was seen in the webaccess log file:

Error: Request aborted while waiting on locked conversation

The only relevant Novell TID I found was TID #10023251. It describes this exact error message, but it specifically states the error only occurs when trying to view an attachment, which wasn’t the case for our users, since the error was logged when they were just navigating through the webaccess client.

The TID was not helpful in any case, since it states that the only fix is to have the user perform the action again (i.e. resend the message) and that there are no configurable parameters to increase this timeout.

Here’s what we did to troubleshoot this problem:

The first thing I did was verify the amount of free disk space on the sys volume of each server. The two well-behaved servers had at least 750MB free, while the failing server had only 500MB free. Java sometimes behaves poorly when it is short on free disk space, so I cleared out 500MB of old log files and restarted the server. Unfortunately, no change in performance or error rate was noted.

I loaded config.nlm on the good internal webaccess servers and the problematic external webaccess server. I then used Winmerge to compare the resulting log files to check for differences in versions of drivers, NLMs, and configuration files. One of the team members noticed a difference in how webaccess was being loaded in protected mode. We tried duplicating the change on the external server, but that didn’t have any impact on the situation.

Next I checked the versions of Java and Tomcat on all machines:
To get the Java version number: java -version
To view the running instances of Java: java -show
To see Java instance memory utilization: java -showmemoryID, where ID is the ID of the instance listed by java -show, with no space between showmemory and the ID number.
Note: You have to switch to the NetWare console logger screen to see the output of these commands.

I didn’t see anything abnormal in the problem server’s Java configuration, so next I looked at the Sys:\Apache2\logs\mod_jk.log file. Inside it I saw the following messages, repeated frequently:

jk_ajp_common.c (1318)]: Error connecting to tomcat. Tomcat is probably not started or is listening on the wrong port. worker=ajp13admin failed errno = 54

jk_uri_worker_map.c (620)]: In jk_uri_worker_map_t::map_uri_to_worker, wrong parameters

jk_ajp_common.c (1483): Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems

jk_ajp_common.c (1503): Tomcat is down or refused connection. No response has been sent to the client (yet)

These messages made me think communication was definitely failing somewhere. The server administrator in charge of the NetWare servers replaced the server’s patch cable and moved it to another port on the switch, thinking that may help communication. It didn’t.

My co-worker was poking through the NetWare console Monitor utility (LAN/WAN drivers, highlight the NIC, press Tab for statistics) and noticed increasing Rx CRC errors, as well as other errors. He replaced the network card, and all of the webaccess errors went away!
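
As an aside, when messages like the mod_jk.log entries above repeat frequently, a quick tally can show which ones dominate. The following is only a sketch: the function name tally_mod_jk is made up for illustration, the sample lines are copied from the messages quoted above, and the duplicated line is there purely to make the counts interesting.

```python
import re
from collections import Counter

# Matches the source-file prefix of a mod_jk message, e.g. "jk_ajp_common.c (1483)".
SOURCE_RE = re.compile(r"(jk_\w+\.c) \((\d+)\)")


def tally_mod_jk(lines):
    """Count mod_jk messages by (source file, source line number)."""
    counts = Counter()
    for line in lines:
        match = SOURCE_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Sample lines taken from the log excerpts above; the repeated entry is illustrative.
sample = [
    "jk_ajp_common.c (1318)]: Error connecting to tomcat. Tomcat is probably not started or is listening on the wrong port. worker=ajp13admin failed errno = 54",
    "jk_uri_worker_map.c (620)]: In jk_uri_worker_map_t::map_uri_to_worker, wrong parameters",
    "jk_ajp_common.c (1483): Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems",
    "jk_ajp_common.c (1503): Tomcat is down or refused connection. No response has been sent to the client (yet)",
    "jk_ajp_common.c (1483): Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems",
]

counts = tally_mod_jk(sample)
for (source, line_no), count in counts.most_common():
    print(f"{count:4d}  {source} ({line_no})")
```

To run this against a real log, copy mod_jk.log to a workstation and pass the open file object to tally_mod_jk instead of the sample list.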

Troubleshooting Exchange Error 4.4.7 Delivery Delay and Failures

One of our partners keeps receiving the following messages when trying to email certain domains:

This is an automatically generated Delivery Status Notification.

THIS IS A WARNING MESSAGE ONLY.

YOU DO NOT NEED TO RESEND YOUR MESSAGE.

Delivery to the following recipients has been delayed.

user@domain.com

Where user@domain.com is the address he’s trying to send the message to.

Eventually he receives the following message

Your message did not reach some or all of the intended recipients.

The following recipient(s) could not be reached:

user@domain.com on 3/27/2008 9:11 AM

Could not deliver the message in the time limit specified. Please retry or contact your administrator.

<originating.mailserver.hostname #4.4.7>

He’s sending to addresses he’s previously sent to with no problems.

KB 284204 notes the following about the 4.4.7 error message:

Possible Cause: The message in the queue has expired. The sending server tried to relay or deliver the message, but the action was not completed before the message expiration time occurred. This NDR may also indicate that a message header limit has been reached on a remote server or that some other protocol timeout occurred during communication with the remote server.

Troubleshooting: This code typically indicates an issue on the receiving server. Verify the validity of the recipient address, and verify that the receiving server is configured to receive messages correctly. You may have to reduce the number of recipients in the header of the message for the host that you are receiving this NDR from. If you resend the message, it is placed in the queue again. If the receiving server is on line, the message is delivered.

You can see the problem is usually on the recipient’s server. Common causes are that the recipient’s mail server is offline or otherwise unreachable, possibly due to DNS problems.

One thing you can try on the originator’s mail server is to increase the SMTP Virtual Server’s Delay Notification and Expiration Timeout settings.

To access these settings in Exchange 2003, open System Manager and navigate to Servers – Your Mail Server’s Name – Protocols – SMTP. Right click on your SMTP Virtual Server – Properties – Delivery tab.

SMTP Virtual Server Delivery Settings

I changed my Delay notification from 12 hours to 18 hours, and the Expiration timeout from 2 days to 4 days. You will need to tweak these settings to what is appropriate for your particular environment.

Another reason you may see these errors, especially with AOL recipients, is that you don’t have a DNS PTR record (reverse DNS record) for your mail server. AOL explains:

“AOL does require that all connecting Mail Transfer Agents have established reverse DNS, regardless of whether it matches the domain.”

This means if your mail server doesn’t have a Reverse DNS record, your messages sent to AOL will fail.

AOL has a page where you can enter your mail server’s IP address to determine if AOL can find its corresponding Reverse DNS record. If you’re not sure what the IP address of your mail server is, you can look it up based on your domain name.
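
If you would rather check the reverse DNS record yourself, here is a minimal sketch using only the Python standard library. The function names ptr_name and reverse_lookup are made up for illustration, and 192.0.2.25 is a reserved documentation address standing in for your mail server’s public IP.

```python
import ipaddress
import socket


def ptr_name(ip):
    """Return the DNS name under which the PTR record for ip would be published."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the host name the resolver reports for ip, or None if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


# 192.0.2.25 is a placeholder; substitute your mail server's public IP address.
print(ptr_name("192.0.2.25"))
```

Calling reverse_lookup with your server’s real address performs the actual lookup through your local resolver. A result of None suggests no PTR record is visible from where you ran it, which is worth raising with your ISP.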

Also note that setting up a Reverse DNS record is not the same process you would follow to create a host name or other record. With forward (regular) DNS you set up your name servers with your domain registrar, like Network Solutions. With reverse DNS you must contact your ISP to have them create and host the record. The reason is that the ISP is ultimately responsible for your IP address, and only they can verify that your mail server does indeed reside at that particular IP address.

Making Groupwise 7 and Blackberry Enterprise Server Communicate

I have a client who wanted to integrate Blackberry Enterprise Server (BES) version 4.1.3 with his Groupwise 7.0.1 system. He already had a Windows 2003 SP2 server ready for me to install BES onto, so I figured it would be a quick job. I was wrong.

The first hurdle appeared when I started to run the BES setup program on the Windows 2003 server. The installer refused to run because the server was running in Terminal Services Application mode, which is not a supported configuration.

We changed our plan and started running the BES installer on a different Windows 2003 SP2 server, but this time the installer quit because we did not have at least SQL 2000 SP3a on the server. Determining which version of SQL 2000 is installed is not the easiest thing to do, so we just went ahead and downloaded and installed SQL 2000 SP4.

After installing SQL 2000 SP4 we were able to install BES without problems, but BES was unable to communicate with Groupwise. We determined the problem was the version of the Groupwise client installed on the BES server was 7.0.1 IR1, which is not a supported configuration – we’d later learn we needed to be on versions 7.0.2 or 6.5.6 FTF4. Utilizing client 7.0.2 would have required upgrading the entire Groupwise system, so we decided to backrev to client 6.5.6.

I uninstalled the 7.0.1 IR1 client, rebooted the server, then installed GroupWise 6.5 Support Pack 6, Update 1, dated June 27, 2006. After rebooting the server again, BES and Groupwise still could not communicate.

We uninstalled the GroupWise 6.5 Support Pack 6, Update 1 client, rebooted, then tried GroupWise 6.5 Post SP6 Client Rev 4 dated November 10, 2006. We found this to be the required client version according to KB04164, but it didn’t work for us despite following the special installation instructions listed in TID 2974707. We kept receiving the following error message in gwenv1.dll when executing the client:

Entry point not found. WpfCheckAncestryAnd Read

I figured the problem had to lie with gwenv1.dll, so I checked the file’s date. C:\windows\system32\gwenv1.dll was dated 6/16/2006, while the gwenv1.dll found in GroupWise 6.5 Post SP6 Client Rev 4 was dated 11/6/2006.

I suspected the problem was that files from previous Groupwise client installations were not being overwritten by the new client installations. I uninstalled the GroupWise 6.5 Post SP6 Client Rev 4 client, rebooted, ran Messaging Architects’ GW CleanIT, rebooted, then reinstalled the GroupWise 6.5 Post SP6 Client Rev 4 client per the TID’s instructions.

We were finally able to communicate with Groupwise through the BES server!

In hindsight, I wish I had found Blackberry’s KB12662, “Perform basic troubleshooting steps for Novell GroupWise”, prior to beginning the BES installation. It probably would have saved us a few hours’ worth of work.

SonicWall ViewPoint Administration web site won’t load

About a month ago, my Sonicwall Viewpoint 4.1 administration web site stopped loading. The www service was running just fine on my Windows XP SP2 host, but when I double clicked on the administration web site shortcut, http://localhost/sgms/login, the site never came up.

I tried replacing the localhost with the machine’s actual IP address and with 127.0.0.1, but those didn’t make any difference. No errors were seen in the Sonicwall firewall appliance or Windows XP event viewer, and no alerts were emailed to me from Viewpoint or the Sonicwall firewall device.

I found some interesting entries in the Viewpoint log files located at C:\ViewPoint4\MSDE\Data\MSSQL$SNWL\LOG\

  • 2008-02-06 11:58:15.56 spid51 CREATE/ALTER DATABASE failed because the resulting cumulative database size would exceed your licensed limit of 2048 MB per database.
  • 2008-02-06 11:58:15.71 spid51 Error: 1105, Severity: 17, State: 2
  • 2008-02-06 11:58:15.71 spid51 Could not allocate space for object 'LOGS' in database 'sgmsdb' because the 'PRIMARY' filegroup is full.

I searched the Sonicwall knowledgebase and forums, but couldn’t find any information on any of these errors. I was hesitant to contact Sonicwall technical support because of the horrible experiences I’ve had every time I’ve contacted them in the past.

I checked the C:\ViewPoint4\syslogs directory and found that no new syslogs had been written since the problem started on 01-09-2008. Much to my dismay, that meant I had no recent syslog data.

I decided to focus on:

  1. Clearing out the old junk data and getting the program capturing new syslog information once again
  2. Fixing the access to the Viewpoint administration web site.

I found this post (registration required) on the Sonicwall forums where Stephanie recommended running the following from the SQL Query Analyzer, which is a part of SQL Enterprise Manager.

update sgmsdb.dbo.sgms_config
set paramValue = '02/05/2008 12:00:00'
where paramName = 'summarydaysLastDeleted';

I connected to the Sonicwall web/database server from my SQL server that had Enterprise Manager, and ran the above query, making sure the date in set paramValue = '02/05/2008 12:00:00' reflected yesterday’s date. This cleared out the old data from the database.
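
Since the date in that query has to be yesterday’s date, here is a small sketch that fills it in for you. The function name purge_statement is made up for illustration, and the fixed 12:00:00 time simply mirrors the value in the forum post. The sketch only prints the statement so it can be pasted into Query Analyzer; it does not connect to the database.

```python
from datetime import date, timedelta


def purge_statement(today):
    """Build the sgms_config update with paramValue set to the day before today."""
    yesterday = today - timedelta(days=1)
    stamp = yesterday.strftime("%m/%d/%Y") + " 12:00:00"
    return (
        "update sgmsdb.dbo.sgms_config\n"
        f"set paramValue = '{stamp}'\n"
        "where paramName = 'summarydaysLastDeleted';"
    )


# Example: the day the errors above were logged.
statement = purge_statement(date(2008, 2, 6))
print(statement)
```

Passing date.today() instead of a fixed date gives the statement for whenever you run it.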

I restarted the Viewpoint/web server machine, and was once again able to log in to the Sonicwall Viewpoint administration web site. I waited a few minutes, then manually summarized the fresh data, and was once again able to monitor the traffic on my network.

Troubleshooting when Groupwise GWIA won’t send out mail

The other day my Netware 6.5.7 / Groupwise 7.0.2 server decided to stop sending out email for no apparent reason. Some of the things I tried during the troubleshooting process were:

1) Checked the GWIA log files, which didn’t show any errors occurring even with verbose logging enabled. As a matter of fact, the logs didn’t show the messages ever getting to the GWIA for processing! The MTA and POA log files did show the messages being processed, though.

2) Cleared all the GWIA queue directories, but mail still wasn’t sending out even after restarting the server.

3) I toggled the GWIA subdirectory per TID 10091741

4) I reinstalled GWIA per TID 3674238

5) I created a route.cfg file per TID 10010997

6) I made sure nothing weird was happening with DNS lookup on the Groupwise server.

7) I went through each step in TID 10061085, “How to troubleshoot GWIA”

8) As a last ditch effort, I disabled Gwava (version 3.72), which we use as an inbound spam scanner. As soon as Gwava was disabled, mail started leaving the network. I was pretty stunned, since we only scan incoming mail, and we don’t use Gwava as a virus scanner. I verified in the Gwava config outgoing mail wasn’t set to be scanned. I then re-enabled Gwava, and the mail started piling up again. I had found the culprit, but not the cause of the holdup.

I checked over the server’s Gwava log files and console screens and didn’t see any errors, but did notice a message regarding NGW-VSCAN-CONTROLLER when unloading the MTA. That led me to TID 10069173, which pointed to a corrupt message being stuck in the \domain\MSlocal\gwvscan directory. I unloaded GWIA, GWAVA, and the MTA, and renamed the \domain\mslocal directory. I restarted the server, which recreated the previously renamed directory, and mail started flowing out again.

In my case, I had a bad message stuck in the \domain\MSlocal\gwvscan\4 directory. I moved a few files at a time from the renamed directory to the new \domain\MSlocal\gwvscan\4 directory until mail stopped processing. I then downed Gwava and the MTA, deleted the problem message, then reloaded the MTA and Gwava, and mail flow returned to normal.

Identifying and Clearing Groupwise GWIA Queues of Corrupt Messages

When the Groupwise GWIA gateway has problems sending or receiving mail, it’s often the result of a corrupt message clogging up a queue. The easiest way to troubleshoot the problem and restore mail flow is often to down the GWIA and rename the queue folders.

To accomplish this on a Netware server you can stop the GWIA and MTA by pressing F7. Once they have unloaded, browse to the domain\wpgate\gwia directory and rename the following directories:

  • 000.PRC
  • DEFER
  • GWHOLD
  • GWPROB
  • RECEIVE
  • RESULT
  • SEND
  • WPCSIN
  • WPCSOUT

Restarting the GWIA and MTA will recreate these folders. If mail starts flowing again, you can bet that the cause of the problem was a bad message in one of the renamed folders. Move a few messages at a time from the renamed folders to their corresponding new folders. The message flow should continue until you move the corrupt message, which is often the oldest message.

Once the corrupt message is identified, delete it or move it to a different location. This should allow mail flow to resume as expected.
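
The “move a few messages at a time” step can be scripted from a workstation with the volume mapped. This is a minimal sketch under stated assumptions: the function name move_batch and the folder and file names in the demonstration are all hypothetical, and it moves the oldest files first only because the corrupt message is often the oldest.

```python
import os
import shutil
import tempfile
from pathlib import Path


def move_batch(renamed_dir, live_dir, batch_size=5):
    """Move the oldest batch_size files from renamed_dir into live_dir.

    Returns the names moved, so the last batch is known if mail flow stops.
    """
    files = [p for p in Path(renamed_dir).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)  # oldest first
    moved = []
    for path in files[:batch_size]:
        shutil.move(str(path), str(Path(live_dir) / path.name))
        moved.append(path.name)
    return moved


# Demonstration on throwaway folders (all names here are hypothetical).
with tempfile.TemporaryDirectory() as root:
    renamed = Path(root, "SEND.OLD")
    live = Path(root, "SEND")
    renamed.mkdir()
    live.mkdir()
    for age, name in enumerate(["msg_a", "msg_b", "msg_c"]):
        path = renamed / name
        path.write_text("placeholder")
        os.utime(path, (1000 + age, 1000 + age))  # msg_a is the oldest
    first_batch = move_batch(renamed, live, batch_size=2)
    remaining = sorted(p.name for p in renamed.iterdir())

print(first_batch, remaining)
```

After each batch, check whether mail is still flowing before moving the next one. If it stops, the culprit is among the names returned by the most recent call.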

For additional details, see TID 10075205, TID 10054298 and TID 10008353.

In a worst case scenario you may need to delete and reinstall GWIA per TID 3674238. Don’t forget to apply any applicable patches.

Exchange 2003 Event 2000: “Verify that the Microsoft Exchange MTA service has started. Consecutive ma-open calls are failing with error 3051”

One of the smaller networks I manage consists of a handful of users who connect to an SBS 2003 server. Their server keeps reporting the following in the Windows Application Log:

Event: 2000

Source: MSExchangeIS Mailbox

“Verify that the Microsoft Exchange MTA service has started. Consecutive ma-open calls are failing with error 3051”

This error would lead you to believe that the MTA Stack service wasn’t started when it should be. But if this is the only Exchange server in your organization and you aren’t connecting to an X.400 mail server, the MTA Stack service is not necessary. Previously I had even changed this service’s startup type to disabled, yet the server continued to report this error.

KB 810489 explains that stopping and disabling the Microsoft Exchange MTA Stack service is not sufficient to resolve this error. Two registry entries need to be created on the server for each public or private database on the server.

Open the following key in regedit:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\ServerName

For each private or public database that is listed under this subkey, perform these steps

1) Right click on the database, select New – DWORD Value. Name the value Gateway In Threads

2) Set the Gateway In Threads value to 0 (zero)

3) Right click on the database, select New – DWORD Value. Name the value Gateway Out Threads

4) Set the Gateway Out Threads value to 0 (zero)

You must restart the Microsoft Exchange Information Store service for the changes to take effect. The KB also explains:

“When you set the Gateway In Threads value and the Gateway Out Threads value to 0, Store and MTA connection failure events are not logged in the Application log after the MTA Stacks service has been disabled. If you create a new database on the server, you should set the Gateway In Threads value and the Gateway Out Threads value for the new database.”
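
If there are several databases, the same two values could be written out as a .reg file rather than clicked in by hand. This is only a sketch: the function name gateway_threads_reg is made up, ServerName stands in for your server’s name exactly as it does in the key above, and the database subkey names in the example are hypothetical placeholders for whatever regedit actually shows under that key. Review the output before importing it.

```python
# Base key from the steps above; "ServerName" is a placeholder for the real server name.
BASE_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\ServerName"


def gateway_threads_reg(database_subkeys):
    """Return .reg file text that sets both Gateway thread values to 0 per database."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for name in database_subkeys:
        lines.append(f"[{BASE_KEY}\\{name}]")
        lines.append('"Gateway In Threads"=dword:00000000')
        lines.append('"Gateway Out Threads"=dword:00000000')
        lines.append("")
    return "\n".join(lines)


# Hypothetical subkey names; use the ones listed under your own ServerName key.
reg_text = gateway_threads_reg(["Private-EXAMPLE", "Public-EXAMPLE"])
print(reg_text)
```

As with the manual steps, the Information Store service still has to be restarted after the values are in place.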

Resources for Troubleshooting Windows Event 2019: The server was unable to allocate from the system nonpaged pool because the pool was empty

One of my client’s servers needs to be rebooted three times a day because of server hangs. A look in Event Viewer finds the following error corresponding to the time the server stops responding:

Event: 2019, Source: SRV

“The server was unable to allocate from the system nonpaged pool because the pool was empty”

They reboot the server, and the problem goes away for a while.

I performed some searches and found many potential known causes of this error, from Norton Antivirus 7.0-8.0 to Symantec Antivirus 10.x to ARCServe to SQL Server, but none of the suggested fixes resolved the issue. I used the following articles during the troubleshooting process:

KB 133384 for using Performance Monitor but couldn’t isolate the source of the memory leak.

KB 943998 regarding an HP NIC driver issue, KB 294346 regarding a 3Com/IBM NIC driver issue

KB 895477 regarding WMI problems that may or may not be related to SMS and/or SQL

KB 822219 describes filter driver issues relating to backup or antivirus software, specifically ARCserve and Veritas products

KB 870973 describes a hotfix for a leak in the Volume Shadow Copy service

KB 102985 describes registry settings you can specify for memory usage – see NonPagedPoolSize

Windows 2000 – Evaluating Memory and Cache Usage – Optimizing Your Memory Configuration

I used the following tools to try and diagnose the source of the memory leak:

Poolmon.exe per KB 177415. Warning – using poolmon is not for the faint of heart

Performance Monitor Wizard and Perfmon.exe per KB 248345 for finding memory resource issues. The Performance Monitor Wizard simplifies the process of gathering performance monitor logs. It configures the correct counters to collect, sample intervals and log file sizes for troubleshooting.

Debug Diagnostic Tool 1.1 – Designed to assist in troubleshooting issues such as hangs, slow performance, memory leaks or fragmentation, and crashes in any Win32 user-mode process. The tool includes additional debugging scripts focused on Internet Information Services (IIS) applications, web data access components, COM+ and related Microsoft technologies

User Mode Process Dumper Version 8.1 per KB 241215 – The User Mode Process Dumper (userdump) dumps any running Win32 process’s memory image (including system processes such as csrss.exe, winlogon.exe, services.exe, etc) on the fly, without attaching a debugger or terminating target processes. Make sure to use the correct version for your CPU.

Windows Server 2003 Performance Advisor – Performance diagnostic tool for Windows Server 2003 and Windows Server 2003 Service Pack 1 (SP1)

Memtriage.exe – Resource Leak Triage Tool, a part of the Windows Server 2003 Resource Kit Tools

gflags.exe – see KB 262386 for example usage for diagnosing memory leaks

Memsnap.exe – This command-line tool takes a snapshot of the memory resources being consumed by all running processes and writes this information to a log file.

Debugging Tools for Windows was over my head, probably not useful except for programmer types.

This article has a nice description of using Process Explorer to determine your system’s maximum values for Paged and NonPaged Pools, while this one talks about troubleshooting memory leaks. This one discusses capturing application crash dumps, which allows for debugging services such as Print Spooler.

After using all these tools, I finally found the source of my problem with plain old Windows Task Manager. This article suggested viewing the Handle Count, with processes over 5,000 being suspect. Once I viewed the Handles column it was blatantly obvious JMBtnMgr.exe was the memory hog. I watched the handle count grow from 2,100 to over 6,000, when the server became unresponsive.

After restarting the server I found a shortcut to JMBtnMgr.exe in the Administrator startup menu. I took the shortcut out of the startup menu, restarted the server one more time, and haven’t found it hung in four days.

I suspect I could also have monitored Task Manager’s Non-Paged Pool usage and found similar results. To view the NP Pool usage in Task Manager, click View – Select Columns – Non-Paged Pool.
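
The handle-count rule of thumb is easy to apply to numbers jotted down from Task Manager over time. Here is a minimal sketch: the function name suspects is made up, the 5,000 threshold comes from the article mentioned above, and the sample readings are illustrative (only the JMBtnMgr.exe starting value of 2,100 and its climb past 6,000 come from what I actually saw; example.exe is a hypothetical well-behaved process).

```python
HANDLE_THRESHOLD = 5000  # processes above this handle count are considered suspect


def suspects(samples):
    """Return {process name: handle growth} for processes that exceeded the threshold.

    samples maps a process name to handle counts recorded over time, oldest first.
    """
    flagged = {}
    for name, counts in samples.items():
        if max(counts) > HANDLE_THRESHOLD:
            flagged[name] = counts[-1] - counts[0]
    return flagged


# Illustrative readings; the intermediate values are invented for the example.
readings = {
    "JMBtnMgr.exe": [2100, 3400, 4800, 6100],
    "example.exe": [850, 860, 855],
}

flagged = suspects(readings)
for name, growth in flagged.items():
    print(f"{name}: handle count grew by {growth}")
```

A process that both crosses the threshold and keeps growing between readings, as JMBtnMgr.exe did here, is the one to look at first.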

Internet Explorer: “Click to Activate and use this Control” results in blank browser window

Certain computers were getting the “Click to Activate and use this Control” prompt when attempting to view reports generated by an .asp script on one of our software provider’s web servers. Even after clicking in the new window, the window stayed blank: a totally white browser window. Hitting the space bar or Enter key didn’t make a difference.

Both Internet Explorer 6 and 7 users had this issue, but we also had users of both versions of the IE browser that did not get this error. All workstations were XP SP2, so I figured that wasn’t an issue.

I verified all browser security settings allowed for ActiveX controls to be run, and ran Microsoft Update to ensure all the IE browser updates were installed. Next I installed the latest versions of Java, Flash, Shockwave, and Adobe Acrobat Reader to make sure those weren’t an issue, but the problem persisted.

I decided to generate this report, and rather than trying to view it in a browser window, I saved it to my hard drive. It saved as a .htm file, and I opened it with notepad. In the file I saw the following line:

crystal files\activexviewer.cab#version=9,2,0,442

I wondered if IE was using the Crystal Reports Viewer, so I went into IE and viewed the installed add-ons, and saw an entry for Crystal Report Viewer Control 9. If I disabled this control and ran the report, I was told the necessary control was not available… so I had figured out which control was causing the issue. I decided the best thing to do would be to uninstall the control and redownload it.
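
Spotting the control by eye in Notepad works, but the same check on a saved report page can be sketched in a few lines. Assumptions: the function name find_cab_references is made up, and the sample markup around the .cab reference is hypothetical, since I only quoted the one line from the saved file above.

```python
import re

# Captures a .cab path and the version string that follows "#version=".
CAB_RE = re.compile(r"""([^"'<>]+\.cab)#version=([\d,]+)""", re.IGNORECASE)


def find_cab_references(html):
    """Return (cab path, version) pairs for every .cab reference found in html."""
    return CAB_RE.findall(html)


# Hypothetical wrapper markup; only the .cab reference itself comes from the saved report.
sample_html = r'<object codebase="crystal files\activexviewer.cab#version=9,2,0,442"></object>'

references = find_cab_references(sample_html)
for cab_path, version in references:
    print(cab_path, version)
```

The .cab name and version give you something concrete to match against the list of installed add-ons in IE.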

KB 154850 shows how to remove an ActiveX control, but I couldn’t find the control listed in any of the locations they specified. I resorted to googling for the answer to how to remove the Crystal Reports Viewer Control, and came across this page, which pointed me in the correct direction.

Note that these steps were written for Crystal Reports Viewer 8:

To remove a corrupted, unrecognizable, or older version of the ActiveX Viewer:

1. Right-click the Internet Explorer icon, and click ‘Properties’.

2. Click the ‘Settings’ button.

3. Click the ‘View Objects’ button, right-click ‘Crystal Report Viewer Control’, and then click ‘Remove’.

4. Click ‘Yes’ when prompted to remove the control.

5. Close the ‘Downloaded Program Files’ dialog box, click ‘OK’ on the ‘Settings’ dialog box, and then click ‘OK’ on the ‘Internet Options’ dialog box.

6. Search the computer for the following files and manually deregister them:

CRViewer.dll
SwebRS.dll
SViewHLP.dll
ReportParameterDialog.dll
CSelexpt.ocx
XQViewer.dll

====================
NOTE:

Use the following steps to deregister these files:

1. Search for the DLL file.

2. On the ‘Start’ menu, click ‘Run’.

3. Type “regsvr32 \u” in the ‘Run’ box and then drag the DLL file to the ‘Run’ box. The contents of the ‘Run’ box look similar to the following:

regsvr32 \u c:\myfiles\myDLL.dll

Unfortunately for me, none of the .dll files listed above were present on my machine. Plus, the \u is incorrect – it should be /u to unregister a .dll file.

On a hunch I searched my computer for crview*.*, and found a crviewer9.dll located in c:\program files\common files\crystal decisions\2.0\bin directory. I decided to try to unregister that .dll just to see what happened. To do so, I ran the following from a command prompt:

regsvr32 /u "c:\program files\common files\crystal decisions\2.0\bin\crviewer9.dll"

I restarted Internet Explorer, verified that Crystal Report Viewer no longer appeared in the list of installed add-ons, and once the control re-downloaded, was able to generate the report as expected!

I guess the moral of my story is that if you receive the “Click to Activate and use this Control” prompt, and all the obvious causes have been eliminated, you need to determine which ActiveX control is causing the issue, then reinstall the faulty control.

[updated 12-21-2007]

KB 945007 describes an IE6 and IE7 update that disables the “click to activate” behavior totally.