Obtaining GroupWise Cores on SLES 12 and SLES 15

When I first started moving GroupWise systems to SLES 12, I had a horrible time just configuring GroupWise to capture a core in the event of a crash. It took a long time to figure out, since it's not as automatic as on previous SLES versions, and I've found similar challenges on SLES 15. I was finally able to compile a list of the items needed either to prevent the crash or to capture a good core when one happens. GroupWise support also told me that these cores provide better information than the cores produced by default on SLES 11. I don't wait to see whether a system cores and then try to set this up; I do it ahead of time, as a standard step whenever I build a system.

Note: AppArmor tends to interfere with cores, so make sure it's disabled and not just stopped. I had a customer reboot a server last week, and then GroupWise crashed. I had only unloaded AppArmor, not disabled it, so it loaded again when the server restarted and I failed to get a core at all.

Linux Specific Tasks

ULimit Config

Disable the limit on the maximum size of a core dump file. I find that this is generally already set, but you can run it again just to be sure. Run the command:

ulimit -c unlimited
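
Keep in mind that ulimit applies to the current shell and to whatever is started from it, so it can be worth confirming what the shell actually ended up with. A minimal sketch follows; the variable name current_limit is only for illustration.

```shell
# Try to lift the core-size limit for this shell, then report the result.
# If the hard limit is lower, the request fails and the old value is kept.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core limit"

# "unlimited" is the value you want to see here.
current_limit=$(ulimit -c)
echo "core size limit: $current_limit"
```

If this prints 0, cores are switched off for that shell and for anything launched from it.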

Core Location

Create a location for the dump files. I prefer /opt/cores for the core location, but you can change the path. Run the command:

install -m 1777 -d /opt/cores
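
To see what that install command produces, the sketch below runs the same options against a scratch directory so it can be tried without root. The make_core_dir helper and the scratch path are hypothetical; on a real server you would pass /opt/cores (or your own path) as root.

```shell
# Create a world-writable directory with the sticky bit set (mode 1777),
# then print its mode, owner, and path so the result can be checked.
make_core_dir() {
    install -m 1777 -d "$1" && stat -c '%a %U:%G %n' "$1"
}

# Demonstration against a throwaway location.
scratch=$(mktemp -d)
make_core_dir "$scratch/cores"
```

The first field printed should be 1777, which lets any process write a core there while stopping users from deleting each other's files.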

Configure /etc/sysctl.conf

Configure a fixed location for storing core dumps as well as the file name format for the core. In the pattern, %e expands to the executable name and %p to the process ID, so it's easy to identify which process produced the core. These lines also set the kernel parameters needed to capture the core.

Add the following lines to /etc/sysctl.conf:

kernel.core_uses_pid = 1
kernel.core_pattern=/opt/cores/core.%e.%p
fs.suid_dumpable=2

Then issue the following command:

sysctl -p /etc/sysctl.conf -w
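
To get a feel for the file names that the core_pattern above will produce, here is a small sketch that fills in the two placeholders by hand. The helper name expand_core_pattern, the executable name gwpoa, and the PID 12345 are illustrative assumptions, not values taken from a real crash.

```shell
# Substitute %e (executable name) and %p (process ID) into a core_pattern
# string, roughly the way the kernel does when it names a core file.
expand_core_pattern() {
    # $1 = core_pattern value, $2 = executable name, $3 = process ID
    printf '%s\n' "$1" | sed -e "s|%e|$2|g" -e "s|%p|$3|g"
}

expand_core_pattern '/opt/cores/core.%e.%p' gwpoa 12345
# prints /opt/cores/core.gwpoa.12345
```

The value the kernel is actually using can be read back at any time with: cat /proc/sys/kernel/core_pattern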

noelision

The noelision option is not really about creating a core; it's more about preventing a crash in the first place. Some applications are susceptible to an Intel CPU hardware bug, and GroupWise seems to be one of them. See this SUSE TID: https://www.suse.com/support/kb/doc/?id=7022289

Create a file called:

/etc/ld.so.conf.d/noelision.conf

Add the following line to the /etc/ld.so.conf.d/noelision.conf file you just created. This is the text that goes in the file:

/lib64/noelision

Save the file and run the following command from a terminal prompt:

/sbin/ldconfig

Disable AppArmor

I've found that when AppArmor is running, it can interfere with the core process. Disable it if you're having crashes and trying to capture cores.

Run the command:

rcapparmor stop

Disable AppArmor in YaST:

YaST --> System --> Services Manager --> apparmor --> Disable (Scroll down the list; it's not in alphabetical order, and I find that it's generally toward the bottom.)
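
Because a service that is only stopped comes back after a reboot, it can help to check both the boot-time state and the running state. The sketch below assumes the systemd unit is named apparmor, which is an assumption about your install, and check_unit is a hypothetical helper name.

```shell
# Report whether a systemd unit is enabled at boot and whether it is active now.
# Only queries state; it does not change anything.
check_unit() {
    local unit=$1
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found; cannot check $unit"
        return 0
    fi
    echo "$unit at boot: $(systemctl is-enabled "$unit" 2>/dev/null || true)"
    echo "$unit now:     $(systemctl is-active "$unit" 2>/dev/null || true)"
}

check_unit apparmor
```

For capturing cores, the boot line is the one to watch: it should report disabled, otherwise AppArmor will load again on the next restart.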

GroupWise Specific Configuration

The following items should be completed to ensure that GroupWise is configured correctly to send a core to the configured path/filename when there's a crash.

/etc/sysconfig/grpwise

Add the following line to /etc/sysconfig/grpwise:

GROUPWISE_DEBUG_OPTION="on,fence"

Note #1: It's /etc/sysconfig/grpwise, not /etc/init.d/grpwise.
Note #2: This is case sensitive and should be entered exactly as shown.

/etc/init.d/grpwise

Check the /etc/init.d/grpwise file. You should see the following string (it's usually there):

"$GROUPWISE_DEBUG_OPTION"

Add the following statement to the /etc/init.d/grpwise script. (Put it under the other export statements.)

export GW_MEMTST=on,fence
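
Since these entries are case sensitive and easy to put in the wrong file, a read-only check like the following can confirm that both edits landed where they should. This is a sketch; check_line is a hypothetical helper, and it only looks for the exact text shown in this guide.

```shell
# Report whether a file contains an exact piece of text (no regex matching).
check_line() {
    # $1 = file to inspect, $2 = text expected somewhere in that file
    if [ ! -r "$1" ]; then
        echo "MISSING FILE: $1"
    elif grep -qF -- "$2" "$1"; then
        echo "ok: $1 contains $2"
    else
        echo "NOT FOUND: $2 in $1"
    fi
}

check_line /etc/sysconfig/grpwise 'GROUPWISE_DEBUG_OPTION="on,fence"'
check_line /etc/init.d/grpwise 'export GW_MEMTST=on,fence'
```

Both lines should start with "ok:" on a server that has been set up as described above.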

What to do if/when GroupWise crashes

If GroupWise crashes, it should now create a crash dump file (core) in the path specified above. The core file can be analyzed by GroupWise support to attempt to determine the root cause of the crash. You should do the following:

Locate the Core File.

  • If you followed this guide, the core file will be in /opt/cores.
  • The file should be named in a way that identifies the process. Based on the instructions above, it should have "GWPOA" in the name, since that is the crashing process.
  • The core file should also be timestamped with the date/time of the POA crash.

Restart the Post Office

  • Restart the Post Office. It should just start right back up.

Get Support

  • Open an SR (Service Request) with Micro Focus and reference a crashed POA that caused a critical outage. Set the priority to High or Critical.
  • Create a tar.gz file of the core file and name it after your SR number, for example SR12345678910.tar.gz. (The core file can be large, so compressing it helps manage the size and upload time.)
  • Upload the core file to the Micro Focus support site. Each support case requires case-specific credentials, so you must access the upload directly from your support ticket.
  • Let the engineer know that you uploaded the core, and request that someone read your core.
  • Wait for Micro Focus Support to send you a new FTF to install and tell you to see if it crashes again (I say this jokingly, but we've all been there).
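
The packaging step in the list above can be sketched as a small helper. Everything named here is illustrative: pack_core is a hypothetical function, the SR number is the example from this guide, and the demonstration runs against a dummy file in a scratch directory rather than a real core.

```shell
# Compress one core file into <SR number>.tar.gz next to the original,
# storing only the file name (not the full path) inside the archive.
pack_core() {
    # $1 = path to the core file, $2 = SR number used as the archive name
    local core=$1 sr=$2
    local dir
    dir=$(dirname "$core")
    tar -czf "$dir/$sr.tar.gz" -C "$dir" "$(basename "$core")" && echo "$dir/$sr.tar.gz"
}

# Demonstration with a stand-in file; a real core would live in /opt/cores.
demo=$(mktemp -d)
printf 'not a real core\n' > "$demo/core.gwpoa.12345"
pack_core "$demo/core.gwpoa.12345" SR12345678910
tar -tzf "$demo/SR12345678910.tar.gz"
```

On a real server the call would look like: pack_core /opt/cores/<your core file> <your SR number>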