[Users] Postfix Fails To Start On Reboot

L Mark Stone lmstone at lmstone.com
Mon Jan 14 00:38:08 CET 2019


Thanks Randy; I think we have different root causes here.

/opt/zimbra/stat is only 200MB on this particular server, built in December.

Before tonight’s reboot I manually stopped Zimbra.

All the best,
Mark

___________________________
L. Mark Stone
Sent from my iPhone

On Jan 13, 2019, at 6:14 PM, Randy Leiker <randy at skywaynetworks.com> wrote:

Hi Mark,

The issue with the Zimbra MTAs I eventually tracked down & solved could be different than what you're describing.  In my case, I experienced these symptoms:

  1.  On Zimbra MTA 8.8.x servers, a "zmcontrol restart" or a reboot of the server would often, but not always, cause the Postfix & zmconfigd services to take anywhere from 10-20 minutes before they would start.
  2.  The disk wait time in "top" was well above its normal baseline.
  3.  The CPU usage was continuously at or near 100%.
  4.  Long running processes using 20-40% CPU each were present in atop with names starting with "postconf -d" with their output piped to the Zimbra configuration folder.
  5.  An unusual number of inotify processes would appear in "atop", each using around 10-15% CPU.

The root cause was the Zmstats (https://wiki.zimbra.com/wiki/Zmstats) process generating very large files in the /opt/zimbra/zmstats directory, where files like io.csv & io-x.csv could reach 3-5 GB each.  Zimbra has a built-in process that periodically rotates & archives all of the performance metrics found in the /opt/zimbra/zmstats directory into sub-folders, with names based on the archive date.  This archiving process uses the Linux cat command to pipe each file to gzip.  Each time the 3-5 GB files were run through cat to gzip, it generated considerable disk I/O, which in turn pegged the CPU at near 100%, as the CPU was endlessly waiting for the disk I/O to finish.

Making matters worse, the anti-virus product installed on my Zimbra MTAs was scanning that data as it was being written to the Zmstat archive, using the inotify processes.  All of this left very few system resources for zmconfigd to finish its rebuild of the Postfix configuration files (the postconf -d processes), as it normally does for any Zimbra service restart.  That caused zmconfigd to appear non-responsive, and a "zmcontrol status" would report that it failed to start.
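If you want to check whether the same CSV bloat is happening on your own MTA, a quick find suffices.  A minimal sketch; the /opt/zimbra/zmstats path is the one from this thread, and the 1 GB cut-off is my own arbitrary threshold:

```shell
#!/bin/sh
# List any stats CSVs above a size threshold; io.csv & io-x.csv were
# the 3-5 GB offenders described above.  The 1 GB cut-off is an
# assumption, and the directory defaults to the path from this thread.
find_bloated_csvs() {
    # -printf is GNU find: print size in bytes, then the path.
    find "${1:-/opt/zimbra/zmstats}" -maxdepth 1 -name '*.csv' \
        -size +1G -printf '%s\t%p\n'
}
```

Running `find_bloated_csvs` with no argument checks the stock location; pass another directory to check a non-default install.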

The Zmstat archive folders are never removed by Zimbra & will accumulate indefinitely; left unchecked, they eventually consume a great deal of disk space.  I know there was an enhancement request in Bugzilla dating back to Zimbra 7.x to auto-delete the old Zmstats archive folders, but that's never been implemented.  This means Zimbra admins need to purge those old archive folders either with a separate script or manually.  It's not clear to me yet why some of the CSV files in that directory grew so large over short periods of time (24-48 hours), but since I manually erased those files and stopped & restarted the zmstats service, it's been behaving well for about a week now with no further recurrences.
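For the "separate script" part, something along these lines would do.  A sketch only: the /opt/zimbra/zmstats path is from this thread, while the 30-day retention window and running it from the zimbra user's crontab are my own assumptions:

```shell
#!/bin/sh
# Prune dated Zmstat archive sub-folders older than a retention window.
# The directory is the one discussed above; the retention period is a
# policy choice, not anything Zimbra mandates.
purge_zmstat_archives() {
    dir="$1"
    days="$2"
    # The dated archives are sub-directories directly under zmstats
    # (the live *.csv files sit alongside them), so match only
    # directories one level deep whose mtime is older than the window.
    find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" \
        -print -exec rm -rf {} +
}

# Example (run as the zimbra user, e.g. from a daily cron job):
# purge_zmstat_archives /opt/zimbra/zmstats 30
```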

In answer to your question about file permissions & ownership, here's what I show for the /opt/zimbra/data/postfix/spool/pid directory on one of the Zimbra 8.8.x MTA servers here:

-rw-------. 1 postfix postfix  0 Mar 20  2018 inet.[127.0.0.1]:10025
-rw-------. 1 postfix postfix  0 Mar 20  2018 inet.[127.0.0.1]:10030
-rw-------. 1 postfix postfix  0 Mar 20  2018 inet.465
-rw-------. 1 postfix postfix  0 Apr 11  2018 inet.submission
-rw-------. 1 postfix postfix 33 Jan  6 18:00 master.pid
-rw-------. 1 postfix postfix  0 Mar  4  2018 pass.smtpd
-rw-------. 1 postfix postfix  0 Mar 15  2018 unix.bounce
-rw-------. 1 postfix postfix  0 Mar 15  2018 unix.cleanup
-rw-------. 1 postfix postfix  0 Sep 19 23:30 unix.defer
-rw-------. 1 postfix postfix  0 Sep 24 04:09 unix.error
-rw-------. 1 postfix postfix  0 Apr 11  2018 unix.lmtp
-rw-------. 1 postfix postfix  0 Sep 26 23:48 unix.retry
-rw-------. 1 postfix postfix  0 Feb 11  2018 unix.showq
-rw-------. 1 postfix postfix  0 Mar 15  2018 unix.smtp
-rw-------. 1 postfix postfix  0 Mar 20  2018 unix.smtp-amavis
-rw-------. 1 postfix postfix  0 Dec  6 08:58 unix.trace


Randy Leiker ( randy at skywaynetworks.com )
Skyway Networks, LLC
1.800.538.5334 / 913.663.3900 Ext. 100
https://skywaynetworks.com

________________________________
From: "L Mark Stone" <lmstone at lmstone.com>
To: users at lists.zetalliance.org
Sent: Sunday, January 13, 2019 4:21:30 PM
Subject: [Users] Postfix Fails To Start On Reboot


Several of us are seeing the issue where Postfix fails to restart on reboot.  Randy I believe had some good information on this on a recent Zeta call.


So, this happened (again) to me today, and I did my usual routine: zmmtactl stop, move ~/data/postfix/spool/pid/master.pid someplace else, then zmmtactl start (which gets saslauthd started but not Postfix), then "postfix stop" and "postfix start", and everything is OK again.  All commands were executed as the zimbra user.
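For anyone who wants that workaround in script form, here's the same sequence as a sketch.  It dry-runs by default (the zmmtactl and postfix commands only exist on a Zimbra MTA), and the /tmp destination for the stale master.pid is my own choice, not part of the original procedure:

```shell
#!/bin/sh
# The Postfix recovery sequence described above, run as the zimbra
# user.  Dry-run by default: set RUN=1 to actually execute each step.
PID_FILE="$HOME/data/postfix/spool/pid/master.pid"

step() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "would run: $*"; fi
}

step zmmtactl stop
step mv "$PID_FILE" "/tmp/master.pid.$(date +%s)"   # move stale pid aside
step zmmtactl start    # starts saslauthd, but not Postfix
step postfix stop
step postfix start
```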


Except now I'm seeing the ownership and permissions of the files in ~/data/postfix/spool/pid all over the place, so I'm asking what others are seeing, and what the ownership should be.


On one system that has never had an issue with Postfix restarting, all of the files are owned by postfix:postfix with 600 perms.


On the system where I just executed my hack, the master.pid, unix.error and unix.trace files are owned by root:root (with all other files owned by postfix:postfix).  The old master.pid was owned by postfix:postfix.


On another system which has had this issue intermittently, unix.trace is owned by root:root and everything else by postfix:postfix.
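To make comparing servers easier, a small read-only audit loop helps.  A sketch, assuming GNU stat and the stock /opt/zimbra/data/postfix/spool/pid location mentioned in this thread; nothing here changes ownership:

```shell
#!/bin/sh
# Report owner:group and mode for everything in the Postfix pid
# directory, then flag files not owned by the expected owner:group
# (postfix:postfix per the healthy system above).  GNU stat only.
audit_pid_dir() {
    dir="$1"; owner="${2:-postfix}"; group="${3:-postfix}"
    # %U:%G = owner:group names, %a = octal mode, %n = path
    stat -c '%U:%G %a %n' "$dir"/*
    echo "--- files NOT owned by $owner:$group ---"
    find "$dir" -maxdepth 1 -type f \
        \( ! -user "$owner" -o ! -group "$group" \) -print
}

# Example: audit_pid_dir /opt/zimbra/data/postfix/spool/pid
```

Anything listed after the "---" marker is a candidate for a chown back to postfix:postfix.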


So before I open a Support Case with Zimbra, I thought I'd ask here what others are seeing, and what your workaround has been.


Thanks,

Mark

_________________________________________________

Another Message From...   L. Mark Stone



