[Users] Zimbra mailbox project
randy at skywaynetworks.com
Wed Jan 17 00:54:16 CET 2018
Earlier, on today's weekly Zeta Alliance call, I proposed a project to introduce new resiliency capabilities to Zimbra mailbox servers. Here's an outline of what was discussed & a further expansion on that topic:
Dating back to early Zimbra versions, it's always been possible to create a cluster of Zimbra server nodes where individual components of Zimbra can be broken out into separate nodes for both load balancing & reliability benefits. As an example, consider a minimal Zimbra cluster with:
* 2 x LDAP nodes
* 2 x Proxy nodes
* 2 x MTA nodes
* 2 x Mailbox nodes
Within this example cluster, you could afford to lose 1 of each type of node (whether due to maintenance, human error, or a disaster event), and the Zimbra cluster would remain mostly operational. The exception is the mailbox nodes: if you lose a mailbox node, any user mailboxes on that node become immediately unavailable. This limitation exists because, by design, a user's mailbox storage is tightly bound to a given mailbox node.
Shortly after Zimbra was acquired from Yahoo by VMware, there were mentions in various webinars & presentations of expanding Zimbra's high availability (HA) capabilities. Most of those discussions seemed to focus on leveraging HA capabilities that were part of VMware products, rather than on adding native HA capabilities to the Zimbra Suite itself. The VMware approach to HA for Zimbra works, as long as the VM hosting a Zimbra node remains bootable and all of the Zimbra services can start successfully following recovery by VMware's HA feature. However, it's of no help when you need to take a Zimbra mailbox node down for maintenance, perform a mailbox node upgrade, troubleshoot a fault on a mailbox node, or just retire a mailbox node for migration to newer hardware, since the end result is downtime for mailbox end users.
In more recent years, Zimbra mailbox node HA was on the road map for Telligent, and most recently is on the road map for Synacor, with some suggestions indicating that it might make an appearance in a Zimbra 9.0 release. I think many Zimbra partners understand the significant undertaking that will be needed to decouple the storage of end user mailboxes from the mailbox nodes, and given that effort, the arrival of a true mailbox node HA feature is probably still some way off.
I propose building new flexibility around Zimbra mailbox nodes, while leaving the standard Zimbra distribution as-is. This would allow for addressing the current shortcomings outlined above. Above all, the intent of the project is to avoid changing the standard Zimbra distribution, so as not to create future support or upgrade problems. Instead, it would leverage a combination of freely available, open source tools & built-in Zimbra admin utilities, making it easy enough that Zimbra admins (of all skill levels) can take Zimbra mailbox nodes on & offline at will with no disruption to mailbox end users. This doesn't eliminate the need for best practices, such as regular backups of your Zimbra infrastructure, but rather seeks to solve issues that backups don't address.
The project involves placing all Zimbra mailbox nodes within Docker containers, so it's the same mailbox node install you're used to doing, just within a container instead. Other components of Zimbra (LDAP, MTAs, Proxies, etc.) could continue to run as VMs, physical machines, or perhaps within containers as well. With several containers, each running a Zimbra mailbox node, user mailboxes would presumably be evenly distributed over those mailbox nodes. This distribution could be done arbitrarily by a Zimbra admin, or through the course of normal day-to-day provisioning of mailboxes, by allowing Zimbra to choose which mailbox node to provision mailboxes on from the pool of available mailbox nodes, which is functionality that exists today within Zimbra. With the introduction of the Zextras tools in Zimbra 8.8, the Zextras backup/restore functionality would be yet another means to migrate customer mailboxes into mailbox nodes housed within containers.
A script would then be created to allow for a given mailbox node (container) to have all of its mailboxes evacuated automatically. There are many cases where you may need to place a mailbox node into maintenance mode. To name a few:
* A service-impacting configuration change is needed. For example, a Zimbra service restart for a new Let's Encrypt SSL certificate, or a zmprov local configuration value change that you want to test.
* A hardware or software (operating system/Zimbra package) fault exists.
* The mailbox node has insufficient hardware resources & is being deprecated for a newer mailbox node with more resources.
* The mailbox node has developed an unknown, difficult-to-troubleshoot problem, so rather than troubleshoot it, the node is simply replaced.
* A prior configuration error (aka human error) has led to an unstable mailbox node, so rather than fixing it, the node is replaced.
The script would take a few simple inputs, such as the target mailbox node, and the desired action. Available actions might include:
* Evacuating all mailboxes from a node
* Restoring mailboxes to a node
* Evacuating all mailboxes & removing the node from the cluster
* Adding a node to the Zimbra cluster & re-distributing mailboxes to the newly added node
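As a rough illustration of those inputs, here is a minimal Python sketch of what the command line for such a script might look like. Nothing here exists yet: the program name, the four action names, and the --dry-run flag are all placeholders I've made up for the sake of discussion.

```python
import argparse

# Hypothetical action names, one per action in the list above.
ACTIONS = ("evacuate", "restore", "evacuate-and-remove", "add-and-rebalance")


def build_parser():
    """Build the command line parser: one action plus one target mailbox node."""
    parser = argparse.ArgumentParser(
        prog="mailbox-node-tool",  # placeholder name, not an existing utility
        description="Take a Zimbra mailbox node in or out of service.",
    )
    parser.add_argument("action", choices=ACTIONS, help="what to do with the node")
    parser.add_argument("node", help="hostname of the target mailbox node")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="only report what would be done (assumed safety switch)",
    )
    return parser


# Example invocation, parsed from an explicit argument list.
args = build_parser().parse_args(["evacuate", "mbox2.example.com", "--dry-run"])
print(args.action, args.node, args.dry_run)
```

Keeping the action as a single positional argument with a fixed set of choices would make it easy to add more actions later without changing how the script is called.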
For options requiring evacuation of mailboxes, the script would query Zimbra LDAP to determine which mailboxes are on that server, then, using Zimbra's built-in zmmailbox utility, evacuate those mailboxes evenly across the remaining mailbox nodes (containers). For options requiring the addition or removal of mailbox nodes, this would have a tie-in with Docker & perhaps Kubernetes to allow for automating the provisioning & de-provisioning of containers for hosting the mailbox nodes. I think there's room to greatly expand on the available actions for this script, but this could be the first few steps. Care would be needed to ensure that the script handles error conditions well, most likely by alerting a human to a problem encountered while carrying out a given action, with a recommendation of what's needed next to resolve it.
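To make the "evenly" part concrete, here is a minimal Python sketch of how the evacuation plan could be computed before any mailbox is actually moved. It assumes the list of accounts on the node and the current mailbox count per node have already been gathered (for example from the LDAP query mentioned above); the function name and data shapes are my own placeholders, and the move itself is deliberately left out.

```python
import heapq


def plan_evacuation(accounts, source_node, node_counts):
    """Assign each account on source_node to the least-loaded remaining node.

    accounts     -- account names currently hosted on source_node
    source_node  -- hostname of the node being evacuated
    node_counts  -- dict of hostname -> current number of mailboxes
    Returns a list of (account, destination_node) pairs.
    """
    # Every node except the one being evacuated is a candidate destination.
    targets = [(count, node) for node, count in node_counts.items() if node != source_node]
    if not targets:
        raise ValueError("no remaining mailbox nodes to receive mailboxes")
    heapq.heapify(targets)

    plan = []
    for account in accounts:
        # Pop the node with the fewest mailboxes, assign, then push it back.
        count, node = heapq.heappop(targets)
        plan.append((account, node))
        heapq.heappush(targets, (count + 1, node))
    return plan


# Hypothetical hostnames and counts, for illustration only.
plan = plan_evacuation(
    ["u1@example.com", "u2@example.com", "u3@example.com"],
    "mbox1.example.com",
    {"mbox1.example.com": 3, "mbox2.example.com": 10, "mbox3.example.com": 8},
)
for account, destination in plan:
    print(account, "->", destination)
```

Computing the whole plan up front, separately from carrying it out, would also give the script a natural place to stop and alert a human if a move fails partway through.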
I know that's a long email, but I wanted to offer some explanation to give you a sense of the project's scope. The project would be free & open source for all in the Zimbra community to use. Several people expressed interest in this project on the weekly call earlier today. I'm curious to hear if others in the Zeta Alliance feel that this would be a worthwhile project that your organization could use and/or contribute to in the form of ideas, development, or testing. Your thoughts & feedback, please?
Randy Leiker ( randy at skywaynetworks.com )
Skyway Networks, LLC
1.800.538.5334 / 913.663.3900 Ext. 100