[Users] Zimbra mailbox project
Randy Leiker
randy at skywaynetworks.com
Mon Jan 29 19:09:28 CET 2018
Hi Frederic,
Thanks for your suggestions. I've begun looking into incorporating Ceph into the Zimbra mailbox HA project. I think this approach makes a lot of sense, since it keeps the message blobs & indexes for each end user mailbox highly available and avoids the problem of needing to reliably move large quantities of mailbox data from one Zimbra mailbox node to another in a short period of time. Did you by chance work with BeeZim on this implementation?
Randy Leiker ( randy at skywaynetworks.com )
Skyway Networks, LLC
1.800.538.5334 / 913.663.3900 Ext. 100
https://www.skywaynetworks.com
----- Original Message -----
From: "Frédéric Nass" <frederic.nass at univ-lorraine.fr>
To: "Jonathan Labbé" <jlabbe at neonova.net>, "Randy Leiker" <randy at skywaynetworks.com>
Cc: users at lists.zetalliance.org
Sent: Friday, January 19, 2018 10:52:32 AM
Subject: Re: [Users] Zimbra mailbox project
Hi,
Moving from SAN storage to Ceph object storage (zimbra_class_store) has really helped us keep our Zimbra infrastructure up at all times.
We're now able to empty a store of all its 1500+ mailboxes in about half an hour, moving only the MySQL metadata, since all stores can access all blobs from the object storage (zimbraMailboxMoveSkipBlobs: TRUE, zimbraMailboxMoveSkipHsmBlobs: TRUE). This allows us to patch or upgrade a Zimbra store with no downtime.
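For anyone wanting to experiment with this, a rough Python sketch of the attribute changes might look like the following. Whether these attributes are set per server or in global config, and which move tooling applies, depends on your Zimbra version and edition, so treat this purely as an illustration rather than a recipe.

```python
#!/usr/bin/env python3
"""Rough sketch only: flip the blob-skipping attributes mentioned above
before a metadata-only mailbox move. Attribute scope (server vs. global
config) varies by Zimbra version; verify against your own installation."""

import subprocess

SOURCE_STORE = "store1.example.com"   # hypothetical source mailbox node


def zmprov(*args):
    """Run a zmprov subcommand (as the zimbra user) and return its output."""
    return subprocess.run(
        ["zmprov", *args], check=True, capture_output=True, text=True
    ).stdout


# Assumption: these attributes can be set per server with `zmprov ms`;
# some deployments set them in global config (`zmprov mcf`) instead.
zmprov("ms", SOURCE_STORE, "zimbraMailboxMoveSkipBlobs", "TRUE")
zmprov("ms", SOURCE_STORE, "zimbraMailboxMoveSkipHsmBlobs", "TRUE")

# With blobs shared via the object store, a mailbox move now only has to
# carry the MySQL metadata, which is what makes emptying a 1500+ mailbox
# store in roughly half an hour plausible.
```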
Future "always-on" mode will probably only come true when all stores will share unique, yet highly available, storage and metadata services.
Frederic.
On 17/01/2018 at 17:31, Jonathan Labbé wrote:
I wish I did not have to leave in the middle of the meeting. We take huge advantage of this already. We are currently running 4x LDAP (2 MMR servers and 2 replica-only servers), 8x MTAs, 8x Proxies, and ~50x Mailbox stores, spread across two virtual data centers. We have F5s in the mix to help with proper load balancing and handle our SSL termination, and our own mailproxy system in front of Zimbra handling our client authentication.
We have been looking at ways to keep our mailbox stores balanced by mailbox size, IMAP usage (which the new IMAP servers will hopefully help with), etc. Simply evacuating a mail store, as you put it, just doesn't seem feasible for us. For example, a good chunk of our mail stores hold 16 TB of mail each. That's a lot to move suddenly, especially if you're trying to ensure little to no loss of mail while it happens.
A few questions I have:
How are you evacuating mailboxes off of a "bad" mail store? How do that user's data and mail get transferred to the other mail stores? I am not aware of any zmmailbox command that just does this, except for zmboxmove, and that can be a slow process.
Have you run a Dockerized mailbox store before? How does it perform? How many users were you able to run concurrently on that mail store?
Is this process also taking advantage of Zimbra's backup processes to ensure fast mailstore recovery?
Jonathan Labbé
919-460-3330 • jlabbe at neonova.net
www.neonova.net
On Tue, Jan 16, 2018 at 6:54 PM, Randy Leiker < randy at skywaynetworks.com > wrote:
Hi Everyone,
Earlier, on today's weekly Zeta Alliance call, I proposed a project to introduce new resiliency capabilities to Zimbra mailbox servers. Here's an outline of what was discussed & a further expansion on that topic:
Project Background
Dating back to early Zimbra versions, it's always been possible to create a cluster of Zimbra server nodes where individual components of Zimbra can be broken out into separate nodes for both load balancing & reliability benefits. As an example, consider a minimal Zimbra cluster with:
* 2 x LDAP nodes
* 2 x Proxy nodes
* 2 x MTA nodes
* 2 x Mailbox nodes
Within this example cluster, you could afford to lose one of each type of node (due to maintenance, human error, or a disaster event), and the Zimbra cluster would remain mostly operational. The exception is the mailbox nodes: if you lose a mailbox node, any user mailboxes on that node become immediately unavailable. This limitation exists because, by design, a user's mailbox storage is tightly bound to a given mailbox node.
Shortly after Zimbra was acquired from Yahoo by VMware, there were mentions in various webinars & presentations regarding expanding on Zimbra's high availability (HA) capabilities. Most of those discussions seemed to focus on leveraging HA capabilities that were part of VMware products, but not necessarily in adding native HA capabilities to the Zimbra Suite itself. The VMware approach to HA for Zimbra works, as long as the VM hosting a Zimbra node remains bootable and all of the Zimbra services can start successfully following recovery by VMware's HA feature. However, it's of no help when you need to take a Zimbra mailbox node down for maintenance, perform a mailbox node upgrade, troubleshoot a fault on a mailbox node, or just deprecate a mailbox node for migration to newer hardware, since the end result is down time for mailbox end users.
In more recent years, Zimbra mailbox node HA was on the road map for Telligent, and most recently is on the road map for Synacor, with some suggestions indicating that it might make an appearance in a Zimbra 9.0 release. I think many Zimbra partners understand the significant undertaking that will be needed to decouple the storage of end user mailboxes from the mailbox nodes, and given that effort, the arrival of a true mailbox node HA feature is probably still some way off.
Project Proposal
I propose building new flexibility around Zimbra mailbox nodes, while leaving the standard Zimbra distribution as-is, which would address the current shortcomings outlined above. Above all, the intent of the project is to avoid changing the standard Zimbra distribution, so as not to create future support or upgrade problems. Instead, it would leverage a combination of freely available, open source tools & built-in Zimbra admin utilities to make it easy enough that Zimbra admins of all skill levels can take Zimbra mailbox nodes on & offline at will with no disruption to mailbox end users. This doesn't eliminate the need for best practices, such as regular backups of your Zimbra infrastructure, but rather seeks to solve issues that backups don't address.
The project involves placing all Zimbra mailbox nodes within Docker containers, so it's the same mailbox node install you're used to doing, just within a container instead. Other components of Zimbra (LDAP, MTAs, Proxies, etc.) could continue to run as VMs, physical machines, or perhaps within containers as well. With at least several containers, each running a Zimbra mailbox node, user mailboxes would presumably be evenly distributed over those mailbox nodes. This distribution could be done arbitrarily by a Zimbra admin, or through the normal day-to-day provisioning of mailboxes, by letting Zimbra choose which mailbox node to provision each mailbox on from the pool of available mailbox nodes, functionality that exists today within Zimbra. With the introduction of the Zextras tools in Zimbra 8.8, the Zextras backup/restore functionality would be yet another means to migrate customer mailboxes into mailbox nodes housed within containers.
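As a rough illustration only, starting a containerized mailbox node might look something like the sketch below, using the Docker SDK for Python. The image name, network, and volume layout are placeholders, and a real mailbox node would need far more setup (DNS and hostname handling, published ports, ulimits, an init process for the Zimbra services, and so on).

```python
"""Rough sketch: launch a Zimbra mailbox node as a container with the
Docker SDK for Python (pip install docker). The image, network, and
volume names are hypothetical placeholders."""

import docker

client = docker.from_env()

container = client.containers.run(
    "example/zimbra-mailbox:8.8",        # hypothetical pre-built image
    name="mbox-03",
    hostname="mbox-03.example.com",      # Zimbra is picky about hostnames
    network="zimbra-net",                # assumed pre-created Docker network
    volumes={
        "mbox-03-opt-zimbra": {"bind": "/opt/zimbra", "mode": "rw"},
    },
    detach=True,
)
print(f"started mailbox node container {container.short_id}")
```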
A script would then be created to allow a given mailbox node (container) to have all of its mailboxes evacuated automatically. There are many cases where you may need to place a mailbox node into maintenance mode. To name a few:
* A service-impacting configuration change is needed, for example a Zimbra service restart for a new Let's Encrypt SSL certificate, or a zmprov local configuration value change that you want to test.
* A hardware or software (operating system/Zimbra package) fault exists.
* The mailbox node has insufficient hardware resources & is being deprecated for a newer mailbox node with more resources.
* The mailbox node has developed an unknown, difficult to troubleshoot problem, so rather than troubleshoot it, the node is simply replaced.
* A prior configuration error (aka human error) has led to an unstable mailbox node, so rather than fixing it, the node is replaced.
The script would take a few simple inputs, such as the target mailbox node and the desired action. Available actions might include:
* Evacuating all mailboxes from a node
* Restoring mailboxes to a node
* Evacuating all mailboxes & removing the node from the cluster
* Adding a node to the Zimbra cluster & re-distributing mailboxes to the newly added node
For options requiring evacuation of mailboxes, the script would query Zimbra LDAP to determine which mailboxes are on that server, then, using Zimbra's built-in zmmailbox utility, evacuate those mailboxes evenly across the remaining mailbox nodes (containers); a sketch of that step follows below. For options requiring addition or removal of mailbox nodes, there would be a tie-in with both Docker & perhaps Kubernetes to automate the provisioning & de-provisioning of containers for hosting the mailbox nodes. I think there's room to greatly expand on the available actions for this script, but this could be the first few steps. Care would be needed to ensure that the script handles error conditions well, most likely by alerting a human to any problem encountered while carrying out a given action, with a recommendation of what's needed to resolve it.
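To make the evacuation step more concrete, here's a rough Python sketch of the LDAP query plus a round-robin distribution. The hostnames, bind credentials, base DN, and the move command (shown here as a call to the Network Edition zmmboxmove utility, whose exact options you should verify for your version) are all assumptions to adapt; only the zimbraAccount object class and the zimbraMailHost/mail attributes are standard Zimbra schema.

```python
"""Rough sketch of the proposed evacuation step: find every account whose
zimbraMailHost is the node being drained, then spread the moves round-robin
across the remaining nodes. Credentials, hostnames, base DN, and the move
command are placeholders."""

from itertools import cycle
import subprocess

from ldap3 import Server, Connection  # pip install ldap3

SOURCE_NODE = "mbox-01.example.com"                    # node entering maintenance
TARGET_NODES = ["mbox-02.example.com", "mbox-03.example.com"]

# Bind to Zimbra's LDAP (DN/password/base are placeholders for your deployment).
conn = Connection(Server("ldap://ldap.example.com:389"),
                  user="uid=zimbra,cn=admins,cn=zimbra",
                  password="secret", auto_bind=True)

conn.search("dc=example,dc=com",   # adjust base DN to your directory layout
            f"(&(objectClass=zimbraAccount)(zimbraMailHost={SOURCE_NODE}))",
            attributes=["mail"])
accounts = [entry.mail.value for entry in conn.entries]

# Round-robin the moves across the surviving mailbox nodes.
for account, target in zip(accounts, cycle(TARGET_NODES)):
    # Assumption: the move itself is delegated to whatever mailbox-move
    # tooling your Zimbra edition provides; flags shown are illustrative.
    subprocess.run(["zmmboxmove", "-a", account,
                    "--from", SOURCE_NODE, "--to", target], check=True)
    print(f"moved {account} -> {target}")
```

Error handling (retries, alerting a human, verifying the account's new zimbraMailHost afterwards) would wrap around the loop above in a real version of the script.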
I know that's a long email, but I wanted to offer some explanation to give you a sense of the project's scope. The project would be free & open source for all in the Zimbra community to use. Several people expressed interest in this project on the weekly call earlier today. I'm curious to hear whether others in the Zeta Alliance feel this would be a worthwhile project that your organization could use and/or contribute to in the form of ideas, development, or testing. Your thoughts & feedback, please?
Randy Leiker ( randy at skywaynetworks.com )
Skyway Networks, LLC
1.800.538.5334 / 913.663.3900 Ext. 100
https://www.skywaynetworks.com