The tldr; is at the bottom, after the story mode writeup. Feel free to CTRL+END or hyper-scroll your way there, or read through the machinations of my troubleshooting along the way.
In the last entry I wrote about how a critical server failed at Manufacturing Company 2 (AKA MC2) with no support contract, but I managed to acquire a refurbished, same model server relatively quickly and restore all services, with some DNS changes on the subnets and firewalls to get folks up and running immediately.
So now, dear readers, when blessed with 3 identical servers (all under support contracts, of course), all type 1 hypervisors, with no shared storage and the need for resiliency, what do you do? You call your vendor and ask for vSphere licensing, of course.
And to be frank, getting the licensing was the easy part. It just takes a little back and forth with a vendor to sort the details, swallowing your pride and accepting that daddy Broadcom has changed VMWare pricing, and ponying up the cash. That's not really what this story is about, as many others have written about the changes Broadcom hath wrought. No, this is about how to convert 3 production servers to vSAN on the fly, with no interruption of service to the users. Is this the best way? No, the best way would be to stand up new servers for a new cluster and vMotion them all over from the old servers, but that was not an option afforded to me by MC2, so we got a little wild wild westy and made the changes over a 2-month period.
First order of business: install vSphere. Installing it is really easy, and adopting the hypervisors into it is just as simple: right-click the data center you created in vSphere, select "Add Host" and fill in the appropriate info. Adding all 3 took about 30 seconds each.
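If you'd rather script it than click it, here's a rough sketch of the same step with pyVmomi (VMware's Python SDK). The vCenter address, credentials, datacenter and host names below are placeholders, not MC2's real environment, so treat it as a pattern rather than a copy-paste job:

```python
# Rough pyVmomi equivalent of the "Add Host" click-through above.
# Addresses, names and credentials are all placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab-style cert handling
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

# Find the data center created in vSphere (name is made up here).
dc = next(d for d in content.rootFolder.childEntity
          if isinstance(d, vim.Datacenter) and d.name == "MC2-DC")

# Equivalent of right-click > Add Host: connect a standalone ESXi host.
spec = vim.host.ConnectSpec(hostName="esxi01.example.local",
                            userName="root", password="********",
                            force=True)  # take over even if managed elsewhere
# (a real run may also need spec.sslThumbprint set for the host)
dc.hostFolder.AddStandaloneHost_Task(spec=spec, addConnected=True)

Disconnect(si)
```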
Congratulations! Your hypervisors are now in a data center in vSphere! Which means almost nothing...until you add in all the features that vSphere brings you. But, as each server had been set up differently at different times, they all had different networking, which needed to be aligned as much as possible. So over a couple of days I added additional 10GbE connections to each server, re-IP'd each so it sat on the proper VLAN instead of the legacy one, and added vMotion vmkernel adapters (vmknics) to each host.
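For the curious, the per-host vMotion networking step looks roughly like this in pyVmomi. I'm assuming a standard vSwitch at this stage (the distributed switch conversion comes later in the story), and the switch, port group, VLAN and IP values are made up:

```python
# Hedged sketch: add a vMotion-tagged vmkernel adapter to one ESXi host.
# Switch, port group, VLAN and IP values are placeholders, not MC2's config.
from pyVmomi import vim

def add_vmotion_vmk(host, vswitch="vSwitch1", pg_name="vMotion", vlan=20,
                    ip="10.0.20.11", mask="255.255.255.0"):
    net_sys = host.configManager.networkSystem

    # Port group on the existing standard switch.
    pg_spec = vim.host.PortGroup.Specification(
        name=pg_name, vlanId=vlan, vswitchName=vswitch,
        policy=vim.host.NetworkPolicy())
    net_sys.AddPortGroup(portgrp=pg_spec)

    # vmkernel NIC with a static IP on the vMotion VLAN.
    nic_spec = vim.host.VirtualNic.Specification(
        ip=vim.host.IpConfig(dhcp=False, ipAddress=ip, subnetMask=mask),
        mtu=9000)  # jumbo frames only if the 10GbE path actually supports them
    vmk = net_sys.AddVirtualNic(portgroup=pg_name, nic=nic_spec)

    # Tag the new vmk for vMotion traffic.
    host.configManager.virtualNicManager.SelectVnicForNicType("vmotion", vmk)
```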
Congratulations again! For reals this time! You can now vMotion servers between the hosts, which is actually a pretty big and important step. It means you can move a server during production hours: it snapshots the running VM, copies the VM's files over the network to the receiving host, then merges the accumulated changes in before the final cutover. It's a pretty remarkable feature really, and for a number of years VMWare was the only game in town. Pretty much everybody else can do it now too, but that doesn't diminish what the change brings: flexibility. I could now move VMs between hosts and datastores, and that's exactly what I needed in order to reprovision all 3 hosts into vSAN mode.
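Under the hood each of those moves is a single relocate call per VM. Here's a hedged pyVmomi sketch; the VM, host and datastore names are placeholders:

```python
# Hedged sketch of the API call behind a (storage) vMotion, via pyVmomi.
from pyVmomi import vim
from pyVim.task import WaitForTask

def find_obj(content, vimtype, name):
    """Look an inventory object up by name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(o for o in view.view if o.name == name)
    finally:
        view.Destroy()

def vmotion(content, vm_name, dest_host, dest_ds=None):
    vm = find_obj(content, vim.VirtualMachine, vm_name)
    spec = vim.vm.RelocateSpec(host=find_obj(content, vim.HostSystem, dest_host))
    if dest_ds:  # no shared storage here, so move the disks along with the VM
        spec.datastore = find_obj(content, vim.Datastore, dest_ds)
    WaitForTask(vm.RelocateVM_Task(spec))

# e.g. vmotion(content, "web01", "esxi02.example.local", "esxi02-local-ssd")
```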
With vMotion available as a tool, I immediately set about streamlining disk usage on all 3 hosts. There were some retired servers still consuming about 10.5TB of space that I needed to get rid of, so I backed those VMs up to the new Synology RS I'd purchased, tested the backups, and then deleted the VMs from vSphere.
Now I had a hodgepodge of datastores with varying levels of RAID that I needed to address. First, I calculated how much storage all the VMs consumed across the 3 hosts, which now totaled about 15TB, then I counted out all of the 3.84TB SAS SSDs we had: 24. I knew that wouldn't be enough, since vSAN keeps redundant copies of VM data across hosts and dedicates some of each host's disks to cache (disks are usually added in pairs of capacity and cache), so I needed 10 per host. I ordered the extras I needed, plus 1 larger 7.68TB SAS SSD (for the V6 server, whose footprint was too big for a single 3.84TB SSD). I also ordered the SAS HBAs I needed, as RAID controllers are generally not supported for vSAN.
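Back-of-the-napkin, that disk math works out like this (the order quantity simply falls out of the numbers above):

```python
# Quick sanity check on the drive order. TB here means vendor decimal terabytes.
vm_footprint_tb = 15      # total space the remaining VMs consumed
disk_size_tb    = 3.84    # the SAS SSD model already on hand
disks_on_hand   = 24
disks_per_host  = 10      # capacity + cache devices, per the plan above
hosts           = 3

disks_needed   = disks_per_host * hosts        # 30
disks_to_order = disks_needed - disks_on_hand  # 6 more 3.84TB drives
print(f"{vm_footprint_tb}TB of VMs to house; order {disks_to_order} more "
      f"{disk_size_tb}TB SSDs (plus one 7.68TB for the oversized V6 server)")
```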
For those of you who don't know, vSAN is software-defined storage, similar in some ways to ZFS or BTRFS, so ESXi needs direct access to the disks rather than going through a RAID controller.
The first host to get reprovisioned was the one with the fewest VMs, so I vMotioned them over to the other 2 hosts, swapped the RAID controller for the HBA, cleared the foreign RAID config from each disk, and left only 6 of the storage disks installed, ready to be claimed by vSAN later.
Why only 6 instead of the 10 I'd already determined I needed? Well, vSAN needs a quorum: all 3 hosts have to be participating before the vSAN datastore becomes accessible. So I added some additional disks (on the opposite side of the disk bays, of course) to use as temporary local storage, then vMotioned a few servers a day onto it until the next host was empty.
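A toy way to picture that quorum rule (my framing, not official VMware tooling): with the default FTT=1 storage policy, each vSAN object ends up as two data replicas plus a witness, one component per host, and a strict majority of those components has to be reachable.

```python
# Toy illustration of the 3-host quorum rule under the default FTT=1 policy:
# each object = 2 data replicas + 1 witness, one component per host,
# and more than half of the components must be reachable.
COMPONENTS_PER_OBJECT = 3  # replica, replica, witness

def object_available(hosts_up: int) -> bool:
    return hosts_up > COMPONENTS_PER_OBJECT / 2

for up in (3, 2, 1):
    print(f"{up} host(s) up -> object accessible: {object_available(up)}")
# 3 -> True, 2 -> True, 1 -> False
```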
Wash, rinse & repeat, with the only difference being that one of the temporary disks was the 7.68TB drive, which I vMotioned the V6 server onto. There was a bit of deleting foreign RAID configs in iDRAC before all of the disks could be reused, but that was done without any additional reboots (note that it could have been avoided entirely if I'd just deleted the RAID arrays before swapping the controllers, but oh well).
Now, with most of the prerequisites out of the way, the real fun begins: setting up the distributed switches, vmkernel NICs & port groups. It took a little bit of juggling to do this during business hours, but it was accomplished without issue, and once all of that was ready, it was pretty straightforward to turn on vSAN and assign its services to the corresponding networking in vSphere.
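Here's a hedged pyVmomi sketch of the distributed switch and port group piece. The switch and port group names, port count and VLAN ID are placeholders, and in reality there's also uplink assignment and migrating the existing vmkernel adapters, which I've left out for brevity:

```python
# Rough sketch: create a distributed switch and one VLAN-tagged port group.
from pyVmomi import vim
from pyVim.task import WaitForTask

def create_dvs_and_pg(datacenter, dvs_name="dvs-core", pg_name="vsan-pg", vlan=30):
    # The dvSwitch lives under the datacenter's network folder.
    dvs_spec = vim.DistributedVirtualSwitch.CreateSpec(
        configSpec=vim.dvs.VmwareDistributedVirtualSwitch.ConfigSpec(name=dvs_name))
    WaitForTask(datacenter.networkFolder.CreateDVS_Task(dvs_spec))
    dvs = next(n for n in datacenter.networkFolder.childEntity
               if isinstance(n, vim.DistributedVirtualSwitch) and n.name == dvs_name)

    # One distributed port group, tagged with the (placeholder) vSAN VLAN.
    pg_spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
        name=pg_name, numPorts=16, type="earlyBinding",
        defaultPortConfig=vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
            vlan=vim.dvs.VmwareDistributedVirtualSwitch.VlanIdSpec(
                vlanId=vlan, inherited=False)))
    WaitForTask(dvs.AddDVPortgroup_Task([pg_spec]))
```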
Now I could FINALLY claim the disks for vSAN. 6 on each server, half for storage and half for cache, resulting in ~10TB per host. That's not enough for everything, but it's enough to start.
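The "~10TB per host" figure checks out once you remember the cache disks don't add capacity and that vSphere's "TB" is really TiB (my arithmetic, not a vSAN report):

```python
# Rough check on the ~10TB-per-host figure.
disk_tb        = 3.84                # vendor-rated decimal terabytes
capacity_disks = 3                   # half of the 6 claimed disks; the other 3 are cache
raw_tb  = capacity_disks * disk_tb   # 11.52 TB raw; cache devices add no capacity
raw_tib = raw_tb * 1e12 / 2**40      # ~10.5 TiB, which the UI rounds to ~10 "TB"
print(f"{raw_tb:.2f} TB raw per host ≈ {raw_tib:.1f} TiB before vSAN overhead")
```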
And then, finally, it was time for the Great vMotioning of Q1 2025 (that's the title I picked. You like?). I was able to vMotion servers over to vSAN little by little, reclaiming additional temporary storage disks as I went. It all went pretty smoothly and, once completed, meant that we had resiliency in storage and compute, more flexibility to patch or upgrade hosts, and didn't have to worry about a server dying due to mysterious flaws in front USB ports, because we could drop a whole server and still function smoothly. I even set up DRS (Distributed Resource Scheduler) to move things around automatically based on performance needs, which is another great tool.
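And since DRS came up: enabling it is a single cluster reconfigure call. A hedged pyVmomi sketch, with the automation level and migration threshold as my own picks rather than anything MC2-specific:

```python
# Rough sketch: enable DRS on an existing cluster object via pyVmomi.
from pyVmomi import vim
from pyVim.task import WaitForTask

def enable_drs(cluster, automation="fullyAutomated"):
    spec = vim.cluster.ConfigSpecEx(
        drsConfig=vim.cluster.DrsConfigInfo(
            enabled=True,
            defaultVmBehavior=automation,  # manual / partiallyAutomated / fullyAutomated
            vmotionRate=3))                # middle-of-the-road migration threshold
    WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))
```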
Mission accomplished! Near total reconfiguration of the production servers without any impact to the organization.
tldr; slowly (REALLY slowly) reconfigured each host into vSAN by vMotioning VMs between datastores until I could get a quorum, then vMotioned all the VMs into the vSAN datastore and added all the temporary disks into vSAN as well.
Total cost: $5,000 for drives and HBAs + about 2 months of time picking away at it all.