ESXi VSAN HP MicroServer Homelab

So, for my home lab I selected the HP MicroServer Generation 8 as the main building block. I have a vested interest in that I work for an HP Gold Partner and can use this HP equipment to test many of the proposals I run into at work, mainly around iLO management and federated services, and VMware and Microsoft deployment scenarios.

The model I selected was the F9A40A, which is the top-of-the-range Xeon-based unit. I managed to pick up one for £500 and two HP Renew units for £400 each.

To keep the lab footprint to a minimum I wanted to keep the storage within the servers. With a requirement for shared storage, that left me with only two real options: HP VSA or VMware VSAN.

HP StorVirtual VSA can be downloaded free of charge as a 1TB VSA storage appliance with any qualifying product. I have three qualifying products so would be entitled to three licenses. However, I have opted for VMware VSAN as it is built into the hypervisor rather than sitting on top of it, and it is not capped at 1TB. That's not to say you can't increase the capacity of HP VSA by purchasing additional licenses.

Thanks to my work I have access to VMware NFR keys for training purposes. Were it not for this one fact I would not have even bothered. Even dropping to an Essentials Plus kit, together with the VSAN per-CPU licenses, would make this home lab prohibitively expensive for most home users.

VMware VSAN has a minimum requirement of three (storage-enabled) nodes, each with at least one SSD (caching drive) and one SSD or HDD (data drive).

Each VM is stored on at least two separate nodes with the third node acting as a witness. Displayed here is just one vCenter server appliance disk file and its copy.

This ensures the VMDKs exist in at least two locations, so that when a host fails and HA kicks in the VM can still be booted from the surviving copy.
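The three-node minimum follows from this layout: to survive one host failure VSAN keeps two copies of the data plus a witness, each on a different host. A minimal sketch of that rule of thumb, where the function name is purely illustrative and not part of any VMware tooling:

```python
def min_hosts(failures_to_tolerate: int) -> int:
    """Hosts needed so every replica and every witness sits on its own host."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be >= 0")
    replicas = failures_to_tolerate + 1   # full copies of the data
    witnesses = failures_to_tolerate      # tie-breakers, hold no VM data
    return replicas + witnesses


print(min_hosts(1))  # 3
```

With one failure to tolerate this gives the three nodes used in this lab.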

Please also note that a VSAN-enabled host consumes more RAM than one not running VSAN. An ESXi 5.5 host with one SSD and one HDD, and no VMs running on it at all, consumes ~3.5GB of RAM. This may not seem much but it can affect a small lab environment.

For the data disks I have selected the 4TB Western Digital Red Pro. These are 6Gb/s SATA HDDs operating at 7200 RPM.

WD4001FFSX - £146.02

For the caching SSDs I have selected Crucial 500GB MX200s as these offer a good balance of price and performance, and are just over 10% of the capacity of the data disks as per VSAN best practices.
CT500MX200SSD1 - £130.51
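As a quick check of the cache sizing, the ratio of one cache SSD to one data disk can be worked out as below. This is just the arithmetic from the figures above; the variable and function names are my own.

```python
SSD_GB = 500     # Crucial MX200 caching drive
HDD_GB = 4000    # 4TB WD Red Pro data drive


def cache_ratio(cache_gb: float, data_gb: float) -> float:
    """Cache capacity as a fraction of data capacity."""
    return cache_gb / data_gb


ratio = cache_ratio(SSD_GB, HDD_GB)
print(f"{ratio:.1%}")  # 12.5%
```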

To install these drives in the servers' main bays I got myself some proper HP 2.5" drive bay adapters.
2.5" to 3.5" SATA SSD HDD Adapter
654540-001 REV B - £17.69

One other thing that needed an upgrade was the RAM.
Unfortunately the Generation 8 MicroServer is limited to 16GB of RAM, and as each unit was supplied with 1 x 8GB of Smart Memory I purchased an additional 8GB stick of Kingston memory for each node. This gives me a total cluster capacity of 48GB of RAM and is really the one limiting factor in this whole project. It's not much but it is enough to work with and enables me to test many of the features and functions I set out to achieve.

Kingston 8GB 1333MHz ECC Module KTH-PL313E/8G - £49.99
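To put the RAM limit in perspective, here is a rough budget using the ~3.5GB idle VSAN host footprint mentioned earlier. Treat that overhead as my observed figure on ESXi 5.5 rather than a fixed number; the names below are illustrative.

```python
NODES = 3
RAM_PER_NODE_GB = 16.0
IDLE_OVERHEAD_GB = 3.5   # observed idle footprint of a VSAN-enabled host


def usable_ram_gb(nodes: int) -> float:
    """RAM left for VMs once each host's idle footprint is subtracted."""
    return nodes * (RAM_PER_NODE_GB - IDLE_OVERHEAD_GB)


print(NODES * RAM_PER_NODE_GB)    # 48.0  raw cluster total
print(usable_ram_gb(NODES))       # 37.5  left for VMs
print(usable_ram_gb(NODES - 1))   # 25.0  left for VMs with one host down
```

The last figure is the one to size against if everything needs to keep running after a host failure.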

Initially I tried to make do with the integrated B120i set to RAID0. Even after downgrading the VMware HP driver to version 88 the SSDs would randomly drop offline which, thanks to some fantastic online resources, I discovered was due to the fact that the B120i has next to no queue depth.

I bit the bullet and purchased some VMware VSAN supported H220 SAS controllers with a queue depth of 600 and the problems instantly went away.

H220 SAS HBA 650933-B21 - £110

Because the MicroServer switches cannot be stacked (in the logical sense) I needed to give each server node its own switch, with each server and switch pair making up a failure domain or pod (pictured on the left). I can lose a server node or an entire switch from my three-node cluster and should remain up and running.
HP PS1810-8G Switch J9833A - £82.49

The switches have 8 ports and each server requires 3 connections: one configured as Production / Management, one as VSAN storage and one for iLO. The remaining switch ports would be used for redundant WAN connections and switch ISLs.

Each of the two production NICs would be configured as failover for the other, as depicted below.

WAN connectivity would be provided by a standard BT/Sky Infinity broadband modem. This would be connected to port 1 of at least two of the switches to ensure availability. The BT/Sky modem doesn't support spanning tree and would just drop the STP packets. So, to ensure a loop-free topology, I tried relying on a feature called Loop Protection, which the switches support and which doesn't require the modem to support it. However, in trials I was unable to get Loop Protection to function as expected, so I gave up on the dual WAN connection for the time being, to be revisited at a later date.

Ports 2, 3 and 4 are used to connect to the server (2 x data and 1 x iLO) and ports 5, 6, 7 and 8 are used as inter-switch LACP links. This is quite expensive in terms of port count, but the link between the switches must match or better the theoretical maximum throughput of the server or it will be a bottleneck. This is by no means optimal but does make the most of what I have to work with.
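The port budget per switch can be sketched as a simple map. Which of ports 2 to 4 carries iLO is an assumption on my part here, as is treating every port as a 1Gb link; the check at the end is just the "ISL must match or better the server" rule expressed as arithmetic.

```python
# Role of each port on one 8-port switch (iLO on port 4 is an assumption).
PORT_ROLES = {
    1: "wan",
    2: "server-data",
    3: "server-data",
    4: "server-ilo",
    5: "isl",
    6: "isl",
    7: "isl",
    8: "isl",
}
LINK_GBPS = 1  # every port is gigabit


def bandwidth_gbps(role: str) -> int:
    """Aggregate one-way bandwidth of all ports with the given role."""
    return LINK_GBPS * sum(1 for r in PORT_ROLES.values() if r == role)


server_gbps = bandwidth_gbps("server-data")
isl_gbps = bandwidth_gbps("isl")
print(server_gbps, isl_gbps)    # 2 4
print(isl_gbps >= server_gbps)  # True
```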

The firewall is a VM and therefore sits on the same resilient infrastructure as everything else. Port 1 on each switch is in VLAN 200 and this is passed through to the pfSense VM. In the event a host fails, the VM will simply boot up on another node and still access the modem via an alternate switch. For now I will have to repatch the modem from port 1 of one switch to port 1 of another until I can fix the Loop Protection issue. pfSense supports multi-WAN, so if I did have a second WAN connection this could be factored into the design. I am also an advocate of pfSense as I created the integrated CODE-Red themed interface for the distribution.

Total spend so far excluding any software - £2,910.10
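For anyone wanting to adapt the shopping list, the total can be reproduced from the prices quoted above. The quantities of three (one of each component per node) are my assumption written out explicitly; they do add up to the figure above.

```python
from decimal import Decimal

# (item, unit price in GBP, quantity); quantities assume one per node
ITEMS = [
    ("MicroServer Gen8 F9A40A (new)", "500.00", 1),
    ("MicroServer Gen8 F9A40A (HP Renew)", "400.00", 2),
    ("WD4001FFSX 4TB HDD", "146.02", 3),
    ("CT500MX200SSD1 500GB SSD", "130.51", 3),
    ("654540-001 drive bay adapter", "17.69", 3),
    ("KTH-PL313E/8G 8GB RAM", "49.99", 3),
    ("H220 SAS HBA 650933-B21", "110.00", 3),
    ("PS1810-8G switch J9833A", "82.49", 3),
]

# Decimal avoids floating-point rounding when adding up money
total = sum(Decimal(price) * qty for _, price, qty in ITEMS)
print(total)  # 2910.10
```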

Update June 2015:

I have upgraded from ESXi v5.5 to v6 on the fly with no issues. One thing to be aware of is that the upgrade also moves the VSAN on-disk format from version 1.0 to 2.0. If you only have three nodes in your cluster (as I have) you need to run the on-disk upgrade with the following command: vsan.v2_ondisk_upgrade --allow-reduced-redundancy.
The re-striping will take hours and during this time you will be vulnerable, because all copies of the VMs will be migrated to the remaining two nodes.

I have modified the wiring diagram from that depicted previously to make the most of the few ports I have available. I have two ports from each switch configured as a trunk, giving me a 4Gb backplane (2 x 1Gb transmit and receive). I then have one port on each of the two end switches, completing the loop, configured for Rapid Spanning Tree (RSTP). The trunk naturally has a lower metric so is the preferred route, but if the middle switch goes down the RSTP link comes up and keeps the cluster up, albeit at reduced network capacity.
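The reasoning behind the revised wiring can be sketched as a small connectivity check. The switch names are placeholders, and this only models which switches can still reach each other; it does not model RSTP port states or convergence.

```python
from collections import deque

# (switch, switch, ports in the link); names are placeholders
LINKS = [
    ("sw1", "sw2", 2),  # two-port trunk
    ("sw2", "sw3", 2),  # two-port trunk
    ("sw1", "sw3", 1),  # single link closing the loop, normally blocked by RSTP
]


def reachable(start, links, failed=frozenset()):
    """Breadth-first search over the links that avoid failed switches."""
    graph = {}
    for a, b, _ports in links:
        if a in failed or b in failed:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


print(sorted(reachable("sw1", LINKS)))                  # ['sw1', 'sw2', 'sw3']
print(sorted(reachable("sw1", LINKS, failed={"sw2"})))  # ['sw1', 'sw3']

# Trunk capacity: 2 ports x 1Gb, counted in both directions
print(2 * 1 * 2)  # 4
```

Without the third link, losing the middle switch would leave the two end switches isolated from each other.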

Yes, my WAN connection is non-redundant, but this gives me four spare ports to devote to connectivity outside of the cluster. This is a design compromise but we have to be sensible. If I get a modem that supports RSTP I may steal one of the spare ports back.

And for those who are interested here are some snaps of my install.

[Photos of the install: P1080783, P1080794, P1080793, P1080806, P1080804, P1080805, P1080802]

Update February 2017:

Just a little update to let you guys know all is still going well. I am now running ESXi v6.0 U2. I don't think I will bother going to 6.5 as I don't think the extra hit on the limited RAM is worth it.
Since the last update I have picked up a fourth unit. Rather than adding this unit to the cluster as a fourth node I have configured it as a single-node DR cluster.
This node has 4 x 1TB HDDs, reclaimed from an old server, hanging off an H220 controller. On top of this I am running 4 x 1TB HPE VSAs presenting an iSCSI SAN to a virtual Veeam backup server running on the same host. This host (ESXi host 4) can then not only back up the VMs on the main cluster but can also restore VMs locally to a DEV iSCSI datastore presented to host 4.

A user (Dany) of the Home Server Show forum asked how I installed the lights in the switches, so the following images explain how.

First obtain some RGB LED strips from eBay. You can buy a meter for a few pounds but get good quality ones, as I've had the blue LEDs burn out already.
These strips can be cut down at intervals of every 3 LEDs. Cut a strip 12 LEDs long. Then solder a short black, red and blue wire onto it; these correspond to black (+), red LED (-) and blue LED (-).
If I were to wire up the green LEDs as well the combined light would appear white, but I want pink/purple. Then stick the strip to the inside front of the switch with 3M tape.
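The colour choice is simple additive mixing, which can be illustrated with RGB tuples. This is an idealised sketch that assumes each channel is driven at full brightness; real strips will vary in tint.

```python
# Idealised full-brightness output of each LED channel as (R, G, B)
CHANNELS = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}


def mix(*names):
    """Additively combine the named channels, clamping each component at 255."""
    return tuple(
        min(255, sum(CHANNELS[name][i] for name in names)) for i in range(3)
    )


print(mix("red", "blue"))           # (255, 0, 255)  pink/purple
print(mix("red", "green", "blue"))  # (255, 255, 255)  white
```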

Inside the switch the power socket on the back is fed to the front of the circuit board via a lead and plug. The bottom of this socket is where we tap the 12V from.
Plus (+) is the innermost pin, with minus (-) the one nearest the corner of the board. We attach the black wire to the (+) pin and the red and blue wires to the (-) pin.

Replace the board, but before screwing it all back together test the lights by plugging in the power to the switch. If all is good, put the switch back together.

You may have noticed a little something stuck to the side of the second host in the first update image above. This is a really cool little Raspberry Pi with a small TFT screen attached to it. It not only runs as a USB server for the UPS below but also as a Plex media server. So now you know!