Tegile disk pool metadata monitor

We recently found out the hard way that you must monitor disk pool metadata usage on pools that hold data. We hadn't been monitoring it on a regular basis, and after we moved a lot of data the cluster basically crashed. Pool metadata usage reached 98%, and as far as our ESX hosts were concerned the disks were all full, even though there were terabytes of space available. Our servers stopped in their tracks until Tegile cleaned up the metadata, which took only minutes to do but left us down for a few hours.

With ZFS, it seems that this metadata is the rough equivalent of the master file table on NTFS or the file allocation table on FAT. I'm no Solaris expert, but these things are making me learn a lot, and fast.

To avoid this in the future, I retooled my replication report to detail the pool stats. I plan to run it as a scheduled task three times a week. The script is below; perhaps it can help someone else keep their system healthy. Make sure to create the config file and plac...