SNMP + FMd: English version
This post follows this one and is a translation of what I wrote here.
There are no MIBs for ZFS (yet), and in particular none for checking failed disks. This is quite annoying. Fortunately, Solaris has shipped the Fault Manager (fmd) since Solaris 10.
Quick FMd
fmd in 3 commands:
- fmadm faulty: lists the problems (and their UUIDs)
- fmadm repair [UUID]: marks the problem as repaired
- fmdump: dumps the problem list, including repaired ones
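Since fmadm repair wants a UUID, it can be handy to pull the UUIDs out of whatever fmadm faulty prints. Here is a minimal sketch, assuming a POSIX shell and awk (nawk on Solaris); extract_uuids is a name I made up, it is not part of fmadm. It only looks for UUID-shaped tokens, so it should not depend on the exact layout of the output.

```shell
#!/bin/sh
# Sketch: print every UUID-shaped token found on stdin, once each.
# extract_uuids is a hypothetical helper, not a Solaris command.
extract_uuids() {
    awk '{
        for (i = 1; i <= NF; i++) {
            t = $i
            # 36 characters, lowercase hex and dashes only, exactly 4 dashes
            # at the usual UUID positions (this rejects separator lines).
            if (length(t) == 36 && t ~ /^[0-9a-f-]+$/ &&
                gsub(/-/, "-", t) == 4 &&
                substr(t, 9, 1) == "-" && substr(t, 14, 1) == "-" &&
                substr(t, 19, 1) == "-" && substr(t, 24, 1) == "-" &&
                !seen[t]++)
                print t
        }
    }'
}
```

Something like fmadm faulty | extract_uuids then gives one UUID per line, which can be fed to fmadm repair once the underlying problem is really fixed.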
Installing SNMPd
pkgadd -d [package-directory] SUNWsmcmd SUNWsmmgr SUNWsmagt
Run snmpconf (with the -i switch) to configure the behaviour of the daemon easily.
And of course, enable the service:
svcadm enable sma
SNMPd & FMd
Add this line to /etc/sma/snmp/snmpd.conf:
dlmod sunFM /usr/lib/fm/amd64/libfmd_snmp.so.1
This activates the SNMP module for fmd.
Then restart sma:
svcadm restart sma
Please note that the path is architecture-dependent (64-bit x86 here).
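To make the architecture dependency concrete, here is a small sketch that builds the dlmod line from an architecture name. It assumes the 64-bit SPARC library lives in the analogous sparcv9 directory, which I have not checked on a real machine; fm_snmp_module_path and fm_dlmod_line are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: map an architecture name to the fmd SNMP module path.
# Assumption: the sparcv9 directory mirrors the amd64 one used in this post.
fm_snmp_module_path() {
    case "$1" in
        amd64|sparcv9)
            echo "/usr/lib/fm/$1/libfmd_snmp.so.1" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# Print the line to add to /etc/sma/snmp/snmpd.conf.
fm_dlmod_line() {
    module_path=$(fm_snmp_module_path "$1") || return 1
    echo "dlmod sunFM $module_path"
}
```

On Solaris, isainfo -k prints the kernel's instruction set name, so fm_dlmod_line "$(isainfo -k)" should give the right line on a 64-bit system. Check that the file exists before restarting sma.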
Crash test it
# prepare a file based zfs pool
mkdir crash
cd crash
# Files must be at least 64M
dd if=/dev/zero of=pool1 bs=1024k count=64
dd if=/dev/zero of=pool2 bs=1024k count=64
dd if=/dev/zero of=pool3 bs=1024k count=64
# create the pool
sudo zpool create crashtest raidz /home/nico/crash/pool1 /home/nico/crash/pool2 /home/nico/crash/pool3
# break it
rm pool3
# scrub it (to be sure that the system sees the failure)
sudo zpool scrub crashtest
# check that fmd does its job
sudo fmadm faulty
Now, let's see what information we get through SNMP:
snmptable -v2c -c public 127.0.0.1 SUN-FM-MIB::sunFmProblemTable
| sunFmProblemUUID | sunFmProblemCode | sunFmProblemURL | sunFmProblemDiagEngine | sunFmProblemDiagTime | sunFmProblemSuspectCount |
| "96397f16-1cea-463b-e9db-de989cd42e81" | ? | ? | ? | ? | ? |
The module exports 4 tables: sunFmProblemTable, sunFmFaultEventTable, sunFmModuleTable and sunFmResourceTable.
As snmptable shows "?" for every column except the UUID, the easiest way to read them is snmpwalk:
snmpwalk -c public -v 2c 127.0.0.1 SUN-FM-MIB::sunFmProblemTable
SUN-FM-MIB::sunFmProblemUUID."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: "96397f16-1cea-463b-e9db-de989cd42e81"
SUN-FM-MIB::sunFmProblemCode."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: ZFS-8000-D3
SUN-FM-MIB::sunFmProblemURL."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: http://sun.com/msg/ZFS-8000-D3
SUN-FM-MIB::sunFmProblemDiagEngine."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: fmd:///module/zfs-diagnosis
SUN-FM-MIB::sunFmProblemDiagTime."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: 2008-2-21,12:31:2.0,+1:0
SUN-FM-MIB::sunFmProblemSuspectCount."96397f16-1cea-463b-e9db-de989cd42e81" = Gauge32: 1
Nagios integration
See this post.
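Just to give the idea (this is not the check from the linked post), here is a rough sketch of how the snmpwalk output of sunFmProblemTable could be turned into a Nagios-style result. It assumes a POSIX shell and awk (nawk on Solaris) and the usual Nagios plugin exit codes (0 for OK, 2 for CRITICAL); check_fm_problems is a name I made up, and SNMP errors or timeouts are not handled.

```shell
#!/bin/sh
# Sketch: read snmpwalk output on stdin, report the fmd problem codes.
# check_fm_problems is a hypothetical helper, not an existing plugin.
check_fm_problems() {
    # Keep the sunFmProblemCode rows and take the value after "STRING: ".
    codes=$(awk -F'STRING: ' '/sunFmProblemCode/ { print $2 }' |
        sort -u | tr '\n' ' ')
    codes=${codes% }   # drop the trailing space left by tr

    if [ -z "$codes" ]; then
        echo "OK - no fault reported by fmd"
        return 0
    fi
    echo "CRITICAL - fmd problems: $codes"
    return 2
}
```

It would be used as snmpwalk -c public -v 2c 127.0.0.1 SUN-FM-MIB::sunFmProblemTable | check_fm_problems, and with the crash test pool it should report ZFS-8000-D3.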
See also
All this stuff is based upon this excellent post.