We're hoping to get Nagios in the near future, but in the meantime I am wondering how other people manage their FSCs to ensure they're in a healthy state (if at all)?
At the moment, I tend to just look at standard system resource usage and the "fscadmin ./status" command but I can't really see how you can gather any meaningful information from the status command.
What do you all do (if anything) to check this?
FSCs are probably the most stable component in a 4-tier installation. Checking that the service is running and running "status" to confirm that the running service hasn't gone stale is likely the best you can do for an FSC. You can get a better breakdown using "cachesummary" and "cachedetail". The harder components to monitor are Server Manager and the Web Tier. You may be interested in gathering metrics by running a script as a scheduled task:
cd /d %TC_ROOT%\fsc
REM Status - All Up?
fscadmin -s http://ACME:4544 ./status
REM Cache Details
fscadmin -s http://ACME:4544 ./cachedetail >D:\Siemens\Temp\metric_cachedetail.txt
REM Filestore Status Details
fscadmin -s http://ACME:4544 ./filestorestatus/detail >D:\Siemens\Temp\metric_filestorestatusdetail.txt
REM Performance Counters (written to file, then reset for the next run)
fscadmin -s http://ACME:4544 ./perfcounters >D:\Siemens\Temp\metric_perfcounters.txt
fscadmin -s http://ACME:4544 ./perfcounters/reset
Naturally, you would want these metrics emailed to you, or a DATETIME stamp added to the file names so that they are not overwritten with each run.
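If you would rather not fight with %DATE% formatting in a batch file, here is a rough Python sketch of the DATETIME-stamp idea. It is not an official tool, just one way to do it. It assumes the same FSC address and output folder as the batch script above, and that the fscadmin launcher can be found on PATH. The names stamped_name and collect are made up for this example.

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path

# Assumptions: same FSC address and output folder as the batch script above.
FSC_URL = "http://ACME:4544"
OUT_DIR = Path(r"D:\Siemens\Temp")

# Metric name -> fscadmin command, copied from the batch script above.
METRICS = {
    "cachedetail": "./cachedetail",
    "filestorestatusdetail": "./filestorestatus/detail",
    "perfcounters": "./perfcounters",
}


def stamped_name(metric, now):
    """Build a file name such as metric_perfcounters_20240131_063000.txt."""
    return f"metric_{metric}_{now:%Y%m%d_%H%M%S}.txt"


def collect(now=None):
    """Run each fscadmin command and save its output to a new stamped file."""
    # Assumption: the fscadmin launcher is on PATH (add %TC_ROOT%\fsc if not).
    fscadmin = shutil.which("fscadmin")
    if fscadmin is None:
        raise FileNotFoundError("fscadmin not found on PATH")
    now = now or datetime.now()
    written = []
    for metric, command in METRICS.items():
        result = subprocess.run(
            [fscadmin, "-s", FSC_URL, command],
            capture_output=True,
            text=True,
        )
        target = OUT_DIR / stamped_name(metric, now)
        target.write_text(result.stdout)
        written.append(target)
    # Reset the counters after saving them, as the batch script does.
    subprocess.run([fscadmin, "-s", FSC_URL, "./perfcounters/reset"])
    return written
```

Calling collect() from a scheduled task leaves one new file per metric per run, so earlier runs are kept and can be compared later.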
Thanks Randy, that's fantastic!
Some follow-up questions, if that's OK:
1) How often would you run that? And is it the "Failures" that would be the main thing you're looking for in order to judge that all is present and correct?
2) In our environment we have 1x Volume server with 4x Pool servers, each with an FSC with the Volume being the master. When I look at the output from the ./perfcounters command I am seeing a massive difference in the "LocalAdminRead - Successes" on the Volume in comparison to all of the pools combined. Is that normal? i.e. 8855 in Volume right now and <60 on all 4 pools combined.
Thank you again!
Long shot but can you recommend some reading material for getting more info on:
"Specifically, to turn off FCC_EnableDirectFSCRouting and to turn on for fscgroups, exitfsc/entryfsc and connect the groups with linkparameters."
Thanks again for your reply.
Thanks @RandyEllsworth, going to read through those resources today!
As a thought on your previous comment, we currently run a script that clears out our caches every morning. Would that be the reason why we're seeing so little caching going on on the pool servers and why clients are forever making direct contact with the Volume?