Graph EVERYTHING with Graphite!
I believe that a graphing tool should be flexible enough for the user to decide what metrics are important to their environment, and it should be able to accept metric data from any source.
I call these “graphing” tools because I don’t consider them “monitoring” tools like Nagios, WhatsUp, or Zabbix. Here are my thoughts on graphing versus monitoring:
Graphing:
Shows trends over time
Contains historic data
“Will run out of disk space in 3 days”
“LDAP queries up 200% in 6 months”
Monitoring:
Alerts when thresholds are crossed
States are definite: Ok/Bad/Warning
Notifications are actionable
“OMG OUT OF SPACE”
“LDAP IS DOWN”
Any good engineer knows you need both functions. Zabbix, WhatsUp, and some others handle both functions. Splunk can do threshold detection and make graphs too, but it ain’t no Nagios. Personally I like tools that do one thing, and one thing well. I am not after some sort of monitoring super-tool, and I don’t believe that #monitoringsucks.
What Makes Graphite Awesome
1. So many ways to get data into it
Graphite does not have any kind of polling daemon. It just accepts incoming metrics; it is up to you how to get them there and what to call them. It enforces no naming convention. All you need to do is echo a metric name, a value, and a timestamp into TCP port 2003, and it does the rest.
Ways you can get data into Graphite:
Push or pull from your munin-nodes
Pull from your existing Ganglia install
Integrate sending metrics directly into your application with statsd
echo "mycoolmetric.rate 9000 `date +%s`" | nc graphite.server 2003
Extract metrics from your log files with logster
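If you would rather send metrics straight from your own code than shell out to nc, the same one-line plaintext format can be written to the socket directly. This is only a minimal sketch: the function names format_metric and send_metric are made up for illustration, and it assumes your carbon listener is reachable at graphite.server on port 2003, as in the echo example.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """Build one plaintext line: 'name value timestamp' plus a newline."""
    if timestamp is None:
        # Default to "now" in Unix epoch seconds, like `date +%s`.
        timestamp = int(time.time())
    return f"{name} {value} {int(timestamp)}\n"


def send_metric(name, value, host="graphite.server", port=2003, timestamp=None):
    """Open a TCP connection, send a single metric line, and close it.

    The host and port defaults are assumptions taken from the echo
    example; change them to match your own install.
    """
    line = format_metric(name, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
    return line
```

Calling send_metric("mycoolmetric.rate", 9000) should then do the same job as the echo and nc pipeline. Opening a new connection per metric is fine for a cron job, but a busy application would probably want to reuse the socket or hand the work off to statsd.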
2. The only limit to visualizing your data is your imagination
Unlike other graphing systems, where the style of the graph is predetermined, Graphite does not enforce any convention for its graphs. It is more like a graph API: composing your graphs is a manual process and requires a bit of creativity.
Graphite does not present the user with pages and pages of pre-rendered graphs like Ganglia, Munin, etc.
What kind of graphs can you make with Graphite? Let’s see some cool examples:
Etsy’s php warnings correlated with code deploys:
Filtering out deviant server loads:
You can see that Graphite fits my initial principles. It can suck in metrics from anywhere and create graphs to fit your imagination. Combine this with the crazy number of functions that you can apply to your metrics, and the sky is the limit!
Cool graphs for you to make:
Ratio of shopping cart items to unique visitors?
Overlay your average page load time with your code deploys?
Graph your average rate of change of your storage?
Plot the mortality rate of your EC2 Instances?
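To give a feel for what "more like a graph API" means, here is a rough sketch of how a few of the ideas above could be turned into render URLs. Every metric name in it (shop.cart.items, web.page_load.avg, deploys.web, storage.used_bytes and so on) is hypothetical, as is the render_url helper and the http://graphite.server base address. The functions divideSeries, drawAsInfinite, and derivative are ones Graphite provides, but check the function list for your version before relying on them.

```python
from urllib.parse import urlencode


def render_url(targets, time_from="-24h", base="http://graphite.server"):
    """Build a /render URL with one 'target' parameter per series expression."""
    query = [("target", t) for t in targets]
    query.append(("from", time_from))
    return f"{base}/render?{urlencode(query)}"


# Ratio of shopping cart items to unique visitors (hypothetical metric names).
cart_ratio = render_url(["divideSeries(shop.cart.items,shop.visitors.unique)"])

# Average page load time, with each code deploy drawn as a vertical line.
load_vs_deploys = render_url(
    ["web.page_load.avg", "drawAsInfinite(deploys.web)"],
    time_from="-7d",
)

# Rate of change of storage usage over the last 30 days.
storage_growth = render_url(["derivative(storage.used_bytes)"], time_from="-30d")
```

Paste any of the resulting URLs into a browser pointed at your own Graphite server and tweak the targets from there. That is most of the workflow: name your metrics however you like, then compose them with functions until the graph says something useful.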