Gartner Blog Network

Monitoring software sucks so I use Nagios, what’s a better approach?

by Jonah Kowall  |  February 6, 2014  |  80 Comments

Speaking to our clients, and to other people at the conferences and industry events I attend, Nagios is always top of mind. This battle has been covered many times: many people want to replace Nagios or at least reduce their use of it. The question always comes up: what else is good for free? The answer depends on how much expertise you have in managing infrastructure and what level of monitoring you’d like to do. Open source monitoring requires configuration management tools (Chef, Puppet, Salt) to scale and to keep configuration consistent, and this requires some level of expertise.

Most users of Nagios use it for basic health monitoring of servers and applications, and I’ve spoken about other low cost tools which build on the open nature of Nagios and leverage its massive and vibrant community. There are plenty of great open source alternatives out there which work; here are a few options:

Quick and easy:

  • PandoraFMS – This project out of Spain is growing in popularity amongst Gartner clients with an easy to implement and configure product. The solution is open source and free, but also has commercial support options if desired. The UI is modern and fresh along with agents or agentless monitoring capabilities.
  • Icinga – Most often compared with Nagios, the product shares many open source components but also includes a more advanced web interface, search capabilities, and better enterprise integration for permissioning and authentication. It’s a bit more complex in terms of getting reporting and other capabilities, but this is free software, so some work is required. The product is shipped as software or as a virtual appliance, and it’s worth checking out.
  • Spiceworks – Windows only product, but this freeware provides good basic functionality in the monitoring space, which should serve the needs of many in monitoring of servers, network devices, and other components. The product can’t scale very high, but for SMBs this is a good option.
  • Zabbix – This popular server monitoring product is also free, with commercial support options. The product has more legacy components due to its age, but is under active development. This is an improvement over Nagios, but there are better options available.

Needs more time in the oven:

  • Naemon – If you like the Nagios model and configuration (I have no idea why people like it…) then Naemon is the next generation. It’s a new project and needs time before it’s mature enough. The offering will include an enhanced GUI (Thruk), removal of legacy components, and a highly scalable engine for the future. OP5 (a Swedish company with a greatly enhanced commercial version based on Nagios) is behind this project and is funding much of the development. UPDATE: OP5 lets employees work on many open source projects on company time, so the sponsorship is not as direct as it may sound. The most important contributor will be Andreas Ericsson, the talent who wrote over 69% of Nagios code in the last 12 months and who works for OP5. This project is one to watch!
  • Munin – This open source product has promise, but needs a bit more development effort to catch up with those above. The advantages of the product are a fully functional and easy agentless implementation.

Cutting Edge:

If you are operating a web-scale infrastructure, monitoring large numbers of devices, and want a fully extensible monitoring system that collects not only system metrics but also custom application metrics, I would suggest the following technologies:

  • StatsD – Generic metric collector (can easily collect application metrics or even real user monitoring metrics directly)
  • Collectd – System metric collector
  • Graphite – Backend for metric storage
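
To make the division of labour in this stack a little more concrete, here is a minimal Python sketch of the two wire formats involved: the StatsD line format (`name:value|type`) sent over UDP, and Graphite's plaintext format (`path value timestamp`). The metric names, the host, and the helper function names are hypothetical, and port 8125 is only the commonly used StatsD default; adjust all of them for your own setup.

```python
import socket
import time


def statsd_line(name: str, value: float, metric_type: str = "c") -> str:
    """Format one metric in the StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one datapoint in Graphite's plaintext format: <path> <value> <timestamp>."""
    return f"{path} {value} {timestamp}\n"


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send. Host and port are assumptions, not fixed values."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical application metrics: a counter ("c") and a timer ("ms").
    send_statsd(statsd_line("app.login.errors", 1))
    send_statsd(statsd_line("app.login.latency_ms", 320, "ms"))
    # What a system metric looks like when written to Graphite directly.
    print(graphite_line("servers.web01.load", 0.42, int(time.time())), end="")
```

Because StatsD traffic is UDP, the application never blocks on the metrics pipeline, which is a large part of why this combination suits custom application metrics.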

Some of my favorite visualizers for this data:

  • Descartes
  • Graphsky
  • Graphene
  • Giraffe
  • Orion
  • Tasseo

Please leave comments or chat on Twitter.

Category: analytics  apm  data-and-analytics-strategies  devops  eca  it-operations  monitoring  

Jonah Kowall
Research Vice President
3.5 years with Gartner
20 years IT industry

Jonah Kowall is a research Vice President in Gartner's IT Operations Research group. He focuses on application performance monitoring (APM), Unified Monitoring, Network Performance Monitoring and Diagnostics (NPMD), Infrastructure Performance Monitoring (IPM), IT Operations Analytics (ITOA), and general application and infrastructure availability and performance monitoring technologies.

Thoughts on Monitoring software sucks so I use Nagios, what’s a better approach?

  1. Milos Gajdos says:

    Try sensu man -> Sensu is awesome for not only Cloud monitoring!

    • Jonah Kowall says:

      I’ve heard of it, but I have yet to speak with anyone using it in a production environment. If you have someone I can speak to about that please let me know, I’m interested to learn more and also test it in my lab.

  2. Jason Dixon says:

    We should grab a beer sometime and brain dump on Monitoring. Any chance you’re heading out for this May?

    • Jonah Kowall says:

      I would have gone, but I learned of it in January (too late). My calendar books up 6 months in advance and I am already committed. If I can possibly know in advance I would block off the 2015 dates…

  3. Jason Dixon says:

    Sorry, we don’t have any events planned beyond PDX right now. Is there some other monitoring event happening that week that I’m unaware of?

  4. Jonah Kowall says:

    Nope, I have other client commitments which require travel, and it will not be in that area.

  5. GP says:

    Jonah – thanks for this nice collection of open source monitoring software. I wonder, though, whether we really need both monitoring and logging. Given the recent advent of stream processing, does real-time event collection and processing = monitoring? Running both adds tremendous cost and complexity. See blog post below…

    • Jonah Kowall says:

      I don’t agree with some of the concepts in that blog post, but yes, you do need monitoring and logging. With monitoring agents you can parse and handle major exceptions in logs. A better approach is using centralized collection and analysis for real time alerting and troubleshooting, combined with lightweight monitoring (no agents). You still need the logs regardless of the complexity involved; in many ways handling a heavy monitoring agent is more overhead than logs.

  6. Heya Jonah.

    Nice of you to mention Naemon as a project to watch 🙂

    There are a few factual errors in the article though, so if you could correct them, it’d be awesome.

    op5 is and isn’t behind the Naemon project. They sponsor it by letting me and my colleagues work on it on company time (just as we work on other opensource projects on company time), but Naemon would have kicked off even without op5’s support.

    The twitter account linked to isn’t mine. I haven’t got one yet and probably never will.

    The name of the user interface is Thruk (not Thurk). I suppose typos happen even to professionals.

    In terms of commits, I wrote more than 96% of the changes that took Nagios from v3 to v4, and not 69%. I have no idea how much that turns out to be in terms of actual code though, so perhaps I’m wrong in correcting that.

    On behalf of the 4-man strong Naemon team, I hope we won’t disappoint you 🙂

    • Jonah Kowall says:

      No problem at all, I have clarified the op5 item. Of course we understand many of the Nagios based products will eventually switch to Naemon.

      I have fixed/removed the twitter link, and the typo.

      The 69% is the current commit count based on the stats, of course you haven’t been working post Nagios v4 and others have been working on the v4 fixes.

  7. GP says:

    Jonah – thanks for your comments back related to logging and monitoring. Just curious on what specific points make you disagree – here is the way I understand it –

    1. A log is a localized persistence mechanism for collecting events.

    2. If a real-time event collection mechanism that is light weight and efficient is implemented, then there should be no need to have a separate logging construct.

    3. The event capture and processor will determine whether the event needs to be persisted into HDFS or a similar store.

    The LinkedIn engineering team did a good job of explaining the concept of the log based data pipeline construct, implemented using Kafka and Samza, a stream processor.

    Overall, I agree with your assertions – 1) the monitoring construct needs to be lightweight 2) there is a need for logging — however currently most IT organizations implement 2 separate solutions. But the overall point is that you don’t have to implement two separate systems – one will do it.

    You can read more…


    Thanks again,

    • Jonah Kowall says:

      Here are the basics:

      Log != event
      Logs can contain many non-event based data points which are useful in the future, or may become useful in the future.

      Engineering your own log collection and analysis system covers the top .5% of users who need that technology. Most clients I speak with cannot engineer their own systems, hence they rely on log analysis products which are purchased versus developed. You are also assuming that users have developers writing the apps which are logging, and that’s very often not the case.

      The reason why monitoring and logging are separate in most cases is the monitoring tools don’t do the type of log analysis people want today, they do the log/event analysis people wanted in 1995.

  8. GP says:

    That is well said. The current crop of monitoring software has some serious challenges, hence the interest.

  10. Felix Egli says:

    Icinga and Naemon are no alternatives to Nagios. They are Nagios with a new web-gui, and some other stuff added. The problem with Nagios is not only the web-interface. The major problem with Nagios is the so called Nagios Core, which is still present in Icinga and Naemon.
    It’s a bad approach to add things to something which is broken. It’s much better, to replace the broken part, which is Nagios Core.

    • Jonah Kowall says:

      Incorrect Felix, have a look at the core changes to the Nagios engine in both projects. I would also read the op5 blog on Naemon as well.

  11. Felix Egli says:

    Jonah, at least with Naemon I’m right for sure. Maybe you’re right with Icinga, but to me it looks like the changes are mainly in the GUI and not in the core.

    This is the Naemon changelog, and the only big change is that the CGIs are replaced with Thruk:

    0.8 – 14 Feb 2014

    Based on nagios 4.0.2
    Rename a lot of things, replace build system, etc.
    The CGIs are gone – use Thruk instead.
    Remove the upstream version check – use your package manager instead.
    New NEB callback, NEBATTRCHECKALERT, when a check generates an alert.
    Allow contactgroups without members but having contactgroup_members.
    No longer spam Naemon log when checks time out.
    All positive values for ACKNOWLEDGE_{HOST,CHECK} means TRUE.
    Check output parsing rewritten.
    Fixes crashes, bugs, and improves performance.
    Log rotation is done by logrotate instead of in-core log rotation.
    Fix misc crashes, speed up misc areas, and other bug fixes.

  12. Regarding the broken part of Nagios Core – there are many different opinions in the outside world how to handle events / check results / performance data / etc, and also move away from the state based alert model towards something new.

    In the end, you have to decide between the 2 sides of the “let’s do something new” story: 1) throw away old crap and introduce new stuff 2) stay compatible with your product line & users

    Icinga 1.x Core couldn’t throw away much old crap without breaking compatibility. Rewriting the code inherited from Nagios as a fork – well, that code base is a glory mess. Andreas is a hero for rewriting THAT 😉 In terms of better UI (even Classic UI with multiple commands and live search … ) and usability it’s imho much better than Nagios ever was. You may read here, what’s different between Nagios and Icinga (beware – I wrote it):

    Still, it’s not satisfying in terms of improvements, and also throwing away old crap. In terms of “old crap” I’d just say: Try to set up notifications for a service for a user with 3 different types of notification methods (mail, sms, jabber) with different notification options & additional escalations.
    It’s just one of those examples, and it’s not only the configuration syntax which sucks (well, it’s easy to write/parse, but hard to fix imho) but rather the handling inside the core too.

    I’m not saying that Icinga 2 Core will solve all the problems we encountered in the past 5 years after forking Nagios, but still someone gotta do it and start from scratch. I find C++ and Boost very convenient to focus on a real architecture rather than cutting my fingers with bloody C, but that’s a religious discussion.
    Users won’t love the new configuration format, or they will. The native cluster stack targets large scale environments, but also introduces capabilities for a better protocol (SSL, IPv4/6, JSON-RPC) among checkers and agents.

    Still, the state change alert notification model is implemented as such. That’s a matter of compatibility and of supporting interfaces like the status file, DB IDO and livestatus. And not to forget – the plugin API. Without that one Nagios would never have gained so much traction in the first place – plugins need to be written, run & developed.

    In the end, the community will decide. Choose whatever fits best. And if your Icinga 2 Core writes natively to Graphite, Puppet & Foreman generate the configuration, Logstash triggers additional alerts, and Sensu is running there too, why not? It’s all open source, and the better the competition we keep, the more the community will benefit from it. Grab some beers at conferences (OSMC 2014!) and have a chat about your systems – you’ll truly find new friends & ideas 🙂

    • Jonah Kowall says:

      Good stuff here. Sorry, I’m not going to be at OSMC, but I will be at OSCON and Velocity this year… Feel free to email or tweet to get in touch.

  13. gilgamezh says:

    Hi! great post and great comments. 🙂

    One question. What tool would you use to generate alerts from graphite data?
    For example, you have a graph with info from the login page of an application (successes and errors), and you want to fire an alert (in your monitoring panel, by mail, etc.) when the quantity of errors is over a defined threshold.
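
The kind of threshold check described in this question can be sketched in a few lines of Python. The sketch assumes only the JSON shape returned by Graphite's render API with `format=json` (a list of series, each with a `target` and `datapoints` given as `[value, timestamp]` pairs). The metric names, the threshold, and the function names are hypothetical, and fetching the JSON over HTTP and delivering the alert are deliberately left out.

```python
import json
from typing import List, Optional


def latest_value(datapoints: List[list]) -> Optional[float]:
    """Return the most recent non-null value from [value, timestamp] pairs."""
    for value, _timestamp in reversed(datapoints):
        if value is not None:
            return value
    return None


def breaches(render_json: str, threshold: float) -> List[str]:
    """Return the targets whose latest non-null value is above the threshold."""
    alerts = []
    for series in json.loads(render_json):
        value = latest_value(series["datapoints"])
        if value is not None and value > threshold:
            alerts.append(series["target"])
    return alerts


if __name__ == "__main__":
    # Hypothetical sample in the shape Graphite returns for format=json.
    sample = json.dumps([
        {"target": "app.login.errors",
         "datapoints": [[2, 1391700000], [14, 1391700060], [None, 1391700120]]},
        {"target": "app.login.success",
         "datapoints": [[5, 1391700000], [3, 1391700060]]},
    ])
    print(breaches(sample, threshold=10))
```

Skipping null datapoints matters in practice, because the newest bucket in a Graphite series is often still empty when the query runs.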

  14. Jonah Kowall says:

    Cabot is the most common tool for that purpose.

    Other options:

  16. George says:

    Has anybody tried/evaluated (or even better, installed and used on a regular basis) PandoraFMS? If yes, what’s your opinion about it? I’m seriously considering it for our company instead of the typical Nagios/Nagios based solutions (Icinga, Opsview, etc) plus addons (graphite, PNP4nagios, etc) or Zabbix, ZenOss, OpenNMS, etc

  17. Jonah Kowall says:

    Yes, I’ve spoken to several Gartner clients using PandoraFMS. It’s a newer product and hence has lower adoption, but the feedback I have gotten is positive generally speaking. The company behind it is rather small, and being based in Spain there are some language issues with the product and documentation. I would suggest evaluating the product as I have, but I also suggest looking at OpsView, OP5, Zenoss. I’m not the biggest fan of OpenNMS for various reasons, but I’ve spoken to happy clients using the product.

  18. George says:

    Hello Jonah. Thanks for the answer. Good to know that generally there is some positive feedback from other IT colleagues with regard to PandoraFMS. I know that the Pandora team is located in Spain, but for me this is not a problem (I’m located in Europe as well :p). Knowing this, I expected things to be much worse when it comes to the language barrier, etc., but it doesn’t seem to be an issue after all. I’m really keen on evaluating the product by setting up some kind of Proof of Concept.

    From the rest of the names, I contacted the Opsview and the Zabbix people (yes, I know Zabbix is open source, but I preferred to have an actual meeting with the Zabbix people; we’ve taken the monitoring infrastructure thing really seriously in my team). OpsView is really expensive for our case and our needs, so I would consider it out of the picture. I guess that OP5 and Zenoss are not very different in their pricing policies, but they are being considered as well. Let’s see where the ball will land at the end :p

  19. Jonah Kowall says:

    George, cool. I’m surprised they are too expensive considering the pricing is quite reasonable, I guess you’re hoping for something free. Be warned that you’ll have to do a lot of lifting with config mgmt (chef, puppet, salt, etc) to get the other tools to work properly at scale.

  20. Hi everyone.

    Congratulations Jonah for the post.
    I spent some time searching the net for an easy and comfortable tool to monitor my hosts. The first place I came across Pandora FMS was this blog. I downloaded it from their website, and in the 2 weeks I have been testing it so far I’ve managed to configure everything I needed without any headaches. I recommend it. My previous experience had been with Nagios: all ICMP checks were in critical condition when the machines were OK, and the use of “agents” in Nagios is very difficult, so it did not convince me. Then I tried Zabbix and the experience was better, but the solution did not offer me everything I needed. Now, with Pandora FMS, the experience has proved to be the right and final choice. We are thinking of implementing the Enterprise option to get its features, as some could be useful to us, and the price is really good.

  21. Tom Kahl says:

    Hi Jonah,

    Have you heard of and/or used LogicMonitor? If yes, what did you think of their stuff?

  22. Jonah Kowall says:

    Tom, we’ve included them in other research. If you are a client I can go much deeper via a written inquiry or a phone inquiry.

  23. George says:

    Hi Jonah,

    Regarding the price issue: It doesn’t necessarily need to be free as in beer. In any case you need to invest time and effort, so in the end you need to balance what you want. And apparently I’m not saying anything new here. Anyway, I wouldn’t like to comment any further in this open forum on why we considered the price offered by the vendor I mentioned in my previous post too expensive.

    Coming to PandoraFMS now: I’ve set it up as a demo and I have the same feelings as expressed by Marius above. I think that even the open source version would suffice in many cases. If one needs only one tool to do the job, then the enterprise version may be worth paying for because of some of its features (e.g. VMware, NetFlow monitoring, etc).

    At the end of the day, you can achieve more or less the same results with any of the popular monitoring tools. I read the religious wars about Nagios (and forks, e.g. Icinga, Fully Automated Nagios, etc) here and on other sites, with some people referring to it as the holy grail, but I’m not into it. There are other good tools out there, open source or not. Fortunately, we are not in the 90s 🙂

  24. Timir Karia says:

    Anybody have any experience with Monit? We’re trying it out now and it’s fairly simple and seems stable but we have yet to put it through its paces. Any feedback appreciated.

  25. Give NetCrunch from AdRem Software a try. It’s fully automated, and will identify, configure and begin monitoring your network out of the box. It’s all-in-one, with no separate modules for network performance monitoring, server and app monitoring, NetFlow, etc. It’s agentless, and has an embedded SQL database. The goal first and foremost is ease-of-use.

  26. Patrick says:

    Thanks for this great post, Jonah!

    May I know how they compare in terms of multi-tenancy support, say, running under different projects/tenants of AWS/Openstack etc. and allowing individual projects to monitor only their own resources under a centralized monitoring system?

    • Jonah Kowall says:

      Thanks for posting Patrick, I can’t provide custom advice on my blog. That’s what clients pay us for 🙂

      There are tools out there which have multi-tenant support specifically for CSPs and MSPs.

  27. Gerry Johnson says:

    Sorry – seems as though you are conducting a vendetta against Nagios

  28. Jonah Kowall says:

    Gerry, see my other reply. I do believe Nagios needs to die in its current form, as do many others!

    This is another great presentation on the topic from this year:

  30. Pablo Huiza says:

    Hi, We are evaluating some monitoring tools and we prefer Groundwork. I would like to know if somebody from this forum is using it or know something about it ..Thanks

    • Jonah Kowall says:

      Pablo, we would be pleased to speak with you about GroundWork and the Unified Monitoring space if you are a client. If you are looking to get suggestions from GroundWork users I’d suggest asking them for a reference or you can look at the GroundWork forums here : or on LinkedIn.

  31. David Luiz says:

    For quick and dirty monitoring of network services (Sockets, DB, Mail, General Services), try Meerkat-Monitor. It’s free, standalone and allows integration with CI tools (and scripts).

  32. John Ramos says:

    Really great tool meerkat-monitor!! Thanks for the tip David. Finally something really easy to get the job done!

  33. Ed Taylor says:

    How well do these solutions provide service level management? We’re looking for a solution that will monitor process and service executables combined with other metrics (like CPU, memory, storage, and more) across application and database servers, including the routers and switches that connect them, which when combined equal a service we provide to our customers. We want to be alerted when we’re having a problem, before the phone starts ringing. We’ve used Zenoss and BMC TrueSight to some extent, but we’re looking at others like SolarWinds, hoping to find the best of breed. NetCrunch looks promising, but I don’t see monitoring at a service level being provided.

    • Ayad Aftab says:

      PandoraFMS has just the solution for that. They call it Business Service Monitoring (BSM). You can define your services and create service maps, which then monitor all the components in the chain of the service. Any component going down will automatically trigger an alert for the service.

  34. Charlie Liang says:

    One of the reasons Nagios sucks is because it’s way too noisy. But that doesn’t necessarily mean you have to stop using it.

  35. Kathy says:

    Has anybody looked at Incident.Moog from Moogsoft vs BMC TrueSight?

Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.