by Jonah Kowall | February 14, 2014 | 12 Comments
There is a common point of confusion I run into when speaking with end users trying to monitor applications. It is partially created by vendors who would like to position themselves in the hot APM market, yet clearly don't enable performance monitoring. These vendors are slowly correcting their messaging, but many have a poor understanding of the market and continue to confuse buyers.
There are two types of monitoring technologies: availability monitoring and performance monitoring. Before embarking on a performance monitoring journey (this applies to both application performance monitoring and network performance monitoring), there should be a good foundation of availability monitoring. Availability monitoring is the easier of the two; it's inexpensive, effective at catching major issues, and should be the staple of any monitoring initiative. We recommend unified monitoring tools (see post: http://blogs.gartner.com/jonah-kowall/2013/11/12/unified-monitoring-note-presentation-and-client-interest/) to handle availability monitoring across technologies with a single offering.
Server monitoring tools do more than monitor the server and OS components; they also collect data from instances of applications running on those OS instances. The data collected includes metrics and often log data that reveals major issues in application availability or health. This is often what people are looking for, and many vendors call these requirements "APM," but that's incorrect. We call this server monitoring and/or application instance monitoring. These are availability tools, not performance monitoring tools.
APM tools differ from server monitoring tools in multiple ways. APM tools live within the application and provide end-user experience data from the user through the distributed application components. They are able to monitor and trace transactions across the tiers of the application. Other tools that monitor application performance can reside on the network; while these lack the granularity to trace transactions and capture application internals, they can certainly detect performance deviations of application components, and they can often handle additional application technologies.
Hopefully this helps clear things up. Please reply here or contact me on Twitter @jkowall.
Category: APM IT Operations Logfile Monitoring NPM Tags:
by Jonah Kowall | February 6, 2014 | 36 Comments
Speaking to our clients, and to other people at conferences and industry events I attend, Nagios is always top of mind. This ground has been covered many times: many people want to use Nagios, or to reduce their usage of it. The question always comes up: what else is good for free? The answer depends on how much expertise you have in managing infrastructure, and what level of monitoring you'd like to do. Open source monitoring requires configuration management tools (Chef, Puppet, Salt) to scale and to keep configurations consistent, which requires some level of expertise.
Most users of Nagios use it for basic health monitoring of servers and applications, and I've spoken about other low-cost tools that build on the open nature of Nagios and leverage its massive, vibrant community. There are plenty of great open source alternatives out there which work; here are a few options:
Quick and easy:
- PandoraFMS – This project out of Spain is growing in popularity among Gartner clients thanks to an easy-to-implement and easy-to-configure product. The solution is open source and free, with commercial support options if desired. The UI is modern and fresh, and both agent-based and agentless monitoring are supported.
- Icinga – Most often compared with Nagios, the product shares many open source components but adds a more advanced web interface, search capabilities, and better enterprise integration for permissions and authentication. Getting reporting and other capabilities working is a bit more complex, but this is free software; some work is required. The product ships as software or as a virtual appliance, and it's worth checking out.
- Spiceworks – A Windows-only product, but this freeware provides good basic monitoring functionality, which should serve the needs of many for monitoring servers, network devices, and other components. The product can't scale very high, but for SMBs it's a good option.
- Zabbix – This popular server monitoring product is also free, with commercial support options. The product carries more legacy components due to its age, but is under active development. It's an improvement over Nagios, but there are better options available.
Needs more time in the oven:
- Naemon – If you like the Nagios model and configuration (I have no idea why people like it…), then Naemon is the next generation. It's a new project that needs time before it's mature. The offering will include an enhanced GUI (Thruk), removal of legacy components, and a highly scalable engine for the future. OP5 (a Swedish company with a greatly enhanced commercial version based on Nagios) is behind this project and is funding much of the development. UPDATE: OP5 lets employees work on many open source projects on company time, so the sponsorship is not as direct as it may sound. The most important contributor will be Andreas Ericsson, the talent who wrote over 69% of the Nagios code committed in the last 12 months, and who works for OP5. This is a project to watch!
- Munin – This open source product has promise, but needs a bit more development effort to catch up with those above. Its main advantage is a fully functional and easy agentless implementation.
If you are operating a web-scale infrastructure, monitoring large numbers of devices, and want a fully extensible monitoring system that collects not only system metrics but also custom application metrics, I would suggest the following technologies:
- StatsD – Generic metric collector (can easily collect application metrics or even real user monitoring metrics directly)
- Collectd – System metric collector
- Graphite – Backend for metric storage
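For a concrete sense of how lightweight this stack is: StatsD accepts plain-text metrics over UDP (by default on port 8125) in the form `name:value|type`. A minimal sketch in Python — the metric names and local endpoint are placeholder assumptions, and a real deployment would typically use an existing StatsD client library rather than raw sockets:

```python
import socket

def statsd_packet(metric, value, metric_type):
    # StatsD wire format: <metric name>:<value>|<type>, e.g. "web.hits:1|c".
    return f"{metric}:{value}|{metric_type}".encode("ascii")

def send_metric(metric, value, metric_type="c", host="127.0.0.1", port=8125):
    # Metrics travel over UDP, fire-and-forget: a down collector never blocks the app.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_packet(metric, value, metric_type), (host, port))
    finally:
        sock.close()

# send_metric("checkout.latency", 187, "ms")  # hypothetical timer, in milliseconds
# send_metric("checkout.hits", 1, "c")        # hypothetical counter increment
```

collectd plays the same role for system-level metrics, and Graphite stores and graphs whatever both of them emit.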
Some of my favorite visualizers for this data:
Please leave comments or chat with me on Twitter.
Category: Analytics APM Big Data DevOps ECA IT Operations Monitoring Tags:
by Jonah Kowall | January 20, 2014 | 1 Comment
Each year, Ronni Colville publishes a great note on the ratings and state of the broader ITOM market. In late 2012 we introduced a sub-slice of her research focused specifically on availability and performance, which often consists of a large number of integrated or partially integrated tools. As many of you are aware, we often see buyers deciding between suites, mini-suites, and specialist tools. We also see vendors moving from point solutions to bundles of tools and suites, either by doing engineering work or by acquiring and integrating offerings. The broader note is found here (clients only):
16 January 2014
Aside from some great analysis, it also includes a rating chart covering the categories of IT Service and Support Management, IT Asset Management, IT Financial Management, Event Correlation / Business Service Management, Application Performance Monitoring, Network Fault Monitoring, Network Performance Monitoring, IT Service View CMDB, IT Service Dependency Mapping, App Release Automation, IT Process Automation, Server Provisioning and Configuration Management, Network Configuration and Change Management, Endpoint Management, Cloud Management Platform, and Mainframe Management. The vendors include BMC, CA, HP, IBM, Microsoft, and VMware. The ratings are as follows:
In the drill down version of this document specific to availability and performance monitoring there is analysis of the trends and directions of the market (Client only).
We took a look at the following categories in the ratings: Business Service Management (BSM), Event Correlation and Analysis (ECA), IT Operations Analytics (ITOA), Capacity Planning, OS Monitoring (Linux and Windows), Network Fault, Network Performance, and Application Performance Monitoring (APM) across end-user experience monitoring, transaction monitoring and mapping, and deep dive. Additionally, we look in depth at availability and performance monitoring for databases, storage, virtualization, and public cloud services. Finally, we look at the ability to deliver products as a service (SaaS) and at the integration of the portfolio.
Similar to the note above we rate on the same scale, but we also build trend lines into the research.
The vendors reviewed in this note include BMC, CA, HP, IBM, Microsoft, Oracle, Dell (Quest), VMware, Compuware, EMC, Ipswitch, ManageEngine, NetScout, Riverbed, SolarWinds, and Kaseya.
Happy new year, and keep reading and commenting.
Category: Analytics APM BSM ECA IT Operations Monitoring NPM SaaS Tags:
by Jonah Kowall | January 20, 2014 | 1 Comment
This nostalgic note was the first I published when I started at Gartner nearly three years ago. Vivek Bhalla has taken over as the lead of this research, updating it and spending more time on it. Vivek and I see quite a number of calls on Citrix specifically, so this research is still relevant and needed a fresh coat of paint. The topic of monitoring various Citrix technologies has remained pertinent as Citrix's popularity has continued and the use cases have changed, including more deployments of HVD on top of Citrix. The challenges come from the complexity of the environments and protocols; moreover, the tools needed to monitor and understand these technologies differ from those for other technologies. We highlight some of the different approaches needed to understand availability (health) and performance of the components and associated infrastructure required to ensure acceptable user experience. We highlight tooling across multiple vendors for both use cases:
Infrastructure monitoring from AppEnsure, AppSense, Citrix, Compuware, eG Innovations, Fluke, Goliath Technologies, Lakeside Software, Netscout, JDSU (Network Instruments), and Riverbed.
Synthetic availability monitoring from Citrix, Compuware, eG Innovations, Lakeside Software, Login Virtual Session Indexer, HP, IBM, and Tevron.
Load testing tools from Citrix, HP, IBM, Login Virtual Session Indexer, and Tevron.
Finally the most advanced tools which track actual end user experience covered include those from AppEnsure, Citrix, Compuware, Extrahop, Goliath Technologies, Liquidware Labs, and Riverbed.
You can find the research here for clients only:
16 January 2014
Category: APM IT Operations Monitoring NPM Tags:
by Jonah Kowall | December 27, 2013 | 4 Comments
Cameron Haight (@cameron_haight) and I recently published research on how monitoring is applied to web-scale environments. Companies such as Amazon, Google, and Facebook run their environments using different fundamentals than typical enterprise IT organizations. This includes changes in infrastructure, management software, and the applications running on that infrastructure (among many other things, including people and process, which we don't get into in this research).
In this research we cover some of the core fundamentals of both open source and commercial software systems that can support web-scale environments, and that are often built with the same fundamental differences that distinguish those environments. Many of these elements have to do with eventual consistency, size/scale, volatility, and the performance that customers and consumers demand of the applications.
Further in the research, we investigate the different ways data is collected and, once it is collected, the visualization and analytics applied by users and by the software to bring forth meaning from the vast amount of data.
We converted this content and material into a presentation on similar topics at the recent Gartner Data Center Conference in early December in Las Vegas. We did a bunch of polling, and I should have results in the next couple of weeks. In the presentation we also dug into some of the open source tools (StatsD, collectd, Graphite, and other associated projects for metric collection) and vendor-supplied tools, including those from AppDynamics, AppFirst, Boundary, Circonus, Datadog, Librato, New Relic, Sumo Logic, and Splunk.
You can find the research here (sorry clients only) : http://www.gartner.com/document/2633831
Category: Analytics APM Big Data DevOps IT Operations Logfile Monitoring SaaS Trade Show Tags:
by Jonah Kowall | December 20, 2013 | 1 Comment
After long delays in publishing this research due to a vendor escalation through our Ombudsman office, we have finally come to an agreement and published it.
http://www.gartner.com/document/2639025 (Clients only, sorry)
Some vendors have already put copies online via other websites if you aren't a client.
Our research is always meant to be a starting point, and clients always get better advice for their specific and unique situations. Everyone has varying application architectures, so a small or lesser-known APM vendor might often be the best fit for your environment.
Due to the delays, this research is not as fresh as our prior APM Magic Quadrants, so Will and I are writing a market update note that will be out next month and bring things more current. What this means for the 2014 research is that we have slipped from a Q3 deliverable to a Q4 deliverable.
If you have questions or comments please leave them here or @jkowall on twitter.
Have a happy holiday.
Category: APM Tags:
by Jonah Kowall | December 6, 2013 | 1 Comment
I wanted to post a little something as we get ready to kick off our 32nd annual data center conference. I will of course be there, speaking about topics such as unified monitoring and APM, and I will also discuss other availability and performance technologies. We've seen a lot of growing interest in analytics and in network packet broker (NPB) products, so we'll also cover those in our one-on-one sessions and analyst user roundtables.
One big thing we have added for 2013 is the new web-scale track, led by Cameron Haight, chief of research for infrastructure and operations. The goal of this track is to give enterprises and those with traditional infrastructures insight into how the likes of Google, Amazon, Netflix, and Facebook actually build and operate infrastructure. I will be presenting on the use of APM and monitoring in these environments, and shedding some light on how those practices can be translated and applied. Accomplishing this takes a heavy amount of open source technology, along with code that must be written and managed. This will have a major impact on the industry as it stands today, and it will differentiate those who grow and become the next leaders in technology from those who cannot keep pace with the demands of rapid innovation.
I look forward to posting the event wrap-up in the next few weeks, along with some news about research related to web-scale, APM, and our upcoming Network Performance Monitoring and Diagnostics (NPMD) Magic Quadrant.
Category: Analytics APM DevOps IT Operations Logfile Monitoring NPB NPM NPMD SaaS Tags:
by Jonah Kowall | November 12, 2013 | 1 Comment
Colin Fletcher and I have been seeing a trend of simplification across the monitoring estate. Investments in complex suites of tools have yielded higher operational costs and complexity in managing the tools themselves. Alternate approaches exist when looking at unified monitoring.
A new approach that enables the unified monitoring of the infrastructure inclusive of the virtualization layers has emerged. A particular, yet evolving, set of technologies is available in single or focused combinations of tools, including free-of-charge and/or low-cost options (see “Leverage Free and Low-Cost Server, Network and Storage Monitoring” ). These solutions commonly feature the following traits:
- Ability to consistently monitor servers, networks, storage and virtualization layers
- Acquisition of the topology of physical, virtual and logical elements, and the relationships between them
- Agentless data collection methods
- Discovery of assets to be monitored, including the collection of basic asset information
- Support for monitoring common application instances on OSs
- Synthetic transactions for checking Web application availability
- Service and infrastructure grouping
- Multitenancy support
- Use of APIs
- Integration layers (e.g., IT service desk tools, IT process automation, alert management, capacity planning and configuration management)
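As an illustration of the synthetic-transaction trait above, the simplest form of such a check fetches a URL on a schedule and records success and latency. A minimal sketch in Python using only the standard library (the timeout and expected status are placeholder assumptions; real tools script full multi-step transactions rather than single requests):

```python
import time
from urllib.error import URLError
from urllib.request import urlopen

def check_url(url, timeout=10, expected_status=200):
    """Single synthetic probe: returns (available, latency_in_seconds)."""
    start = time.monotonic()
    try:
        with urlopen(url, timeout=timeout) as resp:
            ok = resp.status == expected_status
    except (URLError, OSError):
        # Connection refused, DNS failure, 4xx/5xx, or timeout:
        # the check reports unavailable rather than crashing.
        ok = False
    return ok, time.monotonic() - start
```

A scheduler would run this against each monitored Web application every minute or so and alert when `ok` turns false or latency crosses a threshold.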
When these tools are paired with log analytics capabilities, the need for agents is reduced, with the added benefits of centralized log analysis and monitoring.
Clients can find the note here -> Modernize Your Monitoring Strategy by Combining Unified Monitoring and Log Analytics Tools
Client interest in these tools has been increasing, hence this initial note in the arena; we will also be presenting the content at the Gartner Data Center Conference in early December. It's a great show to attend if you are interested in hardware or the associated management software.
Feel free to engage with us here in the comments or on Twitter.
Category: Analytics IT Operations Logfile Monitoring OLM Trade Show Tags:
by Jonah Kowall | October 7, 2013 | 2 Comments
Prior to Gartner, I was part of several start-ups, a couple of which were acquired, and as a result I have worked in very large environments. I've always focused on infrastructure and applications, and enjoyed both the security and troubleshooting aspects. It was always my goal to learn as much as possible and diversify my skills. When something new came along, as Splunk did, I was instantly interested in how we could use it to resolve issues faster and troubleshoot effectively, enabling collaboration in environments where we tried to keep things secure between groups. Logs are incredibly useful to subject matter experts, whether they are developers, network engineers, server admins, virtualization admins, or storage experts. I was a Splunk customer four times over, and always found value in the tools.
One challenge with the product is that data volumes are always increasing, and the amount of money spent on Splunk follows suit. The licensing model is consumption-based, so costs rise with data ingest, unlike most software IT operations teams deal with. This often rubs customers the wrong way: they are used to buying hardware with this in mind, but not software. The cloud is hopefully changing that perception, but today it remains an issue.
I attended the user conference in 2011 (two years ago), and it was interesting to see all of the cool use cases people had come up with for this very open-ended technology. With around 500 people, it was a good turnout for a growing provider. Fast forward two years: Splunk has grown considerably, gone public, and become an even hotter company. With about 1,600 people turning out for the conference, there were far too many sessions to attend! I did catch some good customer sessions on how they were using Splunk, and they were useful, but most were not too different from my day-to-day interactions with Gartner clients. I did hear some more innovative stories, but by and large people were using it for the same things I was when I first purchased the product in 2007. There has been considerable progress in the product and its capabilities since then; let's dig into some of the announcements from .conf.
The major announcements were around Splunk 6.0:
- Data Models – Splunk has always been easy to use, but you had to be a little technical to grasp the powerful search language. The data model concept takes unstructured data and applies simple search syntax to create dimensions that can then be related to one another. This means you can query the data relationally using Excel-style functionality, such as the pivot tables built into the product, which makes the tool more approachable to non-technical users.
- Performance – To scale and handle the impact data models can have on performance and search, they have considerably sped up search, and users can now accelerate parts of the product without making some of the difficult decisions that used to be required. While some of the speed boosts require more disk space (see item 3), the performance gains are helpful.
- Cloud – This one is the most interesting to my coverage, but as I dug into it, it isn't really cloud — though it does help enable customers, which is a good thing. The Cloud offering is actually a managed service run by Splunk on AWS. They still spec and implement an environment for each customer based on its requirements, so services are involved. The nice part about the offering is that Amazon has very low storage costs (especially leveraging S3 for low cost and Glacier for even lower cost), and with high retention requirements the cost of infrastructure can sometimes exceed the cost of the Splunk licenses. The other interesting bit is the option to connect with on-premises Splunk deployments for universal visibility across all types of applications and infrastructure.
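Splunk's data model implementation is proprietary, but the underlying idea — imposing named fields on unstructured events and then querying them relationally, pivot-table style — can be sketched in plain Python. The log lines, field names, and regex below are invented for illustration:

```python
import re
from collections import defaultdict

# Invented access-log lines; the regex's named groups define the "model".
LOG_LINES = [
    "10.0.0.1 GET /checkout 200 120ms",
    "10.0.0.2 GET /checkout 500 340ms",
    "10.0.0.1 GET /search 200 45ms",
]
PATTERN = re.compile(
    r"(?P<host>\S+) (?P<method>\S+) (?P<path>\S+) (?P<status>\d+) (?P<ms>\d+)ms"
)

def pivot_count(lines, row_field, col_field):
    """Count events grouped by two model fields, pivot-table style."""
    table = defaultdict(lambda: defaultdict(int))
    for line in lines:
        m = PATTERN.match(line)
        if m:  # lines that don't fit the model are simply skipped
            table[m.group(row_field)][m.group(col_field)] += 1
    return {row: dict(cols) for row, cols in table.items()}

# pivot_count(LOG_LINES, "path", "status")
# → {"/checkout": {"200": 1, "500": 1}, "/search": {"200": 1}}
```

The point of the feature is that a non-technical user picks `row_field` and `col_field` from a menu instead of writing search syntax at all.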
Splunk is focusing on customer enablement, and fighting the hard battle of getting customers to use their data to create a clear and visible ROI. This means building specific use cases for each customer that fit its business demands, which creates benefits customers had not originally envisioned when the product was implemented. It also increases the stickiness of a product that is under growing pricing pressure due to the high cost of consumption-based licensing.
Category: Analytics Big Data IT Operations Logfile OLM SaaS Trade Show Tags:
by Jonah Kowall | October 4, 2013 | 4 Comments
Due to the popularity of the Nagios threads, I wanted to write a bit about how to use Nagios effectively. That means making the best use of custom scripts (checks), and of the large number of checks developed by the community, to monitor infrastructure and application instances effectively (please don't confuse this with APM). There are several approaches out there, ranging from free to inexpensive. The first and easiest transition is to leverage some of the low-cost tools built on top of the Nagios core; this includes companies that do even more core development than Nagios (which is itself a company).
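For context, a Nagios check is just an executable that prints one status line and returns a conventional exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), which is why the community plugin ecosystem is so large. A minimal sketch in Python — the path and thresholds are arbitrary examples:

```python
import shutil

# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
def check_disk(path="/", warn_pct=80, crit_pct=90):
    """Return (exit_code, status_line) for disk usage of `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as err:
        return 3, f"DISK UNKNOWN - {err}"
    used_pct = 100.0 * usage.used / usage.total
    # Text after "|" is Nagios perfdata, which graphing add-ons can parse.
    perf = f"used={used_pct:.1f}%;{warn_pct};{crit_pct}"
    if used_pct >= crit_pct:
        return 2, f"DISK CRITICAL - {used_pct:.1f}% used | {perf}"
    if used_pct >= warn_pct:
        return 1, f"DISK WARNING - {used_pct:.1f}% used | {perf}"
    return 0, f"DISK OK - {used_pct:.1f}% used | {perf}"

# Wired up as a plugin, the script would end with:
#   code, line = check_disk()
#   print(line)
#   raise SystemExit(code)  # Nagios reads this exit code
```

Every tool below consumes checks shaped exactly like this, which is what makes migrating between them relatively painless.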
Here are some of the more common Nagios based packages:
- Nagios XI – Created by the founders of Nagios, the solution still lags behind others in terms of capabilities, but the name often sends people its way.
- OP5 – Highly scalable solution with increased functionality including the incorporation of open source baselining technologies, they have committed more core Nagios code in the redesign than any other company. If you believe in open source and community you must have a look.
- Centerity – They have extended Nagios with better dashboarding and more capable network monitoring.
- Groundwork – The solution has significantly enhanced dashboarding and RESTful APIs, allowing programmability and a high degree of scalability.
- OpsView – Much more user friendly than the open source tools, but still Nagios under the hood.
Then there are tools that leverage existing checks from Nagios but shed a lot of the baggage that comes with running Nagios Core:
- AppFirst – SaaS only, includes a unique collection technology for granularity, full support of statsd, and of course Nagios plugins.
- Server Density – Lightweight and inexpensive SaaS solution with good visualization and plug-in support, including Nagios plugins.
- DataDog – A great place to aggregate multiple technologies, including Nagios. The solution offers monitoring delivered via SaaS as well.
If the goal is "how do I do this better than Nagios, with room to customize," investigate some of the tools above, and also have a look at Circonus, Librato, Graphite, and StatsD.
If the goal is to free up time for something more useful, look at agentless technologies. The sacrifices will be granularity and customizability, but management is much easier. Products as simple, inexpensive, or free as SolarWinds (ipMonitor, Orion), Ipswitch WhatsUp Gold, the ManageEngine products, and freeware like Spiceworks often do just enough to free up time and money for higher-value tools. These may include more mature offerings in the APM space, which focus on transactional context and real-user-experience metrics.
Category: IT Operations Monitoring Tags: