by Jonah Kowall | November 12, 2013
Colin Fletcher and I have been seeing a trend of simplification across the monitoring estate. Investments in complex suites of tools have yielded higher operational costs and complexity in managing the tools themselves. Alternative approaches exist in the form of unified monitoring.
A new approach has emerged that enables unified monitoring of the infrastructure, inclusive of the virtualization layers. A particular, yet evolving, set of technologies is available in single tools or focused combinations of tools, including free-of-charge and/or low-cost options (see "Leverage Free and Low-Cost Server, Network and Storage Monitoring"). These solutions commonly feature the following traits:
- Ability to consistently monitor servers, networks, storage and virtualization layers
- Acquisition of the topology of physical, virtual and logical elements, and the relationships between them
- Agentless data collection methods
- Discovery of assets to be monitored, including the collection of basic asset information
- Support for monitoring common application instances on OSs
- Synthetic transactions for checking Web application availability (a minimal scripted check is sketched after this list)
- Service and infrastructure grouping
- Multitenancy support
- Use of APIs
- Integration layers (e.g., IT service desk tools, IT process automation, alert management, capacity planning and configuration management)
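To make the synthetic transaction trait concrete, here is a minimal sketch of the kind of scripted availability check these products automate; the URL and latency threshold are hypothetical, and real tools wrap this pattern in scheduling, multi-step transaction recording, and alerting.

```python
import time
import urllib.request

# Hypothetical endpoint and latency budget; real products make these configurable.
URL = "https://app.example.com/login"
SLOW_MS = 2000

def synthetic_check(url: str) -> dict:
    """Issue one scripted HTTP request and report availability and latency."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            elapsed_ms = (time.monotonic() - start) * 1000
            return {
                "available": resp.status == 200 and elapsed_ms < SLOW_MS,
                "status": resp.status,
                "latency_ms": round(elapsed_ms),
            }
    except Exception as exc:  # DNS failure, timeout, TLS error, HTTP 4xx/5xx
        return {"available": False, "error": str(exc)}

print(synthetic_check(URL))
```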
When these tools are paired with log analytics capabilities, the need for agents is further reduced, with the added benefits of centralized log analysis and monitoring.
Clients can find the note here: Modernize Your Monitoring Strategy by Combining Unified Monitoring and Log Analytics Tools
Client interest in these tools has been increasing, hence this initial note in the arena. We will also be presenting content at the Gartner Data Center Conference in early December; it's a great show to attend if you are interested in hardware or the associated management software.
Feel free to engage with us here in the comments or on Twitter.
Category: Analytics, IT Operations, Logfile, Monitoring, OLM, Trade Show
by Jonah Kowall | October 7, 2013
Prior to Gartner, I was part of several start-ups, a couple of which were acquired; as a result, I have worked in very large environments. I've always had a focus on infrastructure and applications, and enjoyed both the security and troubleshooting aspects. It was always my goal to learn as much as possible and diversify my skills. When something new came along, as Splunk did, I was instantly interested in how we could use it to resolve issues faster and troubleshoot effectively, allowing for collaboration in environments where we tried to keep things secure between groups. Logs are incredibly useful to subject matter experts, whether they be developers, network engineers, server admins, virtualization admins, or storage experts. I was a Splunk customer four times over, and always found value in the tools.
One challenge with the product is that data volumes are always increasing, and the amount of money spent on Splunk follows suit. The licensing model is consumption-based, so costs increase with data ingest, unlike most software IT operations teams deal with. This often rubs customers the wrong way, since they are used to buying hardware with this in mind, but not software. The cloud is hopefully changing that perception, but as of today it remains an issue.
I attended the user conference in 2011 (two years ago), and it was interesting to see all of the cool use cases people had come up with for this very open-ended technology. With around 500 people, it was a good turnout for a growing provider. Fast-forward two years, and Splunk has grown considerably, gone public, and become an even hotter company. With about 1,600 people turning out for the conference, there were far too many sessions to attend! I did catch some good customer sessions on how they were using Splunk, and they were useful, but most were not too different from my day-to-day interactions with Gartner clients. I did hear some more innovative stories, but by and large people were using it for the same things I was when I first purchased the product in 2007. There has been considerable progress since then in the product and capabilities; let's dig into some of those announced at .conf.
The major announcements were around Splunk 6.0:
- Data Models – Splunk has always been easy to use, but you had to be a little technical to grasp the powerful search language (a sketch of a raw search appears after this list). The data model concept takes unstructured data and applies simple search syntax to create dimensions that can then be related to one another. This means you can query the data in a relational way, using Excel-type functionality such as pivot tables that are built into the product, making the tool more approachable to non-technical users.
- Performance – To scale and handle the impact data models can have on performance and search, they have considerably sped up search, and users can now accelerate parts of the product without some of the more difficult decisions that used to be required. While some of the speed boosts require more disk space (see the cloud item below), the performance gains are helpful.
- Cloud – This one is the most interesting for my coverage, but as I dug into it more, it isn't really cloud; it does, however, help enable customers (which is a good thing). The Cloud offering is actually a managed service run by Splunk on AWS. They still spec and implement an environment for each customer based on their requirements, so there are services involved. The nice part about the offering is that Amazon has very low storage costs (especially since you can leverage S3 for low cost, and Glacier for even lower cost), and with high retention requirements the cost of infrastructure can sometimes be higher than the cost of the Splunk licenses. The other interesting bit is that it offers the option to connect with on-premises Splunk deployments for universal visibility across all types of applications and infrastructure.
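As an aside for readers who have not touched the search language: everything above, data models and pivot included, sits on top of plain SPL searches, which can also be driven programmatically. Here is a minimal sketch using Splunk's REST search endpoint; the host, credentials, and index name are assumptions, not a reference implementation.

```python
import requests

# Assumed deployment details: Splunk's management port (8089 by default),
# admin credentials, and an index named "web" are all hypothetical.
SPLUNK = "https://localhost:8089"
AUTH = ("admin", "changeme")

# A basic SPL search: count web requests by HTTP status code.
spl = "search index=web sourcetype=access_combined | stats count by status"

resp = requests.post(
    f"{SPLUNK}/services/search/jobs/export",  # streams results as they arrive
    auth=AUTH,
    data={"search": spl, "output_mode": "json"},
    verify=False,  # management port often runs a self-signed cert; fix in production
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode())  # one JSON object per result row
```

A data model pivot expresses the same kind of question without the operator pipeline, which is exactly what makes it approachable for non-technical users.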
Splunk is focusing on customer enablement, and fighting the hard battle of getting customers to use their data to create a clear and visible ROI. This means building special use cases for each customer that fit the business demands, creating benefits that were not originally anticipated when the product was implemented. It also helps with product stickiness, which matters as the product comes under increasing pricing pressure due to the high cost of consumption-based licensing.
Category: Analytics, Big Data, IT Operations, Logfile, OLM, SaaS, Trade Show
by Jonah Kowall | October 4, 2013
Due to the popularity of the Nagios threads, I wanted to write a bit about how to use Nagios effectively. This equates to how one can best leverage custom scripts (checks) and the large number of checks developed by the community to effectively monitor infrastructure and application instances (please don't confuse this with APM; a minimal custom check is sketched below). There are several approaches out there, ranging from free to inexpensive. The first and easiest transition would be to leverage some of the low-cost tools developed on top of the Nagios core; this includes companies who do even more core development than Nagios (which is itself a company).
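For anyone who hasn't written one, a custom check is just a small program that follows the Nagios plugin conventions: exit 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN, and print one line of status text, optionally with performance data after a pipe. A minimal sketch (the path and thresholds are arbitrary examples):

```python
#!/usr/bin/env python3
"""check_disk_pct: a minimal Nagios-style check; path and thresholds are examples."""
import shutil
import sys

PATH, WARN, CRIT = "/", 80.0, 90.0  # warn at 80% used, go critical at 90%

def main() -> int:
    try:
        usage = shutil.disk_usage(PATH)
        pct = usage.used / usage.total * 100
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return 3  # UNKNOWN: could not collect the metric
    # One status line, with optional perfdata after the pipe, per plugin convention.
    perf = f"used_pct={pct:.1f}%;{WARN};{CRIT}"
    if pct >= CRIT:
        print(f"DISK CRITICAL - {pct:.1f}% used on {PATH} | {perf}")
        return 2
    if pct >= WARN:
        print(f"DISK WARNING - {pct:.1f}% used on {PATH} | {perf}")
        return 1
    print(f"DISK OK - {pct:.1f}% used on {PATH} | {perf}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Drop a script like this into the plugins directory, reference it from a command definition, and Nagios will schedule it like any community check.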
Here are some of the more common Nagios-based packages:
- Nagios XI – Created by the founders of Nagios, the solution still lags behind others in terms of capabilities, but the name often sends people in its direction.
- OP5 – A highly scalable solution with increased functionality, including the incorporation of open-source baselining technologies; they have committed more core Nagios code in the redesign than any other company. If you believe in open source and community, you must have a look.
- Centerity – They have extended Nagios with better dashboarding and more capable network monitoring.
- Groundwork – The solution has significantly improved dashboarding and RESTful APIs, allowing programmability and a high degree of scalability.
- OpsView – Much more user-friendly than the open-source tools, but still Nagios under the hood.
Then there are tools which leverage existing checks from Nagios, but get rid of a lot of the baggage that comes with using Nagios Core:
- AppFirst – SaaS only, includes a unique collection technology for granularity, full support of statsd, and of course Nagios plugins.
- Server Density – Lightweight and inexpensive SaaS solution with good visualization and plug-in support, including Nagios plugins.
- DataDog – A great place to aggregate multiple technologies, including Nagios. The solution offers monitoring delivered via SaaS as well.
If the goal is "how do I do this better than Nagios, with the ability to customize it," investigate the use of some of the tools above, in addition to having a look at Circonus, Librato, Graphite, and statsd (a minimal statsd example follows below).
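For those unfamiliar with statsd, part of its appeal is that there is almost nothing to it: metrics are plain-text datagrams fired over UDP at the daemon (port 8125 by default), which aggregates and forwards them to a backend such as Graphite. A minimal sketch, with made-up metric names:

```python
import socket

# statsd accepts plain-text UDP datagrams in the form "<name>:<value>|<type>",
# where the type is c (counter), ms (timer), or g (gauge).
STATSD = ("127.0.0.1", 8125)  # assumed local daemon on the default port
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

sock.sendto(b"web.requests:1|c", STATSD)          # increment a counter
sock.sendto(b"web.response_time:320|ms", STATSD)  # record a timing in milliseconds
sock.sendto(b"web.active_sessions:42|g", STATSD)  # set a gauge
```

Because it is fire-and-forget UDP, instrumenting application code this way adds negligible overhead and cannot take the application down if the collector disappears.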
If the goal is to free up time to do something more useful, look at agentless technologies. The sacrifices will be granularity and customizability, but management is much easier. Products as simple and inexpensive as SolarWinds (ipMonitor, Orion), Ipswitch WhatsUp Gold, and the ManageEngine products, as well as free products like Spiceworks, often do just enough to free up time and money to spend on higher-value tools. These may include more mature offerings in the APM space, which focus on transactional context and real user experience metrics.
Category: IT Operations, Monitoring
by Jonah Kowall | August 22, 2013
With the potential for mobile APM to become larger than traditional APM over time, these research notes investigate the current state of the market and allude to future states. Full-featured mobile APM is new, having emerged in 2013, which means these tools are immature and evolving quickly. With the rapid pace of mobile application development and growth, not to mention the increasing pace at which mobile devices are brought to market, the total addressable market is large and underpenetrated.
This pair of notes begins by investigating how these tools are consumed by specific groups which are separate from the centralized operations teams, similar to how mobile application development groups are separate from the centralized development teams. Within operations, there will come a time when an end-to-end view of mobile performance becomes relevant, causing a convergence of the operational teams, while development teams will remain separate (this is according to our mobile application development analysts). We also discuss the current tools and what is implemented today for mobile monitoring, which until recently was limited to synthetic monitoring. The new generation of mobile APM tools has gone much further, enabling detailed user, device and performance capture. This allows not only for corrective actions to be taken by development and operations teams, but also assists with quality, feature and capacity planning.
The second note focuses on the current market state, investigating solutions delivered today or soon to be delivered. Each vendor has a write-up on its solution and how it's targeted. We cover the following vendors' solutions (if I missed you, please contact me and set up a briefing! Try @jkowall on Twitter):
Targeted towards application developers (and enterprises):
- New Relic
Targeted towards enterprises:
- Netmotion Wireless
The docs can be accessed by clients here:
Improve Quality of Mobile Application Delivery With Mobile Application Performance Monitoring – http://www.gartner.com/document/code/254686
Vendor Landscape for Mobile Application Performance Monitoring – http://www.gartner.com/document/2576317
Please feel free to leave comments!
Category: APM, Mobile, Monitoring
by Jonah Kowall | August 1, 2013
We have just announced something that has been in the works for a good portion of this year: we are adding another Magic Quadrant, in the area of Network Performance Monitoring and Diagnostics (NPMD). This research is being led by me with close support from my colleagues Vivek Bhalla and Colin Fletcher. It is complementary to, while distinct from, the current APM research, and provides market guidance for those tools which handle performance measurement and diagnostics for the network. The network has been a well-defined entity since modern networking was invented over 30 years ago, which makes it easier to write tools to understand and analyze the way networks are built and operated. While the networks have been well defined, the data and applications traversing them have changed considerably, requiring tools to evolve. Due to the pervasive standards across the network, much of it driven by unified standards bodies, tool stagnation has generally occurred (aside from a few emerging and interesting players). With these tools largely consisting of deep packet inspection technologies deployed at centralized network locations, the time to change is now. With the emergence of public and private cloud, and the increased interest in the promises of SDN and NFV (while still being very young), these tools must evolve considerably. In the documents we've published this week (clients only):
Introducing the Network Performance Monitoring and Diagnostics Market – http://www.gartner.com/document/2563115
Criteria for the New Magic Quadrant for Network Performance Monitoring and Diagnostics – http://www.gartner.com/document/2563315
We outline the market as it's defined today, along with some sizing and growth information. The second document announces the criteria for the upcoming Magic Quadrant. We will be kicking this research off in a month, and the publication timetable will be the first quarter (Q1) of 2014. If you are not a client and would like to see the criteria for being included (we regularly include non-clients in all research), please contact me on Twitter (@jkowall) or via firstname.lastname @ gartner.com
Category: IT Operations, Monitoring, NPM
by Jonah Kowall | August 1, 2013
Commonly in a market where there is a commodity (such as availability monitoring) and a hot emerging set of technologies (typically in performance monitoring), the vendors who sell commodity solutions will claim they offer products in the emerging markets where the spend tends to be. This causes confusion among buyers, who often fail to understand whom they should be investigating for a given need. We spend a lot of time trying to sort out this noise and confusion, so in order to provide clients with a good overview of the market, I did the unthinkable…
We put together a large taxonomy of the availability and performance monitoring vendors, indicating whether each solution has partial coverage, complete coverage, or no coverage. We divided the market as follows:
- General (fault monitoring from a general perspective)
- Business Service Management (BSM)
- Event Correlation and Analysis (ECA)
- Notification Management
- Application Instance
- Application Performance Monitoring (APM)
- End-User Experience Monitoring (EUM) – Client
- EUM Mobile
- EUM Network
- EUM Synthetic
- Transaction Tracing
- Deep Dive
- Database (in an APM context)
- Storage (in an APM context)
- SaaS options
- Network Configuration and Change Management (NCCM)
- Unified Communications
- Network Testing
- Out of Band Management
- Network Performance Monitoring
- Unified Communications (performance via DPI)
- Network Packet Broker (NPB)
The difficulty in building this research was ensuring we properly represented the over 300 vendors in this matrix to the market. This resulted in me sending over 700 emails during the four-day review period, and upsetting a few vendors who tried to position themselves differently than the technology or buyers dictated. One document contains the toolkit spreadsheet, while the other describes the challenges and categories in the research.
I hope you enjoy the result of this research; sorry, these links are for clients only.
How to Cut Through Vendor Hype and Make Sense of the Availability and Performance Monitoring Market - http://www.gartner.com/document/2563215
Toolkit: Availability and Performance Monitoring Vendor Guide - http://www.gartner.com/document/2560518
Category: APM, BSM, ECA, IT Operations, Mobile, Monitoring, NPB, NPM, OLM
by Jonah Kowall | July 10, 2013
Oh yes, I know I am late in delivering this one. I'm a bit tied up with a boatload of new research coming out, and I was at home for 5 days in June… July looks pretty bad, and then the rest of the year thankfully eases up. I was able to attend Cisco Live this year during the last week of June in Orlando. When it comes to networking, the Interop show has all but died as of 2013, and there isn't really any other good pure-play networking conference. Cisco Live is the show for networking companies (unless of course you compete with Cisco… then maybe VMworld is your spot).
In terms of content, Cisco puts together a great analyst program called C-Scape, where we have a chance to meet and spend time with Cisco executives we normally speak with on the phone. It's great to discuss and share insight with these folks in less structured environments. They also give us access to many demo stations with product managers to discuss our coverage areas. The analyst relations folks at Cisco spend a lot of time and energy putting together this program; thank you to them!
I was also able to attend a few of the sessions on Cisco's onePK platform, covering the strategy around APIs and programmability (SDN). These APIs need to be better standardized (maybe OpenDaylight will provide this). Cisco supports so many standards and APIs in order to be open that piecing these APIs and components together without a standard reference architecture can cause problems down the line. Once these technologies and architectures mature, we should see better published best practices in terms of implementation, but it is early. Similarly, while the vision around Insieme sounds interesting, there was not enough meat for me to really get a handle on it. I am curious to hear how Cisco will improve and measure application performance with this new technology. We will all have to wait until later this year.
I spent time with the software teams around the Cisco Prime management product lines; the future of Cisco network management appears to be opening up from the Cisco-only perspective there once was. I haven't yet seen the results of these strategic shifts, but I hope to see Cisco supporting the management of non-Cisco network equipment, as they have done well on the UCS side, where they support other cloud and virtualization platforms.
Finally, in terms of other vendors, I had a lot of meetings and discussions; here are some of the most memorable ones, in no particular order:
- Seeing and interacting with the new NetScout nGeniusOne in person was great; this launch represents a massive redesign of the UI and architecture of the platform, and includes a new packet virtualization technology (ASI 2.0) with major improvements. The re-platforming will result in pricing simplifications which should make the product more attractive. I look forward to going deeper with the new product as the rest of the modules are incorporated through the year.
- Gigamon shared their vision for the product lines, moving from being a network packet broker (NPB) towards being a monitoring system that can analyze and alter the stream of packets in real time. They have a daunting task ahead of them, but flush with cash after an IPO, they could make this happen.
- I got deeper into Corvil's new products, including messaging and positioning towards enterprises as they evolve from a trading-focused monitoring technology into a performance monitoring generalist. It was very interesting to discuss some of the multi-hop analysis and time synchronization technologies they have created over the years, and how those can help re-assemble transactions across segments.
I had lots of other meetings as well; don't feel bad if I didn't include you here. I appreciate everyone's time, and thanks again to Cisco for their hospitality. Please leave comments or track me down on Twitter.
Category: APM, IT Operations, Monitoring, NPB, Trade Show
by Jonah Kowall | July 10, 2013
I previously posted a write-up about "getting rid of" Nagios, and it generated a staggering number of responses from those who are attached to Nagios and have used it effectively. This ignited a discussion similar to the Mac vs. PC or *nix vs. Windows debates pervasive across the internet (complete with personal attacks; I thought we were all adults here…). Those who know me know I've been there and done it in my past, and in speaking with thousands of users of monitoring tools during my time at Gartner, I can share some common threads.
Many of the readers who commented on the story were managing hundreds of servers; this is typically not the size of the enterprises I speak with (although sometimes we do speak with those groups). We are often speaking with an enterprise or IT architect who is trying to homogenize the management of the IT landscape. This means not just servers, but applications, network devices, storage systems, and other components. If you do not treat your IT infrastructure with uniformity, the notions of industrialization and standardization remain elusive.
The typical Gartner client is a medium-to-large company that has grown through the years both organically and by acquiring other businesses. This makes the IT landscape quite diverse, and often managed in isolated groups. These isolated organizations, left to their own devices, have selected a wide array of tools to manage their infrastructure and applications. These tools overlap with those of other business units within the organization using similar or dissimilar tools. For example, a client I spoke with last week had over 65 monitoring tools in place across an environment of 75,000 servers, not counting a vast array of other components out of scope for phase 1. As you can imagine, there was everything from Icinga, Nagios, Zabbix, Graphite, etc. (on the open-source front) to what we call the big 4 (IBM, CA, HP, BMC), and even islands of VMware vCenter Operations Suite, Oracle Enterprise Manager, and Microsoft System Center Operations Manager. I tried to offer some prescriptive advice, but the task at hand was daunting to say the least. The CIO who created this project was responding to the massive cost of licenses and people the business was essentially wasting by not managing this centrally and not leveraging its scale.
Nagios plays a big role in these organizations, often being implemented several different ways in a single enterprise. Trying to standardize the implementation is a challenge, especially with most Nagios users selecting various components from the open-source and Nagios communities to build their personal preferred implementation. The vendors who leverage Nagios but build standardized ways of implementing and managing the footprint are likely to create a more successful implementation at scale (e.g., Centerity, Centreon, Groundwork, OP5, Opsview).
Additionally, a common issue pervasive across monitoring is the overextension of the platform to do more than the core tenets of monitoring: the capture of metrics, notification of issues, and the analysis and correlation of metrics to determine root cause for problem isolation. Monitoring should not run the jobs needed to operate and correct infrastructure, yet we see this happening consistently regardless of the platform. This creates lock-in to the monitoring tool regardless of the vendor or technology used. When the business is forced to transform, for example due to acquisition, new business demands, data center relocation, consolidation, or newer technology which enables easier management, they are unable to detach the monitoring from the infrastructure or applications without extensive reverse engineering, which oftentimes is not possible.
So I leave you with a question: how do you avoid these typical scenarios, regardless of the tool in place?
Category: Analytics, ECA, IT Operations, Monitoring
by Jonah Kowall | June 23, 2013
I was able to attend the Velocity conference for most of the past week. I had a lot of meetings with various clients and prospects, and was able to meet two new mobile APM companies which were not on my radar screen. Look for the mobile APM research to publish in the next couple of weeks. I met a lot of interesting and very smart engineers, entrepreneurs, and thought leaders at the conference.
Velocity is full of very technical engineers who care about performance; this is the best conference for those interested in APM. The main difference is that these attendees aren't the decision makers, but they are the implementers of large-scale environments, and they understand how important performance is. Most of them would rather build custom solutions leveraging open source than buy software in general. This doesn't detract from the conference; it's just different from many of the other conferences out there. The talks are incredibly technical, but extremely interesting from my perspective. I'll run through a few interesting ones.
The week started out with a great presentation by Theo Schlossnagle (@postwait); you can find the slides here: http://www.slideshare.net/postwait/monitoring-and-observability-23201316 He has an interesting perspective on monitoring: he breaks down the difference between push and pull monitoring technologies, explains how the focus should be on the execution of the business rather than the metrics of the systems, and then goes into a very DIY explanation of accomplishing this. It is a great presentation explaining the fundamental differences between commercial software solutions and the best-fit solution (normally crafted by talented engineers).
I then attended an Etsy presentation in which Abe Stanway (@abestanway) and Jon Cowie (@jonlives) spoke about how Etsy handles continuous deployment and the processing of a staggering number of metrics. They use a combination of open-source and custom-built tools to collect, process, store, and run analytics on data in order to anticipate and prevent problems. They do 30 deploys per day, and everyone deploys code. They agree (as I do) that Nagios is pretty ineffective on its own, but it can be used to collect data in a useful manner for other purposes. Etsy's anomaly detection engine, Skyline (https://github.com/etsy/skyline), processes metrics from multiple data sources; combined with the correlation engine Oculus (https://github.com/etsy/oculus), you have a pretty powerful high-volume metric processing system. The slides can be found here: https://speakerdeck.com/astanway/bring-the-noise-continuously-deploying-under-a-hailstorm-of-metrics
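Skyline itself votes across an ensemble of simple statistical tests rather than relying on any single one. Purely as an illustration of the kind of test in that ensemble (this is a toy sketch, not Etsy's code), here is a three-sigma check on a metric series:

```python
from statistics import mean, stdev

def is_anomalous(series: list[float], min_points: int = 30) -> bool:
    """Flag the newest datapoint if it sits more than three standard
    deviations from the mean of its history (one simple test of the
    kind an ensemble detector votes across)."""
    if len(series) < min_points:
        return False  # not enough history to judge
    *history, latest = series
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is anomalous
    return abs(latest - mu) > 3 * sigma

# Example: a flat-ish series with a sudden spike at the end.
values = [10.0 + (i % 3) * 0.5 for i in range(60)] + [42.0]
print(is_anomalous(values))  # True
```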
There was also an interesting talk by Dylan Richards (@dylanr) on how they built and executed the technology behind the Obama campaign, which was amazing. I loved the failure-testing stories towards the end. The video of his talk is here: http://www.youtube.com/watch?v=LCZT_Q3z520&feature=youtu.be
The last thing I was impressed by was a talk from Ilya Grigorik of Google (@igrigorik) about using the public-domain http://httparchive.org/ data to understand all kinds of information about the way the web is constructed. I wasn't even aware of this data set. Additionally, aside from loading the 500GB of data into your own MySQL data source, Google has made this data publicly accessible using BigQuery. Ilya built some interesting queries within BigQuery, and then moved on to using Google Docs spreadsheets to drive BigQuery searches, graphing, and results. Overall, a very impressive show of what these tools can do, and what's available to the public. Video of Ilya's talk is here: http://www.youtube.com/watch?v=bhUMHKJf3r4
Overall, a great conference. Sorry I didn't attend the Gartner Infrastructure and Operations Management (IOM) conference in Orlando, but I will definitely be at Gartner Data Center in December. Off to Cisco Live tomorrow, and finally my month of travel and trade shows ends.
Category: Analytics, APM, Big Data, IT Operations, Monitoring, Trade Show
by Jonah Kowall | June 15, 2013
Sorry for the delay since my last post; I assure you I will have much to say over the next few weeks. More on that later.
I was fortunate to attend and speak at the Gartner IT Operations Management (IOM) summit in Berlin, Germany. The event was wonderfully coordinated and saw about 30-40% growth in attendance, which is great for its third year. Kudos to the conference chairs and the team that put it together. I spoke about software-defined networking (SDN) and the evolution within this market over the past 18 months; there were a lot of questions after this presentation. This is a hot topic in Europe as well as North America. I also presented with my colleague Vivek Bhalla on the use of application performance monitoring (APM) and how it intersects with network performance monitoring (NPM); we went into depth explaining the overlap and the differences between these two monitoring technologies. Look for more research on NPM to be announced shortly. The vendors had a good presence, and a lot of attendees were investigating them. My 1:1s were excellent, but I generally found that APM adoption and sophistication in Europe is below that in North America, something I also find on client inquiry calls.
Next week I will be attending the excellent Velocity conference, which O'Reilly puts on; it draws a progressive audience and has lots of great performance and operations content. The week after Velocity I will be at Cisco Live in Orlando.
Look for updates on these conferences and several new research items to hit the web in the coming weeks.
Category: APM, DevOps, IT Operations, Logfile, Monitoring, NPM, Trade Show