by Jonah Kowall | August 15, 2014 | 3 Comments
There are some changes in the upcoming Q1 2015 delivery of the NPMD Magic Quadrant. Vivek Bhalla (@vbhalla1) will be taking over as the lead author of the research, and I’ll be co-authoring along with Colin Fletcher and Gary Spivak. We will be sending vendor surveys out on Monday; if you are a vendor who believes you qualify based on the criteria below and you did not receive a survey on Monday, please reach out via email or Twitter (firstname.lastname@example.org). We regularly include non-clients in research, and this research is no different: we strive to build the most relevant research for our large end-user client base.
Thank you, and we look forward to this research deliverable.
NPMD tools allow network engineers to understand the performance of applications and infrastructure components via network instrumentation. Additionally, these tools provide insight into the quality of the end user’s experience. The goal of NPMD products is not only to monitor the network components to facilitate outage and degradation resolution, but also to identify performance optimization opportunities. This is conducted via diagnostics, analytics and debugging capabilities to complement additional monitoring of today’s complex IT environments.
This market is a fast-growing segment of the larger network management space ($1.9 billion in 2013) and overlaps slightly with aspects of the application performance monitoring space ($2.4 billion in 2013). Gartner estimates the size of the NPMD tools market at $1.1 billion.
Vendors will be required to meet the following criteria to be considered for the 2015 NPMD Magic Quadrant:
- The ability to monitor, diagnose and generate alerts for:
- Network endpoints — Servers, virtual machines, storage systems or anything with an IP address by measuring these components directly in combination with a network perspective.
- Network components — Such as routers, switches and other network devices. This includes SDN and NFV components.
- Network links — Connectivity between network-attached infrastructure.
- The ability to monitor, diagnose and generate alerts for dynamic end-to-end network service delivery as it relates to:
- End-user experience — The capture of data about how end-to-end application availability, latency and quality appear to the end user from a network perspective. This is limited to network traffic visibility, and does not extend within components the way application performance monitoring is able to.
- Business service delivery — The speed and overall quality of network service and/or application delivery to the user in support of key business activities, as defined by the operator of the NPMD product. These definitions may overlap as services and applications are recombined into new applications.
- Infrastructure component interactions — The focus on infrastructure components as they interact via the network, as well as the network delivery of services or applications.
- Support for analysis of:
- Real-time performance and behaviors — Essential for troubleshooting in the current state of the environment. Analysis of data must be done within three minutes under normal network loads and conditions.
- Historical performance and behaviors — To help understand what occurred or what is trending over time.
- Predictive behaviors by leveraging IT operations analytics technologies — The ability to distill and create actionable advice from the large dataset collected across the fourth requirement.
- Leverage the following data sources:
- Network-device-generated data, including flow-based data sources inclusive of NetFlow and IPFIX.
- Network device information collected via SNMP.
- Network packet analysis to identify application types and performance characteristics.
- The ability to support the following scalability and performance requirements:
- Real-time monitoring of 10 gigabit (10G) Ethernet networks at full line rate via a single instance of the product.
- Ingestion of sampled flow records at a rate of 75,000 flows per second via a single instance of the product.
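For readers less familiar with the flow-based data sources named above: NetFlow (and its IETF standardization, IPFIX) delivers binary records over UDP that a collector must unpack before any analysis happens. As a rough illustration, not tied to any vendor’s product, here is a minimal Python sketch that parses the fixed 24-byte NetFlow v5 packet header using only the standard library (the sample datagram is fabricated for demonstration):

```python
import struct
from collections import namedtuple

# NetFlow v5 header layout: 24 bytes, network byte order.
V5Header = namedtuple(
    "V5Header",
    "version count sys_uptime unix_secs unix_nsecs "
    "flow_sequence engine_type engine_id sampling_interval",
)

def parse_v5_header(datagram: bytes) -> V5Header:
    """Unpack the 24-byte NetFlow v5 header from a UDP datagram."""
    return V5Header(*struct.unpack("!HHIIIIBBH", datagram[:24]))

# Build a fake datagram for demonstration: version 5, carrying 2 flow records.
fake = struct.pack("!HHIIIIBBH", 5, 2, 360000, 1408060800, 0, 1000, 0, 0, 0)
hdr = parse_v5_header(fake)
print(hdr.version, hdr.count)  # → 5 2
```

A real collector would then unpack `hdr.count` 48-byte flow records following the header; the point here is only that flow ingestion is cheap structured parsing, which is why the 75,000 flows-per-second bar above is a throughput question rather than a parsing one.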
Non-Product-Related Criteria
- A minimum of 10 NPMD customer references must be included at the time of survey submission.
- Customer references must exclude security-oriented use cases and scenarios.
- Customer references must be located in at least two of the following geographic locations: North America, South America, EMEA, and/or Asia/Pacific/Japan.
- Total NPMD product revenue (including new licenses, updates, maintenance, subscriptions, SaaS, hosting and technical support) must have exceeded $7.5 million in 2013, excluding revenue derived from security-related buying centers.
- The vendor should have at least 75 customers that use its NPMD product actively in a production environment.
- The product, and the specific version, submitted for evaluation must be shipping to end-user clients for production deployment and designated with general availability by October 31st 2014.
The 2015 NPMD Critical Capabilities will be published subsequent to the 2015 NPMD Magic Quadrant as a complementary piece of research.
Your survey submission and demo briefing will also be used for the purposes of writing this document, in addition to the Magic Quadrant.
The 2015 NPMD Critical Capabilities will be assessed upon the following criteria:
- End-point, Component and Link Monitoring
- Service Delivery Monitoring
- IT Operations Analytics
In addition to the above criteria, we will be evaluating each vendor’s ability to cross multiple buying centers, as well as its ability to target specific verticals as validated by reference customers.
Category: IT Operations Monitoring NPM NPMD Tags:
by Jonah Kowall | August 14, 2014 | 2 Comments
A long-running research project has been underway for many months, collecting data from vendors and putting together this research note. Big kudos to my colleague Vivek Bhalla for doing most of the work on this note. It came out great, with lots of valuable insight and vendor analysis.
The network packet broker (NPB) space has been interesting for the last couple of years, with lots of shifts and market changes happening. We highlight many of these changes in the market guide, along with write-ups covering strengths and challenges for each vendor profiled. We did this for solutions from Apcon, Arista Networks, cPacket, Cubro, Gigamon, Interface Masters Technologies, IXIA, JDSU (Network Instruments), Netscout, and VSS Monitoring. Gartner clients can access the research here:
07 August 2014 G00263407
Category: Monitoring NPB Tags:
by Jonah Kowall | July 29, 2014 | 5 Comments
I’ve been working on a pretty conceptual research project over the last few months. The research is finally out as of yesterday (7/28/14). The basic premise is that as environments and technologies continually evolve and become more abstract and complex, monitoring tools need to evolve in the same manner. The main issues are the use of traditional architectures versus big data and streaming architectures. Additionally, ease of deployment and use are the new normal, and SaaS is a critical deployment model to facilitate fast time to value.
We also look at tool proliferation issues, and some data behind those problems. The other issue investigated is the general failure of ECA approaches to fix the complexity of the tools; by simplifying the tools with unified monitoring approaches, combined with ITOA, these issues can be more easily handled. On the ITOA front, we share data collected on Unstructured Text Search and Inference (UTSI), or what many call log analytics.
Monitoring tools are beginning to be used for multiple use cases outside of operational visibility, and more of this is investigated in this latest research note. Clients can read more here:
Category: Analytics APM Big Data ECA IT Operations ITOA Logfile Monitoring OLM SaaS Tags: ITOA
by Jonah Kowall | July 29, 2014 | 4 Comments
This was my first year attending the open-source-focused OSCON conference by O’Reilly. I’m a huge Velocity fan and get a lot out of that conference, hence I figured I would try another one. Overall I found this conference far less valuable to me, for several reasons. While there was a bit of interesting content, the show lacked focus in general. The show floor exhibits ranged from do-it-yourself electronics and non-profits to commercial vendors. Much of the conference was geared toward recruiting in Portland, where lots of startups trying to keep pace with growth pull talent. Here are some of the better sessions I attended and a little bit about them:
Tutorial Node.js 3 Ways
C. Aaron Cois (Carnegie Mellon University, Software Engineering Institute), Tim Palko (Carnegie Mellon University, Software Engineering Institute)
Since I’ve already done some programming in Node for my Google Glass app, I attended this one more to get a proper tutorial than my hack/trial-by-fire experience. My programming skills are hackery at best.
You can find the content here : http://cacois.github.io/nodejs-three-ways/#/
This was a good primer, but there were a couple of people in the room who needed a lot of tech support. Additionally, some pre-prep by attendees would have made this much smoother. We did some good Wi-Fi load testing, which showed the network couldn’t handle peak loads.
An ElasticSearch Crash Course – http://www.oscon.com/oscon2014/public/schedule/detail/33571
Andrew Cholakian (Found)
I’ve written a lot about ElasticSearch; the internals of the engine were probably the most useful content I got from OSCON this year. I have a much better understanding of the technology with these fundamentals. Here are some notes:
- Wikipedia is moving to it
- Github code search based on it
- Netflix using it for log data
- Indexes can live anywhere, and are split across shards
- Each index has documents
- Every field has an index
- Docs are routed via hashing and sharded
- Shards are Lucene indexes, and they are replicated
- Deleting and updating indexes are expensive
- Writes are slow
- Cannot do transactional operations
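The routing and sharding notes above can be sketched in a few lines. Elasticsearch picks a document’s primary shard as hash(routing key) modulo the number of primary shards (it uses a murmur3 hash internally); this simplified stand-in uses CRC32 so the sketch is deterministic and runs anywhere, so the actual shard numbers will differ from a real cluster:

```python
import zlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick the primary shard for a document id.

    Elasticsearch routes with shard = hash(routing) % number_of_primary_shards
    (murmur3 in practice); CRC32 stands in here for determinism.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# The same id always lands on the same shard -- which is also why the
# primary shard count cannot be changed after an index is created.
assert route_to_shard("doc-42", 5) == route_to_shard("doc-42", 5)
print({f"doc-{i}": route_to_shard(f"doc-{i}", 5) for i in range(5)})
```

This also makes the “deleting and updating are expensive” note intuitive: documents are immutable segments under the hood, so an update is a delete plus a re-index routed through the same hashing step.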
Docker – Is it Safe to Run Applications in Linux Containers?
Jerome Petazzoni (Docker Inc.)
This is one of the major issues with putting Docker into production: control and general security are lacking, and this presentation was interesting and much needed. Jerome was an excellent presenter and made some very good points. I especially liked the idea of running app instances read-only, which avoids most of the security issues.
Tracing and Profiling Java (and Native) Applications in Production – Twitter
Kaushik Srenevasan (Twitter)
Interesting discussion of how Twitter, a heavy Java and Scala shop, handles instrumentation. They run their own OpenJDK JVM distribution with customizations, running on CentOS. In summary, this is a somewhat dated view of instrumentation, since commercial BCI and instrumentation on Java has come so far. If you don’t want to pay for something and want to build your own, this is somewhat interesting, but it has very limited capabilities compared to what modern APM can do today. Here are the other notes:
- Java, Scala most popular
- Some C++
- Some Ruby (Kiji), Python
They bundle their own JVMTI agents in the code.
- Low latency garbage collection on dedicated hardware and mesos
- Services are getting larger
- Scala optimizations – functional programming language
- Tools : Contrail, Twitter diagnostics runtime
- Wanted something like dtrace, but they don’t have it on Linux
- Using perf for the Linux profiling
Please leave comments here or on Twitter @jkowall. Thanks!
Category: APM DevOps IT Operations Monitoring Trade Show Tags:
by Jonah Kowall | July 22, 2014 | 3 Comments
Splunk has been rising quickly in the ranks with buyers looking to solve complex problems, or those looking to build interesting new analyses of their data. As they have grown in popularity and gone public, we’ve been given a wealth of new data about the company, operations, and execution, beyond what we hear from their large customer base. In this research note, we’ve combined analysis of the technology, company, and financials. My colleague Gary Spivak, whose background is on the financial analysis side, led the research, and I contributed some additional analysis of the technology, company, and other elements. The note has only been out for a few days now, but it’s gotten a lot of response from our client base. For those of you who are clients, you can find the document here:
Vendor Insight: Splunk, Separating Hype From Reality – http://www.gartner.com/document/2802724
Category: Analytics APM Big Data IT Operations Logfile Mobile Monitoring Tags:
by Jonah Kowall | July 22, 2014 | Submit a Comment
Just wanted to provide a heads up that we’ve published updated Hype Cycles, with more publishing now as well. Here are the two which just hit the wire that I worked on.
Hype Cycle for Networking and Communications, 2014 – http://www.gartner.com/document/2804820
- I worked with Will Cappelli on the Application Performance Monitoring profile in this Hype Cycle, which has some updates around APM.
- Along with Vivek Bhalla and Colin Fletcher, I wrote up a new profile for Network Performance Monitoring and Diagnostics Tools, which is new for 2014.
- Vivek led our efforts around Network Configuration and Change Management (NCCM) tools. This profile was updated, and expect some great new research from Vivek on this topic.
- Vivek also led our efforts, with support from Colin and myself, on Network Fault Monitoring Tools; this profile has minor updates for 2014.
Hype Cycle for IT Operations Management, 2014 – http://www.gartner.com/document/2804821
- Will Cappelli led efforts on IT Operations Analytics (ITOA) this year, with support from Colin and myself. There were minor updates in this profile for 2014.
- We also included the same profiles from the hype cycle above!
Category: Analytics APM Hype Cycle IT Operations Logfile Monitoring NPM NPMD SaaS Tags:
by Jonah Kowall | July 10, 2014 | 2 Comments
This week we are highlighting a new offering from Aternity, which began shipping recently. Aternity is headquartered outside of Boston, MA, but, as with many other APM companies, most of the R&D takes place in Israel. Aternity has been an innovator in desktop end-user experience monitoring. The solution, while technically differentiated, caters to large enterprise implementations, which has kept the company tied to these enterprise installs. While most applications today are moving to being purely web-based, driving increased importance for modern end-to-end APM and RUM solutions, there are still, and will remain, many critical applications on the desktop. Today’s APM tools do a poor job of handling these non-web applications, or at best provide a high-level perspective by leveraging the network.
Aternity’s Workforce APM product is based on innovative and unique technology which allows for detailed user and workflow capture of any application running on a Windows desktop endpoint. This is not a solution which requires professional services or specialized programming, as some of the other entrants in the market do. I have used the tool; it’s pretty easy to learn, but the programming is done with the studio product, which needs work, including a more modern user interface. They have recently launched an improved studio (see video here: https://www.youtube.com/watch?v=hLphXVMCMGo&feature=youtu.be) which helps with some of these issues, but it’s still not as clean as alternate solutions when doing custom collection. The desktop capture agent is a small program running on the endpoint (Windows only, but it can run on physical or virtualized hardware such as HVD/VDI implementations). The data is fed into a relational database, and Tableau is used on top of this data to provide reporting, dashboarding, and most of the user interface.
Moving on to the new offering: Aternity mAPM is a mobile APM product which allows for native application monitoring on Android and iOS. Implementation is done either by post-compile wrapping of the native mobile application or by compiling the instrumentation into it. Unlike today’s Workforce APM implementations, which are mostly deployed as traditional on-premises software (although Aternity is seeing more customers opt for SaaS delivery of the enterprise solution), the mobile APM offering can be deployed using Aternity’s SaaS services or via a traditional on-premises deployment.
Here are the high-level screens of the free mobile APM offering, which is targeted at developers.
The product can be fed with simulated data or with actual data; in this case, here is simulated data in my portal. The GUI is very usable: there is no scrolling, and everything is drillable and filterable:
Here is the crash data where you can download the crash file for debugging.
Some interesting data usage reporting:
I’ve also used the built-in support features, and can report Aternity is responsive and helpful even with the free accounts. As you can see, this is a pretty comprehensive offering on the mobile side; now the question remains whether Aternity will be able to penetrate mobile development organizations or will continue to sell strictly to IT operations buyers. The combined mobile and desktop end-user experience monitoring is an interesting concept, but few organizations have the maturity to take advantage of both of these offerings due to fragmentation in most organizations.
I’m pretty tied up reading the thousands of pages and analyzing data for the upcoming APM Magic Quadrant, but I’ll find time next week to write up SOASTA’s new mobile offering. On deck after that post will probably be SpeedCurve in early August. Thanks for reading; please leave feedback here or on Twitter @jkowall.
Category: APM Mobile Pick of The Week SaaS Tags:
by Jonah Kowall | June 30, 2014 | 3 Comments
Always one of the more enjoyable conferences for me to attend. I don’t get worked as hard as at Gartner conferences, which are also really enjoyable, but where I spend my time doing the educating versus listening to other smart people. Velocity is a practitioner-focused conference and is very geeky (in a good way, for those of us who are pretty deep technologists). I’ll highlight some of the great sessions I attended and other technologies I discovered.
The conference is put on by a competitor of course, since we do our own events, but they had over 2,400 registered attendees and over 100 sponsors. There seems to be growth here, and the conference gets larger every year. Here are some session bullets I found interesting. You’ll notice a pretty wide spread, from performance of the front-end to application middleware and backends.
Webpagetest deep dive – http://twitter.com/PatMeenan – http://cdn.oreillystatic.com/en/assets/1/event/113/WebPagetest%20Power%20Users%20Presentation.pdf
This is a great open source tool for measuring and diagnosing front-end performance. I’ve used the tool, but had been mostly ignoring it since it didn’t seem to be evolving much. That was quite a mistake, since it has evolved considerably since I’d last really used it.
- Good to dig into the new features in the advanced settings tab
- Run more than one test when measuring, always
- Very cool advanced visual comparison
- Filmstrip view has been improved
- Can do mobile runs, which show it in a mobile browser (very cool)
- Browser CPU usage stats can be overlaid on waterfall
- Can export tcpdump (use in wireshark or cloudshark)
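The filmstrip and visual-comparison features feed into WebPagetest’s Speed Index metric, which rewards pages that render most of their content early. Here is a simplified sketch of the idea: the score is the area above the visual-progress curve, so lower is better. The real tool derives visual completeness from captured video frames; the sample frames below are made up:

```python
def speed_index(frames):
    """Approximate a Speed Index from filmstrip frames.

    `frames` is a list of (time_ms, visual_completeness) pairs with
    completeness in [0, 1]. The metric integrates (1 - completeness)
    over time, so a page that paints most content early scores lower
    (better) even if total load time is identical.
    """
    si = 0.0
    for (t0, c0), (t1, _) in zip(frames, frames[1:]):
        si += (1.0 - c0) * (t1 - t0)
    return si

# Two pages that both finish at 4000 ms: the one painting early wins.
early = [(0, 0.0), (500, 0.8), (4000, 1.0)]
late = [(0, 0.0), (3500, 0.2), (4000, 1.0)]
print(round(speed_index(early)))  # → 1200
print(round(speed_index(late)))   # → 3900
```

This is why the filmstrip view matters more than a single load-time number: identical onload times can hide wildly different user experiences.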
Docker – https://twitter.com/kartar
Content was good for those who hadn’t used Docker. I’ve done some basic work with it and find it interesting, but also quite basic in nature. Some of the discussion hit on issues around security, support for other containers, and overall limitations in this immature but evolving technology.
- The room was packed.
- Dockerfile instructions (kind of like an init.d script): I hadn’t used these before, but they are critical when using Docker at scale.
RUM Comparison and Use Cases – https://twitter.com/bbrewer https://twitter.com/bluesmoon
The team at SOASTA presented a non-vendor-biased view of RUM. I found the landscape they laid out basic and partially incomplete, but it was still a valiant effort by the team there. The key takeaway is that more users are trying to tie business metrics to RUM data, for example e-commerce companies tying and analyzing revenue against users and performance.
Google – Jeffrey Dean (http://research.google.com/pubs/jeff.html)
Interesting discussion by Google’s Jeffrey Dean. The most interesting part for me was his analysis of data replication to extra nodes to reduce latency, and of course the multiple-write technologies many use to deal with that replication closer to the source of the data.
Keynote systems – https://twitter.com/keynotesystems
Ben investigated what page load times look like; some of the interesting data he presented showed how “fast” varied by country and other demographic data. He also used the video capture features of WebPagetest.
Speedcurve – https://twitter.com/MarkZeman - Blog and Video of the Keynote - http://speedcurve.com/blog/velocity-responsive-in-the-wild/
This was one company I hadn’t heard of (well, more like a one-man show). It’s an interesting company which does nice frontend and comparative analysis using a WebPagetest backend. Some notes:
- Sits on top of webpage test
- Competitive benchmarking, runs once a day, multiple runs
- Complements RUM
- Shows filmstrips
- Formats the data much better
- Helps find savings, etc
- Can get to webpagetest views as well
- Showed some interesting research on visualizing data
Understanding Slowness – http://www.twitter.com/postwait : https://speakerdeck.com/postwait/understanding-slowness
Always a highlight of Velocity for me; Theo is a unique and extremely bright individual. He always brings good analysis and practical content, and he’s an ops guy through and through. There is no marketing or other fluff you often see with content at conferences. Some high-level notes:
- Document your architectures
- Have a plan
- Use redundant vendors, don’t put your eggs in one basket (easier said than done, but for some things a good idea)
- Measure latency (performance)
- Quantiles over histograms
- Observation – takes state, watches
- dtrace, truss, tcpdump, snoop, sar, iostat, etc.
- Synthesis – Run a test to enable diagnostics (replicate an issue)
- Manipulation – test hypothesis
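On the “quantiles over histograms” point: the argument is that a mean hides what users actually experience, while percentiles expose the tail. A quick illustration with Python’s standard library (the latency numbers are invented):

```python
import statistics

# Latency samples in milliseconds, with two slow outliers at the tail.
latencies = [12, 14, 15, 15, 16, 18, 21, 22, 25, 30, 110, 480]

mean = statistics.fmean(latencies)
qs = statistics.quantiles(latencies, n=100, method="inclusive")
p50, p95 = qs[49], qs[94]

# The mean is dragged upward by the two slow requests; the median is
# not, which is why percentiles say far more about typical experience.
print(f"mean={mean:.1f}ms p50={p50}ms p95={p95}ms")
```

Ten fast requests and two terrible ones produce a mean several times the median; alerting on the mean would either fire late or not at all.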
Some Simple Math to get Some Signal out of Your Ops Data Noise – https://twitter.com/tboubez - http://www.slideshare.net/tboubez/simple-math-for-anomaly-detection-toufic-boubez-metafor-software-velocity-santa-clara-20140625
Not sure I’d call this simple math at all, but here is a very new company, one we awarded a Cool Vendor designation this year for APM and ITOA, which focuses on ITOA use cases with their solution. They have a lot of growing up to do as a company, but they have some compelling analytics technologies. Mr. Boubez takes the audience through a journey of math: what we’ve tried (which doesn’t work too well) and some techniques which do work much better. Clearly worth a look.
- Gaussians don’t work with data center data
- Use histograms (even though Theo says they may not be the best visual analysis tool)
- Kolmogorov-Smirnov test allows for better data
- Handles periodicity in the data
- Box Plots / Tukey
- Doesn’t rely on mean and stddev
- IQR moving windows
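The box plot / Tukey approach from the talk is easy to sketch. This is my own simplified illustration of the idea, not Metafor’s implementation: fences built from quartiles rather than mean and standard deviation, applied over moving windows of the metric stream.

```python
import statistics

def tukey_outliers(window, k=1.5):
    """Flag points outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR].

    Because the fences come from quartiles rather than mean/stddev,
    one extreme point cannot drag the bounds toward itself the way it
    inflates a standard-deviation threshold.
    """
    q1, _, q3 = statistics.quantiles(window, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in window if x < lo or x > hi]

# Sliding a fixed-size window over the metric stream approximates the
# "IQR moving windows" idea: the fences adapt as the baseline shifts.
stream = [10, 11, 9, 10, 12, 11, 10, 95, 11, 10]
for i in range(len(stream) - 4):
    window = stream[i : i + 5]
    outliers = tukey_outliers(window)
    if outliers:
        print(f"window starting at {i}: outliers {outliers}")
```

The spike at 95 is flagged in every window containing it, while the normal 9–12 jitter never trips the fences, which is exactly what a Gaussian three-sigma rule struggles with on spiky, non-normal data-center metrics.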
Sitespeed.io - https://twitter.com/soulislove
An early-phase tool for running rules against frontend optimization, which is a cool idea. I’m going to hold off on lab time until version 3, written in Node.js, comes out in three weeks.
Category: APM Monitoring Trade Show Tags:
by Jonah Kowall | June 30, 2014 | 5 Comments
Just wanted to share some news and new research. We’ve seen a lot of changes in the APM market (lots of new entrants in the past month, and many more coming in the next few months). Expect some more reviews and content on the blog for some of the more interesting vendors. I’ve been posting less this month since I’ve not been home much. The first half of June was spent attending two Gartner infrastructure and operations management conferences, one taking place in Berlin and the second in Orlando. I just arrived back from the Velocity conference, which I will be posting about this week. I wanted to share some of the new research and content we presented.
The research note, led by Colin Fletcher and co-authored by me, published June 24th; the presentation was given earlier this month. Both are titled:
Apply IT Operations Analytics to Broader Datasets for Greater Business Insight – http://www.gartner.com/document/2778217 <- (subscribers only)
The research highlights the flexibility and use cases for leveraging ITOA investments to combine IT and non-IT data, providing insight and extracting relevant metrics across systems. Critical elements include the ability to explore, dream, and learn from the data collected, driven by combining the right data sets. The following Strategic Planning Assumption highlights this trend, and the need to deliver better analytics capabilities to the business: “By 2017, approximately 15% of enterprises will actively use ITOA technologies to provide insight into both business execution and IT operations, up from fewer than 5% today.”
We explain the different data sets which must be collected, and how insight can be derived. We also draw parallels to other IoT and IT/OT research at Gartner.
Sample List of ITOA Vendors
AccelOps; Appnomic Systems; Apptio; Bay Dynamics; BMC; Evolven; Hagrid Solutions; HP; IBM; Loggly; Metafor Software; Moogsoft; Nastel Technologies; Netuitive; Nexthink; OpTier; Prelert; Savision; SAS; SL; Splunk; Sumerian; Sumo Logic; Teleran; Terma Software Labs; VMware; XpoLog
Expect a post on a cool vendor of the week tomorrow, and another one next week. We’ll be moving from a focus on ITOA log analytics vendors toward mobile APM vendors.
Category: Analytics IT Operations Tags:
by Jonah Kowall | June 4, 2014 | 3 Comments
Thanks to those who came out to the Gartner IOM conference in Berlin, which wrapped up yesterday. There were interesting things happening and good discussions with clients and attendees. We have the US version of this conference next week in Orlando, and I will be there!
Posting this from lovely Budapest.
Keeping on the theme of log analytics, which comes up a lot in conversations related to both unified monitoring (infrastructure availability) and APM, we are seeing this technology as particularly applicable across monitoring disciplines and silos within organizations.
There is yet another company we haven’t highlighted which has been flying under the radar since being founded in Israel in 2003. It has been building index and search technology that differentiates itself by doing deeper automated analysis of the data before the user is involved in querying it.
The software discovers patterns and problems within the log data and is more proactive than other log analytics tools used by IT operations. Data is searched by the user, and additional layers are placed on top of the data to provide context. No rules are needed to enable these features. The product has its own indexing and storage system, but can also support Hadoop data stores.
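To make the “discovers patterns without rules” idea concrete, here is a toy sketch of one common approach to log pattern discovery: mask the variable tokens in each line so that lines differing only in their parameters collapse into a single template, then count templates. This is my own illustration of the general technique, not this vendor’s algorithm:

```python
import re
from collections import Counter

def template(line: str) -> str:
    """Mask variable tokens (IPs, hex ids, numbers) so lines that differ
    only in their parameters collapse into one pattern."""
    line = re.sub(r"\b\d+(\.\d+){3}\b", "<IP>", line)
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line

logs = [
    "user 1001 logged in from 10.0.0.5",
    "user 1002 logged in from 10.0.0.9",
    "disk /dev/sda1 usage at 91%",
    "user 1003 logged in from 10.0.0.7",
]

patterns = Counter(template(l) for l in logs)
for pat, n in patterns.most_common():
    print(f"{n}x {pat}")
# → 3x user <NUM> logged in from <IP>
#   1x disk /dev/sda<NUM> usage at <NUM>%
```

Rare or brand-new templates are exactly the “problems surfaced before the user queries” that this class of tool promotes, whereas a search-first tool waits for you to know what to grep for.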
Version 5.0 was recently launched, which improves upon the user interface slightly and also adds native support for Logstash (you can read my other posts on the ELK stack).
Once additional data is found you can add that insight to the query:
In the Screenshot below you can see some of the unique ways data is layered within the visualization timeline.
The product could use a more modern and usable user interface, along with easier implementation of collection agents and technologies to help get data into the system. These basic improvements would help exploit a very impressive analytics engine.
The company has not had particularly good visibility, due to being a self-funded, technology-focused company; they have not invested in marketing or sales efforts to date. This doesn’t mean they haven’t done well: they have some large, impressive installs of the technology. It has resulted in less growth than competitors have had, but they are looking to change that.
Category: Analytics IT Operations Logfile Monitoring Pick of The Week Tags: