by Jonah Kowall | September 4, 2014 | 5 Comments
Thanks to VMware for inviting me to attend VMworld this year. I was looking forward to learning more about the progress on vCenter Operations Manager (especially Log Insight and Hyperic). I was also eager to learn more about NSX and the management packs around NSX, which are critical to adoption. My colleagues who cover NSX closer to engineering and architecture get a lot more calls than we do on the management side of things. Similarly, the NSX deep dive sessions I attended drew about 12 times as many people as the sessions on NSX management using vCenter Operations Manager, which shows that people are learning rather than implementing at this stage.
There were other interesting announcements around OpenStack, which I think will fundamentally cause issues around where parent company EMC makes most of their money… selling enterprise class storage hardware. The true question is why not contribute to existing distributions versus adding fragmentation to an already complex project? The containerization announcements were also interesting, and caused many of us to think about the future of virtualization and the hypervisor as we are better able to abstract software and configuration into a more lightweight model.
As I attended sessions, spoke with attendees, and saw the vendors present in the exhibit halls, it struck me that VMware is lodged under the stuff which matters most to IT. Within my inquiry and coverage there is a major shift taking place where IT is trying to move up from the infrastructure components towards the end user experience and application level visibility. The growth and importance of APM and application level visibility is what’s driving interest. VMware seems challenged to address this interest. Lydia’s blog is spot on http://blogs.gartner.com/lydia_leong/2014/08/25/bimodal-it-vmworld-and-the-future-of-vmware/. The history of Zimbra, SpringSource (now Pivotal), and other software acquisitions directed at addressing developers and moving above the infrastructure has been one of failure and divestiture. VMware can still innovate, but unfortunately today it seems to be occurring below the level of what matters, which is the applications.
Who will adapt and move, and which IT organizations will become largely irrelevant as people work around them?
Category: IT Operations Monitoring Trade Show Tags:
by Jonah Kowall | September 2, 2014 | 3 Comments
HP has been making pretty significant efforts in building a SaaS delivery product. In fact the SaaS delivery was called out in the recent earnings call as a highlight for an otherwise struggling HP Software business. Within the Pronq (I know the name is a bit odd) brand they include several SaaS delivered products with a lightweight sales motion including try and buy with online credit card transactions. These offerings include Fortify on Demand (Security), Agile Manager (Dev), Vertica (Analytics), LoadRunner (Performance Testing), HP Anywhere (Mobile Dev/Distribution), StormRunner (Mobile performance testing), Virtualization Performance Viewer (for VMware vSphere and Microsoft Hyper-V), AppPulse (synthetic monitoring and full featured APM), and finally the focus of this posting, AppPulse Mobile (Beta). The product is very easy to sign up for and get a trial:
Within the Pronq products HP does share pricing until you get into larger deal sizes, in which case you do need to contact them. For AppPulse Mobile, the pricing will be released when the product is made generally available. The pricing, according to HP, will be competitive and based on the standard monthly active user (MAU) model we see most Mobile APM products using. The product works by downloading a wrapper which instruments an iOS or Android application, which can then be loaded onto the device directly or via an MDM app store. This is something other vendors also offer. In this screenshot you configure a new mobile app:
I took an apk file for Android and instrumented it with the wrapper below:
After installing the app on my device and using it a few times, data started flowing in after 15 minutes. Since I did my testing, HP informed me this was reduced to about 5 minutes, and will be 1 minute when the product is released and generally available.
Once you drill into the app here is the dashboard. This is after using the app for a while and getting some good data.
I had some poor experience, and this is how the dashboard looks:
This is the drill into why the “Fundex” was dropping (not a fan of the name, but the concept of a performance index makes sense). The idea of the index is user experience focused, similar to what Apdex was intended to do around user experience measurement on the web. HP has taken this concept further by looking at slow user actions, crashes, and device level resource consumption. By simplifying APM to these concepts and focusing on ease of use, HP will better appeal to less technical users.
In my case even though the app was crashing, it wasn’t detected. After speaking with HP, I learned this was a bug which was fixed in the latest push. HP is releasing code every two weeks to production. HP’s crash detection includes the list of steps the user took before the crash, helping developers understand the path of actions needed to replicate the crash.
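For readers unfamiliar with Apdex, here is a minimal sketch of how the standard Apdex score is calculated. It is not HP's Fundex formula, which is not described here and also weighs crashes and device resources. The sample response times and the 0.5 second threshold are hypothetical values chosen for illustration.

```python
def apdex(response_times, threshold):
    """Return the Apdex score for response times (seconds) and a target threshold T.

    Samples at or below T count as satisfied, samples above T and up to 4T
    count as tolerating (half weight), and anything slower is frustrated.
    """
    if not response_times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical user action timings, in seconds.
samples = [0.3, 0.4, 0.9, 1.6, 2.5, 0.2]
score = apdex(samples, threshold=0.5)
print(round(score, 2))
```

A score of 1.0 means every user action was satisfactory and 0.0 means every one was frustrating, which is what makes a single index like this easy for less technical users to read.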
More details from within the app. HP records and extracts user actions from within the app. This is much closer to what we expect to see with end user experience monitoring within web applications, but applied to native mobile apps.
Overall this is a good product offering for mobile APM, but it could go much deeper on the network analysis, device, carrier, and location based analysis. This is a beta, hence there will be more features added along with having a larger install base once this is released. HP is moving from a traditional software delivery cadence towards a SaaS first delivery method including weekly hotfix releases, monthly content updates, and quarterly major releases. Additional features seen in some other products include mobile app store integration and a deeper overall visibility from the mobile operations perspective. Once again the focus here seems more around MDM versus developer centric Mobile APM.
Category: APM Mobile SaaS Tags:
by Jonah Kowall | August 15, 2014 | 3 Comments
There are some changes in the upcoming Q1 2015 delivery of the NPMD Magic Quadrant. Vivek Bhalla will be taking over as the lead author of the research (@vbhalla1) and I’ll be co-authoring along with Colin Fletcher and Gary Spivak. We will be sending vendor surveys out on Monday. If you are a vendor who believes you qualify based on the criteria below and you did not receive a survey on Monday, please reach out via email or Twitter (firstname.lastname@example.org). We regularly include non-clients in research and this research is no different; we strive to build the most relevant research for our large end user client base.
Thank you, and we look forward to this research deliverable.
NPMD tools allow for network engineers to understand the performance of applications and infrastructure components via network instrumentation. Additionally, these tools provide insight into the quality of the end user’s experience. The goal of NPMD products is not only to monitor the network components to facilitate outage and degradation resolution, but also to identify performance optimization opportunities. This is conducted via diagnostics, analytics and debugging capabilities to complement additional monitoring of today’s complex IT environments.
This market is a fast-growing segment of the larger network management space ($1.9 billion in 2013) and overlaps slightly with aspects of the application performance monitoring space ($2.4 billion in 2013). Gartner estimates the size of the NPMD tools market at $1.1 billion.
Vendors will be required to meet the following criteria to be considered for the 2015 NPMD Magic Quadrant:
- The ability to monitor, diagnose and generate alerts for:
- Network endpoints — Servers, virtual machines, storage systems or anything with an IP address by measuring these components directly in combination with a network perspective.
- Network components — Such as routers, switches and other network devices. This includes SDN and NFV components.
- Network links — Connectivity between network-attached infrastructure.
- The ability to monitor, diagnose and generate alerts for dynamic end-to-end network service delivery as it relates to:
- End-user experience — The capture of data about how end-to-end application availability, latency and quality appear to the end user from a network perspective. This is limited to the network traffic visibility and not within components such as what application performance monitoring is able to accomplish.
- Business service delivery — The speed and overall quality of network service and/or application delivery to the user in support of key business activities, as defined by the operator of the NPMD product. These definitions may overlap as services and applications are recombined into new applications.
- Infrastructure component interactions — The focus on infrastructure components as they interact via the network, as well as the network delivery of services or applications.
- Support for analysis of:
- Real-time performance and behaviors — Essential for troubleshooting in the current state of the environment. Analysis of data must be done within three minutes under normal network loads and conditions.
- Historical performance and behaviors — To help understand what occurred or what is trending over time.
- Predictive behaviors by leveraging IT operations analytics technologies — The ability to distill and create actionable advice from the large dataset collected across the fourth requirement.
- Leverage the following data sources:
- Network-device-generated data, including flow-based data sources inclusive of NetFlow and IPFIX.
- Network device information collected via SNMP.
- Network packet analysis to identify application types and performance characteristics.
- The ability to support the following scalability and performance requirements:
- Real-time monitoring of 10 gigabit (10G) Ethernet networks at full line rate via a single instance of the product
- Ingest sampled flow records at a rate of 75,000 flows per second via a single instance of the product
Non-product Related Criteria
- A minimum of 10 NPMD customer references must be included at the time of survey submission.
- Customer references must exclude security-oriented use cases and scenarios.
- Customer references must be located in at least two of the following geographic locations: North America, South America, EMEA, and/or Asia/Pacific/Japan.
- Total NPMD product revenue (including new licenses, updates, maintenance, subscriptions, SaaS, hosting and technical support) must have exceeded $7.5 million in 2013, excluding revenue derived from security-related buying centers.
- The vendor should have at least 75 customers that use its NPMD product actively in a production environment.
- The product, and the specific version, submitted for evaluation must be shipping to end-user clients for production deployment and designated with general availability by October 31st 2014.
The 2015 NPMD Critical Capabilities will be published subsequent to the 2015 NPMD Magic Quadrant and will be a complementary piece of research.
Your survey submission and demo briefing will also be used for the purposes of writing this document in addition to the Magic Quadrant.
The 2015 NPMD Critical Capabilities will be assessed upon the following criteria:
- End-point, Component and Link Monitoring
- Service Delivery Monitoring
- IT Operations Analytics
In addition to the above criteria, we will be evaluating each vendor’s ability to cross multiple buying centers, as well as its ability to target specific verticals as validated by reference customers.
Category: IT Operations Monitoring NPM NPMD Tags:
by Jonah Kowall | August 14, 2014 | 2 Comments
A long running research project has been underway for many months collecting data from vendors and putting together this research note. Big kudos to my colleague Vivek Bhalla for doing most of the work on this note. It came out great, with lots of valuable insight and vendor analysis.
The network packet broker (NPB) space has been interesting for the last couple of years, with lots of shifts and market changes happening. We highlight many of these changes in the market guide and provide write-ups, including strengths and challenges, for each vendor profiled. We did this for solutions from Apcon, Arista Networks, cPacket, Cubro, Gigamon, Interface Masters Technologies, IXIA, JDSU (Network Instruments), Netscout, and VSS Monitoring. Gartner clients can access the research here:
07 August 2014 G00263407
Category: Monitoring NPB Tags:
by Jonah Kowall | July 29, 2014 | 5 Comments
I’ve been working on a pretty conceptual research project over the last few months. The research is finally out as of yesterday (7/28/14). The basic premise is that as the environments and technologies continually evolve and become more abstract and complex, the monitoring tools need to evolve in the same manner. The main issues are the use of traditional architectures versus big data and streaming architectures. Additionally, ease of deployment and use are the new normal, and SaaS is a critical deployment model to facilitate fast time to value.
Additionally we look at tool proliferation issues, and some data behind those problems. The other issue investigated is the general failure of ECA approaches in terms of fixing the complexity of the tools. By simplifying the tools with unified monitoring approaches, combined with ITOA, these issues can be more easily handled. On the ITOA front, we share data collected on Unstructured Text Search and Inference (UTSI), or what many call log analytics.
Monitoring tools are beginning to be used for multiple use cases outside of operational visibility, and more of this is investigated in this latest research note. Clients can read more here:
Category: Analytics APM Big Data ECA IT Operations ITOA Logfile Monitoring OLM SaaS Tags: ITOA
by Jonah Kowall | July 29, 2014 | 4 Comments
This was my first year attending the open source focused OSCON conference by O’Reilly. I’m a huge Velocity fan, and get a lot out of that conference, hence I figured I would try another conference. Overall I found this conference far less valuable to me for several reasons. While there was a bit of interesting content, the show lacked focus in general. The show floor exhibits ranged from do it yourself electronics and non-profits to commercial vendors. Much of the conference was for recruiting in Portland, where lots of startups trying to keep pace with growth pull talent. Here are some of the better sessions I attended and a little bit about them:
Tutorial Node.js 3 Ways
C. Aaron Cois (Carnegie Mellon University, Software Engineering Institute), Tim Palko (Carnegie Mellon University, Software Engineering Institute)
Since I’ve already done some programming in Node for my Google Glass App, I attended this one more to get a better tutorial than my hack/trial by fire. My programming skills are hackery at best.
You can find the content here : http://cacois.github.io/nodejs-three-ways/#/
This was a good primer, but there were a couple of people in the room who needed a lot of tech support. Additionally, some pre-prep by attendees would have made this much smoother. We did some good Wi-Fi load testing, which showed the network couldn’t handle peak loads.
An ElasticSearch Crash Course – http://www.oscon.com/oscon2014/public/schedule/detail/33571
Andrew Cholakian (Found)
I’ve written a lot about ElasticSearch, and the internals of the engine were probably the most useful content I got from OSCON this year. I have a much better understanding of the technology with these fundamentals. Here are some notes:
- Wikipedia is moving to it
- Github code search based on it
- Netflix using it for log data
Indexes can live anywhere, and are split across
Each Index has documents
Every field has an index
Docs are routed via hashing and sharded
Shards are lucene indexes – they are replicated
Deleting and updating indexes are expensive.
Writes are slow
Cannot do transactional operations
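The note above about documents being routed via hashing can be sketched in a few lines. This is an illustration of the general idea only: the routing key is hashed and taken modulo the number of primary shards. The `pick_shard` helper, the document IDs, and the shard count are hypothetical, and `zlib.crc32` is a stand-in for whatever hash function the engine actually uses.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a primary shard number."""
    # crc32 is only a stand-in hash for this sketch (assumption).
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs spread over a hypothetical 5-shard index.
doc_ids = ["user-1", "user-2", "user-3", "user-4"]
placement = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}
print(placement)
```

Because the shard number depends on the shard count, the same key can land on a different shard if the count changes, which is one reason the number of primary shards is normally fixed when an index is created.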
Docker – Is it Safe to Run Applications in Linux Containers?
Jerome Petazzoni (Docker Inc.)
This is one thing which is a major issue with putting Docker into production. Control and general security are lacking, and this presentation was interesting and much needed. Jerome was an excellent presenter and made some very good points. I especially liked the idea of running app instances read only, which avoids most of the security issues.
Tracing and Profiling Java (and Native) Applications in Production – Twitter
Kaushik Srenevasan (Twitter)
Interesting discussion of how Twitter, which is a heavy Java and Scala shop, handles instrumentation. They run their own OpenJDK JVM distribution with customizations running on CentOS. In summary this is a somewhat dated view of instrumentation, since commercial BCI and instrumentation on Java has come so far. If you don’t want to pay for something and want to build your own, this is somewhat interesting, but it has very limited capabilities compared to what modern APM can do today. Here are the other notes:
- Java, Scala most popular
- Some C++
- Some Ruby (Kiji), Python
They bundle their own JVMTI agents in the code.
- Low latency garbage collection on dedicated hardware and mesos
- Services are getting larger
- Scala optimizations – functional programming language
- Tools : Contrail, Twitter diagnostics runtime
- Wanted something like dtrace, but they don’t have it on Linux
- Using perf for the linux profiling
Please leave comments here or on twitter @jkowall thanks!
Category: APM DevOps IT Operations Monitoring Trade Show Tags:
by Jonah Kowall | July 22, 2014 | 3 Comments
Splunk has been rising quickly in the ranks of buyers looking to solve complex problems, or those looking to build interesting and new analysis of their data. As they have grown in popularity and gone public, we’ve been given a wealth of new data about the company, operations, and execution beyond hearing from their large customer base. In this research note, we’ve combined analysis of the technology, company, and financials. My colleague Gary Spivak, whose background is on the financial analysis side, led the research, and I contributed some additional analysis of the technology, company, and other elements. The note has only been out for a few days now, but it’s gotten a lot of response from our client base. For those of you who are clients, you can find the document here:
Vendor Insight: Splunk, Separating Hype From Reality – http://www.gartner.com/document/2802724
Category: Analytics APM Big Data IT Operations Logfile Mobile Monitoring Tags:
by Jonah Kowall | July 22, 2014 | Submit a Comment
Just wanted to provide a heads up that we’ve published updated Hype Cycles, and there are more publishing now as well. Here are the two I worked on which just hit the wire:
Hype Cycle for Networking and Communications, 2014 – http://www.gartner.com/document/2804820
- I worked with Will Cappelli on the Application Performance Monitoring profile in this Hype Cycle, which has some updates around APM.
- Along with Vivek Bhalla and Colin Fletcher, I wrote up a profile for Network Performance Monitoring and Diagnostics Tools, which is new for 2014.
- Vivek led our efforts around Network Configuration and Change Management (NCCM) tools. This profile was updated, and expect some great new research from Vivek on this topic.
- Vivek also led our efforts, along with support from Colin and myself, on Network Fault Monitoring Tools. This profile has minor updates for 2014.
Hype Cycle for IT Operations Management, 2014 – http://www.gartner.com/document/2804821
- Will Cappelli led efforts on IT Operations Analytics (ITOA) this year, along with support from Colin and myself. There were minor updates in this profile for 2014.
- We also included the same profiles from the hype cycle above!
Category: Analytics APM Hype Cycle IT Operations Logfile Monitoring NPM NPMD SaaS Tags:
by Jonah Kowall | July 10, 2014 | 2 Comments
This week we are highlighting a new offering from Aternity which began shipping recently. Aternity is headquartered outside of Boston, MA, but as with many other APM companies, most of the R&D takes place in Israel. Aternity has been an innovator in desktop end user experience monitoring. The solution, while technically differentiated, caters towards large enterprise implementations, which has prevented them from moving away from these enterprise installs. While most of our applications today have been moving to being purely web based, increasing the importance of modern end to end APM and RUM solutions, there are still, and will remain, many critical applications on the desktop. Today’s APM tools either do a poor job of handling these non-web applications or provide only a high level perspective (leveraging the network).
Aternity’s Workforce APM product is based on innovative and unique technology which allows for detailed user and workflow capture of any application running on a Windows desktop endpoint. This is not a solution which requires professional services or specialized programming as some of the other entrants in the market do. I have used the tool, and it’s pretty easy to learn, but the programming is done with the studio product, which needs work, including a modern user interface. They have recently launched an improved studio (see video here: https://www.youtube.com/watch?v=hLphXVMCMGo&feature=youtu.be) which helps with some of these issues, but it’s still not as clean as alternate solutions when doing custom collection. The desktop capture agent is a small program running on the end point (Windows only, but it can be running on physical or virtualized hardware such as HVD/VDI implementations). The data is fed into a relational database and Tableau is used on top of this data to provide reporting, dashboarding, and most of the user interface.
Moving on to the new offering. Aternity mAPM is a mobile APM product which allows for native application monitoring on Android and iOS. Implementation is done either by post compile wrapping of native mobile applications or by compiling the instrumentation into the native mobile application. Unlike today’s Workforce APM implementations, which are mostly deployed as traditional on premises software (although Aternity is seeing more customers opt for SaaS delivery of the enterprise solution), the Mobile APM offering can be deployed using Aternity’s SaaS services or via the traditional on premises deployment.
Here are the high level screens of the free Mobile APM offering, this is targeted at developers.
The product can be fed with simulated data or with actual data, in this case here is simulated data in my portal. The GUI is very usable, there is no scrolling and everything is drillable and filterable:
Here is the crash data where you can download the crash file for debugging.
Some interesting data usage reporting:
I’ve also used the built in support features, and can report Aternity is responsive and helpful even with the free accounts. As you can see, this is a pretty comprehensive offering on the mobile side. The question remains whether Aternity will be able to penetrate mobile development organizations or will continue to sell strictly to the IT Operations buyers. The combined mobile and desktop end user experience monitoring is an interesting concept, but few organizations have the maturity to take advantage of both of these offerings due to fragmentation in most organizations.
I’m pretty tied up reading the thousands of pages and analyzing data for the upcoming APM Magic Quadrant, but I’ll find time next week to write up SOASTA’s new mobile offering. On deck after that post will probably be SpeedCurve in early August. Thanks for reading, please leave feedback here or on Twitter @jkowall.
Category: APM Mobile Pick of The Week SaaS Tags:
by Jonah Kowall | June 30, 2014 | 3 Comments
This is always one of the more enjoyable conferences for me to attend. I don’t get worked as hard as at Gartner conferences, which are also really enjoyable, but where I spend my time doing the educating versus listening to other smart people. Velocity is a practitioner focused conference and is very geeky (in a good way for those of us who are pretty deep technologists). I’ll highlight some of the great sessions I attended and other technologies I discovered.
The conference is put on by a competitor of course, since we do our own events, but they had over 2,400 registered attendees and over 100 sponsors. There seems to be growth here, and the conference keeps getting larger. Here are some session bullets I found interesting. You’ll notice a pretty wide spread, from performance of the front-end to application middleware and backends.
Webpagetest deep dive – http://twitter.com/PatMeenan – http://cdn.oreillystatic.com/en/assets/1/event/113/WebPagetest%20Power%20Users%20Presentation.pdf
This is a great open source tool for measuring and diagnosing front-end performance. I’ve used the tool, but had been mostly ignoring it since it wasn’t evolving too much. That was quite a mistake since it’s evolved considerably since I’d last really used it.
- Good to dig into the new features in the advanced settings tab
- Run more than one test when measuring, always
- Very cool advanced visual comparison
- Filmstrip view has been improved
- Can do mobile runs, which show it in a mobile browser (very cool)
- Browser CPU usage stats can be overlaid on waterfall
- Can export tcpdump (use in wireshark or cloudshark)
Docker – https://twitter.com/kartar
Content was good for those who hadn’t used Docker. I’ve done some basic work on it, and find it interesting, but also quite basic in nature. Some of the discussion hit on issues around security, support for other containers, and overall limitations in this immature, but evolving technology.
- The room was packed.
- Dockerfile instructions (kind of like an init.d script). I hadn’t used these before, but they are critical when using Docker at scale.
RUM Comparison and Use Cases – https://twitter.com/bbrewer https://twitter.com/bluesmoon
The team at SOASTA presented a non-vendor biased view of RUM. I found the landscape they laid out basic and partially incomplete, but it was still a valiant effort by the team there. The key takeaway is that more users are trying to tie business metrics to RUM data, for example e-commerce companies analyzing revenue against users and performance.
Google – Jeffrey Dean (http://research.google.com/pubs/jeff.html)
Interesting discussion by Google’s Jeffrey Dean. The most interesting part I found was his analysis of data replication to extra nodes to reduce latency, and of course the multiple-write technologies many use to deal with that replication closer to the source of the data.
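One way to see why replicating data to extra nodes can reduce latency is a small simulation: if a request can be answered by whichever of two replicas responds first, the slow tail shrinks. This is a sketch under invented assumptions, not a reconstruction of anything shown in the talk. The latency distribution (95% fast, 5% slow), the request count, and the helper names are all hypothetical.

```python
import random


def percentile(values, pct):
    """Return an approximate percentile using a simple sorted-index lookup."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(pct / 100 * len(ordered)))
    return ordered[index]


def simulate(num_requests=10_000, seed=42):
    """Compare p99 latency for one replica versus the faster of two replicas."""
    rng = random.Random(seed)

    def latency_ms():
        # Hypothetical distribution: mostly fast, occasionally very slow.
        if rng.random() < 0.95:
            return rng.uniform(5, 15)
        return rng.uniform(100, 500)

    single, replicated = [], []
    for _ in range(num_requests):
        first, second = latency_ms(), latency_ms()
        single.append(first)                   # only one replica is asked
        replicated.append(min(first, second))  # the first answer wins
    return percentile(single, 99), percentile(replicated, 99)


p99_single, p99_replicated = simulate()
print(p99_single, p99_replicated)
```

The sketch ignores the cost side of the trade: asking two replicas doubles the read load here, and keeping the extra copies consistent is where the multiple-write techniques come in.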
Keynote systems – https://twitter.com/keynotesystems
Ben investigated what page load times look like. Some of the interesting data he presented showed that what counts as fast varies by country and other demographic data. He also used the video capture features of webpagetest.
Speedcurve – https://twitter.com/MarkZeman – Blog and Video of the Keynote – http://speedcurve.com/blog/velocity-responsive-in-the-wild/
This was one company I hadn’t heard of (well, more like a one man show). It is an interesting company which does a nice frontend and comparative analysis using a webpagetest backend. Some notes:
- Sits on top of webpagetest
- Competitive benchmarking, runs once a day, multiple runs
- Complements RUM
- Shows filmstrips
- Formats the data much better
- Helps find savings, etc
- Can get to webpagetest views as well
- Showed some interesting research on visualizing data
Understanding Slowness – http://www.twitter.com/postwait : https://speakerdeck.com/postwait/understanding-slowness
Always a highlight of Velocity for me, Theo is a unique and extremely bright individual. He always brings good analysis and practical content, he’s an ops guy through and through. There is no marketing or other fluff you often see with content at conferences. Some high level notes:
- Document your architectures
- Have a plan
- Use redundant vendors, don’t put your eggs in one basket (easier said than done, but for some things a good idea)
- Measure latency (performance)
- Quantiles over histograms
- Observation – takes state, watches
- DTrace, truss, tcpdump, snoop, sar, iostat, etc
- Synthesis – Run a test to enable diagnostics (replicate an issue)
- Manipulation – test hypothesis
Some Simple Math to get Some Signal out of Your Ops Data Noise – https://twitter.com/tboubez – http://www.slideshare.net/tboubez/simple-math-for-anomaly-detection-toufic-boubez-metafor-software-velocity-santa-clara-20140625
Not sure I’d call this simple math at all, but here is a very new company we awarded a Cool Vendor this year for APM and ITOA, which focuses on ITOA use cases with its solution. They have a lot of growing up to do as a company, but they have some compelling analytics technologies. Mr. Boubez takes the audience through a journey of math, covering what we’ve tried (which doesn’t work too well) and some techniques which work much better. Clearly worth a look.
- Gaussians don’t work with data center data
- Use histograms (even though Theo says they may not be the best visual analysis tool)
- Kolmogorov-Smirnov test allows for better data
- Handles periodicity in the data
- Box Plots / Tukey
- Doesn’t rely on mean and stddev
- IQR moving windows
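To make the last few notes concrete, here is a minimal sketch of the IQR moving window idea using Tukey’s fences. It is my own illustration rather than code from the talk. The window size, the conventional 1.5 multiplier, and the synthetic series with one injected spike are all assumptions.

```python
from statistics import quantiles


def iqr_outliers(series, window=20, k=1.5):
    """Return indexes of points outside Tukey fences built from the preceding window."""
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        q1, _, q3 = quantiles(recent, n=4)  # quartiles of the trailing window
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        if series[i] < low or series[i] > high:
            flagged.append(i)
    return flagged


# Synthetic metric cycling between 10 and 14, with one injected spike.
data = [10 + (i % 5) for i in range(40)]
data[30] = 60
print(iqr_outliers(data))  # → [30]
```

Because the fences come from quartiles rather than the mean and standard deviation, the spike itself barely moves them once it enters the window, so the points that follow it are not flagged.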
Sitespeed.io – https://twitter.com/soulislove
Early phase tool for running rules against frontend optimization, which is a cool idea. I’m going to wait for lab time until version 3, written in node.js, comes out in 3 weeks.
Category: APM Monitoring Trade Show Tags: