There is a ton of interest in Big Data, and it is currently the #2 most searched keyword on gartner.com. We are starting to get a trickle of Big Data-related questions within the networking team, and there are two high-order questions you must answer regarding Big Data and the network:
#1 – Where is your Big Data?
#2 – Have you built an Ethernet Fabric?
Where is your Big Data? If you’re doing Big Data on-prem, you can basically move to question #2. However, if you’re doing it off-prem and/or in the public cloud, keep reading. Perhaps your existing connectivity to the Internet is “good enough”, but it may not be. The key activity here is to optimize WAN connectivity to the Big Data location, and we have research that identifies best practices for optimizing WANs for both SaaS and IaaS.
Have you built a Fabric? This question assumes you’re doing Big Data on-prem in your corporate data center. If so, it is important to note that Big Data behaves unlike many current data center workloads. In highly virtualized data centers, you have multiple apps sitting atop one physical server that is typically SAN-connected and runs a hypervisor. Conversely, a single Hadoop workload/instance is often spread across multiple physical servers that are not SAN-connected and usually don’t have a hypervisor. In addition, Big Data workloads typically perform best on high-bandwidth, relatively low-latency networks that support a high percentage of server-to-server flows. So this is a perfect fit for an Ethernet Fabric, which typically provides a one- or two-tier mesh or partial-mesh network topology optimized for deterministic, lower latency. However, we’ve seen limited adoption of Ethernet fabrics in mainstream enterprises to date, so if you’re sitting on a traditional 3-tier network, there is likely work to be done.
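To make the topology point a bit more concrete, here is a minimal Python sketch that counts how many switches a server-to-server flow crosses in a two-tier fabric versus a traditional 3-tier network. Everything in it is an illustrative assumption rather than a reference design: the function names, the node naming, the single core switch, and the one-aggregation-switch-per-pod layout are all hypothetical, and switch count is only a rough proxy for latency.

```python
from collections import deque
from itertools import combinations


def _link(graph, a, b):
    """Add an undirected link between two nodes."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def build_leaf_spine(leaves, spines, servers_per_leaf):
    """Hypothetical two-tier fabric: every leaf connects to every spine."""
    graph = {}
    for l in range(leaves):
        for s in range(spines):
            _link(graph, f"leaf{l}", f"spine{s}")
        for h in range(servers_per_leaf):
            _link(graph, f"leaf{l}", f"srv{l}-{h}")
    return graph


def build_three_tier(pods, access_per_pod, servers_per_access):
    """Hypothetical 3-tier tree: access -> one aggregation per pod -> one core.

    Real designs use redundant aggregation and core switches; this is
    simplified so the path length is easy to see.
    """
    graph = {}
    for p in range(pods):
        _link(graph, f"agg{p}", "core")
        for a in range(access_per_pod):
            _link(graph, f"acc{p}-{a}", f"agg{p}")
            for h in range(servers_per_access):
                _link(graph, f"acc{p}-{a}", f"srv{p}-{a}-{h}")
    return graph


def switch_hops(graph, src, dst):
    """Switches crossed on the shortest path between two servers (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Links on the path minus one equals switches traversed.
            return dist[node] - 1
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError(f"no path from {src} to {dst}")


def worst_case_hops(graph):
    """Largest switch count over all server-to-server pairs."""
    servers = [n for n in graph if n.startswith("srv")]
    return max(switch_hops(graph, a, b) for a, b in combinations(servers, 2))


if __name__ == "__main__":
    fabric = build_leaf_spine(leaves=4, spines=2, servers_per_leaf=2)
    legacy = build_three_tier(pods=2, access_per_pod=2, servers_per_access=2)
    print("two-tier fabric, worst case:", worst_case_hops(fabric))
    print("3-tier network, worst case:", worst_case_hops(legacy))
```

With these toy sizes, the worst-case flow crosses three switches in the two-tier fabric and five in the 3-tier tree. Just as relevant for Hadoop-style traffic, any two servers on different leaves see the same switch count in the fabric, which is one way to read "deterministic" here.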
While this is a quick, high-level summary of Big Data networking issues, we do have published research that dives much deeper into this topic (co-authored by @nheudecker): Is Your Network Ready for Big Data?