Big Data is changing the way organizations gather and use business intelligence. The ability to gain insight from large data sets can improve productivity, innovation and competitiveness. But deploying a Big Data implementation is not without challenges. Big Data initiatives result in big, flat, horizontally scalable databases deployed on clusters that can scale to thousands of servers. The network connects these servers, and how that network is designed can help or hinder the launch and growth of Big Data initiatives. Here are four questions to ask yourself and your vendor to make sure your Big Data projects stay on track.
1. How will I ensure data reliability?
There are two things to consider. First, there’s the Hadoop Distributed File System (HDFS), which by default places one copy of each data block on a data node in one rack and two more copies on different data nodes in a second rack. But re-replicating that data after a failure can take hours on a slow network. And if the network doesn’t have enough bandwidth, there’s always the risk that a data node will lose connectivity with the name node and HDFS will become unreliable. Second, consider a fabric-based system. A 10Gbps connection can replicate a 3TB disk in roughly 40 minutes at line rate, while supplying plenty of bandwidth along with a consistent connection between the name and data nodes.
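To put rough numbers on replication time, here is a minimal back-of-the-envelope sketch. It assumes decimal units (1TB = 10^12 bytes), a link running at full line rate, and no protocol overhead or disk bottleneck, so real-world times will be longer. The function name and the 1Gbps comparison point are illustrative assumptions, not figures from any vendor.

```python
def transfer_minutes(data_terabytes: float, link_gbps: float) -> float:
    """Idealized minutes needed to copy `data_terabytes` over a `link_gbps` link.

    Assumes decimal units, full line rate and zero protocol overhead.
    """
    bits_to_move = data_terabytes * 1e12 * 8      # terabytes -> bits
    seconds = bits_to_move / (link_gbps * 1e9)    # bits / (bits per second)
    return seconds / 60


# 3TB disk from the article; the 1Gbps row is an assumed comparison point.
for gbps in (1, 10):
    print(f"{gbps:>2} Gbps: about {transfer_minutes(3, gbps):.0f} minutes")
```

Under these assumptions the 10Gbps link needs about 40 minutes, while a 1Gbps link needs close to seven hours, which is the kind of gap that separates "minutes" from "hours" when a failed disk has to be re-replicated.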
2. How can I be sure performance will be good enough?
Traditional multi-tier networks are inherently inefficient when it comes to supporting the performance of multi-rack Big Data applications. The latency of a typical top-of-rack (TOR) switch may be only one microsecond, but the latency of distribution and core switches is significantly higher, and traffic between racks has to cross all of them. In a true fabric system there are no intermediate switches, so every server is just one hop away, which decreases latency significantly. Eliminating intermediate switches also reduces oversubscription substantially. This means less traffic needs to be buffered, which increases data throughput.
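The two effects described here, added latency per switch tier and oversubscription on the uplinks, can be estimated with simple arithmetic. The sketch below is illustrative only: the one-microsecond TOR figure comes from the text above, while the distribution and core latencies, the port counts and the link speeds are hypothetical values chosen for the example.

```python
def path_latency_us(switch_latencies_us):
    """Total switching latency along a path: the sum of each switch's latency."""
    return sum(switch_latencies_us)


def oversubscription_ratio(server_ports, server_port_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth on a switch.

    1.0 means the uplinks can carry everything the servers can send;
    higher values mean traffic may have to be buffered or dropped.
    """
    return (server_ports * server_port_gbps) / (uplinks * uplink_gbps)


TOR_US = 1.0    # from the article: typical top-of-rack latency
DIST_US = 5.0   # hypothetical distribution switch latency
CORE_US = 10.0  # hypothetical core switch latency

# Rack-to-rack traffic in a three-tier design crosses five switches.
multi_tier = path_latency_us([TOR_US, DIST_US, CORE_US, DIST_US, TOR_US])
print(f"Multi-tier path: {multi_tier} microseconds")

# Hypothetical TOR switch: 48 x 10Gbps server ports, 4 x 40Gbps uplinks.
print(f"Oversubscription: {oversubscription_ratio(48, 10, 4, 40)}:1")
```

With these assumed numbers the rack-to-rack path costs 22 microseconds of switching latency, and the TOR switch is oversubscribed 3:1. Plugging in your own vendor's figures is a quick way to compare designs before committing to one.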
3. How can I make sure my Big Data project will scale?
When it comes to Big Data initiatives, scalability is key. However, the larger the clusters get, the more the restrictions of traditional networks introduce latency and oversubscription. There is also another consideration: the collection of data. Big Data initiatives have to rapidly collect structured and unstructured data in real time. As a result, the network must support direct, optimal access to storage networks, which may not be possible with some network designs. Choosing a fabric system that can scale and can be optimized will ensure that as requirements grow, so does your peace of mind.
4. How can I manage Big Data more easily?
Traditional multi-tier networks are inherently complex. As the number of servers and racks increases, so does the number of intermediate switches. Consequently, both network management and troubleshooting become increasingly complex. A true fabric, however, acts as a single, converged Ethernet switch and can be managed like one. It is also based on a single, sophisticated operating system. As a result, provisioning, management and troubleshooting are greatly simplified.
Ask the right questions
As Big Data initiatives come to fruition, there is no doubt they will deliver the insights needed to drive new businesses and new ways of doing business. To ensure networks don’t stand in the way, organizations need to make sure they have asked the right questions about their network design.