January 8th, 2006 neteng
I originally started this article as a focused write-up on network redundancy. As I began to flesh things out, I realized that while redundancy is an important factor in many networks, that is all it is. Redundancy should not be seen in and of itself as a final goal, but likened more to a step in the path towards the greater aim of network reliability.
As any good systems engineer knows, servers are not the only devices worthy of a 99.99% uptime solution. They may be the treasure chest of valuable data, but that box will do you absolutely no good when you’ve lost all ability to extract information from it in a reasonable manner. Sure, SneakerNet might work for that text file you have sitting on a network share. But even that assumes you can get physical access to the server (and if you’re Joe Schmoe from Accounts Payable, that’s doubtful). The network is a key agent in supporting a worthwhile technological infrastructure. While the human body can adapt and survive without a limb here or there, it’s in major trouble without a fully functional nervous system to carry important messages.
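To make a figure like 99.99% concrete, here is a minimal sketch that turns an availability target into a yearly downtime budget. The function name and the 365-day year are my own assumptions for illustration, not part of any standard.

```python
# Convert an availability target into an allowed-downtime budget.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year that a given uptime target permits."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under those assumptions, 99.99% leaves a little under an hour of downtime per year, which is not much room for a network outage.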
Before I dive too far into the details, I’d like to share the following definition for “Network Reliability” that I came up with:
Network reliability ensures a highly-available path that maintains data and policy integrity between communicating endpoints.
With that in mind, I’ve broken the key aspects of network reliability down into two parent categories, redundancy and integrity, each with child categories of its own.
Redundancy
I’d like to start with network redundancy, and specifically with what network redundancy isn’t. Network redundancy is not the backing up of end-station data to tape, the striping of an array of disks, or even a cold standby server loaded with cloned configuration and services. While these are all significant and important in their own right, they do not provide a stable network infrastructure, only stable and trustworthy network services.
One can restrict redundancy of a device to specific parts or to the unit as a whole.
Redundant Parts
- Power - A usual no-brainer is to make sure that there are at least two redundant power supplies for a given unit. What most people seem to forget is the extra step of making sure these power supplies are fed from separate power sources. The advantage of having an extra power supply is completely negated if both sit on a single circuit that fails in the middle of the business day.
- Interfaces - Whether it be dual NICs on a server or multiple uplinks from access switches to distribution switches, redundant interfaces can be a lifesaver in the event of equipment or link failure. Most of the time, failover for this type of redundancy can be automated. And in a lot of cases, the additional links have the added benefit of being usable for load-balancing.
- Configuration - Let’s not forget that these pieces of network equipment, which we’re so concerned about keeping functional, would just be fancy paperweights without their software configuration. It’s a very good idea to make sure backups are made of your configs. Whether you choose to do this monthly, weekly or daily, do what works to meet your business needs. You don’t have to get fancy and set up a CVS site containing text differentials (though I’ve done this at work and will talk more about it in a future article). A simple copy and paste or whatever configuration export your devices support should be sufficient. Just make sure that you keep them organized with some sort of time/date stamping.
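For the configuration backups above, here is a minimal sketch of the “keep them organized with time/date stamping” idea. It assumes you have already exported the config as text by whatever means your device supports. The function names, the `.cfg` extension, and the directory layout are hypothetical choices of mine.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def backup_config(device_name: str, config_text: str, backup_root: Path,
                  now: Optional[datetime] = None) -> Path:
    """Write config_text to backup_root/<device>/<device>_<UTC timestamp>.cfg."""
    # Assumption: if a time is passed in, it is already expressed in UTC.
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    device_dir = backup_root / device_name
    device_dir.mkdir(parents=True, exist_ok=True)
    target = device_dir / f"{device_name}_{stamp}.cfg"
    target.write_text(config_text, encoding="utf-8")
    return target


def latest_backup(device_name: str, backup_root: Path) -> Optional[Path]:
    """Return the newest backup for a device, or None if there is none."""
    device_dir = backup_root / device_name
    if not device_dir.is_dir():
        return None
    # The timestamp format sorts chronologically when sorted as plain text.
    backups = sorted(device_dir.glob(f"{device_name}_*.cfg"))
    return backups[-1] if backups else None
```

Putting the timestamp in the filename in year-month-day order means a plain alphabetical directory listing is also a chronological one, which is handy when you are restoring in a hurry.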
Redundant Wholes
- Cold Standby - A cold standby is in most cases an unconfigured (or configured, but rarely updated) piece of hardware. It usually works best for non-critical network devices, especially “dumb” units such as hubs and small workgroup switches.
- Warm Standby - A warm standby will most often have a mirror-image configuration of the primary device and require a manual switchover in case of failure. Good candidates for warm standby devices are access layer switches and anything else that is urgent to get in place, but not business-critical.
- Hot Standby - Plain and simple, this is a necessary configuration for those devices which are at the heart of your network infrastructure. Hot standbys are automated to take over network duties as quickly as possible, should the primary unit fail. Oftentimes, a “heartbeat” connection is set up between the primary and secondary devices so that a failure is detected and handled almost immediately.
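The heartbeat idea behind a hot standby can be sketched in a few lines. This is only an illustration of the logic, not how any particular vendor’s failover protocol works. The class name, the timeout value, and the choice to never fail back automatically are all my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Runs on the standby unit and tracks heartbeats from the primary."""
    timeout_seconds: float         # hypothetical silence threshold
    last_heartbeat: float = 0.0    # timestamp of the last heartbeat seen
    standby_active: bool = False   # True once the standby has taken over

    def record_heartbeat(self, now: float) -> None:
        """Note that the primary was heard from at time `now`."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Promote the standby if the primary has been silent too long.

        Once promoted, the standby stays active; failing back is left
        as a manual decision in this sketch.
        """
        silent_for = now - self.last_heartbeat
        if not self.standby_active and silent_for > self.timeout_seconds:
            self.standby_active = True
        return self.standby_active


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=3.0)
    monitor.record_heartbeat(10.0)
    print(monitor.check(12.0))  # primary heard from 2 seconds ago
    print(monitor.check(13.5))  # primary silent for 3.5 seconds
```

Timestamps are passed in rather than read from the clock so the logic is easy to test. A real deployment would also have to deal with the case where only the heartbeat link fails and both units believe they are primary.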
The standby setups vary in scale from one organization to another. Many small companies have a very modest standby model while some of the larger businesses will actually have a complete replica of their network on warm or hot standby. I’m sure it comes as no surprise that you find a lot of the latter in the financial industry.
Misc
- Multiple ISPs - Folks, listen up. If the Internet is an integral part of your business model in any way, then a backup ISP is a must-have. My company uses VPN connectivity over the Internet for the majority of our customers. A major outage at our ISP would be devastating. As in, Oh-Crap-We-Just-Lost-Our-Five-Largest-Clients devastating. Make sure this potentially customer-killing hole is plugged.
- Human Beings - We technical folk sometimes lose sight of the fact that “wet systems” (i.e. human beings) are still a necessary component of operational success. Not all companies can afford to keep your clone on staff, so it is very important that you keep good documentation and maintain a system of knowledge transfer for emergencies where you might not be available. Try to make sure your most mission-critical documentation can be understood by the receptionist who might happen to be the only staff member on location during a meltdown.
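To put a rough number on the multiple-ISP point above, here is a small sketch of how redundant paths combine. It leans on one big assumption, that the paths fail independently. The availability figures in the example are made up for illustration.

```python
def combined_availability(*availabilities: float) -> float:
    """Availability of N redundant paths, assuming independent failures.

    Each argument is a fraction between 0 and 1. The combined service is
    down only when every path is down at the same time.
    """
    all_down = 1.0
    for availability in availabilities:
        all_down *= (1.0 - availability)
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical figures: two ISPs, each available 99% of the time.
    print(f"{combined_availability(0.99, 0.99):.4%}")
```

The independence assumption is the catch, and it is the same trap as two power supplies on one circuit. Two ISPs that share a conduit into your building or the same upstream carrier will not get you anywhere near the figure this formula gives.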
Integrity
- Policies and Rules - If you’ve ever had to deal with any sort of government or financial audit, you know the importance of having proper network policies and controls in place. Companies have built complete business models out of penetration testing and ethical hacking. You want to make sure that you’re running as ‘clean’ a network as possible. This is almost always done by applying the “deny-all-but-necessary” approach with your network rules. If you’re lucky enough to get management to buy into the importance of security, you can even toss some extra goodies in there (though these are slowly becoming necessary goodies) such as IPS devices and higher-level, stateful inspection firewalls.
- Encryption & Digital Signatures - This isn’t always necessary, but many organizations have a real need to encrypt data and ensure it maintains its integrity across the wire. From traditional IPSec-based VPNs to the nifty little product I came across the other day, this is an area of networking where options abound.
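The “deny-all-but-necessary” approach from the Policies and Rules item can be sketched as a rule check where nothing gets through unless a rule explicitly allows it. The rule fields and addresses below are hypothetical. A real firewall matches on far more than this.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable


@dataclass(frozen=True)
class AllowRule:
    source: str      # CIDR block the traffic may come from
    dest_port: int   # destination port being opened
    protocol: str    # "tcp" or "udp"


def is_permitted(rules: Iterable[AllowRule], src_ip: str,
                 dest_port: int, protocol: str) -> bool:
    """Return True only if some rule explicitly allows the traffic."""
    addr = ip_address(src_ip)
    for rule in rules:
        if (addr in ip_network(rule.source)
                and dest_port == rule.dest_port
                and protocol == rule.protocol):
            return True
    return False  # the implicit deny: no matching rule means no access


# Hypothetical policy: HTTPS from one internal range, SSH from one admin host.
RULES = [
    AllowRule("10.1.0.0/16", 443, "tcp"),
    AllowRule("192.0.2.10/32", 22, "tcp"),
]
```

The important line is the last `return False`. The rule list only ever grants access, so forgetting a rule fails closed rather than open.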
Taken together, ensuring redundant connectivity and data integrity throughout the wire will provide the necessary infrastructure for what I’d call a reliable network.
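The data-integrity half of that can be illustrated with a small sketch. Note that this uses an HMAC, a shared-key integrity check, rather than a true public-key digital signature, because Python’s standard library does not include public-key signing. The function names and the example key are my own, and a real deployment would rely on IPSec, TLS, or similar rather than hand-rolled code.

```python
import hashlib
import hmac


def make_tag(message: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that travels alongside the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(message: bytes, tag: bytes, key: bytes) -> bool:
    """Check that the message was not altered in transit."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(make_tag(message, key), tag)


if __name__ == "__main__":
    key = b"example-shared-secret"  # hypothetical key, for illustration only
    message = b"transfer 100 to account 42"
    tag = make_tag(message, key)
    print(verify_tag(message, tag, key))
    print(verify_tag(b"transfer 900 to account 42", tag, key))
```

Both ends need the same secret key, which is the main practical difference from a digital signature, where only the sender holds the signing key.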
I really hope that this little article has been of assistance to all of you on the other side of the screen. I’m always open to improvement, so if you disagree with anything written here or have suggestions for things to add, please let me know. You can leave a note in the comments section or email me at neteng@humanmodem.com.
Thank you,
neteng