Are you saying that it will only fail once in 1000 years, or that the risk of failure is very low even over the course of 1000 years?
RussellHltn wrote:Another question - are you keeping stats of the equipment that fails? I assume that you're expecting that failed components are changed out within a certain time frame, and that drives will fail at a certain rate.
I've seen how often failed components need to be replaced in a big SAN. Emergencies seem to happen often. But Bit Mountain isn't like that; it doesn't care if you leave failed components in the system. What's important is always having spare space. As long as Bit Mountain has enough space to recover, you can postpone drive replacement as long as you want. Even lack of recovery space is not really an emergency, though it's an integrity risk.
RussellHltn wrote:However, drive failure isn't random. As the collection of drives approaches end of life, the number of failures in a unit of time will go up. Possibly quite dramatically. (I'm not sure if anyone gives the standard deviation of the MTBF.) How will you monitor the situation to warn that the statistical probability of failures is reaching an unacceptable risk and that either parts must be changed more quickly or that aging components need to be proactively changed?
As far as I'm concerned, the MTBF is an educated guess based on accelerated testing, and reality might well be different. The question is how much are you betting the farm on the manufacturer-supplied MTBF number?
I don't yet have a system for tracking expected media lifetime, since it seems to be quite unpredictable. For example, I have an 11-year-old desktop hard drive that has gone through periods of 24/7 operation yet still works great. So currently I'd rather base the calculations on pessimistic estimates, then just let the storage devices fail over time. If the failure rate turns out to be higher than expected, we'll put the actual failure rate into the spreadsheet and it will tell us whether we need to increase the number of error correction segments to compensate. This is certainly an area where we need more experience to make a good judgment.
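To make the "put the actual failure rate into the spreadsheet" step concrete, here is a rough Python sketch of the kind of calculation such a spreadsheet might do. It is not the actual Bit Mountain spreadsheet. It assumes each segment sits on its own device, devices fail independently at a constant annual rate, and data is lost only if more segments fail within one recovery window than there are error correction segments. All function and parameter names are hypothetical.

```python
from math import comb


def segment_loss_probability(annual_failure_rate: float, window_days: float) -> float:
    """Chance that a single device fails during one recovery window.

    Assumes a constant failure rate over the year (a simplification;
    real drives fail more often near end of life).
    """
    return 1.0 - (1.0 - annual_failure_rate) ** (window_days / 365.0)


def data_loss_probability(data_segments: int, ec_segments: int, p: float) -> float:
    """Chance that more than ec_segments of all segments fail in one window.

    Assumes independent failures with per-segment probability p.
    """
    n = data_segments + ec_segments
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(ec_segments + 1, n + 1)
    )


def ec_segments_needed(data_segments: int, p: float, target: float, max_ec: int = 64) -> int:
    """Smallest number of error correction segments that meets the target risk."""
    for ec in range(max_ec + 1):
        if data_loss_probability(data_segments, ec, p) <= target:
            return ec
    raise ValueError("target risk not reachable within max_ec segments")


if __name__ == "__main__":
    # Hypothetical inputs: compare a pessimistic estimate with a worse observed rate.
    for afr in (0.05, 0.15):
        p = segment_loss_probability(afr, window_days=30)
        print(afr, ec_segments_needed(data_segments=10, p=p, target=1e-9))
```

Plugging in an observed failure rate in place of the pessimistic estimate shows whether the segment count still meets the target. The independence assumption is the weak point: a batch of same-age drives wearing out together would make the real risk higher than this estimate.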