Bit Mountain

Discussions around miscellaneous technologies and projects for the general membership.
Shane Hathaway-p40
New Member
Posts: 10
Joined: Sun Mar 04, 2007 1:31 pm

Bit Mountain

#1

Post by Shane Hathaway-p40 »

I have been working on a project in the Family and Church History Department called Bit Mountain. Bit Mountain is a distributed system designed to reliably store petabytes of archival information. I believe I have identified and solved the primary issues. Some of the issues I have encountered:

- Standard replication is not reliable enough. Statistically, even if we kept two or three copies of all the data and refreshed the copies yearly, we would still lose a lot of data. The chances are too high that all of the copies would be lost. Bit Mountain employs a better technique called error correction, which yields far greater statistical reliability. (A rough sketch of the math follows this list.)

- We can't rely on system administrators to replace media. Our staff is already overtaxed. Therefore, Bit Mountain repairs itself when media is lost. It rebuilds the lost files and puts them on new media. During the rebuild, Bit Mountain transparently recovers files on the fly as they are requested.

- We have to choose the media carefully. Hard drives are easy but consume significant power. DVDs don't hold much data; we would need 200,000 of them to build a petabyte. Tapes require a big robot. MAID (massive array of idle disks) is my favorite technology of the bunch, but I don't have access to any MAID systems to test. Bit Mountain currently uses spinning disks, but it should be possible for it to use any of the other technologies.
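
To make the replication-versus-error-correction claim concrete, here's a back-of-the-envelope comparison. The per-drive failure probability below is just a placeholder to show the shape of the math, and it assumes failures are independent; it is not a measured figure from our hardware:

Code: Select all

# Back-of-the-envelope comparison of annual loss probability: full copies vs.
# an error-correction chain.  The failure probability and the independence
# assumption are illustrative placeholders, not measured figures.
from math import comb

p = 0.05  # assumed chance that any one drive/segment is lost during the year

def replication_loss(copies):
    """Data is lost only if every copy fails."""
    return p ** copies

def coded_loss(data_segs, parity_segs):
    """Data is lost only if more segments fail than the code can rebuild."""
    n = data_segs + parity_segs
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(parity_segs + 1, n + 1))

print(f"2 full copies (100% extra storage): {replication_loss(2):.1e}")
print(f"3 full copies (200% extra storage): {replication_loss(3):.1e}")
print(f"20+10 chain    (50% extra storage): {coded_loss(20, 10):.1e}")

With these illustrative numbers, the coded chain is orders of magnitude less likely to lose data than either two or three full copies, even though it buys only half as much extra storage.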

I have built several small Bit Mountain clusters now and they are all working well. One of the clusters even weathered the simultaneous loss of 8 of its 20 hard drives without any downtime or data loss. The hardware in the two nodes holding the drives became unstable, causing the nodes to crash and eventually shut down, while the cluster kept the data 100% intact and initiated recovery. I don't know of any RAID system that can achieve that level of reliability.

Does massive and highly reliable yet inexpensive distributed data storage interest the people in this group? Speak up!

I want Bit Mountain to gain more exposure and feedback from others who are solving similar problems. As nice as Bit Mountain might be, I don't want our department to use a system that no one else uses. Therefore, I want a community around the software so that archivists can help each other and build a great system for archiving data.

I believe that making Bit Mountain open source is the best way to start that community. Unfortunately, I'm having trouble getting attention from the people who could authorize an open source release. I believe some of them don't see the wisdom in releasing open source software. What should I do?
MichaelCHunsaker
New Member
Posts: 8
Joined: Tue Feb 13, 2007 4:54 pm

What would Google do?

#2

Post by MichaelCHunsaker »

I have no idea what you should do. However, within the past six months I have read several articles about Google and Microsoft building huge data centers. They must have the same issues. I wonder how they are solving the problem.
WelchTC
Senior Member
Posts: 2085
Joined: Wed Sep 06, 2006 8:51 am
Location: Kaysville, UT, USA

#3

Post by WelchTC »

Shane, get authorization from your managers on what technical information you can or cannot share, and let's get a dialog going around this. We may be able to come up with some interesting ways that the community can contribute, and this may help you make the case for open source. The project sounds very interesting.

Tom
mkmurray
Senior Member
Posts: 3266
Joined: Tue Jan 23, 2007 9:56 pm
Location: Utah

#4

Post by mkmurray »

This is cool indeed. I think I'm gonna show a few of my friends who have been in the business quite a bit longer and see what they think about the technology. They straddle business and open source pretty well.
russellhltn
Community Administrator
Posts: 34384
Joined: Sat Jan 20, 2007 2:53 pm
Location: U.S.

#5

Post by russellhltn »

I've got a few questions:

- What are the single points of failure that could take out the system? Keep in mind that not all devices fail "dead"; some may malfunction in such a way as to scramble data.

- Second, this sounds like a fancier array. The big problem with arrays is that they don't protect against things like viruses, accidental erasure, or other accidents or malfunctions of the application software. So what is the backup strategy?

- When you say that you are not going to rely on the media being replaced, are you saying that you're not relying on it being replaced to begin rebuilding, or that it will never be replaced?

- If you are going to allow for replacement of media, are you going to be able to accommodate different models of drives? Any given drive model only seems to last a few years before it's discontinued. Can the system take a mix of 500GB, 750GB, or whatever drive sizes you use now alongside whatever sizes become available in the future? What about the drive interface? Right now we're transitioning from IDE to SATA. Maybe you're dodging that by going with SCSI, but can SCSI last forever? How are you future-proofing the product?

- The last question is really a social engineering question. How do you keep all the safeguards that create a high-reliability environment from fostering complacency, so that the system still gets the service response you specify as part of the assurance that it won't fail?
blackrg
Member
Posts: 75
Joined: Mon Feb 12, 2007 1:31 pm
Location: Utah

#6

Post by blackrg »

How much storage is lost to redundancy using your method?
Shane Hathaway-p40
New Member
Posts: 10
Joined: Sun Mar 04, 2007 1:31 pm

#7

Post by Shane Hathaway-p40 »

tomw wrote:Shane, get authorization from your managers on what technical information you can or cannot share, and let's get a dialog going around this. We may be able to come up with some interesting ways that the community can contribute, and this may help you make the case for open source. The project sounds very interesting.
Thank you. I presented a paper on it last year at the family history workshop at BYU, so quite a bit of technical information has already been released. I am especially interested in finding out what the community would like to do with it.
Shane Hathaway-p40
New Member
Posts: 10
Joined: Sun Mar 04, 2007 1:31 pm

#8

Post by Shane Hathaway-p40 »

mkmurray wrote:This is cool indeed. I think I'm gonna show a few of my friends who have been in the business quite a bit longer and see what they think about the technology. They straddle business and open source pretty well.
That would be helpful, especially if some of your friends work at large companies. While I've personally witnessed how a small company can make open source work in its favor, big corporations seem to have a lot more trouble with the idea of open source.
Shane Hathaway-p40
New Member
Posts: 10
Joined: Sun Mar 04, 2007 1:31 pm

#9

Post by Shane Hathaway-p40 »

RussellHltn wrote:I've got a few questions:

- What are the single points of failure that could take out the system? Keep in mind that not all devices fail "dead"; some may malfunction in such a way as to scramble data.

- Second, this sounds like a fancier array. The big problem with arrays is that they don't protect against things like viruses, accidental erasure, or other accidents or malfunctions of the application software. So what is the backup strategy?

- When you say that you are not going to rely on the media being replaced, are you saying that you're not relying on it being replaced to begin rebuilding, or that it will never be replaced?

- If you are going to allow for replacement of media, are you going to be able to accommodate different models of drives? Any given drive model only seems to last a few years before it's discontinued. Can the system take a mix of 500GB, 750GB, or whatever drive sizes you use now alongside whatever sizes become available in the future? What about the drive interface? Right now we're transitioning from IDE to SATA. Maybe you're dodging that by going with SCSI, but can SCSI last forever? How are you future-proofing the product?

- The last question is really a social engineering question. How do you keep all the safeguards that create a high-reliability environment from fostering complacency, so that the system still gets the service response you specify as part of the assurance that it won't fail?
Good questions.

- The system is not very susceptible to data corruption, missing sectors, or hardware glitches in the storage nodes, since it periodically compares the data with an MD5 hash (any other hash can be used) and automatically falls back to other storage nodes if the primary node does not reply within a configurable time limit. (A rough sketch of that check-and-fall-back idea follows this list.) The only parts administrators should worry about are the network and the central database, although it's possible to build a redundant network, and the database is replicated asynchronously.

- The right backup strategy is to build two repositories in separate locations and keep them in sync.

- When Bit Mountain decides a storage device has failed, it stops using it, marking it offline and unhealthy. An administrator can choose to bring that device back into service by marking it online and healthy, but Bit Mountain will never do that on its own.

- Each storage node in Bit Mountain is an ordinary computer with its own storage media and filesystems. Bit Mountain works at a level that's independent of media size, format, or electrical interface. Even if we all switch to flash drives and InfiniBand, Bit Mountain may not need any changes.

- If the Bit Mountain health monitor process stops, or the system runs out of places to store recovered files, the probability of losing data rises. Therefore administrators are still not absolved of their responsibility to run Nagios.
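
To give a feel for the first point, here is a minimal sketch of the check-and-fall-back idea. The class and function names are made up for illustration and are not Bit Mountain's actual API; MD5 appears only because it's the default hash mentioned above:

Code: Select all

# Hypothetical sketch of integrity checking with fallback reads.  Names and
# structure are illustrative only; they are not Bit Mountain's real API.
import hashlib


class StorageNode:
    """Stand-in for one storage node; real nodes serve segments over the network."""

    def __init__(self, name):
        self.name = name
        self.segments = {}        # segment id -> bytes
        self.unhealthy = set()    # segment ids flagged for rebuild

    def fetch(self, segment_id):
        return self.segments[segment_id]

    def mark_unhealthy(self, segment_id):
        self.unhealthy.add(segment_id)


def read_with_fallback(nodes, segment_id, expected_md5):
    """Return the first copy whose hash matches the value recorded in the
    central database, flagging corrupt or missing copies for the health
    monitor to rebuild onto fresh media."""
    for node in nodes:
        try:
            data = node.fetch(segment_id)
        except KeyError:
            node.mark_unhealthy(segment_id)
            continue
        if hashlib.md5(data).hexdigest() == expected_md5:
            return data
        node.mark_unhealthy(segment_id)
    raise IOError("no intact copy; rebuild from error-correction segments")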
Shane Hathaway-p40
New Member
Posts: 10
Joined: Sun Mar 04, 2007 1:31 pm

#10

Post by Shane Hathaway-p40 »

gblack wrote:How much storage is lost to redundancy using your method?
You get to choose! There's a simple pair of numbers you need to set: the number of data and error-correction segments per chain. Then the system will maintain the files at the level you specify. I have a spreadsheet that helps you choose those numbers. I have projected that if you choose 20 data and 10 error-correction segments, put the data on hard drives, let Bit Mountain verify the media once a week, and always ensure Bit Mountain has extra space for recovered data, you can be quite confident the data will remain perfect for 1000 years.

Note that simple duplication requires you to buy 1 GB extra for every 1 GB of data, while a 20 + 10 Bit Mountain configuration requires only 0.5 GB extra for every 1 GB of data. In fact, the Bit Mountain configuration is statistically far more reliable. It's surprising, but I've spent weeks on the math and I believe it's correct.
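
As a rough illustration of where a projection like that comes from (this is not my actual spreadsheet), here's how the weekly verification interval and the 20 + 10 segment counts combine over a long horizon. The per-segment weekly loss probability is a placeholder, and the model assumes independent failures that get repaired before the next check:

Code: Select all

# Rough projection sketch: a 20+10 chain with weekly verification and repair.
# The per-segment weekly loss probability is a made-up placeholder, and the
# model assumes independent failures repaired before the next check.
from math import comb

def chain_loss_per_week(data_segs, parity_segs, p_week):
    """A chain is lost only if more segments fail in one week than the
    error-correction segments can rebuild."""
    n = data_segs + parity_segs
    return sum(comb(n, i) * p_week**i * (1 - p_week)**(n - i)
               for i in range(parity_segs + 1, n + 1))

p_week = 1e-3                          # placeholder per-segment weekly loss probability
weeks = 52 * 1000                      # a 1000-year horizon
per_week = chain_loss_per_week(20, 10, p_week)
print(f"per-week chain loss:           {per_week:.1e}")
print(f"approx. loss risk over 1000 y: {weeks * per_week:.1e}")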