WAN sync of large repository

We need to sync 10TB of data from one North American coast to the other.

Our tentative plan is to sneakernet the data and then use some form of sync to catch up on the delta.

Aside from bandwidth constraints, we found that rsync quickly craps out with large numbers of files.

What tools have you used to do this?

[335 byte] By [WRWindsora] at [2007-11-26 19:12:45]
# 1

For that amount of data I would definitely sneakernet the original transfer. After that, you might consider rsyncing subsets of the data, perhaps by directory. This has the added benefit of making sure that no two rsync runs are working on the same data at the same time.
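To make the per-directory idea concrete, here is a minimal Python sketch, not a tested production script. It assumes rsync is on the PATH, that each top-level directory is a sensible unit of work, and that the destination (the hypothetical `backup:/data` below) is reachable. The function names are made up for illustration.

```python
import os
import subprocess

def list_subdirs(root):
    """Return the immediate subdirectories of root, sorted for a stable order."""
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )

def build_rsync_command(src_root, dest_root, subdir):
    """Build the argv for syncing one subdirectory in archive mode."""
    # Trailing slashes make rsync copy the directory's contents into the
    # same-named directory on the destination.
    src = os.path.join(src_root, subdir) + "/"
    dest = dest_root.rstrip("/") + "/" + subdir + "/"
    return ["rsync", "-a", src, dest]

def sync_by_directory(src_root, dest_root, run=subprocess.run):
    """Run one rsync per top-level directory, strictly one after another."""
    completed = []
    for subdir in list_subdirs(src_root):
        cmd = build_rsync_command(src_root, dest_root, subdir)
        # check=True raises on the first failure, so that directory can be retried.
        run(cmd, check=True)
        completed.append(cmd)
    return completed
```

Something like `sync_by_directory("/export/repo", "backup:/data")` would then walk the top-level directories in order. Note that files sitting directly in the source root are not covered by this sketch and would need a separate pass.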

bosconeta at 2007-7-9 21:11:34 > top of Java-index,General,Sys Admin Best Practices...
# 2

Depends on the value of "craps out" and "large numbers of files" - I've never had a problem, but the biggest transfer I've ever attempted with rsync was less than 1 TB.

There is the strategy of rsyncing chunks further down the directory tree, or maybe the rsync "-W" option will help, at the expense of bandwidth, if you have lots of little files. You might also watch rsync with "top" - it can sit for a long time without producing any output while in fact it is doing real work deciding what to do.
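A rough Python sketch of the chunking idea, under some assumptions of mine: the file-count threshold, the function names, and the choice to handle a big directory's own files with a non-recursive pass (rsync's "-d"/"--dirs" flag in place of "-r") are all illustrative, not something I have run against 10TB. The "-W" option is "--whole-file", which skips rsync's delta-transfer algorithm.

```python
import os

def count_files_per_dir(root):
    """Map each directory to its direct file count, subtree file count, and subdirectories."""
    direct, total, children = {}, {}, {}
    # Bottom-up walk, so every child's total is known before its parent's.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        kids = sorted(os.path.join(dirpath, d) for d in dirnames)
        # Symlinked directories are not walked, so they never appear in total.
        children[dirpath] = [k for k in kids if k in total]
        direct[dirpath] = len(filenames)
        total[dirpath] = len(filenames) + sum(total[k] for k in children[dirpath])
    return direct, total, children

def plan_chunks(root, max_files):
    """Return (directory, recursive) pairs covering the tree, descending
    into any directory whose subtree holds more than max_files files."""
    direct, total, children = count_files_per_dir(root)
    chunks, stack = [], [root]
    while stack:
        path = stack.pop()
        if total[path] <= max_files or not children[path]:
            chunks.append((path, True))    # small enough (or a leaf): whole subtree
            continue
        if direct[path]:
            chunks.append((path, False))   # only the files directly in this directory
        stack.extend(reversed(children[path]))
    return chunks

def rsync_command(src_dir, dest_dir, recursive, whole_file=False):
    """Build the argv for one chunk; -a is -rlptgoD, so -dlptgoD drops the recursion."""
    cmd = ["rsync", "-a" if recursive else "-dlptgoD"]
    if whole_file:
        cmd.append("--whole-file")
    return cmd + [src_dir.rstrip("/") + "/", dest_dir.rstrip("/") + "/"]
```

Each chunk then becomes one rsync run with a much smaller file list to build, which is where the long silent phase comes from. A single directory holding more files than the threshold still ends up as one chunk, since there is nothing further down to split on.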

I don't know of any OSS tools other than rsync - a Google search for "real time data replication" brings up a host of commercial vendors; this is a very active business.

wsandersa at 2007-7-9 21:11:35