Some people may not like to hear this, but to be honest, I think the best way to handle the massive number of duplicates we will otherwise see when I (and each of my many relatives, and anyone else who connects into my line of ancestors) upload a GEDCOM file into New FamilySearch is to NOT allow us to upload GEDCOM files at all. It will be a little painful to enter the information from my GEDCOM one person at a time, but as I enter it, each individual can be merged into the big tree as I go, with the system prompting me about possible duplicates. The system can also urge me to provide sources for what I am entering, give me hints as I work, and explain why it is important to merge my person with the information already in the tree when possible, why it is important to provide sources, and so on. I will then be less likely to propagate unreliable information that hampers everybody else's research.
This will be cumbersome for the first couple of generations of my tree, but after that, I (and most other people) will link into common ancestors that somebody else has already entered. Rather than merging each of those common ancestors with everybody else's versions and following many unmerged lines, I can just compare against the one (or very few) versions already in the tree. The alternative is potentially hundreds of unmerged versions of mostly the same information, uploaded by different people who haven't gotten around to combining the individuals from their GEDCOM with the other information in the tree.
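To make the idea concrete, here is a rough sketch of what one-at-a-time entry with a duplicate prompt could look like. This is only an illustration, not how New FamilySearch works: the `Person` record, the `possible_duplicates` matching rule (same normalized name, birth year within two years), and the `choose` callback that stands in for the human decision are all assumptions of mine.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    # Hypothetical, heavily simplified record; a real tree holds far more.
    name: str
    birth_year: int
    sources: list = field(default_factory=list)


def normalize(name):
    # Lowercase and collapse runs of whitespace so trivial variants compare equal.
    return " ".join(name.lower().split())


def possible_duplicates(tree, candidate, year_tolerance=2):
    # Assumed matching rule: same normalized name, birth year within tolerance.
    return [
        p for p in tree
        if normalize(p.name) == normalize(candidate.name)
        and abs(p.birth_year - candidate.birth_year) <= year_tolerance
    ]


def enter_person(tree, candidate, choose):
    # `choose(candidate, matches)` stands in for the person at the keyboard:
    # it returns the existing record to merge into, or None to add a new one.
    matches = possible_duplicates(tree, candidate)
    target = choose(candidate, matches) if matches else None
    if target is None:
        tree.append(candidate)
        return candidate
    # Merge: keep the existing record and attach any new sources to it.
    for source in candidate.sources:
        if source not in target.sources:
            target.sources.append(source)
    return target


# Example: the second entry is recognized as a possible duplicate and merged.
tree = [Person("John Smith", 1820, ["1850 census"])]
merged = enter_person(
    tree,
    Person("john  smith", 1821, ["family Bible"]),
    choose=lambda candidate, matches: matches[0],
)
added = enter_person(tree, Person("Mary Jones", 1825), choose=lambda c, m: None)
```

The point of the sketch is that the final merge decision stays with a person (the `choose` step), while the system only suggests candidates at the moment each individual is entered.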
I am willing to accept that in order to make the new tree more reliable for us, and manageable for people who are new to family history, we need to NOT allow people to upload massive amounts of uncombined information into the system. It is MUCH easier to prevent as many duplicates as possible from the start than to go back and constantly combine individuals later, every time another person decides to upload data that should connect into my line. Automation will never be able to take over the final decision of who really should be combined, and every time somebody else uploads a GEDCOM, there will be another set of duplicates to deal with if the person who uploads doesn't properly clean up after themselves.
I think that also applies to the syncing of third-party applications that use the API to interface with New FamilySearch (NFS). An application shouldn't be able to upload large amounts of data into NFS via the API.
If I were just getting started with genealogy, and I logged on to NFS and found dozens of versions of each of my ancestors, all with basically the same data but following different "lines" (because they aren't merged together yet), some with more data than others and some with more accurate data than others, I would find it frustrating and difficult to see what is already known and what isn't.
Please note, this does not mean we shouldn't be able to have our own view of what is correct, provide alternate views, etc., or that things shouldn't be able to be combined or uncombined. I am just talking about getting the data in there to begin with. Other than bandwidth, I see no problem with allowing people to download massive amounts of data for syncing with their offline applications. But as far as keeping the NFS tree in a usable form, I think we are asking for a mess if we just allow people to upload all of the data they have and count on them taking the time to make sure duplicates get merged or combined.