The biggest problem with nFS

Discussions around Genealogy technology.
russellhltn
Community Administrator
Posts: 29365
Joined: Sat Jan 20, 2007 2:53 pm
Location: U.S.

Postby russellhltn » Sat Jan 26, 2008 11:37 pm

huffkw wrote:I think most would agree that today’s system is optimized for removing duplicates from old research, a very large task. ... I expect members doing new research will often choose not to use the current nFS system as a repository for their new work product. They might fear being caught in a thicket of unexpected work before they can proceed.


One of the main purposes of this new system is to greatly reduce the duplicate temple work being done. Members will be required to enter names into this new system and do a search for duplicates. So unless we all stop sending names to the temple, this database will get new information. It may not be on the cutting edge of individual research, but it will move forward.


huffkw wrote:Is anyone willing to consider in detail the strengths and weaknesses of alternate methods, or are we locked into one method for the next few years?


I've got no clue as to what your suggestion was. We each do our own genealogy. We each contribute our part, tie into others as soon as we find common links, and then work together to add more. Keep in mind the current nFS is nowhere near finished. What we have now is only Tree Search. Record Search (which contains documents such as census records) has not yet been integrated. They are thinking about ways to tie the two together. Personally, I think it's quite likely that we will be able to combine a person with the records that document that person, making source citation easy and painless.

I don't know that any discussion of methods here would have any effect.

huffkw
Member
Posts: 54
Joined: Sun Jan 21, 2007 6:34 pm
Location: Spanish Fork, Utah

Postby huffkw » Sun Jan 27, 2008 7:01 pm

RussellHltn wrote:I've got no clue as to what your suggestion was.


I believe all our goals as to what we want to accomplish are pretty much the same. It is only the possible techniques and efficiency levels we might want to ponder, since this system is plowing a lot of new ground. It would be quite amazing if the development team had considered all possibilities and then gotten it all right on the first try.

Since Jan 2007 I have made a few posts to these forums which talk about my suggestion in general terms. That suggestion involves some changes in philosophy and goals, and some related changes in method. It is not the sort of thing that can be summarized in a sentence or two. Even a complex diagram would not explain all the reasoning. I guess I will have to check back in a year or two and see if everybody is still happy with the system, or if some new ideas could be explored.

russellhltn
Community Administrator
Posts: 29365
Joined: Sat Jan 20, 2007 2:53 pm
Location: U.S.

Postby russellhltn » Sun Jan 27, 2008 10:40 pm

I don't mean to chase you off, if that's how it came across. It's just that no one has asked our opinion on how things should be, and others have been called to do the work. Those who are doing the work don't necessarily hang out here. Furthermore, I don't know that they have adequately shared their direction and rationale in such a way that we can intelligently comment on it. At this point we're only beginning to see the results of their work, which is by no means complete, and they are in the midst of rolling it out. I'm leery about trying to engage in high-level thinking to devise the best system for how the church should do it, because unless we've been invited, it seems like it could easily become ark-steadying, especially at this stage of the process.

There are indeed issues with nFS. And I'm sure much feedback has been given. I for one am willing to see what they come up with. Personally, I'm hoping that it becomes a source-based system (unlike the conclusion-based systems we have now), and I can see a path for how that might happen.

huffkw
Member
Posts: 54
Joined: Sun Jan 21, 2007 6:34 pm
Location: Spanish Fork, Utah

Postby huffkw » Tue Jan 29, 2008 12:21 am

Ark-steadying, and fools rushing in.

I will try to describe some important points of my opinion and suggestion, although it won't be easy for either writer or reader. I am working on a longer version I could email sometime if anyone wants a little more depth. I assume the people working on the system will not be happy to hear more "constructive criticism" (they probably get plenty), but millions of people will be affected by their work product, and I hope they will be able to consider outside opinions.

My opinion is simply that we need to start over in our approach to handling centrally held genealogical data. Our past thinking and practices and data stores are constraining us too much.

I believe the hopes we place in the nFS system are more than it can deliver, and a separate, similar system will be needed to fulfill the rest of those hopes. It is useful to tidy up the past temple work in the nFS, but that might best remain a low-priority “background” operation, while also scheduling new temple work. Trying to make that system the single, all-purpose answer to all future genealogy data activity, for members and non-members, is asking too much. There are too many complicating factors and too many data problems.

The best way to avoid duplicates being sent to the temple is not to try to head them off at the pass just as they are figuratively going in the temple door, as in past times, when only primitive means of cooperation among researchers were possible. With the Internet available, a far superior way to stop duplicates headed for the temple, before they even start, is to help members (and non-members) avoid any wasted or redundant research (all the work which happens long before any names are sent to a temple). Stopping that redundant research can only happen either 1) through a fully finished and trusted nFS (a very long way off, and perhaps a nearly impossible task), or 2) through a separate database that is not encumbered by 1.5 billion duplicates and is designed from the beginning to contain (or show) only the best data, even if that is only a few percent of the total submitted.

I believe the data in nFS is not the best data available to act as a base for all future research activity. Data submitted for temple work has often originally been in full family group sheet form, but was then broken up for the ordinances to be done separately. Much genealogical data was lost in the process. Rather than try to put all those fragments back together and cull or merge the duplicates, an extremely difficult task to do correctly, it would give a superior quality and far faster result to just start again with the original full family group sheet version, which might have been enhanced since first submitted. Resubmitting that data and further enhancing it would be a far better use of member time than trying to correct everyone else’s past errors.

As it is, the only good way I can think of to get a clean nFS database is to compare every item in it to those prior, external, complete family group sheets. Guessing and merging names on the fly without that reference will almost certainly add and perpetuate more errors. And if we have that superior reference in hand, it then seems pointless to painstakingly go through and correct and compress the nFS. It would be far easier and more accurate to just resubmit the whole thing. Someone will say that we have then probably just reintroduced another whole pile of duplicates. But that is not true if the database has a "magical" part that shows only what is likely to be the best data, mostly determined, Google-like, by which data has the most relationship links among relatives, usually found in a descendant structure. The database system changes no data and merges no data. It merely highlights the data that is most likely to be well-researched, well-documented, and complete, and therefore most likely to be accurate.
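The Google-like link-counting idea described above can be sketched in a few lines. This is purely an illustrative toy, not anything from nFS; the record layout and field names are invented for the example:

```python
# Illustrative sketch (not nFS code): rank duplicate submissions of the
# same ancestor by how many relationship links each carries, on the
# theory that well-researched records sit in a denser family structure.

def link_score(record):
    """Count the relationship links attached to one submitted record."""
    return (len(record.get("parents", []))
            + len(record.get("spouses", []))
            + len(record.get("children", [])))

def best_version(submissions):
    """Pick the most-linked submission; nothing is merged or changed."""
    return max(submissions, key=link_score)

submissions = [
    {"name": "John Smith", "parents": [], "spouses": ["Mary"],
     "children": []},
    {"name": "John Smith", "parents": ["Wm.", "Ann"], "spouses": ["Mary"],
     "children": ["Jane", "Thomas", "Henry"]},
]
print(best_version(submissions)["name"])  # the better-linked record wins
```

Note that, consistent with the proposal, this only highlights a preferred version; every submission stays in the list untouched.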

As it is, even after the estimated 125 million hours of work is done on the nFS duplicate record removal project, it may still be the second best version of the data. Granted, if there is some way to get a complete, consolidated and trusted name structure, from then on it could be used to check to see if research and temple work has been done for specific people. But even then it will likely have the limitation that it favors current church members and their ancestors. Others will have a more difficult time using it because they will typically have no common place to plug their family data into it.

The database system I suggest would also clarify the personal or family responsibility for data accuracy for certain sets of names, in a way that does not appear to be addressed in the nFS. It appears that the old Ancestral File problem, of multiple people modifying the same data, often alternating from one version to another on update cycles, without the various groups knowing about it and with no one coordinating it, could happen in the nFS system. This direct communal sharing of update access to common ancestors is likely to lead to a number of problems that an indirect method would avoid. Everyone who wished to have their say could do so, but no one could modify anyone else's data.

As some final, broader thoughts, I see the estimated 125+ million hours of member labor, if used correctly in other forms of genealogy work, as the equivalent of four years of all our full time missionaries’ work, meaning 1 million new members are in the balance. That labor is by no means free, or only available to do one kind of church work. In that same vein, getting up to 4 million new non-member genealogists involved with an exciting new Church system ought to be one of the top priority benefits of all this new software development work.

russellhltn
Community Administrator
Posts: 29365
Joined: Sat Jan 20, 2007 2:53 pm
Location: U.S.

Postby russellhltn » Tue Jan 29, 2008 2:15 am

huffkw wrote:Rather than try to put all those fragments back together and cull or merge the duplicates, an extremely difficult task to do correctly, it would give a superior quality and far faster result to just start again with the original full family group sheet version, which might have been enhanced since first submitted.


nFS has more than just the temple work. AF and PRF are in there. I'm not sure about those original group sheets, but they might well be in there as well. Also, it's envisioned that members will use their own group sheets to help put the data together rather than just working with the IGI data.

garysturn
Senior Member
Posts: 608
Joined: Thu Feb 15, 2007 11:10 am
Location: Draper, Utah, USA

newFamilySearch

Postby garysturn » Tue Jan 29, 2008 12:56 pm

huffkw, I think it would be time well spent for you to learn about newFamilySearch. Many of the points you are bringing up as weaknesses of newFamilySearch are actually things that newFamilySearch has already addressed and incorporated into the new system. If you would learn about the new system, you would see that it already does most of the things you are recommending.

newFamilySearch combines all of the legacy databases, including the original Family Group Records submitted in earlier programs; it includes Ancestral File, Pedigree Resource File, LDS membership records, all the Temple records, the IGI, and many of the older submitted databases. The system creates a folder for each individual and then organizes them into pedigrees. The system has already combined many duplicates into these folders. Duplicates are combined into the folders (not merged into one record) and become a collection of documents about each individual. There are still more duplicate records to be combined, and those need to be combined by each of us for our own families. The time it takes to combine duplicates depends on several factors. Most duplicates take only a few seconds to combine, and in some screens multiple duplicates can be done at once in less than a minute. I have assisted patrons who have done 4 or 5 generations of combining in less than an hour. Some people will find individuals with 20-30 duplicates, and others with only 1 or 2, or none. Most of the duplicates are already combined into families and can be combined in groups: duplicate children, spouses, mothers, and fathers can each be combined in groups. If there are four duplicate children in the same family, they can all be combined together at once.
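The folder model described above (duplicates collected side by side, never merged, with the option to uncombine) might be pictured like this. The class and fields are invented for illustration and are not actual nFS internals:

```python
# Hypothetical model of the folder concept: each individual gets a
# "folder" that collects duplicate records intact. Combining adds a
# record to the folder; uncombining removes it; no record is ever
# merged into or overwritten by another.

class Folder:
    def __init__(self, person_id):
        self.person_id = person_id
        self.records = []            # duplicate submissions, kept intact

    def combine(self, record):
        self.records.append(record)  # add, never merge

    def uncombine(self, record):
        self.records.remove(record)  # undo a wrong combination

folder = Folder("example-id-123")
folder.combine({"source": "IGI", "birth": "1821"})
folder.combine({"source": "Ancestral File", "birth": "abt 1820"})
print(len(folder.records))  # both versions survive side by side
```

The key design point is that combining is reversible precisely because nothing is merged: each submission keeps its identity inside the folder.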

All the sources and notes from the original submissions are in newFamilySearch, including those which were not included in the Ancestral File or were not available online in the Pedigree Resource File. You can go to the folder of an individual and find all the sources and notes submitted with each of the records in the folder. Future upgrades will allow images of source documents (birth certificates, death certificates, census records, etc.) to be added to these individual folders. You can already try out beta versions of some of these features at FamilySearchLabs. Families are responsible for cleaning up the data, and new tools like these will make that task much easier. No one can change someone else's submission, but submitters can correlate with each other to get the information corrected. You can dispute incorrect information and add notes to someone's submission, but only they can change it. You can also uncombine records which have been combined into the wrong folders.

The system is very effective in preventing future duplication of research and Temple work. Once every Temple district is using newFamilySearch, it will be the only way to clear names for the Temple. newFamilySearch searches for duplicate ordinances before clearing any name, even if all the duplicates have not been combined in the folders prior to clearing a name for the Temple. The development of this new system is being done by inspiration and under the direction of Prophets.

Gary Turner

huffkw
Member
Posts: 54
Joined: Sun Jan 21, 2007 6:34 pm
Location: Spanish Fork, Utah

Postby huffkw » Tue Jan 29, 2008 8:59 pm

It sounds like you have the scoop on the internals, and that is very helpful to me. I attended the LDSTech meetings a year ago and the BYU genealogy conference last fall, and got some insights in those meetings, including one very general diagram, but I have not seen any attempt to make accessible a document that goes into the internals and their intended consequences. So it has not been that easy to learn the details of the system. And Utah County will be among the last to learn about it first hand. I prefer not to use the trial-and-error method to figure out what a system can do and is intended to do. I would rather read the theory and the strategy. Then it can be useful to see if the prototype actually does it.

Maybe, as a past developer on large-scale computer systems, I am one of the few people outside the Church offices who cares about such things. I have seen some really big systems flop ($200 million, $40 million, $30 million, etc.) or end up a shadow of their original intended scope. (The $1 billion system made it). It would be nice if the Church were immune from negative outcomes, but I am a hopeful skeptic on that point.

I guess the real test will come when a large number of regular members, some fairly tech-savvy, are expected to pitch in and use the system and do the planned work. If they are excited about what the system can do for them, they will use it extensively. If they are less excited, they will use it less. I am trying to anticipate what features would lead people, in and out of the Church, to have the highest levels of enthusiasm for the system.

rmrichesjr
Community Moderators
Posts: 2534
Joined: Thu Jan 25, 2007 11:32 am
Location: Dundee, Oregon, USA

Postby rmrichesjr » Tue Jan 29, 2008 9:14 pm

huffkw wrote:...
I guess the real test will come when a large number of regular members, some fairly tech-savvy, are expected to pitch in and use the system and do the planned work. If they are excited about what the system can do for them, they will use it extensively. If they are less excited, they will use it less. I am trying to anticipate what features would lead people, in and out of the Church, to have the highest levels of enthusiasm for the system.


If it helps you feel better about it, I'm told the beta test in early 2007 had about 3000 users. I suspect I was not the only one who considers himself to be tech-savvy. I was and am WILDLY excited about nFS. I have described it as being beyond marvelous.

huffkw
Member
Posts: 54
Joined: Sun Jan 21, 2007 6:34 pm
Location: Spanish Fork, Utah

Postby huffkw » Wed Jan 30, 2008 3:01 pm

I guess it is all a matter of viewpoint. I don't believe I am making a snap judgment here. I have spent about 5,000 hours studying the problem and experimenting with solutions over the past 20 years. What we have in progress now certainly promises to be better than anything that went before. However, I have been looking for the best possible solution which is also feasible. I see some clever work in the nFS system, but many features I would like to see in a central system appear to be inconsistent with what has been done already. But again we get back to value judgments that differ as to what is "best." Two of the extra features I would like to see are: 1) for data suppliers, storing data within protected accounts for an individual or a family, making responsibilities for quality and completeness clear (something difficult to do in a communal database); and 2) for searchers, highlighting the "apparently best-researched" data, with the option to ignore the rest.

But this is all academic for the time being. It will apparently be a long time before anyone is looking for more ideas to implement.

rmrichesjr
Community Moderators
Posts: 2534
Joined: Thu Jan 25, 2007 11:32 am
Location: Dundee, Oregon, USA

Postby rmrichesjr » Wed Jan 30, 2008 3:26 pm

The nFS API should give you or anyone else the ability to implement much/most/all of what you're describing here.

Are you acquainted with the way nFS already keeps track of who supplied which data? At least in the early 2007 beta, from each piece of information, you could see who submitted it, including contact information if the submitter chose to make contact information accessible. For each user, it gives priority to information submitted by that user. If an application wants to give priority to a group of "trusted" other contributors, that should be feasible to do.

For highlighting data based on a "best-researched" or any other criteria, all the application has to do is download the data from nFS and apply the application's own highlighting algorithms.
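That download-then-score workflow might look roughly like this. The client and function names here are placeholders invented for the sketch, not the actual nFS API:

```python
# Rough sketch of client-side highlighting: pull person records down
# through whatever download call the API exposes, then apply your own
# "best-researched" metric locally. Nothing on the server changes.

def fetch_persons(api_client, person_ids):
    """Placeholder for a real download call; api_client is hypothetical."""
    return [api_client.get_person(pid) for pid in person_ids]

def highlight(persons, score):
    """Rank records by a caller-supplied scoring function, best first."""
    return sorted(persons, key=score, reverse=True)

# For example, scoring by the number of attached sources:
persons = [
    {"id": "a", "sources": ["census 1850"]},
    {"id": "b", "sources": ["census 1850", "will", "parish register"]},
]
ranked = highlight(persons, score=lambda p: len(p["sources"]))
print(ranked[0]["id"])  # the record with the most sources ranks first
```

Because the scoring function is supplied by the application, each tool can define "best-researched" its own way without touching the shared data.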

Does that help?

