I am approaching the end of development on my own script-based solution to the problem of geocoding ward addresses and mapping the output in Google Earth. The complexity of the task grew to the point that it really ought to have a real user interface and a proper SQL back-end, but that is more ambitious than I set out to be. I am wondering what others think about how I should proceed.
My system right now consists of two VBScripts written to run standalone, using flat files as a "database." The process assumes the user wants to deal with the exceptions that fall out of batch geocoding because of data-quality problems, correcting or manually geocoding them where possible. All this works fine for me as developer/user, but I hesitate to release it into the wild as-is. It really ought to be a bulletproof app.
My process, tested only on my U.S. urban ward, works like this:
1. A critical step is to scrub the addresses in the ward MLS system to meet postal standards, both initially and as an ongoing maintenance task. The importance of this step cannot be overemphasized. Almost all errors have as their root cause missing or malformed address records upstream. It is the clearest example of Garbage In, Garbage Out that I can think of. Getting the data right is the core of the challenge, and no one will be happy for long with batch solutions that ignore this problem.
2. Obtain regular, authorized extracts from MLS. One core requirement is PalmFamily.csv, which supports generation of a basic ward map of families plotted as a .kml or .kmz file (a .kmz is just a zipped .kml, so WinZip or an equivalent is needed for the latter). To support the optional views of home-teaching and visiting-teaching assignments and "routes," it is also necessary to obtain copies of Membership.csv, HomeTeaching.csv and VisitingTeaching.csv. All extracts must occur at the same time; I suggest weekly.
3. Place all these files in a dedicated folder. The scripts will also use this folder to write and read database files of their own, at least one of which is quasi-permanent in the sense that it must persist between updates.
4. Run the batch geocoding script. It does some basic parsing and scrubbing to catch a few common format errors, then submits addresses as individual transactions to the Yahoo geocoding API; the first sketch after this list shows the shape of that call. The results, including three error and warning fields that describe the quality and uncertainty of the geocoding, are written to a new database file.
5. The geocoding script also writes and maintains an exceptions file of addresses that are populated, and appear valid and complete, but that Yahoo nonetheless fails to geocode; see the second sketch after this list. Each new exception is appended to the file with blank lat/lon coordinates. The handful of such exceptions can likely be geocoded manually from other free services, such as Google or MapQuest. The requirement is to store these manually generated lat/lon coordinates so the work does not have to be repeated in subsequent batch updates.
6. There will likely be other errors and warnings captured in the geocoded database for each batch. The quick-and-dirty way to browse the results is in a spreadsheet, using common sort and filter functions. The right way to deal with these records, where possible, is to fix the addresses upstream in MLS and pick up the corrections in the next export.
6(a) Some of these records have no address at all, contain fatally incomplete snippets, or hold undeliverable addresses -- likely keyed in by some membership clerk sometime, somewhere. They cannot be geocoded without research. I reject records for which Yahoo provides a lat/lon based on anything less than street-address matching; zip centroids, city centroids, etc., are thrown out, as the precision test in the first sketch after this list shows. (This key distinction is missing from the otherwise cool implementation at batchgeocode.com, which drove me to undertake this project in the first place. Zip-centroid "geocoding" is worthless for my purposes, but that web site passes these errors through without warning.)
6(b) Other records get geocoded by Yahoo but with warnings, in which Yahoo describes the action it took to make a good guess. In my experience, a large majority of these guesses are correct and benign -- fixing misspelled street names, for example. But a few will be wrong. The user should browse the warnings and, almost always, fix the bad data upstream in MLS.
7. After this review, and after filling in manual lat/lon coordinates for any newly identified addresses in the permanent exceptions file, the user is finally ready to generate the .kml files for Google Earth. My second script does this (the third sketch after this list shows its core) and generates three .kml files:
* A Families.kml file, showing every household in the ward that could be geocoded. The only personal information in this file is a subset of that available for download from the directory on LDS.org -- basically name, address and phone -- so I think it is suitable for dissemination to any ward member. I believe that could even be done securely via LDS.org with no programming.
* A HomeTeaching.kml file, showing each family broken down according to home-teaching assignment, along with the assigned home teachers. The folders that display on the left side of the Google Earth interface follow the basic hierarchy of HomeTeaching.csv: quorum, then district, then companionship, then teachers and families taught grouped together. Unassigned families have their own folder. The icon shapes distinguish between teachers and families taught, and all icons are color-coded: blue for EQ families, green for HP families and orange for unassigned. This tool is primarily designed to help quorum presidencies maintain their companionships and assignments. There is even native functionality within the Google Earth interface that lets leaders cut and paste families from one companionship to another in what-if fashion. The results can be exported, and the export file ultimately could be imported into Excel for the leaders to use. (TBD by scripting, but later versions of Excel will read the file and flatten it into a table.)
* A VisitingTeaching.kml file, with similar functionality for Relief Society leaders. (Honestly, this function is a work in progress, but I see only a few hours' work left to complete it.)
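For the curious, here is a stripped-down sketch of the step-4 geocoding transaction, including the street-level precision test from 6(a). It assumes the Yahoo MapsService V1 REST endpoint, and the scrubbing helper stands in for a much longer list of fixes; treat the names here as illustrative, not as the production script.

```vbscript
' Sketch of the per-address geocoding transaction (step 4).
' Assumed endpoint: Yahoo MapsService V1; substitute a real appid.
Const YAHOO_URL = "http://local.yahooapis.com/MapsService/V1/geocode"
Const APP_ID = "YOUR_APP_ID"   ' placeholder

' Illustrative scrub (the real step-1 cleanup is far more involved):
' collapse whitespace and strip a trailing unit designator.
Function ScrubAddress(sAddr)
    Dim re
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "\s+"
    sAddr = Trim(re.Replace(sAddr, " "))
    re.Pattern = "\s+(Apt|Unit|#)\s*\S*$"
    ScrubAddress = re.Replace(sAddr, "")
End Function

' Returns "lat,lon" for a street-level match, "" for anything weaker,
' so zip and city centroids are rejected per 6(a).
Function Geocode(sStreet, sCity, sState, sZip)
    Dim http, nodes, result, precision
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", YAHOO_URL & "?appid=" & APP_ID & _
        "&street=" & Escape(ScrubAddress(sStreet)) & _
        "&city=" & Escape(sCity) & _
        "&state=" & Escape(sState) & _
        "&zip=" & Escape(sZip), False
    http.Send
    Geocode = ""
    If http.Status = 200 Then
        Set nodes = http.responseXML.getElementsByTagName("Result")
        If nodes.length > 0 Then
            Set result = nodes(0)
            precision = result.getAttribute("precision")
            If precision = "address" Or precision = "street" Then
                Geocode = result.getElementsByTagName("Latitude")(0).text _
                    & "," & result.getElementsByTagName("Longitude")(0).text
            End If
        End If
    End If
End Function
```

The warning text that drives 6(b) rides along in the same response, and the real script writes it to the database the same way.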
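The step-5 exceptions bookkeeping is just a keyed merge against one of those quasi-permanent files from step 3. A sketch, assuming a hypothetical address,lat,lon layout for Exceptions.csv (naive comma splitting; real addresses with embedded commas need proper CSV quoting):

```vbscript
' Sketch of the step-5 exceptions merge. The address,lat,lon layout
' of Exceptions.csv is hypothetical; lat/lon stay blank until the
' user fills them in manually.
Const ForReading = 1, ForWriting = 2
Dim fso, f, line, parts, exceptions
Set fso = CreateObject("Scripting.FileSystemObject")
Set exceptions = CreateObject("Scripting.Dictionary")
exceptions.CompareMode = vbTextCompare   ' case-insensitive address keys

' Load prior exceptions, including manually entered coordinates.
If fso.FileExists("Exceptions.csv") Then
    Set f = fso.OpenTextFile("Exceptions.csv", ForReading)
    Do Until f.AtEndOfStream
        line = f.ReadLine
        parts = Split(line, ",")
        exceptions(parts(0)) = parts(1) & "," & parts(2)
    Loop
    f.Close
End If

' Called for each batch failure; only unseen addresses are added,
' so manual geocoding work is never repeated.
Sub RecordException(addr)
    If Not exceptions.Exists(addr) Then exceptions(addr) = ","
End Sub

' Rewrite the file so it persists for the next update.
Sub SaveExceptions()
    Dim out, key
    Set out = fso.OpenTextFile("Exceptions.csv", ForWriting, True)
    For Each key In exceptions.Keys
        out.WriteLine key & "," & exceptions(key)
    Next
    out.Close
End Sub
```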
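And generating the .kml in step 7 really is just writing XML text. Here is a bare-bones version of the Families.kml writer; the household fields are placeholders for whatever the actual PalmFamily.csv columns are:

```vbscript
' Bare-bones Families.kml writer (step 7). Field names are placeholders.
Dim fso, kml
Set fso = CreateObject("Scripting.FileSystemObject")
Set kml = fso.CreateTextFile("Families.kml", True)

kml.WriteLine "<?xml version=""1.0"" encoding=""UTF-8""?>"
kml.WriteLine "<kml xmlns=""http://www.opengis.net/kml/2.2"">"
kml.WriteLine "<Document><name>Ward Families</name>"

' One Placemark per geocoded household. KML wants lon,lat order.
Sub WritePlacemark(famName, addr, phone, lat, lon)
    kml.WriteLine "  <Placemark>"
    kml.WriteLine "    <name>" & XmlEscape(famName) & "</name>"
    kml.WriteLine "    <description>" & XmlEscape(addr & " / " & phone) & "</description>"
    kml.WriteLine "    <Point><coordinates>" & lon & "," & lat & ",0</coordinates></Point>"
    kml.WriteLine "  </Placemark>"
End Sub

Function XmlEscape(s)
    XmlEscape = Replace(Replace(Replace(s, "&", "&amp;"), "<", "&lt;"), ">", "&gt;")
End Function

' ... loop over the geocoded database here, calling WritePlacemark ...

kml.WriteLine "</Document></kml>"
kml.Close
```

HomeTeaching.kml is the same idea with nested <Folder> elements for quorum, district and companionship, plus <Style> entries that do the blue/green/orange icon coding. And as noted in step 2, a .kmz is nothing more than this file zipped.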
Creating these files is the cool part, but it is relatively easy. It's just code. The core complexity of this whole process lies in the user's steps between the geocoding script and the script that produces the final output. It would have been easy to tie everything together neatly on a platform such as MS Access, but I am reluctant to do that for sharing in venues such as this one, because I don't like depending on a proprietary tool. Most if not all units, for example, do not have Access on the clerk's computer.
Also, I have been aiming for a minimal installation footprint. Right now, it does not use anything that does not ship with Windows.
I think if I were to take on the task of creating a real application, I would convert the VBScript to Visual Basic for the front end and use a public-domain SQL engine on the back end. That would require installing one or more .dlls. It is also more work than I really want to take on immediately, but I may bite the bullet.
If you were me, given what you might think LDS IT might be up to, would you undertake that effort? Or should I release the current scripts with their shortcomings?