Using Web Tools to Map the Members

Use this forum to discuss issues that are not found in any of the other clerk and stake technology specialist forums.
rmrichesjr
Community Moderators
Posts: 2723
Joined: Thu Jan 25, 2007 11:32 am
Location: Dundee, Oregon, USA

Postby rmrichesjr » Tue Jun 17, 2008 7:34 pm

boomerbubba wrote:...
Of the six .csv files available for download, four are designed with column headings that are pretty much self-explanatory. The other two files, PalmFamily.csv and PalmIndividual.csv, don't have column headings. I have always been able to divine the data elements I need. There is probably documentation for these file layouts somewhere, but I have not stumbled across it. The detail data in the two Palm files is not all neatly fielded, but concatenated together for display with annoying embedded line feed characters. It can be parsed programmatically, but with some effort. The other files are easier to deal with.

Hope this helps.


The line feed characters are the normal line delimiter on some operating systems. There is a tiny program called unix2dos that converts from unix linefeed-based line delimiters to DOS's convention. A Google search for the program's name found a site that claims to have the program. (It also found a web site that claims to do the conversion, but that would require sending the MLS-based data to that website.)
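For anyone without unix2dos handy, the same conversion is only a few lines in any scripting language. A minimal sketch in Python (filenames are placeholders, not from any actual tool):

```python
# Sketch of a unix2dos-style conversion: normalize to LF first so
# existing CR/LF pairs are not doubled, then expand LF to CR/LF.
def unix2dos(src_path, dst_path):
    with open(src_path, "rb") as f:
        data = f.read()
    data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    with open(dst_path, "wb") as f:
        f.write(data)
```

This keeps the data local, unlike the web-based converters mentioned above.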

mkmurray
Senior Member
Posts: 3241
Joined: Tue Jan 23, 2007 9:56 pm
Location: Utah

Postby mkmurray » Tue Jun 17, 2008 8:05 pm

rmrichesjr wrote:The line feed characters are the normal line delimiter on some operating systems. There is a tiny program called unix2dos that converts from unix linefeed-based line delimiters to DOS's convention. A Google search for the program's name found a site that claims to have the program. (It also found a web site that claims to do the conversion, but that would require sending the MLS-based data to that website.)

But yes, Wordpad would also do the trick. ;)

aebrown
Community Administrator
Posts: 15123
Joined: Tue Nov 27, 2007 8:48 pm
Location: Sandy, Utah

Postby aebrown » Tue Jun 17, 2008 8:51 pm

rmrichesjr wrote:The line feed characters are the normal line delimiter on some operating systems. There is a tiny program called unix2dos that converts from unix linefeed-based line delimiters to DOS's convention. A Google search for the program's name found a site that claims to have the program. (It also found a web site that claims to do the conversion, but that would require sending the MLS-based data to that website.)


The last three posts don't seem to address the complication of the line feed characters, as they are focused only on the challenges mentioned with the Notes field, but ignore the overall file format.

The file format is a Windows (essentially DOS) text file in comma-separated format. That means each field may optionally be surrounded by double quotes (the MLS export chooses to always include them), and fields are separated by commas. Each record is terminated by a CR/LF combination (ASCII 13 followed by ASCII 10). So, setting aside the Notes field for now, each record is on one line of the text file.

The Notes field introduces a complication in that it is a multi-line field. That doesn't fit so well within a file structure where each record occupies just one line. The solution that Palm came up with (don't blame the MLS developers for this one) is to have simple LF characters (ASCII 10, equivalent to a Unix newline) between the lines of the Notes field. Since the CR/LF pair is the record delimiter, the LF characters don't break the record.

It could be argued that being inside quotes should cause any line break to be treated as part of the field, and not a record terminator, but many applications that process CSV files don't respect that convention. Excel will, but I'm not sure about other applications. So converting the LF to CR/LF pairs may simplify some parsing, but since you have to give special handling to line breaks inside quotes as opposed to those outside quotes, I'm not sure you've actually made your life any easier.
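Python's csv module is one example of an application that does respect the quoting convention described above: a bare LF inside a quoted field is kept as part of the field rather than ending the record. A small sketch with invented sample data:

```python
import csv
import io

# One record whose quoted third field contains a bare LF (as the
# Palm exports do). The record itself is terminated by CR/LF.
raw = '"Smith","123 Main St","line one\nline two"\r\n'

# newline="" prevents line-ending translation, so the csv module
# sees the raw LF and treats it as part of the quoted field.
rows = list(csv.reader(io.StringIO(raw, newline="")))
```

Here `rows` contains a single three-field record, with the embedded line break preserved inside the third field.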

MikeJ-p40
New Member
Posts: 5
Joined: Thu Jun 12, 2008 9:53 pm
Location: Huntington Beach, CA

Postby MikeJ-p40 » Tue Jun 17, 2008 11:09 pm

Alan_Brown wrote:The last three posts don't seem to address the complication of the line feed characters, as they are focused only on the challenges mentioned with the Notes field, but ignore the overall file format.


I thought that Boomerbubba was interested in reading the file so he could figure out its contents. For that, WordPad or any other program that can display the document would suffice. Certainly, when it comes to actually using the file, the embedded control characters must be handled properly.

MikeJ-p40
New Member
Posts: 5
Joined: Thu Jun 12, 2008 9:53 pm
Location: Huntington Beach, CA

Postby MikeJ-p40 » Tue Jun 17, 2008 11:22 pm

I got a stripped-down version of the MLS file with a single line containing my information. When I run it through mls2GoogleEarth I do not get a lat/lon position for my address. The Status line reads "Finished! (0 addresses found.)". The KML output is

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
<Document>
<name>Our Ward</name>
<Style id="blueicon">
<LabelStyle>
<color>ffffffff</color>
<scale>1</scale>
</LabelStyle>
<IconStyle>
<scale>1</scale>
</IconStyle>
</Style>
<Folder>
<name>Locations</name>
<open>1</open>
<!-- Line 1 is blank.-->
</Folder></Document></kml>


Any ideas what is going on?

Mike

RossEvans
Senior Member
Posts: 1346
Joined: Wed Jun 11, 2008 8:52 pm
Location: Austin TX

Postby RossEvans » Tue Jun 17, 2008 11:57 pm

MikeJ wrote:I thought that Boomerbubba was interested in reading the file so he could figure out its contents. For that, WordPad or any other program that can display the document would suffice. Certainly, when it comes to actually using the file, the embedded control characters must be handled properly.


I do parse the Palm files programmatically. It just takes writing some code. (BTW, thanks to Alan for the documentation link.) My warning above was that no one should expect to write programs using the contents without such extra effort. Parsing flat files with only discrete column delimiters is much simpler.

If you just want to browse the contents by eye, the embedded LFs are actually benign. Just double-click on PalmFamily.csv and it will open in Excel. The large detail field with the embedded line feeds displays with the intended formatting because Excel honors LFs to break lines within a cell.

RossEvans
Senior Member
Posts: 1346
Joined: Wed Jun 11, 2008 8:52 pm
Location: Austin TX

Pondering where to go on geocoding / mapping development

Postby RossEvans » Wed Jun 18, 2008 8:42 pm

I am approaching the end of development on my own script-based solution to the problem of geocoding ward addresses and mapping the output in Google Earth. The complexity of the task grew to the point that it really ought to have a real user interface and a proper SQL back-end, but that is more ambitious than I set out to be. I am wondering what others think about how I should proceed.

My system right now consists of two VBScripts written to be run standalone, using flat files as a "database." The process assumes that the user wants to deal with exceptions that fall out of the batch-geocoding process because of quality problems, so they can be corrected or manually geocoded if possible. All this works fine for me as developer/user, but I hesitate to release it into the wild as-is. It really ought to be a bulletproof app.

My process, tested only on my U.S. urban ward, works like this:

1. A critical step is to scrub the addresses in the ward MLS system to meet postal standards, both initially and as an ongoing maintenance task. The importance of this step cannot be overemphasized. Almost all errors have as their root cause missing or malformed address records upstream. It is the clearest example of Garbage In, Garbage Out that I can think of. Getting the data right is the core of the challenge, and no one will be happy for long with batch solutions that ignore this problem.

2. Obtain regular, authorized extracts from MLS. One file that is a core requirement is PalmFamily.csv. This will support generation of a basic ward map of families plotted as a .kml or .kmz file (assuming WinZip or equivalent is available for the latter.) To support optional views of home-teaching and visiting-teaching assignments and "routes," it is necessary also to obtain a copy of Membership.csv, HomeTeaching.csv and VisitingTeaching.csv. All extracts must occur at the same time. I suggest weekly.

3. Place all these files in a dedicated folder. This folder also will be used by the scripts to write and read database files of its own, at least one of which is a quasi-permanent table in the sense that it should persist between updates.

4. Run the batch geocoding script. It does some basic parsing and scrubbing for a few common format errors, then submits addresses as individual transactions to the Yahoo geocoding API. The results, including three error and warning fields that describe the quality and uncertainty of the geocoding, are written to a new database file.

5. The geocoding script also writes and maintains an exceptions file of addresses that are populated but could not be geocoded by Yahoo. The initial entry is appended to the exceptions file with blank lat/lon coordinates. These are addresses that Yahoo fails to geocode even though they are valid and complete. It is likely that the handful of such exceptions can be manually geocoded from other free services, such as Google or Mapquest. The requirement is to store these manually generated lat/lon coordinates so this work does not have to be repeated in subsequent batch updates.

6. There likely will be other errors and warnings captured in the geocoded database for each batch. The quick-and-dirty way to browse the results is in a spreadsheet, using common sort and filter functions. The right way to fix these records, if possible, is to fix the addresses upstream in MLS and get the corrections in the next export.

6(a) Some of these records have no address, have fatally incomplete snippets, or have undeliverable addresses -- likely keyed in by some membership clerk sometime, somewhere. They cannot be geocoded without research. I reject records for which Yahoo provides a lat/lon based on anything less than street-address matching. Zip centroids, city centroids, etc., are rejected. (This key distinction is missing from the otherwise cool implementation at batchgeocode.com, which drove me to undertake this project in the first place. Zip-centroid "geocoding" for my purposes is worthless, but that web site passes through these errors without warning.)

6(b) Other records get geocoded by Yahoo but with warnings, in which Yahoo describes the action it took to make a good guess. In my experience, a large majority of these guesses are correct and benign -- fixing misspellings of streets, for example. But a few will be wrong. The user should browse the warnings and, almost always, fix the bad data upstream in MLS.

7. After this review, and after manually geocoding any newly identified lat/lon coordinates in the permanent exceptions file, the user is finally ready to generate the .kml files for Google Earth. My second script does this, and will generate three .kml files:

* A Families.kml file, showing every household in the ward that could be geocoded. The only personal information in this file is a subset of that available for download from the directory on LDS.org -- basically name, address and phone -- so I think it is suitable for dissemination to any ward member. I believe that could even be done securely via LDS.org with no programming.

* A HomeTeaching.kml file, showing each family broken down according to home-teaching assignment, along with the assigned home teachers. The folders that display on the left side of the Google Earth interface follow the basic hierarchy of HomeTeaching.csv: quorum, then district, then companionship, then teachers and families taught grouped together. Unassigned families have their own folder. The icon shapes distinguish between teachers and families taught, and all icons are color-coded: blue for EQ families, green for HP families and orange for Unassigned. This tool is primarily designed for quorum presidencies to help maintain their companionships and assignments. There is even native functionality within the Google Earth interface that allows leaders to cut and paste families from one companionship to another in what-if fashion. The results can be exported, and the export file ultimately could be imported into Excel for the leaders to use. (TBD by scripting, but later versions of Excel will read the file and flatten it into a table.)

* A VisitingTeaching.kml file, with similar functionality for Relief Society leaders. (Honestly, this function is a work in progress, but I see only a few hours work to complete it.)
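For readers curious about the .kml generation step described above, the core of it can be sketched in a few lines. This is not boomerbubba's actual VBScript; the function name, input layout, and output structure are illustrative only (Python used for brevity):

```python
from xml.sax.saxutils import escape

def families_kml(households):
    """Sketch of a Families.kml writer.

    households: iterable of (name, lat, lon) tuples, assumed to come
    from an upstream geocoding step.
    """
    placemarks = "\n".join(
        "<Placemark><name>{}</name>"
        "<Point><coordinates>{},{}</coordinates></Point></Placemark>".format(
            escape(name), lon, lat)  # note: KML wants lon,lat order
        for name, lat, lon in households)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://earth.google.com/kml/2.0">'
            '<Document><Folder><name>Locations</name>\n'
            + placemarks +
            '\n</Folder></Document></kml>')
```

The grouped home-teaching and visiting-teaching views would nest additional `<Folder>` elements (quorum, district, companionship) around the same placemark structure.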

Creating these files is the cool part, but it is relatively easy. It's just code. The core complexity of this whole process involves the user's steps between the geocoding script and the other script to produce the final output. It would have been easy to use a platform such as MS Access, and tie everything together neatly. But I am reluctant to do that for sharing in venues such as this because I don't like depending on such a proprietary tool. Most or all units, for example, do not have Access on the clerk's computer.

Also, I have been aiming for a minimal installation footprint. Right now, it does not use anything that does not ship with Windows.

I think if I were to take on the task of creating a real application, I would convert the VBScript to Visual Basic for the front end, and use a public-domain SQL engine on the back end. That would require installing one or more .dlls. It also is more work than I really want to take on immediately, but I may bite the bullet.

If you were me, given what you might think LDS IT might be up to, would you undertake that effort? Or should I release the current scripts with their shortcomings?

The_Earl
Member
Posts: 278
Joined: Wed Mar 21, 2007 8:12 am

Postby The_Earl » Thu Jun 19, 2008 8:55 am

boomerbubba wrote:
I think if I were to take on the task of creating a real application, I would convert the VBScript to Visual Basic for the front end, and use a public-domain SQL engine on the back end. That would require installing one or more .dlls. It also is more work than I really want to take on immediately, but I may bite the bullet.

If you were me, given what you might think LDS IT might be up to, would you undertake that effort? Or should I release the current scripts with their shortcomings?


Sounds like a good tool.

You are probably OK to release it to this forum, most of the tools here are not any more polished than what you describe. Your tool fills a common need, and would benefit from a strong following this early.

A few suggestions:

You might try ODBC for your database backend. Then you can use whatever engine you want, without worrying about tying it to a specific one. A second choice would be SQL Server Express, as it is free, and probably ties nicely with VB.

It would also be nice if your app did not re-geocode known addresses. Since you have already looked some people / addresses up, it seems a waste to do that again. It also minimizes the number of times church data is sent to a third party. You probably would need the DB connection above first to be able to track which addresses you already have, so I put them in that order.
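The no-re-geocoding suggestion amounts to a simple lookup cache in front of the geocoding call. A sketch, with `geocode` standing in for the real Yahoo API call and all names hypothetical:

```python
def geocode_all(addresses, cache, geocode):
    """Resolve addresses, consulting a persistent cache first.

    cache: dict mapping address -> (lat, lon); only addresses not
    already in the cache are sent to the external geocode() service.
    """
    results = {}
    for addr in addresses:
        if addr not in cache:      # only new addresses hit the API
            cache[addr] = geocode(addr)
        results[addr] = cache[addr]
    return results
```

Persisting the cache (to a flat file or a database table) between runs is what keeps church data from being re-sent to the third party on every update.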

I would not hold your breath for CHQ to pick up your tool. Projects from this site do make it into development at CHQ, but your app is more useful at the local level, and would probably take a few months to scale for use church wide. You are better off getting the community involved to make it a well-polished tool.

Thanks
The Earl

RossEvans
Senior Member
Posts: 1346
Joined: Wed Jun 11, 2008 8:52 pm
Location: Austin TX

Postby RossEvans » Thu Jun 19, 2008 9:47 am

The Earl wrote:You are probably OK to release it to this forum, most of the tools here are not any more polished than what you describe. Your tool fills a common need, and would benefit from a strong following this early.


As of today I am leaning in that direction. I likely will do so within a week or so.

The Earl wrote:You might try ODBC for your database backend. Then you can use whatever engine you want, without worrying about tying it to a specific one. A second choice would be SQL Server Express, as it is free, and probably ties nicely with VB.


ODBC and its cousin OLEDB would be my choice. The question is what ODBC/OLEDB should connect to. (Actually, even my current VBScripts use ODBC. They connect to Microsoft's native Jet 4.0 engine, which allows me to run SQL queries against the .csv files directly. But I can't write to the files with an INSERT or UPDATE statement. And I have to juggle all those .csv files in the file system.)

For a real application I would use another engine. I have considered MS SQL Server Express, but it has a bigger footprint and more complexity than I want, requiring installing it and running it as a service. I think all that client-server stuff at the Windows level is unnecessary for such a small app. I may use public-domain SQLite, which is made for embedding and is available with ODBC or OLEDB wrappers. See http://www.sqlite.org/
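As an illustration of how small the SQLite footprint is compared to a client-server engine: no service, no installation beyond a wrapper, just a single file next to the .csv exports. A sketch using Python's bundled sqlite3 module (table and column names are invented, not from the actual scripts):

```python
import sqlite3

# In-memory for the demo; a real run would use a file path so the
# geocoded results persist between batch updates.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE geocoded (
    address   TEXT PRIMARY KEY,
    lat       REAL,
    lon       REAL,
    precision TEXT)""")          # e.g. 'address' vs 'zip' centroid
conn.execute("INSERT INTO geocoded VALUES (?, ?, ?, ?)",
             ("123 Main St", 30.27, -97.74, "address"))
conn.commit()
row = conn.execute("SELECT lat, lon FROM geocoded WHERE address = ?",
                   ("123 Main St",)).fetchone()
```

Unlike the Jet-over-CSV approach, INSERT and UPDATE work normally here, which removes the need to juggle flat files as a makeshift database.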

The Earl wrote:I would not hold your breath for CHQ to pick up your tool. Projects from this site do make it into development at CHQ, but your app is more useful at the local level, and would probably take a few months to scale for use church wide. You are better off getting the community involved to make it a well-polished tool.


I have been intrigued by hints in several threads that church IT is working on something to do with geocoding and/or the production of GIS-like output. Ideally, MLS should be the repository of lat/lon coordinates. The current free-form "Geocode" text field in MLS is a rather antiquated legacy.

So one thing that keeps me from plunging into development of a full-blown app is the prospect that this significant effort would soon be made obsolete.

The other piece of this that ideally should be integrated into MLS would be standardizing the addresses. I realize there are problems supporting this for all countries, but in the U.S. there are well-established USPS standards and licensed tools for validating and standardizing addresses in the postal format. The most efficient way to do that is at the church level, not the ward, and the functionality would be great within MLS.

The_Earl
Member
Posts: 278
Joined: Wed Mar 21, 2007 8:12 am

Postby The_Earl » Thu Jun 19, 2008 10:46 am

boomerbubba wrote:As of today I am leaning in that direction. I likely will do so within a week or so.



ODBC and its cousin OLEDB would be my choice. The question is what ODBC/OLEDB should connect to. (Actually, even my current VBScripts use ODBC. They connect to Microsoft's native Jet 4.0 engine, which allows me to run SQL queries against the .csv files directly. But I can't write to the files with an INSERT or UPDATE statement. And I have to juggle all those .csv files in the file system.)

For a real application I would use another engine. I have considered MS SQL Server Express, but it has a bigger footprint and more complexity than I want, requiring installing it and running it as a service. I think all that client-server stuff at the Windows level is unnecessary for such a small app. I may use public-domain SQLite, which is made for embedding and is available with ODBC or OLEDB wrappers. See http://www.sqlite.org/


Oh, I had assumed you meant something a bit bigger, such as an engine with GUI tools like Postgres or MySQL. SQLite is a good choice.

boomerbubba wrote:I have been intrigued by hints in several threads that church IT is working on something to do with geocoding and/or the production of GIS-like output. Ideally, MLS should be the repository of lat/lon coordinates. The current free-form "Geocode" text field in MLS is a rather antiquated legacy.

So one thing that keeps me from plunging into development of a full-blown app is the prospect that this significant effort would soon be made obsolete.

The other piece of this that ideally should be integrated into MLS would be standardizing the addresses. I realize there are problems supporting this for all countries, but in the U.S. there are well-established USPS standards and licensed tools for validating and standardizing addresses in the postal format. The most efficient way to do that is at the church level, not the ward, and the functionality would be great within MLS.


The geocode field in MLS is explicitly for local use, and it does not sound like it will ever contain lat/long. If I were grouping people in MLS, I would explicitly put something other than lat/long in the geocode field, since lat/long values are only useful with boundaries and are not easy to interpret.

I would also not worry about obsolescence. The 'hints' that I have seen do not imply a tool similar to the one you are making, nor are they likely to come out soon.

The Earl

