
CSV delimiter changed in other languages?

Posted: Sat Oct 24, 2009 4:18 pm
by RossEvans
Today I heard from a German MLS user that in his CSV export files, the field delimiter is a semicolon rather than a comma.

I looked in our own MLS system settings and export settings, and I see nowhere to change the delimiter. Does anyone know if this difference is hard-coded for certain languages?

Posted: Sat Oct 24, 2009 9:23 pm
by mkmurray
boomerbubba wrote: I looked in our own MLS system settings and export settings, and I see nowhere to change the delimiter. Does anyone know if this difference is hard-coded for certain languages?
Just a few thoughts...

Perhaps commas in certain fields (like names) are more common in that culture, making a different delimiter more sensible? Or perhaps when MLS sees a comma in a field, it switches the delimiter automatically (this kind of logic would have been hard-coded)?

Posted: Sat Oct 24, 2009 9:38 pm
by aebrown
boomerbubba wrote: Today I heard from a German MLS user that in his CSV export files, the field delimiter is a semicolon rather than a comma.

I looked in our own MLS system settings and export settings, and I see nowhere to change the delimiter. Does anyone know if this difference is hard-coded for certain languages?
It is definitely not hard-coded for any language I have tried to this point with the test data; I can run MLS in French, Spanish, Portuguese, or German, and the field delimiter is still a comma. So simply changing the language in the MLS.properties file does not change the field delimiter.

However, there are locale options that are set by the area office. Some of them might not be visible from System Options, and a field delimiter could be among them. Might those options be different if MLS was installed with a different language? Or might there be something different about installing the test data when you are running in a different language? Or could MLS in non-English languages run differently with live data than the test data? Those options would take longer to test than I have right now.

Posted: Sat Oct 24, 2009 9:45 pm
by RossEvans
mkmurray wrote: Perhaps commas in some certain field (like names) is more common in that culture (making a different delimiter more sensible)? Or perhaps when MLS sees a comma in a field, it switches the delimiter automatically (this kind of logic would have been hard coded)?

The behavior reported in German MLS does not affect the commas within fields, such as the Preferred Name field where the internal format remains "LastName, FirstNames". Rather, it affects the delimiter between fields.

What is odd is that the CSV format in my English MLS exports follows a consistent format of always enclosing every field in double quotes, regardless of data type and regardless of whether it includes commas or not. (That is different from the Excel way of doing CSV, which uses quotes only when necessary). So the presence of commas in the data really does not explain the need for semicolons.
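A minimal sketch of why always quoting makes the comma delimiter safe. It uses Python's standard `csv` module purely for illustration (nothing in the thread says MLS is written this way), and the record values are hypothetical. With `csv.QUOTE_ALL`, the comma inside "LastName, FirstNames" is data, not a separator, so reading the file back gives exactly the original fields.

```python
import csv
import io

# Hypothetical record; the Preferred Name keeps its internal "LastName, FirstNames" comma.
rows = [["Preferred Name", "Birth Date"], ["Smith, John Paul", "12 Mar 1970"]]

def write_all_quoted(rows, delimiter=","):
    """Write rows with every field enclosed in double quotes (csv.QUOTE_ALL)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

text = write_all_quoted(rows)
print(text)
# "Preferred Name","Birth Date"
# "Smith, John Paul","12 Mar 1970"

# Reading it back yields two fields per row; the embedded comma is not treated as a delimiter.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed == rows)  # True
```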

What I don't get is where this change is set in MLS. If it is hard-coded by language, which languages use the semicolon? And are there other conventions used by other languages?

This is the sort of thing that should be documented, if not by MLS developers then at least in the Wiki article.

Posted: Sat Oct 24, 2009 10:58 pm
by russellhltn
Alan_Brown wrote: However, there are locale options that are set by the area office. Some of them might not be visible from System Options, and a field delimiter could be among them. Might those options be different if MLS was installed with a different language? Or might there be something different about installing the test data when you are running in a different language? Or could MLS in non-English languages run differently with live data than the test data? Those options would take longer to test than I have right now.
In addition, there could be differences based on the Windows language (I've found that localized versions of Windows differ in more than just the language). There are also the Regional Settings in Windows. And we can't rule out the possibility of a different distribution of MLS in other countries.

boomerbubba wrote: What is odd is that the CSV format in my English MLS exports follows a consistent format of always enclosing every field in double quotes, regardless of data type and regardless of whether it includes commas or not. (That is different from the Excel way of doing CSV, which uses quotes only when necessary).
I work with CSV quite a bit. I find Excel's way to be the "odd" one. But I agree that I can't think of a reason where a semicolon would be necessary or preferred over the comma.

Posted: Sun Oct 25, 2009 8:26 pm
by rmrichesjr
RussellHltn wrote:...
I work with CSV quite a bit. I find Excel's way to be the "odd" one. But I agree that I can't think of a reason where a semicolon would be necessary or preferred over the comma.
If I understand correctly, much of Europe uses the comma to separate the integer and fractional parts of a decimal number, writing "12,5" where the US convention would be to write "12.5". Given that Excel is rather commonly used, might it be European convention to use semicolons? Might CSV mean semiColon Separated Variable in Europe?

Posted: Sun Oct 25, 2009 8:33 pm
by russellhltn
rmrichesjr wrote: If I understand correctly, much of Europe uses the comma to separate the integer and fractional parts of a decimal number, writing "12,5" where the US convention would be to write "12.5". Given that Excel is rather commonly used, might it be European convention to use semicolons?
Good point! Given that some implementations of CSV (like Excel export) would not quote numbers, the use of a comma for a decimal point would be an issue. In that case, I'd suspect the way it's exported depends on the regional settings in Control Panel.
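A small Python illustration of the decimal-comma problem described above. The values are hypothetical, and the "exporter that never quotes numbers" is simulated with a plain `str.join`; it is not a claim about exactly how Excel behaves. Joined with commas, "12,5" splits into two fields; joined with semicolons, it survives intact.

```python
import csv
import io

# Hypothetical values; 12.5 written the German way is "12,5".
name, amount = "Smith", "12,5"

def parse_line(line, delimiter):
    """Parse one CSV line with the standard library reader."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))

# Simulated exporter that never quotes numbers, joining fields with commas:
comma_line = ",".join([name, amount])       # Smith,12,5
print(parse_line(comma_line, ","))          # ['Smith', '12', '5']  (three fields, not two)

# The same unquoted fields joined with semicolons stay unambiguous:
semicolon_line = ";".join([name, amount])   # Smith;12,5
print(parse_line(semicolon_line, ";"))      # ['Smith', '12,5']

# For comparison, a writer that quotes when needed (csv.QUOTE_MINIMAL, the default)
# also protects the decimal comma:
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow([name, amount])
print(buf.getvalue())                       # Smith,"12,5"
```

So the semicolon only becomes necessary when an exporter leaves numbers unquoted; a writer that quotes every field, as the English MLS exports reportedly do, avoids the ambiguity with a plain comma.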

Posted: Sun Oct 25, 2009 8:53 pm
by aebrown
RussellHltn wrote: Good point! Given that some implementations of CSV (like Excel export) would not quote numbers, the use of a comma for a decimal point would be an issue. In that case, I'd suspect the way it's exported depends on the regional settings in Control Panel.
Do we know for certain that MLS exports with semicolons in Germany?

We've had a long discussion with all sorts of speculation, but I cannot get MLS to export CSV with semicolons. I've tried altering the Regional Settings to specify German as the language. When I do this, Excel will indeed start exporting CSV with semicolon delimiters, but MLS does not, even if I also change the MLS language to German.

Is it possible that the original second hand report comes from someone who used Excel or OpenOffice on the exported file, which transformed commas to semicolons, and then examined the file to see the field separator?

Posted: Mon Oct 26, 2009 7:45 am
by RossEvans
Alan_Brown wrote: Do we know for certain that MLS exports with semicolons in Germany? ...

Is it possible that the original second hand report comes from someone who used Excel or OpenOffice on the exported file, which transformed commas to semicolons, and then examined the file to see the field separator?

Thank you for your work. This may have been a red herring.

The original incident was reported in a current thread at the Ward Tools support group. (Ward Tools is English-dependent, and some international users have been trying to edit their CSV files to look like English MLS.) There have been multiple warnings there to users not to use spreadsheets to edit the CSV data. But I see that in an overnight comment there, the user refers to the use of an "Excel macro." I have inquired about this in the other forum, but for now I share your hypothesis that it is Excel that substituted the semicolons for commas.

EDIT: The German MLS user now has confirmed that the MLS export files do use a comma. The semicolons came from Excel. Sorry about the wild goose chase.
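For users whose files have already been re-saved by a spreadsheet, here is a rough Python sketch of restoring the comma-delimited, all-quoted layout described earlier in this thread. The delimiter guess (whichever of comma or semicolon splits the header line into more fields) is a simple heuristic of my own, not anything MLS, Excel, or Ward Tools documents, and the sample data is hypothetical.

```python
import csv
import io

def guess_delimiter(text, candidates=(",", ";")):
    """Heuristic (an assumption): pick the candidate that splits the header into the most fields."""
    lines = text.splitlines()
    header = lines[0] if lines else ""
    return max(candidates, key=lambda d: len(next(csv.reader([header], delimiter=d), [])))

def normalize_to_comma(text):
    """Re-emit CSV text as comma-delimited with every field double-quoted."""
    delimiter = guess_delimiter(text)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    out = io.StringIO()
    csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Hypothetical file as re-saved by a spreadsheet under German regional settings.
excel_output = 'Preferred Name;Birth Date\n"Smith, John Paul";12.03.1970\n'
print(normalize_to_comma(excel_output))
# "Preferred Name","Birth Date"
# "Smith, John Paul","12.03.1970"
```

This only repairs the delimiter and quoting. It cannot undo other changes a spreadsheet may have made to the data (reformatted dates, dropped leading zeros, and so on), which is one more reason for the warnings against editing the export files in a spreadsheet.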

Wiki updated to document MLS as of v 3.1.4

Posted: Fri Jan 15, 2010 9:17 am
by RossEvans
I have belatedly updated the wiki documentation for the ward MLS export files as of v 3.1.4. Most importantly, this update captures the very significant changes made to the exports beginning with v 3.1.0.

Apologies for being so late with this update. The editors of the wiki will just have to dock my pay. :)