
Triple combo in Spanish and in text/ascii?

Posted: Mon Apr 25, 2016 8:36 am
by christensenrb
Hi, I would like to use a text version of the triple combination to compile a vocabulary list in Spanish. I have been able to find a PDF in Spanish, but converting it to something that I can sort and remove duplicates from is frustrating. The PDF-to-RTF conversion is too good: it leaves graphics and lines in the text. Converting that to ASCII is then another problem. Does anyone have any ideas?

rc

Re: Triple combo in Spanish and in text/ascii?

Posted: Tue Apr 26, 2016 11:01 am
by sbradshaw
You can open the PDF, select all, and copy (depending on how fast your computer is, selecting and copying might each take a minute or two), then paste into a plain text document (not RTF). Pasting as plain text will remove all of the graphics, lines, and formatting.

After pasting, if the accents are messed up, you can fix them with a series of find-and-replace passes:

Replace "a ́" with "á"
Replace "A ́" with "Á"
Replace "e ́" with "é"
Replace "E ́" with "É"
Replace " ́i" with "í"
Replace " ́I" with "Í"
Replace "ı ́" with "í"
Replace "I ́" with "Í"
Replace "o ́" with "ó"
Replace "O ́" with "Ó"
Replace "u ́" with "ú"
Replace "U ́" with "Ú"
Replace "n ̃" with "ñ"
Replace "N ̃" with "Ñ"
Replace " ́a" with "á"
Replace " ́A" with "Á"
Replace " ́e" with "é"
Replace " ́E" with "É"
Replace " ́o" with "ó"
Replace " ́O" with "Ó"
Replace " ́u" with "ú"
Replace " ́U" with "Ú"
Replace "- " with "" (take care of hyphens that split words between lines)
Replace "  " with " " (remove double spaces)

You might need to manually clean up page numbers and footnotes, as well as the footnote letters in the verses, but this might help you get started.
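
If you would rather script it than run each find-and-replace by hand, here is a rough Python sketch of the same list, applied in the same order. It assumes the pasted text really contains a letter, a space, and a combining accent (U+0301 or U+0303) the way the list shows; the function name `clean_pasted_text` is made up for this example, and the uppercase Ñ pair is my assumption.

```python
ACUTE = "\u0301"  # combining acute accent
TILDE = "\u0303"  # combining tilde

# (old, new) pairs, in the same order as the list above.
REPLACEMENTS = [
    ("a " + ACUTE, "á"), ("A " + ACUTE, "Á"),
    ("e " + ACUTE, "é"), ("E " + ACUTE, "É"),
    (" " + ACUTE + "i", "í"), (" " + ACUTE + "I", "Í"),
    ("\u0131 " + ACUTE, "í"),  # dotless i followed by the accent
    ("I " + ACUTE, "Í"),
    ("o " + ACUTE, "ó"), ("O " + ACUTE, "Ó"),
    ("u " + ACUTE, "ú"), ("U " + ACUTE, "Ú"),
    ("n " + TILDE, "ñ"),
    ("N " + TILDE, "Ñ"),  # assumption: uppercase counterpart of the line above
    (" " + ACUTE + "a", "á"), (" " + ACUTE + "A", "Á"),
    (" " + ACUTE + "e", "é"), (" " + ACUTE + "E", "É"),
    (" " + ACUTE + "o", "ó"), (" " + ACUTE + "O", "Ó"),
    (" " + ACUTE + "u", "ú"), (" " + ACUTE + "U", "Ú"),
    ("- ", ""),  # rejoin words that were hyphenated across lines
]


def clean_pasted_text(text):
    """Apply the replacements in order, then collapse runs of spaces."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    while "  " in text:
        text = text.replace("  ", " ")
    return text
```

Because the pairs run in a fixed order, a word where an accent sits between two vowels can still come out wrong, so spot-check the result.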

Re: Triple combo in Spanish and in text/ascii?

Posted: Tue Apr 26, 2016 11:47 am
by sbradshaw
Even better, you can start with the Braille text file available on this page:
https://www.lds.org/topics/disability/materials/braille?lang=spa

1) Fix the accents.

Replace "(" with "Á"
Replace "!'" with "É"
Replace "/" with "Í"
Replace "+" with "Ó"
Replace ")" with "Ú"
Replace "\" with "Ü"
Replace "]" with "Ñ"

2) Convert everything to lowercase.

3) Convert letters with periods in front of them to capital letters.

Replace ".a" with "A"
Replace ".b" with "B"
Replace ".c" with "C"
etc. Don't forget the accented letters!
Replace "." with "" (clean up double periods)
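
Steps 1 to 3 could be scripted along these lines. This is only a sketch: the codes are copied from the lists above exactly as written, the function name `braille_steps_1_to_3` is invented for the example, and the last line removes any leftover periods, which is how I read the final replace in step 3.

```python
import re

# Step 1 codes, copied verbatim from the list above.
BRAILLE_ACCENTS = [
    ("(", "Á"), ("!'", "É"), ("/", "Í"), ("+", "Ó"),
    (")", "Ú"), ("\\", "Ü"), ("]", "Ñ"),
]


def braille_steps_1_to_3(text):
    # Step 1: turn the accent codes into accented letters.
    for code, letter in BRAILLE_ACCENTS:
        text = text.replace(code, letter)
    # Step 2: lowercase everything.
    text = text.lower()
    # Step 3: a period in front of a letter marks a capital letter.
    # [^\W\d_] matches any letter, including accented ones.
    text = re.sub(r"\.([^\W\d_])", lambda m: m.group(1).upper(), text)
    # Remove any periods that are left over.
    return text.replace(".", "")
```

Using a regular expression for step 3 avoids writing one replace per letter, accented letters included.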

4) Fix the punctuation.

Replace "'" with "."
Replace "1" with ";"
Replace "2" with ","
Replace "3" with ":"
Replace " 5'" with "¿"
Replace "5'" with "?"
Replace " 6'" with "¡"
Replace "6'" with "!"
Replace "8'" with """
Replace "9 " with ""
Replace "--" with "—"
Replace "<" with "("
Replace ">" with ")"

5) Fix the numbers.

I'm not sure of the most efficient way to do this, because numbers can have multiple digits, but here's a start:

Replace "#a" with "1"
Replace "#b" with "2"
Replace "#c" with "3"
Replace "#d" with "4"
Replace "#e" with "5"
Replace "#f" with "6"
Replace "#g" with "7"
Replace "#h" with "8"
Replace "#i" with "9"
Replace "#j" with "0"

Replace "1a" with "11"
Replace "1b" with "12"
Replace "1c" with "13"
etc.
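
One way to handle multi-digit numbers in a single pass is a regular expression that grabs the "#" and every letter a-j after it. This is a sketch, assuming the text has already been lowercased (step 2) and that a number is always a "#" followed only by the letters a through j; `convert_numbers` is a name I made up.

```python
import re

# a-i stand for 1-9 and j stands for 0, as in the list above.
LETTER_TO_DIGIT = str.maketrans("abcdefghij", "1234567890")


def convert_numbers(text):
    """Turn '#' followed by letters a-j into the digits they encode."""
    return re.sub(
        r"#([a-j]+)",
        lambda m: m.group(1).translate(LETTER_TO_DIGIT),
        text,
    )
```

If a number is ever followed directly by a word that starts with one of a-j, with no space in between, those letters would be converted too, so check a few verse numbers afterward.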

Re: Triple combo in Spanish and in text/ascii?

Posted: Tue Apr 26, 2016 12:26 pm
by sbradshaw
Sorry for posting three times in a row! If all you want is a list of words, after step 3, you can do this:
4) Replace all punctuation (using the codes above) with spaces.
5) Replace all spaces with line breaks
6) Select all, copy, and paste into an Excel document
7) Sort the column alphabetically
8) Remove any rows you don't want
9) Use a formula to return only the unique rows
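
If you'd rather skip Excel, the split, sort, and de-duplicate steps can be sketched in Python. This assumes the text is already in ordinary Unicode with the number codes removed or converted; the names `strip_accents` and `vocabulary` are invented for the example. The sort ignores accents, so "ñ" sorts together with "n" rather than after it as a Spanish dictionary would.

```python
import re
import unicodedata


def strip_accents(word):
    """Return the word without combining marks, for use as a sort key."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def vocabulary(text):
    """Return the unique words in text, lowercased and sorted."""
    # [^\W\d_]+ matches runs of letters only, so punctuation and digits
    # act as separators.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return sorted(set(words), key=lambda w: (strip_accents(w), w))
```

Writing the result out with one word per line gives the same kind of list as the Excel column.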