Re: Why is there No Simple Text File of the Standard Works?

Posted: Mon Jul 11, 2016 8:25 am
by marianomarini
There is also free software that extracts plain text from PDF files.
Have a look on the Internet.

Re: Why is there No Simple Text File of the Standard Works?

Posted: Mon Jul 11, 2016 5:46 pm
by rmrichesjr
Marianomarini, thanks for that mention. Perchance, are you aware of any of those free programs that can unravel the two-column format of the PDF files? Last I checked, the downloadable scripture PDF files are in two-column format, the same as the paper scriptures. The pdftotext tool I tried early this year did extract plain text, but the text was somewhat tangled due to the two-column format. To have usable plain text would have still required manual or automated untangling.

Re: Why is there No Simple Text File of the Standard Works?

Posted: Tue Jul 12, 2016 3:05 am
by marianomarini
You can check VeryPDF; it can preserve even the two-column layout in the text file. I'll have a look.

Re: Why is there No Simple Text File of the Standard Works?

Posted: Mon Oct 30, 2017 10:16 am
by ross.rick
No one answered the question 'why'. Lacking an answer from the church, I'd guess that it has to do with control of the copyrighted text.

There are lots of ways to 'hack' a text copy of the scriptures based on files you can download from lds.org/media-library/ebooks. I will explain one relatively easy method below that doesn't require any programming or scripting knowledge. But I will preface that with my worry that it is hard to guarantee the accuracy of the output. I would hate, for example, to generate an ASCII copy of the text and have there be errors in it, especially if sharing it with someone else. Anyway...

First, grab the epub version of the scriptures from lds.org. It appears that the epub versions don't have footnotes etc. to start with, which is what seems to be desired here.

Second, get a free and open-source program called 'calibre'. It is an e-reader/ebook manager that many people apparently like. I was able to easily install it on my Linux-based computer. It appears they have versions for macOS and Windows as well. I installed it on Linux using my distribution-provided package, something that the folks at calibre apparently recommend against, but it worked fine for me for this purpose.

Use the command line as follows:

ebook-convert input.epub output.txt


where input and output are the base filenames. For example, to convert the Book of Mormon:

ebook-convert book-of-mormon-eng.epub book-of-mormon-eng.txt


That will take the epub file book-of-mormon-eng.epub (downloaded from lds.org) and output a text file of the same name, but with a .txt extension instead of .epub.

You should be able to do that for each of the 4 books of the standard works, resulting in 4 text files. You could merge those 4 resulting text files into a single file using various means.
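One of those "various means" could be a few lines of Python. This is only a minimal sketch: the function name merge_text_files and the output filename are made up for illustration, and it assumes the converted files are UTF-8, which I believe is calibre's default for text output.

```python
from pathlib import Path


def merge_text_files(sources, destination):
    """Concatenate the text files in `sources`, in order, into `destination`."""
    # Assumption: every input file is UTF-8 encoded.
    with open(destination, "w", encoding="utf-8") as out:
        for source in sources:
            text = Path(source).read_text(encoding="utf-8")
            out.write(text)
            # Ensure each book ends with a newline so the next one
            # starts on its own line.
            if not text.endswith("\n"):
                out.write("\n")
    return destination


# Hypothetical usage: list the four converted files in the order you want.
# merge_text_files(
#     ["book-of-mormon-eng.txt", ...],  # the other three files go here
#     "standard-works.txt",             # made-up output name
# )
```

The order of the list decides the order of the books in the merged file, so the result is easy to check by eye.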

Note that if you try to use the .pdf file as the source in the above conversion, the results may not be what you want, because it tries to deal with all the footnotes, and it appears there may be other problems as well.

Note I read about this calibre method here: https://askubuntu.com/questions/102458/how-can-i-convert-epub-files-to-plain-text#102475

I hope this helps someone.

Re: Why is there No Simple Text File of the Standard Works?

Posted: Mon Oct 30, 2017 11:26 am
by ross.rick
Update to my previous post:

The Book of Mormon epub file I downloaded from lds.org today (30 Oct 2017) has no footnotes and converted beautifully to plain text as per my above post.

However the other epub scripture files on lds.org today all do have footnotes (contrary to my poor assumption). And the table entries for Doctrine and Covenants, Pearl of Great Price, and Triple combination all link to the same Triple combination epub file.

I'm in the process of trying to convert the Holy Bible epub to text, but it is taking my less-than-powerful PC a very long time. I suspect it will try to handle the footnotes, so it may not produce clean plain text.

I think there may be other non-LDS resources out on the internet to get plain text versions of the Bible, including the King James Version.

For the Doctrine and Covenants and the Pearl of Great Price, you may have to hack a little harder to strip out the footnotes etc.

For what it is worth.

Re: Why is there No Simple Text File of the Standard Works?

Posted: Mon Oct 30, 2017 7:53 pm
by rmrichesjr
Excellent! Thank you for posting those pointers and instructions.

Re: Why is there No Simple Text File of the Standard Works?

Posted: Mon Oct 30, 2017 8:36 pm
by ross.rick
At least with epub it should be easier to strip out the footnotes and superscripts, and you don't have to deal with the two-column format of the PDF files. Epub is basically a zipped archive of HTML files, based on XML. I think Python (and other languages) should be quite capable of parsing out the text.
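To make that idea concrete, here is a rough Python sketch using only the standard library. It rests on several assumptions that I have not checked against the actual lds.org files: that the footnote markers are wrapped in sup tags, that the content files end in .xhtml, .html or .htm, and that they are UTF-8. The names TextExtractor, html_to_text and epub_to_text are made up for this example. It also reads files in archive order rather than the reading order given in the epub's package file, so chapters could come out shuffled and navigation pages will be included.

```python
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, dropping anything inside the tags in SKIP."""

    # Assumption: footnote markers live in <sup>; adjust after inspecting a file.
    SKIP = {"sup", "script", "style", "head"}
    BLOCK = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK and self.skip_depth == 0:
            # End each block-level element with a line break.
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(markup):
    """Return the visible text of one HTML/XHTML document."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


def epub_to_text(epub_path):
    """Return the text of every (X)HTML file in the epub, in archive order."""
    chunks = []
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                # Assumption: content files are UTF-8 encoded.
                markup = archive.read(name).decode("utf-8")
                chunks.append(html_to_text(markup))
    return "\n".join(chunks)
```

Before trusting the output, it would be worth unzipping one epub and looking at how the footnotes are actually marked up, then adjusting SKIP to match.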

Re: Why is there No Simple Text File of the Standard Works?

Posted: Mon Oct 30, 2017 8:40 pm
by rmrichesjr
ross.rick wrote: At least with epub it should be easier to strip out the footnotes and superscripts, and you don't have to deal with the two-column format of the PDF files. Epub is basically a zipped archive of HTML files, based on XML. I think Python (and other languages) should be quite capable of parsing out the text.


Based on the epub files reportedly being HTML on the inside, another possible solution would be the html2text utility available for Linux and likely for other systems.

Re: Why is there No Simple Text File of the Standard Works?

Posted: Wed Nov 01, 2017 8:51 am
by aebrown
ross.rick wrote: No one answered the question 'why'. Lacking an answer from the church, I'd guess that it has to do with control of the copyrighted text.

There are lots of ways to 'hack' a text copy of the scriptures based on files you can download...

These are helpful techniques, and it's nice that you shared them. But everyone should be aware that content derived from copyrighted content is still under the same copyright restrictions as the original. So text obtained this way cannot be used in apps or published in other ways without specific permission from the Church Intellectual Property Office. But if it's for personal use, then you can use the derived text in the same ways you might use the original text.

Re: Why is there No Simple Text File of the Standard Works?

Posted: Sat Jul 27, 2019 4:31 pm
by nahomie10
brettbkg wrote: I wish I had the skills to write such a script. I work in healthcare and I'm not deeply technical, so I'm afraid I just had to do it the long and hard way. The work has been done (it took a few hours). I'm happy to make it available to others in my situation who can't write scripts to automate it (I could link to it in a public folder in my Dropbox). I just don't know if anyone would frown on that from a legal/copyright standpoint.



I would love a copy of your spreadsheet if you are sharing. : ) My email is nahomie10@gmail.com