Why is there No Simple Text File of the Standard Works?

Moderator: hedrickef

marianomarini
Senior Member
Posts: 619
Joined: Sat Jan 19, 2008 3:13 am
Location: Vicenza. Italy

Re: Why is there No Simple Text File of the Standard Works?

Postby marianomarini » Mon Jul 11, 2016 8:25 am

There is also free software that can extract plain text from PDF files.
Have a look on the Internet.

rmrichesjr
Community Moderators
Posts: 2024
Joined: Thu Jan 25, 2007 11:32 am
Location: Dundee, Oregon

Re: Why is there No Simple Text File of the Standard Works?

Postby rmrichesjr » Mon Jul 11, 2016 5:46 pm

Marianomarini, thanks for that mention. Perchance, are you aware of any of those free programs that can unravel the two-column format of the PDF files? Last I checked, the downloadable scripture PDF files are in two-column format, the same as the paper scriptures. The pdftotext tool I tried early this year did extract plain text, but the text was somewhat tangled due to the two-column format. To have usable plain text would have still required manual or automated untangling.
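For anyone who wants to try the automated untangling, here is a minimal Python sketch. It assumes the PDF was converted with `pdftotext -layout`, which keeps the two columns side by side on each line and separates pages with form-feed characters. It also assumes the gap between the columns falls at a fixed character position, `gutter`, which you would have to measure from your own output. The function names are just illustrative, and headings that span both columns, footnotes, and words that spill across the gutter are not handled.

```python
def split_columns(page: str, gutter: int) -> str:
    """Return one page with the left column first, then the right column."""
    left, right = [], []
    for line in page.splitlines():
        # Text before the gutter belongs to the left column, the rest to the right.
        left.append(line[:gutter].strip())
        right.append(line[gutter:].strip())
    # Blank entries come from layout padding and empty rows; drop them.
    return "\n".join(part for part in left + right if part)


def untangle(layout_text: str, gutter: int) -> str:
    """Apply split_columns to every page of `pdftotext -layout` output."""
    # pdftotext separates pages with form-feed characters by default.
    pages = layout_text.split("\f")
    return "\n".join(split_columns(p, gutter) for p in pages if p.strip())
```

Reading each page's left column before its right column is the same order you would follow in the paper scriptures.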

marianomarini
Senior Member
Posts: 619
Joined: Sat Jan 19, 2008 3:13 am
Location: Vicenza. Italy

Re: Why is there No Simple Text File of the Standard Works?

Postby marianomarini » Tue Jul 12, 2016 3:05 am

You could check VeryPDF; it can preserve even the two-column layout in the text file. I'll have a look.

ross.rick
New Member
Posts: 14
Joined: Sun Mar 06, 2011 6:28 am

Re: Why is there No Simple Text File of the Standard Works?

Postby ross.rick » Mon Oct 30, 2017 10:16 am

No one answered the question 'why'. Lacking an answer from the church, I'd guess that it has to do with control of the copyrighted text.

There are lots of ways to 'hack' a text copy of the scriptures based on files you can download from lds.org/media-library/ebooks. I will explain one relatively easy method below that doesn't require any programming or scripting knowledge. But I will preface that with a worry: it is hard to guarantee the accuracy of the output. I would hate, for example, to generate an ASCII copy of the text and have errors in it, especially if sharing it with someone else. Anyway...

First, grab the epub version of the scriptures from lds.org. It appears that the epub versions don't have footnotes etc. to start with, which is what seems to be desired here.

Second, get a free and open-source program called 'calibre'. It is an e-reader/ebook manager that many people apparently like. I was able to easily install it on my Linux-based computer, and it appears they have versions for macOS and Windows as well. I installed it on Linux using my distribution-provided package, something the folks at calibre apparently recommend against, but it worked fine for me for this purpose.

Use the command line as follows:


ebook-convert input.epub output.txt


where input and output are the base filenames. For example, to convert the Book of Mormon:


ebook-convert book-of-mormon-eng.epub book-of-mormon-eng.txt


That will take the epub file book-of-mormon-eng.epub (downloaded from lds.org) and output a text file with the same name, but with a .txt extension instead of .epub.
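If you have several epub files, a short Python script can run that same command on each of them. This is a sketch, assuming calibre's `ebook-convert` is on your PATH and the epub files sit together in one folder; `txt_name` and `convert_all` are made-up names for illustration.

```python
import subprocess
from pathlib import Path


def txt_name(epub: Path) -> Path:
    """Same base name as the epub, with .txt instead of .epub."""
    return epub.with_suffix(".txt")


def convert_all(folder: Path) -> list[Path]:
    """Run calibre's ebook-convert on every .epub file in `folder`."""
    outputs = []
    for epub in sorted(folder.glob("*.epub")):
        out = txt_name(epub)
        # Equivalent to typing: ebook-convert input.epub output.txt
        # check=True stops with an error if a conversion fails.
        subprocess.run(["ebook-convert", str(epub), str(out)], check=True)
        outputs.append(out)
    return outputs
```

For example, `convert_all(Path.home() / "Downloads")` would convert every epub in that folder and return the list of .txt files written. Stopping on the first failure seemed safer than silently skipping a book.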

You should be able to do that for each of the 4 books of the standard works, resulting in 4 text files. You could merge those 4 resulting text files into a single file using various means.
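As one example of those various means, here is a small Python sketch that concatenates text files in the order you list them, with a blank line between books. `merge_text_files` is an illustrative name, and it assumes the files are UTF-8 encoded.

```python
from pathlib import Path


def merge_text_files(parts: list[Path], combined: Path) -> None:
    """Concatenate text files in the given order, separated by a blank line."""
    with combined.open("w", encoding="utf-8") as out:
        for i, part in enumerate(parts):
            if i:
                # Blank line between one book and the next.
                out.write("\n\n")
            out.write(part.read_text(encoding="utf-8").rstrip("\n"))
        out.write("\n")
```

Use whatever file names your conversions actually produced. On Linux, `cat file1.txt file2.txt > all.txt` does much the same thing without any programming.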

Note that if you try to use the .pdf file as the source in the above conversion, the results may not be what you want, because it tries to deal with all the footnotes, and it appears there may be other problems as well.

Note I read about this calibre method here: https://askubuntu.com/questions/102458/how-can-i-convert-epub-files-to-plain-text#102475

I hope this helps someone.

ross.rick
New Member
Posts: 14
Joined: Sun Mar 06, 2011 6:28 am

Re: Why is there No Simple Text File of the Standard Works?

Postby ross.rick » Mon Oct 30, 2017 11:26 am

Update to my previous post:

The epub Book of Mormon file I downloaded from lds.org today (30 Oct 2017) has no footnotes and converted beautifully to plain text as described in my post above.

However, the other epub scripture files on lds.org today all do have footnotes (contrary to my poor assumption). And the table entries for Doctrine and Covenants, Pearl of Great Price, and Triple Combination all link to the same Triple Combination epub file.

I'm in the process of trying to convert the Holy Bible epub to text, but it is taking my less-than-powerful PC a very long time. I suspect it will try to handle the footnotes, so it may not produce clean plain text.

I think there may be other non-LDS resources on the internet for plain text versions of the Bible, including the King James Version.

For the Doctrine and Covenants and the Pearl of Great Price, you may have to hack a little harder to strip out the footnotes, etc.

For what it is worth.

rmrichesjr
Community Moderators
Posts: 2024
Joined: Thu Jan 25, 2007 11:32 am
Location: Dundee, Oregon

Re: Why is there No Simple Text File of the Standard Works?

Postby rmrichesjr » Mon Oct 30, 2017 7:53 pm

Excellent! Thank you for posting those pointers and instructions.

ross.rick
New Member
Posts: 14
Joined: Sun Mar 06, 2011 6:28 am

Re: Why is there No Simple Text File of the Standard Works?

Postby ross.rick » Mon Oct 30, 2017 8:36 pm

At least with epub it should be easier to strip out the footnotes and superscripts, and you don't have to deal with the two-column format of the PDF files. An epub is basically a zipped archive of HTML files, based on XML. I think Python (and other languages) should be very capable of parsing out the text.
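To illustrate the zipped-archive point, here is a minimal Python sketch that opens an epub with the standard `zipfile` module and lists its content files in reading order. It follows the general EPUB container layout (META-INF/container.xml points to an .opf package file, whose spine lists the chapters in order). I have not checked it against the lds.org files specifically, `spine_files` is an illustrative name, and it does not decode percent-encoded file names.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_files(epub) -> list[str]:
    """Archive paths of an epub's content files, in reading order.

    `epub` may be a file path or an open binary file object.
    """
    with zipfile.ZipFile(epub) as zf:
        # META-INF/container.xml names the package (.opf) file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
    # The manifest maps item ids to file paths relative to the .opf file.
    base = posixpath.dirname(opf_path)
    hrefs = {item.get("id"): item.get("href")
             for item in opf.findall("opf:manifest/opf:item", OPF_NS)}
    # The spine lists those item ids in reading order.
    return [posixpath.join(base, hrefs[ref.get("idref")])
            for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)]
```

Each returned name can then be read with `zipfile.ZipFile(epub).read(name)` to get that chapter's HTML.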

rmrichesjr
Community Moderators
Posts: 2024
Joined: Thu Jan 25, 2007 11:32 am
Location: Dundee, Oregon

Re: Why is there No Simple Text File of the Standard Works?

Postby rmrichesjr » Mon Oct 30, 2017 8:40 pm

ross.rick wrote:At least with epub it should be easier to strip out the footnotes and superscripts, and you don't have to deal with the two-column format of the PDF files. An epub is basically a zipped archive of HTML files, based on XML. I think Python (and other languages) should be very capable of parsing out the text.


Based on the epub files reportedly being HTML on the inside, another possible solution would be the html2text utility available for Linux and likely for other systems.
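If html2text isn't handy, Python's built-in `html.parser` can do a rough version of the same job. This is a stand-in, not html2text itself. It assumes the footnote letters are wrapped in `<sup>` tags, which is a guess worth checking by opening one of the HTML files inside the epub; footnote text gathered elsewhere in the file would still come through. `PlainText` and `html_to_text` are illustrative names.

```python
from html.parser import HTMLParser


class PlainText(HTMLParser):
    """Collect visible text, dropping anything inside SKIP tags."""

    # Assumption: footnote markers are <sup> elements; check the real files.
    SKIP = {"sup", "script", "style", "head"}
    # Tags that should start a new line in the output.
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; etc.
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(markup: str) -> str:
    """Plain text, one block per line, with runs of whitespace collapsed."""
    parser = PlainText()
    parser.feed(markup)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Each HTML file inside the epub can be read with `zipfile.ZipFile(...).read(name).decode("utf-8")` and passed to `html_to_text`.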

aebrown
Community Administrator
Posts: 15107
Joined: Tue Nov 27, 2007 8:48 pm
Location: Sandy, Utah

Re: Why is there No Simple Text File of the Standard Works?

Postby aebrown » Wed Nov 01, 2017 8:51 am

ross.rick wrote:No one answered the question 'why'. Lacking an answer from the church, I'd guess that it has to do with control of the copyrighted text.

There are lots of ways to 'hack' a text copy of the scriptures based on files you can download...

These are helpful techniques, and it's nice that you shared them. But everyone should be aware that content derived from copyrighted content is still under the same copyright restrictions as the original. So text obtained this way cannot be used in apps or published in other ways without specific permission from the Church Intellectual Property Office. But if it's for personal use, then you can use the derived text in the same ways you might use the original text.

nahomie10
New Member
Posts: 1
Joined: Sat Jul 27, 2019 4:31 pm

Re: Why is there No Simple Text File of the Standard Works?

Postby nahomie10 » Sat Jul 27, 2019 4:31 pm

brettbkg wrote:I wish I had the skills to write such a script. I work in healthcare and I'm not deeply technical, so I'm afraid I just had to do it the long and hard way. The work has been done (It took a few hours). I'm happy to make it available to others in my situation who can't write scripts to automate it (I could link to it in a public folder in my Dropbox) -- I just don't know if anyone would frown on that from a legal/copyright standpoint.



I would love a copy of your spreadsheet if you are sharing. : ) My email is nahomie10@gmail.com

