How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindle)

(http://stream-recorder.com/forum/showthread.php?t=5426)

Quote:

Originally Posted by ch mn (Post 21374)

I finally found the tools to try to convert a topaz book to something more portable (maybe?). The book downloaded to my PC as an .azw file along with a .mbp file, but it looked like a topaz book and Skindle identified it as a topaz book. I renamed it Book.tpz for simplicity and ran the following script:

python cmbtc_dump.py -d -o Book Book.tpz

I got the following result:
File "cmbtc_dump.py", line 774
except Exception as message:
^
SyntaxError: invalid syntax

However this script seemed to work:
python cmbtc_dump_nonk4pc.py -d -o Book -p abCdeFgh Book.tpz

Following the instructions, I then ran:
python gensvg.py Book

Which in the end resulted in the following:
page0055.dat
Traceback (most recent call last):
File "gensvg.py", line 405, in <module>
sys.exit(main(''))
File "gensvg.py", line 329, in main
flat_xml = convert2xml.main(pargv)
File "C:\EZSkindle\Topaz\lib\convert2xml.py", line 789, in main
xmlpage = pp.process()
File "C:\EZSkindle\Topaz\lib\convert2xml.py", line 703, in process
tag = self.procToken(self.dict.lookup(v))
File "C:\EZSkindle\Topaz\lib\convert2xml.py", line 439, in procToken
subtagres.append(self.procToken(self.dict.lookup(val)))
File "C:\EZSkindle\Topaz\lib\convert2xml.py", line 439, in procToken
subtagres.append(self.procToken(self.dict.lookup(val)))
File "C:\EZSkindle\Topaz\lib\convert2xml.py", line 140, in lookup
print "Error - %d outside of string table limits" % val
TypeError: int argument required

Of course everything after that failed.
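
A note on that last traceback: the line that fails is the error report itself. `%d` only accepts a number, so if `val` is anything else (a string, None), the formatting raises TypeError and hides the original "outside of string table limits" problem. Why `val` was not an integer in convert2xml.py is not something the traceback shows. A minimal sketch of the effect, using a made-up helper name (`describe_bad_index` is not part of the Topaz tools):

```python
def describe_bad_index(val):
    """Build the error message without assuming val is an integer.

    %r accepts any object, so the report itself cannot raise TypeError.
    The one-element tuple keeps this safe even if val is itself a tuple.
    """
    return "Error - %r outside of string table limits" % (val,)


# Reproduce the failure from the traceback: %d with a non-number.
try:
    message = "Error - %d outside of string table limits" % "abc"
except TypeError:
    message = describe_bad_index("abc")
```

With `%r` the bad value is printed as-is, which at least shows what the lookup was given.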

Although I am an engineer, I am not a programmer. I don't know if I got a buggy version of the tools, if there is something weird about this book, or if I did something wrong. I have not tried another book, as this is the only one I have, unless someone can point me to a free download that is definitely a topaz book that these tools have successfully been used on.

I installed Python 2.6.6 (I may have had 2.4 before) and a new version of tools.zip. Whichever of the two did it, I was able to convert my one Topaz book. TopazExtract_Kindle4PC.pyw still did not work, though; I had to use TopazExtract_KindleV1_iPhone_iPad.pyw.
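
A note on the earlier SyntaxError: the `except Exception as message:` form was only added in Python 2.6, so an older interpreter such as 2.4 rejects that line before the script even starts. That would fit the Python upgrade being what fixed it, though the newer tools.zip cannot be ruled out. A minimal sketch of a version check (the function name is hypothetical, not something from the tools):

```python
import sys


def supports_except_as(version_info=sys.version_info):
    """Return True if this interpreter accepts 'except X as name:'.

    That syntax first appeared in Python 2.6; earlier versions only
    understand the older 'except X, name:' spelling.
    """
    return tuple(version_info[:2]) >= (2, 6)


if not supports_except_as():
    sys.exit("Python 2.6 or newer is needed to parse this script")
```

Running `python -V` at the command prompt gives the same information without any code.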

Most of the text is OK, but the fancy chapter titles and drop caps are pretty screwed up. I do not have a Kindle, but the book was free from Amazon. I will have to try converting it to epub for my Nook in Calibre.

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl

Once you have the XHTML files, is there a way to 'easily' extract the images in order to do your own OCR on them? I can't find a way to create any image format from the XHTML pages. I believe I could do a better job OCRing them than Amazon has done.

Edit: I figured it out. Once you have the .xhtml files from TopazFiles2SVG.pyw, you can open Sigil and add them as stated in the first post (which gives little explanation). Here is more detail: when you start a new project, you'll notice some folder names in the left frame. One of these is "Text". Add all the .xhtml files here. If the images don't load automatically, add the image files from the img folder produced by TopazFiles2SVG.pyw to the Sigil Images folder. Save this as an ePub book. However, since the pages are really scanned images, this ePub will be very large. I used Calibre to create a PDF from the ePub so I can use my own OCR program to create a text version of the ebook.
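
If the goal is only to get the page images back out for OCR, the PDF step may not be needed: an ePub is an ordinary ZIP container, so the images saved from Sigil can be copied out directly. A rough sketch, assuming the ePub is the DRM-free one saved from Sigil above; the function name and the extension list are my own choices, not anything from Sigil or Calibre:

```python
import os
import zipfile

# Extensions treated as images; adjust to whatever the book contains.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg")


def extract_images(epub_path, out_dir):
    """Copy every image inside an ePub (a ZIP container) into out_dir.

    Files are written under their base name only, so two images with the
    same name in different folders would overwrite each other.
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    saved = []
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if name.lower().endswith(IMAGE_EXTENSIONS):
                target = os.path.join(out_dir, os.path.basename(name))
                with open(target, "wb") as handle:
                    handle.write(book.read(name))
                saved.append(target)
    return saved
```

Calling `extract_images("Book.epub", "pages")` (both names are placeholders) returns the list of files written, which can then be fed to an OCR program.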

I hope this helps to understand the final steps of the process. I realize that these steps are simple compared to the much more complex programming behind the original poster's steps for obtaining the image files (i.e. the .xhtml files) but, for me, it helped to generate my final deDRMed ebook.

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl

You will need 4 sets of tools:

1. "Calibre" or some other tool which can convert between multiple types of eBook, including HTML.
Anyone wanting to convert books presumably has this program anyway... essential software.

2. "Sigil" for correcting & editing ePub ebooks

3. Python 2.6 (not 2.7 or later afaik)

4. DRM tools downloaded from links posted in the following thread :
DRM Tools Archive

Use the Topaz tools from the above download in option 4. Unzip them, and search the directories for the Topaz one.
There is a Readme text file included with the Topaz tools that explains clearly how to use them - they have a GUI interface.

The result of the Topaz DRM removal tools is a "book.html" with an associated CSS style sheet file, and an "img" directory with photos, illustrations, etc.

In order to convert to another ebook format, I had to drag the "book.html" file into Calibre, which 'converts' it to a Zip format ebook; then I had to select/download a cover image for it and convert it to ePub format.

However, this process is not perfect, and each of the 3 books I extracted using the Topaz tools had some errors in them when they reached the ePub format in Calibre. Some characters (in headings only) were mis-translated, making nonsense words, and hyperlinks in tables of contents were consistently "off-by-one", linking to the wrong chapters. Also, apart from the cover, only black and white images were extracted for some reason.

So I had to use Sigil to edit the resulting ePub book in order to tidy up the files and correct mistakes in the Table of Contents. Once I had cleaned up this DRM-free, converted ePub, I could then convert it to DRM-free Mobi, etc.

Re: skindle - remove DRM from KindleForPC ebooks (mobi and topaz)

Quote:

Originally Posted by Stream Recorder (Post 24831)

Can't find anything about your error. Try using unswindle, topazscripts, or DeDRM AppleScript for Mac OS X 10.5, 10.6 instead.

Thanks!

TopazExtract.pyw said:

Code:

Parse Error : Invalid Record, record not found

o_O This is one bad day...

Any ideas?

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl

Quote:

Originally Posted by coderkid (Post 24832)

TopazExtract.pyw said:

Code:

Parse Error : Invalid Record, record not found

Did you download your PRC file with Kindle for PC and use topaz scripts on the same computer?

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl

Quote:

Originally Posted by Stream Recorder (Post 24843)

Did you download your PRC file with Kindle for PC and use topaz scripts on the same computer?

Actually no, I downloaded them from another source.

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl

Quote:

Originally Posted by coderkid (Post 24863)

Actually no, I downloaded them from another source.

Because your PRC file is not from Amazon, AFAIK topaz scripts won't work.

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl

ebook DRM removal tools archive

Quote:

Originally Posted by README_KindleBooks.txt

KindleBooks (Originally called K4MobiDeDRM and Topaz_Tools)

This tool combines functionality of MobiDeDRM with that of K4PCDeDRM, K4MDeDRM, and K4DeDRM. Effectively, it provides one-stop shopping for all your Mobipocket, Kindle for iPhone/iPad/iPodTouch, Kindle for PC, and Kindle for Mac needs and should work for both Mobi and Topaz ebooks.

Preliminary Steps:

1. Make sure you have Python 2.X installed (32 bit) and properly set as part of your SYSTEM PATH environment variable (On Windows I recommend ActiveState's ActivePython. See their web pages for instructions on how to install and how to properly set your PATH). On Mac OSX 10.6 everything you need is already installed.

****
Please Note: If you are a happy user of MobiDeDRM, K4DeDRM, K4PCDeDRM, or K4MUnswindle, please continue to use these programs, as there is no additional capability provided by this tool over the others. In the long run, if you have problems with any of those tools, you might want to try this one, as it will continue under development, eventually replacing all of those tools.
****

Instructions:

1. double-click on KindleBooks.pyw

2. In the window that opens:
hit the first '...' button to locate your DRM Kindle-style ebook

3. Then hit the second '...' button to select an output directory for the unlocked file

4. If you have multiple Kindle.Info files and would like to use one specific one, please hit the third '...' button to select it. Note, if you only have one Kindle.Info file (like most users) this can and should be left blank.

5. Then add in any PIDs you need from KindleV1, Kindle for iPhone/iPad/iPodTouch, or other single PID devices to the provided box as a comma separated list of 10 digit PID numbers. If this is a Kindle for Mac or a Kindle for PC book then you can leave this box blank

6. If you have standalone Kindles, add in any 16 digit Serial Numbers as a comma separated list. If this is a Kindle for Mac or a Kindle for PC book then you can leave this box blank

7. hit the 'Start' button

After a short delay, you should see progress in the Conversion Log window indicating whether the unlocking was a success or failure.

If your book was a normal Mobi style ebook:
If successful, you should see a "_nodrm" named version Mobi ebook.
If not please examine the Conversion Log window for any errors.

If your book was actually a Topaz book:

Please note that Topaz is most similar to a poor man's image-only PDF in style. It has glyphs and x,y positions, plus ocrText used just for searching, that describe the image of each page, all encoded into a binary xml-like set of files.

If successful, you will have 3 zip archives created.

1. The first is BOOKNAME_nodrm.zip.
You can import this into calibre as is or unzip it and edit the book.html file you find inside. To create the book.html, Amazon's ocrText is combined with other information to recreate as closely as possible what the original book looked like. Unfortunately, most bolding and italics are lost. Also, Amazon's ocrText can be absolutely horrible at times. Much work will be needed to clean up and correct Topaz books.

2. The second is BOOKNAME_SVG.zip
You can also import this into calibre or unzip it and open the indexsvg.xhtml file in any good browser (Safari, Firefox, etc). This zip contains a set of svg images (one is created for each page) and it shows each page exactly how it appeared. This zip can be used to create an image-only pdf file via post conversion.

3. The third is BOOKNAME_XML.zip
This is a zip archive of the decrypted and translated xml-like descriptions of each page and can be archived/saved in case later code can do a better job converting these files. These are exactly what a Topaz book's guts are. You should take a look at them in any text editor to see what they look like.

If the Topaz book conversion is not successful, a large _DEBUG.zip archive of all of the pieces is created, and this can be examined along with the Conversion Log window contents to determine the cause of the error and hopefully get it fixed in the next release.

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl

Has anyone had any success in getting "SKINDLE" to work at all?
