Audio/video stream recording forums (http://stream-recorder.com/forum/index.php)
-   Removing DRM protection from eBooks (http://stream-recorder.com/forum/forumdisplay.php?f=63)

How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindle)

(http://stream-recorder.com/forum/showthread.php?t=5426)

any ANONYMOUS forum user 01-18-2010 03:07 AM

Quote:

Topaz is an Amazon format for Kindle devices. It differs from the AZW format in that it can have embedded fonts in the file itself. A .tan sidefile is used to store metadata and bookmarks and other user generated content on the eBook. The metadata is used to help the library mode to reference information about the eBook itself.

While not much is currently known about the internal format used in a Topaz file there is some likelihood that it is related to the standard AZW format. It uses a different compression than standard MOBI files and it can have embedded fonts in the file allowing more complex display using font sets and characters that are not standard to Amazon Kindle. It is also likely to remove other restrictions found in MOBI files such as image size limitations although some of these may have been removed in AZW as well.

According to one publishing industry blogger, Topaz is an implementation of the open EPUB standard. It follows the OEBPS 2.0 specs, and probably the later IDPF guides. It’s a proprietary implementation which means they use ePUB as the source but then convert it to their internal format.

AZW1 - is an eBook in the Topaz (TPZ) format that has been delivered via Whispernet.

TPZ - is an eBook in the Topaz format that has been delivered via Internet download.

The following is experimental and it will probably not work for you but…

ALSO: Please do not use any of this to steal. Theft is wrong.

This is only meant to allow conversion of Topaz books for other book readers you own.

Here are the steps:
  1. First you must use the python scripts in topazscripts.zip to do the translation from Topaz to HTML

    The files you should have after unzipping are:

    cmbtc_dump.py – (author: cmbtc) decrypts and dumps to files all of the sections, properly numbered and named

    decode_meta.py – converts metadata0000.dat to human readable text

    convert2xml.py – converts page*.dat, other*.dat, and glyphs*.dat files to their “pseudo” xml descriptions.

    flatxml2html.py – converts a “flattened” xml description to html using the ocrtext and markup as its basis.

    stylexml2css.py – converts stylesheet “flattened” xml from other0000.dat into css (as best it can – mainly supporting paragraph style classes)

    genxml.py – main program to convert everything to xml

    genhtml.py – main program to generate “book.html”
  2. You must remove the DRM from the Topaz book and build a directory of its contents using the following commands:

    cmbtc_dump.py -d -o TARGETDIR [-p pid] YOURTOPAZBOOKNAMEHERE

    This should create a directory called “TARGETDIR” in your current directory.

    It should have the following files in it:

    metadata0000.dat – metadata info
    other0000.dat – information used to create a style sheet
    dict0000.dat – dictionary of words used to build page descriptions
    page – directory filled with page*.dat files
    glyphs – directory filled with glyphs*.dat files
  3. You should convert the files in “TARGETDIR” to their xml descriptions
    Please note, this python program uses “decode_meta.py” and “convert2xml.py” so don’t move them.

    genxml.py TARGETDIR
  4. Next attempt a conversion to html where “TARGETDIR” is the directory that was created in step 2.
    Please note, this python program uses “decode_meta.py”, “convert2xml.py”, “flatxml2html.py”, and “stylexml2css.py” so don’t move them.

    genhtml.py TARGETDIR

    Once it completes:

    You should have created the file “book.html” inside of TARGETDIR

    You should also have created the directory xml inside of TARGETDIR
    which has the full xml descriptions of the pages and glyphs for later
    (better) conversion attempts.
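
Before running step 3, the layout listed in step 2 can be checked with a few lines of Python. This is only a sketch: the helper name `missing_entries` is made up for illustration and is not part of topazscripts.zip, and the expected names are simply copied from the step 2 listing.

```python
import os

# Names copied from the step 2 listing; "page" and "glyphs" are directories.
EXPECTED_FILES = ["metadata0000.dat", "other0000.dat", "dict0000.dat"]
EXPECTED_DIRS = ["page", "glyphs"]


def missing_entries(target_dir):
    """Return the expected files/directories that are absent from target_dir.

    Hypothetical helper, not part of the original scripts.
    """
    missing = []
    for name in EXPECTED_FILES:
        if not os.path.isfile(os.path.join(target_dir, name)):
            missing.append(name)
    for name in EXPECTED_DIRS:
        if not os.path.isdir(os.path.join(target_dir, name)):
            # A trailing slash marks the entry as a directory in the report.
            missing.append(name + "/")
    return missing
```

Calling `missing_entries("TARGETDIR")` returns an empty list when everything from step 2 is present. If `dict0000.dat` shows up in the result, the dump step probably did not finish, which would also explain a "Can not find dict0000.dat file" message later on.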

One warning … this is not the best long-term solution because much of the layout is only really correct if drawn to the screen (as an svg). Until that solution exists, this should get you something that you can load into Sigil, clean up, and turn into an ePub that you can then convert to other formats.

Code:

http://www.pastie.org/760591
http://www.mediafire.com/?qmzjmt25yzf
http://rapidshare.com/files/336800633/topazscripts.zip.html

See also:
ebook DRM removal tools archive

elch 03-21-2010 01:55 PM

Code:

http://pastie.org/761169.txt
seems to be more up-to-date.

vinografia 03-27-2010 01:40 AM

How do you use "http://pastie.org/761169.txt"? (I've tried using unswindle but it doesn't work on topaz.)

TIA

elch 03-28-2010 08:29 AM

I don't have a Kindle myself (or whichever device is necessary for Topaz e-books), but I'm a bit interested in encryption-related topics.

You need Python for this script. I'm not a Windows user anymore but there are precompiled binaries which should work fine.

Download the script, start the command line and type: "python script.py filename"

It accepts the following parameters:
Quote:

print("\nCMBDTC.py [options] bookFileName\n")
print("-p Adds a PID to the list of PIDs that are tried to decrypt the book key (can be used several times)")
print("-r Prints or writes to disk a record indicated in the form name:index (e.g \"img:0\")")
print("-o Output file name to write records")
print("-v Verbose (can be used several times)")
print("-i Print kindle.info database")

Stream Recorder 03-31-2010 12:18 AM

The following set of tools can also be used to remove DRM from Amazon Topaz eBooks:
  • TopazExtract_Kindle_iPhone.pyw,
  • TopazFiles2XML.pyw,
  • TopazFiles2SVG.pyw,
  • TopazFiles2HTML.pyw

tools_v1.6b.zip.
Code:

http://www.mediafire.com/?mn3vmttbwrt
The scripts should work with Kindle and iPhone Amazon Topaz Files (.tpz, .azw1). The files are really images of pages with OCR performed on them. Using the tools you can get SVG images of the pages, and the OCRed HTML version for clean-up.

maggerbee 04-03-2010 09:36 AM

WOW, thanks so much for that download! I successfully converted my purchased topaz from Amazon, into an HTML, and then used Calibre to convert it to .epub for use on my HTC Hero (Android Phone) using Aldiko book reader. Thanks sooo much!!!

slopsbox 04-04-2010 09:06 PM

I'm still unable to convert the files (xhtml) that I have, though the ebook itself has been stripped of DRM. I've tried merging the files with Adobe Acrobat Pro and Calibre, without success.

Can someone post the steps to do so? Thanks so much.

teebee 04-07-2010 07:30 PM

Is there anyone able to help me with an error message? I have successfully converted 3 books but am having trouble with the 4th. It strips the DRM, but when I go to convert it to XML I get the following error at page 256: "Error - -1501 outside of string table limits". I did some unsuccessful googling, so if anyone can help me I would appreciate it.

jcklaus 04-11-2010 02:44 PM

Running this I keep getting the error "Can not find dict0000.dat file" What am I doing wrong? Thanks.

any ANONYMOUS forum user 04-12-2010 03:12 AM

How to remove DRM from Topaz ebooks:
  1. Install Python
  2. Open command prompt / terminal and run:

    Code:

    python cmbtc_dump_nonK4PC.py -d -o TARGETDIR -p 12345678 YOURTOPAZBOOKNAMEHERE
    where
    - 12345678 - the first 8 characters of your PID
    - "TARGETDIR" - target directory (can be omitted)
    - YOURTOPAZBOOKNAMEHERE - filename of your Topaz ebook (with the .tpz extension)
  3. Then, again in the command prompt / terminal, run:

    Code:

    python gensvg.py TARGETDIR
  4. Then create an HTML file from the SVG file by running the following in the command line / terminal:

    Code:

    python genhtml.py TARGETDIR
    You should get a "book.html" file in the TARGETDIR directory.
  5. Convert "book.html" to any other format using Calibre.

any ANONYMOUS forum user 04-12-2010 03:12 AM

Quote:

Originally Posted by jcklaus (Post 18075)
Running this I keep getting the error "Can not find dict0000.dat file" What am I doing wrong?

Running what? On what OS? On what files?

jcklaus 04-12-2010 03:35 PM

Running Mac OS X Terminal and using the command line:

python TopazFiles2HTML.pyw MYTOPAZBOOKNAME

Stream Recorder 04-12-2010 11:43 PM

Quote:

Originally Posted by jcklaus (Post 18113)
Running Mac OS X Terminal and using the command line:

python TopazFiles2HTML.pyw MYTOPAZBOOKNAME

The files in the lib directory are used by the script. Make sure to extract the lib directory with the other scripts.

From my understanding, you need to run
1. TopazExtract_Kindle_iPhone.pyw
2. then run TopazFiles2XML.pyw,
3. and then run either TopazFiles2SVG.pyw or TopazFiles2HTML.pyw
Maybe I'm wrong. I don't have a Kindle to check it out.

You can also try running cmbtc_dump.py on your Topaz ebook
Code:

cmbtc_dump.py -d -o TARGETDIR [-p pid] YOURTOPAZBOOKNAMEHERE
and see whether you get any errors.

Maradona10 04-30-2010 11:14 AM

Quote:

Originally Posted by Stream Recorder (Post 17767)
The following set of tools can also be used to remove DRM from Amazon Topaz eBooks:
  • TopazExtract_Kindle_iPhone.pyw,
  • TopazFiles2XML.pyw,
  • TopazFiles2SVG.pyw,
  • TopazFiles2HTML.pyw

tools_v1.6b.zip.
Code:

http://www.mediafire.com/?mn3vmttbwrt
The scripts should work with Kindle and iPhone Amazon Topaz Files (.tpz, .azw1). The files are really images of pages with OCR performed on them. Using the tools you can get SVG images of the pages, and the OCRed HTML version for clean-up.

I have downloaded these scripts. But can you please explain how to execute them properly and in which order? Cause I'm new to python. I have a topaz book on my iphone and want to convert it, but don't know how.

Any help appreciated.

jcklaus 05-01-2010 06:03 PM

Quote:

Originally Posted by Stream Recorder (Post 18120)
The files in the lib directory are used by the script. Make sure to extract the lib directory with the other scripts.

From my understanding, you need to run
1. TopazExtract_Kindle_iPhone.pyw
2. then run TopazFiles2XML.pyw,
3. and then run either TopazFiles2SVG.pyw or TopazFiles2HTML.pyw
Maybe I'm wrong. I don't have a Kindle to check it out.

You can also try running cmbtc_dump.py on your Topaz ebook
Code:

cmbtc_dump.py -d -o TARGETDIR [-p pid] YOURTOPAZBOOKNAMEHERE
and see whether you get any errors.

Whenever I try the first step of running TopazExtract_Kindle_Iphone.pyw I get the following error message:

File "./lib/cmbtc_dump_nonK4PC.py", line 517, in <module>
sys.exit(main())
File "./lib/cmbtc_dump_nonK4PC.py", line 478, in main
bookFile = openBook(args[0])
File "./lib/cmbtc_dump_nonK4PC.py", line 57, in openBook
raise CMBDTCFatal("Could not open book file: " + path)
__main__.CMBDTCFatal: Could not open book file:

Stream Recorder 05-02-2010 01:31 AM

Quote:

Originally Posted by jcklaus (Post 18666)
Whenever I try the first step of running TopazExtract_Kindle_Iphone.pyw I get the following error message:

File "./lib/cmbtc_dump_nonK4PC.py", line 517, in <module>
sys.exit(main())
File "./lib/cmbtc_dump_nonK4PC.py", line 478, in main
bookFile = openBook(args[0])
File "./lib/cmbtc_dump_nonK4PC.py", line 57, in openBook
raise CMBDTCFatal("Could not open book file: " + path)
__main__.CMBDTCFatal: Could not open book file:

Are you trying to remove DRM from Topaz book? Or mobipocket book?

jcklaus 05-02-2010 07:20 AM

I'm trying to remove DRM from my Topaz azw1 files to eventually convert to epub

yankgirl013 06-01-2010 07:13 AM

Hi, I'm hoping someone can help me here. I'm trying to remove a DRM off a topaz file and I'm not getting anywhere. I'm using a Mac OS.
I've downloaded all the files listed, but I keep getting a 'Can not find dict0000.dat file' error.

Is there a way we can 'dumb' the directions down? I've removed them from azw and mobi using terminal and python scripts.

Thanks so much!!!

It really doesn't matter what I convert it to, I can just change it to epub using calibre

any ANONYMOUS forum user 06-01-2010 11:30 AM

Quote:

Originally Posted by yankgirl013 (Post 19314)
I keep getting a 'Can not find dict0000.dat file' error.

Not sure whether it will help

Quote:

Originally Posted by some updates
That means it can’t find your dict0000.dat file which should be right where you are.

This can not be the true error since other pages worked.

Please make sure that all of these are in the same location (i.e. side by side inside of TARGETDIR):

convert2xml.py
dict0000.dat
pageNNNN.dat

where NNNN is the number of the problem page*.dat file

Then make sure you have changed directory (cd) into TARGETDIR and then run

convert2xml.py -d dict0000.dat pageNNNN.dat > debug.txt

where again the NNNN is the number of the page file that does not work.

Then look in debug.txt for “Unknown” or any other warning or error message and post at http://darkreverser.wordpress.com/2008/02/13/new-blog/ what it says around that point in the debug.txt file.
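
The search through debug.txt described above can be sketched in Python. The helper names (`page_filename`, `find_suspect_lines`) and the keyword list are assumptions made for illustration; only "Unknown" is named in the quoted advice, and the four-digit zero padding follows names such as pageNNNN.dat used above.

```python
import re

# "Unknown" comes from the quoted advice; the other two are assumed extras.
KEYWORDS = ("Unknown", "Warning", "Error")


def page_filename(number):
    """Build a name like page0055.dat (zero-padded to four digits)."""
    return "page%04d.dat" % number


def find_suspect_lines(text, keywords=KEYWORDS, context=2):
    """Return (line_number, snippet) pairs for lines mentioning a keyword.

    Each snippet holds the matching line plus `context` lines on either side,
    which is roughly what the advice asks people to post.
    """
    lines = text.splitlines()
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = []
    for index, line in enumerate(lines):
        if pattern.search(line):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append((index + 1, "\n".join(lines[start:end])))
    return hits
```

Reading debug.txt with `open("debug.txt").read()` and passing the text to `find_suspect_lines` gives the line numbers worth quoting when asking for help.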


djpyle 06-09-2010 06:24 PM

Quote:

Originally Posted by jcklaus (Post 18666)
Whenever I try the first step of running TopazExtract_Kindle_Iphone.pyw I get the following error message:

File "./lib/cmbtc_dump_nonK4PC.py", line 517, in <module>
sys.exit(main())
File "./lib/cmbtc_dump_nonK4PC.py", line 478, in main
bookFile = openBook(args[0])
File "./lib/cmbtc_dump_nonK4PC.py", line 57, in openBook
raise CMBDTCFatal("Could not open book file: " + path)
__main__.CMBDTCFatal: Could not open book file:

I'm getting this same error trying to remove DRM from a .tpz file. Any ideas? I'm running OSX 10.6.3.

If I try to run it without the PID, I get:

Traceback (most recent call last):
File "./lib/cmbtc_dump.py", line 37, in <module>
from ctypes import windll, c_char_p, c_wchar_p, c_uint, POINTER, byref, \
ImportError: cannot import name windll

Error: File Extraction Failed
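
The "cannot import name windll" message is what Python reports when `ctypes.windll` is requested on a system other than Windows, so it points at the platform rather than at the book. A minimal sketch of checking for that up front follows. `windll_available` and `suggested_script` are hypothetical helpers, and the choice between the two script names only restates the rule of thumb given by a later reply in this thread.

```python
def windll_available():
    """True when ctypes exposes windll, which happens only on Windows."""
    try:
        from ctypes import windll  # noqa: F401
    except ImportError:
        return False
    return True


def suggested_script(kindle_for_pc):
    """Pick a script name following the thread's rule of thumb.

    Hypothetical helper: cmbtc_dump.py for Kindle for PC books (it imports
    ctypes.windll, per the traceback above), the nonK4PC variant otherwise.
    """
    if kindle_for_pc:
        if not windll_available():
            raise RuntimeError(
                "cmbtc_dump.py imports ctypes.windll, which needs Windows")
        return "cmbtc_dump.py"
    return "cmbtc_dump_nonK4PC.py"
```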

djpyle 06-09-2010 07:45 PM

Never mind, I got it to work using the command line.

Now my next question: is it possible to generate svg pages where the page takes up the entire screen instead of having the back, forward, and zoom in/out options? If so, you could load the xhtml pages in eCub and generate an ePub that would work perfectly in iBooks. Even with the back/forward in/out aspects, it still works well on iBooks, but those things are just taking up unnecessary space in this particular instance.

I know enough to open gensvg.py in an editor, I just don't know what to take out and what to leave in. Anyone have any tips?

djpyle 06-09-2010 07:51 PM

Quote:

Originally Posted by yankgirl013 (Post 19314)
Hi, I'm hoping someone can help me here. I'm trying to remove a DRM off a topaz file and I'm not getting anywhere. I'm using a Mac OS.
I've downloaded all the files listed, but I keep getting a 'Can not find dict0000.dat file' error.

Is there a way we can 'dumb' the directions down? I've removed them from azw and mobi using terminal and python scripts.

Thanks so much!!!

It really doesn't matter what I convert it to, I can just change it to epub using calibre

You need to run cmbtc_dump.py or cmbtc_dump_nonK4PC.py first. The former if you purchased the book for Kindle for PC, the latter if you purchased it for a Kindle or iDevice. That removes the DRM. Then you can convert it using the other scripts.

jcklaus 06-12-2010 05:00 PM

Quote:

Originally Posted by djpyle (Post 19486)
Never mind, I got it to work using the command line.

Now my next question: is it possible to generate svg pages where the page takes up the entire screen instead of having the back, forward, and zoom in/out options? If so, you could load the xhtml pages in eCub and generate an ePub that would work perfectly in iBooks. Even with the back/forward in/out aspects, it still works well on iBooks, but those things are just taking up unnecessary space in this particular instance.

I know enough to open gensvg.py in an editor, I just don't know what to take out and what to leave in. Anyone have any tips?

Could you run through how you used the command line to solve the error I was getting as well? I don't understand. Thanks.

djpyle 06-12-2010 05:27 PM

Quote:

Originally Posted by jcklaus (Post 19564)
Could you run through how you used the command line to solve the error I was getting as well? I don't understand. Thanks.

  1. First instead of using the .pyw program, download the regular .py scripts discussed in the original post. I never could get the .pyw to work. You'll have to use Terminal to run the scripts. Once in Terminal, navigate to the folder containing your scripts and type:

    Code:

    python cmbtc_dump_nonK4PC.py -d -o TARGETDIR -p 12345678 YOURTOPAZBOOKNAMEHERE
    where 12345678 should be replaced by the first 8 characters of your PID. You can use something else for "TARGETDIR" if you want, but you don't have to. The only things you have to change are the PID number and the filename. Use the full file name, including the extension (eg Patient-Zero-A-Joe-Ledger-Novel.tpz).

    This will create a folder called TARGETDIR (or whatever you replaced that with) in the folder with your script.
  2. Then, still in Terminal, type:

    Code:

    python gensvg.py TARGETDIR
  3. This is where I stop because the .html the next step creates is very sloppy and doesn't retain italics, but if you want to create an .html, type:

    Code:

    python genhtml.py TARGETDIR
    This creates a "book.html" file in the TARGETDIR folder, which you can then convert to other formats as you see fit.

Hope that helps. Let me know if you have any problems. I'll try to help as best I can.

jcklaus 06-13-2010 11:51 AM

Quote:

Originally Posted by djpyle (Post 19568)
  1. First instead of using the .pyw program, download the regular .py scripts discussed in the original post. I never could get the .pyw to work. You'll have to use Terminal to run the scripts. Once in Terminal, navigate to the folder containing your scripts and type:

    Code:

    python cmbtc_dump_nonK4PC.py -d -o TARGETDIR -p 12345678 YOURTOPAZBOOKNAMEHERE
    where 12345678 should be replaced by the first 8 characters of your PID. You can use something else for "TARGETDIR" if you want, but you don't have to. The only things you have to change are the PID number and the filename. Use the full file name, including the extension (eg Patient-Zero-A-Joe-Ledger-Novel.tpz).

    This will create a folder called TARGETDIR (or whatever you replaced that with) in the folder with your script.
  2. Then, still in Terminal, type:

    Code:

    python gensvg.py TARGETDIR
  3. This is where I stop because the .html the next step creates is very sloppy and doesn't retain italics, but if you want to create an .html, type:

    Code:

    python genhtml.py TARGETDIR
    This creates a "book.html" file in the TARGETDIR folder, which you can then convert to other formats as you see fit.

Hope that helps. Let me know if you have any problems. I'll try to help as best I can.

THANK YOU! It worked! This had been driving me crazy and you completely solved it for me.

jcklaus 06-13-2010 11:53 AM

Quote:

Originally Posted by yankgirl013 (Post 19314)
Hi, I'm hoping someone can help me here. I'm trying to remove a DRM off a topaz file and I'm not getting anywhere. I'm using a Mac OS.
I've downloaded all the files listed, but I keep getting a 'Can not find dict0000.dat file' error.

Is there a way we can 'dumb' the directions down? I've removed them from azw and mobi using terminal and python scripts.

Thanks so much!!!

It really doesn't matter what I convert it to, I can just change it to epub using calibre

Another user solved this for me. You need to do one additional conversion step in the Terminal using python to turn it into an html file. Type:

python genhtml.py TARGETDIR

Where TARGETDIR is the name of your folder. You'll get an html file called book.html which you can use in calibre

djpyle 06-13-2010 03:40 PM

Quote:

Originally Posted by jcklaus (Post 19578)
THANK YOU! It worked! This had been driving me crazy and you completely solved it for me.

Glad I could help.

jtchris 06-27-2010 12:20 PM

I don't suppose somebody could PM me with the latest share where I could find the scripts, eh?

maggerbee 06-28-2010 02:54 PM

I think this may be it, I uploaded it for ya.

Code:

http://www.mediafire.com/file/4yhzqzjzjvm/tools.zip

headala 06-30-2010 08:12 PM

Is anyone else getting an "Error - -249 outside of string table limits" with this? I get it with both the CLI and the GUI.

headala 07-05-2010 02:19 AM

If there is anyone who could please help me with the above error I would appreciate it.

I've now tried it on a 32-bit system as well, with the same results. Perhaps it's an issue with my ebook?

nerfherder 07-21-2010 02:41 PM

Quote:

Originally Posted by headala (Post 19925)
Is anyone else getting an "Error - -249 outside of string table limits" with this? I get it with both the CLI and the GUI.

I had a similar error. I've had success before, so I think it only happens with certain books. This version of convert2xml.py worked for me:

Code:

http://pastebin.com/RDERReyk
The version number is the same as the one that failed for me, but this one is clearly an improvement.

headala 07-25-2010 09:16 PM

Quote:

Originally Posted by nerfherder (Post 20383)
I had a similar error. I've had success before, so I think it only happens with certain books. This version of convert2xml.py worked for me:

Code:

http://pastebin.com/RDERReyk
The version number is the same as the one that failed for me, but this one is clearly an improvement.

Thank you SO very much for your reply...I was beginning to lose hope that anyone would help!

However, I am still getting the same error. Could you PM me or post the whole topaz tools files you are using? Thanks!

epstewart 08-15-2010 11:56 AM

Help! I accidentally obtained a Topaz e-book and, via Amazon Kindle for Mac, downloaded it. I also downloaded it to Kindle for the iPhone. I would like to convert it to an EPUB for iBooks on the iPhone. As I read through the posts in this thread, I find myself getting very confused. Exactly how do I do what I want to do?

I do know how to use the skindle app to unlock the downloaded file (using Parallels emulator software which lets me run Windows on my Mac). The result can be either a compressed .tpz file or an uncompressed one — but neither of those works with calibre! Does unlocking the original file with skindle get me any closer to the result I'm after?

Some of the Python scripts mentioned here seem to assume one has a physical Kindle, for which one needs to specify the PID (whatever that is). How do I go about doing things if all I have are a Mac and an iPhone, with no PID?

I get the feeling I have to convert the downloaded .tpz file to the SVG format. I have no idea what SVG means, though, or why I can't convert the .tpz file directly to an EPUB. I'd like to find out the answers to both those questions.

I also think I understand that the SVG file then has to be converted to HTML, which can at last be input to calibre and converted to EPUB format. Am I right about that? Again, why not just convert the .tpz file directly to (if not EPUB) HTML?

Any help anyone can give me will be much appreciated ...

Stream Recorder 08-15-2010 09:40 PM

Quote:

Originally Posted by epstewart (Post 20879)
Help! I accidentally obtained a Topaz e-book and, via Amazon Kindle for Mac, downloaded it. I also downloaded it to Kindle for the iPhone. I would like to convert it to an EPUB for iBooks on the iPhone. As I read through the posts in this thread, I find myself getting very confused. Exactly how do I do what I want to do?

I do know how to use the skindle app to unlock the downloaded file (using Parallels emulator software which lets me run Windows on my Mac). The result can be either a compressed .tpz file or an uncompressed one — but neither of those works with calibre! Does unlocking the original file with skindle get me any closer to the result I'm after?

Some of the Python scripts mentioned here seem to assume one has a physical Kindle, for which one needs to specify the PID (whatever that is). How do I go about doing things if all I have are a Mac and an iPhone, with no PID?

I get the feeling I have to convert the downloaded .tpz file to the SVG format. I have no idea what SVG means, though, or why I can't convert the .tpz file directly to an EPUB. I'd like to find out the answers to both those questions.

I also think I understand that the SVG file then has to be converted to HTML, which can at last be input to calibre and converted to EPUB format. Am I right about that? Again, why not just convert the .tpz file directly to (if not EPUB) HTML?

Any help anyone can give me will be much appreciated ...

There is no need to use Parallels Desktop for Mac OS X:
http://stream-recorder.com/forum/sho...8&postcount=24

Anonymouslemming 09-10-2010 04:13 PM

Quote:

Originally Posted by Stream Recorder (Post 20892)

Hi - that post you link to states that you need to know your PID. I can't find that from Kindle4Mac or Kindle4PC. Where would I get that without owning a kindle ?

Also, when I run TopazExtract_Kindle4PC.pyw, I get the following error:

ImportError: cannot import name windll

Stream Recorder 09-11-2010 01:03 AM

Quote:

Originally Posted by Anonymouslemming (Post 21362)
Hi - that post you link to states that you need to know your PID. I can't find that from Kindle4Mac or Kindle4PC. Where would I get that without owning a kindle ?

skindle - remove DRM from KindleForPC ebooks (mobi and topaz)
Removing DRM protection from Kindle for PC ebooks using unswindle
DeDRM AppleScript for Mac OS X 10.5, 10.6

ch mn 09-11-2010 09:03 AM

Script bugs?


 
I finally found the tools to try to convert a topaz book to something more portable (maybe?). The book downloaded to my PC as .azw file along with a .mbp file, but it looked like a topaz book and Skindle identified it as a topaz book. I renamed it Book.tpz for simplicity and ran the following script:

python cmbtc_dump.py -d -o Book Book.tpz

I got the following result:
File "cmbtc_dump.py", line 774
except Exception as message:
^
SyntaxError: invalid syntax

However this script seemed to work:
python cmbtc_dump_nonk4pc.py -d -o Book -p abCdeFgh Book.tpz

Following the instructions, I then ran:
python gensvg.py Book

Which in the end resulted in the following:
page0055.dat
Traceback (most recent call last):
File "gensvg.py", line 405, in <module>
sys.exit(main(''))
File "gensvg.py", line 329, in main
flat_xml = convert2xml.main(pargv)
File "C:\EZSkindle\Topaz\lib\convert2xml.py", line 789, in main
xmlpage = pp.process()
File "C:\EZSkindle\Topaz\lib\convert2xml.py", line 703, in process
tag = self.procToken(self.dict.lookup(v))
File "C:\EZSkindle\Topaz\lib\convert2xml.py", line 439, in procToken
subtagres.append(self.procToken(self.dict.lookup(val)))
File "C:\EZSkindle\Topaz\lib\convert2xml.py", line 439, in procToken
subtagres.append(self.procToken(self.dict.lookup(val)))
File "C:\EZSkindle\Topaz\lib\convert2xml.py", line 140, in lookup
print "Error - %d outside of string table limits" % val
TypeError: int argument required

Of course everything after that failed.

Although I am an engineer, I am not a programmer. I don't know if I got a buggy version of the tools, if there is something weird about this book, or if I did something wrong. I have not tried another book, as this is the only one I have, unless someone can point me to a free download that is definitely a Topaz book that these tools have been used on successfully.
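
Both tracebacks in this post look like Python-level problems rather than something specific to the book. The `except Exception as message:` syntax is only accepted by Python 2.6 and newer, so an older interpreter stops with a SyntaxError at that line. The TypeError is raised by the error-reporting line itself: `%d` needs a number and `val` apparently was not one at that point, so the real complaint (an index outside the string table) never gets printed. The sketch below illustrates both points. The function names are made up for illustration, and swapping `%d` for `%r` is an assumed workaround, not the script authors' fix.

```python
import sys


def supports_except_as(version_info=sys.version_info):
    """'except Exception as message' needs Python 2.6 or newer."""
    return tuple(version_info[:2]) >= (2, 6)


def describe_bad_index(val):
    """Format the string-table error without assuming val is an int.

    %r accepts any object, and wrapping val in a one-element tuple keeps
    the formatting correct even when val is itself a tuple.
    """
    return "Error - %r outside of string table limits" % (val,)
```

With a change like this the script would still fail on the affected page, but the message would at least show what value `val` held when the lookup went wrong.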

Stream Recorder 10-05-2010 07:35 AM

Quote:

Originally Posted by Chris9 (Post 21963)
I used TopazExtract_Kindle4PC.pyw on a PRC file successfully ... but when I use topazfiles2html.pyw I just get an error message: "HTML conversion failed"

If I use topazfiles2xml.pyw it will work, but is there then a way to convert the xml files into an epub????

Use Calibre for converting XML to ePub or other formats.

Chris9 10-05-2010 08:00 AM

Nevermind, I was able to make it work. You have to run the files in this order (for a topaz prc) ...

1. TopazExtract_Kindle4PC
2. TopazFiles2SVG
3. TopazFiles2html

The resulting book.html file was easily exported into Calibre with a metadata TOC intact. On a quick inspection, there may be only minor cleanups needed which I can do in Sigil.

Edit: It wasn't as clean as I thought it would be. There are typical errors you get from OCR books, as this uses an OCR process to convert the books. Topaz is a crappy format, and even on K4PC the book's text doesn't look that great. One problem is that Amazon doesn't specify on its kindle books whether they are topaz format or not. So it's risky buying from them unless you ONLY want to read the book on a kindle device. In this case, for me, I could not find this title in an ebook version anyplace else. So it was worth it for me. But if I have a choice, I will always opt for a non-Amazon ebook first.


All times are GMT -6.