Audio/video stream recording forums (http://stream-recorder.com/forum/index.php)
-   Removing DRM protection from eBooks (http://stream-recorder.com/forum/forumdisplay.php?f=63)

How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindle)

(http://stream-recorder.com/forum/showthread.php?t=5426)

any ANONYMOUS forum user 01-18-2010 03:07 AM

Quote:

Topaz is an Amazon format for Kindle devices. It differs from the AZW format in that fonts can be embedded in the file itself. A .tan side file stores metadata, bookmarks, and other user-generated content for the eBook. The library mode uses the metadata to look up information about the eBook itself.

While not much is currently known about the internal format used in a Topaz file, it is likely related to the standard AZW format. It uses a different compression scheme than standard MOBI files, and it can embed fonts, allowing more complex display with font sets and characters that are not standard on the Amazon Kindle. It also likely removes other restrictions found in MOBI files, such as image size limitations, although some of these may have been removed in AZW as well.

According to one publishing industry blogger, Topaz is an implementation of the open EPUB standard. It follows the OEBPS 2.0 specs, and probably the later IDPF guides. It is a proprietary implementation, which means Amazon uses EPUB as the source and then converts it to an internal format.

AZW1 - is an eBook in the Topaz (TPZ) format that has been delivered via Whispernet.

TPZ - is an eBook in the Topaz format that has been delivered via Internet download.

The following is experimental and it will probably not work for you but…

ALSO: Please do not use any of this to steal. Theft is wrong.

This is only meant to allow conversion of Topaz books for other book readers you own.

Here are the steps:
  1. First, use the Python scripts in topazscripts.zip to translate from Topaz to HTML.

    The files you should have after unzipping are:

    cmbtc_dump.py – (author: cmbtc) decrypts the book and dumps all of its sections to files, properly numbered and named

    decode_meta.py – converts metadata0000.dat to human readable text

    convert2xml.py – converts page*.dat, other*.dat, and glyphs*.dat files to their “pseudo” xml descriptions.

    flatxml2html.py – converts a “flattened” xml description to html using the ocrtext and markup as its basis.

    stylexml2css.py – converts stylesheet “flattened” xml from other0000.dat into css (as best it can – mainly supporting paragraph style classes)

    genxml.py – main program to convert everything to xml

    genhtml.py – main program to generate “book.html”
  2. You must remove the DRM from the Topaz book and build a directory of its contents using the following commands:

    cmbtc_dump.py -d -o TARGETDIR [-p pid] YOURTOPAZBOOKNAMEHERE

    This should create a directory called “TARGETDIR” in your current directory.

    It should have the following files in it:

    metadata0000.dat – metadata info
    other0000.dat – information used to create a style sheet
    dict0000.dat – dictionary of words used to build page descriptions
    page – directory filled with page*.dat files
    glyphs – directory filled with glyphs*.dat files
  3. You should convert the files in “TARGETDIR” to their xml descriptions
    Please note, this python program uses “decode_meta.py” and “convert2xml.py” so don’t move them.

    genxml.py TARGETDIR
  4. Next attempt a conversion to html where “TARGETDIR” is the directory that was created in step 2.
    Please note, this python program uses “decode_meta.py”, “convert2xml.py”, “flatxml2html.py”, and “stylexml2css.py” so don’t move them.

    genhtml.py TARGETDIR

    Once it completes:

    You should have created the file “book.html” inside of TARGETDIR

    You should also have created the directory xml inside of TARGETDIR
    which has the full xml descriptions of the pages and glyphs for later
    (better) conversion attempts.
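
A quick way to confirm that step 2 produced everything the later steps expect is to check TARGETDIR against the listing above. This is only a minimal sketch: the file and directory names come from that listing, while the helper name `missing_entries` is made up for illustration and is not part of topazscripts.zip.

```python
from pathlib import Path

# Names taken from the step 2 listing; the helper itself is hypothetical
# and is not one of the scripts shipped in topazscripts.zip.
EXPECTED_FILES = ("metadata0000.dat", "other0000.dat", "dict0000.dat")
EXPECTED_DIRS = ("page", "glyphs")


def missing_entries(target_dir):
    """Return the expected files and directories that are absent from target_dir."""
    root = Path(target_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    import sys

    # Fall back to the placeholder directory name used in the steps above.
    target = sys.argv[1] if len(sys.argv) > 1 else "TARGETDIR"
    gaps = missing_entries(target)
    if gaps:
        print("Missing from %s: %s" % (target, ", ".join(gaps)))
    else:
        print("%s has all of the expected entries" % target)
```

If anything is reported missing, the dump in step 2 probably did not finish, and genxml.py or genhtml.py is unlikely to succeed until it does.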

One warning: this is not the best long-term solution, because much of the layout is only really correct if drawn to the screen (as an SVG). Until that solution exists, this should get you something that you can load into Sigil, clean up, and turn into an ePub that you can then convert to other formats.

Code:

http://www.pastie.org/760591
http://www.mediafire.com/?qmzjmt25yzf
http://rapidshare.com/files/336800633/topazscripts.zip.html

See also:
ebook DRM removal tools archive

elch 03-21-2010 01:55 PM

Code:

http://pastie.org/761169.txt
seems to be more up-to-date.

vinografia 03-27-2010 01:40 AM

How do you use "http://pastie.org/761169.txt"? (I've tried using unswindle but it doesn't work on topaz.)

TIA

elch 03-28-2010 08:29 AM

I don't have a Kindle myself (or whichever device is needed for Topaz e-books), but I'm a bit interested in encryption-related topics.

You need Python for this script. I'm not a Windows user anymore but there are precompiled binaries which should work fine.

Download the script, start the command line and type: "python script.py filename"

It accepts the following parameters:
Quote:

print("\nCMBDTC.py [options] bookFileName\n")
print("-p Adds a PID to the list of PIDs that are tried to decrypt the book key (can be used several times)")
print("-r Prints or writes to disk a record indicated in the form name:index (e.g \"img:0\")")
print("-o Output file name to write records")
print("-v Verbose (can be used several times)")
print("-i Print kindle.info database")

Stream Recorder 03-31-2010 12:18 AM

The following set of tools can also be used to remove DRM from Amazon Topaz eBooks:
  • TopazExtract_Kindle_iPhone.pyw,
  • TopazFiles2XML.pyw,
  • TopazFiles2SVG.pyw,
  • TopazFiles2HTML.pyw

tools_v1.6b.zip.
Code:

http://www.mediafire.com/?mn3vmttbwrt
The scripts should work with Kindle and iPhone Amazon Topaz Files (.tpz, .azw1). The files are really images of pages with OCR performed on them. Using the tools you can get SVG images of the pages, and the OCRed HTML version for clean-up.

maggerbee 04-03-2010 09:36 AM

WOW, thanks so much for that download! I successfully converted my purchased Topaz book from Amazon into HTML, and then used Calibre to convert it to .epub for use on my HTC Hero (Android phone) with the Aldiko book reader. Thanks sooo much!!!

slopsbox 04-04-2010 09:06 PM

I'm still unable to convert the files (xhtml) that I have, though the ebook itself has been stripped of DRM. I've tried merging the files with Adobe Acrobat Pro and Calibre, without success.

Can someone post the steps to do so? Thanks so much.

teebee 04-07-2010 07:30 PM

Is anyone able to help me with an error message? I have successfully converted 3 books but am having trouble with the 4th. It strips the DRM, but when I go to convert it to XML I get the following error at page 256: "Error - -1501 outside of string table limits". I did some unsuccessful googling, so if anyone can help me I would appreciate it.

jcklaus 04-11-2010 02:44 PM

Running this, I keep getting the error "Can not find dict0000.dat file". What am I doing wrong? Thanks.

any ANONYMOUS forum user 04-12-2010 03:12 AM

How to remove DRM from Topaz ebooks:
  1. Install Python
  2. Open command prompt / terminal and run:

    Code:

    python cmbtc_dump_nonK4PC.py -d -o TARGETDIR -p 12345678 YOURTOPAZBOOKNAMEHERE
    where
    - 12345678 - the first 8 characters of your PID
    - "TARGETDIR" - target directory (can be omitted)
    - YOURTOPAZBOOKNAMEHERE - filename of your Topaz ebook (with the .tpz extension)
  3. Then, again in the command prompt / terminal, run:

    Code:

    python gensvg.py TARGETDIR
  4. Then create an HTML file from the SVG output by running the following in the command prompt / terminal:

    Code:

    python genhtml.py TARGETDIR
    You should get a "book.html" file in the TARGETDIR directory.
  5. Convert "book.html" to any other format using Calibre.


All times are GMT -6.