Audio/video stream recording forums

  #1  
Old 01-18-2010, 04:07 AM
any ANONYMOUS forum user
any user of the forum who preferred to post anonymously
 
Join Date: Aug 2011
Location: Server of stream-recorder.com
Posts: 211

How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindle)


Quote:
Topaz is an Amazon format for Kindle devices. It differs from the AZW format in that it can have fonts embedded in the file itself. A .tan sidefile stores metadata, bookmarks, and other user-generated content for the eBook. The library mode uses the metadata to reference information about the eBook itself.

While not much is currently known about the internal format used in a Topaz file, there is some likelihood that it is related to the standard AZW format. It uses a different compression than standard MOBI files, and it can have embedded fonts, allowing more complex display using font sets and characters that are not standard on the Amazon Kindle. It is also likely to remove other restrictions found in MOBI files, such as image size limitations, although some of these may have been removed in AZW as well.

According to one publishing industry blogger, Topaz is an implementation of the open EPUB standard. It follows the OEBPS 2.0 specs, and probably the later IDPF guides. It’s a proprietary implementation, which means they use EPUB as the source but then convert it to their internal format.

AZW1 is an eBook in the Topaz (TPZ) format that has been delivered via Whispernet.

TPZ is an eBook in the Topaz format that has been delivered via Internet download.

The following is experimental and it will probably not work for you but…

ALSO: Please do not use any of this to steal. Theft is wrong.

This is only meant to allow conversion of Topaz books for other book readers you own.

Here are the steps:
  1. First, use the Python scripts in topazscripts.zip to do the translation from Topaz to HTML.

    The files you should have after unzipping are:

    cmbtc_dump.py – (author: cmbtc) decrypts all of the sections and dumps them to files, properly numbered and named

    decode_meta.py – converts metadata0000.dat to human readable text

    convert2xml.py – converts page*.dat, other*.dat, and glyphs*.dat files to their “pseudo” xml descriptions.

    flatxml2html.py – converts a “flattened” xml description to html using the ocrtext and markup as its basis.

    stylexml2css.py – converts stylesheet “flattened” xml from other0000.dat into css (as best it can – mainly supporting paragraph style classes)

    genxml.py – main program to convert everything to xml

    genhtml.py – main program to generate “book.html”
  2. You must remove the DRM from the Topaz book and build a directory of its contents using the following commands:

    cmbtc_dump.py -d -o TARGETDIR [-p pid] YOURTOPAZBOOKNAMEHERE

    This should create a directory called “TARGETDIR” in your current directory.

    It should have the following files in it:

    metadata0000.dat – metadata info
    other0000.dat – information used to create a style sheet
    dict0000.dat – dictionary of words used to build page descriptions
    page – directory filled with page*.dat files
    glyphs – directory filled with glyphs*.dat files
  3. You should convert the files in “TARGETDIR” to their xml descriptions
    Please note, this python program uses “decode_meta.py” and “convert2xml.py” so don’t move them.

    genxml.py TARGETDIR
  4. Next attempt a conversion to html where “TARGETDIR” is the directory that was created in step 2.
    Please note, this python program uses “decode_meta.py”, “convert2xml.py”, “flatxml2html.py”, and “stylexml2css.py” so don’t move them.

    genhtml.py TARGETDIR

    Once it completes:

    You should have created the file “book.html” inside of TARGETDIR

    You should also have created the directory xml inside of TARGETDIR
    which has the full xml descriptions of the pages and glyphs for later
    (better) conversion attempts.
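
The directory layout listed in step 2 can be checked before running the later steps, which helps narrow down errors about missing files. The following is only a minimal sketch, assuming Python 3 (the scripts in topazscripts.zip may need a different Python version). The helper name missing_entries is hypothetical and is not part of topazscripts.zip; the file and directory names are the ones listed in step 2.

```python
import os

# Names taken from the file list in step 2 of the guide.
EXPECTED_FILES = ["metadata0000.dat", "other0000.dat", "dict0000.dat"]
EXPECTED_DIRS = ["page", "glyphs"]


def missing_entries(target_dir):
    """Return the expected files and directories that are absent from target_dir.

    Hypothetical helper for illustration only; directories are reported
    with a trailing slash so they are easy to tell apart from files.
    """
    missing = []
    for name in EXPECTED_FILES:
        if not os.path.isfile(os.path.join(target_dir, name)):
            missing.append(name)
    for name in EXPECTED_DIRS:
        if not os.path.isdir(os.path.join(target_dir, name)):
            missing.append(name + "/")
    return missing
```

Calling missing_entries("TARGETDIR") returns an empty list when everything from step 2 is in place; any name it returns points at the part of the dump that did not get written.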

One warning: this is not the best long-term solution, because much of the layout is only really correct if drawn to the screen (as an SVG). Until that solution exists, this should get you something that you can load into Sigil, clean up, and turn into an ePub that you can then convert to other formats.

Code:
http://www.pastie.org/760591
http://www.mediafire.com/?qmzjmt25yzf
http://rapidshare.com/files/336800633/topazscripts.zip.html
See also:
ebook DRM removal tools archive
  #2  
Old 03-21-2010, 02:55 PM
elch
Member
 
Join Date: Mar 2010
Posts: 78

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl


Code:
http://pastie.org/761169.txt
seems to be more up-to-date.
  #3  
Old 03-27-2010, 02:40 AM
vinografia
Junior Member
 
Join Date: Mar 2010
Posts: 1

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl


How do you use "http://pastie.org/761169.txt"? (I've tried using unswindle but it doesn't work on Topaz.)

TIA
  #4  
Old 03-28-2010, 09:29 AM
elch
Member
 
Join Date: Mar 2010
Posts: 78

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl


I don't have a Kindle myself (or whichever device is necessary for Topaz e-books), but I'm a bit interested in encryption-related topics.

You need Python for this script. I'm not a Windows user anymore but there are precompiled binaries which should work fine.

Download the script, start the command line and type: "python script.py filename"

It accepts the following parameters:
Quote:
print("\nCMBDTC.py [options] bookFileName\n")
print("-p Adds a PID to the list of PIDs that are tried to decrypt the book key (can be used several times)")
print("-r Prints or writes to disk a record indicated in the form name:index (e.g \"img:0\")")
print("-o Output file name to write records")
print("-v Verbose (can be used several times)")
print("-i Print kindle.info database")
  #5  
Old 03-31-2010, 01:18 AM
Stream Recorder
 
Posts: n/a

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl


The following set of tools can also be used to remove DRM from Amazon Topaz eBooks:
  • TopazExtract_Kindle_iPhone.pyw,
  • TopazFiles2XML.pyw,
  • TopazFiles2SVG.pyw,
  • TopazFiles2HTML.pyw

tools_v1.6b.zip.
Code:
http://www.mediafire.com/?mn3vmttbwrt
The scripts should work with Kindle and iPhone Amazon Topaz Files (.tpz, .azw1). The files are really images of pages with OCR performed on them. Using the tools you can get SVG images of the pages, and the OCRed HTML version for clean-up.
  #6  
Old 04-03-2010, 10:36 AM
maggerbee
Junior Member
 
Join Date: Apr 2010
Posts: 2

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl


WOW, thanks so much for that download! I successfully converted my purchased Topaz book from Amazon into HTML, and then used Calibre to convert it to .epub for use on my HTC Hero (Android phone) with the Aldiko book reader. Thanks sooo much!!!
  #7  
Old 04-04-2010, 10:06 PM
slopsbox
Junior Member
 
Join Date: Jun 2009
Posts: 12

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl


I'm still unable to convert the files (xhtml) that I have, though the ebook itself has been stripped of DRM. I've tried merging the files with Adobe Acrobat Pro and Calibre, without success.

Can someone post the steps to do so? Thanks so much.
  #8  
Old 04-07-2010, 08:30 PM
teebee
Junior Member
 
Join Date: Apr 2010
Posts: 1

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl


Is anyone able to help me with an error message? I have successfully converted 3 books but am having trouble with the 4th. It strips the DRM, but when I go to convert it to XML I get the following error at page 256: "Error - -1501 outside of string table limits". I did some unsuccessful googling, so if anyone can help me I would appreciate it.
  #9  
Old 04-11-2010, 03:44 PM
jcklaus
Junior Member
 
Join Date: Apr 2010
Posts: 7

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl


Running this, I keep getting the error "Can not find dict0000.dat file". What am I doing wrong? Thanks.
  #10  
Old 04-12-2010, 04:12 AM
any ANONYMOUS forum user
any user of the forum who preferred to post anonymously
 
Join Date: Aug 2011
Location: Server of stream-recorder.com
Posts: 211

Re: How to convert Topaz ebooks to HTML (Remove DRM from TPZ and AZW1 books for Kindl


Quote:
Originally Posted by jcklaus View Post
Running this I keep getting the error "Can not find dict0000.dat file" What am I doing wrong?
Running what? On what OS? On what files?
All times are GMT -6.