javascript

How I Bypassed Amazon's Kindle Web DRM Because Their App Sucked

As it turns out they don't actually want you to do this (and have some interesting ways to stop you)

Pixelmelt

15 Oct 2025 • 6 min read

Photo by Madalyn Cox / Unsplash

TL;DR

I bought my first ebook from amazon
Amazon's Kindle Android app was really buggy and crashed a bunch
Tried to download my book to use with a functioning reader app
Realized Amazon no longer lets you do that
Decided to reverse engineer their DRM system out of spite
Discovered multiple layers of protection including randomized alphabets
Defeated all of them with font matching wizardry
You can now download the books you own books with my code

Part 1: Amazon Made This Personal

The One Time I Tried To Do Things The Right Way

I've been "obtaining" ebooks for years. But this ONE time, I thought: "Let's support the author."

Download Kindle app on Android. Open book.

Crash.

I Just Wanted To Read My Book

App crashes. Fine, I'll use the web reader.

Oh wait, can't download it for offline reading. What if I'm on a plane?

Hold on, I can't even export it to Calibre? Where I keep ALL my other books?

So let me get this straight:

I paid money for this book
I can only read it in Amazon's broken app
I can't download it
I can't back it up
I don't actually own it
Amazon can delete it whenever they want

This is a rental, not a purchase.

It Becomes Personal

I could've refunded and pirated it in 30 seconds. Would've been easier.

But that's not the point.

The point is I PAID FOR THIS BOOK. It's mine. And I'm going to read it in Calibre with the rest of my library even if I have to reverse engineer their web client to do it.

Reversal Time

Kindle Cloud Reader (the web version) actually works. While looking through the network requests, I spotted this:

https://read.amazon.com/renderer/render

To download anything, you need:

1. Session cookies - standard Amazon login

2. Rendering token - from the startReading API call

3. ADP session token - extra auth layer

Sending the same headers and cookies the browser does returns a TAR file.

What's Inside The TAR?

page_data_0_4.json   # The "text" (spoiler: it's not text)
glyphs.json          # SVG definitions for every character
toc.json             # Table of contents
metadata.json        # Book info
location_map.json    # Position mappings

Part 3: Amazon's Obfuscation Layers of Ebook Hell

Downloaded the first few pages, expected to see text. Got this instead:

{
  "type": "TextRun",
  "glyphs": [24, 25, 74, 123, 91, 18, 19, 30, 4, ...],
  "style": "paragraph"
}

These aren't letters. They're glyph IDs. Character 'T' isn't Unicode 84, it's glyph 24.

And glyph 24 is just a series of numbers that define a stroke path, its just an image of a letter.

It's a substitution cipher! Each character maps to a non-sequential glyph ID.

The Alphabet Changes Every. Five. Pages.

Downloaded the next batch of pages. Same letter 'T' is now glyph 87.

Next batch? Glyph 142.

They randomize the entire alphabet on EVERY request.

This means:

You can only get 5 pages at a time (API hard limit)
Each request gets completely new glyph mappings
Glyph IDs are meaningless across requests
You can't build one mapping table for the whole book

Let Me Show You How Bad This Is

For my 920-page book:

184 separate API requests needed
184 different random alphabets to crack
361 unique glyphs discovered (a-z, A-Z, punctuation, ligatures)
1,051,745 total glyphs to decode

Fake Font Hints (They're Getting Sneaky)

Some SVG paths contained this garbage:

M695.068,0 L697.51,-27.954 m3,1 m1,6 m-4,-7 L699.951,-55.908 ...

Looking at it, we see these tiny m3,1 m1,6 m-4,-7 commands, they are micro MoveTo operations.

Why this is evil:

Browsers handle them fine (native Path2D)
Python SVG libraries create spurious connecting lines
Makes glyphs look corrupted when rendered naively
Breaks path-sampling approaches

This is deliberate anti-scraping. The glyphs render perfectly in browser but make it so we cant just compare paths in our parser.

Take a look

Eventually I figured out that filling in the complete path mitigated this.

Multiple Font Variants

Not just one font. FOUR variants:

bookerly_normal (99% of glyphs)
bookerly_italic (emphasis)
bookerly_bold (headings)
bookerly_bolditalic (emphasized headings)

Plus special ligatures: ff, fi, fl, ffi, ffl

More variations = more unique glyphs to crack = more pain.

OCR Is Mid (My Failed Attempt)

Tried running OCR on rendered glyphs. Results:

178/348 glyphs recognized (51%)
170 glyphs failed completely

OCR just sucks at single characters without context. Confused 'l' with 'I' with '1'. Couldn't handle punctuation. Gave up on ligatures entirely.

OCR needs words and sentences to work well. Single characters? Might as well flip a coin.

Part 4: The Solution That Actually Worked

Every request includes `glyphs.json` with SVG path definitions:

{
  "24": {
    "path": "M 450 1480 L 820 1480 L 820 0 L 1050 0 L 1050 1480 ...",
    "fontFamily": "bookerly_normal"
  },
  "87": {
    "path": "M 450 1480 L 820 1480 L 820 0 L 1050 0 L 1050 1480 ...",
    "fontFamily": "bookerly_normal"
  }
}

Glyph IDs change, but SVG shapes don't.

Why Direct SVG Comparison Failed

First attempt: normalize and compare SVG path coordinates.

Failed because:

Coordinates vary slightly
Path commands represented differently

Pixel-Perfect Matching

Screw coordinate comparison. Let's just render everything and compare pixels.

1. Render every SVG as an image

Use cairosvg (lets us handle those fake font hints correctly)
Render at 512 x 512px for accuracy

2. Generate perceptual hashes

Hash each rendered image
The hash becomes the unique identifier
Same shape = same hash, regardless of glyph ID

3. Build normalized glyph space

Map all 184 random alphabets to hash-based IDs
Now glyph "a1b2c3d4..." always means letter 'T'

4. Match to actual characters

Download Bookerly TTF fonts
Render every character (A-Z, a-z, 0-9, punctuation)
Use SSIM (Structural Similarity Index) to match

Why SSIM Is Perfect For This

SSIM compares image structure, not pixels directly. It handles:

Slight rendering differences
Anti-aliasing variations
Minor scaling issues

For each unknown glyph, find the TTF character with highest SSIM score. That's your letter.

Handling The Edge Cases

Ligatures: ff, fi, fl, ffi, ffl

These are single glyphs for multiple characters
Had to add them to TTF library manually

Special characters: em-dash, quotes, bullets

Extended character set beyond basic ASCII
Matched against full Unicode range in Bookerly

Font variants: Bold, italic, bold-italic

Built separate libraries for each variant
Match against all libraries, pick best score

Part 5: The Moment It All Worked

Final Statistics

=== NORMALIZATION PHASE ===
Total batches processed: 184
Unique glyphs found: 361
Total glyphs in book: 1,051,745

=== MATCHING PHASE ===
Successfully matched 361/361 unique glyphs (100.00%)
Failed to match: 0 glyphs
Average SSIM score: 0.9527

=== DECODED OUTPUT ===
Total characters: 5,623,847
Pages: 920

Perfect. Every single character decoded correctly.

EPUB Reconstruction With Perfect Formatting

The JSON includes positioning for every text run:

{
  "glyphs": [24, 25, 74],
  "rect": {"left": 100, "top": 200, "right": 850, "bottom": 220},
  "fontStyle": "italic",
  "fontWeight": 700,
  "fontSize": 12.5,
  "link": {"positionId": 7539}
}

I used this to preserve:

Paragraph breaks (Y-coordinate changes)
Text alignment (X-coordinate patterns)
Bold/italic styling
Font sizes
Internal links

The final EPUB is near indistinguishable from the original!

The Real Conclusion

Amazon put real effort into their web DRM.

Was It Worth It?

To read one book? No.

To prove a point? Absolutely.

To learn about SVG rendering, perceptual hashing, and font metrics? Probably yes.

Use This Knowledge Responsibly

This is for backing up books YOU PURCHASED.

Don't get me sued into oblivion thanks.