How I Bypassed Amazon's Kindle Web DRM Because Their App Sucked

As it turns out, they don't actually want you to do this (and have some interesting ways to stop you)

Photo by Madalyn Cox / Unsplash

TL;DR

  • I bought my first ebook from Amazon
  • Amazon's Kindle Android app was really buggy and crashed a bunch
  • Tried to download my book to use with a functioning reader app
  • Realized Amazon no longer lets you do that
  • Decided to reverse engineer their DRM system out of spite
  • Discovered multiple layers of protection including randomized alphabets
  • Defeated all of them with font matching wizardry
  • You can now download the books you own with my code

Part 1: Amazon Made This Personal

The One Time I Tried To Do Things The Right Way

I've been "obtaining" ebooks for years. But this ONE time, I thought: "Let's support the author."

Download Kindle app on Android. Open book.

Crash.

I Just Wanted To Read My Book

App crashes. Fine, I'll use the web reader.

Oh wait, can't download it for offline reading. What if I'm on a plane?

Hold on, I can't even export it to Calibre? Where I keep ALL my other books?

So let me get this straight:

  • I paid money for this book
  • I can only read it in Amazon's broken app
  • I can't download it
  • I can't back it up
  • I don't actually own it
  • Amazon can delete it whenever they want

This is a rental, not a purchase.

This does not say "Rent"

It Becomes Personal

I could've refunded and pirated it in 30 seconds. Would've been easier.

But that's not the point.

The point is I PAID FOR THIS BOOK. It's mine. And I'm going to read it in Calibre with the rest of my library even if I have to reverse engineer their web client to do it.

Part 2: Reversal Time

Kindle Cloud Reader (the web version) actually works. While looking through the network requests, I spotted this:

https://read.amazon.com/renderer/render

To download anything, you need:

1. Session cookies - standard Amazon login

2. Rendering token - from the startReading API call

3. ADP session token - extra auth layer

Sending the same headers and cookies the browser does returns a TAR file.

What's Inside The TAR?

page_data_0_4.json   # The "text" (spoiler: it's not text)
glyphs.json          # SVG definitions for every character
toc.json             # Table of contents
metadata.json        # Book info
location_map.json    # Position mappings
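A minimal sketch of reading those JSON files straight out of the archive with Python's standard library. It is not the code from the repository: `read_json_members` and `build_demo_tar` are hypothetical helpers, and the demo archive's contents are invented so the example runs without touching Amazon.

```python
import io
import json
import tarfile


def read_json_members(tar_bytes: bytes) -> dict:
    """Return {member name: parsed JSON} for every .json file in a TAR archive."""
    parsed = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".json"):
                with archive.extractfile(member) as handle:
                    parsed[member.name] = json.load(handle)
    return parsed


def build_demo_tar() -> bytes:
    """Build a tiny stand-in archive (invented contents, for illustration only)."""
    files = {
        "glyphs.json": {"24": {"path": "M 0 0 L 1 1", "fontFamily": "bookerly_normal"}},
        "toc.json": [],
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            data = json.dumps(payload).encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


contents = read_json_members(build_demo_tar())
print(sorted(contents))
```

Working on the bytes in memory avoids writing 184 archives to disk just to read a few JSON files from each.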

Part 3: Amazon's Obfuscation Layers of Ebook Hell

Downloaded the first few pages, expected to see text. Got this instead:

{
  "type": "TextRun",
  "glyphs": [24, 25, 74, 123, 91, 18, 19, 30, 4, ...],
  "style": "paragraph"
}

These aren't letters. They're glyph IDs. Character 'T' isn't Unicode 84, it's glyph 24.

And glyph 24 is just a series of numbers that define a stroke path. It's just an image of a letter.

It's a substitution cipher! Each character maps to a non-sequential glyph ID.
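To make the cipher idea concrete, here is a small sketch of what decoding looks like once you somehow know the mapping for one batch. Only 'T' = 24 comes from the example above; the other entries and the `decode_run` helper are made up for illustration.

```python
def decode_run(glyph_ids, glyph_to_char):
    """Translate glyph IDs to text; unknown IDs become U+FFFD so gaps stay visible."""
    return "".join(glyph_to_char.get(glyph_id, "\ufffd") for glyph_id in glyph_ids)


# Hypothetical mapping for a single batch of pages.
batch_mapping = {24: "T", 25: "h", 74: "e"}

text = decode_run([24, 25, 74, 999], batch_mapping)
print(text)
```

The hard part is not this lookup. It is recovering `batch_mapping` in the first place.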

The Alphabet Changes Every. Five. Pages.

Downloaded the next batch of pages. Same letter 'T' is now glyph 87.

Next batch? Glyph 142.

They randomize the entire alphabet on EVERY request.

This means:

  • You can only get 5 pages at a time (API hard limit)
  • Each request gets completely new glyph mappings
  • Glyph IDs are meaningless across requests
  • You can't build one mapping table for the whole book

Let Me Show You How Bad This Is

For my 920-page book:

  • 184 separate API requests needed
  • 184 different random alphabets to crack
  • 361 unique glyphs discovered (a-z, A-Z, punctuation, ligatures)
  • 1,051,745 total glyphs to decode
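The request count follows directly from the 5-page limit. A quick sketch of the arithmetic, assuming zero-based page ranges as the `page_data_0_4.json` filename suggests:

```python
import math

PAGES = 920
PAGES_PER_REQUEST = 5  # API hard limit described above

batches = math.ceil(PAGES / PAGES_PER_REQUEST)

# Inclusive (first_page, last_page) ranges, one per request.
ranges = [
    (start, min(start + PAGES_PER_REQUEST, PAGES) - 1)
    for start in range(0, PAGES, PAGES_PER_REQUEST)
]

print(batches, ranges[0], ranges[-1])
```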

Fake Font Hints (They're Getting Sneaky)

Some SVG paths contained this garbage:

M695.068,0 L697.51,-27.954 m3,1 m1,6 m-4,-7 L699.951,-55.908 ...

Look closely and you'll see tiny m3,1 m1,6 m-4,-7 commands. These are micro MoveTo operations.

Why this is evil:

  • Browsers handle them fine (native Path2D)
  • Python SVG libraries create spurious connecting lines
  • Makes glyphs look corrupted when rendered naively
  • Breaks path-sampling approaches

This is deliberate anti-scraping. The glyphs render perfectly in the browser, but we can't just compare paths in our parser.

Eventually I figured out that filling in the complete path mitigated this.
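One thing worth noticing about the example path: the three micro moves sum to zero (3 + 1 - 4 and 1 + 6 - 7), so the pen ends up exactly where it started. The sketch below is a different approach from the fill-based fix described above, and it is untested against real Kindle glyphs: it drops runs of relative moves whose net displacement is zero. `strip_zero_net_moves` is a hypothetical helper that only understands M, m, L and l commands with one coordinate pair each.

```python
import re

# Matches one command letter followed by a single coordinate pair.
TOKEN = re.compile(r"([MmLl])\s*(-?\d+(?:\.\d+)?)\s*,?\s*(-?\d+(?:\.\d+)?)")


def strip_zero_net_moves(path: str) -> str:
    """Remove runs of relative 'm' commands whose displacements sum to (0, 0)."""
    commands = TOKEN.findall(path)  # list of (letter, x_text, y_text)
    kept, run = [], []
    # The END sentinel flushes a run that sits at the very end of the path.
    for command in commands + [("END", "0", "0")]:
        if command[0] == "m" and kept:
            run.append(command)
            continue
        if run:
            dx = sum(float(x) for _, x, _ in run)
            dy = sum(float(y) for _, _, y in run)
            if abs(dx) > 1e-9 or abs(dy) > 1e-9:
                kept.extend(run)  # a real move, keep it untouched
            run = []
        if command[0] != "END":
            kept.append(command)
    return " ".join(f"{letter}{x},{y}" for letter, x, y in kept)


cleaned = strip_zero_net_moves(
    "M695.068,0 L697.51,-27.954 m3,1 m1,6 m-4,-7 L699.951,-55.908"
)
print(cleaned)
```

Coordinates are kept as their original text, so nothing is rounded. Whether the cleaned path renders identically to the browser's output would still need checking.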

Multiple Font Variants

Not just one font. FOUR variants:

  • bookerly_normal (99% of glyphs)
  • bookerly_italic (emphasis)
  • bookerly_bold (headings)
  • bookerly_bolditalic (emphasized headings)

Plus special ligatures: ff, fi, fl, ffi, ffl

More variations = more unique glyphs to crack = more pain.

OCR Is Mid (My Failed Attempt)

Tried running OCR on rendered glyphs. Results:

  • 178/348 glyphs recognized (51%)
  • 170 glyphs failed completely

OCR just sucks at single characters without context. Confused 'l' with 'I' with '1'. Couldn't handle punctuation. Gave up on ligatures entirely.

OCR needs words and sentences to work well. Single characters? Might as well flip a coin.

Part 4: The Solution That Actually Worked

Every request includes `glyphs.json` with SVG path definitions:

{
  "24": {
    "path": "M 450 1480 L 820 1480 L 820 0 L 1050 0 L 1050 1480 ...",
    "fontFamily": "bookerly_normal"
  },
  "87": {
    "path": "M 450 1480 L 820 1480 L 820 0 L 1050 0 L 1050 1480 ...",
    "fontFamily": "bookerly_normal"
  }
}

Glyph IDs change, but SVG shapes don't.

Why Direct SVG Comparison Failed

First attempt: normalize and compare SVG path coordinates.

Failed because:

  • Coordinates vary slightly
  • Path commands are represented differently

Pixel-Perfect Matching

Screw coordinate comparison. Let's just render everything and compare pixels.

1. Render every SVG as an image

  • Use cairosvg (lets us handle those fake font hints correctly)
  • Render at 512 x 512px for accuracy

2. Generate perceptual hashes

  • Hash each rendered image
  • The hash becomes the unique identifier
  • Same shape = same hash, regardless of glyph ID

3. Build normalized glyph space

  • Map all 184 random alphabets to hash-based IDs
  • Now glyph "a1b2c3d4..." always means letter 'T'
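Steps 2 and 3 can be sketched without any image libraries. This assumes the glyphs are already rendered to grayscale bitmaps (lists of rows of 0-255 values), and it uses a plain average hash as a stand-in, since the exact hashing algorithm isn't stated here. `average_hash`, `bar` and the glyph IDs in batch B are illustrative; in practice a library such as ImageHash would do this on real images.

```python
def average_hash(pixels, hash_size=8):
    """Average hash: shrink to hash_size x hash_size cells, threshold at the mean."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        y0, y1 = row * height // hash_size, (row + 1) * height // hash_size
        for col in range(hash_size):
            x0, x1 = col * width // hash_size, (col + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = "".join("1" if cell > mean else "0" for cell in cells)
    return f"{int(bits, 2):0{hash_size * hash_size // 4}x}"


def bar(vertical):
    """Toy 16x16 'glyph': a white bar on black, vertical or horizontal."""
    return [
        [255 if (x if vertical else y) in range(6, 10) else 0 for x in range(16)]
        for y in range(16)
    ]


# Two batches with different random glyph IDs for the same two shapes.
batch_a = {24: bar(True), 25: bar(False)}
batch_b = {87: bar(True), 3: bar(False)}

normalized = {
    name: {glyph_id: average_hash(image) for glyph_id, image in batch.items()}
    for name, batch in {"a": batch_a, "b": batch_b}.items()
}
print(normalized)
```

Glyph 24 in batch A and glyph 87 in batch B end up with the same hash, which is the whole trick: the hash replaces the random ID.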

4. Match to actual characters

  • Download Bookerly TTF fonts
  • Render every character (A-Z, a-z, 0-9, punctuation)
  • Use SSIM (Structural Similarity Index) to match

Why SSIM Is Perfect For This

SSIM compares image structure, not pixels directly. It handles:

  • Slight rendering differences
  • Anti-aliasing variations
  • Minor scaling issues

For each unknown glyph, find the TTF character with the highest SSIM score. That's your letter.
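A simplified sketch of that matching loop. Real SSIM implementations (scikit-image's `structural_similarity`, for example) average the score over small sliding windows; the version below computes it once over the whole image, which is enough to show the formula. `global_ssim`, `best_match` and the toy 8x8 bitmaps are all illustrative.

```python
def global_ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM over two equal-size grayscale images (lists of rows)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    c1 = (0.01 * dynamic_range) ** 2  # standard stabilising constants
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    )


def best_match(unknown, references):
    """Return (character, score) for the reference image most similar to unknown."""
    return max(
        ((char, global_ssim(unknown, image)) for char, image in references.items()),
        key=lambda pair: pair[1],
    )


# Toy reference "font": a vertical bar and a horizontal bar.
vertical = [[255 if x in (3, 4) else 0 for x in range(8)] for _ in range(8)]
horizontal = [[255 if y in (3, 4) else 0 for _ in range(8)] for y in range(8)]
references = {"I": vertical, "-": horizontal}

# Unknown glyph: the vertical bar with one pixel of rendering noise.
unknown = [row[:] for row in vertical]
unknown[0][0] = 40

char, score = best_match(unknown, references)
print(char, round(score, 4))
```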

Handling The Edge Cases

Ligatures: ff, fi, fl, ffi, ffl

  • These are single glyphs for multiple characters
  • Had to add them to the TTF library manually

Special characters: em-dash, quotes, bullets

  • Extended character set beyond basic ASCII
  • Matched against full Unicode range in Bookerly

Font variants: Bold, italic, bold-italic

  • Built separate libraries for each variant
  • Match against all libraries, pick best score
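A sketch of the "match against all libraries, pick best score" idea, with ligatures stored as multi-character strings. To keep it short it scores with a plain pixel-agreement fraction rather than SSIM, and the 4x4 bitmaps are invented. `match_across_variants` and `pixel_agreement` are hypothetical names.

```python
def pixel_agreement(a, b):
    """Stand-in similarity score: fraction of pixels that are equal."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    return sum(x == y for x, y in zip(flat_a, flat_b)) / len(flat_a)


def match_across_variants(glyph, libraries, score=pixel_agreement):
    """Return (variant, text, score) for the best reference in any library."""
    best = None
    for variant, library in libraries.items():
        for text, reference in library.items():
            value = score(glyph, reference)
            if best is None or value > best[2]:
                best = (variant, text, value)
    return best


libraries = {
    "bookerly_normal": {
        "T": [[1, 1, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]],
        # A ligature is one glyph that decodes to several characters.
        "fi": [[0, 1, 0, 1], [1, 1, 0, 0], [0, 1, 0, 1], [0, 1, 0, 1]],
    },
    "bookerly_bold": {
        "T": [[1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]],
    },
}

unknown = [[1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
result = match_across_variants(unknown, libraries)
print(result)
```

Returning the variant along with the text is useful later, because the winning library tells you whether the run should be bold or italic in the output.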

Part 5: The Moment It All Worked

Final Statistics

=== NORMALIZATION PHASE ===
Total batches processed: 184
Unique glyphs found: 361
Total glyphs in book: 1,051,745

=== MATCHING PHASE ===
Successfully matched 361/361 unique glyphs (100.00%)
Failed to match: 0 glyphs
Average SSIM score: 0.9527

=== DECODED OUTPUT ===
Total characters: 5,623,847
Pages: 920

Perfect. Every single character decoded correctly.

EPUB Reconstruction With Perfect Formatting

The JSON includes positioning for every text run:

{
  "glyphs": [24, 25, 74],
  "rect": {"left": 100, "top": 200, "right": 850, "bottom": 220},
  "fontStyle": "italic",
  "fontWeight": 700,
  "fontSize": 12.5,
  "link": {"positionId": 7539}
}

I used this to preserve:

  • Paragraph breaks (Y-coordinate changes)
  • Text alignment (X-coordinate patterns)
  • Bold/italic styling
  • Font sizes
  • Internal links
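As an illustration of the Y-coordinate idea, here is one possible heuristic: start a new paragraph when the vertical jump between consecutive runs is noticeably larger than one line. The `gap_factor` threshold, the `text` key (standing in for already-decoded glyphs) and `group_paragraphs` itself are assumptions, not necessarily what the real tool does, and page breaks are not handled.

```python
def group_paragraphs(runs, gap_factor=1.5):
    """Group decoded runs (in reading order) into paragraphs by vertical gaps."""
    paragraphs, current, previous = [], [], None
    for run in runs:
        rect = run["rect"]
        if previous is not None:
            line_height = previous["bottom"] - previous["top"]
            jump = rect["top"] - previous["top"]
            if jump > gap_factor * line_height:
                paragraphs.append(current)
                current = []
        current.append(run["text"])
        previous = rect
    if current:
        paragraphs.append(current)
    return [" ".join(parts) for parts in paragraphs]


runs = [
    {"text": "Line one", "rect": {"left": 100, "top": 200, "right": 850, "bottom": 220}},
    {"text": "line two", "rect": {"left": 100, "top": 220, "right": 850, "bottom": 240}},
    {"text": "New paragraph", "rect": {"left": 100, "top": 280, "right": 850, "bottom": 300}},
]

paragraphs = group_paragraphs(runs)
print(paragraphs)
```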

The final EPUB is nearly indistinguishable from the original!

The Real Conclusion

Amazon put real effort into their web DRM.

Was It Worth It?

To read one book? No.

To prove a point? Absolutely.

To learn about SVG rendering, perceptual hashing, and font metrics? Probably yes.

Use This Knowledge Responsibly

This is for backing up books YOU PURCHASED.

GitHub - PixelMelt/amazon_book_downloader: Downloads books from the amazon web reader

Don't get me sued into oblivion, thanks.