2019-12-16 / GHOST

Getting All Draft Posts from Ghost

Mirroring your glorious Ghost blog but missing out on those drafts you don’t want the world to see just yet? Here’s how to download them using the Admin API.

Mirrors and Drafts

Mirroring an entire domain can be fairly straightforward. Just take the one you’re looking at right now. At the time of writing, the start page looked something like this.

How Hard Can It Be?! Homepage

Now, in order to make a complete copy, simply point wget at the domain root, tell it to follow every link within the domain that it encounters (along with an extra 1,000 special parameters you need to provide to make it work without accidentally mirroring the entire internet), apologise in advance to the admin whose website you’re about to hammer with heavy traffic (unless you’re really careful), and then let wget work its magic.
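For the curious, here’s a minimal sketch of such a wget call, wrapped in a hypothetical mirror_site function. The flags are genuine GNU Wget options, but the selection and the wait times are assumptions to tune for your own site, not a tested recipe.

```shell
# Hypothetical helper: politely mirror a single site with GNU Wget.
# $1 = site root, e.g. https://www.how-hard-can-it.be/
mirror_site() {
    local root="$1"
    # --mirror:            recursion with infinite depth plus timestamping
    #                      (recursive wget stays on the starting host unless -H is given)
    # --no-parent:         never ascend above the starting directory
    # --page-requisites:   also fetch the CSS, images, etc. needed to render each page
    # --convert-links:     rewrite links so the local copy is browsable offline
    # --adjust-extension:  save HTML pages with an .html suffix
    # --wait/--random-wait: pause between requests to go easy on the server
    wget --mirror \
         --no-parent \
         --page-requisites \
         --convert-links \
         --adjust-extension \
         --wait=1 \
         --random-wait \
         "${root}"
}
```

Usage would be along the lines of mirror_site "https://www.how-hard-can-it.be/".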

If you’re not in the poor man’s crawling business, then there’s always the option of going via a sitemap, in case you have one. Thankfully, Ghost comes with a sitemap.xml built right into it that’s enabled by default. Otherwise, Google wouldn’t even know this page exists.
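A minimal sketch of the sitemap route, assuming Bash and curl. It assumes that Ghost’s sitemap.xml is a sitemap index whose entries point to further sitemaps (such as sitemap-posts.xml); both function names are hypothetical, and the regex-based parsing is deliberately naive rather than a real XML parser.

```shell
# Print the contents of every <loc>...</loc> element read from stdin.
# Naive on purpose: fine for generated sitemaps, not for arbitrary XML.
# (<image:loc> entries are not matched, since the pattern requires "<loc>".)
extract_locs() {
    grep -o '<loc>[^<]*</loc>' | sed -e 's#<loc>##' -e 's#</loc>##'
}

# Hypothetical helper: list every URL found in the site's sitemaps.
# Assumes sitemap.xml is an index pointing at further sitemaps.
# $1 = Ghost base URL
sitemap_urls() {
    local base="${1%/}"
    local sitemap
    curl -s "${base}/sitemap.xml" | extract_locs | while read -r sitemap
    do
        curl -s "${sitemap}" | extract_locs
    done
}
```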

Great idea, except that the mirror can still be incomplete. The crawl eventually mirrors every page reachable from the domain root, but what about those “special” pages that aren’t linked from any page? Because they are hidden from the public. On purpose.

In Ghost, those are draft posts. Posts that are still in the making but not yet ready for the main stage. You can view them in a sort of preview mode on the website, but they are not linked from anywhere on the main pages. You need to know where they are located. Hence, drafts.

So, how do you include the drafts in the mirror? Crawling is out of the question as there is nothing to crawl from. You need to know where those drafts are located. Get to the URLs so you can eventually feed them into the giant wget hoover. Sorted.

Luckily, there’s an API for that in Ghost.

The Admin API to the Rescue!

Now, the Content API only serves content that has already been published. Its documentation clearly says so near the top of the page, if you actually care to read it, just above all the details you came there for in the first place. I have to admit that I missed it the first time around. And the second. And the… Anyways.

However, the Admin API (or Private API as it’s also known) has all the glorious remaining details. All you need to fill y’er boots. Including draft post URLs. Hooray!

For a Ghost installation at <base-url>, the Admin API is located at <base-url>/ghost/api/v2/admin/. In the case of https://www.how-hard-can-it.be, the Admin API is located at https://www.how-hard-can-it.be/ghost/api/v2/admin/. No need to try it out right now — there’s nothing to see over there…
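The rule above is simple enough to capture in a tiny helper. This is a sketch with a hypothetical function name; it hard-codes the v2 path used throughout this article, and other Ghost releases may expose the Admin API under a different path.

```shell
# Hypothetical helper: derive the (v2) Admin API root from a Ghost base URL.
# A trailing slash on the base URL is stripped first, so both forms work.
admin_api_url() {
    local base="${1%/}"
    printf '%s/ghost/api/v2/admin/\n' "${base}"
}
```

For example, admin_api_url "https://www.how-hard-can-it.be/" prints https://www.how-hard-can-it.be/ghost/api/v2/admin/.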

And given that it’s a proper API — all you really need for the job is Bash! Because: All you really need is Bash!

So, all that remains to do is:

Create a session and store the cookie in memory only
Use the session cookie to retrieve all draft post data
Extract all draft post URLs and store them in a file for further downstream processing

Right. Let’s get cracking.

Creating a Session Cookie

As described in the article “Keeping cURL’s Hands Out of the Cookie Jar”, a session cookie for the Admin API can be obtained and stored in the temporary variable cookie via

cookie=$(curl -c - \
              -d username="not-a-real-username" \
              -d password="not-a-real-password" \
              -H "Origin: localhost" \
              "https://www.how-hard-can-it.be/ghost/api/v2/admin/session/")

Side note: As we are sending username and password in plain text(!), let’s make sure we are using HTTPS! Oh, and: Please don’t try these values on our hard working server — they are obviously for demonstration purposes only. Thank you!
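Hard-coding the password is fine for a demo, but not much else. Here’s a minimal sketch of a friendlier variant, assuming Bash and curl; create_session_cookie is a hypothetical name. It prompts for the password without echoing it and pipes it to curl via stdin. Using --data-urlencode also escapes characters such as & or = that a plain -d would send verbatim and thereby break the form data.

```shell
# Hypothetical helper: prompt for the password, create an Admin API session
# and print curl's cookie jar (plus any response body) to stdout.
# $1 = Ghost base URL, $2 = username (typically the account's email address)
create_session_cookie() {
    local base="${1%/}" username="$2" password
    # -s: don't echo the typed password; -p prints the prompt on stderr
    read -r -s -p "Password for ${username}: " password
    echo >&2
    # "password@-" makes curl read the value from stdin and URL-encode it,
    # so the password never appears on curl's command line.
    printf '%s' "${password}" | curl -s -c - \
        --data-urlencode "username=${username}" \
        --data-urlencode "password@-" \
        -H "Origin: localhost" \
        "${base}/ghost/api/v2/admin/session/"
}
```

Usage would then be cookie=$(create_session_cookie "https://www.how-hard-can-it.be" "not-a-real-username").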

Retrieving all Draft Posts

With the session cookie in memory, we can now ask for all draft posts in JSON format and store the result in the file posts.json.

echo "${cookie}" | curl -b - \
                        -H "Content-Type: application/json" \
                        -H "Origin: localhost" \
                        "https://www.how-hard-can-it.be/ghost/api/v2/admin/posts/?limit=all&filter=status:draft&fields=uuid" > posts.json

Here, the trick is to get the query parameters right so that the resulting posts.json looks like

{
    "posts": [
        {
            "uuid": "732ba42f-9880-429c-9ddb-18c1dff4afd8"
        },
        {
            "uuid": "bbf95f26-7624-4cd9-a275-320d3a82957e"
        },
        {
            "uuid": "6c230c78-89c8-4b72-8c51-2c129831f2ac"
        }
    ],
    "meta": {
        "pagination": {
            "page": 1,
            "limit": "all",
            "pages": 1,
            "total": 3,
            "next": null,
            "prev": null
        }
    }
}
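Since limit=all is supposed to return everything in one go, a quick sanity check is to compare the number of returned posts with meta.pagination.total. A minimal sketch, assuming jq is installed; check_complete is a hypothetical name.

```shell
# Hypothetical sanity check: did a single response contain every draft?
# jq -e sets the exit status from the result: 0 for true, 1 for false/null.
# $1 = JSON file (defaults to posts.json)
check_complete() {
    jq -e '(.posts | length) == .meta.pagination.total' "${1:-posts.json}" > /dev/null
}
```

Something like check_complete || echo "Not all drafts were returned" >&2 would then flag an incomplete download.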

Now, just how are those query parameters constructed and what do they actually mean?! One parameter at a time…!

Just Give Me Everything!

The first query parameter, ?limit=all, asks for all draft posts in one go. While that might look scary, it should actually be alright. Hey, worx for me…!™

Unless you’re running a shop with hundreds or thousands of draft posts lying around. But then why are you not publishing those?! Anyways. Your decision. Feel free to switch to pagination if your mileage varies drastically.
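If you do switch to pagination, here’s a minimal sketch, assuming Bash, curl and jq, and assuming the Admin API accepts a page parameter alongside limit and reports the following page in meta.pagination.next (null on the last page, as in the response above). The page size of 15 and both function names are hypothetical. curl’s -G flag turns the --data-urlencode pairs into a URL-encoded query string.

```shell
# Hypothetical helper: fetch one page of draft posts (uuid only).
# Expects the session cookie jar in the variable "cookie".
# $1 = Ghost base URL, $2 = page number
fetch_draft_page() {
    local base="${1%/}" page="$2"
    echo "${cookie}" | curl -s -b - -G \
        -H "Origin: localhost" \
        --data-urlencode "filter=status:draft" \
        --data-urlencode "fields=uuid" \
        --data-urlencode "limit=15" \
        --data-urlencode "page=${page}" \
        "${base}/ghost/api/v2/admin/posts/"
}

# Walk through all pages, printing one uuid per line.
# Stops when meta.pagination.next is null, or empty because parsing failed.
all_draft_uuids() {
    local base="$1" page=1 json
    while [ -n "${page}" ] && [ "${page}" != "null" ]
    do
        json="$(fetch_draft_page "${base}" "${page}")" || return 1
        jq -r '.posts[].uuid' <<< "${json}"
        page="$(jq -r '.meta.pagination.next' <<< "${json}")"
    done
}
```

With the cookie in place, all_draft_uuids "https://www.how-hard-can-it.be" would print the same uuids as the single limit=all request, just fetched in smaller chunks.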

I Just Want Draft Posts!

The next query parameter, &filter=status:draft, only returns posts that have "status": "draft" in their corresponding JSON representation.

Interestingly enough, draft posts were being excluded for me when omitting all filters. So, not only does this filter make draft posts visible, it also limits the result to draft posts only.

I Only Really Care About One Field!

The last query parameter, &fields=uuid, limits the JSON fields returned for each draft post to its uuid. It’s the one thing you really need when it comes to creating draft post paths.

Extracting all Draft Post URLs

With the uuids contained inside the posts.json file, all that’s needed to liberate them is some JSON parsing in Bash — best done via jq.

Here, the secret to generating the final URLs is that, given a <uuid> and a <base-url>, the corresponding draft post can be found at <base-url>/p/<uuid>/.

Looping over the uuids that jq extracts from posts.json and constructing the resulting draft post paths in the following Bash commands

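# One path per draft; redirecting the whole loop overwrites any
# drafts.paths left over from a previous run instead of appending to it.
for uuid in $(jq -r '.posts[].uuid' posts.json)
do
    echo "/p/${uuid}/"
done > drafts.paths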

results in the content of drafts.paths being something like

/p/732ba42f-9880-429c-9ddb-18c1dff4afd8/
/p/bbf95f26-7624-4cd9-a275-320d3a82957e/
/p/6c230c78-89c8-4b72-8c51-2c129831f2ac/

Now that’s something we can feed into the giant wget hoover for mirroring. Sorted!
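A minimal sketch of that last step, assuming GNU Wget and the drafts.paths file from above; mirror_drafts is a hypothetical name and the politeness flags are assumptions. The relative paths are turned into absolute URLs with sed and handed to wget, which reads its URL list from stdin via --input-file=-.

```shell
# Hypothetical helper: prefix each path with the base URL and let wget
# download every draft page (plus the assets needed to render it).
# $1 = Ghost base URL, $2 = paths file (defaults to drafts.paths)
# Assumes the base URL contains no '#' or '&', which sed would misread.
mirror_drafts() {
    local base="${1%/}" paths="${2:-drafts.paths}"
    sed "s#^#${base}#" "${paths}" | wget --input-file=- \
        --page-requisites \
        --convert-links \
        --adjust-extension \
        --wait=1 \
        --random-wait
}
```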

Everything in One Go

When combined with input parameter handling for usability beyond just How Hard Can It Be?! and some sane password handling, the above Bash snippets eventually turn into the following GitHub Gist

Yes, this is what you came here for, most likely. Now, do the following: Copy. Paste. Adjust. Rethink. Claim Success.
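For reference, here’s a minimal sketch of how the snippets above could be combined into a single function. It is not the Gist’s actual content; the function name, argument handling and password prompt are assumptions, and it requires Bash, curl and jq. jq’s string interpolation builds the /p/<uuid>/ paths directly, replacing the explicit loop.

```shell
# Hypothetical all-in-one helper (not the author's Gist).
# Usage: draft_paths <base-url> <username> > drafts.paths
draft_paths() {
    if [ "$#" -ne 2 ]; then
        echo "usage: draft_paths <base-url> <username>" >&2
        return 2
    fi
    local base="${1%/}" username="$2" password cookie
    local api="${base}/ghost/api/v2/admin"

    # Ask for the password instead of keeping it in the script or history.
    read -r -s -p "Password for ${username}: " password
    echo >&2

    # 1. Create a session; the cookie jar stays in memory only.
    cookie="$(printf '%s' "${password}" | curl -s -c - \
        --data-urlencode "username=${username}" \
        --data-urlencode "password@-" \
        -H "Origin: localhost" \
        "${api}/session/")" || return 1

    # 2. Fetch all draft posts (uuid only), 3. print one path per draft.
    echo "${cookie}" | curl -s -b - -G \
        -H "Origin: localhost" \
        --data-urlencode "limit=all" \
        --data-urlencode "filter=status:draft" \
        --data-urlencode "fields=uuid" \
        "${api}/posts/" |
    jq -r '.posts[] | "/p/\(.uuid)/"'
}
```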

So, How Do You Mirror Draft Posts?!

While the above works for me when it comes to mirroring Ghost draft posts, you may well have an alternative or better way.

Think this is all rubbish, incomplete, or massively overcomplicated?! Feel free to comment on the GitHub Gist or reach out to me on LinkedIn and teach me something new!

As always, prove me wrong and I’ll buy you a pint!
