Dissecting IMDB APIs

Overview

IMDB doesn't offer a public API, and access to the official one costs thousands of freedom currency units per year. That's way too much; I certainly don't make enough money to pay for it. But I still kind of want to use their API, so it looks like we need to figure out how they use it themselves first.

To do this, we need to install mitmproxy. It's a fantastic tool for observing network traffic. Just make sure to install its CA certificate in the system store (see footnote 1).

Fortunately, IMDB doesn't pin certificates, which means that installing mitmproxy's certificate in the system store is enough to decrypt the connections from the IMDB app. The system store is necessary because, since Android 7, apps ignore the user certificate store by default, and installing a certificate through the usual means on Android puts it in the user certificate store.

Follow mitmproxy's Getting Started guide to set it up. For me, it was as simple as apt install mitmproxy and running mitmproxy or mitmweb (for the web interface), then setting up a proxy in the Android wifi settings (it's under "advanced") and running the official IMDB app.

Public API Endpoints

Before we get any further: IMDB has public, undocumented APIs that don't need authentication. The only interesting one I found is the search suggestions API.

curl -X GET "https://v3.sg.media-imdb.com/suggestion/a/<slug>.json"

with <slug> being the URL-encoded title of what you want to find, or the IMDB id. The response is self-explanatory. Most interesting is the qid property, which helps with filtering out extra fluff. Here's what I got searching for rick and morty.

{
    "d": [
        {
            "i": {
                "height": 1920,
                "imageUrl": "https://m.media-amazon.com/images/M/MV5BZjRjOTFkOTktZWUzMi00YzMyLThkMmYtMjEwNmQyNzliYTNmXkEyXkFqcGdeQXVyNzQ1ODk3MTQ@._V1_.jpg",
                "width": 1280
            },
            "id": "tt2861424",
            "l": "Rick and Morty",
            "q": "TV series",
            "qid": "tvSeries",
            "rank": 78,
            "s": "Justin Roiland, Chris Parnell",
            "y": 2013
        },
        ...
    ],
    "q": "rick%20and%20morty",
    "v": 1
}
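The same lookup can be sketched in Python. This is a minimal sketch, not an official client: the helper names build_suggestion_url, filter_by_qid, and search are my own, and the "a" path segment is simply copied from the curl call above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

# Base URL copied from the curl example; the "a" segment is kept as-is.
SUGGESTION_BASE = "https://v3.sg.media-imdb.com/suggestion/a/"


def build_suggestion_url(query: str) -> str:
    """Percent-encode the title (or IMDB id) and append .json."""
    return SUGGESTION_BASE + urllib.parse.quote(query, safe="") + ".json"


def filter_by_qid(response: Dict[str, Any], qid: str) -> List[Dict[str, Any]]:
    """Keep only the entries under "d" whose qid matches, e.g. "tvSeries"."""
    return [entry for entry in response.get("d", []) if entry.get("qid") == qid]


def search(query: str, qid: str) -> List[Dict[str, Any]]:
    """Fetch suggestions over the network and filter them by qid."""
    with urllib.request.urlopen(build_suggestion_url(query)) as resp:
        return filter_by_qid(json.load(resp), qid)
```

Calling search("rick and morty", "tvSeries") would then return only the TV series entries, assuming the endpoint still answers the way it did for me.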

Authenticated API Endpoints

Pretty much everything else needs authentication.

  1. Get temporary credentials

    Getting temporary credentials is the easy part. All you need is the appKey and curl. I suspect the appKey is specific to the IMDB Android app.

    curl -X POST -d '{"appKey": "c2a5f61b-8dea-44bc-b739-db7937519f4e"}' https://api.imdbws.com/authentication/credentials/temporary/android860

    What you get back is everything you need to authenticate against "AWS Data Exchange" database or whatever it is. Apparently it's the "new" api.

    {
        "@meta": {
            "operation": "GetTemporaryCredentials",
            "requestId": "xyz-123-xyz-123",
            "serviceTimeMs": "1.23"
        },
        "resource": {
            "@type": "imdb.api.auth.credentials.temporary",
            "accessKeyId": "<alphanumeric>",
            "expirationTimeStamp": "1999-12-30T03:00:00Z",
            "secretAccessKey": "<alphanumeric>",
            "sessionToken": "<base64 encoded string>"
        }
    }

    Everything under "resource" will be handy when doing authentication. The sessionToken will be the value of x-amz-security-token, and accessKeyId will be part of x-amzn-authorization header of future requests.
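    The same request can be sketched in Python with only the standard library. This is a rough sketch: the function names are hypothetical, and it assumes the endpoint keeps returning the JSON shape shown above.

```python
import json
import urllib.request
from typing import Any, Dict

# URL copied from the curl call above.
CREDENTIALS_URL = "https://api.imdbws.com/authentication/credentials/temporary/android860"


def build_credentials_request(app_key: str) -> urllib.request.Request:
    """Mirror the curl call: POST a JSON body containing only the appKey."""
    body = json.dumps({"appKey": app_key}).encode("utf-8")
    return urllib.request.Request(CREDENTIALS_URL, data=body, method="POST")


def extract_credentials(response: Dict[str, Any]) -> Dict[str, str]:
    """Pull the three values needed for signing out of the "resource" object."""
    resource = response["resource"]
    wanted = ("accessKeyId", "secretAccessKey", "sessionToken")
    return {key: resource[key] for key in wanted}


def fetch_temporary_credentials(app_key: str) -> Dict[str, str]:
    """Perform the request over the network and return the credentials."""
    with urllib.request.urlopen(build_credentials_request(app_key)) as resp:
        return extract_credentials(json.load(resp))
```

    The credentials expire (see expirationTimeStamp), so a longer-running script would need to fetch them again from time to time.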

  2. Authenticate

    This is the difficult part. There are certain headers that are required for authorization against AWS Data Exchange. Amazon has documentation for the expected values of x-amz-date, x-amz-security-token, and x-amzn-authorization.

    x-amzn-sessionid: 942-1698069-8532063
    x-amz-date: <ISO-8601 date format>
    x-amz-security-token: <ALPHANUMERIC>
    x-amzn-authorization: <specific format>
    

    In addition, the following headers are informational.

    user-agent: IMDb/8.7.0.108700400 (Fairphone|FP3; Android 29; Fairphone) IMDb-flg/8.7.0 (1080,2016,422,428) IMDb-var/app-andr-ph
    accept: application/vnd.imdb.api+json
    
    1. x-amz-date

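      This is the request timestamp, here in ISO-8601 format. Note that the string-to-sign captured later in this post shows an RFC 1123 style date instead (Thu, 15 Sep 2022 03:39:07 GMT), so this line may need adjusting.

      headers["x-amz-date"] = datetime.datetime.today().isoformat()
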
    2. x-amz-security-token

      Another given one. This is the sessionToken from the credentials we got earlier.

      headers["x-amz-security-token"] = credentials["sessionToken"]

    3. x-amzn-authorization

      At last, the beast! It took a bit of searching to find the documentation page, and it's not obvious how SWF relates to Data Exchange. The clue is that the header value sent by IMDB has the same format as the one described in those docs. Let's break it down.

      It starts with AWS3 as a tag, followed by AWSAccessKeyId which we get from accessKeyId from the credentials. The Algorithm is always HmacSHA256. Then there is a Signature and SignedHeaders.

      The way this header works is that we construct a string including some information about the request being sent (let's call it string_to_sign), and sign it. That's our Signature. The headers that we included in string_to_sign then are listed in full under SignedHeaders. This extra information from the app's communication helps us figure out what headers we need to include. A sample x-amzn-authorization is as follows.

      AWS3 AWSAccessKeyId=ASIAYOLDPPJ6WMOMECUF,Algorithm=HmacSHA256,Signature=1meBNRwYsk+HVziftdJ/8Bpb1F9DG82Ss6dLLzlKHGk=,SignedHeaders=host;x-amz-date;x-amz-security-token;x-amzn-sessionid
      

      To recreate this, we have the x-amz-date, x-amz-security-token, and even x-amzn-sessionid which we can copy from the app. But what is the host?
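      Before chasing the host, it helps to be able to take such a header apart, for example to compare what the app sends with what we generate later. The following is a small sketch based only on the sample value above; parse_aws3_authorization is a hypothetical helper name.

```python
from typing import Dict, List, Union


def parse_aws3_authorization(value: str) -> Dict[str, Union[str, List[str]]]:
    """Split an x-amzn-authorization value into its named fields."""
    scheme, _, rest = value.partition(" ")
    if scheme != "AWS3":
        raise ValueError("expected the AWS3 tag, got %r" % scheme)
    fields: Dict[str, Union[str, List[str]]] = {}
    for part in rest.split(","):
        # Split on the first "=" only: a base64 signature may end in "=".
        name, _, field_value = part.partition("=")
        fields[name.strip()] = field_value
    # SignedHeaders is a ";" separated list of header names.
    fields["SignedHeaders"] = str(fields.get("SignedHeaders", "")).split(";")
    return fields
```

      Run on the sample above, this yields the access key id, the algorithm, the base64 signature, and the four signed header names as a list.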

Down the rabbit hole

The host is not evident from the requests that are being sent. This is where we need to get to the source. The next step then is to get apktool, dex2jar, and jd-gui (see footnote 2) and disassemble the IMDB apk.

In jd-gui, a search for X-Amzn-Auth (note the capital letters) reveals RedactedHeaders class - aptly named.

arrayList.add("x-amz-security-token");
arrayList.add("X-Amzn-Authorization");
arrayList.add("x-imdb-authentication");
arrayList.add("x-imdb-map-authentication");
arrayList.add("x-imdb-map-authentication-token");

The most interesting function is public String sign(). The argument names are exposed by the calls to kotlin.jvm.internal.Intrinsics.checkNotNullParameter. I've transcribed the code into Python for no good reason at all other than for my own understanding.

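public String getStringToSign() {
    // Rough outline of the decompiled method, not verbatim.
    headers.put("host", hostname);
    // ...then joins the method, path, query string, and canonical headers.
}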

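import base64
import hashlib
import hmac
import urllib.parse
from typing import Dict


def get_string_to_sign(hostname: str, method: str, path: str,
                       headers: Dict[str, str], params: Dict[str, str]) -> str:
    # ZuluSigner.getStringToSign(hostname, method, path, headers, params)
    headers["host"] = hostname
    # ZuluSigningHelper.getCanonicalizedResource
    resource = "/" + urllib.parse.quote(path.lstrip("/"))
    # ZuluSigningHelper.getCanonicalizedQueryString
    query = urllib.parse.urlencode(sorted(params.items()))
    # ZuluSigningHelper.canonicalHeaders: one "name:value" line per header
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    # Assumption: newline-separated parts, matching the layout of the
    # Frida capture later in this post.
    return "\n".join([method, resource, query, canonical_headers]) + "\n"


def calculate_signature(hashed: bytes, secret_access_key: str) -> str:
    # Assumption: HmacSHA256 keyed with the secret key, then base64 encoded.
    mac = hmac.new(secret_access_key.encode("UTF-8"), hashed, hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")


def sign(hostname: str, method: str, path: str, headers: Dict[str, str],
         params: Dict[str, str], body: bytes, credentials: Dict[str, str]) -> str:
    string_to_sign = get_string_to_sign(hostname, method, path, headers, params)

    # ZuluSigningHelper.hash(stringToSign, array_of_bytes)
    digest = hashlib.sha256()
    digest.update(string_to_sign.encode("UTF-8"))
    digest.update(body)
    # Assumption: the raw digest bytes (not the hex string) get signed.
    hashed_string_to_sign_with_body = digest.digest()

    signature = calculate_signature(hashed_string_to_sign_with_body,
                                    credentials["secretAccessKey"])
    # ZuluSigningHelper.canonicalHeaderKeys(headers)
    signed_headers = ";".join(sorted(headers))
    # ZuluSigner.getAuthorizationHeader(headers, signature, credentials)
    return (f"AWS3 AWSAccessKeyId={credentials['accessKeyId']},"
            f"Algorithm=HmacSHA256,Signature={signature},"
            f"SignedHeaders={signed_headers}")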

Looks like there is more that we are missing. Next is ZuluSigningInterceptor which looks very interesting. A search for ZuluSign reveals a world of wonder: ZuluSigner, which includes the methods getAuthorizationHeader and getStringToSign.

getAuthorizationHeader starts with "AWS3 AWSAccessKeyId" followed by getAccessKeyId, which is accessKeyId. Then comes Algorithm=HmacSHA256. Signature is calculated by ZuluSignatureCalculator.calculateSignature, which is passed to ZuluSigner from somewhere, and SignedHeaders is taken from ZuluSigningHelper.canonicalHeaderKeys.

Importantly, in the real requests, SignedHeaders includes only host;x-amz-security-token;x-amzn-sessionid.

At this point, I could guess the value of host and the canonical resource path used to make the signature from the requests that are being made. I'd guess host is api.imdbws.com and the resource path is the path of the URL the request is being made to, e.g. /template/imdb-android-writable/8.7.title-persisted-metadata.jstl/render. I would also have to play around with the parameters that may be passed, and I have no idea where to even look. That's too much guesswork.

Frida and instrumentation

Why guess when you can observe? I had never used Frida instrumentation tools before, so it was a fun exercise. Install Frida on your phone, then install frida-tools on your computer. Follow the official instructions to set it up. My phone is a rooted LineageOS phone. I had to run setenforce 0 as root in termux to allow frida-server to run.

Once frida is set up, we can begin experimenting. frida-ps -U shows a list of running apps, but it shows IMDB app as IMDb. To get the package name, run frida-ps -U -a -i (see frida-ps -h for help). This helpfully returns com.imdb.mobile. Then, to run any frida script against IMDB app, run:

frida -U -f com.imdb.mobile -l myscript.js

myscript.js is the frida hook we'll write to monitor calls to our target functions. After reading through the JavaScript API reference and a bit of searching around, I eventually got to this script:

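Java.perform(function() {
    // Argument names follow the getStringToSign signature noted earlier.
    var ZuluSigner = Java.use("com.imdb.webservice.requests.zulu.ZuluSigner");
    ZuluSigner.getStringToSign.implementation = function(hostname, method, path, headers, params) {
        // Call the original method, log its result, and pass it through.
        var retval = this.getStringToSign(hostname, method, path, headers, params);
        console.log("---BEGIN---");
        console.log(retval);
        console.log("--- END ---");
        return retval;
    };
});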

Read through the documentation for details, but essentially this script tries to replace the implementation of the getStringToSign function with our function here, which prints the return value (and returns it so the app can continue functioning). This frida thing is magic!

As a side-note, the following snippet will be helpful to explore what classes you have access to while using frida (to be used with Java.use). I used it to make sure I'm catching the right class with Java.use.

Java.enumerateLoadedClasses({
    onMatch: function(className) {
        if (className.startsWith("com.imdb.webservice")) {
            console.log(className);
        }
    },
    onComplete: function() {}
});

The output is quite helpful and answers the remaining question.

GET
/template/imdb-android-writable/8.7.app-config.jstl/render

host:api.imdbws.com
x-amz-date:Thu, 15 Sep 2022 03:39:07 GMT
x-amz-security-token:somelonghexstringwhichwealreadyknowtheoriginof
x-amzn-sessionid:123-1231233-1231233


The above is exactly what we saw in the requests, but now we know for sure. What remains is implementing the signing procedure in Python, an exercise left to the reader.
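As a small starting point for that exercise, the captured dump can be parsed back into its parts, which also tells us which headers end up in SignedHeaders. This is a sketch that assumes the layout seen in the capture (method, path, query string, then name:value header lines ending at a blank line); parse_string_to_sign and signed_headers are hypothetical helper names, not something taken from the app.

```python
from typing import Dict, NamedTuple


class StringToSign(NamedTuple):
    method: str
    path: str
    query: str
    headers: Dict[str, str]


def parse_string_to_sign(dump: str) -> StringToSign:
    """Parse a captured string-to-sign into method, path, query, and headers."""
    lines = dump.split("\n")
    method, path, query = lines[0], lines[1], lines[2]
    headers: Dict[str, str] = {}
    for line in lines[3:]:
        if not line:
            break  # a blank line ends the header block
        # Split on the first ":" only, since dates contain colons too.
        name, _, value = line.partition(":")
        headers[name] = value
    return StringToSign(method, path, query, headers)


def signed_headers(parsed: StringToSign) -> str:
    """Join the sorted header names the way SignedHeaders lists them."""
    return ";".join(sorted(parsed.headers))
```

Feeding the capture above through signed_headers gives host;x-amz-date;x-amz-security-token;x-amzn-sessionid, the same list as in the sample x-amzn-authorization header.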

Footnotes


  1. I had to run mount -o rw,remount / to remount /system as read-write.

  2. I ended up switching over to jadx-gui since it seems to be better at recovering argument names, something I thought I had to do by hand (hence the upcoming Python code).