Overview
IMDB doesn't provide their API publicly and if you really need to use their APIs, you'd need to pay thousands of freedom currency units per year. That's way too much. I certainly don't make that much money to pay for this. But I also kind of want to use their API. Looks like we need to figure out how they use their own API first.
To do this, we need to install mitmproxy. It's a fantastic tool for observing network traffic. Just make sure to install its CA certificate in the system store.
Fortunately, IMDB doesn't pin certificates, which means that installing mitmproxy's certificate in the system store is enough to decrypt the connection from the IMDB app. The certificate has to be in the system store because, since Android 7, apps ignore the user certificate store by default, and a certificate installed through the usual means on Android lands in the user certificate store.
Follow mitmproxy's Getting Started guide to set it up. For me, it was as simple as apt install mitmproxy and running mitmproxy or mitmweb (for the web interface). Then set up a proxy on Android in the wifi settings (it's under "advanced") and run the official IMDB app.
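To keep the capture readable, a small mitmproxy addon can print only the IMDB traffic. The following is a minimal sketch, not part of the original setup: the file name imdb_log.py, the class name ImdbLogger and the host suffix list are my own assumptions (the suffixes are simply the hosts that appear in this post). It relies only on mitmproxy's request hook and the module-level addons list.

```python
"""Hypothetical mitmproxy addon; run it with: mitmdump -s imdb_log.py"""

# Host suffixes seen in this post. This list is an assumption, extend it
# if the app turns out to talk to other hosts.
IMDB_HOST_SUFFIXES = ("imdbws.com", "media-imdb.com")


def is_imdb_host(host: str) -> bool:
    """Return True when host equals, or is a subdomain of, a known suffix."""
    return any(host == s or host.endswith("." + s) for s in IMDB_HOST_SUFFIXES)


class ImdbLogger:
    """Prints one summary line, plus the headers, for every IMDB request."""

    def __init__(self):
        self.seen = []

    def request(self, flow):
        # mitmproxy calls this hook once per client request.
        req = flow.request
        if not is_imdb_host(req.pretty_host):
            return
        line = f"{req.method} {req.pretty_host}{req.path}"
        self.seen.append(line)
        print(line)
        for name, value in req.headers.items():
            print(f"    {name}: {value}")


# mitmproxy loads addons from this module-level list.
addons = [ImdbLogger()]
```

With this loaded, the headers discussed in the rest of this post show up next to each request instead of having to be clicked through in the UI.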
Public API Endpoints
Before we get any further, IMDB has public undocumented apis that don't need authentication. The only interesting one I found is the search suggestions api.
curl -X GET "https://v3.sg.media-imdb.com/suggestion/a/<slug>.json"
with <slug> being the urlencoded title of what you want to find or the imdb id. The response is self-explanatory. Most interesting is the qid property, which helps with filtering out extra fluff. Here's what I got searching for rick and morty.
{
  "d": [
    {
      "i": {
        "height": 1920,
        "imageUrl": "https://m.media-amazon.com/images/M/MV5BZjRjOTFkOTktZWUzMi00YzMyLThkMmYtMjEwNmQyNzliYTNmXkEyXkFqcGdeQXVyNzQ1ODk3MTQ@._V1_.jpg",
        "width": 1280
      },
      "id": "tt2861424",
      "l": "Rick and Morty",
      "q": "TV series",
      "qid": "tvSeries",
      "rank": 78,
      "s": "Justin Roiland, Chris Parnell",
      "y": 2013
    },
    ...
  ],
  "q": "rick%20and%20morty",
  "v": 1
}
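As a rough illustration of using this endpoint from code, here is a minimal Python sketch that builds the URL and filters a parsed response by qid. The function names are my own, and the fixed "a" path segment is copied as-is from the curl command above; I have not checked whether other values matter.

```python
import urllib.parse

# Base URL copied from the curl command above, including the fixed "a" segment.
SUGGESTION_BASE = "https://v3.sg.media-imdb.com/suggestion/a"


def suggestion_url(query: str) -> str:
    """Build the suggestion URL for a title or an imdb id."""
    slug = urllib.parse.quote(query, safe="")
    return f"{SUGGESTION_BASE}/{slug}.json"


def filter_by_qid(response: dict, qid: str) -> list:
    """Keep only the entries of response["d"] whose "qid" equals qid."""
    return [item for item in response.get("d", []) if item.get("qid") == qid]


print(suggestion_url("rick and morty"))
# https://v3.sg.media-imdb.com/suggestion/a/rick%20and%20morty.json
```

Feeding the parsed JSON from that URL to filter_by_qid(data, "tvSeries") would keep the Rick and Morty entry shown above and drop entries with any other qid.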
Authenticated API Endpoints
Pretty much everything else needs authentication.
Get temporary credentials
Getting temporary credentials is the easy part. All you need is the appKey and curl. I suspect the appKey is specific to the IMDB android app.
curl -X POST -d '{"appKey": "c2a5f61b-8dea-44bc-b739-db7937519f4e"}' https://api.imdbws.com/authentication/credentials/temporary/android860
What you get back is everything you need to authenticate against "AWS Data Exchange" database or whatever it is. Apparently it's the "new" api.
{
  "@meta": {
    "operation": "GetTemporaryCredentials",
    "requestId": "xyz-123-xyz-123",
    "serviceTimeMs": "1.23"
  },
  "resource": {
    "@type": "imdb.api.auth.credentials.temporary",
    "accessKeyId": "<alphanumeric>",
    "expirationTimeStamp": "1999-12-30T03:00:00Z",
    "secretAccessKey": "<alphanumeric>",
    "sessionToken": "<base64 encoded string>"
  }
}
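The same exchange can be sketched in Python. This is only an illustration under assumptions: the function names are mine, the request is prepared but never sent, and the timestamp format is inferred from the single example value above.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Endpoint copied from the curl command above.
CREDENTIALS_URL = "https://api.imdbws.com/authentication/credentials/temporary/android860"


def credentials_request(app_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST that asks for temporary credentials."""
    body = json.dumps({"appKey": app_key}).encode("utf-8")
    return urllib.request.Request(CREDENTIALS_URL, data=body, method="POST")


def parse_credentials(raw: str) -> dict:
    """Return the "resource" object, with the expiry parsed into a datetime."""
    resource = dict(json.loads(raw)["resource"])
    # Format assumed from the sample value "1999-12-30T03:00:00Z".
    expires = datetime.strptime(resource["expirationTimeStamp"], "%Y-%m-%dT%H:%M:%SZ")
    resource["expires_at"] = expires.replace(tzinfo=timezone.utc)
    return resource


def is_expired(credentials: dict, now: datetime) -> bool:
    """True once now (timezone-aware) has reached the expiry time."""
    return now >= credentials["expires_at"]
```

Passing the prepared request to urllib.request.urlopen and the decoded body to parse_credentials would give a dictionary holding accessKeyId, secretAccessKey and sessionToken, plus a parsed expiry to check before reusing them.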
Everything under "resource" will be handy when doing authentication. The sessionToken will be the value of x-amz-security-token, and accessKeyId will be part of the x-amzn-authorization header of future requests.
Authenticate
This is the difficult part. There are certain headers that are required for authorization against AWS Data Exchange. Amazon has documentation for the expected values of x-amz-date, x-amz-security-token, and x-amzn-authorization.
x-amzn-sessionid: 942-1698069-8532063
x-amz-date: <ISO-8601 date format>
x-amz-security-token: <ALPHANUMERIC>
x-amzn-authorization: <specific format>
In addition, the following headers are informational.
user-agent: IMDb/8.7.0.108700400 (Fairphone|FP3; Android 29; Fairphone) IMDb-flg/8.7.0 (1080,2016,422,428) IMDb-var/app-andr-ph
accept: application/vnd.imdb.api+json
x-amz-date
This is the ISO-8601 formatted date.
headers["x-amz-date"] = datetime.datetime.today().isoformat()
x-amz-security-token
Another given one. This is the sessionToken from the credentials we got earlier.
headers["x-amz-security-token"] = credentials["sessionToken"]
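Putting the easy headers together might look like the sketch below. One caveat: the string to sign captured with Frida later in this post shows x-amz-date as "Thu, 15 Sep 2022 03:39:07 GMT", an HTTP-style date rather than an ISO-8601 one, so this sketch emits that style. Whether the server accepts both is something I have not verified. The function name and arguments are my own.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def base_headers(credentials: dict, session_id: str, now: datetime) -> dict:
    """Headers that take part in the signature, before x-amzn-authorization."""
    return {
        # Copied from the app's own traffic.
        "x-amzn-sessionid": session_id,
        # HTTP-style date, assumed from the Frida capture further down.
        "x-amz-date": format_datetime(now.astimezone(timezone.utc), usegmt=True),
        # sessionToken from the temporary credentials.
        "x-amz-security-token": credentials["sessionToken"],
    }
```

For example, calling base_headers(credentials, "123-1231233-1231233", datetime.now(timezone.utc)) gives the three headers with the current time; x-amzn-authorization is added once the signature is known.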
x-amzn-authorization
At last, the beast! It took a bit of searching to find the documentation page, and it's not obvious how SWF relates to Data Exchange. But the clue is that the header value sent by IMDB has the same format as the one described in those docs. Let's break it down.
It starts with AWS3 as a tag, followed by AWSAccessKeyId which we get from accessKeyId from the credentials. The Algorithm is always HmacSHA256. Then there is a Signature and SignedHeaders.
The way this header works is that we construct a string including some information about the request being sent (let's call it string_to_sign), and sign it. That's our Signature. The headers that we included in string_to_sign are then listed in full under SignedHeaders. This extra information from the app's communication helps us figure out what headers we need to include. A sample x-amzn-authorization is as follows.
AWS3 AWSAccessKeyId=ASIAYOLDPPJ6WMOMECUF,Algorithm=HmacSHA256,Signature=1meBNRwYsk+HVziftdJ/8Bpb1F9DG82Ss6dLLzlKHGk=,SignedHeaders=host;x-amz-date;x-amz-security-token;x-amzn-sessionid
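To make the layout concrete, here is a small Python sketch that assembles and takes apart a header value of this shape. It only mirrors the sample above. The function names are my own, and sorting the header names is an assumption based on the order seen in the sample.

```python
def build_authorization(access_key_id: str, signature: str, signed_headers) -> str:
    """Assemble an x-amzn-authorization value shaped like the sample above."""
    # Lowercase and sort the names; the order is assumed from the sample.
    names = ";".join(sorted(name.lower() for name in signed_headers))
    return (
        f"AWS3 AWSAccessKeyId={access_key_id},Algorithm=HmacSHA256,"
        f"Signature={signature},SignedHeaders={names}"
    )


def parse_authorization(value: str) -> dict:
    """Split a header value back into its tag and key=value fields."""
    tag, _, rest = value.partition(" ")
    # Split on the first "=" only, since the base64 signature may end in "=".
    fields = dict(part.split("=", 1) for part in rest.split(","))
    fields["tag"] = tag
    fields["SignedHeaders"] = fields["SignedHeaders"].split(";")
    return fields
```

Parsing a header captured from the app this way is a quick check on which headers actually went into the signature.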
To recreate this, we have the x-amz-date, x-amz-security-token, and even x-amzn-sessionid which we can copy from the app. But what is the host?
Down the rabbit hole
The host is not evident from the requests that are being sent. This is where we need to get to the source. The next step then is to get apktool, dex2jar, and jd-gui and disassemble the imdb apk.
In jd-gui, a search for X-Amzn-Auth (note the capital letters) reveals the RedactedHeaders class - aptly named.
arrayList.add("x-amz-security-token");
arrayList.add("X-Amzn-Authorization");
arrayList.add("x-imdb-authentication");
arrayList.add("x-imdb-map-authentication");
arrayList.add("x-imdb-map-authentication-token");
The most interesting function is public String sign(). The argument names are exposed by the calls to kotlin.jvm.internal.Intrinsics.checkNotNullParameter. I've transcribed the code into python for no good reason at all other than for my own understanding.
public String getStringToSign() {
    headers.put("host", hostname);
    join
}
import hashlib
import urllib.parse
from typing import Dict, List

def sign(hostname: str,
         method: str,
         path: str,
         headers: Dict[str, str],
         params: Dict[str, str],
         array_of_bytes: List[int],
         credentials: Dict[str, str]):  # ZuluTemporaryCredentials in the app
    # getStringToSign(hostname, method, path, headers, params)
    headers["host"] = hostname
    stringToSign = "".join([
        method,
        "/" + urllib.parse.quote(path),  # ZuluSigningHelper.getCanonicalizedResource
        urllib.parse.urlencode(sorted(params.items())),  # ZuluSigningHelper.getCanonicalizedQueryString
        "\n".join("%s:%s" % (k, headers[k]) for k in sorted(headers)),  # ZuluSigningHelper.canonicalHeaders
    ])
    # ZuluSigningHelper.hash(stringToSign, array_of_bytes)
    digest = hashlib.sha256()
    digest.update(stringToSign.encode("UTF-8"))
    digest.update(bytes(array_of_bytes))
    hashedStringToSignWithBody = digest.hexdigest()

    # calculateSignature lives in the app and is not transcribed here
    signature = calculateSignature(hashedStringToSignWithBody, credentials["secretAccessKey"])

    # ZuluSigner.getAuthorizationHeader(headers, signature, credentials)
    # Stand-in for ZuluSigningHelper.canonicalHeaderKeys, assumed from the sample header
    signed_headers = ";".join(sorted(headers))
    authorization_header = f"AWS3 AWSAccessKeyId={credentials['accessKeyId']},Algorithm=HmacSHA256,Signature={signature},SignedHeaders={signed_headers}"
    return authorization_header

def getStringToSign(hostname, method, path, headers, params):
    pass  # blah blah blah
Looks like there is more that we are missing. Next is ZuluSigningInterceptor which looks very interesting. A search for ZuluSign reveals a world of wonder: ZuluSigner, which includes the methods getAuthorizationHeader and getStringToSign.
getAuthorizationHeader starts with "AWS3 AWSAccessKeyId" followed by getAccessKeyId which is accessKeyId. Then Algorithm=HmacSHA256. Signature is calculated by ZuluSignatureCalculator.calculateSignature which is passed to ZuluSigner from somewhere, and SignedHeaders is taken from ZuluSigningHelper.canonicalHeaderKeys.
Importantly, in the real requests, SignedHeaders includes only host;x-amz-date;x-amz-security-token;x-amzn-sessionid.
At this point, I could guess the value of host and the canonical resource path used to make the signature from the requests that are being made. I'd guess host is api.imdbws.com and the resource path is the url the request is being made to, e.g. /template/imdb-android-writable/8.7.title-persisted-metadata.jstl/render. I would also have to play around with parameters that may be passed and I have no idea where to even look. That's too much guesswork.
Frida and instrumentation
Why guess when you can observe? I had never used frida instrumentation tools before, so it was a fun exercise. Install Frida on your phone, then install frida-tools on your computer. Follow the official instructions to set it up. My phone is a rooted LineageOS phone. I had to setenforce 0 as root in termux to allow frida-server to run.
Once frida is set up, we can begin experimenting. frida-ps -U shows a list of running apps, but it shows the IMDB app as IMDb. To get the package name, run frida-ps -U -a -i (see frida-ps -h for help). This helpfully returns com.imdb.mobile. Then, to run any frida script against the IMDB app, run:
frida -U -f com.imdb.mobile -l myscript.js
myscript.js is the frida hook we'll write to monitor calls to our target functions. Reading through the javascript api reference, and a bit of searching around, I eventually got to this script:
Java.perform(function() {
  var calculatorActivity = Java.use("com.imdb.webservice.requests.zulu.ZuluSigner");
  calculatorActivity.getStringToSign.implementation = function(a, b, c, m, l) {
    var retval = this.getStringToSign(a, b, c, m, l);
    console.log("---BEGIN---");
    console.log(retval);
    console.log("--- END ---");
    return retval;
  };
});
Read through the documentation for details, but essentially this script replaces the implementation of the getStringToSign function with our function here, which prints the return value (and returns it so the app can continue functioning). This frida thing is magic!
As a side-note, the following snippet will be helpful to explore what classes you have access to while using frida (to be used with Java.use). I used it to make sure I'm catching the right class with Java.use.
Java.enumerateLoadedClasses({
  // onMatch is called once for every loaded class name
  onMatch: function(className) {
    if (className.startsWith("com.imdb.webservice")) {
      console.log(className);
    }
  },
  onComplete: function() {}
});
The output is quite helpful and answers the remaining question.
GET
/template/imdb-android-writable/8.7.app-config.jstl/render
host:api.imdbws.com
x-amz-date:Thu, 15 Sep 2022 03:39:07 GMT
x-amz-security-token:somelonghexstringwhichwealreadyknowtheoriginof
x-amzn-sessionid:123-1231233-1231233
Notes:
- Two empty lines at the end, indicating the requests don't include a body
- Lack of parameters in the string to sign
- No spaces around the colon (which separates header from value)
The above info is exactly what we saw in the requests, but now we know for certain what goes into the signature. What remains is implementing the rest of the signing procedure in python, an exercise left to the reader.
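As a starting point for that exercise, here is a hedged Python sketch. The string layout follows the captured output above: method, path, sorted name:value header lines, then two trailing newlines. The signing step is an assumption on my part: I have not confirmed what ZuluSignatureCalculator does, so HMAC-SHA256 over the raw SHA-256 digest, keyed with secretAccessKey and base64-encoded, is a guess based on Algorithm=HmacSHA256 and the base64-looking sample signature. The function names are mine.

```python
import base64
import hashlib
import hmac


def string_to_sign(method: str, path: str, headers: dict) -> str:
    """Rebuild the string printed by the Frida hook for a request without parameters."""
    lines = [method, path]
    # Header names sorted, no spaces around the colon.
    lines += [f"{name}:{headers[name]}" for name in sorted(headers)]
    # Assumed: the two empty lines in the capture correspond to two trailing newlines.
    return "\n".join(lines) + "\n\n"


def sign_request(secret_access_key: str, to_sign: str, body: bytes = b"") -> str:
    """Guess at the signature: base64(HMAC-SHA256(secret, SHA256(to_sign + body)))."""
    digest = hashlib.sha256(to_sign.encode("utf-8") + body).digest()
    mac = hmac.new(secret_access_key.encode("utf-8"), digest, hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")
```

If requests signed this way are rejected, the first things I would vary are the digest encoding (raw bytes versus hex string) and the number of trailing newlines, comparing against what the Frida hook prints.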