Recently at Marvel we launched our Platform API. We wanted to harness the power of integrations and place it directly into the hands of the creative community that uses Marvel every day, so that we can work more seamlessly with the other tools that serve them on a daily basis.
Here I'll write about some of the thought processes we went through when we set out to build it, explain some of the problems with other APIs and their potential solutions, and cover why we made the choice to use GraphQL over REST despite the team being vastly less experienced with GraphQL.
What makes a good API?
If we were ever to make a great API of our own, we would need to think about what makes a great API in the first place, so we identified some good practices for designing any API, whether that’s on the web or not:
- Great documentation. Even the most life-changing tools are useless to me if I can’t figure out how to get them to work.
- No needless complexity. Needless complexity is for bureaucrats. We should prefer simplicity always.
- A lack of surprises. Getting caught out because something works in ways you didn’t expect, or because it has an unexpected side-effect, is no fun. Things should be as straightforward as they can be.
We can think of achieving these points as ways of nailing the developer experience. This should make sense to us - when we design our app and web interfaces we try to keep the user’s experience front of mind; an API is a user interface with a specific developer audience, and it requires we do the same.
Where are other APIs lacking?
Most web APIs of the last 5+ years have been REST(ish) and talking JSON, rather than the slightly older-fashioned SOAP, RPC or custom XML endpoints. Generally this is seen to be a good thing, and we’ve seen a proliferation of APIs as devices become smart and services have begun to work tightly together.
The problem is that whilst JSON has become the standard way to send data between web APIs, each API is unique in its structure and its capabilities, and there has been no standard way for an API to describe itself. When every API is different to the last, it isn't possible to build tooling that works for them all.
So whilst the API you’re using today may be perfectly well supported with good documentation and even some handy tools like editor integrations or client libraries, those tools you’re used to probably won’t work on the API you’re working with the next week, or will be too generic to help you with the unique parts of it if they do. In reality it usually means you don’t have any tooling at all.
Mobile networks & devices
Another problem has emerged as more and more web traffic has transitioned to mobile devices, which are often on unreliable networks with high latency and low bandwidth.
A REST API will commonly require the client to hit multiple endpoints to gather data for the many resources needed to render a certain view (say, fetching the user’s profile as well as their favourite recipes in two separate calls).
Because mobile networks are so much more unreliable than other networks, the chances of failure are much higher when multiple requests must be made. This makes it much more likely that you’ll end up with partial data, which leaves you unable to render or having to render placeholders whilst you retry the request.
As bad as that is, latency is the big performance killer when working with the network. Latency is the time it takes for data to be transferred between two devices, such as your phone and Marvel's servers. When the latency between a device and the server is 50ms, it takes 50ms for data to travel from one to the other, and the RTT (round trip time) is 100ms. 100ms doesn’t sound like a lot, but it takes much more than one round trip to complete a single HTTP request. DNS lookups, the three-way TCP handshake and the TLS handshake all have to happen before you can actually send or receive any data, and each of them requires one or more round trips. TCP, the protocol we use to move bits around on the web, also requires constant acknowledgement of received data so that any lost packets can be retransmitted. Sadly these acknowledgements obey the laws of physics, so they incur the cost of the round trip too, which puts a cap on network throughput that is directly tied to network latency.
On a high-latency network, it might take a second or more for a packet to make a round trip. Network latency is largely out of our control as application developers, so we need to instead try to reduce its impact by reducing the number of requests we make.
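To put rough numbers on this, here is a back-of-the-envelope sketch. The round-trip counts per phase are illustrative assumptions rather than measurements (a full TLS 1.2 handshake is assumed, and server processing and transfer time are ignored), and `total_time_ms` is a hypothetical helper, not part of any library.

```python
# Round trips needed before the first response arrives on a brand-new connection.
# These counts are illustrative assumptions, not measurements.
COLD_ROUND_TRIPS = {
    "dns_lookup": 1,
    "tcp_handshake": 1,
    "tls_handshake": 2,  # assumes a full TLS 1.2 handshake; TLS 1.3 needs 1
    "http_request": 1,
}
WARM_ROUND_TRIPS = 1  # connection already open: just the request itself


def total_time_ms(rtt_ms: float, requests: int) -> float:
    """Estimate the time for `requests` sequential calls over one reused connection."""
    if requests < 1:
        return 0.0
    round_trips = sum(COLD_ROUND_TRIPS.values()) + WARM_ROUND_TRIPS * (requests - 1)
    return rtt_ms * round_trips


for rtt_ms in (100, 1000):
    one = total_time_ms(rtt_ms, requests=1)
    two = total_time_ms(rtt_ms, requests=2)
    print(f"RTT {rtt_ms}ms: 1 request ~{one:.0f}ms, 2 requests ~{two:.0f}ms")
```

Even with the connection reused, every extra sequential request adds at least one full round trip, which on a slow mobile network can mean another second of waiting.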
Another issue disproportionately impacting mobile devices is over-fetching. An endpoint will generally return a full representation of a resource, but chances are the client doesn’t need or want all of that data; most of the time it is only interested in a subset. Nonetheless, the server will still send this extraneous data and the client has no say in the matter. The client still has to pay the cost of transferring and processing it, which wastes bandwidth and CPU cycles and causes longer response times for no benefit.
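A small sketch makes the cost of over-fetching concrete. The user record and its field names are invented for illustration; the point is only the gap between what the endpoint sends and what a view actually uses.

```python
import json

# Hypothetical full representation a REST endpoint might return for a user.
full_user = {
    "id": 42,
    "name": "Ada",
    "email": "ada@example.com",
    "bio": "Home cook and occasional baker. " * 20,
    "avatar_url": "https://example.com/avatars/42.png",
    "settings": {"theme": "dark", "newsletter": True, "language": "en"},
}

# Suppose the view only needs to show the name and avatar.
needed_fields = ("name", "avatar_url")
subset = {field: full_user[field] for field in needed_fields}

full_bytes = len(json.dumps(full_user).encode("utf-8"))
subset_bytes = len(json.dumps(subset).encode("utf-8"))

print(f"full representation: {full_bytes} bytes")
print(f"fields the view uses: {subset_bytes} bytes")
print(f"wasted: {full_bytes - subset_bytes} bytes ({1 - subset_bytes / full_bytes:.0%})")
```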
What solutions exist for these issues?
There are a few open source projects attempting to solve some of these issues for REST APIs, namely Swagger (OpenAPI) and API Blueprint. We already had a little in-house experience with Swagger (and not a particularly happy one; YAML can be hell) and tried out API Blueprint when adding new endpoints to our existing API, but found that small errors in the MSON documents could be difficult to track down (not much better than YAML, then).
The promise of these tools is good though, and there’s one thing they get right: they force you to define a schema for your API. A schema acts as documentation for your system’s boundaries, and system boundaries are often where integration pain is going to be felt most. Once you’ve got a schema, you can use it to do all sorts of cool stuff: generate documentation pages, spin up mock servers for development, even generate code for API clients if you are so inclined. Also, because the schema is specified in a standardised way, tooling built for one API is portable to any other API using the same standard. Importantly, because everything is defined up front, consumers of your API know exactly what is expected of them and what they can expect from you in return.
This is something that GraphQL gets right too. The GraphQL server defines a schema containing all the data types, mutations and queries available to the clients. This way everybody knows where they stand. I can immediately see exactly which operations are available to me, which types they accept as inputs and which types they return as an output.
Whilst Swagger and API Blueprint bring a schema to the request and response bodies, there’s nothing they can do for the over-fetching and multiple request problems. If you were to solve those issues in your API, it would be something custom and non-standardised, unique to an individual API. Typically you’d end up writing an endpoint for a specific view. For the example mentioned earlier, this means you’d have an endpoint that returns the user and their recipes all at once. That’s for a single view though. How many unique views are in your application? This could get messy.
So why GraphQL?
Unlike REST, GraphQL has solving these issues as one of its core ideas. Every call returns only the data that is specified in the request. This has a few benefits. Firstly, it completely solves the over-fetching problem by putting it entirely in the client’s hands. Secondly, it removes the need for multiple requests by allowing the client to ask for everything they need all at once. Thirdly, it allows us as the operators of the API to see exactly who is requesting each field, which makes it a lot easier for us to deprecate fields as time moves on. We’re now able to mark a field as deprecated, which lets new integrators know not to use it, and then grab a list of all the existing integrations still using it and get in touch with them to offer an upgrade path.
As I alluded to earlier, tooling was also a factor in our decision. All the solutions discussed above have some similar tooling available and relatively healthy communities around them, but GraphQL seems to go above and beyond the other two. Coming out of Facebook, GraphQL has mostly been adopted within the React community, which is very large and very active. This means there are loads of great tools available to make the developer experience as good as it can be.
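The idea can be sketched in a few lines. This is a toy model, not how a real GraphQL server is implemented: the selection is a plain nested dict standing in for a parsed query, and `USER`, `select`, `record_usage` and `execute` are hypothetical names made up for this example.

```python
from collections import Counter

# Hypothetical data the server has loaded for one user.
USER = {
    "name": "Ada",
    "email": "ada@example.com",
    "recipes": [
        {"title": "Shakshuka", "minutes": 30},
        {"title": "Dal", "minutes": 45},
    ],
}

# (client, field path) -> number of requests that asked for that field
field_usage = Counter()


def record_usage(selection, client_id, path=""):
    """Note every field path a client asked for, once per request."""
    for field, sub_selection in selection.items():
        field_path = f"{path}.{field}" if path else field
        field_usage[(client_id, field_path)] += 1
        if sub_selection:
            record_usage(sub_selection, client_id, field_path)


def select(value, selection):
    """Return only the fields named in `selection`, recursing into nested data."""
    if isinstance(value, list):
        return [select(item, selection) for item in value]
    return {
        field: select(value[field], sub) if sub else value[field]
        for field, sub in selection.items()
    }


def execute(selection, client_id):
    """Record which fields were requested, then return just those fields."""
    record_usage(selection, client_id)
    return select(USER, selection)


# Roughly the shape of the GraphQL query: { name recipes { title } }
result = execute({"name": None, "recipes": {"title": None}}, client_id="recipe-app")
print(result)

# A second, hypothetical integration that still reads the email field.
execute({"email": None}, client_id="legacy-app")

# Which integrations would be affected if we deprecated `email`?
clients_using_email = {client for (client, path) in field_usage if path == "email"}
print(clients_using_email)
```

One request fetches the user and their recipes together, nothing unrequested is sent, and the usage counter gives the operator a list of integrations to contact before removing a field.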
For example, it’s possible to hook up your editor to your schema and have it lint your queries, flagging any errors before you get a chance to execute them. Integrate it with autocomplete and you can have it make suggestions as you type.
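As a rough illustration of why a schema makes this kind of linting possible, here is a toy checker. It is a sketch under simplifying assumptions, not how real GraphQL tooling works: the schema is a plain dict, the query is a nested dict rather than parsed GraphQL, and every type and field name is hypothetical.

```python
# A toy schema: type name -> field name -> type of that field.
SCHEMA = {
    "Query": {"user": "User"},
    "User": {"name": "String", "email": "String", "recipes": "Recipe"},
    "Recipe": {"title": "String", "minutes": "Int"},
}
SCALARS = {"String", "Int"}


def lint(selection, type_name="Query", path="query"):
    """Return error messages for selections the schema does not allow."""
    errors = []
    fields = SCHEMA[type_name]
    for field, sub_selection in selection.items():
        field_path = f"{path}.{field}"
        if field not in fields:
            errors.append(f"{field_path}: no field '{field}' on type {type_name}")
            continue
        field_type = fields[field]
        if field_type in SCALARS:
            if sub_selection:
                errors.append(f"{field_path}: {field_type} has no sub-fields")
        elif not sub_selection:
            errors.append(f"{field_path}: must select sub-fields of {field_type}")
        else:
            errors.extend(lint(sub_selection, field_type, field_path))
    return errors


# A query with a typo in one field name, caught before anything is executed.
query = {"user": {"name": None, "emial": None, "recipes": {"title": None}}}
for message in lint(query):
    print(message)
```

The same lookup that rejects `emial` is what lets an editor offer `email` as a completion: the list of valid fields for each type is known up front.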
One of my favourite tools is GraphiQL, which is an IDE built for GraphQL APIs that runs in a browser. As you can see below, GraphiQL gives you somewhere to write your queries (with features you’d expect like syntax highlighting and autocompletion), somewhere to see the results, and somewhere to reference the documentation.
It being browser-based means we can throw it up somewhere public (as Marvel does) and allow somebody to get hands-on experience with our API in a developer-friendly environment immediately. It turns out to be a really effective way of getting to grips with GraphQL itself as well as the specifics of an API, and really helps people to hit the ground running.
Alongside the other benefits it brings, this is the primary reason we chose to use GraphQL. It simply allows us to provide a better developer experience than any of the alternatives.