Building a system where users can collaborate on creating and editing content from any device, both online and offline, is hard. It’s so hard that most apps in the wild sidestep collaborative content creation by making simplifying assumptions. Typically, apps assume that the majority of mobile users will be consuming content rather than creating it, and that if they are creating content, at least they’ll be doing it online.

Quizlet is different. Millions of K-12 students use Quizlet to study, and for most teenagers their smartphone is their main computing device, so content creation simply needs to work on mobile (of the roughly one million study sets created per week, 25% are created on our mobile apps). In addition, teenagers don’t have unlimited data plans. Many students turn off data to avoid getting in trouble with their parents when apps eat up their family plans, so creating content on Quizlet has to work seamlessly online and offline. As an added constraint, students often collaborate when creating study sets, so the same content can be edited by multiple students at the same time.

We overcame these challenges by treating our apps like miniature, standalone versions of Quizlet that each have a small subset of the data in Quizlet’s database server. The mobile apps operate just on their own local database, so the apps are fully functional without an internet connection. When the apps come online, they sync their local database with the main Quizlet server via our API and resolve any conflicts that arise.

Underpinning all of this is a redesigned API, which has been serving over 7 million mobile users in production for the past 12 months. In the rest of this article we’ll talk about how to think through designing an API like this. We’ll talk about our vision for our own API, and some of the edge-cases and pitfalls we’ve worked through along the way.


Content creation on iOS and Android

Designing a Robust API

The first step when designing an API for mobile is to decide what it needs to do. You can drastically simplify the design of the overall system if you don’t need to support interactive content creation, or if you can assume that your app won’t work offline.

In our case, we knew our mobile apps needed to be fully functional offline, while being able to sync data back to the website once the users come back online. We needed content to be editable from multiple devices at the same time with minimal clobbering of data. Accomplishing all of this meant designing our apps and API around the idea of syncing changes between the local database of the apps and the main database of our website.

When designing an API it’s a good idea to start out with something standard and modify as needed. This makes it easy for developers to get up to speed quickly, and makes it more likely that external projects will work seamlessly with the API you design.

For us, that meant using REST as a starting point. We settled on 5 standard actions that can be exposed for every resource: Create, Read, Update, Delete, and Index. Create, Read, Update, and Delete are the standard CRUD actions, and Index is an action which returns a paginated list of models. These actions map to the standard GET, POST, PUT, and DELETE HTTP verbs in a REST-like fashion. The index action allows for a querying interface where the app can specify, for example, to only return terms belonging to a given study set, or to only return folders created by a specific user. We also allow bulk requests per resource as long as they share the action. For example, we allow a multi-show of study sets in the same request, or multi-create of terms in a study set.
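As a rough sketch of how five actions like these can hang off four HTTP verbs, consider the routing table below. The `/sets` paths, the `ACTIONS` table, and the `route` helper are hypothetical names used only for illustration, and the assignment of POST to Create and PUT to Update is an assumption based on common REST convention.

```python
from typing import Optional

# Hypothetical routing table: (HTTP verb, targets a single model?) -> action.
# The verb assignments follow common REST convention and are an assumption.
ACTIONS = {
    ("GET", False): "index",     # GET /sets       -> paginated list
    ("GET", True): "read",       # GET /sets/42    -> one model
    ("POST", False): "create",   # POST /sets      -> new model
    ("PUT", True): "update",     # PUT /sets/42    -> modify a model
    ("DELETE", True): "delete",  # DELETE /sets/42 -> remove a model
}


def route(verb: str, path: str) -> Optional[str]:
    """Return the action name for a request, or None if it is unsupported."""
    parts = [p for p in path.split("/") if p]
    if len(parts) not in (1, 2):
        return None
    has_id = len(parts) == 2
    return ACTIONS.get((verb.upper(), has_id))
```

Note that GET serves two actions: with an ID in the path it is a Read, and without one it is an Index.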

We made the API map to our underlying database tables as much as possible. This means we explicitly expose all join tables, and use the same column names on our backend database, so that the mobile app can closely mimic the production website’s schema. This gives us the flexibility to implement anything on the mobile app that can be implemented on the site, and makes it possible to sync any table between the mobile app and the production website.

Syncing Between the App and the Backend

Syncing is the core of any offline app. When an app comes back online after being used offline, it needs to seamlessly sync its state with a remote server. This requires a robust syncing model to avoid losing data during syncing -- a surprisingly difficult problem when edits are made in different places.

Our apps use a “dirty flag” to indicate that data has changed locally on the app and needs to be synced. This flag is set per-attribute of each model, so that models can track which specific fields have been changed since the last sync with the server. For example, if a user modifies the title of a study set on the app, the app will make the change in its local database and set the dirty flag to true for the title field on the affected study set model.
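A minimal sketch of per-attribute dirty tracking might look like the following. The `LocalModel` class and its `set_local` method are hypothetical names chosen for this example; they are not taken from the actual apps.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class LocalModel:
    """A locally stored model that tracks which attributes are dirty."""
    attrs: Dict[str, Any]
    dirty: Set[str] = field(default_factory=set)

    def set_local(self, name: str, value: Any) -> None:
        # A user edit writes to the local database and marks the field dirty.
        self.attrs[name] = value
        self.dirty.add(name)


study_set = LocalModel({"id": 42, "title": "Biology", "description": ""})
study_set.set_local("title", "Biology 101")
# Only "title" is now flagged; "description" is still considered clean.
```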

When the app later fetches data from the backend via a GET action (Read or Index), the app first checks whether any attribute of a model that’s about to be updated has its dirty flag set. If it does, the app discards the server’s update for that attribute. This ensures that changes made to models in the app’s local database don’t get overwritten until the app has had a chance to inform the server of those changes via an update request.
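The merge rule above can be sketched in a few lines. The function name `merge_from_server` and the plain-dictionary representation of a model are assumptions made for illustration.

```python
from typing import Any, Dict, Set


def merge_from_server(local: Dict[str, Any], dirty: Set[str],
                      server: Dict[str, Any]) -> Dict[str, Any]:
    """Return local attributes updated from the server, skipping dirty fields."""
    merged = dict(local)
    for name, value in server.items():
        if name in dirty:
            continue  # unsynced local edit wins until it has been pushed
        merged[name] = value
    return merged


local = {"id": 42, "title": "Biology 101", "description": ""}
server = {"id": 42, "title": "Bio", "description": "Chapter 3"}
merged = merge_from_server(local, {"title"}, server)
```

Here the locally edited title survives, while the clean description field picks up the server's value.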

Whenever the app has an internet connection, it will try to sync any models that have a dirty flag set. This means issuing a Create, Update, or Delete request to the server for the models that are affected. The server will either make the requested changes or reject them, and respond to the client with the server’s authoritative version of the affected models. The app then removes the dirty flags from all synced models, and updates its local database to reflect what was returned by the server.
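One way to picture this sync step is the sketch below. It covers only updates (creates and deletes are omitted for brevity), and the model layout, `sync_dirty`, and the `fake_server` stand-in are all hypothetical. The fake server rejects an empty title so that the example shows the local copy adopting the server's authoritative version.

```python
from typing import Any, Callable, Dict, List

Model = Dict[str, Any]


def sync_dirty(models: List[Model], send: Callable[[Model], Model]) -> int:
    """Push each dirty model, then adopt the server's authoritative version."""
    synced = 0
    for model in models:
        if not model["dirty"]:
            continue
        payload = {"id": model["id"]}
        payload.update({name: model["attrs"][name] for name in model["dirty"]})
        authoritative = send(payload)         # server accepts or rejects the change
        model["attrs"].update(authoritative)  # local state now mirrors the server
        model["dirty"].clear()
        synced += 1
    return synced


def fake_server(payload: Model) -> Model:
    # Hypothetical server: rejects empty titles and returns its own state.
    stored = {"title": "Biology"}
    if payload.get("title"):
        stored["title"] = payload["title"]
    return stored


models = [{"id": 42, "attrs": {"title": ""}, "dirty": {"title"}}]
count = sync_dirty(models, fake_server)
```

If `send` raises (for example, because the connection dropped), the dirty flags are left in place, so the model is retried on the next sync.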

This technique lets us only sync models that need to be synced, and makes sure that the app has a chance to tell the server about any local data changes before it reads data from the server. This also minimizes the amount of data that gets overwritten if multiple users edit the same content at the same time since conflicts can be resolved on a per-attribute basis. The flow is illustrated below:


Dealing with Deleted Models

When a model gets deleted, there needs to be a way to inform the apps so they can remove that model from their local databases as well. At first glance it’s tempting to try to simply reload the world occasionally in the app and assume that any models which we don’t see between requests must have been deleted. However, this solution is performance-intensive on the apps (they need to loop over their entire database) and requires a lot of network bandwidth.

A better approach is to use soft deletes for all models on the server. That way, if a model is deleted, the server can respond to any requests involving that model with a response indicating it’s been deleted. We use an abbreviated response for these models - just their ID, last modified timestamp, and an isDeleted field.

{
  "id": 42,
  "lastModified": 1414433762,
  "isDeleted": true
}

Soft-deletes are useful app-side too, so that when a model is deleted on the app, it can inform the server of the delete when it comes back online.
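On the receiving side, an app could handle these abbreviated responses roughly as follows. This is a sketch under the assumption that the local database is a dictionary keyed by model ID; `apply_response` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def apply_response(db: Dict[int, Dict[str, Any]],
                   models: List[Dict[str, Any]]) -> None:
    """Apply server models to a local store, dropping soft-deleted ones."""
    for model in models:
        if model.get("isDeleted"):
            db.pop(model["id"], None)  # abbreviated record: remove local copy
        else:
            db[model["id"]] = model


db = {42: {"id": 42, "title": "Biology"}, 43: {"id": 43, "title": "Chemistry"}}
apply_response(db, [
    {"id": 42, "lastModified": 1414433762, "isDeleted": True},
    {"id": 43, "title": "Chemistry 101", "lastModified": 1414433800},
])
```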

Ensuring Consistent Pagination

Pagination seems deceptively simple: just chop up your results by a given limit and offset which the app specifies on each request. In practice, there are some subtle gotchas which can occur while paging if you’re not careful. If data is added or deleted while the app is in the middle of paging through results, the pages can shift underneath the app and results can be included twice, or, even worse, not included at all! This dilemma is illustrated below:
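A small simulation makes both failure modes concrete. It is a minimal sketch in which a hypothetical `page` helper stands in for a limit/offset query over a list of rows.

```python
from typing import List


def page(rows: List[str], limit: int, offset: int) -> List[str]:
    """Naive limit/offset paging over the current state of the data."""
    return rows[offset:offset + limit]


# Case 1: a row is deleted between requests, so a result is skipped.
rows = ["a", "b", "c", "d"]
first = page(rows, limit=2, offset=0)
rows.remove("a")                         # "c" shifts into the first page
second = page(rows, limit=2, offset=2)
skipped_case = first + second            # "c" never appears

# Case 2: a row is inserted between requests, so a result is repeated.
rows = ["a", "b", "c", "d"]
first = page(rows, limit=2, offset=0)
rows.insert(0, "z")                      # "b" shifts into the second page
second = page(rows, limit=2, offset=2)
duplicate_case = first + second          # "b" appears twice
```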


To deal with this, we take a snapshot of the full result set when we begin paging and identify this cached result-set with a randomly generated paging token. The API returns that paging token along with the first page of results, and it must be sent along with the request for all subsequent pages. That way, it doesn’t matter if the data changes while the app is in the process of paging because it’s paging over the pre-cached result set. Internally, we implement this by taking all the IDs of the result set and storing them in memcached, keyed by the paging token, with a lifespan of a few minutes.
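The snapshot approach can be sketched as below. An in-process dictionary stands in for memcached, so entries never expire here, and `fetch_page` is a hypothetical function name rather than a real endpoint.

```python
import secrets
from typing import Dict, List, Optional, Tuple

# Stand-in for memcached; a real cache would expire entries after a few minutes.
_snapshots: Dict[str, List[int]] = {}


def fetch_page(current_ids: List[int], limit: int, offset: int = 0,
               token: Optional[str] = None) -> Tuple[str, List[int]]:
    """Return (paging token, page of IDs) from a snapshot of the result set."""
    if token is None:
        token = secrets.token_hex(16)          # randomly generated paging token
        _snapshots[token] = list(current_ids)  # freeze the full result set
    ids = _snapshots[token]
    return token, ids[offset:offset + limit]


ids = [1, 2, 3, 4]
token, first = fetch_page(ids, limit=2)
ids.remove(1)                                  # data changes mid-paging
_, second = fetch_page(ids, limit=2, offset=2, token=token)
```

Because the second request reads from the frozen snapshot, the deletion does not shift the pages. In this sketch an unknown or expired token raises a `KeyError`; a client would presumably need to restart paging in that case.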

Dealing with Tightly-Coupled Models

REST works well when each resource is relatively independent from every other resource in the system, and can be created, updated, and deleted without requiring changes to other models in the system. If your data doesn’t conform to this assumption then you will likely run into some sticky edge-cases when using REST. An example which gave us a particular amount of trouble is keeping a sorted list of terms in our study sets. Essentially, each term has a rank which determines its position in the list. No two terms can have the same rank, and whenever a term is inserted, deleted, or re-ordered in the list the ranks of all the other terms need to be updated.

As we talked about earlier, our apps are designed to work offline and, when they come online, simply sync their local changes to the server. These syncing operations use our bulk create, update, and delete endpoints, and typically happen in parallel. However, this causes indeterminate results when creating, updating, and deleting terms as described above. This is illustrated below:


In the above example, the final positions of the terms in the database depend on the order in which the REST operations arrive.
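The order dependence is easy to reproduce with a toy model in which a term's rank is simply its index in a list. The `create` and `delete` helpers below are hypothetical and only mimic the rank-shifting behavior described above.

```python
from typing import List


def create(terms: List[str], rank: int, term: str) -> List[str]:
    """Insert a term at the given rank; later terms shift down by one."""
    return terms[:rank] + [term] + terms[rank:]


def delete(terms: List[str], term: str) -> List[str]:
    """Remove a term; later terms shift up by one."""
    return [t for t in terms if t != term]


server = ["A", "B", "C"]
# Offline, the user deletes "A" and then adds "D" at rank 1, expecting B, D, C.
delete_first = create(delete(server, "A"), 1, "D")
create_first = delete(create(server, 1, "D"), "A")
```

The same two requests produce `["B", "D", "C"]` when the delete arrives first, but `["D", "B", "C"]` when the create arrives first.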

We were able to get around this specific issue by enforcing client-side that any operations dealing with terms must first DELETE, then POST (create) in serial. Furthermore, the apps need to only send terms whose positions were updated directly by the user, not the other terms whose positions were forced to change as a result of a term in front of them being added or deleted. This works because once the deletes have gone through, the positions of the terms being POSTed in bulk are consistent across the client and server.
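A client-side ordering rule of this kind might be sketched as follows. The operation dictionaries, `sync_terms`, and the `send_bulk` callback are assumptions made for illustration; `send_bulk` is expected to block until the server has responded.

```python
from typing import Callable, Dict, List

Op = Dict[str, object]


def sync_terms(pending: List[Op],
               send_bulk: Callable[[str, List[Op]], None]) -> None:
    """Send all deletes, wait for them to finish, then send all creates."""
    deletes = [op for op in pending if op["action"] == "delete"]
    creates = [op for op in pending if op["action"] == "create"]
    if deletes:
        send_bulk("DELETE", deletes)  # must complete before any create is sent
    if creates:
        send_bulk("POST", creates)


calls = []
sync_terms(
    [{"action": "create", "term": "D", "rank": 1},
     {"action": "delete", "term": "A"}],
    lambda verb, ops: calls.append((verb, [op["term"] for op in ops])),
)
```

Even though the create was queued first, the delete is sent first, so the ranks in the POST line up with the server's post-delete state.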

This is a tradeoff though as it imposes a lot of constraints on the apps, and, worst of all, does not work when existing terms can be reordered in addition to terms being created and deleted. Currently our mobile apps don’t support reordering of existing terms, but if we do add that feature in the future (as is likely) we will need to rethink how this syncing works.

Sticking close to REST has a lot of advantages, but tightly-coupled models present a strong case for breaking the REST paradigm. One option is to add an endpoint that can create, update, and delete a collection of models in a single request, which would remove the question of what order CRUD requests are sent in during syncing. Another is to re-architect the underlying data schema so that these inter-model dependencies no longer exist, or to use a data structure better suited to problems like this, such as CRDTs. However, these solutions could add a lot of complexity as well. If you have a better solution to problems like this, let us know!

Next Steps

We’ll keep learning as we implement more and more of the Quizlet experience on mobile, and there are certainly more gotchas waiting for us as well. If you have experience implementing a cross-platform, offline-capable API at a company you work for, we’d love to hear how you thought about it and what solutions you came up with. And of course, if you want to help us solve problems like this at scale, we’re hiring!
