· 11 min read Posted by Kevin Galligan

Offline Syncing

I’ll preface this post by saying the obvious to anybody who’s had to do it: offline operation with data syncing is difficult. In order of difficulty, apps and data generally go like this:

  1. Offline only.  No server communication.

  2. Read-only.  Just pull data.

  3. Write-only (or read/write, which is similar). Pushing data requires dealing with errors and connectivity problems.

  4. Read/write, with offline operation.

This scale is not linear. If 1-3 roughly represent the increase in difficulty, that “4” should be more like “9”. Not with all apps, of course, but as the app gets more complex, offline sync complexity feels somewhat exponential.

Usually syncing involves a private server API. If you’re lucky you’ll be the second platform (if you’re an Android developer, that should be pretty common ;). Even then, the server calls are usually programmed “good enough”. There are often little quirks here and there. The funny thing about building the front end of a service is that, whether it’s the app or the server having trouble, the story is “hey, your app is broken!”. You generally need to prove otherwise.

So, hard.

This is a big topic.  I’m sure I’ll miss some concepts.  Most, really, but I’ll give it my best from personal experience.

Methods of Offline Syncing

I’ll be talking about 2 basic methods: state-based and command-based. State-based (SB) is when you flag records in your database that need to be updated, and query for them at update time.  Command-based (CB) involves adding specific commands to a queue and processing them in order.

State-Based

SB is conceptually simpler. Add a “dirty” field to a table, or possibly an “updatedAt” field. When it’s time to sync, query the DB for records that are marked as needing an update. Send the update, get the current version back from the server, and update the local record (clearing the flag field).
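Here is a minimal sketch of that loop, just to make the idea concrete. Everything in it is made up for illustration: `Note`, `NoteServer` and `StateBasedSync` are hypothetical names, and a map stands in for the database table.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type; "dirty" marks rows that still need to be pushed.
class Note {
    final long id;
    final String text;
    final boolean dirty;

    Note(long id, String text, boolean dirty) {
        this.id = id;
        this.text = text;
        this.dirty = dirty;
    }
}

// Hypothetical stand-in for the server API: takes the local copy and
// returns the server's current version of the record.
interface NoteServer {
    Note push(Note local) throws IOException;
}

class StateBasedSync {
    // Stands in for a database table keyed by id.
    final Map<Long, Note> table = new LinkedHashMap<>();

    // Every local write sets the dirty flag.
    void saveLocally(long id, String text) {
        table.put(id, new Note(id, text, true));
    }

    // "Query" for dirty rows, push each one, store what the server returns,
    // and clear the flag. Returns how many rows were synced.
    int sync(NoteServer server) {
        List<Note> dirtyRows = new ArrayList<>();
        for (Note n : table.values()) {
            if (n.dirty) dirtyRows.add(n);
        }
        int synced = 0;
        for (Note local : dirtyRows) {
            try {
                Note current = server.push(local);
                table.put(local.id, new Note(local.id, current.text, false));
                synced++;
            } catch (IOException e) {
                // Leave the flag set so the row is picked up on the next sync.
            }
        }
        return synced;
    }
}
```

Note what the sketch does not do: it has no idea what order the local writes happened in, and it always sends the whole record.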

Pros: Simple idea. Fairly simple to implement, although you need delayed error handling and reporting. You also need to compensate for rare but possible double-posts of the same field (particularly a problem with adds; updates aren’t as big of a deal). It’s directly linked to the actual data, so if you want to provide some type of update UI, it’s (maybe) simpler.

Cons: Custom management code. We could build a framework, but you’re still doing a lot of the heavy lifting. No discrete updates on a single entity. If a table is “updated”, unless you do some REALLY painful data management, when you sync with the server, it’s the whole entity. Also, if you make one update now, and one later, they’re still just one “update”. If the order of updates is important, this can be problematic. Say you create an account, then add a product to it, but before the sync happens, you also update the account. You’ll need to be very careful that the account creation call happens before the product creation. Also, you can’t kick off operations that don’t involve tables, or again you’ll be writing extra management code.

I don’t want to sound unduly down on SB.  If your model is simple enough, SB is probably simpler.  Also, if you’re using something more complex (like CB), you need to be VERY careful that you don’t drop commands.  This gets complex when networking is involved.

There’s a variant on SB that I’ll mention: database systems that include the ability to automatically sync data. There are situations where this might make sense, but it seems like you’d have a lot of issues when the business logic is remotely non-trivial. I can also imagine a situation where a malicious user decides to blow up your system by inserting piles of data under the hood. Finally (finally?), you’ll need to convince “management” that the back end should use a custom database and business logic to support this. Unlikely in most situations.

Command-Based

CB involves posting commands to a queue and processing them at sync time. This involves more setup work, and you need to be VERY careful about not dropping commands (you won’t know that something local didn’t make it to the server. Bad). You’ll also need to “encode” the commands. By that I mean you’ll need some way of defining the command, and a way to store the related data.
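To make “encode” a little more concrete, here is a rough sketch of a command stored as a type name plus a payload string, with a FIFO queue standing in for persistent storage. This is not how Superbus stores commands; `StoredCommand`, `CommandQueue` and the `type|payload` line format are all assumptions made up for the example, and it assumes the type name never contains a `|`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// One queued command: a type name that says what to do,
// plus the data needed to do it (JSON, for example).
class StoredCommand {
    final String type;
    final String payload;

    StoredCommand(String type, String payload) {
        this.type = type;
        this.payload = payload;
    }

    // Encode to a single string so it could be written to a file or a table row.
    // Assumes the type name itself never contains '|'.
    String encode() {
        return type + "|" + payload;
    }

    // Split on the first '|' only, so the payload may contain that character.
    static StoredCommand decode(String line) {
        int split = line.indexOf('|');
        return new StoredCommand(line.substring(0, split), line.substring(split + 1));
    }
}

class CommandQueue {
    // Stands in for persistent storage; holds encoded commands in posting order.
    private final Deque<String> stored = new ArrayDeque<>();

    void post(StoredCommand command) {
        stored.addLast(command.encode());
    }

    // Hands commands back in exactly the order they were posted.
    List<StoredCommand> drain() {
        List<StoredCommand> out = new ArrayList<>();
        while (!stored.isEmpty()) {
            out.add(StoredCommand.decode(stored.removeFirst()));
        }
        return out;
    }
}
```

Posting “AddAccount”, “AddProduct”, “UpdateAccount” and draining gives them back in that order, which is the property SB has trouble providing.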

The ultimate point of this post is to introduce Superbus, our project that lets you set up a sync bus in your app.

https://github.com/touchlab/Superbus

You choose your PersistenceProvider implementation (that’s how you store the commands), set up the sync service, and start busting out commands. Right now the best persistence choice is file-based with json, but we will have a SQLite storage mechanism coming soon.  SQLite is preferred if you’re using SQLite (you should be) because you can run your data update and command storage in a single transaction.

The merits of CB

Pros: The queue preserves order of updates. Updates can be fine-grained, so the add/add/update scenario above wouldn’t be an issue (unless the initial add fails). If you encode updates to individual fields, you can update only the modified parts of an entity. You can create commands that don’t correspond directly to a DB table. Uploading an image, for example. Complexity grows closer to linearly as the app grows. If you create a new set of entities, you should be able to simply add them. With SB, you may need to worry about more dependency issues.

Cons: Harder to set up initially. May be overkill for simpler apps. You’ll need to manage error conditions with your own code (probably a pro; with simple SB, you’ll have a REALLY hard time rolling things back). Commands are currently processed serially, so slow commands delay updates. Dropped commands will leave you with local updates and no way to know they haven’t gone to the server.

Assuming the command bus doesn’t drop commands on its own, and your code is careful, you shouldn’t ever drop a command. Try not to lose sleep over it, though. We’ve been spending a lot of time agonizing over the bus code, and it’s open source, so I’m hoping we get some eyeballs on it.

Using The Command Bus

This will be a short intro.  A longer one will be coming later in the week. Here are the basic parts of the Superbus.

co.touchlab.android.superbus.SuperbusService

This is the main processor code.  When initiated, it will grab current commands one by one, and attempt to process them.

co.touchlab.android.superbus.provider.PersistenceProvider

This is an interface that stores commands and lets the service query for them. Current implementations are MemoryPersistenceProvider, which only holds commands in memory (for when you don’t care if they disappear when the app shuts down), and file-based. Right now there’s only AbstractFilePersistenceProvider. We’ve implemented one with JSON for a project we’re about to release, and will roll that into the bus code asap.

co.touchlab.android.superbus.Command

This is the bulk of what you’ll be implementing.  Each Command class definition implements the logic of your command.

TransientException and PermanentException

Understanding these is critical for using the bus. Transient issues are things like network connectivity. You expect that things will eventually be sorted out. When your command throws a TransientException, the bus will periodically retry the sync, and eventually go to sleep for a while. You will not lose the command, and order will be preserved.

Permanent issues are things like null pointers, local database issues, no file space on the device, etc.  Your app code was not expecting the issue, and doesn’t know what to do with it.  This command will be removed.  You’ll probably want to do something when this happens (there’s a callback for that).

Some really important things to grasp.

  • If you wrap all your code and always throw a TransientException, you can effectively shut down your app’s syncing. You may have had an unanticipated issue, like a null pointer or a missing file, that will never sort itself out. In the current implementation, we NEVER throw away commands that throw this type of exception.
  • If you’re not careful with your networking code, and you throw PermanentException (or don’t catch, and some sort of RuntimeException is thrown), your commands will be removed when they didn’t need to be.
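The two behaviors can be sketched with a toy processing loop. To be clear, this is not the Superbus source: `TransientProblem`, `PermanentProblem`, `BusCommand` and `TinyBus` are stand-in names invented for the example, and the list of failure messages stands in for the callback mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stand-ins for the two exception types described above (not the Superbus classes).
class TransientProblem extends Exception {
    TransientProblem(String message) { super(message); }
}

class PermanentProblem extends Exception {
    PermanentProblem(String message) { super(message); }
}

// Hypothetical command shape: do the work or say why it failed.
interface BusCommand {
    void run() throws TransientProblem, PermanentProblem;
}

class TinyBus {
    final Deque<BusCommand> queue = new ArrayDeque<>();
    // Stands in for an error callback: messages of commands that were removed.
    final List<String> permanentFailures = new ArrayList<>();

    // Processes commands from the head of the queue.
    // Returns true if the queue was emptied, false if it stopped on a transient problem.
    boolean process() {
        while (!queue.isEmpty()) {
            BusCommand head = queue.peekFirst();
            try {
                head.run();
                queue.removeFirst();
            } catch (TransientProblem e) {
                // Keep the command at the head so order is preserved; try again later.
                return false;
            } catch (PermanentProblem | RuntimeException e) {
                // Nothing sensible to retry: remove the command and report it.
                queue.removeFirst();
                permanentFailures.add(e.getMessage());
            }
        }
        return true;
    }
}
```

A command that always throws the transient type would sit at the head of this queue forever and block everything behind it, which is the first bullet. A network hiccup reported as the permanent type (or as an uncaught runtime exception) gets the command silently removed, which is the second.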

Since the only obvious use case for TransientException is a network issue, we’ve provided some helper code.  To make life easier, we used a wrapper for http calls.  Look here: basic HTTP client for Android.  There’s a custom HttpClient called BusHttpClient.  In the code, network issues throw a TransientException.  Status codes of 400 or greater throw PermanentException.  !!!This is totally new!!! Very little testing as of today, so be careful.  We may need to tweak what status codes return which type of exception.
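As a rough illustration of that rule (and not the BusHttpClient code itself), the mapping might look something like the sketch below. `SyncOutcome` and `HttpOutcomes` are invented names, and the `retryableStatuses` parameter is purely an assumption: it shows one way the “which status codes” tweak could be expressed, for example by treating 503 as transient.

```java
import java.io.IOException;
import java.util.Set;

// Hypothetical outcome of one HTTP call made by a command.
enum SyncOutcome { NO_ERROR, RETRY_LATER, DROP_COMMAND }

class HttpOutcomes {
    // Status codes of 400 or greater are treated as permanent,
    // unless the caller has explicitly listed them as retryable (an assumption).
    static SyncOutcome fromStatus(int statusCode, Set<Integer> retryableStatuses) {
        if (statusCode < 400) {
            return SyncOutcome.NO_ERROR;
        }
        return retryableStatuses.contains(statusCode)
                ? SyncOutcome.RETRY_LATER
                : SyncOutcome.DROP_COMMAND;
    }

    // Network-level failures (no connection, timeout) surface as IOException
    // and are treated as transient.
    static SyncOutcome fromNetworkError(IOException e) {
        return SyncOutcome.RETRY_LATER;
    }
}
```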

I’m going to put a little mental bookmark here and wrap this up later. Now for a little bit about testing.

Testing Offline Sync

This section is less about “testing” than about error reporting, but first a little discussion about testing.

To test offline syncing, the basic method is pretty obvious. Put the phone in airplane mode, do some stuff, and turn it back on. The normal issues we run into are adds and updates mixing with each other. If you’re online when you add an entity, and then update it, you’ll probably have the server id available, but if you’re offline, you may not. (I should also do a “best practices” discussion about managing local and remote ids. Short version: you’ll probably try to have 1 id, but in almost all cases, having a local id and a remote id just works out better.)
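Until that discussion exists, here is a tiny sketch of the two-id idea under my own assumptions. `Account` and `AccountStore` are hypothetical, and a map stands in for the local table. The local id exists from the moment the entity is created on the device; the remote id stays null until the server has confirmed the add.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity carrying both ids.
class Account {
    final long localId; // assigned on the device, always present
    Long remoteId;      // assigned by the server, null until the add has synced
    String name;

    Account(long localId, String name) {
        this.localId = localId;
        this.name = name;
    }
}

class AccountStore {
    private final Map<Long, Account> byLocalId = new HashMap<>();
    private long nextLocalId = 1;

    // Works with or without a connection, because only the local id is needed.
    Account addOffline(String name) {
        Account account = new Account(nextLocalId++, name);
        byLocalId.put(account.localId, account);
        return account;
    }

    // Called once the server has confirmed the add and handed back its id.
    void markSynced(long localId, long remoteId) {
        byLocalId.get(localId).remoteId = remoteId;
    }

    // An update can only go to the server after the add has produced a remote id.
    boolean canSendUpdate(long localId) {
        return byLocalId.get(localId).remoteId != null;
    }
}
```

Local code and queued commands refer to the entity by `localId` throughout, so an offline add followed by an offline update never has to guess at a server id.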

The really tricky thing with offline data, we’ve found, is figuring out what happened. Most error reporting tools will give you a stack trace and some info about the device. I would call this the “what” of the crash. They don’t give you too much about the “how/why”.

We built something internally to help with this, then spent some time making it pretty-ish, but ultimately haven’t decided whether to plunge into the “error reporting” field. Kind of crowded. However, the plan is to open it up and see if you get any benefit out of it.

https://touchtrack.co/

You embed the client, write data to it as the app runs, and trigger reports (either manually, or in a default exception handler). “Data” is both log statements and actual files.

Memory Log

The log is a rolling memory buffer. In a much earlier version we were listening to logcat, but this requires a special permission, which angers people, is potentially a security issue, and will also give you a mess of log statements that have nothing to do with your app.

The log contains the usual log levels, but also contains a special one called ‘ua’, short for “User Action”.  These are displayed with a special background color. They are reserved for something triggered by the user. You’ll see “user clicked ‘Add’ button” mixed in with processing steps, to help you identify what happened.
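For a sense of what a rolling memory buffer with a “ua” level might look like, here is a small sketch. It is not the TouchTrack client: `RollingLog`, its method names and the `level: message` line format are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Keeps only the most recent log lines in memory.
class RollingLog {
    private final int capacity;
    private final Deque<String> lines = new ArrayDeque<>();

    RollingLog(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    // When the buffer is full, the oldest line is dropped to make room.
    synchronized void log(String level, String message) {
        if (lines.size() == capacity) {
            lines.removeFirst();
        }
        lines.addLast(level + ": " + message);
    }

    // "ua" marks something the user did, as opposed to a processing step.
    void userAction(String message) {
        log("ua", message);
    }

    // Copy of the current contents, oldest first, for attaching to a report.
    synchronized List<String> snapshot() {
        return new ArrayList<>(lines);
    }
}
```

With a capacity of 3, logging four lines leaves the last three in the snapshot, with the user action sitting in sequence between the processing steps around it.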

Data Files

Data files are written to a local file cache. They are periodically removed, based on age. The basic use for data files is for data passed back and forth to the server. For example, write request and response JSON data. Data files will have a download link on the report, and if the format is understood, can be viewed inline (JSON, XML, etc).

Data files can also be added at exception time, by adding a callback. During your testing period, you can add lots of stuff. Database file, screenshots, etc. In production, your users probably wouldn’t be excited about this. Either skip it in production mode, or ask permission. You can switch the mode of the app from the web panel, so when a particular version is pushed to the app store, simply check the version and ignore the “big stuff”.

Another method would be to check if the app came from the app store, and consider that production.

A suggestion for offline sync would be to trigger a report upload when a PermanentException is returned from the bus processor. In situations where there is an issue with data interactions with the server, this kind of info could be invaluable.

One of our past clients had a very brittle data api. Formats for very similar data differed depending on context, and international support (date/money) was kind of a mess. Occasionally, fields that were considered mandatory were returned empty. That kind of thing.

The initial testing plan was to have the user report the issue, then ask for permission to log in as them (with their password) to test. Wrong on many levels.  Basic stack traces would be better than nothing, but to clearly demonstrate to the back end developers that the issue came from the server, we needed some more detail. The proto TouchTrack was born.