Posted by Touchlab

Offline Sync Queue (aka. Superbus)

Quick Note: To avoid name confusion, the project is called “Superbus”. This was intended to be a placeholder name, but has stuck for a bit. When we get a new one, we’ll update.

Find the project here: https://github.com/touchlab/Superbus

The queue facilitates offline app operation and syncing. The concept is very simple. Commands are sent to a service-based queue and are processed when possible. Generally, this means that if you have network access, commands are processed immediately. If there’s no network, they’re queued for later.

Being able to operate offline is a feature most apps should have, but it’s complex and error prone. It’s a difficult thing to get right. As a result, many apps simply refuse to function offline. Most first-time mobile developers tend to think you almost always have access to the network, but the real world is not so kind. To deliver a great mobile experience, you should design your apps as if you’re almost always offline, or at least have a shoddy connection.

Offline sync tends to come in two basic flavors: state based (SB) and command based (CB).

SB is simple. Add an “updated” flag to each entity, and use that to figure out what needs to be sent off to the server.

Pros:

  • Simple concept. Just query and get the list.
  • Less error prone. As long as you get a “success” response from the server before clearing the flag, you shouldn’t ever drop an update.

Cons:

  • Coarse-grained. Hard to figure out what was updated. Can’t just update the email address on an account, unless you implement complex tracking. Also, multiple edits only trigger one update (might be a pro?)
  • Cross-entity dependencies are tricky, and increase complexity as more entities are involved. For example, create account, then add a product to an account. You need to interleave the updates to make sure the account is created first. If you update the account before the server sync, the “updated” flag on the account entity will be newer than the product. Problem.
  • Non-db updates need special case handling. Upload image, for example.
  • Custom coded.
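To make the state-based flavor concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical (the Account entity, the StateBasedSync helper, and the upload callback standing in for the server call); it only illustrates the "flag, query, clear on success" idea described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical entity: any locally stored record that carries an "updated" flag.
class Account {
    final long id;
    String email;
    boolean updated; // true while a local change has not reached the server

    Account(long id, String email) {
        this.id = id;
        this.email = email;
    }

    void setEmail(String email) {
        this.email = email;
        this.updated = true; // several edits still leave a single flag set
    }
}

class StateBasedSync {
    // Sends every flagged entity. 'upload' stands in for the real server call
    // and must return true only when the server answered with "success".
    static int pushPending(List<Account> accounts, Predicate<Account> upload) {
        int sent = 0;
        for (Account account : accounts) {
            if (!account.updated) {
                continue;
            }
            if (upload.test(account)) {
                account.updated = false; // clear only after the server confirms
                sent++;
            }
        }
        return sent;
    }
}
```

Calling pushPending with an upload that fails leaves the flags alone, so the same entities are picked up again on the next pass.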

CB is complex to set up (without a framework, of course). As you perform operations, you post “commands” to a queue. These should process in order, but also be persisted so they’ll be available on app restarts.

Pros:

  • Once set up, extending the command set is simple. Complexity doesn’t grow (much) as new commands are added.
  • Command order is implicit. Now if you add an account, then a product, then edit the account, those operations are processed in order.
  • Can be fine-grained. It’s easier to have an “update account” command and send the whole data set, but you COULD have an “update account email” command on its own.
  • Easy to add non-db commands.

Cons:

  • You need to be VERY careful about error conditions. If a command has an error, you need to roll back the local change to avoid client/server mismatch (you have the same potential problem with SB, minus a way to do so locally).
  • The command is disconnected from the entity.  If you do a full “server refresh” of data, you can’t check the “updated” flag, so you can overwrite local changes before they’re sent to the server (we have a solution, but you need to be careful).
  • SB is simply easier to understand. With simple data sets, it may be the best way to go regardless.
  • Current framework runs in serial. Slow commands in front will delay later ones. You can set priority to help with this, and later versions will include limited parallel capability. NOTE: if there’s serious need for certain operations to process in a “slow command” queue, we can probably set that up in the very near future. The obvious use case would be uploading an image. However, it wouldn’t support complex ordering without some refactoring.

I am probably ignoring cons and exaggerating pros for CB, but you get the idea. Assuming the framework functions as expected, using CB allows for more complex offline functionality.
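For comparison, the command-based flavor boils down to something like the sketch below. It is not the Superbus implementation: SimpleCommand and SimpleCommandQueue are made-up stand-ins, persistence is left out entirely, and "online" is just a boolean passed in. The point is only to show that ordering comes for free.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a queued operation; this is not the Superbus Command class.
interface SimpleCommand {
    // Returns true when the work completed, false when it has to wait (no network).
    boolean run(boolean online, List<String> serverLog);
}

class SimpleCommandQueue {
    private final Deque<SimpleCommand> pending = new ArrayDeque<>();

    // Builds a command that records its name on the fake "server" when online.
    static SimpleCommand networkCall(String name) {
        return (online, serverLog) -> {
            if (!online) {
                return false;
            }
            serverLog.add(name);
            return true;
        };
    }

    void post(SimpleCommand command) {
        pending.addLast(command);
    }

    // Processes commands strictly in posting order and stops at the first one
    // that cannot complete, so later commands never overtake earlier ones.
    void process(boolean online, List<String> serverLog) {
        while (!pending.isEmpty()) {
            SimpleCommand next = pending.peekFirst();
            if (!next.run(online, serverLog)) {
                return; // leave it at the head of the queue for the next run
            }
            pending.removeFirst();
        }
    }

    int size() {
        return pending.size();
    }
}
```

Post "create account", "add product" and "edit account" while offline, and they all wait. Process again once online and the fake server sees them in exactly that order.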

Implementing A Queue

There are several pieces that need to be in place for a queue to function.

SuperbusService

This is the main service class that handles updates.  In general, you don’t need a custom implementation.  Just add this to your AndroidManifest.xml

<service android:name="co.touchlab.android.superbus.SuperbusService" />

PersistedApplication

You need to implement a custom Application instance, and implement the PersistedApplication interface.  This interface provides objects to the Superbus service.

Read here about custom Application classes.

public interface PersistedApplication
{
    PersistenceProvider getProvider();
    BusLog getLog();
    SuperbusEventListener getEventListener();
    CommandPurgePolicy getCommandPurgePolicy();
}

Only the first method, getProvider, is required to return anything.

The “getLog” method returns a custom log implementation. The default simply wraps the standard Android Log.  Why custom? We have a custom logger for TouchTrack, so it made sense. You can safely return null and expect log statements in LogCat.

The “getEventListener” method is a callback for queue events.  It was hastily added to turn network config event notifications on and off.  Not required.

The “getCommandPurgePolicy” method deals with stuck commands.  Commands that don’t have a “hard” error are returned to the queue. If you’ve miscalculated the “hardness” of your error (i.e., it happens every time and will never resolve), your queue could be stuck forever. If you’re not sure how well you’ve classified your errors, you can override this to force out commands with “soft” errors after specified conditions.

PersistenceProvider

The PersistenceProvider is where you add commands, where they’re stored, and where the SuperbusService comes to grab them. You can implement your own, but this is HIGHLY discouraged. There are several implementations provided by default.

Your first decision is between no persistence (memory only), file-based, or sqlite-based. File based has had the most testing so far, and is the simplest (of the persisted options. Memory is WAY simpler). If you’re using SQLite, however, you can coordinate queue adds with transactions, which will make the whole thing more robust. If you’re careful, you should avoid some sticky edge cases that may come up with file-based persistence (temporary edge cases. You shouldn’t lose any data).

Once you’ve decided on storage medium, the next decision is format. Two json-based types are provided, as well as abstract types so you can implement whatever you want. For the json types, there’s raw “JSONObject” versions, which require some hand-stitching, and gson based, which should handle your data automatically.  This will require downloading the gson runtime libs, though.

For example, if you used SQLite and gson, create the GsonSqlitePersistenceProvider, and return it from your PersistedApplication.

BE VERY CAREFUL!!!  In your Application instance, create one instance of PersistenceProvider and hold onto it.  More than one will shut the party down fast.
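A rough sketch of the "one provider, held forever" rule follows. The four empty interfaces are stand-ins so the snippet compiles without Android or Superbus on the classpath, and MyApplication is a made-up name. In a real app the class would also extend android.app.Application and would return one of the provided PersistenceProvider implementations instead of the anonymous placeholder used here.

```java
// Empty stand-ins so the sketch compiles on its own; the real types come from Superbus.
interface PersistenceProvider {}
interface BusLog {}
interface SuperbusEventListener {}
interface CommandPurgePolicy {}

// Mirrors the interface shown earlier in this post.
interface PersistedApplication {
    PersistenceProvider getProvider();
    BusLog getLog();
    SuperbusEventListener getEventListener();
    CommandPurgePolicy getCommandPurgePolicy();
}

// Hypothetical application class. In a real app it would also extend android.app.Application.
class MyApplication implements PersistedApplication {
    // Created exactly once and handed out on every call.
    private final PersistenceProvider provider = new PersistenceProvider() {};

    @Override
    public PersistenceProvider getProvider() {
        return provider; // never "return new ..." here
    }

    @Override
    public BusLog getLog() {
        return null; // fall back to the default log
    }

    @Override
    public SuperbusEventListener getEventListener() {
        return null; // not required
    }

    @Override
    public CommandPurgePolicy getCommandPurgePolicy() {
        return null; // only getProvider is required to return anything
    }
}
```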

Command

This is where you get work done.  Command is abstract and requires some methods to be implemented.  Some others are optional, but are recommended.

public abstract String logSummary();

This is for your benefit. Put whatever will help with debugging.

public abstract boolean same(Command command);

Sort of like equals, but softer. Some commands really only need to be added once, like “RefreshAllAccounts”.  If you return true, this command will be considered a duplicate and removed.
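Here is one way the duplicate check could play out, sketched with made-up classes (DemoCommand, DedupingQueue and the two commands are not Superbus types). One assumption to flag loudly: this sketch drops the newly added command when a match is already waiting. Which copy the real queue removes is not something this post spells out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the abstract Command class.
abstract class DemoCommand {
    abstract boolean same(DemoCommand other);
}

class RefreshAllAccounts extends DemoCommand {
    @Override
    boolean same(DemoCommand other) {
        return other instanceof RefreshAllAccounts; // one pending refresh is enough
    }
}

class UpdateAccountEmail extends DemoCommand {
    final long accountId;
    final String email;

    UpdateAccountEmail(long accountId, String email) {
        this.accountId = accountId;
        this.email = email;
    }

    @Override
    boolean same(DemoCommand other) {
        return false; // every edit has to reach the server, so never collapse
    }
}

class DedupingQueue {
    private final List<DemoCommand> pending = new ArrayList<>();

    // Returns false when an equivalent command is already waiting.
    // Assumption: the incoming command is the one that gets dropped.
    boolean add(DemoCommand command) {
        for (DemoCommand existing : pending) {
            if (command.same(existing)) {
                return false;
            }
        }
        pending.add(command);
        return true;
    }

    int size() {
        return pending.size();
    }
}
```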

public abstract void callCommand(Context context) throws TransientException, PermanentException;

This is where you do your work. Be VERY careful about exceptions in here. If you throw an unchecked exception, or PermanentException, the command will be considered dead and removed. If the issue is something you expect to be resolved, throw TransientException. The command will be returned to the queue. This is almost always a network connection issue. Again, BE VERY CAREFUL. If you throw TransientException, but the issue never resolves, you may block your queue permanently. To avoid this, throw a different exception, or implement a CommandPurgePolicy that removes commands after a few retries. Those commands will be treated like PermanentException events.
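The rules above can be sketched as a tiny processing loop. To be clear, this is not the SuperbusService code. TransientException and PermanentException here are local stand-ins that only share a name with the real ones, and Work and RetryingProcessor are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Local stand-ins that share a name with the Superbus exceptions.
class TransientException extends Exception {
    TransientException(String message) { super(message); }
}

class PermanentException extends Exception {
    PermanentException(String message) { super(message); }
}

// Hypothetical unit of work, playing the role of callCommand.
interface Work {
    void call() throws TransientException, PermanentException;
}

class RetryingProcessor {
    final Deque<Work> queue = new ArrayDeque<>();
    final List<Work> dead = new ArrayList<>();

    // Runs until the queue is empty or a transient failure is hit.
    void processAll() {
        while (!queue.isEmpty()) {
            Work work = queue.peekFirst();
            try {
                work.call();
                queue.removeFirst(); // success: done with this command
            } catch (TransientException e) {
                return; // stays at the head and is retried on the next run
            } catch (PermanentException | RuntimeException e) {
                queue.removeFirst(); // dead: removed and never retried
                dead.add(work);
            }
        }
    }
}
```

Notice what happens if a command keeps throwing TransientException forever: processAll returns at the same spot every time and nothing behind it ever runs. That is the frozen queue this section warns about.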

Figuring out what are temporary network issues, and what are permanent, is difficult. We’ve provided some help by way of the BusHttpClient and the awesome basic-http-client project. See the included example apps for usage.

public void onTransientError(Context context, TransientException exception){}

Called when a transient error occurs. Your command will be put back in the queue, so there’s probably not much to do here.

public void onPermanentError(Context context, PermanentException exception){}

Your command failed for real. You should do something here.  Notify the user, roll back changes, or force a re-sync from the server (destroying local changes). It’s a tough world. These things happen. You must deal with them.

public void onSuccess(Context context) {}

Self-explanatory. I would assume, anyway.

public void onRuntimeMessage(String message)
{
        onRuntimeMessage(message, null);
}

public void onRuntimeMessage(String message, Map args){}

You can send messages to your commands while they are in the queue. The general use case is to notify update procs to cancel if they’re mid-stream, so local changes can be spared. For a detailed example, see the example-sql project.

StoredCommand, SqliteCommand and JsonCommand

Custom versions for their respective PersistenceProviders. If you’re using file-based storage, use StoredCommand.  For SQLite, use SqliteCommand.  JsonCommand is a special case for the raw JSONObject implementation. We will probably need to tweak this, or make it an interface, as it only supports file-based storage right now (or roll your own…)

Passing a regular old Command instance to a persisted provider will simply put it in memory.  You will lose it on restart, but this may be desired for some command types.

Restarting on Network Connection

Since this whole thing exists for offline sync, you’d obviously want to restart the queue once your connection comes back.  We have 2 things for you:

1. ConnectionChangeReceiver

This receiver can be registered in your AndroidManifest.xml file.  When your connection comes back, it will kick off the SuperbusService to process anything that’s hanging around.

<receiver android:name="co.touchlab.android.superbus.network.ConnectionChangeReceiver">
    <intent-filter>
        <action android:name="android.net.conn.CONNECTIVITY_CHANGE" />
    </intent-filter>
</receiver>

2. ConnectionChangeBusEventListener

In most cases, listening for connection changes and starting your app and service will be overkill. To cut down on the noise, return an instance of ConnectionChangeBusEventListener from your PersistedApplication implementation. When the queue finishes processing, it will check the command count. If there was a network cutoff, you may have commands waiting. This listener will “turn on” the ConnectionChangeReceiver.  If there’s nothing waiting, it’ll turn it off.  YOU MUST ALSO REGISTER THE ConnectionChangeReceiver IN YOUR MANIFEST. The ConnectionChangeBusEventListener expects it, and won’t help if it’s not there.

Best Practices

First a word to the reader. Handling offline data is not easy. Nothing makes users quite as uniquely upset as losing data, except losing it quietly. This is a sharp ax we’re giving you here. A useful tool, but you can REALLY hurt yourself if you’re not careful.

Write Local First

Only perform your network operations in the queue. If you’re going to write changes to a local file or db, do that before posting to the queue. Seems obvious after you’ve used the queue for a while, but it’s worth pointing out.

Be Vocal When You Fail

Just like in “real life”, admit when you screw up. Implement a notification in onPermanentError. Attempt to fix, if possible, or roll back, but tell the user what happened.

Store Actual Changes in the Command

This is sort of personal preference, but we tend to store the actual changes in the command object, rather than just a pointer to the db entity. This is less important if you follow the collision avoidance rules, but it still feels safer. Even if you manage to overwrite local changes, you’ll still send the changes to the server.  Also, if you’re hard core about failing, you might be able to figure out how to roll back (if you store the original data as well).
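As a small illustration of that preference, here is a hypothetical command that carries both the new value and the one it replaced. UpdateEmailCommand, requestBody and rollBack are made-up names, the "local db" is just a Map, and the JSON is hand-built to keep the sketch dependency-free.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical command that stores the change itself, not a pointer to a db row.
class UpdateEmailCommand {
    final long accountId;
    final String previousEmail; // kept so a hard failure can be rolled back
    final String newEmail;

    UpdateEmailCommand(long accountId, String previousEmail, String newEmail) {
        this.accountId = accountId;
        this.previousEmail = previousEmail;
        this.newEmail = newEmail;
    }

    // The payload comes from the command, so it survives local overwrites.
    String requestBody() {
        return "{\"id\":" + accountId + ",\"email\":\"" + newEmail + "\"}";
    }

    // Restores the value that was in place before the edit.
    void rollBack(Map<Long, String> localEmails) {
        localEmails.put(accountId, previousEmail);
    }
}
```

Even if a refresh stomps on the local row before the command runs, requestBody still sends the user's edit, and rollBack has what it needs if the server rejects it.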

Don’t Freeze Your Queue

If you don’t handle Transient/PermanentException properly, you can wedge your queue permanently. We made a decision to default the purge policy to never purging commands based on TransientException. I’m thinking we’ll wind up defaulting to nothing, and requiring you to choose a policy, as never failing could be dangerous if you’re not careful.  Make VERY sure you know what you’re doing.

If you have some commands that are more dangerous than others, or simply less important, you can check transientExceptionCount when processing the command, and throw a PermanentException if the count is high.
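A sketch of that check, with heavy caveats: the exception classes are local stand-ins, ThumbnailUploadCommand and QueueSimulator are invented, the limit of 3 is arbitrary, and the counter is modeled as a plain field that the fake queue bumps. How the real counter is exposed and maintained is an assumption here.

```java
// Local stand-ins that share a name with the Superbus exceptions.
class TransientException extends Exception {
    TransientException(String message) { super(message); }
}

class PermanentException extends Exception {
    PermanentException(String message) { super(message); }
}

// Hypothetical low-importance command that gives up after a few soft failures.
class ThumbnailUploadCommand {
    static final int MAX_TRANSIENT_FAILURES = 3; // arbitrary limit for the sketch

    int transientExceptionCount; // assumption: maintained by whatever runs the queue

    void callCommand(boolean networkAvailable) throws TransientException, PermanentException {
        if (transientExceptionCount >= MAX_TRANSIENT_FAILURES) {
            throw new PermanentException(
                    "giving up after " + transientExceptionCount + " transient failures");
        }
        if (!networkAvailable) {
            throw new TransientException("no network");
        }
        // The real upload would happen here.
    }
}

// Plays the role of the queue for this example only.
class QueueSimulator {
    // Returns "success", "retry" or "dead".
    static String attempt(ThumbnailUploadCommand command, boolean networkAvailable) {
        try {
            command.callCommand(networkAvailable);
            return "success";
        } catch (TransientException e) {
            command.transientExceptionCount++;
            return "retry";
        } catch (PermanentException e) {
            return "dead";
        }
    }
}
```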

Don’t Step on Local Changes

Here’s a basic pattern we deal with often. You have a command that refreshes a bunch of data wholesale, say “RefreshAccounts”. This command will call the server, get all account records, and update the local SQLite db.  Smooth.

The user can update an account locally, which triggers a db update, then a post to the queue with the “UpdateAccount” command.  Also smooth.

However. If you have a “RefreshAccounts” command waiting in your queue, and you post an “UpdateAccount” command, you’ll find a hilarious side effect. When your commands process, your local account changes will revert. Not so smooth.

To help prevent this, set the “RefreshAccounts” to a lower priority.  It’ll push it to the back of the queue. Pretty smooth.
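The effect of the lower priority can be sketched with a plain java.util.PriorityQueue. None of this is Superbus code: the class names and priority constants are made up, and "a bigger number runs first" is an assumption of the sketch. A sequence number keeps commands with equal priority in posting order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical queue entry: a name, a priority and the order it was posted in.
class PrioritizedCommand {
    final String name;
    final int priority;
    final long sequence;

    PrioritizedCommand(String name, int priority, long sequence) {
        this.name = name;
        this.priority = priority;
        this.sequence = sequence;
    }
}

class PriorityCommandQueue {
    // Made-up constants; in this sketch a bigger number runs earlier.
    static final int PRIORITY_DEFAULT = 0;
    static final int PRIORITY_LOW = -10;

    private long nextSequence = 0;

    // Highest priority first, then first-posted first among equals.
    private final PriorityQueue<PrioritizedCommand> pending = new PriorityQueue<>(
            Comparator.<PrioritizedCommand>comparingInt(c -> c.priority).reversed()
                    .thenComparingLong(c -> c.sequence));

    void post(String name, int priority) {
        pending.add(new PrioritizedCommand(name, priority, nextSequence++));
    }

    // Returns command names in the order they would be processed.
    List<String> drain() {
        List<String> order = new ArrayList<>();
        while (!pending.isEmpty()) {
            order.add(pending.poll().name);
        }
        return order;
    }
}
```

Post "RefreshAccounts" at low priority, then "UpdateAccount" at the default, and the update is processed first even though it arrived second.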

There’s still a little edge case that will crop up periodically. It’ll be the type you NEVER see in testing, but will upset the rare user that sees it. Say “RefreshAccounts” is currently running.  At the same time, your user clicks “Save” on an account edit. The save update preempts the completion of “RefreshAccounts” (you understand threads, right?). Your local changes are saved to the db, just in time for “RefreshAccounts” to blow them away. Smooth?

In this scenario, if you’ve stored the actual changes in the command, you will still persist the changes remotely, and eventually locally, but it’ll freak out users if they see the entity reverted.

Avoiding this will seem complex, but pay attention. It’s not that bad.

  1. In your local db update code, start a db transaction.
  2. Post a message to the PersistenceProvider. You’ll need to implement the semantics, but your “RefreshAccounts” equivalent should listen for this message, and mark itself “cancelled” in an internal boolean.
  3. Post the “UpdateAccount” command to the queue.
  4. Finish the transaction.
  5. In your “RefreshAccounts” command, if you notice cancelled at the start, just return (possibly repost yourself).
  6. If your “RefreshAccounts” command is in the middle of processing, do the following.
  7. When you’re ready to update the db, start a transaction. Once in a transaction, no local updates will be happening outside of the transaction (this is critical).
  8. First check “cancelled”. If it’s now true, you’ve found yourself in the rare case where an update happened mid-flight (log some exclamation points. You’ve earned them). If true, don’t update, close the transaction, and return (optionally repost self).

Once inside either transaction, the other updates won’t happen, so if you check conditions properly, you shouldn’t ever overwrite local changes.

Also, make sure the “RefreshAccounts” command has a lower priority.

Code samples from “example-sql”.

From “GetMessagesCommand”

@Override
public void onRuntimeMessage(String message)
{
    if(message.equals(CANCEL_UPDATE))
        cancelUpdate = true;
}
@Override
public void callCommand(Context context) throws TransientException, PermanentException
{
    BusHttpClient httpClient = new BusHttpClient("http://wejit.herokuapp.com");
    httpClient.setConnectionTimeout(10000);
    HttpResponse httpResponse = httpClient.get("/device/getExamplePosts", null);
    httpClient.checkAndThrowError();
    String content = httpResponse.getBodyAsString();
    DatabaseHelper instance = DatabaseHelper.getInstance(context);
    SQLiteDatabase writableDatabase = instance.getWritableDatabase();
    try
    {
        writableDatabase.beginTransaction();
        if(!cancelUpdate)
            instance.saveToDb(context, content);
        writableDatabase.setTransactionSuccessful();
    }
    catch (JSONException e)
    {
        throw new PermanentException(e);
    }
    finally
    {
        writableDatabase.endTransaction();
    }
    sendUpdateBroadcast(context);
}

Notice cancelUpdate is set in the message, and is checked before updating the db. Note, I’m not reposting. You’d most likely want to do that if the update was cancelled.

From ExampleActivity

private void callUpdate(MessageEntry messageEntry) throws StorageException
{
    DatabaseHelper instance = DatabaseHelper.getInstance(this);
    final SQLiteDatabase db = instance.getWritableDatabase();
    db.beginTransaction();
    try
    {
        persistenceProvider.sendMessage(GetMessageCommand.CANCEL_UPDATE);
        //Update the database
        instance.insertOrUpdateMessage(db, messageEntry);
        Long serverId = messageEntry.getServerId();
        String messString = messageEntry.getMessage();
        if (serverId != null)
            persistenceProvider.put(this, new EditMessageCommand(messString, serverId));
        else
            persistenceProvider.put(this, new PostMessageCommand(messString));
        db.setTransactionSuccessful();
    }
    finally
    {
        db.endTransaction();
    }
}

You may want to be more precise about what updates get blocked, but the basic pattern should hold up if you’re careful. I’d err on the side of canceling more updates than trying to be too smart about it, but that’s me.