Learning from Ubuntu One
Since the Ubuntu One file sync server has recently been open-sourced, I'd like to highlight some of the historical decisions that led to the creation (and, well, the demise) of the file synchronization service originally known as Ubuntu One.
Note that Martin Albisetti has provided the architectural overview of the server side, so you may want to look at that too.
Originally, the file sync was not really a sync. Your files were supposed to be stored online and accessible over a FUSE mount when needed. While this was very convenient in theory, the abstraction would shatter if you went offline, or if the process or the kernel crashed. An intelligent caching scheme would have been required to mitigate the issue, but...
There are 2 hard problems in computer science: caching, naming, and off-by-1 errors
So, at some point this version of the client was scrapped and a new project was born, codenamed "chicharra", also known as "syncdaemon".
Instead of using an off-the-shelf protocol for file transfer like HTTP, a decision was made to create a brand new one, based on Google's Protocol Buffers as the payload format. It was called "storageprotocol". The server part, called "updown", was listening on port 443, but it was not an HTTPS server. The custom protocol made it harder to support non-trivial networking setups, such as the ones involving proxies, and proxy support took about 4 years to get implemented. The clients would also try to get the location of the updown node via DNS SRV requests before falling back to the hardcoded default, and it turned out that SRV lookups were sometimes blocked by ISPs.
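The lookup order described above (SRV first, hardcoded default second) can be sketched roughly as follows. This is an illustration only, not the actual syncdaemon code: the service name, the default host and the `lookup_srv` callback are all made-up placeholders, and the resolver is passed in as a function so the logic can be shown without a DNS library.

```python
from typing import Callable, List, Tuple

# Hypothetical values: not the real Ubuntu One host or service name.
DEFAULT_SERVER = ("updown.example.com", 443)

# An SRV answer reduced to (priority, weight, host, port).
SrvRecord = Tuple[int, int, str, int]


def locate_server(
    lookup_srv: Callable[[str], List[SrvRecord]],
    service: str = "_storage._tcp.example.com",
) -> Tuple[str, int]:
    """Return (host, port) from SRV if possible, else the hardcoded default."""
    try:
        records = lookup_srv(service)
    except OSError:
        # The lookup was blocked or timed out (e.g. an ISP dropping SRV queries).
        records = []
    if not records:
        return DEFAULT_SERVER
    # Lowest priority value wins; among equals, prefer the higher weight.
    # (RFC 2782 asks for weighted random selection; this is a simplification.)
    _priority, _weight, host, port = min(records, key=lambda r: (r[0], -r[1]))
    return (host, port)
```

The fallback keeps the client working when SRV is unavailable, but it also means that moving the service relies on a record that some users never see.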
The files were compressed on the client, and the compressed blob was then stored on the server side. This also turned out to be a disadvantage. Both a music streaming application and wget downloading a public file could request a Range of the file rather than the whole blob, which required the file to be decompressed in a way that resembles rewinding a tape. Yes, even already compressed MP3 files were compressed again before being sent to the server, causing noticeable CPU spikes during the hashing/compression stage.
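To see why Range requests and whole-file compression do not mix, here is a small sketch. It assumes a plain zlib stream, which may not be the exact format the service used, and `read_range` is a made-up helper name. The point is that a compressed stream cannot be seeked, so serving bytes from the middle means decompressing everything before them.

```python
import zlib


def read_range(blob: bytes, start: int, end: int, chunk: int = 64 * 1024) -> bytes:
    """Return plaintext bytes [start, end) from a zlib-compressed blob.

    There is no way to jump into the middle of the stream, so everything
    before `start` is decompressed and thrown away first.
    """
    decomp = zlib.decompressobj()
    out = bytearray()
    pos = 0  # number of plaintext bytes produced so far
    for i in range(0, len(blob), chunk):
        data = decomp.decompress(blob[i:i + chunk])
        # Keep only the part of this piece that overlaps [start, end).
        lo = max(start - pos, 0)
        hi = min(end - pos, len(data))
        if lo < hi:
            out += data[lo:hi]
        pos += len(data)
        if pos >= end:
            break  # we can stop early, but we can never start late
    return bytes(out)
```

A request for the last few seconds of a song therefore costs about as much CPU as a request for the whole file.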
To avoid uploading a file that already existed on the server, syncdaemon would hash the file locally and send the hash to the server, and the server could immediately reply that the file was already there. Figuring out how we battled the "dropship"-like actions is left as an exercise for the reader.
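The hash-first upload can be sketched like this. It is a toy model under stated assumptions: `FakeServer` and `sync_file` are hypothetical names, the server is just a dictionary in memory, and SHA-256 is picked for the example without claiming it is the hash the real service used.

```python
import hashlib
from typing import Dict


class FakeServer:
    """In-memory stand-in for the storage server (hypothetical API)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.uploads = 0  # how many times content was actually transferred

    def has_content(self, digest: str) -> bool:
        return digest in self.blobs

    def put(self, digest: str, content: bytes) -> None:
        self.blobs[digest] = content
        self.uploads += 1


def sync_file(server: FakeServer, content: bytes) -> str:
    """Hash locally, then upload only if the server lacks that hash."""
    digest = hashlib.sha256(content).hexdigest()
    if not server.has_content(digest):
        server.put(digest, content)
    return digest
```

Note that this naive version trusts the client: anyone who knows a hash can claim to own the matching file, which is exactly the "dropship"-style problem mentioned above, and the sketch does nothing to prevent it.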
Not all calls were implemented via storageprotocol, though. Some of the requests (such as publishing files, or getting the quota for the UI) still went through the HTTP servers, and syncdaemon simply proxied the calls to endpoints designed specifically to support these features.
Almost everything syncdaemon had to share with the world was exposed through DBus, which made it extremely easy to interface with. This enabled u1sdtool to control the service, the Nautilus extension to show emblems for files, and Shutter (written in Perl!) to publish screenshots to Ubuntu One. The Windows version could not use DBus, so it used twisted.spread over a loopback TCP socket instead.
The Ubuntu syncdaemon client was quite usable, but when the time came to create Android and iOS clients, the custom always-connected storageprotocol was not cooperating. There was a functional syncdaemon protocol implementation in Java to be used by Android, but it was fairly slow at the tasks users want to perform on their phones, namely browsing files and downloading and uploading them. This required an actual REST API. Once it was implemented, the Android and iOS applications were released to a wide audience, and James Henstridge implemented an FTP proxy that translated FTP calls into Ubuntu One REST API calls.
Ubuntu's Stable Release Updates process prevented the software from reaching people as soon as it was ready, so the team had to support multiple software versions across multiple Ubuntu releases with no way of merging the latest code into LTS releases. Dropbox, on the other hand, had a repository that supported multiple Ubuntu versions from the same code base.
Meanwhile, the server side was evolving rapidly. The original infrastructure was not able to support the growing user base of the service (after all, Ubuntu One was pre-installed in Ubuntu), and as Martin said, massive architecture changes were made to get it to work.
- Inventing your own protocols is expensive and should generally be avoided.
- Measure things before optimizing them.
- Plan for public APIs beforehand.