As wise people say, the Internet was initially built for remotely connecting scientists to expensive supercomputers (whose computing power was comparable to modern cell phones). Thus, they supported the abstraction of conversation. Currently, however, the Internet is mostly used for disseminating content – and this mismatch definitely creates some problems.
The swift protocol is a content-centric multiparty transport protocol. Basically, it answers one and only one question: 'Here is a hash! Give me data for it!'. Ultimately swift aims at the abstraction of the Internet as a single big data cloud. Such entities as storage, servers and connections are abstracted away and are virtually invisible at the API layer. Given a hash, the data is received from whatever source available and data integrity is checked cryptographically with Merkle hash trees.
An old Unix adage says: 'free memory is wasted memory'. Once a computer is powered on, there is no benefit in keeping some memory unoccupied. We may extend this principle a bit further:
- free bandwidth is wasted bandwidth
- free storage is wasted storage.
Unless your power budget is really tight, there is no sense in conserving either. Thus, instead of emphasising reciprocity and incentives we focus on code with a lighter footprint, non-intrusive congestion control and automatic disk space management.
Currently, most parts of the protocol/library are implemented, pass basic testing and successfully transfer data on real networks. After more scrutinized testing, the protocol and the library are expected to be real-life-ready in December 2010.
Design of the protocol
Most features of the protocol are defined by its function as a content-centric multiparty transport protocol. It entirely drops TCP's abstraction of sequential reliable data stream delivery: for swift this is redundant. For example, out-of-order data could still be saved and the same piece of data might always be received from another peer. Being implemented over UDP, the protocol does its best to make every datagram self-contained. In general, pruning of unneeded functions and aggressive layer collapsing greatly simplifies the protocol compared to, for example, the BitTorrent+TCP stack.
Atomic datagrams, not data stream
To achieve per-datagram flexibility of data flow and also to adapt to the unreliable medium (UDP, and, ultimately, IP), the protocol was built around the abstraction of atomic datagrams. Ideally, once received, a datagram is either immediately discarded or permanently accepted, ready to be forwarded to other peers. For the sake of flexibility, most of the protocol's messages are optional. It also has no 'standard' header. Instead, each datagram is a concatenation of zero or more messages. No message ever spans two datagrams. Except for the data pieces themselves, no message is acknowledged or guaranteed to be delivered.
Scale-independent unit system
To avoid a multilayered request/acknowledgement system, where every layer basically does the same but for bigger chunks of data – as is the case with BitTorrent+TCP packet-block-piece-file-torrent stacking – swift employs a scale-independent acknowledgement/request system, where data is measured by aligned power-of-2 intervals (so called bins). All acknowledgements and requests are done in terms of bins.
Datagram-level integrity checks
swift builds Merkle hash trees down to every single packet (1KB of data). Once data is transmitted, all uncle hashes necessary for verification are prepended to the same datagram. As the receiver constantly remembers old hashes, the average number of 'new' hashes which have to be transmitted is small: normally around one per packet of data.
NAT traversal by design
The only method of peer discovery in swift is PEX: a third peer initiates a connection between two of its contacted peers. The protocol's handshake is engineered to perform simple NAT hole punching transparently if needed.
Subsetting of the protocol
Different kinds of peers might implement different subsets of messages; a 'tracker', for example, uses the same protocol as every peer, except it only accepts the HANDSHAKE message and the HASH message (to let peers explain what content they are interested in), while returning only HANDSHAKE and PEX_ADD messages (to return the list of peers). Different subsets of accepted/emitted messages may correspond to push/pull peers, plain trackers, hash storing trackers, live streaming peers, etc.
Push AND pull
The protocol allows both for PUSH (sender decides what to send) and PULL (receiver explicitly requests the data). PUSH is normally used as a fallback if PULL fails; also, the sender may ignore requests and send any data it finds convenient to send. Merkle hash trees allow this flexibility without causing security implications.
No transmission metadata
Unlike BitTorrent swift employs no transmission metadata (the .torrent file). The only bootstrap information is the root hash; file size is derived from the hash tree once the first packet is received; the hash tree is reconstructed incrementally in the process of download.
Specifications and documentation
- Current stable tarball
- swift sources at github
- Mac OS X: swift binary and test script (outdated)
- Windows: swift binary and test script (outdated)
- One-click demo for Windows: swift-nobrainer.exe (outdated)
Frequently asked questions
Well, why swift?
That name has served well for many other protocols; we hope it will serve well for ours. It may be thought of as a meta-joke. The working name for the protocol was 'VicTorrent'. We also insist on lowercase italic swift to keep the name formally unique (for some definition of unique).
How is it different from...
TCP emulates reliable in-order delivery ("data pipe") over chaotic unreliable multi-hop networks. TCP has no idea what data it is dealing with, as the data is passed from the userspace. In our case, the data is fixed in advance and many peers participate in distributing the same data. Order of delivery is of little importance and unreliability is naturally compensated for by redundance. Thus, many functions of TCP turn out to be redundant. The only function of TCP that is also critical for swift is the congestion control, but... we need our own custom congestion control! Thus, we did not use TCP.
That led both to hurdles and to some savings. As one example, every TCP connection needs to maintain buffers for the data that has left the sender's userspace but not yet arrived at the receiver's userspace. As we know that we are dealing with the same fixed data, we don't need to maintain per-connection buffers.
UDP, which is the thinnest wrapper around IP, is our choice of underlying protocol. From the standpoint of ideology, a transport protocol should be implemented over IP, but unfortunately that causes some chicken-and-egg problems, like a need to get into the kernel to get deployments, and a need to get deployments to be accepted into the kernel. UDP is also quite nice with regard to NAT penetration.
BitTorrent is an application-level protocol and quite a heavy one. We focused on fitting our protocol into the restrictions of the transport layer, assuming that the protocol might eventually be included into operating system kernels. For example, we stripped the protocol of any transmission's metadata (the .torrent file); leaving a file's root hash as the only parameter.
Historically, BitTorrent required lots of adaptations to its underlying transport. First and foremost, TCP is unable to prioritize traffic, so BitTorrent needed to coerce users somehow to tolerate the inconveniences of seeding. That caused tit-for-tat and, to significant degree, rarest-first. Another example is the four-upload-slots limitation. (Apparently some architectural decisions in BitTorrent were dictated by the oddities of Windows 95, but... never mind.)
Eventually, BitTorrent developers came to the conclusion that not annoying the user in the first place was probably a better stimulus. So they came up with the LEDBAT congestion control algorithm (Low Extra Delay Background Transport). LEDBAT allows a peer to seed without interfering with regular traffic (in other words, without slowing down the browser). To integrate the novel congestion control algorithm into BitTorrent incrementally, BitTorrent Inc had to develop a TCP-alike transport named µTP. The swift project (then named VicTorrent) began by trying to understand what would happen if BitTorrent was stripped of any Win95-specific, TCP-specific or Python-specific workarounds. As it turned out, not much was left.
...Van Jacobson's CCN?
Van Jacobson's team in PARC is doing exploratory research on content-centric networking. While BitTorrent works at layer 5 (application), we go to layer 4 (transport). PARC people are bold enough to go to layer 3 and to propose a complete replacement for the entire TCP/IP world. That is certainly a compelling vision, but we focus on the near future (<10 years) while CCNx is a much more ambitious rework.
This question arises quite frequently as DCCP is a congestion-controlled datagram transport. The option of implementing swift over DCCP was considered, but the inconvenience of working with an esoteric transport was not compensated by the added value of DCCP, which is limited to one mode of congestion control being readily implemented. Architectural restrictions imposed by DCCP were also found to be a major inconvenience. Last but not least, currently only Linux supports DCCP at the kernel level.
SCTP is a protocol fixing some shortcomings of TCP mostly in the context of telephony. As was the case with DCCP, features of SCTP were of little interest to us, while things we really needed were missing from SCTP. Still, we must admit that we employ quite a similar message-oriented model (as opposed to TCP's stream orientation).