Boost C++ Libraries

“...one of the most highly regarded and expertly designed C++ library projects in the world.” Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

Boost.MapReduce Future Work

Note: This library is not yet part of the Boost Libraries and is still under development and review.

This is the first release of the MapReduce library, and there are a few features that I'd still like to add.

  • Improve support for other platforms. This will require help from the Boost development community.

  • Add a PartioningFunction parameter to the local_disk intermediate handler to enable customisation of how data is partitioned into the final result files.

  • Add a template parameter to the SortFn sort function to prevent expansion of duplicates where required. (For example, this expansion works against the combiner in wordcount, and eliminating it would improve performance considerably.)

  • Extend the intermediates::local_disk<> policy class to compress the intermediate files, using the Boost.Iostreams zlib/gzip/bzip2 compression filters. This is a long-term item that will be very useful once the library is extended to support cross-machine MapReduce. Until then, its value is very limited.
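
To make the partitioning item in the list above more concrete, here is a minimal sketch of what a user-supplied partitioning policy might look like. Nothing here is part of the library: the functor names hash_partitioner and first_letter_partitioner, and the (key, number of partitions) call signature, are assumptions made purely for illustration. The first functor spreads keys evenly by hash; the second keeps keys in rough alphabetical ranges so that each result file covers a contiguous span of first characters.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical partitioning policy (not the library's API): maps a key to
// the index of one of num_partitions result files using std::hash.
struct hash_partitioner
{
    template<typename Key>
    std::size_t operator()(Key const &key, std::size_t num_partitions) const
    {
        assert(num_partitions > 0);
        return std::hash<Key>()(key) % num_partitions;
    }
};

// Hypothetical alternative: partition string keys by their first character,
// so that partition indices increase with the character value and each
// result file holds a contiguous range of first characters.
struct first_letter_partitioner
{
    std::size_t operator()(std::string const &key, std::size_t num_partitions) const
    {
        assert(num_partitions > 0);
        if (key.empty())
            return 0;   // empty keys go to the first partition

        // Scale the first byte (0..255) into the range [0, num_partitions).
        unsigned char const c = static_cast<unsigned char>(key[0]);
        return (c * num_partitions) / 256;
    }
};
```

Under this assumed interface, the intermediate handler would call the functor once per key, for example hash_partitioner()(key, number_of_result_files), and write the key/value pair to the result file with the returned index. Passing the functor as a template parameter with a hash-based default would leave existing code unchanged.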

Multiple Machine Support

MapReduce was originally designed as a mechanism for working on large datasets across many (1000s of) commodity servers. The current library works across multiple CPU cores on a single machine. There is a big jump to multi-machine support, so this is a long-term goal, but a goal nonetheless.

Distributed File System

To support MapReduce across multiple machines, some form of distributed file system is required. I have begun development of one using Boost libraries (primarily Boost.Filesystem and Boost.Asio). The question is going to be whether this really sits within Boost as a C++ library, or whether it is really a runtime environment for MapReduce to sit atop. My feeling is that there is some value in having a scalable and resilient DFS, peerless and heterogeneous across all platforms, as a library that can be built into an application, but whether that is really the case remains to be seen.