Latest Publications

A new home for my Blog

As of today (10th September 2009), this blog has moved to a new home at http://craig-henderson.blogspot.com/

All the content from here has moved over, and will be joined by new posts.

Please update your bookmarks.

Thanks
– Craig

MapReduce C++ Library v0.2 available to download

I’ve posted v0.2 of my C++ MapReduce library on the Boost Vault. It is downloadable directly from here. Updates in this release are:

  • Moved the library into the boost namespace
  • Added a PartitionFn template parameter to intermediates::local_disk so that the partitioning of data into result files can be customised
  • Replaced throw with BOOST_THROW_EXCEPTION
  • Rationalised and completed the include guards
  • Added support for gcc 4.3.3 on Ubuntu Linux
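
To illustrate what a partition function template parameter makes possible, here is a minimal sketch. It is not the library's code: intermediate_store, hash_partitioner and first_char_partitioner are hypothetical names invented for this example, the store keeps its data in memory rather than in result files on disk, and the real signature expected by intermediates::local_disk may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical default policy: spread keys across partitions by hash value.
struct hash_partitioner {
    std::size_t operator()(const std::string& key, std::size_t partitions) const {
        assert(partitions > 0);
        return std::hash<std::string>()(key) % partitions;
    }
};

// Hypothetical custom policy: keys sharing a first character share a partition.
struct first_char_partitioner {
    std::size_t operator()(const std::string& key, std::size_t partitions) const {
        assert(partitions > 0);
        if (key.empty()) return 0;
        return static_cast<unsigned char>(key[0]) % partitions;
    }
};

// Hypothetical in-memory stand-in for an intermediate store. The partitioning
// policy is a template parameter, so callers can swap it without touching the store.
template <typename PartitionFn = hash_partitioner>
class intermediate_store {
public:
    typedef std::map<std::string, std::vector<int> > bucket_type;

    explicit intermediate_store(std::size_t partitions, PartitionFn fn = PartitionFn())
        : buckets_(partitions), partition_(fn) {
        assert(partitions > 0);
    }

    // Route the key/value pair to the bucket chosen by the partition policy.
    void insert(const std::string& key, int value) {
        buckets_[partition_of(key)][key].push_back(value);
    }

    // Index of the bucket that the policy assigns to this key.
    std::size_t partition_of(const std::string& key) const {
        return partition_(key, buckets_.size());
    }

    // Read-only access to one bucket; throws std::out_of_range for a bad index.
    const bucket_type& bucket(std::size_t index) const {
        return buckets_.at(index);
    }

private:
    std::vector<bucket_type> buckets_;
    PartitionFn partition_;
};
```

In this sketch, intermediate_store<first_char_partitioner> store(4) sends every key beginning with the same character to the same bucket, while intermediate_store<> falls back to hashing. Making the policy a template parameter means the choice is resolved at compile time, with no virtual call per key.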

Online documentation is here.

    MapReduce runtime language choices

    I’m not interested in language wars or general arguments over the relative merits of a particular technology choice, but a recent blog post caught my attention when it flashed up in my Google Alert email. Entitled “Yahoo’s infrastructural disadvantage to Google: Java performance does not scale”, it presents the author’s (Kevin Lawton’s) experimental results from benchmarking the scalability of Java and C++.

    Yahoo(YHOO) uses a Java-based MapReduce infrastructure called Hadoop. This article demonstrates why Java performance does not scale well for large scale compute settings, relative to C++, which is what Google(GOOG) uses for their MapReduce infrastructure.

    What I particularly liked about this post is that Kevin does not just compare languages against each other, but compares scalability of each language against itself and then against the other.

    We all have choices to make, and there will never be a one-size-fits-all technology. The great thing about software - and open-source in particular - is that a lot of very smart people work to provide alternative technology solutions so that we can make these choices freely to best suit our needs.

    Online C++ MapReduce library documentation

    As I posted yesterday, I have uploaded the first release of my C++ MapReduce library to the Boost Vault. Development and testing continue, so I have uploaded the Boost-ified documentation to my site at http://www.craighenderson.co.uk/mapreduce/ which I can update much more easily than the zip file in the vault.

    Importantly, I have included a Change Log on the front page. It lists the changes made to the library since I uploaded it to the vault, and I’ll keep it up to date as I publish updated releases and when I get the code into the Boost Sandbox (Subversion).