<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: mapreduce</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/mapreduce.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2021-06-06T16:03:51+00:00</updated><author><name>Simon Willison</name></author><entry><title>The humble hash aggregate</title><link href="https://simonwillison.net/2021/Jun/6/hash-aggregate/#atom-tag" rel="alternate"/><published>2021-06-06T16:03:51+00:00</published><updated>2021-06-06T16:03:51+00:00</updated><id>https://simonwillison.net/2021/Jun/6/hash-aggregate/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://veekaybee.github.io/2021/06/06/hashaggregate/"&gt;The humble hash aggregate&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Today I learned that “hash aggregate” is the name for the algorithm where you split a list of tuples on a common key, run an aggregation against each resulting group and combine the results back together again. I’d previously thought of this in terms of map/reduce, but hash aggregate is a much older term used widely by SQL engines. I’ve seen it come up in PostgreSQL EXPLAIN query output (for GROUP BY) before but didn’t know what it meant.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/vboykis/status/1401549075598675977"&gt;@vboykis&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;
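
    &lt;p&gt;The split, aggregate and combine steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how PostgreSQL or any other engine actually implements it; the function name hash_aggregate, its parameters and the sample rows are all made up for the example. It computes roughly what SELECT name, SUM(qty) ... GROUP BY name would.&lt;/p&gt;

```python
def hash_aggregate(rows, key, value, init, step):
    """Fold each row's value into a per-key running state held in a hash table.

    All names here are hypothetical, chosen for illustration only.
    """
    groups = {}  # the hash table: group key -> running aggregate state
    for row in rows:
        k = key(row)
        state = groups.get(k, init)          # start a new group at `init`
        groups[k] = step(state, value(row))  # fold this row into its group
    return groups


# Sample data (invented): (name, qty) tuples sharing some keys.
rows = [("apple", 3), ("pear", 5), ("apple", 4), ("pear", 1), ("fig", 2)]

totals = hash_aggregate(
    rows,
    key=lambda r: r[0],
    value=lambda r: r[1],
    init=0,
    step=lambda acc, v: acc + v,
)
print(totals)  # {'apple': 7, 'pear': 6, 'fig': 2}
```

    &lt;p&gt;Each row is visited once and the input does not need to be sorted, at the cost of holding one state per distinct key in memory.&lt;/p&gt;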


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/algorithms"&gt;algorithms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;&lt;/p&gt;



</summary><category term="algorithms"/><category term="mapreduce"/><category term="sql"/></entry><entry><title>The Friendship That Made Google Huge</title><link href="https://simonwillison.net/2018/Dec/31/the-friendship-that-made-google-huge/#atom-tag" rel="alternate"/><published>2018-12-31T03:56:45+00:00</published><updated>2018-12-31T03:56:45+00:00</updated><id>https://simonwillison.net/2018/Dec/31/the-friendship-that-made-google-huge/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.newyorker.com/magazine/2018/12/10/the-friendship-that-made-google-huge"&gt;The Friendship That Made Google Huge&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The New Yorker profiles Jeff Dean and Sanjay Ghemawat, Google’s first and only level 11 Senior Fellows. This is some of the best writing on complex software engineering topics (map-reduce, TensorFlow and the like) aimed at a general audience that I’ve ever seen. Also a very compelling case study in pair programming.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/new-yorker"&gt;new-yorker&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tensorflow"&gt;tensorflow&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="mapreduce"/><category term="new-yorker"/><category term="tensorflow"/></entry><entry><title>App Engine at Google I/O 2010</title><link href="https://simonwillison.net/2010/May/20/appengine/#atom-tag" rel="alternate"/><published>2010-05-20T15:30:00+00:00</published><updated>2010-05-20T15:30:00+00:00</updated><id>https://simonwillison.net/2010/May/20/appengine/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://googleappengine.blogspot.com/2010/05/app-engine-at-google-io-2010.html?utm_source=feedburner&amp;amp;utm_medium=feed&amp;amp;utm_campaign=Feed%3A GoogleAppEngineBlog %28Google App Engine Blog%29"&gt;App Engine at Google I/O 2010&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
OpenID and OAuth are now baked into the App Engine users API. They’re also demoing two very exciting new features: a mapper API for doing map/reduce style queries against the datastore, and a Channel API for building comet applications.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/comet"&gt;comet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google-app-engine"&gt;google-app-engine&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/oauth"&gt;oauth&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openid"&gt;openid&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google-io"&gt;google-io&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="comet"/><category term="google"/><category term="google-app-engine"/><category term="mapreduce"/><category term="oauth"/><category term="openid"/><category term="google-io"/><category term="recovered"/></entry><entry><title>BashReduce</title><link href="https://simonwillison.net/2009/Jun/28/bashreduce/#atom-tag" rel="alternate"/><published>2009-06-28T15:03:15+00:00</published><updated>2009-06-28T15:03:15+00:00</updated><id>https://simonwillison.net/2009/Jun/28/bashreduce/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://rcrowley.org/2009/06/27/bashreduce"&gt;BashReduce&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Map/Reduce in Bash is no longer a joke project (if it ever was)—Richard Crowley is extending it and using it for analysis at OpenDNS.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bash"&gt;bash&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bashreduce"&gt;bashreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/opendns"&gt;opendns&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/richard-crowley"&gt;richard-crowley&lt;/a&gt;&lt;/p&gt;



</summary><category term="bash"/><category term="bashreduce"/><category term="mapreduce"/><category term="opendns"/><category term="richard-crowley"/></entry><entry><title>Finding similar items with Amazon Elastic MapReduce, Python, and Hadoop streaming</title><link href="https://simonwillison.net/2009/Apr/7/amazon/#atom-tag" rel="alternate"/><published>2009-04-07T09:19:38+00:00</published><updated>2009-04-07T09:19:38+00:00</updated><id>https://simonwillison.net/2009/Apr/7/amazon/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://developer.amazonwebservices.com/connect/entry!default.jspa?categoryID=265&amp;amp;externalID=2294&amp;amp;fromSearchPage=true"&gt;Finding similar items with Amazon Elastic MapReduce, Python, and Hadoop streaming&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Tutorial for running Hadoop jobs on Elastic MapReduce using Python and the 2005 Audioscrobbler dataset.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/amazon"&gt;amazon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/amazon-web-services"&gt;amazon-web-services&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/audioscrobbler"&gt;audioscrobbler&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/elasticmapreduce"&gt;elasticmapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="amazon"/><category term="amazon-web-services"/><category term="audioscrobbler"/><category term="elasticmapreduce"/><category term="hadoop"/><category term="mapreduce"/><category term="python"/></entry><entry><title>Amazon Elastic MapReduce</title><link href="https://simonwillison.net/2009/Apr/2/mapreduce/#atom-tag" rel="alternate"/><published>2009-04-02T10:25:37+00:00</published><updated>2009-04-02T10:25:37+00:00</updated><id>https://simonwillison.net/2009/Apr/2/mapreduce/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://aws.amazon.com/elasticmapreduce/"&gt;Amazon Elastic MapReduce&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Hadoop as a service. Basically a web-based GUI around Hadoop. You could roll this yourself on EC2, but for a small markup on regular EC2 prices you get to avoid the extra work of setting everything up. Data processing scripts can be written in Java, Ruby, Perl, Python, PHP, R, or C++ and are loaded into S3 before firing off the job.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://joedrumgoole.com/blog/2009/04/02/amazon-web-services-adds-map-reduce/"&gt;Joe Drumgoole&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/amazon"&gt;amazon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/amazon-web-services"&gt;amazon-web-services&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloud-computing"&gt;cloud-computing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ec2"&gt;ec2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/s3"&gt;s3&lt;/a&gt;&lt;/p&gt;



</summary><category term="amazon"/><category term="amazon-web-services"/><category term="cloud-computing"/><category term="ec2"/><category term="hadoop"/><category term="mapreduce"/><category term="s3"/></entry><entry><title>Cascading</title><link href="https://simonwillison.net/2008/Oct/1/about/#atom-tag" rel="alternate"/><published>2008-10-01T13:22:19+00:00</published><updated>2008-10-01T13:22:19+00:00</updated><id>https://simonwillison.net/2008/Oct/1/about/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.cascading.org/about.html"&gt;Cascading&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A Java API abstraction layer over Hadoop that lets developers think in terms of pipes and filters rather than map/reduce. The Cascading developers claim that this model is easier to understand and less error-prone.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cascading"&gt;cascading&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/java"&gt;java&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pipesfilters"&gt;pipesfilters&lt;/a&gt;&lt;/p&gt;



</summary><category term="cascading"/><category term="hadoop"/><category term="java"/><category term="mapreduce"/><category term="pipesfilters"/></entry><entry><title>Python + Hadoop = Flying Circus Elephant</title><link href="https://simonwillison.net/2008/May/31/lastfm/#atom-tag" rel="alternate"/><published>2008-05-31T14:14:56+00:00</published><updated>2008-05-31T14:14:56+00:00</updated><id>https://simonwillison.net/2008/May/31/lastfm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blog.last.fm/2008/05/29/python-hadoop-flying-circus-elephant"&gt;Python + Hadoop = Flying Circus Elephant&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Last.fm have released Dumbo, a Python module that lets you easily write Hadoop map/reduce tasks using Python and generators.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/dumbo"&gt;dumbo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generators"&gt;generators&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lastfm"&gt;lastfm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="dumbo"/><category term="generators"/><category term="hadoop"/><category term="lastfm"/><category term="mapreduce"/><category term="python"/></entry><entry><title>Writing An Hadoop MapReduce Program In Python</title><link href="https://simonwillison.net/2007/Oct/9/writing/#atom-tag" rel="alternate"/><published>2007-10-09T11:33:58+00:00</published><updated>2007-10-09T11:33:58+00:00</updated><id>https://simonwillison.net/2007/Oct/9/writing/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python"&gt;Writing An Hadoop MapReduce Program In Python&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Hadoop (the open source map/reduce framework) can interact with any program that reads from stdin and outputs on stdout—so it’s trivial to drop in Python scripts for the map and reduce steps.
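
    &lt;p&gt;As a rough sketch of what such scripts might look like, here is a word count written as two line-oriented functions. This is an assumption-laden illustration rather than code from the linked tutorial: the names mapper, reducer and run_locally are invented, and the tab-separated key/value lines and the sort between the two steps are simulated locally instead of being handled by Hadoop.&lt;/p&gt;

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit one tab-separated "word, 1" line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Reduce step: sum the counts for each word.

    Assumes the input lines arrive sorted by key, so equal words are adjacent.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Hypothetical helper: simulate map, then sort, then reduce in one process."""
    return list(reducer(sorted(mapper(lines))))


# In a real job each function would read sys.stdin and print to stdout instead.
for out in run_locally(["the quick fox", "the lazy dog"]):
    print(out)
```

    &lt;p&gt;Because both steps only consume and produce lines of text, the same functions can be tested on a laptop before being wrapped in stdin/stdout scripts.&lt;/p&gt;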


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hadoop"&gt;hadoop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="hadoop"/><category term="mapreduce"/><category term="python"/></entry><entry><title>CouchDB: Thinking beyond the RDBMS</title><link href="https://simonwillison.net/2007/Sep/3/labnotes/#atom-tag" rel="alternate"/><published>2007-09-03T09:48:43+00:00</published><updated>2007-09-03T09:48:43+00:00</updated><id>https://simonwillison.net/2007/Sep/3/labnotes/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blog.labnotes.org/2007/09/02/couchdb-thinking-beyond-the-rdbms/"&gt;CouchDB: Thinking beyond the RDBMS&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
CouchDB is a fascinating project: an Erlang-powered non-relational database with a JSON API that lets you define “views” (really computed tables) based on JavaScript functions that execute using map/reduce. Damien Katz, the main developer, currently works for MySQL and used to work on Lotus Notes.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/couchdb"&gt;couchdb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/damien-katz"&gt;damien-katz&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/erlang"&gt;erlang&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/json"&gt;json&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lotusnotes"&gt;lotusnotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapreduce"&gt;mapreduce&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mysql"&gt;mysql&lt;/a&gt;&lt;/p&gt;



</summary><category term="couchdb"/><category term="damien-katz"/><category term="databases"/><category term="erlang"/><category term="javascript"/><category term="json"/><category term="lotusnotes"/><category term="mapreduce"/><category term="mysql"/></entry></feed>