<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: denormalisation</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/denormalisation.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2017-08-16T22:49:22+00:00</updated><author><name>Simon Willison</name></author><entry><title>The denormalized query engine design pattern</title><link href="https://simonwillison.net/2017/Aug/16/denormalized-query-engine/#atom-tag" rel="alternate"/><published>2017-08-16T22:49:22+00:00</published><updated>2017-08-16T22:49:22+00:00</updated><id>https://simonwillison.net/2017/Aug/16/denormalized-query-engine/#atom-tag</id><summary type="html">
    &lt;p&gt;I presented this talk &lt;a href="https://2017.djangocon.us/talks/the-denormalized-query-engine-design-pattern/"&gt;at DjangoCon 2017&lt;/a&gt; in Spokane, Washington. Below are the abstract, the slides and the YouTube video of the talk.&lt;/p&gt;
&lt;h4 id="abstract"&gt;Abstract&lt;/h4&gt;
&lt;p&gt;Most web applications need to offer search functionality. Open source tools like Solr and Elasticsearch are a powerful option for building custom search engines… but it turns out they can be used for way more than just search.&lt;/p&gt;
&lt;p&gt;By treating your search engine as a denormalization layer, you can use it to answer queries that would be too expensive to answer using your core relational database. Questions like “What are the top twenty tags used by my users from Spain?” or “What are the most common times of day for events to start?” or “Which articles contain addresses within 500 miles of Toronto?”.&lt;/p&gt;
&lt;p&gt;With the denormalized query engine design pattern, modifications to relational data are published to a denormalized schema in Elasticsearch or Solr. Data queries can then be answered using either the relational database or the search engine, depending on the nature of the specific query. The search engine returns database IDs, which are inflated from the database before being displayed to a user, ensuring that users never see stale data even if the search engine is not 100% up to date with the latest changes. This opens up all kinds of new capabilities for slicing, dicing and exploring data.&lt;/p&gt;
&lt;p&gt;In this talk, I’ll be illustrating this pattern by focusing on Elasticsearch, showing how it can be used with Django to bring new capabilities to your application. I’ll discuss the challenge of keeping data synchronized between a relational database and a search engine, and show examples of features that become much easier to build once you have this denormalization layer in place.&lt;/p&gt;

&lt;h4 id="denorm-query-video"&gt;Video&lt;/h4&gt;

&lt;iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/NzcvewgqYog" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="allowfullscreen"&gt;
&lt;/iframe&gt;

&lt;h4 id="denorm-query-slides"&gt;Slides&lt;/h4&gt;

&lt;iframe class="speakerdeck-iframe" style="border: 0px; background: rgba(0, 0, 0, 0.1) padding-box; margin: 0px; padding: 0px; border-radius: 6px; box-shadow: rgba(0, 0, 0, 0.2) 0px 5px 40px; width: 100%; height: auto; aspect-ratio: 560 / 420;" frameborder="0" src="https://speakerdeck.com/player/465a2d2f25bc449ebdafd19247ec9712" title="The denormalized query engine design pattern" allowfullscreen="true" data-ratio="1.3333333333333333"&gt;
&lt;/iframe&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/denormalisation"&gt;denormalisation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/design-patterns"&gt;design-patterns&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/djangocon"&gt;djangocon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/elasticsearch"&gt;elasticsearch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-architecture"&gt;software-architecture&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="definitions"/><category term="denormalisation"/><category term="design-patterns"/><category term="django"/><category term="djangocon"/><category term="elasticsearch"/><category term="my-talks"/><category term="software-architecture"/></entry><entry><title>Looking to the future with Cassandra</title><link href="https://simonwillison.net/2009/Sep/9/cassandra/#atom-tag" rel="alternate"/><published>2009-09-09T21:26:52+00:00</published><updated>2009-09-09T21:26:52+00:00</updated><id>https://simonwillison.net/2009/Sep/9/cassandra/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blog.digg.com/?p=966"&gt;Looking to the future with Cassandra&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Digg are now using Cassandra for their “green badge” (one of your friends has dugg this story) feature. The resulting denormalised dataset weighs in at 3 TB and 76 billion columns.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cassandra"&gt;cassandra&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/denormalisation"&gt;denormalisation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/digg"&gt;digg&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;&lt;/p&gt;



</summary><category term="cassandra"/><category term="denormalisation"/><category term="digg"/><category term="nosql"/></entry><entry><title>Flickr Engineers Do It Offline</title><link href="https://simonwillison.net/2008/Sep/28/queues/#atom-tag" rel="alternate"/><published>2008-09-28T01:24:57+00:00</published><updated>2008-09-28T01:24:57+00:00</updated><id>https://simonwillison.net/2008/Sep/28/queues/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/"&gt;Flickr Engineers Do It Offline&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Flickr wrote their own queuing mechanism (in PHP), and currently run ten queue servers on dedicated hardware for tasks like pushing new photos into indexes, denormalisation, and “backfills”, which move data between clusters and run bulk scripts against large numbers of existing rows.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/backfills"&gt;backfills&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/denormalisation"&gt;denormalisation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/flickr"&gt;flickr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/message-queues"&gt;message-queues&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/queues"&gt;queues&lt;/a&gt;&lt;/p&gt;



</summary><category term="backfills"/><category term="denormalisation"/><category term="flickr"/><category term="message-queues"/><category term="queues"/></entry></feed>