<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: gunicorn</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/gunicorn.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2022-10-23T19:58:00+00:00</updated><author><name>Simon Willison</name></author><entry><title>Weeknotes: DjangoCon, SQLite in Django, datasette-gunicorn</title><link href="https://simonwillison.net/2022/Oct/23/datasette-gunicorn/#atom-tag" rel="alternate"/><published>2022-10-23T19:58:00+00:00</published><updated>2022-10-23T19:58:00+00:00</updated><id>https://simonwillison.net/2022/Oct/23/datasette-gunicorn/#atom-tag</id><summary type="html">
    &lt;p&gt;I spent most of this week at &lt;a href="https://2022.djangocon.us/"&gt;DjangoCon&lt;/a&gt; in San Diego - my first outside-of-the-Bay-Area conference since the before-times.&lt;/p&gt;
&lt;p&gt;It was a most excellent event. I spent a lot of time in the corridor track - actually the sitting-outside-in-the-sunshine track, catching up with people I haven't seen in several years.&lt;/p&gt;
&lt;p&gt;I gave a talk titled "&lt;a href="https://2022.djangocon.us/talks/massively-increase-your-productivity-on/"&gt;Massively increase your productivity on personal projects with comprehensive documentation and automated tests&lt;/a&gt;", with the alternative title "Coping strategies for the serial project hoarder". I'll do a full write-up of this once the video is made available in a few weeks' time, but in the meantime the talk materials can be found here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/djangocon-2022-productivity"&gt;Supporting notes and links&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://speakerdeck.com/simon/massively-increase-your-productivity-on-personal-projects-with-comprehensive-documentation-and-automated-tests"&gt;Slides on Speaker Deck&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://djangoconus2022.loudswarm.com/session/massively-increase-your-productivity-on-personal-projects-with-comprehensive-documentation-and-automated-tests"&gt;Video for paying DjangoCon attendees&lt;/a&gt; (public video coming soon)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I also gave a lightning talk about AI and magic, which was effectively the five minute oral version of my recent blog post &lt;a href="https://simonwillison.net/2022/Oct/5/spell-casting/"&gt;Is the AI spell-casting metaphor harmful or helpful?&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="benchmarking-sqlite"&gt;Benchmarking SQLite in Django&lt;/h4&gt;
&lt;p&gt;I also hung around for the first day of the DjangoCon sprints.&lt;/p&gt;
&lt;p&gt;For over a decade, the Django documentation has warned against using SQLite in production - recommending PostgreSQL or MySQL instead.&lt;/p&gt;
&lt;p&gt;I asked Django Fellow &lt;a href="https://twitter.com/carltongibson"&gt;Carlton Gibson&lt;/a&gt; what it would take to update that advice for 2022. He suggested that what we really needed was a solid idea for how well modern SQLite performs with Django, against a variety of different settings.&lt;/p&gt;
&lt;p&gt;So I spent some time running benchmarks, using my new &lt;a href="https://github.com/simonw/django_sqlite_benchmark"&gt;django_sqlite_benchmark&lt;/a&gt; repository.&lt;/p&gt;
&lt;p&gt;You can follow the full details of my experiments in these issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/django_sqlite_benchmark/issues/2"&gt;#2: Locust test to exercise /counter/xxx endpoint&lt;/a&gt; which runs benchmarks against a simple Django view that increments a counter stored in a SQLite table&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/django_sqlite_benchmark/issues/3"&gt;#3: Load test for larger writes&lt;/a&gt; runs a benchmark using a script that inserts larger JSON objects into a database table. I also &lt;a href="https://github.com/simonw/django_sqlite_benchmark/issues/3#issuecomment-1287598057"&gt;tried this against PostgreSQL&lt;/a&gt;, getting very similar numbers to SQLite.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/django_sqlite_benchmark/issues/4"&gt;#4: Benchmark endpoint that doesn't interact with database&lt;/a&gt; benchmarks a simple "hello world" view that doesn't use SQLite at all - as a baseline for comparison&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I used &lt;a href="https://locust.io"&gt;Locust&lt;/a&gt; for all of these tests, and wrote up &lt;a href="https://til.simonwillison.net/python/locust"&gt;a TIL about using it&lt;/a&gt; as well.&lt;/p&gt;
&lt;p&gt;Here's the TLDR version of the results: SQLite in its default "journal" mode starts returning "database locked" errors pretty quickly as the write load increases. But... if you switch to "wal" mode (&lt;a href="https://til.simonwillison.net/sqlite/enabling-wal-mode"&gt;here's how&lt;/a&gt;) those errors straight up vanish!&lt;/p&gt;
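&lt;p&gt;As a minimal sketch of the idea (not necessarily the exact steps in that TIL), WAL mode can be switched on from Python's standard library &lt;code&gt;sqlite3&lt;/code&gt; module using the &lt;code&gt;journal_mode&lt;/code&gt; pragma. Unlike the other journal modes, WAL is stored in the database file itself, so this only needs to run once per database. The &lt;code&gt;enable_wal()&lt;/code&gt; helper and the &lt;code&gt;counters.db&lt;/code&gt; file name are hypothetical.&lt;/p&gt;

```python
import sqlite3


def enable_wal(path):
    """Switch the SQLite database file at path to write-ahead logging mode.

    Returns the journal mode SQLite reports after the change.
    """
    conn = sqlite3.connect(path)
    try:
        # The pragma returns a single row containing the new journal mode.
        (mode,) = conn.execute("PRAGMA journal_mode=wal").fetchone()
    finally:
        conn.close()
    return mode


# Usage (hypothetical file name): enable_wal("counters.db") returns "wal"
```

&lt;p&gt;Running the same pragma against an in-memory database reports &lt;code&gt;memory&lt;/code&gt; instead, since WAL needs a file on disk.&lt;/p&gt;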
&lt;p&gt;I was expecting WAL mode to improve things, but I thought I'd still be able to hit errors even with it enabled. No - it turns out that, at least for the amount of traffic I could generate on my laptop, WAL mode proved easily capable of handling the load.&lt;/p&gt;
&lt;p&gt;Even without WAL mode, bumping the SQLite "timeout" option up to 20s solved most of the errors.&lt;/p&gt;
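&lt;p&gt;In Django, that timeout can be set through the &lt;code&gt;OPTIONS&lt;/code&gt; dictionary, which the SQLite backend passes through to Python's &lt;code&gt;sqlite3.connect()&lt;/code&gt;. Here's a sketch of the relevant part of a &lt;code&gt;settings.py&lt;/code&gt;; the &lt;code&gt;db.sqlite3&lt;/code&gt; file name is a placeholder. The value is in seconds, and controls how long a connection waits for a lock before giving up with "database is locked".&lt;/p&gt;

```python
# settings.py (excerpt): a sketch, not a complete Django settings module.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        # Placeholder database file name.
        "NAME": "db.sqlite3",
        # Everything in OPTIONS is passed to sqlite3.connect(); "timeout" is
        # how many seconds a connection waits on a locked database before
        # raising "database is locked" (Python's own default is 5 seconds).
        "OPTIONS": {"timeout": 20},
    }
}
```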
&lt;p&gt;Even more interestingly: I tried using Gunicorn (and Uvicorn) to run multiple Django workers at once. I was certain this would lead to problems, as SQLite isn't designed to handle writes from multiple processes at once... or so I thought. It turned out SQLite's use of file locking meant everything worked far better than I expected - and upping the number of worker processes from 1 to 4 resulted in approximately a 4x increase in throughput.&lt;/p&gt;
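&lt;p&gt;Here's a small standard-library sketch of several writers sharing one SQLite file. It is an illustration under simplifying assumptions, not the benchmark itself: four threads, each with its own connection, stand in for four worker processes, since SQLite takes its locks per connection. Each writer increments the same counter row inside a &lt;code&gt;BEGIN IMMEDIATE&lt;/code&gt; transaction, so writes are serialized by SQLite's lock and a blocked writer waits up to its timeout instead of failing immediately. The &lt;code&gt;counter&lt;/code&gt; table and the helper function names are hypothetical.&lt;/p&gt;

```python
import os
import sqlite3
import tempfile
import threading


def init_db(path):
    """Create a one-row counter table in a WAL-mode database."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=wal")
    conn.execute("CREATE TABLE IF NOT EXISTS counter (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT OR IGNORE INTO counter (id, value) VALUES (1, 0)")
    conn.commit()
    conn.close()


def worker(path, increments):
    """Increment the counter repeatedly through this worker's own connection."""
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves;
    # timeout=20 makes a blocked writer wait for the lock rather than erroring.
    conn = sqlite3.connect(path, timeout=20, isolation_level=None)
    for _ in range(increments):
        # BEGIN IMMEDIATE takes the write lock up front, so concurrent
        # writers queue on the busy timeout instead of conflicting later.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
        conn.execute("COMMIT")
    conn.close()


def run(workers=4, increments=50):
    """Run several concurrent writers and return the final counter value."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "counter.db")
        init_db(path)
        threads = [
            threading.Thread(target=worker, args=(path, increments))
            for _ in range(workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        conn = sqlite3.connect(path)
        (value,) = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()
        conn.close()
    return value


if __name__ == "__main__":
    print(run())  # expected: 200 (4 workers x 50 increments)
```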
&lt;p&gt;I shouldn't be surprised by this, if only because every time I've tried to push SQLite in a new direction it's impressed me with how much more capable it is than I expected.&lt;/p&gt;
&lt;p&gt;But still, these results are very exciting. This problem still needs more thorough testing and more eyes than just mine, but I think this indicates that SQLite should absolutely be considered a viable option for running Django in production in 2022.&lt;/p&gt;
&lt;h4&gt;datasette-gunicorn&lt;/h4&gt;
&lt;p&gt;Datasette has always run as a single process. It uses &lt;a href="https://www.uvicorn.org/"&gt;Uvicorn&lt;/a&gt; to serve requests, but it hard-codes Uvicorn to a single worker (&lt;a href="https://github.com/simonw/datasette/blob/0.62/datasette/cli.py#L617-L619"&gt;here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Based on my experiments with SQLite and Django - in particular how running multiple worker processes gave me an increase in how much traffic I could handle - I decided to try the same thing with Datasette itself.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://gunicorn.org/"&gt;Gunicorn&lt;/a&gt; remains one of the most well-regarded options for deploying Python web applications. It acts as a process monitor, balancing requests between different workers and restarting anything that fails with an error.&lt;/p&gt;
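&lt;p&gt;For a sense of what that looks like in practice, here is a sketch of a Gunicorn configuration file, which Gunicorn reads as Python when passed with &lt;code&gt;-c&lt;/code&gt;. The bind address, worker count and app path are placeholders, and using Uvicorn's worker class to run an ASGI application is an assumption for illustration, not necessarily how &lt;code&gt;datasette-gunicorn&lt;/code&gt; configures things.&lt;/p&gt;

```python
# gunicorn.conf.py: a sketch of a Gunicorn configuration file.
# Start it with: gunicorn -c gunicorn.conf.py myproject.asgi:application
# ("myproject.asgi:application" is a hypothetical ASGI app path.)

# Address the Gunicorn master listens on; placeholder value.
bind = "127.0.0.1:8000"

# Number of worker processes the master forks, monitors and restarts.
workers = 4

# Gunicorn's built-in workers speak WSGI; this class from the uvicorn package
# lets each worker run an ASGI application instead.
worker_class = "uvicorn.workers.UvicornWorker"

# Workers silent for longer than this many seconds are killed and restarted.
timeout = 30
```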
&lt;p&gt;I decided to experiment with this through the medium of a Datasette plugin. So I built &lt;a href="https://datasette.io/plugins/datasette-gunicorn"&gt;datasette-gunicorn&lt;/a&gt;, a plugin that adds an extra command to Datasette that lets you start it like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;datasette gunicorn my.db --workers 4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It takes &lt;a href="https://datasette.io/plugins/datasette-gunicorn#user-content-datasette-gunicorn---help"&gt;most of the same arguments&lt;/a&gt; as Datasette's regular &lt;code&gt;datasette serve&lt;/code&gt; command, plus that new &lt;code&gt;-w/--workers&lt;/code&gt; option for setting the number of workers.&lt;/p&gt;
&lt;p&gt;Initial benchmarks &lt;a href="https://github.com/simonw/datasette-gunicorn/issues/1#issuecomment-1287905177"&gt;were very positive&lt;/a&gt;: 21 requests a second with a single worker, increasing to 75 requests a second with four! Not bad for an initial experiment. I also &lt;a href="https://github.com/simonw/datasette-gunicorn/issues/4"&gt;tested it serving a static page&lt;/a&gt; through Datasette and got up to over 500 requests a second, along with a warning that Locust needed to be moved to a separate machine for a full load test.&lt;/p&gt;
&lt;p&gt;In writing the plugin I had to figure out how to build a new command that mostly copied parameters from the existing &lt;code&gt;datasette serve&lt;/code&gt; Click command - I wrote &lt;a href="https://til.simonwillison.net/datasette/plugin-modifies-command"&gt;a TIL&lt;/a&gt; about how I ended up doing that.&lt;/p&gt;
&lt;h4&gt;shot-scraper 1.0&lt;/h4&gt;
&lt;p&gt;Also this week: I released &lt;a href="https://github.com/simonw/shot-scraper/releases/tag/1.0"&gt;shot-scraper 1.0&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Despite the exciting version number, this release actually only has two small new features. Here's the full changelog:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;code&gt;shot-scraper html URL&lt;/code&gt; command (&lt;a href="https://shot-scraper.datasette.io/en/stable/html.html"&gt;documented here&lt;/a&gt;) for outputting the final HTML of a page, after JavaScript has been executed. &lt;a href="https://github.com/simonw/shot-scraper/issues/96"&gt;#96&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;shot-scraper javascript&lt;/code&gt; has a new &lt;code&gt;-r/--raw&lt;/code&gt; option for outputting the result of the JavaScript expression as a raw string rather than JSON encoded (&lt;a href="https://shot-scraper.datasette.io/en/stable/javascript.html"&gt;shot-scraper javascript documentation&lt;/a&gt;). &lt;a href="https://github.com/simonw/shot-scraper/issues/95"&gt;#95&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Tutorial: &lt;a href="https://simonwillison.net/2022/Oct/14/automating-screenshots/"&gt;Automating screenshots for the Datasette documentation using shot-scraper&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I bumped it to 1.0 because &lt;code&gt;shot-scraper&lt;/code&gt; is mature enough now that I'm ready to commit to not breaking existing features (at least without shipping a 2.0, which I hope to avoid for as long as possible).&lt;/p&gt;
&lt;p&gt;I'm always trying to get braver when it comes to stamping a 1.0 release on my main projects.&lt;/p&gt;
&lt;p&gt;(I really, really need to get Datasette 1.0 shipped soon.)&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-gunicorn"&gt;datasette-gunicorn&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-gunicorn/releases/tag/0.1"&gt;0.1&lt;/a&gt; - 2022-10-22
&lt;br /&gt;Plugin for running Datasette using Gunicorn&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/shot-scraper"&gt;shot-scraper&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/shot-scraper/releases/tag/1.0"&gt;1.0&lt;/a&gt; - (&lt;a href="https://github.com/simonw/shot-scraper/releases"&gt;23 releases total&lt;/a&gt;) - 2022-10-15
&lt;br /&gt;A command-line utility for taking automated screenshots of websites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/asgi-gzip"&gt;asgi-gzip&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/asgi-gzip/releases/tag/0.2"&gt;0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/asgi-gzip/releases"&gt;2 releases total&lt;/a&gt;) - 2022-10-13
&lt;br /&gt;gzip middleware for ASGI applications, extracted from Starlette&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/python/too-many-open-files-psutil"&gt;Using psutil to investigate "Too many open files"&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/shot-scraper/subset-of-table-columns"&gt;shot-scraper for a subset of table columns&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/gpt3/guessing-amazon-urls"&gt;Guessing Amazon image URLs using GitHub Copilot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/django/datasette-django"&gt;Adding a Datasette ASGI app to Django&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/python/locust"&gt;Simple load testing with Locust&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/datasette/plugin-modifies-command"&gt;Writing a Datasette CLI plugin that mostly duplicates an existing command&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/djangocon"&gt;djangocon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gunicorn"&gt;gunicorn&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/shot-scraper"&gt;shot-scraper&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/carlton-gibson"&gt;carlton-gibson&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="django"/><category term="djangocon"/><category term="projects"/><category term="sqlite"/><category term="my-talks"/><category term="gunicorn"/><category term="datasette"/><category term="weeknotes"/><category term="shot-scraper"/><category term="carlton-gibson"/></entry><entry><title>Running gunicorn behind nginx on Heroku for buffering and logging</title><link href="https://simonwillison.net/2017/Oct/2/nginx-heroku/#atom-tag" rel="alternate"/><published>2017-10-02T01:57:20+00:00</published><updated>2017-10-02T01:57:20+00:00</updated><id>https://simonwillison.net/2017/Oct/2/nginx-heroku/#atom-tag</id><summary type="html">
    &lt;p&gt;Heroku's default setup for Django uses the &lt;a href="http://gunicorn.org/"&gt;gunicorn&lt;/a&gt; application server. Each
Heroku dyno can only run a limited number of gunicorn workers, which means a
limited number of requests can be served in parallel (around 4 per dyno is a
good rule of thumb).&lt;/p&gt;

&lt;p&gt;Where things get nasty is when you have devices on slow connections - like
mobile phones. Heroku's router buffers headers but it does not buffer response
bodies, so a slow device could hold up a gunicorn worker for several seconds.
Too many slow devices at once and the site will become unavailable to other
users.&lt;/p&gt;

&lt;p&gt;This issue is explained and discussed here: &lt;a href="http://blog.etianen.com/blog/2014/01/19/gunicorn-heroku-django/"&gt;Don't use Gunicorn to host your Django sites on Heroku&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That article recommends using waitress as an alternative to gunicorn, but in
the comments at the bottom of the article people suggest putting nginx in front of
gunicorn using a Heroku &lt;a href="https://github.com/beanieboi/nginx-buildpack"&gt;nginx-buildpack&lt;/a&gt; instead.&lt;/p&gt;

&lt;p&gt;Here is a slightly out-of-date tutorial on getting this all set up: &lt;a href="https://koed00.github.io/Heroku_setups/"&gt;https://koed00.github.io/Heroku_setups/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used the following commands to set up the buildpacks:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;heroku stack:set cedar-14
heroku buildpacks:clear
heroku buildpacks:add https://github.com/beanieboi/nginx-buildpack.git
heroku buildpacks:add https://github.com/heroku/heroku-buildpack-python.git
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Unfortunately the nginx buildpack is not yet compatible with the new &lt;samp&gt;heroku-16&lt;/samp&gt;
stack, so until the nginx buildpack has been updated it's necessary to run the
application on the older &lt;samp&gt;cedar-14&lt;/samp&gt; stack. See this discussion for details: &lt;a href="https://github.com/ryandotsmith/nginx-buildpack/issues/68"&gt;ryandotsmith/nginx-buildpack#68&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Adding nginx in this way also gives us the opportunity to fix another
limitation of Heroku: its default logging configuration. By default, log lines produced by Heroku (visible using &lt;samp&gt;heroku logs --tail&lt;/samp&gt; or with a logging addon such as &lt;a href="https://elements.heroku.com/addons/papertrail"&gt;Papertrail&lt;/a&gt;) look like
this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;    Oct 01 18:01:06 simonwillisonblog heroku/router: at=info
        method=GET path="/2017/Oct/1/ship/" host=simonwillison.net
        request_id=bb22f67e-6924-4e81-b6ad-74d1f465cda7
        fwd="2001:8003:74c5:8b00:79e4:80ed:fa85:7b37,108.162.249.198"
        dyno=web.1 connect=0ms service=338ms status=200 bytes=4523 protocol=http
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notably missing here are both the user-agent string and the referrer header
sent by the browser! If you're a fan of tailing log files these omissions are pretty
disappointing.&lt;/p&gt;

&lt;p&gt;The nginx buildpack I'm using loads a default configuration file at
&lt;samp&gt;config/nginx.conf.erb&lt;/samp&gt;. By including &lt;a href="https://github.com/simonw/simonwillisonblog/blob/ad874a2bf9ebfeffcb0a1a7f8594ad9735fcfc01/config/nginx.conf.erb"&gt;my own copy of this file&lt;/a&gt; I can override
the original and define my own custom log format.&lt;/p&gt;

&lt;p&gt;Having applied this change, the new log lines look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;    2017-10-02T01:44:38.762845+00:00 app[web.1]:
        measure#nginx.service=0.133 request="GET / HTTP/1.1" status_code=200
        request_id=8b6402de-d072-42c4-9854-0f71697b30e5 remote_addr="10.16.227.159"
        forwarded_for="199.188.193.220" forwarded_proto="http" via="1.1 vegur"
        body_bytes_sent=12666 referer="-" user_agent="Mozilla/5.0 (Macintosh;
        Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko)
        Chrome/61.0.3163.100 Safari/537.36"
&lt;/code&gt;&lt;/pre&gt;
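&lt;p&gt;Because this format writes each field as a &lt;code&gt;key=value&lt;/code&gt; pair, with values that contain spaces wrapped in double quotes, a line like the one above is easy to pick apart in a script. Here's a sketch using Python's &lt;code&gt;shlex&lt;/code&gt; module. It assumes each log entry arrives as a single line (the example above appears to be wrapped for readability) and that quoted values contain no literal double quotes of their own; &lt;code&gt;parse_log_line()&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import shlex


def parse_log_line(line):
    """Parse key=value pairs from a log line into a dict.

    Tokens without an "=" (the timestamp, the app[web.1]: prefix) are skipped.
    """
    fields = {}
    # shlex.split keeps double-quoted values together and strips the quotes,
    # so user_agent="Mozilla/5.0 (...)" becomes a single token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


if __name__ == "__main__":
    line = (
        '2017-10-02T01:44:38.762845+00:00 app[web.1]: '
        'measure#nginx.service=0.133 request="GET / HTTP/1.1" status_code=200 '
        'referer="-" user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6)"'
    )
    fields = parse_log_line(line)
    print(fields["status_code"], fields["user_agent"])
```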

&lt;p&gt;&lt;em&gt;This blog entry started life as &lt;a href="https://github.com/simonw/simonwillisonblog/commit/23615a4822ab463c611a3e6a1f4d6cb4dcfc5e7b"&gt;a commit message&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/logging"&gt;logging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nginx"&gt;nginx&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/user-agents"&gt;user-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/heroku"&gt;heroku&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gunicorn"&gt;gunicorn&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="django"/><category term="logging"/><category term="nginx"/><category term="user-agents"/><category term="heroku"/><category term="gunicorn"/></entry></feed>