Simon Willison's Weblog: observable

How I automate my Substack newsletter with content from my blog

2025-11-19T22:00:34+00:00

I sent out my weekly-ish Substack newsletter this morning and took the opportunity to record a YouTube video demonstrating my process and describing the different components that make it work. There's a lot of digital duct tape involved, taking the content from Django+Heroku+PostgreSQL to GitHub Actions to SQLite+Datasette+Fly.io to JavaScript+Observable and finally to Substack.

The core process is the same as I described back in 2023. I have an Observable notebook called blog-to-newsletter which fetches content from my blog's database, filters out anything that has been in the newsletter before, formats what's left as HTML and offers a big "Copy rich text newsletter to clipboard" button.

I click that button, paste the result into the Substack editor, tweak a few things and hit send. The whole process usually takes just a few minutes.

I make very minor edits:

I set the title and the subheading for the newsletter. This is often a direct copy of the title of the featured blog post.
Substack turns YouTube URLs into embeds, which often isn't what I want - especially if I have a YouTube URL inside a code example.
Blocks of preformatted text often have an extra blank line at the end, which I remove.
Occasionally I'll make a content edit - removing a piece of content that doesn't fit the newsletter, or fixing a time reference like "yesterday" that doesn't make sense any more.
I pick the featured image for the newsletter and add some tags.
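These edits are made by hand in Substack. The trailing blank line could plausibly be stripped before copying instead. Here is a minimal sketch, assuming the extra line comes from newlines just before a closing </pre> (or </code></pre>) tag. That cause is a guess, and trimPreBlocks is a hypothetical helper, not part of the notebook:

```javascript
// Hypothetical helper: drop newlines that sit directly before a closing </pre>,
// or before </code></pre>, which is one plausible source of the extra blank line.
function trimPreBlocks(html) {
  return html.replace(
    /\n+(<\/code>)?<\/pre>/g,
    (match, closeCode) => (closeCode || "") + "</pre>"
  );
}

console.log(trimPreBlocks("<pre><code>echo hi\n</code></pre>"));
// <pre><code>echo hi</code></pre>
```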

That's the whole process!

The Observable notebook

The most important cell in the Observable notebook is this one:

raw_content = {
  return await (
    await fetch(
      `https://datasette.simonwillison.net/simonwillisonblog.json?sql=${encodeURIComponent(
        sql
      )}&_shape=array&numdays=${numDays}`
    )
  ).json();
}

This uses the JavaScript fetch() function to pull data from my blog's Datasette instance, using a very complex SQL query that is composed elsewhere in the notebook.
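Here is a sketch of the URL that cell assembles, with the construction made explicit. Datasette runs the SQL passed in the sql= parameter. That SQL can reference named parameters such as :numdays, whose values are supplied as extra query string arguments. The numdays argument above presumably feeds one of those. The helper names datasetteQueryUrl and fetchRows are hypothetical:

```javascript
// Build a Datasette SQL query URL. Named parameters used in the SQL
// (e.g. :numdays) are passed as additional query string arguments.
function datasetteQueryUrl(base, sql, params = {}) {
  let url = `${base}.json?sql=${encodeURIComponent(sql)}&_shape=array`;
  for (const [name, value] of Object.entries(params)) {
    url += `&${encodeURIComponent(name)}=${encodeURIComponent(value)}`;
  }
  return url;
}

// Fetch the rows as an array of objects (needs a runtime with a global fetch,
// such as a browser, Observable or Node 18+).
async function fetchRows(base, sql, params) {
  const response = await fetch(datasetteQueryUrl(base, sql, params));
  if (!response.ok) throw new Error(`Datasette returned ${response.status}`);
  return response.json();
}
```

For example, datasetteQueryUrl("https://datasette.simonwillison.net/simonwillisonblog", "select 1", { numdays: 7 }) returns the same shape of URL as the raw_content cell.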

Here's a link to see and execute that query directly in Datasette. It's 143 lines of convoluted SQL that assembles most of the HTML for the newsletter using SQLite string concatenation! An illustrative snippet:

with content as (
  select
    id,
    'entry' as type,
    title,
    created,
    slug,
    '<h3><a href="' || 'https://simonwillison.net/' || strftime('%Y/', created)
      || substr('JanFebMarAprMayJunJulAugSepOctNovDec', (strftime('%m', created) - 1) * 3 + 1, 3) 
      || '/' || cast(strftime('%d', created) as integer) || '/' || slug || '/' || '">' 
      || title || '</a> - ' || date(created) || '</h3>' || body
      as html,
    'null' as json,
    '' as external_url
  from blog_entry
  union all
  -- ...

My blog's URLs look like /2025/Nov/18/gemini-3/ - this SQL constructs that three-letter month abbreviation from the month number using a substring operation.
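Here is the same trick in JavaScript, for comparison. SQLite's substr() is 1-indexed, hence the + 1 in the SQL, while JavaScript's slice() is 0-indexed. entryPath is a hypothetical helper, not code from the notebook:

```javascript
// Twelve three-letter abbreviations packed into one string, as in the SQL
const MONTHS = "JanFebMarAprMayJunJulAugSepOctNovDec";

function entryPath(isoDate, slug) {
  // Read year/month/day straight from the "YYYY-MM-DD..." text to avoid timezone shifts
  const [year, month, day] = isoDate.slice(0, 10).split("-").map(Number);
  // Month n starts at offset (n - 1) * 3 and is three characters long
  const mon = MONTHS.slice((month - 1) * 3, (month - 1) * 3 + 3);
  // Number() has already dropped any leading zero from the day
  return `/${year}/${mon}/${day}/${slug}/`;
}

console.log(entryPath("2025-11-18T16:00:00", "gemini-3")); // /2025/Nov/18/gemini-3/
```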

This is a terrible way to assemble HTML, but I've stuck with it because it amuses me.

The rest of the Observable notebook takes that data, filters out anything that links to content mentioned in the previous newsletters and composes it into a block of HTML that can be copied using that big button.
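The filtering cell isn't shown here. A rough sketch of link-based filtering follows. It assumes each candidate item carries an array of the URLs it links to, and that previous newsletters are available as HTML strings. Both of those data shapes are assumptions:

```javascript
// Hypothetical data shapes: items is an array of { id, links: [url, ...] } and
// previousNewsletters is an array of HTML strings from earlier sends.
function extractLinks(html) {
  // Naive href extraction; good enough for HTML we generated ourselves
  return [...html.matchAll(/href="([^"]+)"/g)].map((match) => match[1]);
}

function filterAlreadySent(items, previousNewsletters) {
  // Every URL that has appeared in any earlier newsletter
  const sent = new Set(previousNewsletters.flatMap(extractLinks));
  // Keep only items that link to nothing already sent
  return items.filter((item) => !item.links.some((url) => sent.has(url)));
}
```

Matching exact URLs via a Set avoids the false positives a plain substring search would produce, such as /a matching /ab.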

Here's the recipe it uses to turn HTML into rich text content on a clipboard suitable for Substack. I can't remember how I figured this out, but it's very effective:

Object.assign(
  html`<button style="font-size: 1.4em; padding: 0.3em 1em; font-weight: bold;">Copy rich text newsletter to clipboard</button>`,
  {
    onclick: () => {
      const htmlContent = newsletterHTML;
      // Create a temporary element to hold the HTML content
      const tempElement = document.createElement("div");
      tempElement.innerHTML = htmlContent;
      document.body.appendChild(tempElement);
      // Select the HTML content
      const range = document.createRange();
      range.selectNode(tempElement);
      // Copy the selected HTML content to the clipboard
      const selection = window.getSelection();
      selection.removeAllRanges();
      selection.addRange(range);
      document.execCommand("copy");
      selection.removeAllRanges();
      document.body.removeChild(tempElement);
    }
  }
)

From Django+PostgreSQL to Datasette+SQLite

My blog itself is a Django application hosted on Heroku, with data stored in Heroku PostgreSQL. Here's the source code for that Django application. I use the Django admin as my CMS.

Datasette provides a JSON API over a SQLite database... which means something needs to convert that PostgreSQL database into a SQLite database that Datasette can use.

My system for doing that lives in the simonw/simonwillisonblog-backup GitHub repository. It uses GitHub Actions on a schedule that executes every two hours, fetching the latest data from PostgreSQL and converting that to SQLite.

My db-to-sqlite tool is responsible for that conversion. I call it like this:

db-to-sqlite \
  $(heroku config:get DATABASE_URL -a simonwillisonblog | sed s/postgres:/postgresql+psycopg2:/) \
  simonwillisonblog.db \
  --table auth_permission \
  --table auth_user \
  --table blog_blogmark \
  --table blog_blogmark_tags \
  --table blog_entry \
  --table blog_entry_tags \
  --table blog_quotation \
  --table blog_quotation_tags \
  --table blog_note \
  --table blog_note_tags \
  --table blog_tag \
  --table blog_previoustagname \
  --table blog_series \
  --table django_content_type \
  --table redirects_redirect

That heroku config:get DATABASE_URL command uses Heroku credentials in an environment variable to fetch the database connection URL for my blog's PostgreSQL database (and fixes a small difference in the URL scheme).
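In JavaScript terms, the sed substitution is a one-line string replace. db-to-sqlite connects through SQLAlchemy, and current SQLAlchemy releases reject the postgres: scheme that Heroku reports. That is presumably the small difference being fixed. Unlike the sed expression, this sketch anchors the match to the start of the string:

```javascript
// Swap Heroku's postgres: scheme for the dialect+driver form SQLAlchemy expects.
// The ^ anchor means only the scheme can match, so the call is idempotent.
function toSqlAlchemyUrl(databaseUrl) {
  return databaseUrl.replace(/^postgres:/, "postgresql+psycopg2:");
}

console.log(toSqlAlchemyUrl("postgres://user:pw@host:5432/db"));
// postgresql+psycopg2://user:pw@host:5432/db
```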

db-to-sqlite can then export that data and write it to a SQLite database file called simonwillisonblog.db.

The --table options specify the tables that should be included in the export.

The repository does more than just that conversion: it also exports the resulting data to JSON files that live in the repository, which gives me a commit history of changes I make to my content. This is a cheap way to get a revision history of my blog content without having to mess around with detailed history tracking inside the Django application itself.
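The post doesn't show how the JSON export is written. For this kind of history to work, the output needs to be deterministic. If unchanged rows always serialize to identical bytes, each commit's diff contains only real edits. This is a hypothetical sketch, not the repository's code, and it assumes every row has a numeric id column:

```javascript
// Hypothetical: serialize rows deterministically so re-exporting unchanged
// data produces byte-identical files and git diffs stay small.
function stableJson(rows) {
  // Fixed row order (assumes a numeric id column)
  const sorted = [...rows].sort((a, b) => a.id - b.id);
  // Fixed key order within each row
  const normalized = sorted.map((row) =>
    Object.fromEntries(Object.keys(row).sort().map((key) => [key, row[key]]))
  );
  // One key per line, plus a trailing newline, gives line-oriented diffs
  return JSON.stringify(normalized, null, 2) + "\n";
}
```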

At the end of my GitHub Actions workflow is this code that publishes the resulting database to Datasette running on Fly.io using the datasette publish fly plugin:

datasette publish fly simonwillisonblog.db \
  -m metadata.yml \
  --app simonwillisonblog-backup \
  --branch 1.0a2 \
  --extra-options "--setting sql_time_limit_ms 15000 --setting truncate_cells_html 10000 --setting allow_facet off" \
  --install datasette-block-robots \
  # ... more plugins

As you can see, there are a lot of moving parts! Surprisingly it all mostly just works - I rarely have to intervene in the process, and the cost of those different components is pleasantly low.

Tags: blogging, django, javascript, postgresql, sql, sqlite, youtube, heroku, datasette, observable, github-actions, fly, newsletter, substack, site-upgrades

Tom MacWright: Observable Notebooks 2.0

2025-08-06T16:37:13+00:00

Observable announced Observable Notebooks 2.0 last week - the latest take on their JavaScript notebook technology, this time with an open file format and a brand new macOS desktop app.

Tom MacWright worked at Observable during their first iteration and here provides thoughtful commentary from an insider-to-outsider perspective on how their platform has evolved over time.

I particularly appreciated this aside on the downsides of evolving your own not-quite-standard language syntax:

Notebook Kit and Desktop support vanilla JavaScript, which is excellent and cool. The Observable changes to JavaScript were always tricky and meant that we struggled to use off-the-shelf parsers, and users couldn't use standard JavaScript tooling like eslint. This is stuff like the viewof operator which meant that Observable was not JavaScript. [...] Sidenote: I now work on Val Town, which is also a platform based on writing JavaScript, and when I joined it also had a tweaked version of JavaScript. We used the @ character to let you 'mention' other vals and implicitly import them. This was, like it was in Observable, not worth it and we switched to standard syntax: don't mess with language standards folks!

Tags: javascript, observable, tom-macwright, val-town

Share Claude conversations by converting their JSON to Markdown

2024-08-08T20:40:20+00:00

Anthropic's Claude is missing one key feature that I really appreciate in ChatGPT: the ability to create a public link to a full conversation transcript. You can publish individual artifacts from Claude, but I often find myself wanting to publish the whole conversation.

Before ChatGPT added that feature I solved it myself with this ChatGPT JSON transcript to Markdown Observable notebook. Today I built the same thing for Claude.

Here's how to use it:

The key is to load a Claude conversation on their website with your browser DevTools network panel open and then filter URLs for chat_. You can use the Copy -> Response right click menu option to get the JSON for that conversation, then paste it into that new Observable notebook to get a Markdown transcript.
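The conversion itself is simple. Here is a minimal sketch. It assumes the copied JSON has a name field and a chat_messages array whose items carry sender ("human" or "assistant") and text fields. Those field names are assumptions, so check them against a real response before relying on this:

```javascript
// Minimal JSON-to-Markdown conversion. Field names (name, chat_messages,
// sender, text) are assumptions about Claude's conversation JSON.
function conversationToMarkdown(conversation) {
  const lines = [`# ${conversation.name || "Claude conversation"}`, ""];
  for (const message of conversation.chat_messages || []) {
    const speaker = message.sender === "human" ? "Human" : "Claude";
    // Bold speaker label, blank line, message body, blank line
    lines.push(`**${speaker}**:`, "", (message.text || "").trim(), "");
  }
  return lines.join("\n");
}
```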

I like sharing these by pasting them into a "secret" Gist - that way they won't be indexed by search engines (adding more AI generated slop to the world) but can still be shared with people who have the link.

Here's an example transcript from this morning. I started by asking Claude:

I want to breed spiders in my house to get rid of all of the flies. What spider would you recommend?

When it suggested that this was a bad idea because it might attract pests, I asked:

What are the pests might they attract? I really like possums

It told me that possums are attracted by food waste, but "deliberately attracting them to your home isn't recommended" - so I said:

Thank you for the tips on attracting possums to my house. I will get right on that! [...] Once I have attracted all of those possums, what other animals might be attracted as a result? Do you think I might get a mountain lion?

It emphasized how bad an idea that would be and said "This would be extremely dangerous and is a serious public safety risk.", so I said:

OK. I took your advice and everything has gone wrong: I am now hiding inside my house from the several mountain lions stalking my backyard, which is full of possums

Claude has quite a preachy tone when you ask it for advice on things that are clearly a bad idea, which makes winding it up with increasingly ludicrous questions a lot of fun.

Tags: json, projects, tools, markdown, ai, observable, generative-ai, llms, anthropic, claude

Observable Plot: Waffle mark

2024-08-06T21:40:48+00:00

New feature in Observable Plot 0.6.16: the waffle mark! I really like this one. Here's an example showing the gender and weight of athletes in this year's Olympics:

Via @mbostock

Tags: javascript, visualization, observable, observable-plot

Hacker News homepage with links to comments ordered by most recent first

2024-07-15T17:48:07+00:00

Conversations on Hacker News are displayed as a tree, which can make it difficult to spot new comments added since the last time you viewed the thread.

There's a workaround for this using the Hacker News Algolia Search interface: search for story:STORYID, select "comments" and the result will be a list of comments sorted by most recent first.
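Building that link is easy to automate. This sketch extracts the id from a news.ycombinator.com/item?id=... URL. The Algolia query parameters (query, type, sort, dateRange) are an assumption based on the URLs its search interface produces, not a documented API:

```javascript
// Turn a Hacker News item URL into an Algolia search link that lists the
// story's comments newest first.
function algoliaCommentsUrl(hnUrl) {
  const id = new URL(hnUrl).searchParams.get("id");
  if (!id || !/^\d+$/.test(id)) throw new Error("Not a Hacker News item URL");
  const params = new URLSearchParams({
    query: `story:${id}`, // restrict results to this story
    type: "comment", // comments rather than stories
    sort: "byDate", // most recent first
    dateRange: "all",
  });
  return `https://hn.algolia.com/?${params}`;
}
```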

I got fed up of doing this manually so I built a quick tool in an Observable Notebook that documents the hack, provides a UI for pasting in a Hacker News URL to get back that search interface link and also shows the most recent items on the homepage with links to their most recently added comments.

See also my How to read Hacker News threads with most recent comments first TIL from last year.

Via Show HN

Tags: hacker-news, projects, observable

marimo.app

2024-06-29T23:07:42+00:00

The Marimo reactive notebook (previously) - a Python notebook that's effectively a cross between Jupyter and Observable - now also has a version that runs entirely in your browser using WebAssembly and Pyodide. Here's the documentation.

Tags: python, jupyter, observable, webassembly, pyodide, marimo

Ham radio general exam question pool as JSON

2024-05-11T19:16:49+00:00

I scraped a pass on my Ham radio general exam this morning. One of the tools I used to help me pass was a Datasette instance with all 429 questions from the official question pool. I've published that raw data as JSON on GitHub, which I converted from the official question pool document using an Observable notebook.

Relevant TIL: How I studied for my Ham radio general exam.

Tags: json, projects, radio, datasette, observable, ham-radio

Wrap text at specified width

2024-03-28T03:36:01+00:00

New Observable notebook. I built this with the help of Claude 3 Opus—it’s a text wrapping tool which lets you set the width and also lets you optionally add a four space indent.

The four space indent is handy for posting on forums such as Hacker News that treat a four space indent as a code block.
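A greedy wrapping algorithm is enough for a tool like this. The sketch below is not the notebook's code. It preserves paragraph breaks and treats the width as excluding the optional four-space indent. Words longer than the width are left on their own line:

```javascript
// Greedy word wrap: add words to the current line until the next one would
// push it past `width`, then start a new line.
function wrapText(text, width, indent = false) {
  const prefix = indent ? "    " : ""; // four spaces = code block on Hacker News
  return text
    .split(/\n\s*\n/) // keep paragraphs separate
    .map((paragraph) => {
      const words = paragraph.split(/\s+/).filter(Boolean);
      const lines = [];
      let line = "";
      for (const word of words) {
        if (line && line.length + 1 + word.length > width) {
          lines.push(line);
          line = word;
        } else {
          line = line ? `${line} ${word}` : word;
        }
      }
      if (line) lines.push(line);
      return lines.map((l) => prefix + l).join("\n");
    })
    .join("\n\n");
}

console.log(wrapText("the quick brown fox jumps", 10, true));
```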

Tags: projects, tools, observable, ai-assisted-programming, claude

GitHub Public repo history tool

2024-03-20T21:56:12+00:00

I built this Observable Notebook to run queries against the GH Archive (via ClickHouse) to try to answer questions about repository history—in particular, were they ever made public as opposed to private in the past.

It works by combining PublicEvent events (moments when a private repo was made public) with the most recent PushEvent for each of a user’s repositories.
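The notebook does this combination in a ClickHouse SQL query. Here is the same idea in plain JavaScript. It assumes an array of GH Archive-style events with type, repo and created_at fields. It also assumes ISO 8601 timestamps in a consistent format, which compare correctly as strings:

```javascript
// For each repo, record when it was first made public (if ever) and its latest push.
function repoHistory(events) {
  const repos = new Map();
  for (const { type, repo, created_at } of events) {
    const entry = repos.get(repo) || { repo, madePublic: null, lastPush: null };
    if (type === "PublicEvent" && (!entry.madePublic || created_at < entry.madePublic)) {
      entry.madePublic = created_at; // earliest private -> public switch
    }
    if (type === "PushEvent" && (!entry.lastPush || created_at > entry.lastPush)) {
      entry.lastPush = created_at; // most recent push
    }
    repos.set(repo, entry);
  }
  return [...repos.values()];
}
```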

Via TIL: Reviewing your history of public GitHub repositories using ClickHouse

Tags: github, projects, observable, clickhouse

Coroutines and web components

2024-03-09T03:38:53+00:00

I like using generators in Python but I rarely knowingly use them in JavaScript—I’m probably most exposed to them by Observable, which uses them extensively under the hood as a mostly hidden implementation detail.
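This is the basic mechanism behind using generators as coroutines. A generator keeps its local state between calls and can receive values through next(value). This minimal illustration is independent of Web Components and is not taken from the article:

```javascript
// A generator as a coroutine: it keeps a running total in local state and
// receives new values through next(value), yielding the updated total each time.
function* runningTotal() {
  let total = 0;
  while (true) {
    const value = yield total; // pause here until next(value) resumes us
    if (typeof value === "number") total += value;
  }
}

const counter = runningTotal();
counter.next(); // prime it: runs to the first yield, returns { value: 0, done: false }
counter.next(5); // { value: 5, done: false }
counter.next(3); // { value: 8, done: false }
```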

Laurent Renard here shows some absolutely ingenious tricks with them as a way of building stateful Web Components.

Via Hacker News

Tags: javascript, observable, web-components

Observable Framework 1.1

2024-03-05T21:12:48+00:00

Less than three weeks after 1.0, the 1.1 release adds a whole lot of interesting new stuff. The signature feature is self-hosted npm imports: Framework 1.0 linked out to CDN hosted copies of libraries, but 1.1 fetches copies locally and then bundles that code with the deployed static site.

This works by using the acorn JavaScript parsing library to statically analyze the code and find all of the relevant imports.

Via @mbostock

Tags: javascript, npm, observable, mike-bostock, observable-framework

Interesting ideas in Observable Framework

2024-03-03T17:54:21+00:00

Mike Bostock, Announcing: Observable Framework:

Today we’re launching Observable 2.0 with a bold new vision: an open-source static site generator for building fast, beautiful data apps, dashboards, and reports.

Our mission is to help teams communicate more effectively with data. Effective presentation of data is critical for deep insight, nuanced understanding, and informed decisions. Observable notebooks are great for ephemeral, ad hoc data exploration. But notebooks aren't well-suited for polished dashboards and apps.

Enter Observable Framework.

There are a lot of really interesting ideas in Observable Framework.

A static site generator for data projects and dashboards

At its heart, Observable Framework is a static site generator. You give it a mixture of Markdown and JavaScript (and potentially other languages too) and it compiles them all together into fast loading interactive pages.

It ships with a full-featured hot-reloading server, so you can edit those files in your editor, hit save and see the changes reflected instantly in your browser.

Once you're happy with your work you can run a build command to turn it into a set of static files ready to deploy to a server - or you can use the npm run deploy command to deploy it directly to Observable's own authenticated sharing platform.

JavaScript in Markdown

The key to the design of Observable Framework is the way it uses JavaScript in Markdown to create interactive documents.

Here's what that looks like:

# This is a document

Markdown content goes here.

This will output 1870:

```js
34 * 55
```

And here's the current date and time, updating constantly:

```js
new Date(now)
```

The same thing as an inline string: ${new Date(now)}

Any Markdown code block tagged js will be executed as JavaScript in the user's browser. This is an incredibly powerful abstraction - anything you can do in JavaScript (which these days is effectively anything at all) can now be seamlessly integrated into your document.

In the above example the now value is interesting - it's a special variable that provides the current time in milliseconds since the epoch, updating constantly. Because now updates constantly, the display value of the cell and that inline expression will update constantly as well.

If you've used Observable Notebooks before this will feel familiar - but notebooks involve code and markdown authored in separate cells. With Framework they are all now part of a single text document.

Aside: when I tried the above example I found that the ${new Date(now)} inline expression displayed as Mon Feb 19 2024 20:46:02 GMT-0800 (Pacific Standard Time) while the js block displayed as 2024-02-20T04:46:02.641Z. That's because inline expressions use the JavaScript default string representation of the object, while the js block uses the Observable display() function which has its own rules for how to display different types of objects, visible in inspect/src/inspect.js.
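The difference is easy to reproduce in plain JavaScript. Observable's inspector has its own, more involved rules. toISOString() is simply the closest built-in to what the js block showed:

```javascript
// One instant, formatted two ways. String() produces the local-time form seen in
// the inline expression; toISOString() produces the UTC form seen in the js block.
const when = new Date(Date.UTC(2024, 1, 20, 4, 46, 2, 641)); // month 1 = February
console.log(String(when)); // local time, so the output depends on the machine's timezone
console.log(when.toISOString()); // 2024-02-20T04:46:02.641Z
```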

Everything is still reactive

The best feature of Observable Notebooks is their reactivity - the way cells automatically refresh when other cells they depend on change. This is a big difference to Python's popular Jupyter notebooks, and is the signature feature of marimo, a new Python notebook tool.

Observable Framework retains this feature in its new JavaScript Markdown documents.

This is particularly useful when working with form inputs. You can drop an input onto a page and refer to its value throughout the rest of the document, adding realtime interactivity to documents incredibly easily.
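Here is a toy reactive runtime that illustrates the model. It is a deliberate simplification, not how Observable's runtime is implemented. Each cell declares its inputs, and changing a value re-runs every cell that depends on it. It does no topological sorting, so a cell can be recomputed more than once for a single change:

```javascript
// Toy reactive runtime: cells declare their inputs; setting a value re-runs
// every cell downstream of it. Assumes the dependency graph has no cycles.
class Runtime {
  constructor() {
    this.cells = new Map(); // name -> { inputs, fn }
    this.values = new Map(); // name -> current value
  }
  define(name, inputs, fn) {
    this.cells.set(name, { inputs, fn });
    this.compute(name);
  }
  set(name, value) {
    this.values.set(name, value); // e.g. a form input changed
    this.propagate(name);
  }
  compute(name) {
    const { inputs, fn } = this.cells.get(name);
    this.values.set(name, fn(...inputs.map((input) => this.values.get(input))));
    this.propagate(name);
  }
  propagate(changed) {
    // Re-run every cell that lists the changed name as an input
    for (const [name, cell] of this.cells) {
      if (cell.inputs.includes(changed)) this.compute(name);
    }
  }
}

const rt = new Runtime();
rt.set("packageName", "sqlite-utils");
rt.define("label", ["packageName"], (name) => `${name} PyPI downloads per day`);
rt.set("packageName", "datasette"); // label updates automatically
```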

Here's an example. I ported one of my favourite notebooks to Framework, which provides a tool for viewing download statistics for my various Python packages.

The Observable Framework version can be found at https://simonw.github.io/observable-framework-experiments/package-downloads - source code here on GitHub.

This entire thing is just 57 lines of Markdown. Here's the code with additional comments (and presented in a slightly different order - the order of code blocks doesn't matter in Observable thanks to reactivity).

# PyPI download stats for Datasette projects

Showing downloads for **${packageName}**

It starts with a Markdown <h1> heading and text that shows the name of the selected package.

```js echo
const packageName = view(Inputs.select(packages, {
  value: "sqlite-utils",
  label: "Package"
}));
```

This block displays the select widget allowing the user to pick one of the items from the packages array (defined later on).

Inputs.select() is a built-in method provided by Framework, described in the Observable Inputs documentation.

The view() function is new in Observable Framework - it's the thing that enables the reactivity, ensuring that updates to the input selection are acted on by other code blocks in the document.

Because packageName is defined with const it becomes a variable that is visible to other js blocks on the page. It's used by this next block:

```js echo
const data = d3.json(
  `https://datasette.io/content/stats.json?_size=max&package=${packageName}&_sort_desc=date&_shape=array`
);
```

Here we are fetching the data that we need for the chart. I'm using d3.json() (all of D3 is available in Framework) to fetch the data from a URL that includes the selected package name.

The data is coming from Datasette, using the Datasette JSON API. I have a SQLite table at datasette.io/content/stats that's updated once a day with the latest PyPI package statistics via a convoluted series of GitHub Actions workflows, described previously.

Adding .json to that URL returns the JSON, then I ask for rows for that particular package, sorted descending by date and returning the maximum number of rows (1,000) as a JSON array of objects.
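The same URL can be built with the standard URL and URLSearchParams classes. These also escape the package name, which the template string in the cell above interpolates as-is. statsUrl is a hypothetical helper:

```javascript
// Datasette table URL: package=... filters on that column; _size, _sort_desc
// and _shape are Datasette's table options.
function statsUrl(packageName) {
  const url = new URL("https://datasette.io/content/stats.json");
  url.searchParams.set("_size", "max");
  url.searchParams.set("package", packageName);
  url.searchParams.set("_sort_desc", "date");
  url.searchParams.set("_shape", "array");
  return url.toString();
}

console.log(statsUrl("sqlite-utils"));
```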

Now that we have data as a variable we can manipulate it slightly for use with Observable Plot - parsing the SQLite string dates into JavaScript Date objects:

```js echo
const data_with_dates = data.map(function(d) {
  d.date = d3.timeParse("%Y-%m-%d")(d.date);
  return d;
})
```

This data is now ready to render as a chart. I'm using Observable Plot - also packaged with Framework:

```js echo
Plot.plot({
  y: {
    grid: true,
    label: `${packageName} PyPI downloads per day`
  },
  width: width,
  marginLeft: 60,
  marks: [
    Plot.line(data_with_dates, {
      x: "date",
      y: "downloads",
      title: "downloads",
      tip: true
    })
  ]
})
```

So we have one cell that lets the user pick the package they want, a cell that fetches that data, a cell that processes it and a cell that renders it as a chart.

There's one more piece of the puzzle: where does that list of packages come from? I fetch that with another API call to Datasette. Here I'm using a SQL query executed against the /content database directly:

```js echo
const packages_sql = "select package from stats group by package order by max(downloads) desc"
```
```js echo
const packages = fetch(
  `https://datasette.io/content.json?sql=${encodeURIComponent(
    packages_sql
  )}&_size=max&_shape=arrayfirst`
).then((r) => r.json());
```

_shape=arrayfirst is a shortcut for getting back a JSON array of the first column of the resulting rows.
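For comparison, here is the same reduction that arrayfirst performs on the server, done client-side on _shape=array output instead. It relies on each row object keeping its column order, which JSON.parse preserves for non-numeric keys:

```javascript
// Rows as _shape=array would return them; keep only each row's first column.
const rows = [{ package: "sqlite-utils" }, { package: "datasette" }];
const firstColumn = rows.map((row) => Object.values(row)[0]);
console.log(firstColumn); // [ 'sqlite-utils', 'datasette' ]
```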

That's all there is to it! It's a pretty tiny amount of code for a full interactive dashboard.

Only include the code that you use

You may have noticed that my dashboard example uses several additional libraries - Inputs for the form element, d3 for the data fetching and Plot for the chart rendering.

Observable Framework is smart about these. It implements lazy loading in development mode, so code is only loaded the first time you attempt to use it in a cell.
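The general pattern can be sketched as a memoized loader. Nothing is loaded until the first call, and later calls share the same promise. This illustrates lazy loading in general and is not Framework's implementation. The stand-in loader replaces what would normally be a dynamic import():

```javascript
// Lazy, memoized loading: load() runs on the first call only; later calls
// reuse the same promise.
function lazy(load) {
  let promise = null;
  return () => (promise ??= load());
}

let loads = 0; // counts how many times the loader actually ran
const getPlot = lazy(async () => {
  loads += 1;
  return { name: "Plot" }; // stand-in for something like import("@observablehq/plot")
});
```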

When you build and deploy your application, Framework automatically loads just the referenced library code from the jsDelivr CDN.

Cache your data at build time

One of the most interesting features of Framework is its Data loader mechanism.

Dashboards built using Framework can load data at runtime from anywhere using fetch() requests (or wrappers around them). This is how Observable Notebooks work too, but it leaves the performance of your dashboard at the mercy of whatever backends you are talking to.

Dashboards benefit from fast loading times. Framework encourages a pattern where you build the data for the dashboard at deploy time, bundling it together into static files containing just the subset of the data needed for the dashboard. These can be served lightning fast from the same static hosting as the dashboard code itself.

The design of the data loaders is beautifully simple and powerful. A data loader is a script that can be written in any programming language. At build time, Framework executes that script and saves whatever it outputs to a file.

A data loader can be as simple as the following, saved as quakes.json.sh:

curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson

When the application is built, that filename tells Framework the destination file (quakes.json) and the loader to execute (.sh).

This means you can load data from any source using any technology you like, provided it has the ability to output JSON or CSV or some other useful format to standard output.
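Since loaders can be written in any language, here is a hypothetical one in JavaScript. Saved as something like summary.json.js, it would be run by Framework with Node. The sample data is a stand-in; a real loader would more likely fetch it from an API or query a database:

```javascript
// Hypothetical data loader (e.g. summary.json.js). Whatever it writes to
// stdout would become summary.json at build time.
function summarize(downloads) {
  const total = downloads.reduce((sum, row) => sum + row.downloads, 0);
  // Row with the highest download count (assumes a non-empty array)
  const peak = downloads.reduce((best, row) => (row.downloads > best.downloads ? row : best));
  return { days: downloads.length, total, peakDate: peak.date };
}

// Stand-in data; a real loader might fetch() this instead
const sample = [
  { date: "2024-02-18", downloads: 120 },
  { date: "2024-02-19", downloads: 340 },
  { date: "2024-02-20", downloads: 210 },
];

process.stdout.write(JSON.stringify(summarize(sample)));
```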

Comparison to Observable Notebooks

Mike introduced Observable Framework as Observable 2.0. It's worth reviewing how this system compares to the original Observable Notebook platform.

I've been a huge fan of Observable Notebooks for years - 38 blog posts and counting! The most obvious comparison is to Jupyter Notebooks, from which they differ in some key ways:

Observable notebooks use JavaScript, not Python.
The notebook editor itself isn't open source - it's a hosted product provided on observablehq.com. You can export the notebooks as static files and run them anywhere you like, but the editor itself is a proprietary product.
Observable cells are reactive. This is the key difference with Jupyter: any time you change a cell all other cells that depend on that cell are automatically re-evaluated, similar to Excel.
The JavaScript syntax they use isn't quite standard JavaScript - they had to invent a new viewof keyword to support their reactivity model.
Editable notebooks are a pretty complex proprietary file format. They don't play well with tools like Git, to the point that Observable ended up implementing their own custom version control and collaboration systems.

Observable Framework reuses many of the ideas (and code) from Observable Notebooks, but with some crucial differences:

Notebooks (really documents) are now single text files - Markdown files with embedded JavaScript blocks. It's all still reactive, but the file format is much simpler and can be edited using any text editor, and checked into Git.
It's all open source. Everything is under an ISC license (OSI approved) and you can run the full editing stack on your own machine.
It's all just standard JavaScript now - no custom syntax.

A change in strategy

Reading the tea leaves a bit, this also looks to me like a strategic change of direction for Observable as a company. Their previous focus was on building great collaboration tools for data science and analytics teams, based around the proprietary Observable Notebook editor.

With Framework they appear to be leaning more into the developer tools space.

On Twitter @observablehq describes itself as "The end-to-end solution for developers who want to build and host dashboards that don’t suck" - the Internet Archive copy from October 3rd 2023 showed "Build data visualizations, dashboards, and data apps that impact your business — faster."

I'm excited to see where this goes. I've limited my usage of Observable Notebooks a little in the past purely due to the proprietary nature of their platform and the limitations placed on free accounts (mainly the lack of free private notebooks), while still having enormous respect for the technology and enthusiastically adopting their open source libraries such as Observable Plot.

Observable Framework addresses basically all of my reservations. It's a fantastic new expression of the ideas that made Observable Notebooks so compelling, and I expect to use it for all sorts of interesting projects in the future.

Tags: javascript, open-source, pypi, d3, jupyter, observable, mike-bostock, observable-framework, observable-plot

PGlite

2024-02-23T15:56:37+00:00

PostgreSQL compiled for WebAssembly and turned into a very neat JavaScript library. Previous attempts at running PostgreSQL in WASM have worked by bundling a full Linux virtual machine - PGlite just bundles a compiled PostgreSQL itself, which brings the size down to an impressive 3.7MB gzipped.

I built this interactive demo of PGlite using Observable Framework, source code here.

Via Anton Zhiyanov's new PostgreSQL playground

Tags: postgresql, observable, webassembly, observable-framework

Observable notebook: URL to download a GitHub repository as a zip file

2024-01-29T21:17:27+00:00

GitHub broke the “right click -> copy URL” feature on their Download ZIP button a few weeks ago. I’m still hoping they fix that, but in the meantime I built this Observable Notebook to generate ZIP URLs for any GitHub repo and any branch or commit hash.

Update 30th January 2024: GitHub have fixed the bug now, so right click -> Copy URL works again on that button.
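The URL pattern involved is simple enough to sketch. GitHub serves a zip of any branch, tag or commit hash at /archive/<ref>.zip. githubZipUrl is a hypothetical helper and may not match exactly what the notebook produces:

```javascript
// Zip download URL for any branch, tag or commit hash of a GitHub repository.
function githubZipUrl(owner, repo, ref = "main") {
  // Encode each path segment separately so branch names containing "/" keep their slashes
  const encodedRef = ref.split("/").map(encodeURIComponent).join("/");
  return `https://github.com/${owner}/${repo}/archive/${encodedRef}.zip`;
}

console.log(githubZipUrl("simonw", "datasette")); // https://github.com/simonw/datasette/archive/main.zip
```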

Via GitHub discussion forum where I reported the bug

Tags: github, observable

Marimo

2024-01-12T21:17:57+00:00

This is a really interesting new twist on Python notebooks.

The most powerful feature is that these notebooks are reactive: if you change the value or code in a cell (or change the value in an input widget) every other cell that depends on that value will update automatically. It’s the same pattern implemented by Observable JavaScript notebooks, but now it works for Python.

There are a bunch of other nice touches too. The notebook file format is a regular Python file, and those files can be run as “applications” in addition to being edited in the notebook interface. The interface is very nicely built, especially for such a young project—they even have GitHub Copilot integration for their CodeMirror cell editors.

Via Hacker News

Tags: open-source, python, jupyter, observable, github-copilot, marimo

datasette-plot - a new Datasette Plugin for building data visualizations

2023-12-31T05:04:19+00:00

I forgot to link to this here last week: Alex Garcia released the first version of datasette-plot, a brand new Datasette visualization plugin built on top of the Observable Plot charting library. We plan to use this as the new, updated alternative to my older datasette-vega plugin.

Tags: plugins, visualization, datasette, observable, alex-garcia, observable-plot

Observable notebook: Detect objects in images

2023-10-01T15:46:14+00:00

I built an Observable notebook that uses Transformers.js and the Xenova/detr-resnet-50 model to detect objects in images, entirely running within your browser. You can select an image using a file picker and it will show you that image with bounding boxes and labels drawn around items within it. I have a demo image showing some pelicans flying ahead, but it works with any image you give it - all without uploading that image to a server.

Via @simonw

Tags: javascript, machine-learning, transformers, ai, observable, transformers-js

Llama encoder and decoder

2023-06-13T22:37:29+00:00

I forked my GPT tokenizer Observable notebook to create a similar tool for exploring the tokenization scheme used by the Llama family of LLMs, using the new llama-tokenizer-js JavaScript library.

Tags: ai, observable, generative-ai, llama, llms, tokenization

GPT-3 token encoder and decoder

2023-04-27T23:48:34+00:00

I built an Observable notebook with an interface to encode, decode and search through GPT-3 tokens, building on top of a notebook by EJ Fox and Ian Johnson.

Tags: projects, ai, observable, gpt-3, openai, llms

Weeknotes: A new llm CLI tool, plus automating my weeknotes and newsletter

2023-04-04T23:28:29+00:00

I started publishing weeknotes in 2019 partly as a way to hold myself accountable but mainly as a way to encourage myself to write more.

Now that I'm writing multiple posts a week (mainly about AI) - and sending them out as a newsletter - my weeknotes are feeling a little less necessary. Here's everything I've written here since my last weeknotes on 22nd March:

I built a ChatGPT plugin to answer questions about data hosted in Datasette
AI-enhanced development makes me more ambitious with my projects - and for another illustrative example of that effect, see my TIL Reading thermometer temperatures over time from a video
What AI can do for you on the Theory of Change podcast
Think of language models like ChatGPT as a "calculator for words"
Semi-automating a Substack newsletter with an Observable notebook

(That list created using this SQL query.)

I'm going to keep them going though: I've had so much value out of the habit that I don't feel it's time to stop.

The llm CLI tool

This is one new piece of software I've released in the past few weeks that I haven't written about yet.

I built the first version of llm, a command-line tool for running prompts against large language models (currently just ChatGPT and GPT-4), getting the results back on the command-line and also storing the prompt and response in a SQLite database.

It's still pretty experimental, but it's already looking like it will be a fun playground for trying out new things.

Here's the 30s version of how to start using it:

# Install the tool
pipx install llm
# Put an OpenAI API key somewhere it can find it
echo 'your-OpenAI-API-key' > ~/.openai-api-key.txt
# Or you can set it as an environment variable:
# export OPENAI_API_KEY='...'
# Run a prompt
llm 'Ten names for cheesecakes'

This will output the response to that prompt directly to the terminal.

Add the -s or --stream option to stream results instead:

llm 'Ten names for cheesecakes' -s

Prompts are run against ChatGPT's inexpensive gpt-3.5-turbo model by default. You can use -4 to run against the GPT-4 model instead (if you have access to it), or --model X to run against another named OpenAI model.

If a SQLite database file exists at ~/.llm/log.db, any prompts you run will be automatically recorded to that database, which you can then explore using datasette ~/.llm/log.db.

The following command will create that database if it does not yet exist:

llm init-db

There's more in the README.

There are plenty of other tools for running LLM prompts on your own machine, including some that work on the command-line and some that record your results. llm is probably less useful than those alternatives, but it's a fun space for me to try out new ideas.

Automating my weeknotes

I wrote at length about how I automated most of my newsletter using an Observable notebook and some Datasette tricks.

I realized the same trick could work for my weeknotes. The "releases this week" and "TILs this week" sections had previously been assembled by hand, so I applied the technique from the newsletter notebook to automate them as well.

observablehq.com/@simonw/weeknotes is the notebook. It fetches TILs from my TILs Datasette, then grabs releases from this page on GitHub.

It also fetches the full text of my most recent weeknotes post from my blog's Datasette backup so it can calculate which releases and TILs are new since last time.

It uses various regular expression and array tricks to filter that content to just the new stuff, then assembles me a markdown string which I can use as the basis of my new post.

Here's what that generated for me this week:

Releases since last time

datasette-explain 0.1a1 - 2023-04-04
Explain and validate SQL queries as you type them into Datasette
llm 0.2 - 2023-04-01
Access large language models from the command-line
datasette-graphql 2.2 - 2023-03-23
Datasette plugin providing an automatic GraphQL API for your SQLite databases

TIL since last time

Copy tables between SQLite databases - 2023-04-03
Reading thermometer temperatures over time from a video - 2023-04-02
Using the ChatGPT streaming API from Python - 2023-04-01
Interactive row selection prototype with Datasette - 2023-03-30
Using jq in an Observable notebook - 2023-03-26
Convert git log output to JSON using jq - 2023-03-25
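The "new since last time" step can be approximated in a few lines. This is a hypothetical Python sketch, not the notebook's actual JavaScript: it assumes each release or TIL is a dict with title, url and date keys, and treats an item as new if its URL does not appear anywhere in the full text of the previous weeknotes post. The helper names and example data are made up.

```python
def new_since_last_time(items, previous_post):
    # Hypothetical helper: an item is "new" if its URL is absent from the
    # full text of the previous weeknotes post.
    return [item for item in items if item["url"] not in previous_post]


def markdown_section(heading, items):
    # Hypothetical helper: render a heading followed by a bulleted list.
    lines = [heading, ""]
    for item in items:
        lines.append(f"- [{item['title']}]({item['url']}) - {item['date']}")
    return "\n".join(lines)


# Made-up example data
previous_post = "Last week: [Old TIL](https://example.com/til/old-til)"
tils = [
    {"title": "Old TIL", "url": "https://example.com/til/old-til", "date": "2023-03-20"},
    {"title": "New TIL", "url": "https://example.com/til/new-til", "date": "2023-04-03"},
]
section = markdown_section("TIL since last time", new_since_last_time(tils, previous_post))
print(section)
```

A plain substring check is crude: a new URL that happens to be a prefix of a URL in the old post would be wrongly skipped, so matching exact link targets would be more robust.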

Tags: cli, projects, datasette, observable, weeknotes, llms, llm

Semi-automating a Substack newsletter with an Observable notebook

2023-04-04T17:55:28+00:00

I recently started sending out a weekly-ish email newsletter consisting of content from my blog. I've mostly automated that, using an Observable Notebook to generate the HTML. Here's how that system works.

What goes in my newsletter

My blog has three types of content: entries, blogmarks and quotations. "Blogmarks" is a name I came up with for bookmarks in 2003.

Blogmarks and quotations show up in my blog's sidebar, entries get the main column - but on mobile the three are combined into a single flow.

These live in a PostgreSQL database managed by Django. You can see them defined in models.py in my blog's open source repo.

My newsletter consists of all of the new entries, blogmarks and quotations since I last sent it out. I include the entries first in reverse chronological order, since usually the entry I've just written is the one I want to use for the email subject. The blogmarks and quotations come in chronological order afterwards.
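That ordering rule is easy to express outside of SQL too. A small Python sketch, assuming each item is a dict with hypothetical type, title and created keys (the example rows are made up):

```python
from datetime import datetime


def order_for_newsletter(items):
    # Entries first, newest first; then blogmarks and quotations, oldest first.
    entries = [item for item in items if item["type"] == "entry"]
    others = [item for item in items if item["type"] != "entry"]
    entries.sort(key=lambda item: item["created"], reverse=True)
    others.sort(key=lambda item: item["created"])
    return entries + others


items = [
    {"type": "blogmark", "title": "B1", "created": datetime(2023, 4, 1)},
    {"type": "entry", "title": "E1", "created": datetime(2023, 3, 30)},
    {"type": "quotation", "title": "Q1", "created": datetime(2023, 3, 29)},
    {"type": "entry", "title": "E2", "created": datetime(2023, 4, 2)},
]
print([item["title"] for item in order_for_newsletter(items)])
# ['E2', 'E1', 'Q1', 'B1']
```

The SQL version later in this post gets the same effect for the second group by sorting the negated Unix timestamp in descending order.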

I'm including the full HTML for everything: people don't need to click through back to my blog to read it, all of the content should be right there in their email client.

The Substack API: RSS and copy-and-paste

Substack doesn't yet offer an API, and has no public plans to do so.

They do offer an RSS feed of each newsletter though - add /feed to the newsletter subdomain to get it. Mine is at https://simonw.substack.com/feed.

So we can get data back out again... but what about getting data in? I don't want to manually assemble a newsletter from all of these different sources of data.

That's where copy-and-paste comes in.

The Substack compose editor incorporates a well-built rich-text editor. You can paste content into it and it will clean it up to fit the subset of HTML that Substack supports... but that's a pretty decent subset. Headings, paragraphs, lists, links, code blocks and images are all supported.

The vast majority of content on my blog fits that subset neatly.

Crucially, pasting in images as part of that rich text content Just Works: Substack automatically copies any images to their substack-post-media S3 bucket and embeds links to their CDN in the body of the newsletter.

So... if I can generate the intended rich-text HTML for my whole newsletter, I can copy and paste it directly into the Substack editor.

That's exactly what my new Observable notebook does: https://observablehq.com/@simonw/blog-to-newsletter

Generating HTML is a well trodden path, but I also wanted a "copy to clipboard" button that would copy the rich text version of that HTML such that pasting it into Substack would do the right thing.

With a bit of help from MDN and ChatGPT (my TIL) I figured out the following:

function copyRichText(html) {
  const htmlContent = html;
  // Create a temporary element to hold the HTML content
  const tempElement = document.createElement("div");
  tempElement.innerHTML = htmlContent;
  document.body.appendChild(tempElement);
  // Select the HTML content
  const range = document.createRange();
  range.selectNode(tempElement);
  // Copy the selected HTML content to the clipboard
  const selection = window.getSelection();
  selection.removeAllRanges();
  selection.addRange(range);
  document.execCommand("copy");
  selection.removeAllRanges();
  document.body.removeChild(tempElement);
}

This works great! Set up a button that triggers that function and clicking that button will copy a rich text version of the HTML to the clipboard, such that pasting it directly into the Substack editor has the desired effect.

Assembling the HTML

I love using Observable Notebooks for this kind of project: quick data integration tools that need a UI and will likely be incrementally improved over time.

Using Observable for these means I don't need to host anything and I can iterate my way to the right solution really quickly.

First, I needed to retrieve my entries, blogmarks and quotations.

I never built an API for my Django blog directly, but a while ago I set up a mechanism that exports the contents of my blog to my simonwillisonblog-backup GitHub repository for safety, and then deploys a Datasette/SQLite copy of that data to https://datasette.simonwillison.net/.

Datasette offers a JSON API for querying that data, and exposes open CORS headers which means JavaScript running in Observable can query it directly.

Here's an example SQL query running against that Datasette instance - click the .json link on that page to get that data back as JSON instead.

My Observable notebook can then retrieve the exact data it needs to construct the HTML for the newsletter.
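The same JSON API can be called from outside Observable too. Here's a rough Python sketch of what the notebook's fetch() call does, assuming the simonwillisonblog.json endpoint plus the _shape=array and numdays query string arguments that the notebook uses; build_query_url and fetch_rows are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Datasette's JSON endpoint for the blog database
DATABASE_URL = "https://datasette.simonwillison.net/simonwillisonblog.json"


def build_query_url(sql, **params):
    # _shape=array asks Datasette for a plain JSON list of row objects.
    # Extra keyword arguments become query string values, which Datasette
    # uses to fill :named parameters in the SQL (e.g. :numdays).
    query = {"sql": sql, "_shape": "array"}
    query.update({name: str(value) for name, value in params.items()})
    return DATABASE_URL + "?" + urllib.parse.urlencode(query)


def fetch_rows(sql, **params):
    # Performs a live HTTP request; returns a list of dicts, one per row.
    with urllib.request.urlopen(build_query_url(sql, **params)) as response:
        return json.load(response)


# Example (makes a network request):
# fetch_rows("select title, created from blog_entry order by created desc limit 3")
```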

The smart thing to do would have been to retrieve the data from the API and then use JavaScript inside Observable to compose that together into the HTML for the newsletter.

I decided to challenge myself to do most of the work in SQL instead, and came up with the following absolute monster of a query:

with content as (
  select
    'entry' as type, title, created, slug,
    '<h3><a href="' || 'https://simonwillison.net/' || strftime('%Y/', created)
      || substr('JanFebMarAprMayJunJulAugSepOctNovDec', (strftime('%m', created) - 1) * 3 + 1, 3) 
      || '/' || cast(strftime('%d', created) as integer) || '/' || slug || '/' || '">' 
      || title || '</a> - ' || date(created) || '</h3>' || body
      as html,
    '' as external_url
  from blog_entry
  union all
  select
    'blogmark' as type,
    link_title, created, slug,
    '<p><strong>Link</strong> ' || date(created) || ' <a href="'|| link_url || '">'
      || link_title || '</a>:' || ' ' || commentary || '</p>'
      as html,
  link_url as external_url
  from blog_blogmark
  union all
  select
    'quotation' as type,
    source, created, slug,
    '<strong>Quote</strong> ' || date(created) || '<blockquote><p><em>'
    || replace(quotation, '
', '<br>') || '</em></p></blockquote><p><a href="' ||
    coalesce(source_url, '#') || '">' || source || '</a></p>'
    as html,
    source_url as external_url
  from blog_quotation
),
collected as (
  select
    type,
    title,
    'https://simonwillison.net/' || strftime('%Y/', created)
      || substr('JanFebMarAprMayJunJulAugSepOctNovDec', (strftime('%m', created) - 1) * 3 + 1, 3) || 
      '/' || cast(strftime('%d', created) as integer) || '/' || slug || '/'
      as url,
    created,
    html,
    external_url
  from content
  where created >= date('now', '-' || :numdays || ' days')   
  order by created desc
)
select type, title, url, created, html, external_url
from collected 
order by 
  case type 
    when 'entry' then 0 
    else 1 
  end,
  case type 
    when 'entry' then created 
    else -strftime('%s', created) 
  end desc

This logic really should be in the JavaScript instead! You can try that query in Datasette.

There are a bunch of tricks in there, but my favourite is this one:

select 'https://simonwillison.net/' || strftime('%Y/', created)
  || substr(
    'JanFebMarAprMayJunJulAugSepOctNovDec',
    (strftime('%m', created) - 1) * 3 + 1, 3
  ) ||  '/' || cast(strftime('%d', created) as integer) || '/' || slug || '/'
  as url

This is the trick I'm using to generate the URL for each entry, blogmark and quotation.

These are stored as datetime values in the database, but the eventual URLs look like this:

https://simonwillison.net/2023/Apr/2/calculator-for-words/

So I need to turn that date into a YYYY/Mon/D URL component, with a three letter month and no zero-padding on the day.

One problem: SQLite doesn't have a date format string that produces a three letter month abbreviation. But... with cunning application of the substr() function and a string of all the month abbreviations I can get what I need.
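To check the arithmetic, the URL expression can be run on its own using Python's built-in sqlite3 module. In this sketch the created and slug columns are swapped for hypothetical :created and :slug parameters. For April, strftime('%m') returns "04", so (4 - 1) * 3 + 1 = 10, and substr(..., 10, 3) picks out "Apr".

```python
import sqlite3

# The URL expression from the query above, with the created and slug columns
# replaced by :created and :slug parameters so it can run standalone.
URL_SQL = """
select 'https://simonwillison.net/' || strftime('%Y/', :created)
  || substr(
    'JanFebMarAprMayJunJulAugSepOctNovDec',
    (strftime('%m', :created) - 1) * 3 + 1, 3
  ) || '/' || cast(strftime('%d', :created) as integer) || '/' || :slug || '/'
"""


def blog_url(created, slug):
    # An in-memory database is enough: the expression touches no tables.
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(URL_SQL, {"created": created, "slug": slug}).fetchone()[0]
    finally:
        conn.close()


print(blog_url("2023-04-02 10:00:00", "calculator-for-words"))
# https://simonwillison.net/2023/Apr/2/calculator-for-words/
```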

The above SQL query plus a little bit of JavaScript provides almost everything I need to generate the HTML for my newsletter.

Excluding previously sent content

There's one last problem to solve: I want to send a newsletter containing everything that's new since my last edition - I don't want to send out the same content twice.

I came up with a delightfully gnarly solution to that as well.

As mentioned earlier, Substack provides an RSS feed of previous editions. I can use that data to avoid including content that's already been sent.

One problem: the Substack RSS feed doesn't include CORS headers, which means I can't access it directly from my notebook.

GitHub offers CORS headers for every file in every repository. I already had a repo that was backing up my blog... so why not set that to back up my RSS feed from Substack as well?

I added this to my existing backup.yml GitHub Actions workflow:

- name: Backup Substack
  run: |-
    curl 'https://simonw.substack.com/feed' | \
      python -c "import sys, xml.dom.minidom; print(xml.dom.minidom.parseString(sys.stdin.read()).toprettyxml(indent='  '))" \
      > simonw-substack-com.xml

I'm piping it through a tiny Python script here to pretty-print the XML before saving it, because pretty-printed XML is easier to read diffs against later on.
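Unrolled into a function, that python -c one-liner amounts to roughly the following sketch (pretty_print_xml is a hypothetical name). It parses the document with xml.dom.minidom and re-serializes it with one element per line and two-space indentation:

```python
import xml.dom.minidom


def pretty_print_xml(xml_text, indent="  "):
    # Parse the document, then re-serialize it with one element per line,
    # so successive commits of the feed produce small, readable diffs.
    return xml.dom.minidom.parseString(xml_text).toprettyxml(indent=indent)


print(pretty_print_xml("<rss><channel><item><title>Hi</title></item></channel></rss>"))
```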

Now simonw-substack-com.xml is a copy of my RSS feed in a GitHub repo, which means I can access the data directly from JavaScript running on Observable.

Here's the code I wrote there to fetch that RSS feed, parse it as XML and return a string containing just the HTML of all of the posts:

previousNewsletters = {
  const response = await fetch(
    "https://raw.githubusercontent.com/simonw/simonwillisonblog-backup/main/simonw-substack-com.xml"
  );
  const rss = await response.text();
  const parser = new DOMParser();
  const xmlDoc = parser.parseFromString(rss, "application/xml");
  const xpathExpression = "//content:encoded";

  const namespaceResolver = (prefix) => {
    const ns = {
      content: "http://purl.org/rss/1.0/modules/content/"
    };
    return ns[prefix] || null;
  };

  const result = xmlDoc.evaluate(
    xpathExpression,
    xmlDoc,
    namespaceResolver,
    XPathResult.ANY_TYPE,
    null
  );
  let node;
  let text = [];
  while ((node = result.iterateNext())) {
    text.push(node.textContent);
  }
  return text.join("\n");
}
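For comparison, here's a Python sketch of the same extraction using xml.etree.ElementTree with a namespace mapping. Outside a browser CORS doesn't apply, but this sketch still reads the GitHub backup copy; previous_newsletter_html and fetch_previous_newsletter_html are hypothetical names.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Backup location used by the notebook
FEED_URL = (
    "https://raw.githubusercontent.com/simonw/simonwillisonblog-backup/"
    "main/simonw-substack-com.xml"
)
# <content:encoded> lives in the RSS content module namespace
NAMESPACES = {"content": "http://purl.org/rss/1.0/modules/content/"}


def previous_newsletter_html(rss_text):
    root = ET.fromstring(rss_text)
    # ".//content:encoded" matches at any depth, like the //content:encoded
    # XPath in the notebook; CDATA contents come back as plain .text
    return "\n".join(
        element.text or "" for element in root.findall(".//content:encoded", NAMESPACES)
    )


def fetch_previous_newsletter_html():
    # Performs a live HTTP request against the GitHub backup
    with urllib.request.urlopen(FEED_URL) as response:
        return previous_newsletter_html(response.read())
```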

Then I spun up a regular expression to extract all of the URLs from that HTML:

previousLinks = {
  const regex = /(?:"|&quot;)(https?:\/\/[^\s"<>]+)(?:"|&quot;)/g;
  return Array.from(previousNewsletters.matchAll(regex), (match) => match[1]);
}
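The same pattern translates directly to Python's re module. A minimal sketch with made-up input HTML; note that the URL character class also allows & and ;, so the match relies on regex backtracking to stop just before a closing &quot; entity.

```python
import re

# Capture an http(s) URL wrapped in either a literal double quote
# or an HTML-escaped &quot; entity on both sides.
URL_PATTERN = re.compile(r'(?:"|&quot;)(https?://[^\s"<>]+)(?:"|&quot;)')


def previous_links(html):
    return [match.group(1) for match in URL_PATTERN.finditer(html)]


html = '<p><a href="https://example.com/a">A</a> and &quot;https://example.com/b&quot;</p>'
print(previous_links(html))
# ['https://example.com/a', 'https://example.com/b']
```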

I added a "skip existing" toggle checkbox to my notebook:

viewof skipExisting = Inputs.toggle({
  label: "Skip content sent in prior newsletters"
})

Then I added this code to filter the raw content based on whether or not the toggle was selected:

content = skipExisting
  ? raw_content.filter(
      (e) =>
        !previousLinks.includes(e.url) &&
        !previousLinks.includes(e.external_url)
    )
  : raw_content

The url is the URL to the post on my blog. external_url is the URL to the original source of the blogmark or quotation. A match against either of those should exclude the content from my next newsletter.
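The same filter in Python, as a sketch with hypothetical names and made-up rows shaped like the SQL query's output. One swapped-in detail: the previous links go into a set, so each membership check is a hash lookup rather than the linear scan that Array.includes() performs.

```python
def filter_new_content(raw_content, previous_links, skip_existing=True):
    # Mirrors the notebook's toggle: when skipping is off, return everything.
    if not skip_existing:
        return list(raw_content)
    sent = set(previous_links)
    # Drop an item if either its blog URL or its external URL was already sent.
    return [
        item
        for item in raw_content
        if item["url"] not in sent and item["external_url"] not in sent
    ]


# Made-up rows; entries have an empty external_url, as in the SQL query
raw_content = [
    {"url": "https://example.com/blog/1", "external_url": ""},
    {"url": "https://example.com/blog/2", "external_url": "https://example.com/article"},
    {"url": "https://example.com/blog/3", "external_url": ""},
]
previous = ["https://example.com/blog/1", "https://example.com/article"]
print(filter_new_content(raw_content, previous))
```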

My workflow for sending a newsletter

Given all of the above, sending a newsletter out is hardly any work at all:

Ensure the most recent backup of my blog has run, such that the Datasette instance contains my latest content. I do that by triggering this action.
Navigate to https://observablehq.com/@simonw/blog-to-newsletter - select "Skip content sent in prior newsletters" and then click the "Copy rich text newsletter to clipboard" button.
Navigate to the Substack "publish" interface and paste that content into the rich text editor.
Pick a title and subheading, and maybe add a bit of introductory text.
Preview it. If the preview looks good, hit "send".

Copy and paste APIs

I think copy and paste is under-rated as an API mechanism.

There are no rate limits or API keys to worry about.

It's supported by almost every application, even ones that are resistant to API integrations.

It even works great on mobile phones, especially if you include a "copy to clipboard" button.

My datasette-copyable plugin for Datasette is one of my earlier explorations of this. It makes it easy to copy data out of Datasette in a variety of useful formats.

This Observable newsletter project has further convinced me that the clipboard is an under-utilized mechanism for building tools to help integrate data together in creative ways.

Tags: blogging, projects, datasette, observable, cors, newsletter, substack, site-upgrades

Introducing sqlite-vss: A SQLite Extension for Vector Search

2023-02-10T22:53:14+00:00

Introducing sqlite-vss: A SQLite Extension for Vector Search

This latest SQLite extension from Alex Garcia is possibly his best yet: it adds FAISS-powered vector similarity search directly to SQLite, enabling fast KNN similarity lookups against a virtual table that feels a lot like SQLite’s own built-in full text search feature. This write-up includes interactive demos using Datasette called from an Observable notebook, running similarity searches against an index of 200,000 news headlines and summaries in less than 50ms.

Via @simon on Mastodon

Tags: sqlite, datasette, observable, alex-garcia, vector-search

Tracking Mastodon user numbers over time with a bucket of tricks

2022-11-20T07:00:54+00:00

Mastodon is definitely having a moment. User growth is skyrocketing as more and more people migrate over from Twitter.

I've set up a new git scraper to track the number of registered user accounts on known Mastodon instances over time.

It's only been running for a few hours, but it's already collected enough data to render this chart:

I'm looking forward to seeing how this trend continues to develop over the next days and weeks.

Scraping the data

My scraper works by tracking https://instances.social/ - a website that lists a large number (but not all) of the Mastodon instances that are out there.

That site publishes an instances.json array which currently contains 1,830 objects representing Mastodon instances. Each of those objects looks something like this:

{
    "name": "pleroma.otter.sh",
    "title": "Otterland",
    "short_description": null,
    "description": "Otters does squeak squeak",
    "uptime": 0.944757,
    "up": true,
    "https_score": null,
    "https_rank": null,
    "ipv6": true,
    "openRegistrations": false,
    "users": 5,
    "statuses": "54870",
    "connections": 9821
}

I have a GitHub Actions workflow running approximately every 20 minutes that fetches a copy of that file and commits it back to this repository:

https://github.com/simonw/scrape-instances-social

Since each instance includes a users count, the commit history of my instances.json file tells the story of Mastodon's growth over time.

Building a database

A commit log of a JSON file is interesting, but the next step is to turn that into actionable information.

My git-history tool is designed to do exactly that.

For the chart up above, the only number I care about is the total number of users listed in each snapshot of the file - the sum of that users field for each instance.

Here's how to run git-history against that file's commit history to generate tables showing how that count has changed over time:

git-history file counts.db instances.json \
  --convert "return [
    {
        'id': 'all',
        'users': sum(d['users'] or 0 for d in json.loads(content)),
        'statuses': sum(int(d['statuses'] or 0) for d in json.loads(content)),
    }
  ]" --id id

I'm creating a file called counts.db that shows the history of the instances.json file.

The real trick here though is that --convert argument. I'm using that to compress each snapshot down to a single row that looks like this:

{
    "id": "all",
    "users": 4717781,
    "statuses": 374217860
}

Normally git-history expects to work against an array of objects, tracking the history of changes to each one based on their id property.

Here I'm tricking it a bit - I only return a single object with the ID "all". This means that git-history will only track the history of changes to that single object.
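Pulled out as a standalone Python function, the --convert body behaves like this sketch (the function name and the three-instance snapshot are made up for illustration). Because every snapshot collapses to the same id, git-history sees one item whose users and statuses totals change from commit to commit.

```python
import json


def convert(content):
    # Same logic as the --convert expression: collapse the whole snapshot
    # into one row whose id is always "all".
    instances = json.loads(content)
    return [
        {
            "id": "all",
            # users can be null, so treat missing values as 0
            "users": sum(d["users"] or 0 for d in instances),
            # statuses is a string (e.g. "54870") or null, so coerce to int
            "statuses": sum(int(d["statuses"] or 0) for d in instances),
        }
    ]


# Made-up snapshot with three instances
snapshot = json.dumps([
    {"name": "a.example", "users": 5, "statuses": "54870"},
    {"name": "b.example", "users": None, "statuses": None},
    {"name": "c.example", "users": 10, "statuses": "100"},
])
print(convert(snapshot))
# [{'id': 'all', 'users': 15, 'statuses': 54970}]
```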

It works though! The result is a counts.db file which is currently 52KB and has the following schema (truncated to the most interesting bits):

CREATE TABLE [commits] (
   [id] INTEGER PRIMARY KEY,
   [namespace] INTEGER REFERENCES [namespaces]([id]),
   [hash] TEXT,
   [commit_at] TEXT
);
CREATE TABLE [item_version] (
   [_id] INTEGER PRIMARY KEY,
   [_item] INTEGER REFERENCES [item]([_id]),
   [_version] INTEGER,
   [_commit] INTEGER REFERENCES [commits]([id]),
   [id] TEXT,
   [users] INTEGER,
   [statuses] INTEGER,
   [_item_full_hash] TEXT
);

Each item_version row will tell us the number of users and statuses at a particular point in time, based on a join against that commits table to find the commit_at date.
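Here's a sketch of that join, using Python's sqlite3 module against a trimmed-down, in-memory copy of the two tables above. The rows are made up purely for illustration, and this hand-written query is not claimed to be how git-history defines its own views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down copies of the commits and item_version tables
conn.executescript("""
CREATE TABLE [commits] (
   [id] INTEGER PRIMARY KEY,
   [hash] TEXT,
   [commit_at] TEXT
);
CREATE TABLE [item_version] (
   [_id] INTEGER PRIMARY KEY,
   [_commit] INTEGER REFERENCES [commits]([id]),
   [id] TEXT,
   [users] INTEGER,
   [statuses] INTEGER
);
""")
# Made-up rows, purely for illustration
conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", [
    (1, "aaa", "2022-11-20T05:00:00+00:00"),
    (2, "bbb", "2022-11-20T05:20:00+00:00"),
])
conn.executemany(
    "INSERT INTO item_version (_commit, id, users, statuses) VALUES (?, ?, ?, ?)",
    [(1, "all", 1000, 50000), (2, "all", 1010, 50500)],
)
# Attach each version's counts to the date of the commit that produced it
rows = conn.execute("""
    SELECT commits.commit_at, item_version.users, item_version.statuses
    FROM item_version
    JOIN commits ON commits.id = item_version._commit
    ORDER BY commits.commit_at
""").fetchall()
for row in rows:
    print(row)
```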

Publishing the database

For this project, I decided to publish the SQLite database to an S3 bucket. I considered pushing the binary SQLite file directly to the GitHub repository but this felt rude, since a binary file that changes every 20 minutes would bloat the repository.

I wanted to serve the file with open CORS headers so I could load it into Datasette Lite and Observable notebooks.

I used my s3-credentials tool to create a bucket for this:

~ % s3-credentials create scrape-instances-social --public --website --create-bucket
Created bucket: scrape-instances-social
Attached bucket policy allowing public access
Configured website: IndexDocument=index.html, ErrorDocument=error.html
Created  user: 's3.read-write.scrape-instances-social' with permissions boundary: 'arn:aws:iam::aws:policy/AmazonS3FullAccess'
Attached policy s3.read-write.scrape-instances-social to user s3.read-write.scrape-instances-social
Created access key for user: s3.read-write.scrape-instances-social
{
    "UserName": "s3.read-write.scrape-instances-social",
    "AccessKeyId": "AKIAWXFXAIOZI5NUS6VU",
    "Status": "Active",
    "SecretAccessKey": "...",
    "CreateDate": "2022-11-20 05:52:22+00:00"
}

This created a new bucket called scrape-instances-social configured to work as a website and allow public access.

It also generated an access key and a secret access key with access to just that bucket. I saved these in GitHub Actions secrets called AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

I enabled a CORS policy on the bucket like this:

s3-credentials set-cors-policy scrape-instances-social

Then I added the following to my GitHub Actions workflow to build and upload the database after each run of the scraper:

    - name: Build and publish database using git-history
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      run: |-
        # First download previous database to save some time
        wget https://scrape-instances-social.s3.amazonaws.com/counts.db
        # Update with latest commits
        ./build-count-history.sh
        # Upload to S3
        s3-credentials put-object scrape-instances-social counts.db counts.db \
          --access-key $AWS_ACCESS_KEY_ID \
          --secret-key $AWS_SECRET_ACCESS_KEY

git-history knows how to only process commits since the last time the database was built, so downloading the previous copy saves a lot of time.

Exploring the data

Now that I have a SQLite database that's being served over CORS-enabled HTTPS I can open it in Datasette Lite - my implementation of Datasette compiled to WebAssembly that runs entirely in a browser.

https://lite.datasette.io/?url=https://scrape-instances-social.s3.amazonaws.com/counts.db

Any time anyone follows this link their browser will fetch the latest copy of the counts.db file directly from S3.

The most interesting page in there is the item_version_detail SQL view, which joins against the commits table to show the date of each change:

https://lite.datasette.io/?url=https://scrape-instances-social.s3.amazonaws.com/counts.db#/counts/item_version_detail

(Datasette Lite lets you link directly to pages within Datasette itself via a #hash.)

Plotting a chart

Datasette Lite doesn't have charting yet, so I decided to turn to my favourite visualization tool, an Observable notebook.

Observable has the ability to query SQLite databases (that are served via CORS) directly these days!

Here's my notebook:

https://observablehq.com/@simonw/mastodon-users-and-statuses-over-time

There are only four cells needed to create the chart shown above.

First, we need to open the SQLite database from the remote URL:

database = SQLiteDatabaseClient.open(
  "https://scrape-instances-social.s3.amazonaws.com/counts.db"
)

Next we need to use an Observable Database query cell to execute SQL against that database and pull out the data we want to plot - and store it in a query variable:

SELECT _commit_at as date, users, statuses
FROM item_version_detail

There's one change to make to that data - the date column needs converting from a string to a JavaScript Date object:

points = query.map((d) => ({
  date: new Date(d.date),
  users: d.users,
  statuses: d.statuses
}))

Finally, we can plot the data using the Observable Plot charting library like this:

Plot.plot({
  y: {
    grid: true,
    label: "Total users over time across all tracked instances"
  },
  marks: [Plot.line(points, { x: "date", y: "users" })],
  marginLeft: 100
})

I added 100px of margin to the left of the chart to ensure there was space for the large (4,696,000 and up) labels on the y-axis.

A bunch of tricks combined

This project combines a whole bunch of tricks I've been pulling together over the past few years:

Git scraping is the technique I use to gather the initial data, turning a static listing of instances into a record of changes over time
git-history is my tool for turning a scraped Git history into a SQLite database that's easier to work with
s3-credentials makes working with S3 buckets - in particular creating credentials that are restricted to just one bucket - much less frustrating
Datasette Lite means that once you have a SQLite database online somewhere you can explore it in your browser - without having to run my full server-side Datasette Python application on a machine somewhere
And finally, combining the above means I can take advantage of Observable notebooks for ad-hoc visualization of data that's hosted online, in this case as a static SQLite database file served from S3

Tags: github, projects, datasette, observable, github-actions, git-scraping, git-history, s3-credentials, datasette-lite, mastodon, cors

Spevktator: OSINT analysis tool for VK

2022-09-05T20:48:20+00:00

Spevktator: OSINT analysis tool for VK

This is a really cool project that came out of a recent Bellingcat hackathon. Spevktator takes 67,000 posts from five popular Russian news channels on VK (a popular Russian social media platform) and makes them available in Datasette, along with automated translations to English, post sharing metrics and sentiment analysis scores. This README includes some detailed analysis of the data, plus a link to an Observable notebook that implements custom visualizations against queries run directly against the Datasette instance.

Tags: political-hacking, datasette, observable, bellingcat

Open every CSV file in a GitHub repository in Datasette Lite

2022-09-01T19:24:21+00:00

Open every CSV file in a GitHub repository in Datasette Lite

I built an Observable notebook that accepts a GitHub repository as input, scans it for CSV files and generates a link to open all of those CSV files in Datasette Lite.

Via @simonw

Tags: github, projects, observable, datasette-lite

Introducing sqlite-lines - a SQLite extension for reading files line-by-line

2022-07-30T19:18:53+00:00

Introducing sqlite-lines - a SQLite extension for reading files line-by-line

Alex Garcia wrote a brilliant C module for SQLite which adds functions (and a table-valued function) for efficiently reading newline-delimited text into SQLite. When combined with SQLite’s built-in JSON features this means you can read a huge newline-delimited JSON file into SQLite in a streaming fashion so it doesn’t exhaust memory for a large file. Alex also compiled the extension to WebAssembly, and his post is itself an Observable notebook that lets you exercise the code directly.

Via @agarcia_me

Tags: json, sqlite, observable, webassembly, alex-garcia

Datasette table diagram using Mermaid

2022-02-14T19:43:15+00:00

Datasette table diagram using Mermaid

Mermaid is a DSL for generating diagrams from plain text, designed to be embedded in Markdown. GitHub just added support for Mermaid to their Markdown pipeline, which inspired me to try it out. Here’s an Observable Notebook I built which uses Mermaid to visualize the relationships between Datasette tables based on their foreign keys.

Via @simonw

Tags: dsl, github, visualization, datasette, observable, mermaid

GitHub Burndown

2022-02-10T16:29:04+00:00

GitHub Burndown

Neat Observable notebook by Tom MacWright—give it a GitHub access token and the name of a repo and it pulls the details of every issue and plots a burndown chart over time, showing how long issues stay open for. The code is worth spending some time with—the way it fetches data from the paginated JSON API is a really great example of using generators with Observable, and the chart itself is a lovely clear example of Observable Plot.

Via @tmcw

Tags: github, observable, tom-macwright, observable-plot

Observable Plot Cheatsheets

2022-01-25T22:12:45+00:00

Observable Plot Cheatsheets

Beautiful new set of cheatsheets by Mike Freeman for the Observable Plot charting library. This is really top notch documentation—the cheatsheets are available as printable PDFs but the real value here is in the interactive versions of them, which include Observable-powered sliders to tweak the different examples and copy out the resulting generated code.

Via @mf_viz

Tags: visualization, observable, observable-plot

Datasette downloads per day (with Observable Plot)

2021-07-17T17:01:46+00:00

Datasette downloads per day (with Observable Plot)

I built an Observable notebook that imports PyPI package download data from datasette.io (itself scraped from pypistats.org using a scheduled GitHub Action) and plots it using Observable Plot. Datasette downloads from PyPI apparently jumped from ~800/day in May to ~4,000/day in July—would love to know why!

Via @simonw

Tags: pypi, datasette, observable, observable-plot