<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: dave-guarino</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/dave-guarino.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-02-12T22:01:42+00:00</updated><author><name>Simon Willison</name></author><entry><title>Building a SNAP LLM eval: part 1</title><link href="https://simonwillison.net/2025/Feb/12/building-a-snap-llm/#atom-tag" rel="alternate"/><published>2025-02-12T22:01:42+00:00</published><updated>2025-02-12T22:01:42+00:00</updated><id>https://simonwillison.net/2025/Feb/12/building-a-snap-llm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.propel.app/insights/building-a-snap-llm-eval-part-1/"&gt;Building a SNAP LLM eval: part 1&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Dave Guarino (&lt;a href="https://simonwillison.net/2023/Jul/26/dave-guarino/"&gt;previously&lt;/a&gt;) has been exploring using LLM-driven systems to help people apply for &lt;a href="https://en.wikipedia.org/wiki/Supplemental_Nutrition_Assistance_Program"&gt;SNAP&lt;/a&gt;, the US Supplemental Nutrition Assistance Program (aka food stamps).&lt;/p&gt;
&lt;p&gt;This is a domain which existing models know &lt;em&gt;some&lt;/em&gt; things about, but which is full of critical details around things like eligibility criteria where accuracy really matters.&lt;/p&gt;
&lt;p&gt;Domain-specific evals like this are still pretty rare. As Dave puts it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There is also not a lot of public, easily digestible writing out there on building evals in specific domains. So one of our hopes in sharing this is that it helps others build evals for domains they know deeply.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Having robust evals addresses multiple challenges. The first is establishing how good the raw models are for a particular domain. A more important one is to help in developing additional systems on top of these models, where an eval is crucial for understanding if RAG or prompt engineering tricks are paying off.&lt;/p&gt;
&lt;p&gt;Step 1 doesn't involve writing any code at all:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Meaningful, real problem spaces inevitably have a lot of &lt;em&gt;nuance&lt;/em&gt;. So in working on our SNAP eval, the first step has just been using lots of models — a lot. [...]&lt;/p&gt;
&lt;p&gt;Just using the models and taking notes on the nuanced “good”, “meh”, “bad!” is a much faster way to get to a useful starting eval set than writing or automating evals in code.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've been complaining for a while that there isn't nearly enough guidance about evals out there. This piece is an excellent step towards filling that gap.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/evals"&gt;evals&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dave-guarino"&gt;dave-guarino&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="evals"/><category term="dave-guarino"/></entry><entry><title>Quoting Dave Guarino</title><link href="https://simonwillison.net/2023/Jul/26/dave-guarino/#atom-tag" rel="alternate"/><published>2023-07-26T19:10:01+00:00</published><updated>2023-07-26T19:10:01+00:00</updated><id>https://simonwillison.net/2023/Jul/26/dave-guarino/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://daveguarino.substack.com/p/what-might-llmsgenerative-ai-mean"&gt;&lt;p&gt;Much of the substance of what constitutes “government” is in fact text. A technology that can do orders of magnitude more with text is therefore potentially massively impactful here. [...] Many of the sub-tasks of the work of delivering public benefits seem amenable to the application of large language models to help people do this hard work.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://daveguarino.substack.com/p/what-might-llmsgenerative-ai-mean"&gt;Dave Guarino&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dave-guarino"&gt;dave-guarino&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="dave-guarino"/></entry></feed>