<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: computer-vision</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/computer-vision.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2023-04-28T14:24:29+00:00</updated><author><name>Simon Willison</name></author><entry><title>Trainbot</title><link href="https://simonwillison.net/2023/Apr/28/trainbot/#atom-tag" rel="alternate"/><published>2023-04-28T14:24:29+00:00</published><updated>2023-04-28T14:24:29+00:00</updated><id>https://simonwillison.net/2023/Apr/28/trainbot/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/jo-m/trainbot"&gt;Trainbot&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
“Trainbot watches a piece of train track, detects passing trains, and stitches together images of them”—check out the site itself too, which shows beautifully stitched panoramas of trains that have recently passed near Jo M’s apartment. Found via the best Hacker News thread I’ve seen in years, “Ask HN: Most interesting tech you built for just yourself?”.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=35729232"&gt;Ask HN: Most interesting tech you built for just yourself?&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/go"&gt;go&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hacks"&gt;hacks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/raspberry-pi"&gt;raspberry-pi&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/><category term="go"/><category term="hacks"/><category term="raspberry-pi"/></entry><entry><title>LLaVA: Large Language and Vision Assistant</title><link href="https://simonwillison.net/2023/Apr/19/llava-large-language-and-vision-assistant/#atom-tag" rel="alternate"/><published>2023-04-19T01:14:37+00:00</published><updated>2023-04-19T01:14:37+00:00</updated><id>https://simonwillison.net/2023/Apr/19/llava-large-language-and-vision-assistant/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://llava-vl.github.io/"&gt;LLaVA: Large Language and Vision Assistant&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Yet another multi-modal model combining a vision model (pre-trained CLIP ViT-L/14) and a LLaMA derivative model (Vicuna). The results I get from their demo are even more impressive than MiniGPT-4. Also includes a new training dataset, LLaVA-Instruct-150K, derived from GPT-4 and subject to the same warnings about the OpenAI terms of service.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=35621023"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vicuna"&gt;vicuna&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="llms"/><category term="vicuna"/></entry><entry><title>MiniGPT-4</title><link href="https://simonwillison.net/2023/Apr/17/minigpt-4/#atom-tag" rel="alternate"/><published>2023-04-17T14:21:40+00:00</published><updated>2023-04-17T14:21:40+00:00</updated><id>https://simonwillison.net/2023/Apr/17/minigpt-4/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Vision-CAIR/MiniGPT-4"&gt;MiniGPT-4&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
An incredible project with a poorly chosen name. A team from King Abdullah University of Science and Technology in Saudi Arabia combined Vicuna-13B (a model fine-tuned on top of Facebook’s LLaMA) with the BLIP-2 vision-language model to create a model that can conduct ChatGPT-style conversations around an uploaded image. The demo is very impressive, and the weights are available to download—45MB for MiniGPT-4, but you’ll need the much larger Vicuna and LLaMA weights as well.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=35598281"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vicuna"&gt;vicuna&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="vicuna"/></entry><entry><title>jantic/DeOldify</title><link href="https://simonwillison.net/2018/Nov/2/deoldify/#atom-tag" rel="alternate"/><published>2018-11-02T11:13:02+00:00</published><updated>2018-11-02T11:13:02+00:00</updated><id>https://simonwillison.net/2018/Nov/2/deoldify/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/jantic/DeOldify"&gt;jantic/DeOldify&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
“A Deep Learning based project for colorizing and restoring old images”. Delightful (and well documented) project that uses a Self-Attention Generative Adversarial Network to colorize old black and white photos, with extremely impressive results. Built on an older version of the fastai library, and trained by running for several days on a 1080TI graphics card.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://www.reddit.com/r/MachineLearning/comments/9tcfls/p_introducing_deoldify_a_progressive/"&gt;r/MachineLearning&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fastai"&gt;fastai&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/><category term="machine-learning"/><category term="fastai"/></entry><entry><title>Automatically playing science communication games with transfer learning and fastai</title><link href="https://simonwillison.net/2018/Oct/29/transfer-learning/#atom-tag" rel="alternate"/><published>2018-10-29T03:16:33+00:00</published><updated>2018-10-29T03:16:33+00:00</updated><id>https://simonwillison.net/2018/Oct/29/transfer-learning/#atom-tag</id><summary type="html">
    &lt;p&gt;This weekend was the 9th annual &lt;a href="https://sf.sciencehackday.org/"&gt;Science Hack Day San Francisco&lt;/a&gt;, which was also the 100th Science Hack Day held worldwide.&lt;/p&gt;
&lt;p&gt;Natalie and I decided to combine our interests and build something fun.&lt;/p&gt;
&lt;p&gt;I’m currently enrolled in Jeremy Howard’s &lt;a href="http://course.fast.ai/"&gt;Deep Learning course&lt;/a&gt; so I figured this was a great opportunity to try out some computer vision.&lt;/p&gt;
&lt;p&gt;Natalie runs the &lt;a href="https://natbat.github.io/scicomm-calendar/"&gt;SciComm Games calendar&lt;/a&gt; and accompanying &lt;a href="https://twitter.com/SciCommGames"&gt;@SciCommGames&lt;/a&gt; bot to promote and catalogue science communication hashtag games on Twitter.&lt;/p&gt;
&lt;p&gt;Hashtag games? Natalie &lt;a href="https://natbat.github.io/scicomm-calendar/"&gt;explains them here&lt;/a&gt; - essentially they are games run by scientists on Twitter to foster public engagement around an animal or topic by challenging people to identify if a photo is a #cougarOrNot or participate in a #TrickyBirdID or identify #CrowOrNo or many others.&lt;/p&gt;
&lt;p&gt;Combining the two… we decided to build a bot that automatically plays these games using computer vision. So far it’s just trying #cougarOrNot - you can see the bot in action at &lt;a href="https://twitter.com/critter_vision/with_replies"&gt;@critter_vision&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Training_data_from_iNaturalist_14"&gt;&lt;/a&gt;Training data from iNaturalist&lt;/h3&gt;
&lt;p&gt;In order to build a machine learning model, you need to start out with some training data.&lt;/p&gt;
&lt;p&gt;I’m a big fan of &lt;a href="https://www.inaturalist.org/"&gt;iNaturalist&lt;/a&gt;, a citizen science project that encourages users to upload photographs of wildlife (and plants) they have seen and have their observations verified by a community. Natalie and I used it to build &lt;a href="https://www.owlsnearme.com/"&gt;owlsnearme.com&lt;/a&gt; earlier this year - the API in particular is fantastic.&lt;/p&gt;
&lt;p&gt;iNaturalist has &lt;a href="https://www.inaturalist.org/observations?place_id=1&amp;amp;taxon_id=41944"&gt;over 5,000 verified sightings&lt;/a&gt; of felines (cougars, bobcats, domestic cats and more) in the USA.&lt;/p&gt;
&lt;p&gt;The raw data is available as &lt;a href="http://api.inaturalist.org/v1/observations?identified=true&amp;amp;photos=true&amp;amp;identifications=most_agree&amp;amp;quality_grade=research&amp;amp;order=desc&amp;amp;order_by=created_at&amp;amp;taxon_id=41944&amp;amp;place_id=1&amp;amp;per_page=200"&gt;a paginated JSON API&lt;/a&gt;. The &lt;a href="https://static.inaturalist.org/photos/27333309/medium.jpg"&gt;medium sized photos&lt;/a&gt; are just the right size for training a neural network.&lt;/p&gt;
&lt;p&gt;I started by grabbing 5,000 images and saving them to disk with a filename that reflected their identified species:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Bobcat_9005106.jpg
Domestic-Cat_10068710.jpg
Bobcat_15713672.jpg
Domestic-Cat_6755280.jpg
Mountain-Lion_9075705.jpg
&lt;/code&gt;&lt;/pre&gt;
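The naming scheme above can be sketched with a tiny helper - this is a hypothetical reconstruction for illustration, not the actual download script (the function name &lt;code&gt;filename_for&lt;/code&gt; is my own):

```python
# Hypothetical sketch of the filename scheme shown above, not the actual
# download code. Spaces in the species name become hyphens so the label
# survives as everything before the final underscore in the filename.
def filename_for(species, observation_id):
    label = species.strip().replace(" ", "-")
    return f"{label}_{observation_id}.jpg"

print(filename_for("Domestic Cat", 10068710))  # Domestic-Cat_10068710.jpg
```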
&lt;h3&gt;&lt;a id="Building_a_model_32"&gt;&lt;/a&gt;Building a model&lt;/h3&gt;
&lt;p&gt;I’m only one week into the &lt;a href="http://www.fast.ai/"&gt;fast.ai&lt;/a&gt; course so this really isn’t particularly sophisticated yet, but it was just about good enough to power our hack.&lt;/p&gt;
&lt;p&gt;The main technique we are learning in the course is called &lt;a href="https://machinelearningmastery.com/transfer-learning-for-deep-learning/"&gt;transfer learning&lt;/a&gt;, and it really is shockingly effective. Instead of training a model from scratch you start out with a pre-trained model and use some extra labelled images to train a small number of extra layers.&lt;/p&gt;
&lt;p&gt;The initial model we are using is &lt;a href="https://www.kaggle.com/pytorch/resnet34"&gt;ResNet-34&lt;/a&gt;, a 34-layer neural network trained on 1,000 labelled categories in the &lt;a href="http://www.image-net.org/"&gt;ImageNet&lt;/a&gt; corpus.&lt;/p&gt;
&lt;p&gt;In class, we learned to use this technique to get 94% accuracy against the &lt;a href="http://www.robots.ox.ac.uk/~vgg/data/pets/"&gt;Oxford-IIIT Pet Dataset&lt;/a&gt; - around 7,000 images covering 12 cat breeds and 25 dog breeds. In 2012 the researchers at Oxford were able to get 59.21% using a sophisticated model - in 2018 we can get 94% with transfer learning and just a few lines of code.&lt;/p&gt;
&lt;p&gt;I started with an example provided in class, which loads and trains images from files on disk using a regular expression that extracts the labels from the filenames.&lt;/p&gt;
&lt;p&gt;My full Jupyter notebook is &lt;a href="https://github.com/simonw/cougar-or-not/blob/master/inaturalist-cats.ipynb"&gt;inaturalist-cats.ipynb&lt;/a&gt; - the key training code is as follows:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from fastai import *
from fastai.vision import *
cat_images_path = Path('/home/jupyter/.fastai/data/inaturalist-usa-cats/images')
cat_fnames = get_image_files(cat_images_path)
cat_data = ImageDataBunch.from_name_re(
    cat_images_path,
    cat_fnames,
    r'/([^/]+)_\d+.jpg$',
    ds_tfms=get_transforms(),
    size=224
)
cat_data.normalize(imagenet_stats)
cat_learn = ConvLearner(cat_data, models.resnet34, metrics=error_rate)
cat_learn.fit_one_cycle(4)
# Save the generated model to disk
cat_learn.save(&amp;quot;usa-inaturalist-cats&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
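The regular expression passed to &lt;code&gt;ImageDataBunch.from_name_re&lt;/code&gt; is what turns those filenames back into labels - it captures everything between the last slash and the trailing &lt;code&gt;_&lt;digits&gt;.jpg&lt;/code&gt;. A quick standalone check of that pattern (independent of fastai):

```python
import re

# The same pattern used in the training code above: the capture group is
# the label portion of a path like /images/Bobcat_9005106.jpg
pattern = r'/([^/]+)_\d+.jpg$'

for path in ['/images/Bobcat_9005106.jpg', '/images/Mountain-Lion_9075705.jpg']:
    print(re.search(pattern, path).group(1))
# Bobcat
# Mountain-Lion
```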
&lt;p&gt;Calling &lt;code&gt;cat_learn.save(&amp;quot;usa-inaturalist-cats&amp;quot;)&lt;/code&gt; created an 84MB file on disk at &lt;code&gt;/home/jupyter/.fastai/data/inaturalist-usa-cats/images/models/usa-inaturalist-cats.pth&lt;/code&gt; - I used &lt;code&gt;scp&lt;/code&gt; to copy that model down to my laptop.&lt;/p&gt;
&lt;p&gt;This model gave me a 24% error rate, which is pretty terrible - others on the course have been getting error rates of less than 10% for all kinds of interesting problems. My focus was on getting a model deployed as an API though, so I haven’t spent any additional time fine-tuning things yet.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Deploying_the_model_as_an_API_67"&gt;&lt;/a&gt;Deploying the model as an API&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://github.com/fastai/fastai"&gt;fastai library&lt;/a&gt; strongly encourages training against a GPU, using &lt;a href="https://pytorch.org/"&gt;pytorch&lt;/a&gt; and &lt;a href="https://mathema.tician.de/software/pycuda/"&gt;PyCUDA&lt;/a&gt;. I’ve been using an n1-highmem-8 Google Cloud Platform instance with an attached Tesla P4, then running everything in a Jupyter notebook there. This costs around $0.38 an hour - fine for a few hours of training, but way too expensive to permanently host a model.&lt;/p&gt;
&lt;p&gt;Thankfully, while a GPU is essential for productively training models it’s not nearly as important for evaluating them against new data. pytorch can run in CPU mode for that just fine on standard hardware, and the &lt;a href="https://github.com/fastai/fastai/blob/master/README.md"&gt;fastai README&lt;/a&gt; includes instructions on installing it for a CPU using pip.&lt;/p&gt;
&lt;p&gt;I started out by ensuring I could execute my generated model on my own laptop (since pytorch doesn’t yet work with the GPU built into the Macbook Pro). Once I had that working, I used the resulting code to write a tiny Starlette-powered API server. The code for that can be found &lt;a href="https://github.com/simonw/cougar-or-not/blob/8adafac571aad3385317c76bd229448b3cdaa0ac/cougar.py"&gt;in cougar.py&lt;/a&gt;.&lt;/p&gt;
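The real endpoint lives in cougar.py; purely as an illustrative sketch, a Starlette service with a &lt;code&gt;/classify-url&lt;/code&gt; route (the route path matches the deployed API, but the JSON layout, the &lt;code&gt;predict&lt;/code&gt; callback and the helper names here are my own assumptions) might look something like this:

```python
# Hypothetical sketch of a classify-url service -- NOT the actual cougar.py.
# "predict" is assumed to be a callable wrapping the saved fastai model that
# returns (predicted_class, class_names, probabilities).

def format_prediction(predicted, classes, probs):
    """Pair each class name with its probability, highest first."""
    ranked = sorted(zip(classes, probs), key=lambda cp: cp[1], reverse=True)
    return {
        "prediction": predicted,
        "scores": [{"class": c, "probability": round(p, 4)} for c, p in ranked],
    }

def create_app(predict):
    # Imported lazily so the pure helper above stays dependency-free.
    from starlette.applications import Starlette
    from starlette.responses import JSONResponse
    from starlette.routing import Route
    import httpx

    async def classify_url(request):
        url = request.query_params["url"]
        async with httpx.AsyncClient() as client:
            image_bytes = (await client.get(url)).content
        predicted, classes, probs = predict(image_bytes)
        return JSONResponse(format_prediction(predicted, classes, probs))

    return Starlette(routes=[Route("/classify-url", classify_url)])
```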
&lt;p&gt;fastai is under very heavy development and the latest version doesn’t quite have a clean way of loading a model from disk without also including the initial training images, so I had to hack around quite a bit to get this working using clues from &lt;a href="https://forums.fast.ai/"&gt;the fastai forums&lt;/a&gt;. I expect this to get much easier over the next few weeks as the library continues to evolve based on feedback from the current course.&lt;/p&gt;
&lt;p&gt;To deploy the API I wrote &lt;a href="https://github.com/simonw/cougar-or-not/blob/8adafac571aad3385317c76bd229448b3cdaa0ac/Dockerfile"&gt;a Dockerfile&lt;/a&gt; and shipped it to &lt;a href="https://zeit.co/now"&gt;Zeit Now&lt;/a&gt;. Now remains my go-to choice for this kind of project, though unfortunately their new (and brilliant) v2 platform imposes &lt;a href="https://github.com/zeit/now-cli/issues/1523"&gt;a 100MB image size limit&lt;/a&gt; - not nearly enough when the model file itself weighs in at 83 MB. Thankfully it’s still possible to &lt;a href="https://github.com/simonw/cougar-or-not/commit/5ad3d5b49c6419e4c2440291bc5fb204625aae83"&gt;specify their v1 cloud&lt;/a&gt; which is more forgiving for larger applications.&lt;/p&gt;
&lt;p&gt;Here’s the result: an API which can accept either the URL to an image or an uploaded image file: &lt;a href="https://cougar-or-not.now.sh/"&gt;https://cougar-or-not.now.sh/&lt;/a&gt; - try it out with &lt;a href="https://cougar-or-not.now.sh/classify-url?url=https://upload.wikimedia.org/wikipedia/commons/9/9a/Oregon_Cougar_ODFW.JPG"&gt;a cougar&lt;/a&gt; and &lt;a href="https://cougar-or-not.now.sh/classify-url?url=https://upload.wikimedia.org/wikipedia/commons/thumb/d/dc/Bobcat2.jpg/1200px-Bobcat2.jpg"&gt;a bobcat&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;a id="The_Twitter_Bot_81"&gt;&lt;/a&gt;The Twitter Bot&lt;/h3&gt;
&lt;p&gt;Natalie built &lt;a href="https://github.com/natbat/CritterVision"&gt;the Twitter bot&lt;/a&gt;. It runs as a scheduled task on Heroku and works by checking for new #cougarOrNot tweets from &lt;a href="https://twitter.com/drmichellelarue"&gt;Dr. Michelle LaRue&lt;/a&gt;, extracting any images, passing them to my API and replying with a tweet that summarizes the results. Take a look at &lt;a href="https://twitter.com/critter_vision/with_replies"&gt;its recent replies&lt;/a&gt; to get a feel for how well it is doing.&lt;/p&gt;
&lt;p&gt;Amusingly, Dr. LaRue frequently tweets memes to promote upcoming competitions and marks them with the same hashtag. The bot appears to think that most of the memes are bobcats! I should definitely spend some time tuning that model.&lt;/p&gt;
&lt;p&gt;Science Hack Day was great fun. A big thanks to the organizing team, and congrats to all of the other participants. I’m really looking forward to the next one.&lt;/p&gt;
&lt;p&gt;Plus… we won a medal!&lt;/p&gt;
&lt;blockquote class="twitter-tweet" data-lang="en"&gt;&lt;p lang="en" dir="ltr"&gt;Enjoyed &lt;a href="https://twitter.com/hashtag/scienceHackday?src=hash&amp;amp;ref_src=twsrc%5Etfw"&gt;#scienceHackday&lt;/a&gt; this weekend, made &amp;amp; launched a cool machine learning hack to process images &amp;amp; work out if they have a cougar in them or not! &lt;a href="https://twitter.com/hashtag/CougarOrNot?src=hash&amp;amp;ref_src=twsrc%5Etfw"&gt;#CougarOrNot&lt;/a&gt; &lt;a href="https://twitter.com/critter_vision?ref_src=twsrc%5Etfw"&gt;@critter_vision&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;... we won a medal!&lt;br /&gt;&lt;br /&gt;Bot code: &lt;a href="https://t.co/W2jZcGCnFr"&gt;https://t.co/W2jZcGCnFr&lt;/a&gt;&lt;br /&gt;Machine learning API: &lt;a href="https://t.co/swNiKlcTp0"&gt;https://t.co/swNiKlcTp0&lt;/a&gt; &lt;a href="https://t.co/dcdIhNZy63"&gt;pic.twitter.com/dcdIhNZy63&lt;/a&gt;&lt;/p&gt;&amp;#8212; Natbat (@Natbat) &lt;a href="https://twitter.com/Natbat/status/1056717060116369410?ref_src=twsrc%5Etfw"&gt;October 29, 2018&lt;/a&gt;&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/natalie-downe"&gt;natalie-downe&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/inaturalist"&gt;inaturalist&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fastai"&gt;fastai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transferlearning"&gt;transferlearning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jeremy-howard"&gt;jeremy-howard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/starlette"&gt;starlette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="computer-vision"/><category term="machine-learning"/><category term="natalie-downe"/><category term="inaturalist"/><category term="fastai"/><category term="transferlearning"/><category term="jeremy-howard"/><category term="starlette"/></entry><entry><title>BearID: Bear Face Detector</title><link href="https://simonwillison.net/2018/Mar/1/bearid/#atom-tag" rel="alternate"/><published>2018-03-01T17:31:41+00:00</published><updated>2018-03-01T17:31:41+00:00</updated><id>https://simonwillison.net/2018/Mar/1/bearid/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://medium.com/@bluevalhalla/bearid-bear-face-detector-7cc43fc12ab6"&gt;BearID: Bear Face Detector&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Comprehensive tutorial on building a computer vision system to identify faces of bears, using dlib and the Histogram of Oriented Gradients (HOG) technique. Bears!


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/><category term="machine-learning"/></entry><entry><title>Family fun with deepfakes. Or how I got my wife onto the Tonight Show</title><link href="https://simonwillison.net/2018/Feb/2/family-fun-with-deepfakes/#atom-tag" rel="alternate"/><published>2018-02-02T16:06:36+00:00</published><updated>2018-02-02T16:06:36+00:00</updated><id>https://simonwillison.net/2018/Feb/2/family-fun-with-deepfakes/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://svencharleer.com/blog/2018/02/02/family-fun-with-deepfakes-or-how-i-got-my-wife-onto-the-tonight-show/"&gt;Family fun with deepfakes. Or how I got my wife onto the Tonight Show&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
deepfakes is dystopian nightmare technology: take a few thousand photos of two different people with similarly shaped faces and you can produce an extremely realistic video where you swap one person’s face for the other. Unsurprisingly it’s being used for porn. This is a pleasantly SFW explanation of how it works, complete with a demo where Sven Charleer swaps his wife Elke for Anne Hathaway on the Tonight Show.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/></entry><entry><title>6M observations total! Where has iNaturalist grown in 80 days with 1 million new observations?</title><link href="https://simonwillison.net/2018/Jan/28/inaturalist/#atom-tag" rel="alternate"/><published>2018-01-28T20:18:58+00:00</published><updated>2018-01-28T20:18:58+00:00</updated><id>https://simonwillison.net/2018/Jan/28/inaturalist/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.inaturalist.org/blog/11590-6m-observations-total-where-has-inaturalist-grown-in-80-days-with-1-million-new-observations"&gt;6M observations total! Where has iNaturalist grown in 80 days with 1 million new observations?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Citizen science app iNaturalist is seeing explosive growth at the moment—they’ve been around for nearly a decade but 1/6 of the observations posted to the site were added in just the past few months. Having tried the latest version of their iPhone app it’s easy to see why: snap a photo of some nature, upload it to the app, and it will use surprisingly effective machine learning to suggest the genus or even the individual species. Submit the observation and within a few minutes other iNaturalist community members will confirm the identification or suggest a correction. It’s brilliantly well executed and an utter delight to use.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/crowdsourcing"&gt;crowdsourcing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/science"&gt;science&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/citizenscience"&gt;citizenscience&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/inaturalist"&gt;inaturalist&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/><category term="crowdsourcing"/><category term="machine-learning"/><category term="science"/><category term="citizenscience"/><category term="inaturalist"/></entry><entry><title>How to train your own Object Detector with TensorFlow’s Object Detector API</title><link href="https://simonwillison.net/2017/Nov/14/how-to-train-your-own-object-detector/#atom-tag" rel="alternate"/><published>2017-11-14T04:24:48+00:00</published><updated>2017-11-14T04:24:48+00:00</updated><id>https://simonwillison.net/2017/Nov/14/how-to-train-your-own-object-detector/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9"&gt;How to train your own Object Detector with TensorFlow’s Object Detector API&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Dat Tran built a TensorFlow model that can detect raccoons! Impressive results, especially given it was only trained on 200 raccoon images from Google Image search.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://github.com/datitran/raccoon_dataset"&gt;GitHub - datitran/raccoon_dataset: The dataset is used to train my own raccoon detector and I blogged about it on Medium&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tensorflow"&gt;tensorflow&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/raccoons"&gt;raccoons&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/><category term="tensorflow"/><category term="raccoons"/></entry><entry><title>PythonInterface - OpenCV</title><link href="https://simonwillison.net/2010/Jan/4/opencv/#atom-tag" rel="alternate"/><published>2010-01-04T11:33:28+00:00</published><updated>2010-01-04T11:33:28+00:00</updated><id>https://simonwillison.net/2010/Jan/4/opencv/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://opencv.willowgarage.com/wiki/PythonInterface"&gt;PythonInterface - OpenCV&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
OpenCV’s new Python interface looks very nice. I’d love to see some full fledged examples of using it to solve real-world computer vision problems.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://delicious.com/mcroydon/opencv+python+reference"&gt;Matt Croydon&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/opencv"&gt;opencv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/><category term="opencv"/><category term="python"/></entry><entry><title>Looking for tennis courts on aerial photos</title><link href="https://simonwillison.net/2009/Dec/5/wow/#atom-tag" rel="alternate"/><published>2009-12-05T08:56:18+00:00</published><updated>2009-12-05T08:56:18+00:00</updated><id>https://simonwillison.net/2009/Dec/5/wow/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://ahathereitis.blogspot.com/2009/12/how-it-works.html"&gt;Looking for tennis courts on aerial photos&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
ahathereitis.com shows a map of tennis courts in the Bay Area, identified using computer vision techniques (with OpenCV) applied to satellite photos.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/opencv"&gt;opencv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/satellite"&gt;satellite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tennis"&gt;tennis&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/><category term="opencv"/><category term="satellite"/><category term="tennis"/></entry><entry><title>PhotoSketch turns a rough sketch in to a photo montage</title><link href="https://simonwillison.net/2009/Oct/6/photosketch/#atom-tag" rel="alternate"/><published>2009-10-06T07:59:20+00:00</published><updated>2009-10-06T07:59:20+00:00</updated><id>https://simonwillison.net/2009/Oct/6/photosketch/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://gizmodo.com/5374890/this-is-a-photoshop-and-it-blew-my-mind"&gt;PhotoSketch turns a rough sketch in to a photo montage&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Computer vision is really exciting at the moment—PhotoSketch is an application which takes a rough labeled sketch, finds images matching the labels, filters them by the sketched shapes and composes them into a not-too-bad photo montage. As wmf on Hacker News points out, “this technology has epic potential in the LOLcat market”.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://news.ycombinator.com/item?id=863294"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/photos"&gt;photos&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/photosketch"&gt;photosketch&lt;/a&gt;&lt;/p&gt;



</summary><category term="computer-vision"/><category term="photos"/><category term="photosketch"/></entry><entry><title>Building Rome in a Day</title><link href="https://simonwillison.net/2009/Jul/29/building/#atom-tag" rel="alternate"/><published>2009-07-29T15:41:03+00:00</published><updated>2009-07-29T15:41:03+00:00</updated><id>https://simonwillison.net/2009/Jul/29/building/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://grail.cs.washington.edu/rome/"&gt;Building Rome in a Day&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
“The first system capable of city-scale reconstruction from unstructured photo collections”—computer vision techniques used to construct 3D models of cities using tens of thousands of photos from Flickr. Reminiscent of Microsoft Photosynth.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://digitalurban.blogspot.com/2009/07/building-rome-in-day-3d-city-via-flickr.html"&gt;Digital Urban&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/3d"&gt;3d&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/computer-vision"&gt;computer-vision&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/flickr"&gt;flickr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/photos"&gt;photos&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/photosynth"&gt;photosynth&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/research"&gt;research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rome"&gt;rome&lt;/a&gt;&lt;/p&gt;



</summary><category term="3d"/><category term="computer-vision"/><category term="flickr"/><category term="photos"/><category term="photosynth"/><category term="research"/><category term="rome"/></entry></feed>