<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: semanticweb</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/semanticweb.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2023-03-16T01:09:52+00:00</updated><author><name>Simon Willison</name></author><entry><title>Quoting Me</title><link href="https://simonwillison.net/2023/Mar/16/gpt4-scraping/#atom-tag" rel="alternate"/><published>2023-03-16T01:09:52+00:00</published><updated>2023-03-16T01:09:52+00:00</updated><id>https://simonwillison.net/2023/Mar/16/gpt4-scraping/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://fedi.simonwillison.net/@simon/110030289294541249"&gt;&lt;p&gt;I expect GPT-4 will have a LOT of applications in web scraping&lt;/p&gt;
&lt;p&gt;The increased 32,000 token limit will be large enough to send it the full DOM of most pages, serialized to HTML - then ask questions to extract data&lt;/p&gt;
&lt;p&gt;Or... take a screenshot and use the GPT4 image input mode to ask questions about the visually rendered page instead!&lt;/p&gt;
&lt;p&gt;Might need to dust off all of those old semantic web dreams, because the world's information is rapidly becoming fully machine readable&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://fedi.simonwillison.net/@simon/110030289294541249"&gt;Me&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/scraping"&gt;scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/semanticweb"&gt;semanticweb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-4"&gt;gpt-4&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="scraping"/><category term="semanticweb"/><category term="gpt-4"/><category term="llms"/></entry><entry><title>Linked Data at the Guardian</title><link href="https://simonwillison.net/2010/Oct/19/linked/#atom-tag" rel="alternate"/><published>2010-10-19T19:11:00+00:00</published><updated>2010-10-19T19:11:00+00:00</updated><id>https://simonwillison.net/2010/Oct/19/linked/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.guardian.co.uk/open-platform/blog/linked-data-open-platform"&gt;Linked Data at the Guardian&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The Guardian’s Open Platform API can now be queried by MusicBrainz ID and ISBN, opening up some extremely useful new types of query.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/guardian"&gt;guardian&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openplatform"&gt;openplatform&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/semanticweb"&gt;semanticweb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="guardian"/><category term="openplatform"/><category term="semanticweb"/><category term="recovered"/></entry><entry><title>4store Amazon Machine Image</title><link href="https://simonwillison.net/2009/Nov/1/4store/#atom-tag" rel="alternate"/><published>2009-11-01T12:12:24+00:00</published><updated>2009-11-01T12:12:24+00:00</updated><id>https://simonwillison.net/2009/Nov/1/4store/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://thinklinks.wordpress.com/2009/10/27/4store-amazon-machine-image-and-billion-triple-challenge-data-set/"&gt;4store Amazon Machine Image&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Instructions for firing up an EC2 AMI running the recently released 4store high-performance triple store and loading in 1.14 billion statements collected by crawling the semantic web.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/4store"&gt;4store&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ami"&gt;ami&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ec2"&gt;ec2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/semanticweb"&gt;semanticweb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/semweb"&gt;semweb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/triplestore"&gt;triplestore&lt;/a&gt;&lt;/p&gt;



</summary><category term="4store"/><category term="ami"/><category term="ec2"/><category term="semanticweb"/><category term="semweb"/><category term="triplestore"/></entry><entry><title>Learning to Fear the Semantic Web</title><link href="https://simonwillison.net/2008/Oct/23/learning/#atom-tag" rel="alternate"/><published>2008-10-23T16:14:03+00:00</published><updated>2008-10-23T16:14:03+00:00</updated><id>https://simonwillison.net/2008/Oct/23/learning/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://ftrain.com/a-semantic-web-fear.html"&gt;Learning to Fear the Semantic Web&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Paul Ford raises the liability issue of building sites around other people’s metadata, pointing out that OpenCalais is owned by Thomson Reuters, which has a bad track record on intellectual property lawsuits elsewhere in the organisation.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/intellectualproperty"&gt;intellectualproperty&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/opencalais"&gt;opencalais&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/paul-ford"&gt;paul-ford&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/semanticweb"&gt;semanticweb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomson-reuters"&gt;thomson-reuters&lt;/a&gt;&lt;/p&gt;



</summary><category term="intellectualproperty"/><category term="opencalais"/><category term="paul-ford"/><category term="semanticweb"/><category term="thomson-reuters"/></entry><entry><title>Quoting Kellan Elliott-McCrea</title><link href="https://simonwillison.net/2008/Sep/29/secretsauce/#atom-tag" rel="alternate"/><published>2008-09-29T15:29:34+00:00</published><updated>2008-09-29T15:29:34+00:00</updated><id>https://simonwillison.net/2008/Sep/29/secretsauce/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://laughingmeme.org/2008/09/29/on-the-freebase-custom-tuple-store-graphd/"&gt;&lt;p&gt;The only down side is everyone I’ve talked to at Freebase seems pretty solid on this being their proprietary secret sauce, because a good, fast scalable open source tuple store might actually jump start a real semantic (small-S) web after all these years.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://laughingmeme.org/2008/09/29/on-the-freebase-custom-tuple-store-graphd/"&gt;Kellan Elliott-McCrea&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/freebase"&gt;freebase&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/graphd"&gt;graphd&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kellan-elliott-mccrea"&gt;kellan-elliott-mccrea&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/proprietary"&gt;proprietary&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/semanticweb"&gt;semanticweb&lt;/a&gt;&lt;/p&gt;



</summary><category term="freebase"/><category term="graphd"/><category term="kellan-elliott-mccrea"/><category term="open-source"/><category term="proprietary"/><category term="semanticweb"/></entry><entry><title>Giant Global Graph</title><link href="https://simonwillison.net/2007/Nov/22/timbl/#atom-tag" rel="alternate"/><published>2007-11-22T00:30:23+00:00</published><updated>2007-11-22T00:30:23+00:00</updated><id>https://simonwillison.net/2007/Nov/22/timbl/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://dig.csail.mit.edu/breadcrumbs/node/215"&gt;Giant Global Graph&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Tim Berners-Lee points out that the Semantic Web is designed to solve problems such as portable social networks.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openid"&gt;openid&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/portablesocialnetworks"&gt;portablesocialnetworks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/semanticweb"&gt;semanticweb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/social-graph"&gt;social-graph&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tim-berners-lee"&gt;tim-berners-lee&lt;/a&gt;&lt;/p&gt;



</summary><category term="openid"/><category term="portablesocialnetworks"/><category term="semanticweb"/><category term="social-graph"/><category term="tim-berners-lee"/></entry><entry><title>dbpedia.org</title><link href="https://simonwillison.net/2007/Aug/7/dbpediaorg/#atom-tag" rel="alternate"/><published>2007-08-07T15:24:20+00:00</published><updated>2007-08-07T15:24:20+00:00</updated><id>https://simonwillison.net/2007/Aug/7/dbpediaorg/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://dbpedia.org/docs/"&gt;dbpedia.org&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
They scrape Wikipedia and extract useful information from it so you don’t have to.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/dbpedia"&gt;dbpedia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/semanticweb"&gt;semanticweb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/wikipedia"&gt;wikipedia&lt;/a&gt;&lt;/p&gt;



</summary><category term="dbpedia"/><category term="semanticweb"/><category term="wikipedia"/></entry><entry><title>Triplr</title><link href="https://simonwillison.net/2007/Mar/30/triplr/#atom-tag" rel="alternate"/><published>2007-03-30T15:30:15+00:00</published><updated>2007-03-30T15:30:15+00:00</updated><id>https://simonwillison.net/2007/Mar/30/triplr/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://triplr.org/"&gt;Triplr&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Ultra-simple GET-based web service for converting RSS / Atom / RDF / Microformats+GRDDL to HTML / ntriples / RDF / RSS / JSON / Turtle. Small pieces, loosely joined.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/atom"&gt;atom&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/grddl"&gt;grddl&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/html"&gt;html&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/json"&gt;json&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/microformats"&gt;microformats&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ntriples"&gt;ntriples&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rdf"&gt;rdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rss"&gt;rss&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/semanticweb"&gt;semanticweb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/triplr"&gt;triplr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/turtle"&gt;turtle&lt;/a&gt;&lt;/p&gt;



</summary><category term="atom"/><category term="grddl"/><category term="html"/><category term="json"/><category term="microformats"/><category term="ntriples"/><category term="rdf"/><category term="rss"/><category term="semanticweb"/><category term="triplr"/><category term="turtle"/></entry></feed>