Finding Feld's Finest Thoughts

I wanted to read Brad Feld's best blog posts, but where to begin? He's incredibly prolific, having racked up thousands of posts over the past eight years. I certainly didn't have the time to comb through that many posts in search of the best, so I decided to see if I could hack something together that would use existing, public social proof to help me find the diamonds. My first thought was to use the PostRank API, but it no longer existed, so I decided to roll my own social scoring algorithm. If you just want to see Brad's best posts, scroll to the bottom of this article.

If you want to see how I made the sausage, keep reading.  It's a quick and dirty implementation, but the general concepts are sound.


Step 1: Get a complete list of blog posts

There are a lot of ways to do this, but I decided to start with Brad's sitemap, which I located by looking in robots.txt. I wrote a short script to download that sitemap, uncompress it, fetch the additional sitemaps it referred to, and find the posts I was interested in. I processed the XML with sed and grep to get the post URLs while filtering out what appeared to be the artifacts of a now-resolved intrusion on the site.

TMP=`mktemp -t $0.$$.XXXXXXXXXX`
function clean {
  rm $TMP
}
trap clean EXIT

# Sitemap location comes from robots.txt (exact URL assumed here)
wget -q -O $TMP http://www.feld.com/wp/sitemap.xml.gz
gzcat $TMP | grep loc | sed -e 's/.*<loc>//' -e 's/<\/loc>//' \
 | xargs -n 1 wget -q -O /dev/stdout \
 | gzcat - | grep loc | sed -e 's/.*<loc>//' -e 's/<\/loc>//' \
 | egrep '/[0-9]{4}/[0-9]{2}/' | egrep 'html$' > feldurls.txt

This generated a file, feldurls.txt, with the URLs of 4,636 posts.

Step 2: Gather social data

I wrote a short Ruby script to count the number of times each of his articles was shared on Twitter, LinkedIn, Delicious, and Facebook, and to save that off into a JSON blob for later processing. Here's the script I used. I started it, and then went for a short hike.

#!/usr/bin/env ruby
require 'rubygems'
require 'curb'
require 'cgi'
require 'json'

urls = {}
ARGF.each_with_index do |url, i|
 begin
  $stderr.puts "#{i}: #{url}"
  urls[url] = {}

  urls[url][:year] = url.match(/archives\/([0-9]{4})\/[0-9]{2}/)[1].to_i

  # Twitter -- public count endpoint of the era (since retired; URL reconstructed)
  twitter_url = "http://urls.api.twitter.com/1/urls/count.json?url=#{CGI::escape(url.chomp)}"
  c = Curl::Easy.perform(twitter_url) do |curl|
   curl.headers["User-Agent"] = "feldfinder-1.0"
  end
  urls[url][:twitter] = JSON.parse(c.body_str)['count']

  # Facebook -- unauthenticated Graph API lookup, which then returned a 'shares' field
  fb_url = "http://graph.facebook.com/?id=#{url.chomp}"
  c = Curl::Easy.perform(fb_url) do |curl|
   curl.headers["User-Agent"] = "feldfinder-1.0"
  end
  urls[url][:fb] = JSON.parse(c.body_str)['shares'].nil? ? 0 : JSON.parse(c.body_str)['shares']

  # LinkedIn -- share-count service (since retired; URL reconstructed)
  linked_url = "http://www.linkedin.com/countserv/count/share?url=#{CGI::escape(url.chomp)}&format=json"
  c = Curl::Easy.perform(linked_url) do |curl|
   curl.headers["User-Agent"] = "feldfinder-1.0"
  end
  urls[url][:linked] = JSON.parse(c.body_str)['count']

  # Delicious -- urlinfo feed (since retired; URL reconstructed)
  delicious_url = "http://feeds.delicious.com/v2/json/urlinfo/data?url=#{CGI::escape(url.chomp)}"
  c = Curl::Easy.perform(delicious_url) do |curl|
   curl.headers["User-Agent"] = "feldfinder-1.0"
  end
  urls[url][:delicious] = JSON.parse(c.body_str)[0]['total_posts'] rescue 0

  $stderr.puts urls[url].to_json
 rescue => e
  $stderr.puts "ERROR: #{e}"
 end
end

puts urls.to_json

Step 3: Convert the data

I arrived home from my hike and found that Step 2 had finished, producing a JSON object with the publication year and the number of shares on each site for every URL. I decided to convert this to CSV so I could manipulate it in a spreadsheet.

#!/usr/bin/env ruby
require 'rubygems'
require 'json'

urls = JSON.parse(File.read("feldout.json"))
puts "url,year,twitter,fb,linked,delicious"
urls.each do |k, v|
 puts "#{k.chomp},#{v['year']},#{v['twitter']},#{v['fb']},#{v['linked']},#{v['delicious']}"
end

That gave me a CSV, which I uploaded to Google Docs.

Step 4: Scoring the data

After uploading the file to Google Docs, I decided to create a score for each post. My goal here was to find the simplest method that would work reasonably well in this application. Counting raw shares wouldn't be useful because items from 2012 were, naturally, going to have been shared far more often than those from 2004, regardless of quality. The formula I decided upon was:

score = (post's raw Twitter shares)   / (average Twitter shares for that year)
      + (post's raw Facebook shares)  / (average Facebook shares for that year)
      + (post's raw LinkedIn shares)  / (average LinkedIn shares for that year)
      + (post's raw Delicious shares) / (average Delicious shares for that year)
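The actual scoring and sorting happened in the Google Doc, but the same arithmetic is easy to script. Here's a rough Ruby sketch run against the Step 3 CSV; the filename feld.csv is just a placeholder, and the zero-average guard is my own defensive addition:

#!/usr/bin/env ruby
# Sketch of the scoring formula applied to the Step 3 CSV (placeholder filename).
require 'csv'

rows = CSV.read("feld.csv", headers: true)
networks = %w[twitter fb linked delicious]

# Average shares per network for each year of publication
averages = Hash.new { |h, k| h[k] = {} }
rows.group_by { |r| r['year'] }.each do |year, posts|
 networks.each do |n|
  averages[year][n] = posts.map { |p| p[n].to_f }.reduce(:+) / posts.size
 end
end

# Score each post: sum of (its shares / that year's average), per network.
# Skip any network whose average for that year is zero.
scored = rows.map do |r|
 score = networks.inject(0.0) do |acc, n|
  avg = averages[r['year']][n]
  avg.zero? ? acc : acc + r[n].to_f / avg
 end
 [r['url'], score]
end

# Print the fifteen highest-scoring posts
scored.sort_by { |_, s| -s }.first(15).each { |url, s| puts "#{'%.2f' % s}  #{url}" }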


Sorted by that score, Brad Feld's Fifteen Finest posts are:

15. The torturous world of powerpoint [2004]

14. Venture Capital deal algebra [2004]

13. Discovering work life balance [2005]

12. The difference between Christmas and Chanukah [2005]

11. What's the best structure for a pre-VC investment [2006]

10. How convertible debt works [2011]

9. Sample board meeting minutes [2006]

8. The best board meetings [2009]

7. Zynga Texas Holdem Poker on MySpace [2008]

6. Fear is the mindkiller [2007]

5. The Treadputer [2006] 

4. Term sheet series wrap up [2005]

3. Why most VCs don't sign NDAs [2006]

2. CTO vs VP Engineering [2007]

1. Revisiting the term sheet [2008]

And here is the complete Google Doc.

Is that all?

I did this because I wanted to find some good Brad Feld posts, but the general approach is also useful for evaluating competitors and others for whom direct analytics aren't available.