
Delete old tweets selectively using Python and Tweepy

For some time I’ve used an online service to delete tweets that are more than one week old. I do this because I use Twitter for levity, for throwaway comments and retweets on issues of the day, and I don’t really want those saved for posterity. Thanks to search crawlers and caches I can never be certain that tweets are gone forever, but this is a small step in that direction.

When I joined Keybase I discovered that I needed to prevent my ‘proof’ tweet from being deleted, and the simple method used by the online deletion service was no longer an option. My solution uses an exception list containing the IDs of the tweets I wish to save, and these are ignored when their contemporaries are merged with the infinite.

I’ve written a Python script that uses Tweepy to scan the contents of my timeline and delete any tweet that meets two criteria – more than seven days old and not in my exception list. It’s very simple and there are probably better ways of doing it (please let me know), but it works well for me as a nightly cron job.

Please note that because I’ve been deleting my old tweets this way for some time, each nightly run only has a handful to remove, and I’ve never had issues with the Twitter API rate limits. Every deletion is a separate API call, so if you have many tweets you may need to limit the number returned via the .items() method at first. This is demonstrated in the Tweepy cursor tutorial.
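Because every deletion is its own API call, another way to keep a first run manageable is to cap how many tweets a single run deletes and let the nightly cron job work through any backlog over several nights. Below is a minimal sketch of just the selection logic, assuming each tweet exposes an `id` and a timezone-aware `created_at`. The `Tweet` namedtuple and the `deletable_ids` helper are hypothetical names for illustration, not part of Tweepy.

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone
from itertools import islice

# Hypothetical stand-in for a Tweepy status: only the two fields used here.
Tweet = namedtuple("Tweet", ["id", "created_at"])

def deletable_ids(tweets, saved_ids, days_to_keep=7, max_deletions=None, now=None):
    """Return IDs of tweets older than the cutoff and not in saved_ids.

    At most max_deletions IDs are returned (None means no cap), so one run
    makes a bounded number of delete calls.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_to_keep)
    candidates = (
        t.id for t in tweets
        if t.id not in saved_ids and t.created_at < cutoff
    )
    return list(islice(candidates, max_deletions))

# Example with a fixed "now" and made-up tweet IDs.
now = datetime(2015, 3, 20, tzinfo=timezone.utc)
timeline = [
    Tweet(1, now - timedelta(days=1)),    # too recent, kept
    Tweet(2, now - timedelta(days=10)),   # old, deletable
    Tweet(3, now - timedelta(days=30)),   # old but in the save list
    Tweet(4, now - timedelta(days=8)),    # old, deletable
]
print(deletable_ids(timeline, {3}, now=now))                   # [2, 4]
print(deletable_ids(timeline, {3}, max_deletions=1, now=now))  # [2]
```

In a real run, `tweets` would be the iterator from `tweepy.Cursor(api.user_timeline).items()`; because `islice` stops early, the cursor also stops fetching further pages once the cap is reached.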

To get the required authentication keys you will need to register a Twitter application.

import tweepy
from datetime import datetime, timedelta, timezone

# options
test_mode = True        # when True, report what would be deleted but delete nothing
verbose = True          # print each tweet as it is selected for deletion
days_to_keep = 7
tweets_to_save = [
    573245340398170114,  # the keybase proof tweet
    573395137637662721,  # a tweet to this very post
]

# auth
consumer_key = 'xxxxxxxxxxxxxxxxxxxxxx'
consumer_secret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
access_token = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

print("Retrieving timeline tweets")

# get all tweets (the cursor pages through the timeline lazily)
timeline = tweepy.Cursor(api.user_timeline).items()

# set cutoff date, use utc to match twitter
cutoff_date = datetime.now(timezone.utc) - timedelta(days=days_to_keep)

deletion_count = 0
ignored_count = 0

for tweet in timeline:
    # older Tweepy versions return naive UTC datetimes; make them comparable
    created_at = tweet.created_at
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)

    # delete tweets that are not in the save list and are older than the cutoff date
    if tweet.id not in tweets_to_save and created_at < cutoff_date:
        if verbose:
            print("Deleting %d: [%s] %s" % (tweet.id, created_at, tweet.text))
        if not test_mode:
            api.destroy_status(tweet.id)
        deletion_count += 1
    else:
        ignored_count += 1

print("Deleted %d tweets" % deletion_count)
print("Ignored %d tweets" % ignored_count)

Future work

Rather than maintain an exception list there might be a way to prevent tweets that the user has favourited from being deleted. If I ever want more than a handful of tweets to be saved and/or control the list from Twitter itself this would be a nice way to do it.
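In the v1.1 API each tweet object includes a `favorited` field saying whether the authenticating user has favourited it, and Tweepy exposes it as `tweet.favorited`. Assuming that field behaves as documented for one’s own tweets, the deletion test could treat a favourite as a “keep” flag alongside (or instead of) the ID list. A minimal sketch; `Tweet` and `should_delete` are hypothetical names:

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for a Tweepy status with the fields used below.
Tweet = namedtuple("Tweet", ["id", "created_at", "favorited"])

def should_delete(tweet, cutoff, saved_ids=frozenset()):
    """Delete only tweets older than cutoff that I have not favourited
    and that are not in the explicit save list."""
    if tweet.favorited or tweet.id in saved_ids:
        return False
    return tweet.created_at < cutoff

now = datetime(2015, 3, 20, tzinfo=timezone.utc)
cutoff = now - timedelta(days=7)
old = now - timedelta(days=30)
print(should_delete(Tweet(1, old, False), cutoff))  # True
print(should_delete(Tweet(2, old, True), cutoff))   # False: favourited
print(should_delete(Tweet(3, now, False), cutoff))  # False: too recent
```

Saving a tweet would then just mean favouriting it from any Twitter client.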

Favourite songs of 2014

My top thirty tracks of the year. It starts with my top ten; the rest are ordered by (Spotify) track length.

Spotify playlist
YouTube playlist

“I’m wearing Win Butler’s hair
There’s a scalpless singer of a Montreal rock band somewhere
And he’s all right”

  1. Jessica Lea Mayfield – Do I Have The Time
  2. Röyksopp & Robyn – Monument (The Inevitable End Version)
  3. Happyness – Montreal Rock Band Somewhere
  4. St. Vincent – Birth In Reverse
  5. La Roux – Kiss And Not Tell
  6. Bombay Bicycle Club – Luna
  7. Alvvays – Archie, Marry Me
  8. Eleanor Dunlop – Disguise
  9. The Preatures – Better Than It Ever Could Be
  10. Bertie Blackman – War Of One
  • Bad//Dreems – Dumb Ideas
  • The Bohicas – XXX
  • Broods – Mother & Father
  • East – Your Ghost
  • St. Vincent – Digital Witness
  • The Griswolds – Beware The Dog
  • The Preatures – Somebody’s Talking
  • Highasakite – Darth Vader
  • Ecca Vandal – White Flag
  • Kimbra – 90s Music
  • First Aid Kit – My Silver Lining
  • Jack White – Lazaretto
  • CHVRCHES – Bela Lugosi’s Dead
  • The Babe Rainbow – Secret Enchanted Broccoli Forest
  • Superfood – Right On Satellite
  • Interpol – All The Rage Back Home
  • City Calm Down – Pavement
  • Glass Animals – Pools
  • Lana Del Rey – Shades Of Cool

Honourable mention:

My top thirty are those I can play over and over again, and although this one doesn’t qualify on that count it’s the most entertaining song of 2014. And it did us a favour by allowing us to listen to the catchy tune of the original without being subjected to its lyrics.

Use Getflix or Unblock-Us servers selectively with Dnsmasq

I subscribe to Getflix, which is quite similar to Unblock-Us in that it allows users to access geo-blocked content. The basic way to use these services is to point one’s device at their DNS servers, but this sends all DNS requests their way. I wanted to use their DNS servers only to resolve specific geo-blocked domains.

There are a couple of reasons you might want to do this – you may be concerned about yet another party being privy to your site visits, and in my case I wanted to retain the faster, closer DNS servers provided by my ISP for the majority of my web requests.

Dnsmasq is included in several flavours of custom firmware available for many consumer routers, but since that was unavailable to me I have set it up on my NAS, which runs Ubuntu Server. There are many guides for setting up Dnsmasq on many systems (for me it was as easy as “sudo apt-get install dnsmasq”), so I’ll stick to explaining why I’ve configured it as I have.

Here is my Dnsmasq configuration file. Much of it isn’t necessary for this goal, but I’ve kept it intact for context. I’ll explain why I made certain decisions in case it helps someone else.

# /etc/dnsmasq.conf

# regular dns servers (IPs redacted)

# getflix primary dns

# getflix secondary dns

# settings
interface=em1       # accept requests from the em1 interface
bogus-priv          # don't forward reverse lookups for private (non-routable) addresses
domain-needed       # don't forward incomplete hostnames (names without dots)
no-resolv           # don't read /etc/resolv.conf to get upstream servers
all-servers         # send queries to all servers, use the first reply
#strict-order       # query servers in the order they appear
domain=local        # set the domain name of this network
local=/local/       # set selected domains to only resolve locally
expand-hosts        # add our domain name to our local hostnames
cache-size=10000    # increase the cache to 10k records
no-hosts            # don't use the regular hosts file
addn-hosts=/etc/dnsmasq.hosts   # use alternate hosts file

# dhcp: set range, netmask and lease time for unidentified clients
read-ethers                     # read the /etc/ethers file for static assignment
dhcp-option=3,       # set the gateway (router)

# logging
log-facility=/var/log/dnsmasq   # log file
#log-queries                    # log dns queries
#log-dhcp                       # log dhcp activity

# disable a bunch of windows stuff
filterwin2k                     # block certain unnecessary windows requests
dhcp-option=19,0                # set ip-forwarding off
dhcp-option=44,          # set netbios-over-TCP/IP (WINS) nameserver(s)
dhcp-option=45,          # netbios datagram distribution server
dhcp-option=46,8                # netbios node type
dhcp-option=252,"\n"            # tell windows not to ask for proxy info
dhcp-option=vendor:MSFT,2,1i    # tell windows to release lease on shutdown

The upstream DNS servers were chosen for their speed from my location (according to namebench). Farther down I’ve also set the “all-servers” flag, which means that every request I make is sent to each server I’ve configured and the first response is accepted. Like this fellow, I found that it resulted in a tremendous increase in resolution speed. This is a terrible setting for a big network because of the increased traffic, but since I’m just a home user and I’m caching my requests, it’s not such a big deal. Were I not using it, I might have gone for the “strict-order” option to ensure that the faster servers I’ve listed at the top are tried first.
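To illustrate the difference, here is a toy model in Python of what “all-servers” does: the same query goes to every upstream at once and the first reply wins. It is not how Dnsmasq is implemented; the server names, delays and the answer (a documentation-range address) are made up, and asyncio stands in for the network.

```python
import asyncio

async def ask(server, delay, answer):
    """Simulated upstream resolver that replies after `delay` seconds."""
    await asyncio.sleep(delay)
    return server, answer

async def first_reply(upstreams):
    """Query every upstream concurrently and keep whichever answers first.
    Later replies are simply discarded (here, by cancelling those tasks)."""
    tasks = [asyncio.create_task(ask(*u)) for u in upstreams]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()

upstreams = [
    ("isp-primary", 0.30, "198.51.100.7"),
    ("isp-secondary", 0.02, "198.51.100.7"),
    ("public-dns", 0.20, "198.51.100.7"),
]
print(asyncio.run(first_reply(upstreams)))  # ('isp-secondary', '198.51.100.7')
```

With “strict-order”, by contrast, the query would go to isp-primary first and only move down the list if that server failed to answer, which is why the order of the server lines matters there.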

The Getflix server block defines which domains are to be resolved via the Getflix servers, using some domains I found here, plus a few more that hadn’t been added to that list at the time of writing. Each server line tells Dnsmasq to resolve the listed domains using the given DNS server. I could have put all of them on one line, but preferred to separate them according to the service being accessed. I have repeated this whole block for the secondary Getflix DNS server.
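Each of those lines uses Dnsmasq’s `server=/domain/.../ip` form, which sends queries for the listed domains (and their subdomains) to the given server. Since the block is repeated for the secondary server, a small script can generate it. This is a sketch only: the service names, example domains and the 192.0.2.x addresses (a range reserved for documentation) are placeholders, not Getflix’s actual servers or the domain list referred to above.

```python
def dnsmasq_server_lines(services, resolvers):
    """Return dnsmasq.conf lines that route each service's domains to each resolver.

    One "server=" line per service per resolver, each preceded by a comment,
    so the whole block is repeated for every resolver in order.
    """
    lines = []
    for resolver in resolvers:
        for service, domains in services.items():
            lines.append(f"# {service} via {resolver}")
            lines.append(f"server=/{'/'.join(domains)}/{resolver}")
    return lines

# Placeholder domains and documentation-range IPs, not real Getflix settings.
services = {
    "netflix": ["netflix.com", "nflximg.net"],
    "hulu": ["hulu.com"],
}
resolvers = ["192.0.2.1", "192.0.2.2"]  # hypothetical primary and secondary
print("\n".join(dnsmasq_server_lines(services, resolvers)))
```

The output can be pasted into /etc/dnsmasq.conf, or written to a file that the main config pulls in with Dnsmasq’s `conf-file=` option.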

I’ve commented the settings but it’s worth mentioning a few. I have specified the interface to listen on even though there’s only the one point of entry on my network. Recent versions of Dnsmasq block all traffic if nothing is specified here, which is the opposite to its previous behaviour.

I’ve specified that Dnsmasq is not to read nameservers from the /etc/resolv.conf file and not to read hostnames from the /etc/hosts file. Both of these are used by the system for other purposes as well, and I wanted to keep Dnsmasq ‘clean’. I’ve specified my own hosts file specifically for Dnsmasq instead. It looks something like this:

# /etc/dnsmasq.hosts
red
green
blue
yellow
purple

Dnsmasq is also being used as a DHCP server, so I’m specifying my gateway (the router) and an IP range to be used for unidentified clients. This includes a subnet value, which is required because my router is a DHCP relay. Thanks to the “read-ethers” option I can specify clients requiring static IPs in the /etc/ethers file, which looks a little like this:

# /etc/ethers

While troubleshooting my setup I was logging DHCP and DNS activity on top of the standard Dnsmasq reporting, but I’ve turned both off now. The final block of the config turns off a bunch of stuff related to Windows clients, which I do have, but my network is so small that they are pointless overheads.

That’s about it! Let me know if you have any questions about my configuration, or if you can help me improve upon it. My thanks to these articles, which pointed me in the right direction:


17 May 2014: Since posting this I’ve changed router, and the new one doesn’t support DHCP relaying. So I’m now doing DHCP on the router itself and am simply using Dnsmasq for DNS. I have commented out all of the DHCP lines in /etc/dnsmasq.conf and therefore no longer use /etc/ethers, but everything still works as before.

Favourite songs of 2013

Here are my thirty favourite tracks of 2013. I won’t say ‘best’ because that’d suggest knowledge and objectivity.

While Parquet Courts’ sensational Stoned And Starving is way out on top, Chvrches has eight of the thirty. Their debut, The Bones Of What You Believe, is my album of the year.

The full list

The top ten are my votes for the Triple J Hottest 100 (in order, despite the ballot’s approval voting system).

  1. Parquet Courts – Stoned And Starving
  2. The Preatures – Is This How You Feel?
  3. World’s End Press – To Send Our Love
  4. Chvrches – We Sink
  5. Grouplove – Borderlines And Aliens
  6. Boy & Bear – Southern Sun
  7. Lorde – Royals
  8. Big Scary – Belgian Blues
  9. Chvrches – Lies
  10. Haim – The Wire
  11. RAC, Kele, MNDR – Let Go
  12. Wolf Alice – Bros
  13. Chvrches – Science/Visions
  14. Bad//Dreems – Hoping For
  15. The Cairos – Obsession
  16. Chvrches – Broken Bones
  17. Arcade Fire – Reflektor
  18. The Blow – Make It Up
  19. Bibio – À tout à l’heure
  20. Chvrches – Lungs
  21. Noah And The Whale – There Will Come A Time
  22. Vampire Weekend – Step
  23. San Cisco – Beach
  24. Chvrches – The Mother We Share
  25. Daughter – Human
  26. Chvrches – Recover
  27. Chvrches – Gun
  28. Regina Spektor – You’ve Got Time
  29. Holy Ghost! – Dumb Disco Ideas
  30. Hookworms – Away / Towards

Fix the Triple J Hottest 100 voting system

The upcoming compilation

The twentieth anniversary of the Hottest 100 inspired a “best of the last twenty years” version, the winners of which were announced last weekend. As always, there was much angst as to what appeared, what didn’t, and where they ranked. As with the “hottest of all time” count from 2009, the biggest criticism of this latest poll seems to be the lack of women.

I’ve read a number of articles giving reasons why this might be, and each of those may be correct. I’ve also read some things about how and why popularity isn’t a good metric for quality, and they’re probably also correct. What I’d like to question is whether the Hottest 100 is even a good measure of popularity, full stop. Although I accept that the results are skewed towards the particular section of the community that votes in the poll, I don’t even think they accurately represent the opinions of that group.

I believe that the wrong voting system creates this problem, and the sheer number of tracks from which listeners can choose exacerbates it. When the Hottest 100 began, there was no way around this – the current method would have been the easiest way to process phone votes. Today, voting is done via the Web, so it’d be pretty easy to switch to a more appropriate system.

The problem with the current system

I’ll be using the recent vote as my example, but it’s a similar case for the annual events. Listeners were asked to pick a maximum of twenty tracks out of the tens of thousands of songs that might appeal to their demographic as a whole. They were not given the opportunity to rank those songs – each track listed on a ballot received a single vote. The votes were totalled and the winners announced. Unfortunately I don’t have access to the ballots themselves, so I can’t prove that the voting system skews the results, but I suggest that it’s a possibility.

Oasis – Wonderwall

The poll was topped by Oasis’ Wonderwall. Now, it may very well be that a plurality of listeners thinks that it’s the best song of the last twenty years. But it’s also possible that a large number of listeners voted for a bunch of other songs as their favourites, and put Wonderwall somewhere else in their lists for nostalgic reasons, perhaps as a shout-out to a fondly-remembered time in their lives. Tweep @NatalieGaronzi made this point somewhat more pithily.
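To make the possibility concrete, here is a toy tally over seven invented, ranked ballots (the real ballots aren’t public). Counted the way the poll does it, with one vote for every song listed regardless of position, the song that sits second on many ballots beats every song that anyone actually ranked first.

```python
from collections import Counter

# Invented ballots, favourite first; "Song W" is nobody's first choice.
ballots = [
    ["Song A", "Song W"],
    ["Song A", "Song W"],
    ["Song B", "Song W"],
    ["Song B", "Song W"],
    ["Song C", "Song W"],
    ["Song A"],
    ["Song B"],
]

def flat_tally(ballots):
    """The current method: every listed song gets one vote, order ignored."""
    return Counter(song for ballot in ballots for song in ballot)

def first_preferences(ballots):
    """Count only each ballot's top-ranked song."""
    return Counter(ballot[0] for ballot in ballots if ballot)

print(flat_tally(ballots).most_common(1))    # [('Song W', 5)]
print(first_preferences(ballots)["Song W"])  # 0
```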

There are a few Hottest 100 number ones that I (perhaps cynically) presume had been given votes for novelty reasons, despite their voters not necessarily considering the song the top track for the year. The flat voting system means that if enough people do this, the song can win. Perhaps that’s not a bad thing. Like the Condorcet voting method, it will favour candidates generally acceptable to the majority over candidates passionately supported by a minority. Depending on your definition of “hottest song”, this may be fine. Condorcet, however, at least allows the voters to rank their choices.

The problem with the number of tracks available

Radiohead – Kid A

I like Radiohead; they have a mountain of quality songs, and I love a few of them equally: Paranoid Android, Karma Police, How To Disappear Completely, Everything In Its Right Place. So, I guess I could vote for all of them. But then, I love lots of different types of music and I don’t think I like Radiohead enough to give them four votes out of twenty. So, I decided to choose between them. Out of this lot my favourite is probably How To Disappear Completely, and I voted for it knowing it was unlikely to feature in the final count. Should I have voted for Paranoid Android or Karma Police instead, to boost their votes?

PJ Harvey – Rid Of Me

Picking on Oasis again – they had the one song feature, and it topped the count. Radiohead had two songs (Paranoid and Karma) feature at 13 and 35. Numerous other bands had two or three songs appear. Is it possible that prolific, long-lived, well-loved, consistently-good bands suffer in such counts? I love PJ Harvey and voted for a few of her tracks. But it was hard to choose only a few. Are there other PJ fans who were in the same boat and chose differently to me? Maybe not, maybe I’m inventing problems here. But I think it’s a possibility worth considering. Oasis has many popular songs, but none stands out as obviously as Wonderwall.

A proposal

To solve these problems I would like to see the Hottest 100 allow voters to rank their songs, and use an STV-based proportional system such as Hare-Clark to tally the results. Let the users select as few or as many tracks as they like (perhaps limited to the number of vacancies to prevent people going overboard and crashing the system), and give them the ability to drag and drop them into the order they choose. The formulas for calculating quotas, surpluses, and exclusions are pretty straightforward; let computers do the work.
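As a rough illustration of how little code the counting needs, here is a simplified STV count using the Droop quota, the quota Hare-Clark uses. It is not a faithful Hare-Clark implementation: surpluses are transferred by reweighting all of the elected candidate’s papers (a Gregory-style fractional transfer), whereas Hare-Clark transfers from the last parcel of papers a candidate received, and ties here are broken by name order rather than by any official rule. The candidates and ballots are invented.

```python
from fractions import Fraction

def droop_quota(total_votes, seats):
    """Smallest whole number of votes that no more than `seats` candidates can all reach."""
    return total_votes // (seats + 1) + 1

def stv_count(ballots, seats):
    """Simplified single transferable vote count.

    ballots: lists of candidate names, most preferred first.
    Returns the elected candidates in the order they were elected.
    """
    # Each paper is [preferences, weight]; weights shrink after a surplus transfer.
    papers = [[list(b), Fraction(1)] for b in ballots if b]
    quota = droop_quota(len(papers), seats)
    continuing = {c for prefs, _ in papers for c in prefs}
    elected = []

    while len(elected) < seats and continuing:
        # Credit each paper to its highest-ranked candidate still in the count;
        # papers with no such candidate are exhausted and set aside.
        tallies = {c: Fraction(0) for c in continuing}
        piles = {c: [] for c in continuing}
        for paper in papers:
            top = next((c for c in paper[0] if c in continuing), None)
            if top is not None:
                tallies[top] += paper[1]
                piles[top].append(paper)

        # Highest tally first; equal tallies keep name order (a simplification).
        ranked = sorted(sorted(continuing), key=lambda c: tallies[c], reverse=True)
        if len(ranked) <= seats - len(elected):
            elected.extend(ranked)  # everyone left fills the remaining seats
            break
        leader = ranked[0]
        if tallies[leader] >= quota:
            elected.append(leader)
            continuing.remove(leader)
            # Pass the surplus on by scaling down every paper in the winner's pile.
            transfer_value = (tallies[leader] - quota) / tallies[leader]
            for paper in piles[leader]:
                paper[1] *= transfer_value
        else:
            # Nobody reached the quota: exclude the lowest candidate;
            # their papers move on at their current weight next round.
            continuing.remove(ranked[-1])
    return elected


ballots = [["A", "B"]] * 6 + [["B"]] * 2 + [["C"]] * 3 + [["D", "C"]]
print(droop_quota(len(ballots), 2))  # 5
print(stv_count(ballots, 2))         # ['A', 'C']
```

In this example A’s surplus of one vote flows to B at a transfer value of 1/6 per paper, D is excluded and their paper goes to C, B is then excluded, and C takes the last seat as the only candidate left.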

This would also go some way to alleviate the ‘number of eligible songs’ problem. Since I can vote for as many songs as I like, I’d vote for all four Radiohead songs somewhere in my list without too much concern for wasting my vote. I’d vote for all of the PJ Harvey songs I like and still have plenty of room for my other favourites.

Alternatively, some form of run-off voting system could be employed to whittle down the list first (perhaps to 500 or so) and then in the main vote the listeners would be restricted to those tracks alone. But this would create its own problems, and I would much prefer to kill two birds with the solution above.

Perhaps nothing I’ve suggested here would make a difference. Perhaps Triple J listeners genuinely aren’t fans of women in music, and maybe they genuinely love novelty songs. But to fix the voting system would at least remove these doubts and give us a clearer idea. Tracing preference flows would also provide some interesting metadata. Are fans of Mumford & Sons also into Of Monsters and Men? I bet they are.

Finally, it may be that I’m trying to wedge the wrong voting system into the wrong paradigm. If any psephologists read this, feel free to poke holes in it, and I’d love to hear some alternatives.