Yahoo! Openhack EU (Bucharest)

A pair of history enthusiasts

Last weekend, I was invited to attend the Yahoo! Openhack EU event held in Bucharest, Romania, as part of a team of “History Enthusiasts”, to try and help participants generate ideas using cultural sector data. This came about from the really successful History Hack Day that Matt Patterson organised earlier this year; off the back of that, Yahoo!’s Murray Rowan invited him to assemble a team to go to Romania and evangelise. Our team comprised myself, Jo Pugh from the National Archives and our leader Matt; we went armed with the datasets that were made available for the hackday and a list of apis from Mia Ridge (formerly of the Science Museum and now pursuing a PhD).

The Openhack event (hosted in the Crystal Palace Ballrooms – don’t leave the complex, we were told, the wild dogs will get you!) started with a load of tech talks. Most interesting for me were the YQL one (to see how things had progressed), Douglas Crockford on JSON (I watched this on video later) and Ted Drake‘s accessibility seminar. One thing I thought was absent was the Geo element, something that is extremely strong at Yahoo! (api wise, before you moan about maps) and an element that always features strongly in mashups and hacks at hack days. Our team then gave a series of short presentations to the Romanians who were interested in our data – unfortunately not too many, but that seemed to be the norm for the enthusiasts. We felt that a lot of people had already come with ideas and were using the day as a collaborative catalyst to present their work; not that this is a bad thing – be prepared and your work will be more focused at these events. Between us we talked about the success of the hackday at the Guardian, Jo presented material from the National Archives, and then we discussed ideas with various people throughout the day; for example:

  1. Accessing shipping data – one of the teams we spoke to wanted some quite specific data about routes. However, we found a static html site with a huge amount of detail and suggested scraping it, extracting the entities mentioned from the text and producing a mashup based on these – see the submarines hack
  2. How to use Google Earth time slider to get some satellite imagery for certain points in time (the deforestation project was after this)
  3. Where you can access museum type information – history hack days list
  4. Which apis they could use – Mia Ridge’s wiki list
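The scrape-then-extract suggestion in point 1 boils down to: pull the visible text out of the static html, then spot candidate entities in it. Here is a minimal sketch using only the Python standard library; the sample page and the naive capitalised-phrase heuristic (standing in for a proper extraction api like Alchemy’s) are my own assumptions, not the hack itself:

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text of a page, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def candidate_entities(html):
    """Very naive 'entity' spotting: runs of two or more capitalised words.
    A real hack would send the text to a proper entity-extraction api."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    return re.findall(r"\b[A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)+\b", text)

# In the real hack you'd fetch the static html site with urllib first.
page = "<html><body><p>HMS Dreadnought sailed from Port Said in 1906.</p></body></html>"
print(candidate_entities(page))  # ['HMS Dreadnought', 'Port Said']
```

Feed each extracted phrase to a geocoder or a proper entities api and you have the raw material for a mashup.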

I tried to do a few things whilst there: some Twitter analysis with Gephi and R (laptop not playing ball with this) and building some YQL open tables for Alchemy’s text extraction apis and Open Library (I’ll upload these when tested properly). Matt looked at trying to either build a JSON api or a mobile based application for Anna Powell-Smith‘s excellent Domesday mapping project (django code base), and Jo played with his data for Papal bullae from the National Archives using Google’s fusion tables, and also looked at patterns within the syntax via IBM’s Manyeyes tool.

Ursus black

Hacking then progressed for the next 24 hours, interspersed with meals and some entertainment provided by the Algorythmics (see the embedded video below from Ted Drake), who danced a bubble sort in Romanian folk style, and 2 brief interludes to watch the Eurovision (Blue and the Romanian entry). We retired to the bar at the JW Marriott for a few Ursus beers and then back to the Ibis for the night, before returning the next day to see what people had produced to wow their fellow hackers and a panel of judges. Unfortunately, I had to head back to the UK (to help run the ACRN CASPAR conference) from OTP when the hacks were being presented, so I didn’t get to see the finished products. The internet noise reveals some awesome work, and a few that I liked the sound of are commented on below. I also archived off all the twitter chat using the #openhackeu hashtag if anyone would like these (currently over 1700 tweets). There was also some brilliant live blogging by a very nice chap called Alex Palcuie, which gives you a good idea of how the day progressed.

So, after reading through the hacks list, these are my favourites:

  1. Mood music
  2. The Yahoo! Farm – robotics and web technology meshed, awesome
  3. Face off (concept seems good)
  4. Pandemic alert – uses webgl (only chrome?)
  5. Where’s tweety

And these are the actual winners (there was also a proper ‘hack’, which wasn’t really in the vein of the competition as laid out on the first day, but shows skill!):

  • Best Product Enhancement – TheBatMail
  • Hack for Social Good – Map of Deforested Areas in Romania
  • Best Yahoo! Search BOSS Hack – Take a hike
  • Best Local Hack –  Tourist Guide
  • Hacker’s choice – Yahoo farm
  • Best Messenger Hack – Yahoo Social Programming
  • Best Mashup – YMotion
  • Best hacker in show – Alexandru Badiu, he built 3 hacks in 24 hours!

To conclude, Murray Rowan and Anil Patel‘s team produced a fantastic event which, for once, had a very high proportion of women in attendance (maybe 10-25% of an audience of over 300) – which will please many of the people I know via Twitter in the UK and beyond. We met some great characters (like Bogdan Iordache), saw the second biggest building on the planet (it was the biggest on 9/11, the taxi drivers proudly claim) and met a journalist I never want to meet again….. According to the hackday write up, 1090 cans of Red Bull, 115 litres of Pepsi and 55 lbs of coffee were consumed (and a hell of a lot of food, judging by some of the food mountains that went past!)

Here’s to the next one. Maybe a cultural institution can set a specific challenge to be cracked at this. And I leave you with Ted Drake‘s video:


#ACRNCASPAR Workshop on Archaeologists & the Digital: Towards Strategies of Engagement

The caspar workshop banner

Yesterday UCL’s Institute of Archaeology (which, incidentally, is the top ranked archaeological institution in the UK) hosted a workshop of papers centred around archaeological engagement using digital technologies and the development and implementation of strategy to achieve this. The conference/workshop was badged under the Centre for Audio Visual Study and Practice in Archaeology and the Archaeology and Communication Research Network and was organised by Chiara Bonacchi (with some help from me, but she did the majority of the work!). We managed to assemble a varied audience (full house) via eventbrite and a wide range of speakers, and the below will recap on the presentations that were given (sorry, this will be long and maybe dull; I didn’t make notes so most of this is from memory!) – in the next month or so, they will be available in video podcast format. There was some backchannel discussion by various Twitter users (myself included) and I’ll also show some analysis of what they were saying. Another bonus from this workshop was the networking opportunities that it produced. I met lots of people that I knew digitally, but not in real life, so it was excellent to make their acquaintance!

If anything is wrong please do correct me!

Morning session one ~ chaired by Don Henson

Smart phones and site interpretation: The Street Museum application ~ Meriel Jeater (Museum of London)

Robert Nesta Marley

Meriel spoke about the Museum of London’s recent launch of a dual platform smart phone application entitled Street Museum. This used location based technology via the GPS chip of the user’s phone to pinpoint their position and deliver content from the Museum’s archive of pictures relating to their present location. The application was created by the Brothers and Sisters agency, originally on just the iPhone, during a 5 month cycle (concept created in January 2010, specified in February and delivered in May 2010) at a discounted (!) cost of £20,000 (they were told the actual cost would be £40-50k, but I think that this could be built in house or at a hackday using appropriate technology – jQuery mobile, HTML5 etc). The Android application that was delivered subsequently cost a further £28,000.

Re enactors

Other applications are now being developed by the Museum: a partnership with Nokia is producing an application entitled “Soundtrack to London” (this was shown with an example of the great Robert Marley), and there is a Romanised version of Street Museum (a name they have trademarked) called Streetmuseum Londinium. The Londinium app uses green screen generated Roman re-enactors and layers them over current images. I am unclear if this uses their API (a hack was produced at History Hack Day using this for mobile devices). If you want to find out more about their applications, you can view more on their dedicated page. This does raise the issue of social exclusion for those without access to premium handsets – a point that was raised via twitter…. Other issues that came up with these apps included the question of why Londinium wasn’t just a new layer within the current Street Museum application.

Social media as marketing tools at the British Museum ~ Lena Zimmer (British Museum)

Unfortunately, Lena couldn’t attend the workshop, so I was persuaded to give her paper and I probably didn’t do it justice! This talk centred around the use of social media platforms for the dissemination of propaganda about the British Museum and to facilitate dialogue between the institution and fans/followers etc. This talk was pretty straightforward, and there’s a lot of statistical information that archaeological organisations and museums can use as a benchmark. The BM started to explore the concept of social media when refreshing its web presence back in 2006, but the full on uptake of these facilities only began in January 2009 (Twitter) and April 2009 (Facebook). A relaunched Youtube channel came online in August 2010 and a blog in April 2010. (These are all subsequent to the Portable Antiquities Scheme using these facilities, and other departments have a presence as well.)

Two points Lena wanted iterated, that I probably didn’t push heavily enough:  “A recent study by the Arts Council England, MLA and Arts & Business Digital audiences: engagement with arts and culture online (November 2010) has shown that 53% of the online population have used the internet to engage with arts and culture in the past 12 months…. The report also states that in particular Facebook ‘has become a major tool for discovering as well as sharing information about arts and culture, second only to organic search through Google and other search engines.'”

The Museum’s use of social media is closely allied to its strategy, specifically “to enhance access to the collection (engagement)” and “to increase self-generated income through growth”, and therefore Lena’s use of social media allows the Museum to:

  • Increase engagement and discussion around exhibitions and the collection with a world-wide audience
  • Drive income streams for exhibition ticket sales, events, Membership, donations, BMco, and Do&Co
  • Grow audiences across their online platforms by March 2012 (targets given by Lena are 200,000 Facebook fans and 100,000 Twitter followers)

Of interest to many would be the methodology for supporting analysis of social media and the fact that the Facebook audience is 60% female! The analysis of social media uses several measures:

  1. Popularity metrics: TweetLevel for influence, popularity, engagement and trust, and Klout for other influence measures (measures that Lorna Richardson will use in future for her research).
  2. Audience advocacy (re-tweets, likes, shares): on average they get 11 re-tweets per tweet (measured between Oct 2010 – Jan 2011), and Facebook posts average 158 likes and 12.2 comments (measured Aug 2010 – April 2011)
  3. Observation and close monitoring of social media sites – what type of comments are being made, and how people are responding to their content, using Tweetdeck etc.

Mobile learning: connecting pupils, curriculum, and informal learning environments ~ Theano Moussouri (UCL)

Theano presented on a collaboration between the National Maritime Museum (NMM) and UCL which investigated mobile phone based learning and its outcomes, centred around the trans-Atlantic slave trade of Black Africans. This used the ookl platform to deliver content and had a structured visitor study to work out how well the project had engaged with the target audience (remember, this workshop is all about engagement and digital!) The study was based around social-constructivist principles and followed a three step model of qualitative research, i.e. before, during and after the visit to the NMM, and involved staff of the school, the children and the museum staff themselves. Key elements in this process included the people, the content, the technology and the context in which all of these interacted. Theano stated that the relevance of content was imperative, that there were issues with the choices of technology involved and that the people involved would also limit the experimental nature of this work.

I lost track of the outcome of this talk as I was fixing our work website via SSH, so I am unsure if this project was deemed a success or not. If you can help finish the notes on this, do comment!

Morning session two ~ Chiara Bonacchi (UCL)

The nature of archaeological communities and spatial data online ~ Andy Bevan (UCL)

A blue plaque for Morty!
Mortimer Wheeler plaque CC Image from Flickr by

Andy Bevan, GIS wunderkind, presented the seminar that he gave during the CASPAR series. It included references to Asterix, Mortimer Wheeler, agency and models. I enjoy this sort of theoretical talk! Andy covered the following concepts and also referred to the Long Tail idea of digital engagement:

  1. Free and open source software vs vendor lock-in
  2. The concept of authoritative content/authorship
  3. Monetisation of content
  4. Game based theory, or gamification of applications and websites
  5. Agency (online and offline) – this is where Asterix featured heavily, with pictures for the individual, the household, institutions and artefacts (a menhir, I must have another menhir!)
  6. Relational models – covering lots of theory from Alan Fiske’s work from 1991 and 2005
  7. Online community building within social networks (including geocaching, for example)
  8. Open data and open archaeological initiatives (specific mention was made of Ordnance Survey opendata, the ADS and the Open Knowledge Foundation)
  9. Geo and neo-geo concepts
  10. Augmented reality experiences

Twitter and archaeology online ~ Lorna Richardson (UCL)

Use of Twitter in Europe

Lorna Richardson, a current PhD student at the Centre for Digital Humanities, UCL, presented on her research into the use of twitter and archaeological engagement online (you can read far more about her research aims on her website). Lorna’s been working with me since her Masters, so I’m quite familiar with her work, and I hope I have captured the essence of her presentation below. As we should quite rightly assume, not everyone uses or is au fait with the concept of Twitter (a very small percentage of the world’s population use it – newspapers, get over the fixation please), and so she gave an introduction as to what the social media platform is and how it works, specifically:

  1. Explaining its genesis (140 characters, dictated by the concept originally being for mobile phone sms)
  2. How the @ syntax works
  3. How hashtags came about to facilitate sharing
  4. How retweets work
  5. User figures – PeerIndex sent a good stat out on twitter, “Twitter Confirms It Has Passed 200 Million Accounts, 70% of Traffic Now International”, which I passed on a few weeks ago
  6. Where the peak usage seems to be in Europe, taken from a recent visualisation posted by a company called “eeve” (shown in this post), with some interesting spikes on capital cities – London being the most dominant centre.
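The @ and # conventions described above can be pulled out of a tweet’s text with a couple of regular expressions. A minimal sketch (the sample tweet is invented):

```python
import re

def parse_tweet(text):
    """Extract @-mentions and #hashtags from a tweet's text."""
    mentions = re.findall(r"@(\w+)", text)
    hashtags = re.findall(r"#(\w+)", text)
    return mentions, hashtags

mentions, hashtags = parse_tweet("RT @lornarichardson: take the survey! #acrncaspar #archaeology")
print(mentions, hashtags)  # ['lornarichardson'] ['acrncaspar', 'archaeology']
```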

Lorna then went on to talk about her twitter and archaeologists survey: how long it ran for and what sort of feedback she managed to get from the Survey Monkey questionnaire she created. She showed some graphics showing how people accessed twitter, the top words used in some tweet analysis, and 2 wordle graphics for some basic text based analysis of responses – for example, news and fieldwork updates were the things that people wanted to see shared, and research and networking were the main uses of Twitter. She then also talked about the academic problems that she will now face due to Twitter’s change in terms of use for their api, and that to get access to the full firehose of tweets, she’ll now need to find £2000 (or $, can’t remember!) to get access from a company called Gnip. Questions were raised from the floor regarding use of RSS for monitoring twitter, but this feature is being phased out from their interface, twapperkeeper has had to pull its export functions etc, and Lorna needs some quite detailed information.

Lorna finished by making a call to arms, asking everyone there to consider using twitter to facilitate archaeological engagement and her research. Welcome Elizabeth Warry and Chiara today!!

Wessex Archaeology on the web ~ Tom Goskar (Wessex Archaeology)

The man of Kernow, Tom Goskar, then presented on his fantastic work for Wessex Archaeology, managing their open source-tastic web presence. Tom operates in a very similar manner to me, using the best open source tools and software to complete the task in hand. His talk resonated deeply with what I normally lecture on for Tim Schadla-Hall’s course. Tom’s organisation is at the forefront of commercial archaeological work, employing nearly 200 people across 4 locations around Britain. It has a relatively high turnover of £7 million and generates reams of web content and ‘grey’ literature (a pigeon hole term that Tom wants to eliminate), and his skills turn this work into a cohesive web presence that is visited by, on average, 12,000 people per month. Their website was originally produced to cope with the interest around the Amesbury Archer, which was discovered in 2002, and this served to highlight the benefit of pushing their really interesting work out digitally.

Wessex archaeology from 2002, wayback machine

Tom has moved on substantially since 2002, away from static HTML to sophisticated content management and blogging tools (currently 14), and he uses the best software for the job:

  • Drupal for the main site
  • WordPress for blogs
  • Omeka for the forthcoming collections and publications data

Tom has fully embraced social media, creating podcasts (the first in 2005), and has used flickr to disseminate images (like us); they have received over 600,000 views since they started to use the platform, and their images have been remixed into interesting work thanks to the use of the Creative Commons licence. In his current stewardship, Tom now has to manage over 4000 pages, hundreds and hundreds of downloads (on Scribd etc) and several social media presences, and has managed to get the organisation a high profile on outlets such as Channel 4’s Time Team. His work is definitely an exemplar in the field of Public Archaeology.

Tom’s last slide summed up today’s workshop completely for me – It’s all about the public – spot on young man.

Afternoon session ~ Chair Tim Schadla-Hall (UCL)

Blogs and wikipedia: New frontiers for archaeological research? ~ Amara Thornton (UCL)

Flinders Petrie on facebook

Amara Thornton has recently completed her PhD at UCL (award pending) on social networks around famous Palestine-linked archaeologists (Garstang, Kitchener etc), and at the conference she presented on blogging and wikipedia use in the archaeological research sphere. Amara ran through the concept and history of Wikipedia and gave some statistical analysis of the most popular pages – from her research, the most popular archaeology-linked article was the one for the Acropolis – and she also showed how many views archaeological topic pages received and how you can access the editing logs!

There were a couple more wordle graphics, one showing non-archaeologists’ thoughts on archaeology (am I right here?), with Indiana Jones and pith helmets to the fore. She then went on to talk about the convergence of wikipedia and facebook – or wikipedia meets facebook – and she showed how people had befriended John Garstang (6 likes) and Flinders Petrie (over 200 likes). Shawn Graham’s Electric Archaeologist blog and his research on blogging were then mentioned, with his network graphic being shown to the audience, which demonstrated that Colleen Morgan’s blog sits in the middle with links from all over the place. (If you’re interested in how he produced this, check out Gephi.) Amara also suggested that a concentrated effort to increase the reliability and scope of archaeological wiki articles could be co-ordinated – something that has also been mooted on Twitter by various people.

Open access and open data ~ Brian Hole (UCL)

I was starting to flag a little at this point, so maybe my memory is a little hazy here. Brian is currently completing a PhD at UCL and also works at the British Library and Ubiquity Press. What stood out for me at the start of Brian’s presentation was my hatred of the Prezi format – nausea inducing as it zooms backwards and forwards around his Indian figurine of Shiva. Brian talked about the attempts being made to open up access to data via the implementation of open licences and the adoption of DOIs for cataloguing and archiving, providing an endpoint resolver for journals and data. Many in the audience felt that the process was simplified too much on the data side, but that can probably be overcome. Brian also presented his concept of a new archaeological journal that would interact with the Archaeology Data Service and have DOIs built in.

If anyone has better recollection of this paper, please do comment!

Strategy games and engagement strategies ~ Andrew Gardner (UCL)

Andrew presented the strategy game seminar that had been part of the CASPAR seminar series, centred around the use of strategy games as a means for archaeological engagement. Andrew ran through the development of games, covering Age of Empires and Civilization, and the use of historians to try and ensure historical integrity and accuracy; he also covered how much money the games industry turns over annually, some concepts of strategy development, and how personas and people’s own minds can shape the virtual worlds that they create online or offline (if the game isn’t internet based). The archaeological type games that I tend to play are the more swashbuckling type – Indiana Jones or Lara Croft – and I have tended to stay away from the turn based game concept. Andrew stated that if they had been around when he was doing his PhD, he would never have finished it!

Archaeological TV channels online: an assessment of potential ~ Chiara Bonacchi (UCL), Charles Furneaux (Kaboom Film and Television), Dan Pett (The British Museum)

The final paper of the day was a combination paper between myself, the conference organiser (Chiara) and Charles Furneaux, centred around the engagement potential of archaeological television – TV and web TV. Charles presented during the CASPAR series, but I unfortunately missed out on that! Rounding up this talk is easier than for the others, as I have the presentation!

Charles in his section covered traditional archaeological programming from 2003 to the present by presenting some statistics:

  1. 45 hours of archaeological TV in the UK in 2010
  2. 25 hours in 2009
  3. 2 types of archaeological TV – presenter led and Time Team type shows
  4. In 2003, terrestrial TV showed 185 hours of archaeology and ancient history and 90 hours of heritage
  5. Traditional TV has suffered due to the massive take up of multi-channel TV in homes
  6. Since 1983, ITV’s market share has declined from 48% to 17%!
  7. Charles sees the take up of web TV as a chance for narrowcasting rather than broadcasting, and this presents a great opportunity for archaeologists to engage!

Chiara then presented her section on archaeological web TV, specifically with regards to two Italian channels – Archeologia Viva TV and Sperimentarea TV – and the Archaeology Channel. Chiara covered the uptake of their websites and presented some interesting statistical analysis:

  1. Archeologia Viva TV has a bounce rate of 0.5% (an amazing figure! The Scheme’s website is 30%)
  2. They have 177 videos available online in three categories – news and events, documentaries and conversational pieces
  3. Their page views and visitor numbers are low in comparison to many sites, but the users engage at high quality – they stay for an average of 8 minutes. Does this suggest a maximum length for videos?
  4. Visitors come from 82 countries, but are dominated by Italy

I then presented (badly) on institutional web TV channels, something that is being explored by many national and local institutions as the technological and cost barriers to making video footage come down. Things I identified were:

  1. Low cost platforms – youtube, vimeo, amazon streaming etc
  2. High costs for training and sustainability
  3. The UK produces some very high quality footage online – e.g. Wessex Archaeology, 360 Productions
  4. There’s a wide variety of institutions with successful channels – V&A, ArtBabble at the Indianapolis Museum of Art, Thames Discovery Programme
  5. The most successful videos are shorter, with high production values and punchy, dynamic story telling
  6. People need training in editing and filming, and you need to get engaging presenters
  7. Videos aren’t necessarily viewed a lot
  8. Strong brands can help you get the most views

Overall, our paper tried to show that archaeological TV can survive and thrive on the web platform!

Discussion and next steps ~ Chair: Dan Pett (The British Museum)

For some reason, I was assigned the task of running the discussion at the end of the event, and hopefully I managed to facilitate some good discussion of the day’s topics and the issues they raised. As I was chairing, my memory of this isn’t great! Things covered included: the digital divide between those who can do the digital, because they have the skills, and those who can’t; how to get trained to produce digitally; sustainability; the burden of the web not being put on just one person’s shoulders; and the need for institutional buy-in. We also covered possible topics for the next CASPAR event – a conference at some point – and what things the centre could eventually do. Audience research is an angle that could be pursued, along with training in digital resources and providing templates etc. for how to create and engage in the digital sphere. Mention was made of the excellent work of the Samsung Digital Discovery Centre at the British Museum by Elizabeth Warry, comments were raised relating to the twitter chat, and I basically bored everyone senseless.

We also covered the upcoming Day of Archaeology project that has been generated by Matthew Law, Lorna Richardson, Jess Ogden, Andy Dufton, Stu Eve, Tom Goskar and myself. We also had a very brief presentation by Kathryn Piquette covering the use of RTI for the analysis of archaeological objects – I’m sorry we had to rush that.

We hope that after the workshop, people will consider joining in with this project and documenting the Day of Archaeology and what they do in their working life. Special mention to Pat Hadley for coming down from York – he saw the hashtags and got on the train.

Thank you to Chiara for organising the day.

Twitter-wise, there was a reasonable backchannel – 60+ people attended – and some brief analysis of tweets can be made using various social media tools, looking at all tweets that utilised the hashtag #acrncaspar. I haven’t really done this, but here are some basics:

  1. Twittersentiment showed 78% of tweets about the day were positive
  2. My id was the only one geo-locating tweets
  3. ACRN CASPAR tweets in spreadsheet format can be downloaded
  4. 215 tweets are available
  5. The dreaded wordle looks like the below:
    Wordle of tweets
  6. 38 people tweeted using the #acrncaspar hashtag
  7. The top tweeter was Jess Ogden
Twitterati were:

Jess Ogden also produced a Storify for the workshop; this is embedded below. Many thanks to her!

Archiving twitter via open source software

The Twitter logo from their official set

Over the last few months I’ve been helping Lorna Richardson, a PhD student at the Centre for Digital Humanities at UCL. Her research is centred around the use of Twitter and social media by archaeologists and others who have an interest in the subject. I’ve been using the platform for around 3 years (starting in January 2008) and I’ve been collecting data via several methods, for several reasons: as a backup of what I have said, to analyse the retweeting of what I’ve said, and to see what I’ve passed on. To do this, I’ve been using several different open source software packages: Thinkupapp, Twapperkeeper (open source own install) and Tweetnest. Below, I’ll run through how I’ve found these platforms and what problems I’ve had getting them to run. I won’t go into the Twitter terms and conditions conversation and how it has affected academic research – just be aware of it…..

Just so you know, the server environment that I’m running all this on is as follows: the Portable Antiquities Scheme‘s dedicated Dell machine located at the excellent Dedipower facility in Reading, running a Linux O/S (Ubuntu server), Apache 2, PHP 5.2.4 and MySQL 5.04, with the following mods that you might find useful: curl, gd, imagemagick, exif, json and simplexml. I have root access, so I can pretty much do what I want (as long as I know what I’m doing, but Google teaches me what I need to know!) To install these software packages you don’t need to know too much about programming or server admin, unless you want to customise scripts etc for your own use (I did….) You could probably install all this stuff onto Amazon cloud based services if you can be bothered. I’ve no doubt made some mistakes below, so correct me if I am wrong!

Several factors that you must remember with Twitter:

  1. The system only lets you retrieve your last 3200 tweets. If you chatter a lot like Mar Dixon or Janet Davis, you’ll never get your full archive 🙂 Follow them though, they have interesting things to say….
  2. Search only goes back 7 days (pretty useless, hey what!)
  3. Twitter change their T&C, so what is below might be banned under these in the future!
  4. Thinkupapp and Twapperkeeper use OAuth to connect to your Twitter account, so that no passwords are compromised.
  5. You’ll need to set up your twitter account with application settings – secrets and tokens are the magic here – to do this go to and register a new app, then follow the steps that are outlined in the documentation for each app (if you run a blog and have connected your twitter account, this is old hat!)
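The 3200-tweet ceiling in point 1 shapes how any of these archivers has to work: ask for a page of tweets, note the oldest id seen, then ask for tweets strictly older than that, until you hit the cap or run dry. Here is a sketch of that paging loop, where `fetch_page` is a hypothetical stand-in for whatever authenticated user-timeline call your client library makes:

```python
def archive_timeline(fetch_page, page_size=200, cap=3200):
    """Page backwards through a timeline until the API's cap (or the end
    of the stream) is reached. fetch_page(max_id, count) stands in for an
    authenticated user-timeline request returning newest-first dicts."""
    archive = []
    max_id = None
    while len(archive) < cap:
        page = fetch_page(max_id=max_id, count=page_size)
        if not page:
            break
        archive.extend(page)
        # Next request: only tweets strictly older than the oldest seen.
        max_id = min(tweet["id"] for tweet in page) - 1
    return archive[:cap]

# Quick check against a fake, in-memory timeline of 500 tweets.
store = [{"id": i} for i in range(500, 0, -1)]  # newest first

def fake_fetch(max_id=None, count=200):
    eligible = [t for t in store if max_id is None or t["id"] <= max_id]
    return eligible[:count]

print(len(archive_timeline(fake_fetch)))  # 500
```

The same loop, pointed at a real client instead of `fake_fetch`, is essentially what Tweetnest and Thinkupapp do under the hood (within Twitter's limits).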


Tweetnest is open source software from Andy Graulund at Pongsocket. This is the most lightweight of the software that I’ve been using. It provides a basic archive of your own tweets, no responses or conversation threading, but it does allow for customisation of the interface via editing of the config file. Installing this is pretty simple, you need a server with PHP 5.2 or greater and also the JSON extension. You don’t need to be the owner of the Twitter account to mine the Tweets, but each install can only handle one person’s archive. You could have an install for multiple members of your team, if you wanted to…..

Source code is available on github and the code is pretty easy to hack around with if you are that way inclined. The interface also allows for basic graphs of when you tweeted and search of your tweet stream, and has .htaccess protection of the update-tweets functionality (or you can set up a cron job if you know how to do this.) My instance of this can be found at . Below are a few screen shots of the interfaces and updating functions. The only issue I had with installing this was changing the rewriteBase directive, due to other things I am up to.

Tweet update interface
Monthly archive of tweets


Thinkupapp has been through a couple of name changes since I first started using it (I think it was Thinktank back then), and it is updated regularly, with new β releases and patches appearing frequently. I know of a couple of other people in the heritage sector who use this software (Tom Goskar at Wessex, and Seb Chan of Sydney's Powerhouse Museum mentioned he was using it this morning on Twitter.)

This started as a project by Gina Trapani in 2009; it now has a group of contributors who enhance the software via GitHub, is labelled as an Expert Labs project, and is used by the White House (they had impressive results around the time of the State of the Union speech). This open-source platform allows you to archive your tweets (again, within the limits) along with their responses, retweets and conversations (as a bonus, it can also mine Facebook for pages or your own data, and it supports multiple user accounts). It has graphical interfaces that let you visualise how many followers you have gathered over time and how many tweets you've sent, geocode tweets onto a map (you'll need a Google Maps API key), export to an Excel-friendly format and search your archive. You can also publish your tweets onto your own site or blog via the API, and the system lets you view images and links that your virtual (or maybe real) friends have published in their streams of consciousness. Finally, you can turn on or off the ability for other users to register on your instance, so multiple people can archive their tweet streams.

This is slightly trickier to install than Tweetnest, but anyone can manage it by following the good instructions; if you run into problems, read their Google group. One thing that might present an issue if you have a large number of tweets is a memory error; solve this by adding ini_set('memory_limit', '32M'); to the config file that throws the exception. You might also find a script times out because it takes longer than 30 seconds to run; again, this can be solved by adding set_time_limit(500); to your config file. Other things that went wrong on my install included the SQL upgrades (you can do these manually via phpMyAdmin or the terminal if you are confident) and the Twitter API error count needing to be increased. All easy to solve.

Things that I would have preferred are clean URLs from mod_rewrite as an option, and perhaps coding against one of the major frameworks like Symfony or Zend. No big deal though. Maybe there will be a Solr-type search interface at some point as well; but as it is open source, fork it and create plugins, like this visualisation.
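For what it's worth, the sort of mod_rewrite boilerplate I mean looks something like this generic .htaccess fragment (illustrative only; this isn't something the software ships with):

```
RewriteEngine On
RewriteBase /
# Send requests for non-existent files/directories to a front controller
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?url=$1 [QSA,L]
```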

You can see my public instance online, and there are some screenshots of the interfaces below.

My Thinkup app
Staffordshire hoard retweets

Embed interface
Script to embed your tweet thread into another application

Graphs of followers etc


The Twapperkeeper archiving system has been around for a while now and has been widely used to archive hashtags from conferences and events. Of the software I've been using, this is the ugliest, but perhaps the most useful for trend analysis. However, it recently fell foul of the changes in Twitter's T&Cs, so the really useful features of the original site, namely data export for analysis, have been expunged. Happily, the creator of this excellent software has released an open-source version that you can download and install on your own server, called yourTwapperkeeper. I've set this up for the Day of Archaeology project and added a variety of hashtags to the instance so that we can monitor what is going on around the day (I won't be sharing this URL, I'm afraid….) Code for this can be downloaded from the Google Code repository; again this is an easy install and you just need to follow the instructions. Important things to remember here include: setting up the admin users and who is allowed to register archives; working out whether you want to associate this with your primary account, in case you get pinged for violating the terms of service; and setting up your account with the correct tokens etc. by registering your app with Twitter in the first place.
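The token set-up boils down to pasting four credentials from your registered Twitter app into the config file. The variable names below are placeholders of mine, not the tool's actual config keys; your app's page on Twitter shows the real values:

```php
// Placeholders only — copy the real values from your registered app
$consumer_key    = 'YOUR_CONSUMER_KEY';
$consumer_secret = 'YOUR_CONSUMER_SECRET';
$access_token    = 'YOUR_ACCESS_TOKEN';   // ties the crawler to an account
$access_secret   = 'YOUR_ACCESS_SECRET';
$admin_users     = 'yourhandle';          // who may create and manage archives
```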

Once everything is set up and you start the crawler process, your archive will begin to fill with tweets (from the date at which archiving started), and you can filter the text for retweets, creation dates, terms and so on. With your own install of yourTwapperkeeper you can still export data, but at your own risk, so be warned!
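To give a flavour of the kind of filtering an export makes possible, here's a small sketch (in Python; not part of yourTwapperkeeper itself) that assumes each exported row is a (text, created_at) pair — the field layout is my assumption, not the tool's actual schema:

```python
from datetime import datetime

def filter_tweets(rows, term=None, retweets_only=False, since=None):
    """Filter exported (text, created_at) rows by keyword, RT status and date."""
    kept = []
    for text, created in rows:
        if retweets_only and not text.startswith("RT @"):
            continue  # old-style retweets begin "RT @user: ..."
        if term is not None and term.lower() not in text.lower():
            continue
        if since is not None and created < since:
            continue
        kept.append((text, created))
    return kept
```

From there it's a short step to counting retweets per day or charting a hashtag's activity around an event.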