All Points Blog
Our Opinion, Your Views of All Things Location

Monday, February 25, 2013

Harvard’s TweetMap (ALPHA): Explore 125 Million Tweets

Ben Lewis, who's over at the Harvard Center for Geographic Analysis, shared that TweetMap, built on MapD (a general-purpose SQL database) and Harvard's WorldMap, is up and running in ALPHA.

Officially, "TweetMap ALPHA is an instance of the MapD big data platform developed through a collaboration between Todd Mostak and Harvard CGA." I corresponded with Mostak to learn a bit about the project and its future.

TweetMap allows the exploration of some 125 million tweets posted from 12/10/2012 to 12/31/2012. Visitors can query them by time, space, and keyword. The hope is to increase the size of the database, perhaps to billions of tweets. Real-time streaming, from tweet-tweeted to tweet-on-the-map in under a second, has been implemented. MapD runs on any number of commodity graphics processing units (GPUs), using however many it has access to. Todd Mostak notes, "it runs equally well on my laptop with 1 GPU as our demo server with 4 as a Dell GPU server with 16 (of course the more GPUs you have, the faster things will run and the more data you can store)." GPUs, and their role in geospatial, are covered in this Directions Magazine article.
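For readers who want a feel for what a time, space, and keyword query involves, here is a minimal Python sketch. It is not MapD or TweetMap code: the `Tweet` record, the `query` function, and the bounding-box convention are all assumptions made for illustration, and it scans a plain list on the CPU rather than using a GPU.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    # Hypothetical record layout; the real TweetMap schema is not described in the post.
    text: str
    lat: float
    lon: float
    created: datetime


def query(tweets, keyword=None, start=None, end=None, bbox=None):
    """Return the tweets that pass every filter that is not None.

    bbox is (min_lon, min_lat, max_lon, max_lat), an assumed convention.
    """
    results = []
    for t in tweets:
        # Keyword filter: case-insensitive substring match.
        if keyword is not None and keyword.lower() not in t.text.lower():
            continue
        # Time filter: inclusive on both ends.
        if start is not None and t.created < start:
            continue
        if end is not None and t.created > end:
            continue
        # Space filter: simple bounding-box test.
        if bbox is not None:
            min_lon, min_lat, max_lon, max_lat = bbox
            if not (min_lon <= t.lon <= max_lon and min_lat <= t.lat <= max_lat):
                continue
        results.append(t)
    return results
```

A call such as `query(tweets, keyword="obama", bbox=(-72.0, 42.0, -70.0, 43.0))` would return only the matching tweets inside that box. A system like MapD would presumably express the same filters in SQL and evaluate them in parallel.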

Harvard users (with a log-in) can even download the tweets found by their queries. The rest of us can see the results as individual "dots" (with details of the tweet content, data, lat/long, etc.) and/or see a heat map. The one at right is a query for "Obama" across the entire time frame. I also searched for "adena" and found but a handful - many around a geography with that name.

What's next? Mostak shares:

...we will soon allow for spatial joins/intersections of points to polygons.  This means that the user could upload an arbitrary shapefile of say census districts and basically find the average sentiment of tweets containing the word "Obama" in each district and then regress that against attribute data, such as income or education level for the district.  On 4 GPUs we should be able to do around 4 billion such joins per second, as opposed to PostGIS or ArcGIS which seem to top out at 10,000-20,000 such operations per second, allowing real-time choroplething and regression analysis of spatial data for datasets which might take PostGIS or ArcGIS many days to do the same thing.
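To make the workflow Mostak describes concrete, here is a small Python sketch of a point-to-polygon join, followed by a per-district average and a simple regression. It is an illustration only, not how MapD, PostGIS, or ArcGIS implement these operations. The function names, the ray-casting test, and the idea that each tweet already carries a numeric sentiment score are assumptions.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mean_sentiment_by_district(tweets, districts):
    """Join points to polygons and average a score per polygon.

    tweets: iterable of (lon, lat, sentiment); the sentiment score is hypothetical.
    districts: dict mapping a district name to its polygon.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, sentiment in tweets:
        for name, polygon in districts.items():
            if point_in_polygon(lon, lat, polygon):
                totals[name] += sentiment
                counts[name] += 1
                break  # assume districts do not overlap
    return {name: totals[name] / counts[name] for name in counts}


def least_squares(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With the per-district averages in hand, `least_squares` could be called with a district attribute such as income as `xs` and the averages as `ys`. The nested loop here tests every tweet against every district one at a time, which is the kind of work a GPU-based system would spread across many cores.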

by Adena Schutzberg on 02/25 at 04:34 AM

© 2014 Directions Media. All Rights Reserved