Showing posts with label inequality. Show all posts

March 11, 2015

New paper accepted in Landscape and Urban Planning!

We're very happy to announce that three-fifths of the collective have recently had a new paper accepted for publication in the journal Landscape and Urban Planning, as part of a special issue on critical visualization edited by Annette Kim, Katherine Foo, Emily Gallagher and Ian Bishop. Taylor, Ate and Matt's paper, "Social media and the city: rethinking urban socio-spatial inequality using user-generated geographic information", builds on our earlier calls to go 'beyond the geotag' in order to develop alternative conceptual and methodological approaches for the use of geotagged social media data, drawing attention to the variety and complexity of socio-spatial processes embedded in such data.

Using Louisville, Kentucky as a case study, our paper examines the socio-spatial imaginary of the '9th Street Divide', separating the city's largely poor and African-American West End from its more affluent and predominantly white areas to the east.


While a more conventional analysis of these inequalities as reflected in geotagged tweets might look a bit like the map above, we argue that such maps of isolated, atomistic dots do little to reveal the nature of inequality between places, and do a disservice to the data itself by stripping it of much of its context. So, rather than just arguing that the West End seems to have a relative lack of tweeting activity compared to other parts of the city -- and thus deducing that the digital divide is persistently reflected in this data -- we put these different areas of the city in comparison to one another in order to understand how both individuals and groups move through the city and (re)produce landscapes of segregation and inequality through their everyday practices and mobilities.


Using a novel method for analyzing this data, we attempt to demonstrate how the idea of the West End as separate and apart from the rest of the city is challenged by the realities of people's everyday movements. Rather than being isolated, West End residents are actually much more spatially mobile within the city, while East End residents tend to be much more confined to their own neighborhoods.

So while the 9th Street Divide remains a key way of understanding and highlighting the spatial dimension of urban inequality in Louisville, we tend to think that this framing actually reinforces the understanding of the West End as a kind of 'problem area'. And while only a partial contribution to this argument, we hope that understanding the West End through its relations with, and connections to, other spaces and places ameliorates the vilification and pathologizing that is so common in discussion of racial and socio-economic inequality in highly segregated cities.

Ultimately, we hope this paper can allow for an alternative conceptualization of urban inequality in Louisville and the West End, while also demonstrating the utility of a situated and contextualized, mixed methods approach to the study of geotagged social media data, emphasizing the full range of socio-spatial processes embedded in this data that can't be captured in just a single point on a map.

The full citation for our paper is below:
Shelton, Taylor, Ate Poorthuis, and Matthew Zook. (Forthcoming) Social media and the city: rethinking urban socio-spatial inequality using user-generated geographic information. Landscape and Urban Planning.

December 18, 2014

Deconstructing the (most detailed tweet) map (ever)

If you’re the kind of person who visits our blog with any regularity, you’re almost certainly also the kind of person who would have seen some version of the map below in the last couple of weeks. Created by Eric Fischer of Mapbox, this map was released along with a blogpost entitled “Making the most detailed tweet map ever”, discussing some of the data cleaning and visualization methods necessary to produce such a striking map. The map is undoubtedly interesting and has sparked a great deal of interest from all corners of the internet, but there’s just something about the framing that rubs us the wrong way. While Eric’s post emphasizes the making part of the equation, the internet hype cycle around it has caused us to read the title a bit more along the lines of:

"Making THE MOST DETAILED tweet map EVARRRR!!!!"

That is to say, for all of the admittedly really great detail about what went into making this map, the framing of this map as not only a detailed map of six billion or so geotagged tweets, but as the most detailed tweet map ever, raises more questions than it answers. For example, what constitutes ‘detail’ in tweet maps? What do competing definitions of ‘detail’ reveal about what we value in this kind of analysis? What do these particular ideas of ‘detail’ foreclose in terms of other possibilities for analysis?

These are important questions, regardless of whether they’re applied to this particular map or any other one. The issue in this case, however, seems to be that the answers to some of these questions conflict with one another, or with the ways the project is itself described. The detail that seems to be valued here is of the “every tweet ever” variety, or, put simply “more = better”, the fetish for bigger data at the expense of all else.

But more data isn’t necessarily better, and it certainly doesn’t mean that there’s more detail, especially when the only bit of detail you're concerned with in each of these six billion points is the latitude and longitude coordinates. Each of these individual tweets contains a wealth of other interesting information, from information about the user and the way they describe themselves, to the time the tweet was created, to the text of the tweet itself, which might contain hashtags that link up with bigger conversations, or @-mentions to other Twitter users that might be used to understand social networks and interactions. All of these bits of information represent a kind of detail that is not included in this, the most detailed tweet map ever.

As we’ve been arguing for the past two years or so, there are a range of social and spatial processes represented in geotagged tweets that we can’t get at if all we’re concerned with is the latitude and longitude coordinates. So to say that this represents the most detailed tweet map ever serves to reify what we see as two of the most problematic assumptions of contemporary big data/social media research: (1) that more data is equivalent to better data, and (2) that the only important aspect of the data is the geographic coordinates attached to it. There's lots of interesting stuff that can be done with this kind of data, and we can do better than simply plotting points on a map and calling it a day [1].

Even if one were inclined to accept the argument that more tweets equals more detail, how should we interpret the fact that this map only visualizes about 9% of all geotagged tweets, due to the design decisions necessary in order to make the map nice and pretty [2]? Due to the existence of exact or near-duplicate coordinates that would make points indistinguishable from one another, this, the most detailed tweet map ever, actually eliminates about 91% of the detail that it seems to value most (i.e., the presence or absence of points on the map). The Gizmodo headline about the map reads, “The Most Detailed Tweet Map Ever Includes 6,341,973,478 Tweets”... except that, you know, it doesn’t [3].

Of course, there’s also a good bit of imprecision in the locational accuracy associated with geotagged tweeting; our iPhones don’t come with military grade GPS units installed in them. So while Mapbox CEO Eric Gunderson was marveling at the detailed micro-geographies of an airport gate seen in the map, he was ignoring both the fact that all of those folks on the jet bridge could just as well have been 40 feet away, and that a number of tweets might have been eliminated from the initial dataset due to a lack of precision in the geotagging process. Take all of that together and a lot of the detail that’s being celebrated here starts to give way to fuzziness. This map is more art than science, though the striking visuals and discursive framing give the illusion of precision and absolute insight.

To be clear, there’s no problem with fuzziness. It’s something we all live with every day, and it’s something we academics may embrace from time to time through the use of overly obtuse language. But taking all of this fuzziness and then repackaging it as the most detailed tweet map ever comes off a bit wrong to us. These initial misgivings were only amplified when brought down to a more local level, when we saw a post from a local urbanist blog in Louisville wondering “What we can learn from where people in Louisville are using Twitter”. While relatively mundane, and certainly not nearly as celebratory, the blog’s ultimate conclusion was that “These locations [with the highest concentrations of tweets] make sense as they are places where people gather and are often held captive by events.”


This, in general, is true, but also a bit… how do we put it? Meh. More fundamentally, people tweet where people are. It comes as no surprise to anyone with even the vaguest familiarity with Louisville that people tweet in larger numbers from downtown (including 4th Street Live!), the University of Louisville campus, Bardstown Road and the St. Matthews / Oxmoor Mall area than anywhere else in the city. These are (some of) the primary gathering points on a day-to-day basis within the city.

But just identifying these locations doesn’t really help us to ‘learn’ anything beyond the fact that those are, indeed, the places with the highest concentrations of geotagged tweets in Louisville [4]. In fact, the map doesn’t even really show us actual concentrations of tweeting activity, but rather concentrations of unique tweeting locations. Take, say, two hypothetical city squares, one of just 50 x 50 meters, and another much larger one of 500 x 500 meters, both the originating point of one million geotagged tweets spread randomly over the squares. In Fischer’s method, these two squares would not 'glow' in equal amounts, but rather the larger square would show up as much more visually prominent because it has many more unique tweeting locations while many of the tweets from the smaller square would be filtered out due to a duplication of coordinates.
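To make the two-squares thought experiment concrete, here is a minimal sketch in Python. It assumes that coordinates are snapped to a 1 meter grid and that tweets sharing a grid cell collapse into a single dot. Both are illustrative assumptions on our part, not a description of Fischer's actual filtering, and the function name `unique_locations` and its parameters are hypothetical.

```python
import random


def unique_locations(side_m, n_tweets, cell_m=1.0, seed=0):
    """Count distinct grid cells hit by n_tweets random points in a square.

    side_m   -- side length of the square in meters
    cell_m   -- assumed coordinate resolution (hypothetical, 1 m by default)
    """
    rng = random.Random(seed)
    cells_per_side = max(1, int(side_m / cell_m))
    seen = set()
    for _ in range(n_tweets):
        # Pick a random point and snap it to its grid cell.
        col = min(int(rng.random() * side_m / cell_m), cells_per_side - 1)
        row = min(int(rng.random() * side_m / cell_m), cells_per_side - 1)
        seen.add((col, row))
    return len(seen)


if __name__ == "__main__":
    n = 1_000_000
    for side in (50, 500):
        count = unique_locations(side, n)
        print(f"{side} x {side} m square: {count} unique locations")
```

Under these assumptions the small square can never contribute more than 2,500 distinct points (50 x 50 cells), no matter how many tweets originate there, while the large square has room for up to 250,000.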

Further, from a data collection standpoint, all of these tweets in Louisville reveal little that isn't revealed by mapping a random sample of tweets (say 1% of tweets from 2013, see map below). If all we’re really concerned about is the question of where people are tweeting from, there isn’t much that looking at all the tweets reveals that couldn't also be found from a smaller subset, and it’s much easier to collect or analyze a few hundred thousand tweets than it is to collect 6,341,973,478 of them. But even still, all we can ‘learn’ from these kinds of maps is where people have created geotagged tweets and, to some extent, where they have not [5].


But if that’s all we can learn from this map, again, why call it the most detailed tweet map ever? Again, there are any number of details that are excluded from analysis by only looking at the locations of geotagged tweets. What if we instead took a different approach to this data, such as examining the use history of individual Twitter users, or even collectives of Twitter users based on some kind of shared experience or identity, such as association with particular neighborhoods or other places?

OK, you're right. This particular question is a bit self-serving, as this is precisely the kind of thing we've been working on for some time now. And so rather than just offering a critique of someone else's work, we really want to see if we can push this kind of analysis in more productive directions. So we offer up the map below, which comes from a paper we currently have under review, that attempts to demonstrate how geotagged tweets can help us to better understand urban socio-spatial inequality beyond simply identifying the presence or absence of tweets in a given area, as is so often done.


Using Louisville and the now-common ‘9th Street Divide’ trope as a starting point, we set out to understand how people from different parts of the city used and moved around the city in different ways. So in a manner not uncommon to some other things Eric Fischer has done previously, we identified a number of Twitter users as belonging to one of two groups, those with close ties to either the West End (traditionally a poorer and predominantly African-American part of the city) or the East End (a more affluent and largely white part of the city), and collected all of the geotagged tweets from those users [6]. We then compared the spatial footprint of these groups' tweeting activity via an odds-ratio measure. On the map, areas in purple represent places with greater-than-usual levels of West End user tweeting activity, while orange hexagons represent places where East End users were relatively more dominant than expected. Those places which demonstrate roughly equivalent or expected levels of tweeting are signified by those hexagons with hashes.
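For readers who want a feel for what an odds-ratio comparison looks like, here is a minimal Python sketch. This post doesn't spell out the exact formula, so what follows is one standard form of the odds ratio for a single hexagon and not necessarily the version used in the paper. The 0.5 continuity correction, the `tolerance` cutoff and both function names are assumptions made purely for illustration.

```python
def odds_ratio(west_here, west_total, east_here, east_total, correction=0.5):
    """Odds of a West End tweet falling in this hexagon, relative to East End odds.

    `correction` is added to each cell of the 2x2 table so that hexagons
    with zero tweets from one group do not cause division by zero
    (an assumed choice, not taken from the paper).
    """
    a = west_here + correction                  # West End tweets in hexagon
    b = (west_total - west_here) + correction   # West End tweets elsewhere
    c = east_here + correction                  # East End tweets in hexagon
    d = (east_total - east_here) + correction   # East End tweets elsewhere
    return (a / b) / (c / d)


def classify_hexagon(ratio, tolerance=0.25):
    """Label a hexagon; `tolerance` is a hypothetical band around 1.0."""
    if ratio > 1 + tolerance:
        return "west"    # more West End activity than expected
    if ratio < 1 / (1 + tolerance):
        return "east"    # more East End activity than expected
    return "mixed"       # roughly the expected balance


if __name__ == "__main__":
    print(classify_hexagon(odds_ratio(50, 100, 5, 100)))
    print(classify_hexagon(odds_ratio(10, 100, 10, 100)))
```

A ratio near 1 means both groups tweet from the hexagon about as often as their overall activity would suggest, which corresponds to the hashed hexagons described above.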

This map, in short, represents those places in the city of Louisville which are more socially heterogeneous and homogeneous, dominated either by West End or East End residents, or characterized by a relative mix of people from both parts of the city. Though it’s evident that there is indeed a kind of divide between the West End and the rest of the city, this map also shows that West End residents are incredibly spatially mobile within the city, while East End residents tend to be much more spatially constrained, sticking to their own parts of town.

While there are certainly a lot of underlying factors driving this process, suffice it to say that this map provides an alternative way of understanding socio-spatial inequality than simply identifying those places that do or do not have significant concentrations of geotagged tweets [7]. Through our analysis, we also learned that, contrary to the kind of assumptions often made about this kind of informational inequality, West End users actually produce a significantly greater number of geotagged tweets than their East End counterparts; it's just that many of these tweets are created in other parts of the city. This is, of course, an important kind of detail that we can draw from the mapping and analysis of geotagged tweets and one that, in many ways, is more detailed than the most detailed tweet map ever.

There is, of course, a whole lot more detail in the paper that this one map and blog post can’t capture, just as is the case with Eric Fischer’s map. Just to be clear, we think Eric Fischer does some fantastic and beautiful work with geotagged social media data, and commend him for openly discussing and sharing his methods. And yet, we can’t help but feel like the characterization of his map as being the most detailed tweet map ever is at best a half-truth, and helps to reproduce some of the most common problems with the analysis of geotagged social media data. But the more we think about it, we’re not so sure that a single most detailed tweet map could exist, or that it’s even desirable to have such a thing. Instead, we should be striving to create any number of highly-detailed, geographically-situated tweet maps, that collectively contribute to better understandings of the complex social and spatial processes that are represented and reproduced through this kind of data. 

----------------
[1] That’s the royal we. 
[2] Which it most certainly is.
[3] As Fischer notes, there are actually no more than about 590 million dots on the map due to his filtering process. When one zooms all the way out on the map so that the entire globe is represented in a single map tile, there are only 1,586 visible tweets, a far cry from the 6 billion number that seems so, well… big.
[4] #tautology
[5] This is qualified in this way because, as Kenneth Field pointed out in a Twitter exchange with Eric Fischer about these maps, geotagged tweets that he has consciously created from his house do not appear on the map. So while we know that all of the tweets on the map were created in that place, we can't say definitively that tweets were not also created in places where they do not appear on the map.
[6] In order to do this classification, we collected all geotagged tweets created within the defined boundaries of these two areas, and then identified those users with more than 40 tweets within either area, where those 40+ tweets represented greater than 50% of their overall geotagged tweeting activity. This concentration of activity indicates that users had a strong association with, and presence within, either area, while also making sure that no users were identified as belonging to both areas.
[7] We also see this map as complicating the conventional narrative in Louisville of 9th Street as representing a kind of impenetrable barrier within the city. But since this is less directly relevant to our argument here, we'll make you wait to hear more about that particular line of reasoning.
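The classification rule described in footnote [6] can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production code: it assumes each tweet has already been tagged with the area it falls in (`"west"`, `"east"` or `None` for anywhere else), and the function and parameter names are hypothetical.

```python
from collections import Counter


def classify_users(tweets, min_tweets=40, min_share=0.5):
    """Assign users to an area following the rule in footnote [6].

    tweets -- iterable of (user_id, area) pairs, where area is "west",
              "east" or None (tweet created outside both areas).
    A user is assigned to an area when they have more than `min_tweets`
    tweets there and those tweets are more than `min_share` of all their
    geotagged tweets.
    """
    totals = Counter()
    in_area = Counter()
    for user, area in tweets:
        totals[user] += 1
        if area is not None:
            in_area[(user, area)] += 1

    labels = {}
    for (user, area), count in in_area.items():
        if count > min_tweets and count / totals[user] > min_share:
            labels[user] = area
    return labels


if __name__ == "__main__":
    sample = [("a", "west")] * 41 + [("a", None)] * 10 + [("b", "east")] * 12
    print(classify_users(sample))
```

Because the share has to exceed 50%, a user can never satisfy the rule for both areas at once, which is what keeps the two groups mutually exclusive.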

July 19, 2010

Obesity, Beer and Christianity: Or Correlation does not equal causation

One of the basic rules in statistical analysis is that correlation does not equal causation. But in the hot days of a Kentucky summer one often gives in to temptation, especially if the graphs look good.

We therefore leave it to our readers to jump to the unsupported causal relationship. Sorry, you'll have to work/think through this one yourself.

Y-axis: Percentage of a state's population that is Obese
X-axis: Number of Placemarks with Keyword Beer / Total number of Placemarks
Bivariate correlation (-0.45)


Y-axis: Percentage of a state's population that is Obese
X-axis: Number of Placemarks with Keyword Christianity / Total number of Placemarks
Bivariate correlation (0.729)



Although the nature of the graphs invites one to believe that Christianity is somehow responsible for obesity, this is no doubt a spurious correlation. It is well known that obesity and religious practice are strongly related to income. One can see this in which states are clustered at the extremes.
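For anyone who wants to see how a shared third factor can manufacture a correlation like this, here is a small Python sketch. The numbers are entirely synthetic and have nothing to do with our placemark or obesity data: two variables are generated so that each depends only on a hypothetical "income" value and random noise, never on each other. The function names and the noise level are assumptions for illustration.

```python
import random


def pearson(xs, ys):
    """Bivariate (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_confounder(n=50, noise=0.5, seed=1):
    """Correlate two synthetic variables that share only a common cause."""
    rng = random.Random(seed)
    income = [rng.gauss(0, 1) for _ in range(n)]
    # Each variable falls as income rises; neither influences the other.
    var_a = [-z + rng.gauss(0, noise) for z in income]
    var_b = [-z + rng.gauss(0, noise) for z in income]
    return pearson(var_a, var_b)


if __name__ == "__main__":
    print(f"correlation between the two synthetic variables: {simulate_confounder():.2f}")
```

The two variables end up strongly correlated even though, by construction, neither causes the other.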

Why places with a high percentage of beer references are less obese is a bit more difficult to explain.

Don't worry, we have more. We particularly like the relationship between placemarks with the terms falafel and feminist.

June 09, 2010

2010 Internet Penetration Rates

Today's post comes courtesy of data available from Internet World Stats. The map below presents the most recent statistics on global internet usage. The shading reflects the proportion of the population that uses the internet within each country. The height of each bar indicates the total number of internet users in each country.

Iceland has the world's highest penetration rate: over 93% of the population are internet users. Almost all of Europe and North America also have relatively high rates (at least at the national scale, as there are likely to be significant digital divides in every country). China, interestingly, is already home to the world's largest population of internet users (384 million) despite having a penetration rate of less than 30%. India is another interesting case. 81 million Indians are internet users (there are more Indian internet users than there are people in the UK), yet this represents only 7% of the Indian population.
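As a quick back-of-the-envelope check, a penetration rate is just internet users divided by total population, so the figures above also imply a population size. The Python sketch below uses only the numbers quoted in this post, and the function name is hypothetical.

```python
def implied_population(internet_users, penetration_rate):
    """Population implied by a user count and a penetration rate (0 to 1)."""
    return internet_users / penetration_rate


if __name__ == "__main__":
    # India: 81 million users at a 7% penetration rate.
    india = implied_population(81_000_000, 0.07)
    print(f"Implied population of India: {india / 1e9:.2f} billion")

    # China: 384 million users at "less than 30%" gives a lower bound.
    china_min = implied_population(384_000_000, 0.30)
    print(f"Implied population of China: more than {china_min / 1e9:.2f} billion")
```

Because both rates are rounded in the text, these are rough figures rather than exact population counts.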

June 03, 2010

International Internet Bandwidth

Today's map displays international internet bandwidth globally. "International bandwidth" is another way of referring to the contracted capacity of international connections between countries for transmitting Internet traffic. These data are kindly made available from the World Bank's new open data initiative.

Like most other geographies of Internet-related data, the patterns in this map are highly uneven. Countries in northern Europe generally have the most available kilobits per person. The Netherlands has 78kb per person, Sweden 50kb, and the UK 40kb. A number of micro-states and small nations also score highly on this measure: Hong Kong (not displayed on the map) has 315kb per person, Singapore has 23kb, Antigua and Barbuda has 17kb and Panama has 16kb. Surprisingly, the United States has fewer available kilobits per person than any of these countries (11kb).

At the other end of the scale, there is a long tail of countries in Africa, Asia and South America that have less than 1kb per person. Guinea, for instance, has only 0.21 bits (0.00021kb) per person (our next post will focus specifically on bandwidth in Africa).

These data seem to mirror the geographies of content at the global scale, a topic we plan on exploring in much more detail in a future paper.

March 08, 2010

References to Slum, Ghetto and Poverty

Building upon our earlier maps of rich and poor, we were curious whether there was much difference between user generated references to slum, ghetto and poverty.

As the global map below indicates, the differences seem to be primarily based on language, with references to poverty dominating in English-speaking countries (the U.S., Canada, the U.K., Australia and New Zealand).

Global Map of Slum, Ghetto, and Poverty

This difference becomes clearer as one zooms into the European regional level, where references to ghetto appear to be most prevalent in non-English speaking zones. It is likely that this is tied to ghetto being a more internationalized term than poverty, and thus showing up more outside the Anglophone world. But overall one can see that these particular search terms are not used ubiquitously across language groups, highlighting again the importance of using non-linguistic keywords for search, e.g., the number 1, or words that are generally unchanged across space, such as the names of well-known international figures like "paris hilton" [1] or "osama bin laden" (this is probably the first time those two have been in the same sentence!).

European Map of Slum, Ghetto, and Poverty


Looking at North America is therefore helpful, as it is largely English-speaking (apologies to Quebec and Mexico). While it is clear that there are more mentions of poverty than either slum or ghetto, there are some intriguing patterns.

North American Map of Slum, Ghetto, and Poverty

For example, places where references to slum are the most prevalent are relatively rare but do seem to correspond with poor areas such as Watts in Los Angeles and some neighborhoods in Philadelphia and New York. The term ghetto also appears to be most frequent in urban settings (although not all) with the cities of Tampa, Gainesville, Atlanta, Dallas, Houston, San Antonio, Phoenix, Oakland and Sacramento representing clusters.

Since the term poverty greatly overshadows occurrences of slum or ghetto, we also generated a map with just those two terms. It is not clear why these differences exist, but they may simply point to regional linguistic preferences within the U.S.

North American Map of Slum and Ghetto

Again, this mapping does not signify a particular economic fortune in any one area, but rather the prevalence of an array of terms associated with economically disadvantaged areas. Still, it produces some intriguing patterns.

[1] Which of course brings up its own problems with the city of Paris and the Hilton Hotel. Maybe we should try Nicole Richie instead?

March 01, 2010

Rich and Poor Placemarks

So what happens when you search for user generated placemarks containing the words rich and poor? We didn't know but now we do.

Overall the world of user generated data seems to be a fairly rich place. Which is not altogether surprising since the ability to even create a Google placemark (access and ability to use a computer) suggests a certain level of affluence in a world where half the population lives on less than $2.50 a day. That's one reason why much of the globe doesn't have any placemarks at all.

Global Map of Rich and Poor


So it makes most sense to examine particular regions more closely. The East Asian countries of Japan, South Korea (note the clear difference with North Korea) and Taiwan are mostly spotted with "rich" placemarks. Likewise in China (which doesn't have many placemarks in general, a topic for another posting), "rich" is associated with the wealthy coastal regions, such as the economic powerhouses of Shanghai, Fuzhou and Guangzhou.

East Asian Map of Rich and Poor


Moving westward one sees that Europe is much more placemarked (is that even a word?) in general than Asia. But within this, there are interesting patterns as one moves south, east and north from the historic core of Europe. France, the Benelux countries, Germany and Italy systematically have more placemarks referencing rich than poor. But as one moves into the areas of Spain/Portugal and Greece/Turkey, the pattern becomes more varied. There are both fewer placemarks in general and those that do exist are more likely to have references to poor. Perhaps the most striking example is Britain, where the core region around London is tagged as rich and, as one moves northward, there is an increasing number of placemarks referencing poor.

European Map of Rich and Poor


The pattern in the North American context is much less clear. One can see the Northeast (stretching from Massachusetts to DC) is primarily tagged as rich. This tendency toward rich is mostly maintained along the entire coastline. Moving inland, the patterns become much less clear, with the rest of the country seeming to be a nearly equal combination of rich and poor.

U.S. Map of Rich and Poor

February 10, 2010

Where Users Like to Vacation

Over the past few months, we've published a number of maps showing the automatically- and user-generated online representations of place, from the seedy to the holy to the hoppy. Perhaps you've found yourself thinking, "I'd sure like to go there!", wherever there may be. So where exactly is it that people want to go?

The following maps show the incongruities between these automatically- and user-generated representations of place when searching for "tourism" and "vacation" in Google Maps. The values in each of the four maps were normalized using the national average for each search term, with any points not 20% greater than the average (indexed value >1.2) being excluded. These maps thus specifically show the places in which there is a higher-than-average concentration of placemarks (either user-generated or directory) mentioning the words "tourism" or "vacation".
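The normalization step described above can be sketched roughly as follows in Python. We assume here that the "national average" is a simple arithmetic mean of placemark counts across all points for a given search term. The function name, the input format and the sample counts are hypothetical and only meant to show the indexing and the 1.2 cutoff.

```python
def above_average_places(counts, threshold=1.2):
    """Index each place against the national mean and keep the high ones.

    counts    -- dict mapping a place to its number of placemark hits
    threshold -- minimum indexed value to keep (1.2 = 20% above average)
    Returns a dict of place -> indexed value for places above the threshold.
    """
    national_mean = sum(counts.values()) / len(counts)
    indexed = {place: n / national_mean for place, n in counts.items()}
    return {place: value for place, value in indexed.items() if value > threshold}


if __name__ == "__main__":
    # Made-up counts for three unnamed places, for illustration only.
    hits = {"place_a": 10, "place_b": 20, "place_c": 30}
    print(above_average_places(hits))  # only place_c (index 1.5) remains
```

Indexing against the average in this way is what lets maps for different search terms be read on a common scale, regardless of how many placemarks each term returns overall.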

Tourism: Directory

Tourism: User-Generated

Perhaps the starkest contrast between these maps of tourism is the much smaller number of user-generated placemarks as compared to the automatically-generated directory placemarks, usually drawn from pre-existing sources like the Yellow Pages. In moving from directory to user-generated representations, almost all rural locations disappear from the map, although the vast areas west of the Mississippi River with no information at all show that even some urban areas don't possess larger-than-average amounts of tourism-related information.

Vacation: Directory

Vacation: User-Generated

Shifting our attention to searches for "vacation", it is interesting that in this case, user-generated representations still have considerable coverage across the United States. Moreover, user-generated references to vacation differ from the "official" map of vacation based on Google Maps directory listings.[1] That is, "vacation" shows up most often in New York City in the Google Maps directory, but user-generated representations show that Orlando, Florida, the home of Disney World, is the place to go on your coveted break each year.

God help us all.

Take note as well, that coastal areas all across the United States are prominent in the peer produced constructions of vacation, from the coastal Carolinas and Georgia to the Gulf Coast, and even throughout California, Oregon and Washington. So perhaps there is hope of eluding our mouse overlords after all.

Most importantly, these maps call our attention to the significant variances in how place is perceived online, depending on what measures are being used to represent these constructions. Even if it's possible to dig a hole through the planet on Google Earth, the difference between, and within, places remains as important as ever.

[1] This is also one of the few cases in which the maximum value in a map deviates from one of the nation's largest urban areas.

June 22, 2009

Information Inequality

Following on from the last post, here are some examples of Google placemark inequality:

First of all, China offers perhaps one of the most striking examples of regional disparities. Beijing, Shanghai, and the Pearl River Delta Region all are characterized by heavy information densities. In other words, a lot of information has been created and uploaded about these places. However, much of the rest of the country has very little cyber-presence within the Google Geoweb. In the map below, the height of each bar is an indicator of the number of placemarks in each location.


The U.S.-Mexico border along the Rio Grande river offers a similarly striking contrast between high and low information densities.


The border between North and South Korea offers another example of placemark density not being correlated to population density. For obvious reasons, very little information is being created and uploaded about North Korea. In the map below (top), each dot represents 100+ placemarks. Interestingly, there are strong similarities between the map of placemarks on the Korean Peninsula, and satellite maps of lights visible from the Peninsula at night (bottom).


image source: globalsecurity.org

Information inequalities are clearly a defining characteristic of the Geoweb. Some places are highly visible, while others remain a virtual terra incognita. In particular, Africa, South America, and large parts of Asia are being left out of the flurry of mapping that is happening online (e.g. the Tokyo/Yokohama metro region has almost three times as many 0/1 placemark hits (923,034) as the entire continent of Africa (311,770)). Some of the geographical implications of cyber-visibility and invisibility have been examined in part (e.g. here and here), but there is clearly a lot more to be discussed. In particular, because Google allows any keyword to be searched for (not only "0" and "1"), we are able to explore not only the raw amounts of information attached to each place, but also the contents of that information.