Living in Data by Jer Thorp

Adam Marks
Apr 13, 2023

For whatever reason, this book was a bargain basement find at my local bookstore, but lucky me: although I had never heard of Jer Thorp before I picked up his book, by all accounts he is a very reputable and experienced source of knowledge regarding all things data. Living in Data is labeled as “a citizen’s guide to a better information future”, and Mr. Thorp writes about his past decades of experience as a data researcher, engineer, team leader, teacher, first data artist in residence at The New York Times, innovator in residence at the Library of Congress, and National Geographic explorer. His wealth of experience in the field informs his layers of advice and conclusions, namely that we need to create a world of data that is more livable and more respectful of those we are collecting it from, and that we need to better understand the imperfections and bias of the data we are looking at. Further, in his words, we need to “treat the data and the systems it is lived in not as an abstraction but as a real thing with particular properties, and understand that how data is collected and stored deeply affects the ways in which it might be later used to make choices”. Mr. Thorp is a deep and curious thinker, and his stories of being on the front lines of data collection all around the world really make an impact on the reader, because the book never drowns in facts, statistics, and yes, data. Rather, Mr. Thorp emphasizes the visualization of the data he has collected, and makes the case that although human beings have created the current systems of data collection, human beings also have the power to change them for the greater good. A lofty goal, perhaps, but these are systems that we simply need to re-envision so that the proper information and data reach the people who truly need access to them. My notes below reflect many of Mr. Thorp’s summaries and conclusions from his excellent book.

  • risk is something that does not affect people equally or at the same time
  • question farming: using visualization to unfurl its complexities in interesting ways, exposing things that weren’t able to be seen before
  • book is a guide for those who have to live in data and for those who want to create data worlds that are more livable
  • closing the loop between data and the people from whom the data comes — critical theme of the book as well
  • data is never perfect, never truly objective or truly real
  • measurement, a document
  • data comes from us, but it rarely returns to us
  • from us, but it isn’t for us
  • to live in data is to be used, to be without agency, and to be overwhelmed with complexity
  • we live in data but have not yet figured out how to be citizens of it
  • participate in it, find agency in it, actively engage and meaningfully resist
  • “data” first appeared in the English language on loan from Latin
  • “a thing given, a gift delivered or sent”
  • we can map the unique relation between a pair of words and a different pair of words
  • using a neural network
  • data is reductive
  • authorship, after all, is not only what is created but also what is selected
  • “capta”, from the Latin for “to take” rather than “to give”: Drucker’s new word for data
  • knowledge is something that is constructed, we accept the truth of our own role in its creation
  • data about anything is as much a record of the human doing the measuring as it is of the thing that is being measured
  • what if data was a verb?
  • data is not collected and then left alone: used as a substrate for decision making and as an instrument for differentiation, discrimination, damage
  • data is not found, it is constructed
  • we make it ourselves
  • a human artifact
  • data’s construction acts in a real way on the world, that in making data we change the systems from whence it came
  • need to be able to use it as a system and a process
  • treat the data and the systems it lived in not as an abstraction but as a real thing with particular properties, understand these unique conditions as deeply as the author can
  • how data is collected and stored deeply affects the ways in which it might be later used to make choices, and tell stories
  • the very decision of what to collect or what not to collect is political
  • mismatch between incentives and resources
  • burden of collection can be higher than the benefits that come from having it
  • has to be believed
  • some data resist metrification
  • data can be missing because someone didn’t want it to be collected
  • collection of data is an act of power, so is the decision not to collect it — census data
  • in 2019, Black people killed by police was 1,164
  • no correlation between the per capita violent crime rate and police killings in America’s fifty largest cities
  • policies are very different, depending on the police department
  • twin strategies, see what is absent, ask what is being collected but also what is not
  • recognize these areas of data neglect as places where we can find power
  • expedition is a word that rings with romance, and colonialism
  • crucial questions: who is the data we are collecting benefiting?
  • how might the data we’re collecting put people, animals, ecosystems at risk?
  • do the benefits of data collection outweigh the potential risks?
  • to go out and collect data from a place where you don’t live, or from people who aren’t your own, is to brush up closely against authoritarianism and colonialism
  • pair of principles for the author: no datafication without representation, listening carefully to those whom you’re collecting from
  • when in doubt, don’t collect
  • the act of collection does touch those who live in data
  • accrue harm over time
  • around the turn of the century the web slowed down, content got bigger, placement of advertising
  • cookies are local files stored on your computer that hold information about your online behavior
  • cookie stored by the New York Times contains 177 pieces of data about you
  • the ad targeting machine itself is biased, breaking the protections laid out in the federal Fair Housing Act and the Constitution — one study
  • the name “Loyalty Program” is a lie — point wasn’t to keep you coming back but to follow you when you went elsewhere: Air Miles programs
  • connection of a consumer’s activity across a number of different corporate terrains
  • data between data might be called interdata
  • interdata are records or measurements that act as bridges between two data sets, like your Social Security number
  • all three of the most popular facial recognition tools performed worse on Black faces than on white ones
  • software industry is predominantly white and male
  • revenue from ads where a user’s cookie is available is just 4 percent higher than when it isn’t — one study
  • increase of earnings of just 0.00008 cent per ad
  • your information as capital, wielding it in aggregate as a bargaining chip for their customers
  • data is the new oil
  • amount of data that can be conjured from any given thing is almost limitless
  • data begets data, which begets megadata, repeat, repeat
  • quantitative data is often the only written record of people who never learned to write, such as slaves, the poor, and, in early America, many women
  • often the only extant information about all kinds of people who went about their daily business making no special mark on history — most people
  • true/false values are called Booleans
  • Boolean logic, the ways in which these true/false values can be added or multiplied or divided, is central to computer code
  • where data is trimmed or transfigured to match the expectations of the machine can be called schematic bias
  • computers run by executing a set of instructions; an algorithm is such a set
  • a series of tasks is repeated until some particular condition is met
  • they can also be biased, coded to weight certain values over others
  • magnifiers, metastasizing existing schematic biases
  • 95 percent of the 70 percent of the earth that is ocean lives permanently in blackness
  • most important thing about visualization as a form of telling: that it is a simple thing
  • data representation of any kind is a human act, full of human choices
  • at any given point in time, there are more than a million people in the air
  • kooky comparison: when a piece of information is held against another that purposely contrasts, forcing a viewer to think about both data in different ways
  • data, presented in new ways, might act at a critical time as a tether between our own human lives and the natural world around us
  • the word “advocate” comes from the Latin “advocare”, to call for help
  • is it possible that by letting data speak across larger scales and by listening together, we might better understand and address today’s urgent issues?
  • “Public” is any data that is available in the public domain
  • some kind of civic purpose
  • “open” typically refers to if and how the data is accessible
  • the internet as it exists is hardly a public space
  • today’s public data mostly supports technically adept entrepreneurs
  • Silicon Valley has pushed a particular libertarian agenda in which public spaces have little value
  • a particular group of people, living in a specific place, have their own unique experience of living in the data — all data are local
  • data sovereignty is the idea that the information stored in each of these data centers is governed not by the laws of the nation in which it is stored but by the laws of the nation in which it is collected
  • infrastructures of data are expensive and controlled almost entirely by large corporations or governments
  • we need to reenvision the way data is stored and told, to imagine farm to table data systems that are within the financial and technical reach of those communities that need them the most
  • the internet doesn’t care what anything is, it cares about where everything is
  • URL = uniform resource locator
  • new web = dispose of the where-centric machinery of URLs and move to a place that cares about what
  • Hashing is the process of reducing data of variable size down to data of a fixed size
  • old web is like a postal system, delivered busily from address to address
  • new web could be like stock market trading floor, announcing what it is looking for
  • no central server needed, data passes directly from one “peer” to another
  • no third party server anywhere having touched it
  • “Dat” enables one scientist to send data to another scientist without having to involve any other servers
  • data sovereignty of self defined permissions
  • Assault on Privacy (book), Miller: captured the idea that the benefits of data will always be matched with the risks
  • narrative about data systems will be in large part guided by the parties that benefit most by their deployment
  • book was written in the 1970s
  • Algorithmic Accountability Act is promising because it addresses how data technologies are designed and built
  • data practitioners and the corporations that employ them have to commit to change not as a way to duck regulations but to become better actors, make more fair and equitable systems
  • Facebook was fined nearly $5 billion in 2019, stock price rose 2 percent
  • diversify development and management teams
  • German citizens can access any information collected about them, whether it is by the state itself or any corporation operating in Germany
  • if we want to see change in our lives, we have to change things ourselves
  • see data’s imperfections clearly, a human thing, full of human bias, and also of hope and promise, error and neglect
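
The notes on hashing and a web that cares about "what" rather than "where" were the most technical part of the book for me, so here is a minimal sketch of how I picture the idea. It is not how Dat or any real peer-to-peer system is implemented: the use of SHA-256, the in-memory `store` dictionary, and the `content_address`, `put`, and `get` names are all my own assumptions for illustration.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Reduce data of any size to a fixed-size hex digest (SHA-256, an assumed choice)."""
    return hashlib.sha256(data).hexdigest()

# A toy "what"-based store: data is looked up by its hash, not by a location.
store: dict[str, bytes] = {}

def put(data: bytes) -> str:
    """Save the data under its own hash and return that hash as its address."""
    address = content_address(data)
    store[address] = data
    return address

def get(address: str) -> bytes:
    """Fetch data by hash and re-hash it to check it is what was asked for."""
    data = store[address]
    if content_address(data) != address:
        raise ValueError("content does not match its address")
    return data

short_address = put(b"data")
long_address = put(b"data" * 10_000)

# Inputs of very different sizes still produce addresses of the same length.
print(len(short_address), len(long_address))
```

Because the address is derived from the content itself, whoever receives the data can verify it without trusting the machine that delivered it, which is roughly what makes asking for "what" instead of "where" workable.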

Adam Marks

I love books, I have a ton of them, and I take notes on all of them. I wanted to share all that I have learned and will continue to learn. I hope you enjoy.