Data has been called “the new oil,” a comparison that emphasizes how data functions as the raw material that drives the digital economy—and, consequently, much of twenty-first-century life. But unlike oil, data does not exist in a natural state. Even before a dataset is first collected, it is influenced by people—often with social and political agendas of their own. This course thus examines how information becomes data, in terms of its technical requirements as well as in terms of the social and political contexts that surround it. Through a series of examples, accompanied by readings from the emerging field of data studies, we will explore the significance of data, past and present. We will also explore how visualization has been employed to enhance data’s social, political, and rhetorical force. We will work towards final projects focused on visualizations of Georgia state data, using the recently rediscovered visualizations of W.E.B. Du Bois as our point of departure. Through classroom discussion, lab exercises, and several guest lectures, we will emerge with a deeper understanding of the power of data, as well as its constraints.

In “What are data?” Borgman discusses data by describing them not as natural objects but as things that exist in a context, taking on meaning from that context and from the observer’s perspective. She attempts to describe data in both theoretical and operational terms. The term is meant to be broadly inclusive, covering at a minimum scientific monitoring data, and generally refers to the inputs of the research process. According to Uhlir and Cohen, data can be created by people or machines, and there is a relationship between data, computers, models, and software. One concrete definition holds that data are information formalized in a manner suitable for communication and processing. There has been no consensus on the definition of data because it is not a pure concept. “The most inclusive summary,” according to Borgman, “is to say that data are representations of observations, objects, or other entities used as evidence of phenomena for the purposes of research or scholarship.”

Loukissas further explains that data are not singular. He states that “‘all data are local’ is grammatically correct.” To properly acknowledge the human production of data, Drucker notes, we should reconsider our use of the term: using “data” in the plural form acknowledges that data arise under different circumstances worth studying. All data are local as well, rather than universal and invariable. I believe that in this manner Loukissas emphasizes that communities uniquely determine what counts as data, even outside of scientific and engineering work. Overall, the point Loukissas wanted to make was that local does not mean lesser; rather, it describes grounded knowledge practices.

I’m sure we’ve all used at least one, if not multiple, voice assistants in the past decade or so. Alexa is becoming increasingly popular in homes, and Siri continues to intrigue curious young minds playing with their parents’ iPhones. Ultimately, all this interaction is especially beneficial to the companies rolling out these products, because these companies are training voice models on the samples their users ‘generously’ provide every day.

Every time a request is made via any platform, the sentence or phrase is decoded. Action words are located; then, using the company’s model, the request is mapped to a response that is spoken back to the user. But the work doesn’t stop there. If the response is what was intended, the user usually either replies “thank you” or “okay” or doesn’t respond at all. With invalid responses, however, the voice assistant is usually stopped and the question is asked again. Machine learning helps the model understand when a current mapping is invalid and needs to be adjusted.
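To make this loop concrete, here is a minimal sketch of the decode-map-adjust cycle described above. The intents, phrases, and feedback handling are invented for illustration; real assistants rely on statistical language models rather than keyword lookup.

```python
# Toy intent mapper: locate an action word, return the mapped response,
# and record user feedback for later adjustment.
# All intents and responses here are hypothetical.

INTENT_RESPONSES = {
    "timer": "Timer set.",
    "weather": "Here is today's forecast.",
    "play": "Playing your music.",
}

def respond(utterance: str) -> str:
    """Map a decoded utterance to a response by locating action words."""
    for action, response in INTENT_RESPONSES.items():
        if action in utterance.lower():
            return response
    return "Sorry, I didn't catch that."

feedback_log = []

def record_feedback(utterance: str, accepted: bool) -> None:
    """Store (utterance, accepted) pairs; a real system would feed these
    labels back into training to adjust invalid mappings."""
    feedback_log.append((utterance, accepted))

print(respond("Set a timer for ten minutes"))   # -> "Timer set."
record_feedback("Set a timer for ten minutes", accepted=True)
```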

This voice model is also important for natural language recognition. I remember that when Siri first rolled out, it never understood anything said in my accent. But with machine learning and massive data collection, it has been trained to completely understand and act on everything I say.

We have even gotten to the point where Google claims it can differentiate between users after collecting just three spoken sentences from each one.
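As a rough illustration of how a few sentences could be enough, here is a sketch of speaker identification by comparing fixed-length voice embeddings. The random vectors below stand in for a real voice-embedding model; how Google actually does this is not public in this detail.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(sample, profiles, threshold=0.8):
    """Return the enrolled user whose voice profile best matches the
    sample, or None if no one clears the similarity threshold."""
    best_user, best_score = None, threshold
    for user, profile in profiles.items():
        score = cosine(sample, profile)
        if score > best_score:
            best_user, best_score = user, score
    return best_user

rng = np.random.default_rng(0)
# Enrollment: one averaged embedding per user, built from a few spoken
# sentences (hypothetical vectors here).
profiles = {name: rng.normal(size=128) for name in ("alice", "bob")}
new_sample = profiles["alice"] + rng.normal(scale=0.1, size=128)
print(identify(new_sample, profiles))  # -> "alice"
```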

My question is: with all this data refining and voice model training, who now owns the data? Borgman mentions in “What are data?” that a primary source of data is the original document, in this case multiple voice samples from billions of users, while a secondary source would be any analysis or later work done on that entity, in this case the trained model. Do the users deserve access to this trained voice model, or do these companies deserve sole access to do as they please because they developed the algorithms used?

On one hand, their algorithms would be vastly inaccurate without billions of voice samples to train their models on. On the other hand, our voice samples would be useless if they weren’t contributed to the voice services these companies provide. They need us just as much as we need them.

I’ve been working part time on campus at the Human Resources Department for nearly a year. As a student employee who works on their website, I’ve become very familiar with Google Analytics and the multitude of ways data can be collected. As we continue our class discussion on data, I realized that Google Analytics is the perfect thing to talk about, since its sole purpose is to gather data while relying on the user to find use in that data. As a little background, Google Analytics is an online resource that can be paired with a website to gather data on all of the pages within that site. The data it can gather includes page views, demographics of viewers, clicks on different sections of a page, things being searched on the site, and more.

In our discussion of the Borgman piece, we talked a little bit about how data shouldn’t be wasted and how it’s a source of friction. Google Analytics is as useful as you make it. It provides so much data, and there have been many times when I’ve found trends based on data I didn’t think I needed. With a lot of this data, I’m able to make improvements to the Human Resources site. The more data that’s collected, the more ideas I’m able to come up with, based on that data, to make those improvements. Overall, data has been able to support a lot of what I do at my job, and with Google Analytics specifically, I’m able to collect data and make it meaningful. As mentioned in class, data is useful if you make connections from it, and I believe that if you’re given enough data, these significant connections will be made. Below is a picture of Google Analytics (not my picture):

[Screenshot of a Google Analytics dashboard]
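To show what “finding use” in this data can look like, here is a minimal sketch that aggregates a hypothetical CSV export of page data with pandas. The file name and column names are assumptions for illustration, not Google Analytics’ actual schema.

```python
import pandas as pd

# Hypothetical export: one row per page-visit record.
pages = pd.read_csv("hr_site_analytics.csv")

# Which pages draw the most views, and how long do visitors stay?
summary = (
    pages.groupby("page_path")
         .agg(views=("pageviews", "sum"),
              avg_seconds=("time_on_page", "mean"))
         .sort_values("views", ascending=False)
)
print(summary.head(10))  # candidates for improvement on the HR site
```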

As 2017 wrapped up, I began to see the same graphic on many of my friends’ Facebook and Twitter feeds. Over and over, I saw how many minutes of music Sally had listened to and what Bobby’s favorite artists were; all of this was information collected by the very popular music streaming service, Spotify. My first thought was: how many minutes of music have I listened to? My second was: they can see what music I listen to? 

Well, of course they can. That’s how they track which artists are trending and which are falling behind. It’s how Spotify can use algorithms to recommend music that it predicts we will like. None of these things could be done without the collection, storage, and processing of the data that I provide to them. In return, they provide me with music for free and eerily accurate playlists that predict what I will like.
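In spirit, those predictions can be as simple as the following sketch of item-based collaborative filtering on made-up play counts. Spotify’s real system is far more elaborate; nothing here is their actual algorithm.

```python
import numpy as np

# Rows = listeners, columns = tracks; values = play counts (invented).
plays = np.array([
    [12, 0, 5, 0],
    [10, 1, 4, 0],
    [0, 8, 0, 7],
])

def recommend(user: int, k: int = 1) -> np.ndarray:
    """Score unplayed tracks by their similarity to the user's plays."""
    norms = np.linalg.norm(plays, axis=0) + 1e-9
    sim = (plays.T @ plays) / np.outer(norms, norms)  # track-track cosine
    scores = sim @ plays[user]
    scores[plays[user] > 0] = -np.inf                 # hide already-played
    return np.argsort(scores)[::-1][:k]

print(recommend(user=0))  # tracks that similar listeners also play
```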

Then, I began to see ads that used the data collected from all Spotify listeners. The ads were eye-catching and entertaining. I included one below:

This seems harmless, and in all likelihood the user in question got a big kick out of their goofy listening habits being displayed to the world. However, I began to think about the other ways that we may be singled out by the trails of data that we leave behind us. There are all kinds of corporations and agencies monitoring our online activities for abnormal behaviors. Online cookies generate ads for the websites we visit, BuzzCards track which buildings we enter and when we swipe to get a meal, and the NSA trawls for terrorist-like activities. It is important to keep in mind that in this age, our activities are rarely seen by only ourselves. We live in an age of transparency and accountability, for better or for worse.

Recommender systems are technologies that take a user’s interests in items such as products, movies, and events and recommend similar matches through an algorithm. Recommender systems can be found on almost any site now, some more obvious than others. Major sites such as Amazon, Facebook, Instagram, and Netflix are commonly known for this feature, but we seldom pay attention to how the underlying data is collected. Libby Plummer explains in her article how Netflix’s recommendation system works.

Every time an account is logged into, the time, the length of activity, and the items watched are gathered. Each user is then placed into a few of the thousands of different taste groups that drive the recommendations behind the 80 percent of TV shows people discover on Netflix through the platform’s suggestions. Plummer compares this gathering of data to a three-legged stool. The first leg consists of data on what people watch, what is watched before and after, viewing from previous years, and the time of day things are watched. The second leg combines the first with tags associated with every minute of every show, weighted by the importance of the behavioral data. The final leg is the choice of taste community, which in turn suggests shows that users could not find on their own.

The data that Netflix uses is categorized as implicit or explicit. Implicit data is more commonly referred to as behavioral data. Explicit data, on the other hand, is what you tell Netflix; when you give a show a high rating or a thumbs up, the algorithm knows how to factor similar shows into your feed.
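As a rough sketch of how implicit data could place users into taste groups, the snippet below clusters invented viewing hours with k-means. The genres, numbers, and the choice of k-means are illustrative assumptions, not Netflix’s pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows = users, columns = hours watched per genre tag (implicit data).
hours = np.array([
    [9.0, 0.5, 0.0],   # mostly crime drama
    [8.5, 1.0, 0.2],
    [0.1, 0.3, 7.5],   # mostly stand-up comedy
    [0.0, 0.8, 9.1],
])

taste_groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(hours)
print(taste_groups)  # e.g. [0 0 1 1]: two taste groups

# Explicit data (a thumbs-up on a title) could then re-rank shows
# within the group that the user was assigned to.
```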

Borgman discusses the implications of data-processing levels in “What Are Data?” by stating, “Data streams from an instrument may be processed multiple times, leading to multiple data releases.” Netflix’s three-part data analysis demonstrates the ways in which data processing can affect the outcome. Their system is affirmed by users’ engagement with the recommendation system and is improved by the explicit data received after the processing has been done.

To read Plummer’s article and get more details on how recommender systems like Netflix use data, visit http://www.wired.co.uk/article/how-do-netflixs-algorithms-work-machine-learning-helps-to-predict-what-viewers-will-like.

Charles Duhigg is a journalist and the best-selling author of The Power of Habit. I recently found Charles Duhigg’s website, and there was a video of him talking about Target and its use of consumer data. The video itself focuses more on how consumer habits were important to the company, but setting the habit part aside, what Duhigg mentioned also clearly showed the relationships among data, companies, and consumers.

Target gathered data about what its customers buy, and it used this data to figure out which consumers were pregnant. By knowing who the pregnant consumers were, Target wanted to be ready to advertise to future parents, because Target knew that people with new babies are tired and tend to buy everything in one place. Basically, the motive of this project was this: by figuring out who the pregnant women were, Target could get them inside the store to buy baby products, which would lead them to end up buying everything else they need at Target.

Target was successful at identifying future mothers with its customer data, which shows what people buy. For example, pregnant women, around their second trimester, buy very large quantities of lotion compared to other shoppers. Moreover, women around their 20th week of pregnancy buy lots of vitamins. There were about 25 other items that Target analysts could find that relate to pregnancy.
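A crude version of that analysis might look like the sketch below: weight pregnancy-associated purchases and flag high-scoring baskets. The items, weights, and cutoff are invented for illustration; Target’s actual prediction model is proprietary.

```python
# Hypothetical signal weights for pregnancy-associated products.
PREGNANCY_WEIGHTS = {
    "unscented lotion": 0.4,
    "vitamin supplements": 0.3,
    "cotton balls": 0.2,
}

def pregnancy_score(basket: list[str]) -> float:
    """Sum the weights of pregnancy-associated items in a basket."""
    return sum(PREGNANCY_WEIGHTS.get(item, 0.0) for item in basket)

basket = ["unscented lotion", "vitamin supplements", "bread"]
if pregnancy_score(basket) >= 0.5:   # invented cutoff
    print("flag shopper for baby-product advertising")
```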

For more information about this, you can visit Charles Duhigg’s website, http://charlesduhigg.com/target-knows-your-secrets/

Christine Borgman states in “What Are Data” that data “exist in a context, taking on meaning from that context and from the perspective of the beholder.” This Target example clearly shows how data were viewed from the perspective of Target analysts and marketers. Moreover, it shows how people turned raw data into data that they could actually use or profit from. With Borgman’s “Provocations,” we talked about the openness related to data. Borgman writes, “Openness is claimed to promote the flow of information, the modularity of systems and services, and interoperability.” From Target’s viewpoint, this openness could clearly hinder the purpose of the project, which was to increase profit by identifying pregnant consumers. If Target were open with the public about this practice, customers would feel uncomfortable, and that would have hindered Target from making more profit. Setting aside Target’s viewpoint, do you think that, as a consumer, you have the right to know what companies are doing with the data they collect from you?

The project I want to introduce you to is called “The Fallen of World War II.” It is an interactive project that describes itself as a “data-driven documentary,” and it was presented to me as part of a different class. It is a visualization of the fallen of World War II, as the name suggests, and of the decline in battle deaths in the years since the war. The data is presented in the style of an interactive infographic, which is both moving and visually pleasing (something my screenshots do not really do justice).

As numbers of war casualties hardly faze most people today, either because we have heard so much about them or because the numbers are too abstract to us, this documentary uses data visualizations to add meaning to these figures and compare them to the human losses of other world conflicts, something it does extremely well, in my opinion. Each little figure in the above picture symbolizes 1,000 lives. This screenshot is taken fairly early in the video, where it was only counting soldiers, and only some of them.
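The encoding itself is simple to reproduce: divide each death toll by 1,000 and draw that many figures. The counts below are placeholders, not the documentary’s actual numbers.

```python
ICON_UNIT = 1_000  # one figure per 1,000 deaths, as in the documentary

def pictogram(deaths: int) -> str:
    return "\u25ae" * round(deaths / ICON_UNIT)

# Placeholder counts, purely for illustration.
for group, deaths in {"Army A": 8_000, "Army B": 21_500}.items():
    print(f"{group:>8} {pictogram(deaths)} ({deaths:,})")
```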

Throughout the 18-minute video, it combines human-shaped cutouts stacked together, what appear to be old photographs, and narrative storytelling to get its message across: that war has horrible consequences.

Besides being a fresh take on a pivotal moment in human history, the documentary is of particular relevance to this course because of the data it uses.

It is data on casualties in WWII, but usually “casualties” counts both the wounded and the dead, according to the narrators of this documentary. This documentary counts only deaths, which could be for a number of reasons. One reason could be that such data is simply easier to find; there might be more data on deaths and more places to find it. Another reason could be that counting only deaths is an attempt to ensure, to the best of their capabilities, that the data is not faulty; I imagine that a note of “burn injury” in someone’s medical record could come from a war as well as from playing with matches. This project resonates with Borgman’s argument about how data need to be activated in order to be valuable; without the creators having set up the infographics the way they did, with the explanations and cross-references to other wars, their attempt at showing the horrors of war, exemplified in WWII, might not have been as effective. That said, they have most likely picked the data that best supported the narrative they wanted to create.

An interesting aspect of this documentary, and something I have not encountered before, is that they update the infographic and the data regularly. By constantly keeping their data relevant and accurate, it is not just a documentary that was of interest in 2018; it might potentially be relevant in 2038 as well. This is an interesting way to try to prevent the data from disappearing or losing value; by keeping it updated, I imagine it can be used in various ways by other scholars and studies, both now and in the future.

You can view the documentary here

FiveThirtyEight, “Data Is vs. Data Are”

Is “Data” Singular or Plural?

In the essay “What are Data?” Borgman starts by tracing the origin of the word “data.” The word itself was first used in the early 18th century. As Borgman notes, “data” meant a “set of principles accepted as a basis of an argument” or “facts, particularly those taken from scripture” (Borgman, 1).

However, since the 18th century, several things have changed. As technology, communication, and archives have evolved, so have data. We have rules for who can see data, standards for how to collect data, and specific adjectives to describe types of data. Furthermore, the interpretation of data evolves with the different perspectives of data collectors, the different audiences of data viewers, and the time period and span of the data itself. Or, simply put: data is never straightforward.

With these complications, we need to reconsider whether or not data can be defined in the same terms as in the 18th century. Gitelman and Jackson argue, in their introduction to “Raw Data” Is an Oxymoron, that there is no such thing as “raw” data (a shocker, given the title). Just as in the 18th century, we assume that data is the foundation of fact. It’s evidence. It’s truth. However, Gitelman and Jackson warn specifically about the “interpretation” of data.

In an academic sense, the interpretation of data can change based on the standards and methods of different disciplines. However, we shouldn’t ignore how our human biases can change our interpretation of data. This warning seems particularly relevant in the age of “Fake News.” We’ve seen how rhetoric can be molded to fit a specific argument or make specific claims. What’s scarier is to think about how data, what we previously thought of as truth, can be molded too. As we move forward in a class about how to represent data, I hope to learn more about ways to represent data through accurate, open-minded interpretations and ways to recognize data that has been interpreted through manipulated lenses.


-Emily Bunker

In chapter one of Big Data, Little Data, No Data, Christine Borgman asks the question, “What are data?” This is the very question that popped into my head when I decided to register for this course. “This class sounds interesting, but what the heck is data anyway?” Even after reading these scholarly articles, I feel as though I am still faced with this uncertainty. Is it sad to say that I still do not understand what this course truly means when discussing “data”? I know that data is pretty much a collection of information, facts, and statistics. However, why are we discussing data in the first place? What is the significance of this class, and why should data matter to me? Are we going to study how to analyze data more efficiently? Are we just learning about the power of data and how it affects the world around us? What is the key objective of Technologies of Representation? I know without a doubt that I will retain something from this class, but it is still not clear what that “something” will be. It seems as though the topic is extremely broad, and the articles constantly leave a vague taste in my mouth.