Susan Etlinger is an industry analyst who conducts independent research on data intelligence, analytics and strategy. She helps Global 2000 companies integrate frameworks and processes that enable more practical, efficient use of data. We had the pleasure of picking Susan’s brain on data ethics, avoiding biases in analysis, the imperfections and limitations of data, and more.
1. Ethical data usage has to do with symmetry between the creator of the data, the originator of the data and the users of the data; it’s important to provide disclosure to the customer about data collection, and there needs to be a benefit for the consumer in collecting that data.
2. “Big data” is made up of three characteristics: it’s high-volume, it moves very quickly, and it has a massive amount of variety.
3. Companies should have a clear sense of how they’re going to use data so that they are actually solving for the right problem.
4. It’s important to weigh high-volume data collection against the cost of storage, the likelihood it can be breached and whether consumers could inadvertently be harmed in the process of collection.
5. Ideally, multiple voices should be involved in making decisions about data strategy. Everybody is an analyst now, and there are many different perspectives to consider.
ML: How did you get started in this field, and what path did you take to get to your current role?
SE: What’s funny is that I work in the world of data and analytics, but I have a background in humanities and literature. Since I lived in San Francisco after graduating college, a lot of the writing and marketing jobs I had were based in the technology field. Through that work, I discovered that I had a knack for translating technical language into terms that everyone could understand. Many times when we are looking at technologies, the people who are going to use the technologies are not necessarily top of mind, so I wanted to make data, analytics and social technology accessible for people in the business. I see myself as a translator more than anything else.
ML: I read your framework for ethical use of data, and I truly believe many organizations don’t prioritize or fully understand the concept. How would you define ethical data usage?
SE: I think that ethical data usage has to do with symmetry between the creator of the data, the originator of the data and the users of the data. If the consumer’s data is being taken without their knowledge or understanding, or if it’s being used for purposes that could be harmful to them, then it’s not ethical. Symmetry needs to exist between the organization and the individual surrounding mutual benefit, fairness and respect.
ML: Our industry has a strong desire to know more about people so we can offer more personalized experiences. How do we balance the need to collect more data versus collecting too much information and invading someone’s privacy?
SE: This has become such an enormous global issue because of the way Internet technologies were created. Data collection is ambient, and there’s a lot of information being constantly collected. We need new models for understanding that. And in order for CEOs to handle the issues and the needs of customers who are concerned about their privacy, the first step is to set out a framework to help us start thinking through these questions using principles from the Information Accountability Foundation (IAF).
ML: How would a company begin to implement those principles?
SE: First, go through that framework and start asking yourself questions. For example, “Is the way we collect data beneficial to consumers? Is it fair? Is it respectful?” By respect, I mean that you are letting people know what you are collecting from them.
You need to provide disclosure, and you need to make sure that there is benefit for the consumer in collecting that data. For example, let’s say we are a service or a product that relies on shopper data, and we want to collect information about the food that’s being stored in refrigerators. How do we let people know that we’re collecting the data? How do we let them opt out, and how do we ensure that they have a consistent experience? How do we ensure that there’s benefit for them while making sure that we don’t harm them inadvertently?
ML: Let’s shift a little bit and talk about your TED talk. How would you define “big data” in a way that high school students might understand?
SE: I’m going to simplify what Gartner said, because I think it really is one of the most widely accepted definitions, and it actually goes much deeper than I could explain in my 10-minute TED talk. “Big data” is made up of three characteristics: First, “big data” is high-volume data; there is a lot of it. Second, it moves very quickly. Finally, it has a massive amount of variety. (The “3Vs” framework – Volume, Velocity and Variety – is taken from Gartner.) If you think about the way businesses use data, like when you input your name, address, account number, etc., it’s all pretty cut and dried. But then when you think about the kinds of things you do online – like when you post a blog, “Like” a video on Facebook, watch a video on YouTube, share a post, tweet about something, look at a GIF or click through a slideshow – the data that’s collected from all those different actions can be very different.
ML: Many organizations are struggling to work with all this data because data sets tend to be very messy and imperfect. In your TED talk, you pointed out that the reason for this is that data is created by people, and people tend to be messy, generally speaking. Do you have any advice on how to overcome this problem?
SE: Right. Our ability to be precise is relative, of course. I think that it’s important to clean up the data and process it so the end result is actually readable and understandable. But maybe more important is understanding what you’re actually going to use that data for. For example, let’s say a customer bought a car from a dealer three to five years ago. What’s the likelihood they might buy another car from that dealer? We’d want to consider the consumer’s driving history, how old their car is, and even things like gas prices and unemployment rates. It’s important to have a clear sense of how you’re going to use that data so you’re actually solving for the right problem.
The second thing, related to data use, is that the more data you have in the databases, the more data there is to be breached, so you need a particular strategy when thinking about data as an asset. It’s really important to prioritize what we really need to be collecting. Data collectors often feel like the more data they have, the more analysis possibilities they can leverage. But at the same time, you have to weigh that against the cost of storing it, the likelihood it can be breached and also whether you could inadvertently harm consumers by collecting data about them that does not necessarily have a clear business case.
ML: As I’ve been doing these interviews, I’ve heard many experts like you who have said that goals and outcomes should be put in place before doing any analysis. But I’ve also heard other experts say that they don’t actually think about specific outcomes, because being less constrained and defined means that they’re more free to gather new and different insights from the data, which can still be very useful. Any thoughts on that?
SE: Right; I think that’s a really important point, and that’s why you need cross-functional teams looking at data strategies. If you have an analyst or a scientist, they don’t want to know too much about what the goal is, because they don’t want it to influence their thinking and their methodology, which is completely valid. At the same time, the business owner needs to understand and determine what they’re looking for so they can narrow the field of inquiry and get to relevant insights. On top of that, you also need to think about security, storage and so on, so I think it’s important to have all those voices involved in decisions about data strategy. One of the things that’s challenging is that everybody is sort of an analyst now, and there are so many different perspectives to consider.
ML: So, on that point of data analysis–in your TED talk, you discussed that data can be very biased and doesn’t always tell us the full story. So, how can we make sure that it’s properly analyzed? Do analysts look at the context? What things should analysts be looking at to make better sense of data?
SE: Well, that whole TED talk was really an argument for context more than anything else. We have way more data at our disposal than we’ve ever had before, which could be a good thing or a bad thing, and there are certain digital signals that are more meaningful than others. For example, if I tweet that I’m thinking of buying a car, I could be buying it in a day, in a month or in three months from now. If I do a Google search for cars after I make that tweet, then chances are that my intent is stronger. Then, if I click through an email that was sent to me about a car sale, my intent is even stronger, and you know I’m probably more likely to buy sooner rather than later. We get more interesting information and a clearer picture of what outcomes the data translates to when we start to pull those things together.
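The idea of pulling signals together so that each one sharpens the picture of intent can be sketched in code. Below is a minimal, hypothetical illustration – the signal names and weights are invented for demonstration and are not from any real system; an actual model would learn these from data:

```python
# Hypothetical illustration: combining digital signals into a rough
# purchase-intent score. Signal names and weights are invented for
# demonstration only -- a real model would be learned from data.

# Weaker signals contribute less; stronger signals (a search, an
# email click-through) contribute more.
SIGNAL_WEIGHTS = {
    "tweet_mentions_buying": 0.2,  # weakest: purchase could be days or months away
    "search_for_cars": 0.5,        # stronger: active research
    "email_click_through": 0.8,    # strongest: engaged with a specific offer
}

def intent_score(observed_signals):
    """Combine observed signals into a score between 0 and 1.

    Uses a 'noisy-OR' style combination: each signal independently
    pushes the score toward 1, so intent only strengthens as more
    corroborating signals appear.
    """
    score = 0.0
    for signal in observed_signals:
        weight = SIGNAL_WEIGHTS.get(signal, 0.0)
        score = score + weight * (1 - score)
    return round(score, 3)

# Each added signal makes the picture clearer, as described above.
print(intent_score(["tweet_mentions_buying"]))                     # 0.2
print(intent_score(["tweet_mentions_buying",
                    "search_for_cars"]))                           # 0.6
print(intent_score(["tweet_mentions_buying",
                    "search_for_cars",
                    "email_click_through"]))                       # 0.92
```

The point of the noisy-OR combination is that no single weak signal dominates, but corroborating signals compound, mirroring the tweet → search → email-click progression described above.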
ML: Sometimes, it seems like those of us in the analytics field tend to strive to become better at tools like Google Analytics, but neglect the basic critical thinking skills that this type of analysis demands. Do you have any recommendations for how data scientists and analysts can become better at critical thinking?
SE: Scenario planning is so important for this. When considering how you’re collecting and using data, I think it’s important to ask two basic questions: 1. What could happen as a result of using this data in this certain way? and 2. What does it mean to use data in this way? For example, remember the news about the teenager whose father found Target baby coupons addressed to her in the mail, revealing her pregnancy? It’s completely understandable why Target would want to capture people at the beginning of a pregnancy; it’s a genius way to create a very loyal customer for years. But who would have thought this scenario would have happened as a result of that data targeting tactic? We learn from these things, and in the digital world, we can fortunately learn from other people’s mistakes as well as our own. And on the flip side, what happens if we don’t take a risk to use that data in strategic ways? Would we be overtaken by a competitor, or would we perhaps even make customers unhappy in some other way? There’s the risk of doing and the risk of not doing, and I think we need to weigh the implications of both.
ML: Over the years of working in this industry, what has been one of your biggest “aha” moments?
SE: One of the things I’ve been really interested in over the past year is just how important the visual part of the web has become to our understanding. Images are the currency that crosses borders, because anybody can understand an image (barring differences in cultural implications). We are really just babies when it comes to understanding what images mean, how to interpret them, and how to try to predict their impact. That, to me, has been one of the greatest revelations of the past couple of years. We think we’re making progress on numbers and words, but images are going to just clobber us if we’re not careful.
1. Download new research from Altimeter Group – The Trust Imperative: A Framework for Ethical Data Use.
2. TED Talk – What do we do with all this big data?
3. Susan’s blog – Thought Experiments.