Phil Harvey is chief technology officer and a founding member of DataShaka, a London-based (UK) data technology start-up offering a new kind of analytics database built for flexibility first. Phil is also an inventor of a patented method of processing time-series data. He is a passionate technologist, data engineer, published fiction author, international keynote speaker, and has modeled for a Japanese fashion brand. Phil has a bachelor’s degree in Artificial Intelligence with a quarter of Japanese language, culture and society from the School of Cognitive and Computing Sciences [now Informatics] at the University of Sussex.
ML: If you’re on a plane and you have to introduce yourself, what would you say?
PH: “Hi, I’m Phil.” If they ask me about the work I do, then I’d say I’m a CTO of a startup that I helped found.
ML: Now, you’re off the plane, you’re in an elevator, and have to do an elevator pitch of your startup. What would that look like?
PH: We want to make data for analytics easier. In doing that, we have developed the Katsu Analytics workbench, which is a platform for time series-based analytics. At its heart, as a technology, it is a very, very flexible time series database accessed through an API.
ML: Where do you see the biggest potential in your business?
PH: Too much focus is put on applications of data and this has caused a very negative cycle within data systems, which builds large change footprints and limits people with long lead times for change. So, if you put flexibility at the heart of a data system, and you really focus on dealing with data in a flexible manner, then, as a whole, your business will be more successful. The opportunity within that is to redefine the way that we work with data, not just through the volume of the data process, but by thinking about the data that’s being processed in the system.
ML: What do you think are the biggest misconceptions about working with data?
PH: The biggest misconception, at the moment, is that it is easy because of all this technology. That is fundamentally not true. Data, by its very nature, is hard, because data is generated by people and systems, and each have their own perspective, needs and desires. Trying to bring that data together for another purpose is difficult. It is difficult for technological reasons, it is difficult for big data reasons, and it is difficult because of a fundamental lack of empathy that people have. Most people think that it is purely an application-driven world and they forget that data will outlive any application that it is used in.
ML: This might be the first time I have heard empathy and data in one sentence. Can you elaborate more?
PH: Well, that you say that is kind of the point. People that I have met initially approach data as a technological challenge. They see an opportunity, simple analytics, or some form of data when they approach an application, and they think that the problem is building the technology. That is not going to be the problem. The problem is going to be people. The main difficulties people face in data are mostly generated by people. One of the key techniques and tools available to us, which isn’t technology-driven, is human empathy: the ability to understand the needs and feelings of others. If you lack practiced empathy within the business of data, you will find it more difficult to focus on people’s needs and understand their perspective.
ML: What do you think separates exceptional companies from everyone else? By exceptional, I mean exceptional in their use of analytics.
PH: The majority of companies that start out with their first major data project as a focus will fail, and they need to learn from that, because the companies that succeed, and which I have seen be successful, have been on their third or fourth try. They have had to go through painful learning experiences for the data project to become successful. And this is true of anybody who appears, overnight, to have a magical data system. Data has been around since the dawn of computing.
The first computer used for business applications was a data processing application for payroll. People are still working on that. So, in some ways, it is all a massive waste of time, because we haven’t solved any of the fundamental problems that we deal with in business. Success always comes from those who have learned from their mistakes and are able to embrace people within the process. They have a solid understanding of the business-use case, and a business-use case from a human perspective. Nobody has been successful just because they have been asked to do 30 billion aggregates in one second. That does not define success. That defines academic achievement. The fact that you could do this doesn’t solve any business problem.
ML: In your opinion, what are the attributes of reports or data visualizations that inspire action or create change, versus data visualization reports that are created for the sake of creating a graph or a chart?
PH: It comes down to trust. Visualization, no matter how beautiful, will always fall down if the person receiving it doesn’t trust it. Trust is built from the data process behind the visualization, more than the visualization itself. Visualization can be incredibly simple. Building a basic chart in a spreadsheet can be more impactful to a business than spending six months and several million dollars in building out a skilled data visualization team. If the person trusts the data that they see in front of them, knows where it has come from, understands its problems and can be motivated to make a decision on it, this will enable them to make that decision comfortably and to understand the results from making that decision. So, without trust, visualization is basically worthless. Trust is built through the data management process rather than the problem of visualization in particular.
ML: How long have you been in this industry?
PH: I have been working on data problems since 2008.
ML: What do you think are the biggest changes and differences in terms of where we are now versus where we were in 2008?
PH: The definition of big data wraps up those changes; what this means, fundamentally, is that you don’t instantly go straight to SQL anymore. That’s a huge difference from before. When people said data, they thought database, they thought SQL and they thought about the SQL databases available. So, one of the biggest changes is that you don’t, or one shouldn’t, instantly think of SQL as the answer now, whereas pre-2008 you would. I think another big difference is that more areas of the business now think and care about data, so it’s a difference from, say, a marketing department getting information from their agency who just says, “We have done our job, here is one figure.” People want to see that data was used, and understand that data was analyzed in a deep, complex fashion so they can trust in the result. So, we have moved away from HiPPO, the Highest Paid Person’s Opinion.
ML: Have you had any examples of organizations that you’ve worked with, where you felt like they misused the data, or had a great opportunity, but went in a completely different direction and failed at the exploration of data?
PH: One of the cases I have heard about is insurance as an industry. This industry has so much detailed information, but is not allowed to use it, so they end up just using whether you smoke or not for your life insurance because as soon as they start to refine their products around more actual use of the data that they have available, bias is called and they are unable to do that. I believe that data should be open and available, to be used and be discussed. When people are too closed off and too scared to use it, we see many more failures.
Use of data is a very interesting question because data and algorithms, in and of themselves, only highlight the abuses and inequalities within the world that we live in. So, if you take any example that people give of algorithms being abusive, such as the case of Facebook’s trending team being fired and the algorithm being let off the hook, suddenly the algorithm is producing bad, invalid results. It is not the algorithm’s fault. It is never the algorithm’s fault. It is always the limitations of the data used in the generation of that algorithm that are at fault. The data contains the bias of the world. The algorithm exposes that.
Then, there are very sad cases where the individuals within an organization, in the data team, unconsciously or otherwise, have built an empire for themselves which resists change that the business as a whole needs. There are examples of teams run by one person, where that one person controls a key piece of data in a phone call, rather than through any automatic source. This means that whenever the office tries to automate or find inefficiencies in the data process, that data process always fails because there is an individual who, probably unconsciously, is gate-keeping the entire process and lacks empathy for the process, other team members or the business.
ML: If you could have a billboard anywhere in London, either to promote yourself, the services of your company, or deliver any type of message, what would you say?
PH: Data is hard. If somebody tells you it isn’t, they are lying.
ML: When you say data is hard, what does it really mean? We have IBM, Google, Adobe and many other organizations promising us that they have an answer to analysis, to attribution, to OmniChannel. What is difficult?
PH: I talked about how people telling you it’s easy are lying. I will let you make your own connection there. So, the problem is application over data. With some of the companies you mentioned, I know for a fact that they are wonderful data tools entirely limited by the scope of your own domain. So, if you go to Company X, with the best data tools available, and they only use data within the domain, they are missing half of the world. And as soon as you are missing something, you are not going to build up a true picture. So, for example, ‘attribution’ as a topic is such a wonderful nest of lies.
The fact of the matter is that data is fundamentally hard at its core. Most people end up buying selective information from vendors who call it data. Truly managing and working with the data as it comes, you won’t get that from most marketing tools. You won’t be given access to the true, real data. It will be some processed form of information that they think people want. And the more that you think it is data, the more you are being lied to.
ML: You gave us a wake-up call about the difficulties and misconceptions that you see. What are you optimistic about?
PH: Lots of people think I am a terrible optimist. It may not sound like it in this, but optimism is my main mode. So, what am I optimistic about? I think that we have some fantastic data tools available. There have never been more tools for people to work with. I think the way people are working on the tools, in more open and collaborative ways, means that we are going to get better, faster, more flexible tools coming up in the future.
I think that people have solved the very common and easy problems of being able to process and trust another form of data to be able to generate visualizations more easily, in order to give people more self-serve access to data collaboration. I think that we are now starting to work on real, meaty problems. How do you bring the interesting data sets together? How do you build trust across a group of people? How do we fundamentally work with data and not only portions of information? Now that we’ve gotten through all of the low-hanging fruit, we can really start to work on the big, juicy apples on the top of the tree, and I am seeing that starting to happen. And, if that is happening, we are going to get increasing power out of the applications.
So, whether it is machine learning, which is only as good as the quality of the data that you put in, or whether it is visualizations, which are equally only as good as the data you put in, people are working on data processes and systems under the surface. I think that there is going to be a second wave of explosions of data; I think the first one came when people started talking about embracing it. The second one is coming. We are going to start to see magical, amazing things happen as people start to produce technology which works on real problems, not just on the surface.
ML: Looking ahead, are there any resources, documents, or projects you are working on that you would like to share?
PH: Currently, the next thing is I will be speaking at TDWI, in San Diego, in October. In terms of projects, always come along and see what DataShaka is doing. We would love to hear from you. But also, I think, get in touch if you just want to talk about data. What really excites me is when I have really solid conversations with people about data. What I love to hear about, if anyone wants to get in touch with me on Twitter or something like that, is: Let’s open up the data management box. Let’s open up working fundamentally with data. That is going to be exciting and I think that is yet to come. I am pushing as hard as I can, but finding other people who will work with me on that will be really exciting.