The Quest for Useful Questions and Big Data Answers with Fred Trotter
Open source medical software and security systems expert Fred Trotter will be joining forces with Health 2.0 EDU on Tuesday, July 9th for the first class in the online summer course series, Big Data, Big Business. Fred will teach you how to model our changing health care system and, more importantly, explain what exactly Big Data is.
Fred Trotter: Big Data was a term coined by my friend Roger Magoulas, who watches data trends for O’Reilly Media. The term had been used before to mean “lots of data,” but the use that Roger launched is thinking about Big Data as two sides of a coin: One is purely a problem, where the size and variety of the data become so large that dealing with those issues becomes a primary concern. The second is the new software approaches that we use to handle that problem.
There seems to be a consensus that a problem is not a Big Data problem unless it has at least two of the V’s:
• Velocity — data coming too fast
• Volume — too much data
• Variety — many different kinds of data
• Variability — many different meanings within the data
The fourth has only recently been popularized, but I think it is critical in health care. When you have any of the first three problems, but you also have to deal with the fact that your data carries more than one specific meaning, then you can get into deep trouble.
The first thing to note is that this is the bad part of Big Data. I spend most of the time that I work with DocGraph (a big data set about the whole health care system) trying to ensure that it does not become a Big Data set. It is hard to overestimate the value of this approach.
The next part of Roger’s notion of Big Data is much more hopeful. This part focuses on a tooling and process strategy to gain value from data on this scale. This is what differentiates his use of the term from earlier uses. Roger is not just talking about the problem, but about the solution. Generally, the solutions to Big Data look something like this:
• Use algorithms to break a problem into smaller chunks that can be processed in parallel
• Use the cloud to scale processing power (i.e. rent your supercomputers)
• Use more than one kind of database technology to eliminate the weaknesses in any one given approach
• Use lots of different data mining and visualization techniques to gain insights
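As an editorial illustration of the first item in the list above, here is a minimal Python sketch of breaking a problem into chunks, processing the chunks in parallel, and merging the partial results. It is not taken from DocGraph or from any tool mentioned in the interview: the function names, the chunk size, and the (provider, patient) records are all hypothetical, and a local thread pool stands in for the rented cloud machines a real Big Data job would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Break the full record list into smaller, independent chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_providers(chunk):
    """Process one chunk: count how often each provider appears in it."""
    return Counter(provider for provider, _patient in chunk)


def count_in_parallel(records, chunk_size=2, workers=4):
    """Count per chunk in parallel, then merge the partial counts."""
    chunks = split_into_chunks(records, chunk_size)
    # A thread pool keeps the sketch simple; it is an assumption standing in
    # for a process pool or a cluster in a real workload.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_providers, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical (provider, patient) pairs, invented for illustration only.
records = [
    ("dr_a", "p1"), ("dr_b", "p2"), ("dr_a", "p3"),
    ("dr_c", "p4"), ("dr_a", "p5"), ("dr_b", "p6"),
]
totals = count_in_parallel(records)
```

The point of the design is that each chunk can be counted without knowing anything about the others, so the work spreads across as many workers as are available, and only the small partial counts need to be combined at the end.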
Why is Big Data central to, first, understanding health care and, second, improving it?
Trotter: Without Big Data there is no way to even begin to handle the complexity of the human genome and phenome. Anytime we find “a gene for this” we are just lucky. Most of the time, complex networks of genes, proteins, hormones and other “omics” dictate how the body responds to a given environment.
The only way to approach that problem is to do a massive Big Data process that involves the merger of the current Big Data efforts on the cellular level, with new Big Data efforts to quantify and measure the human condition. These two efforts are merging now, and together they will represent the single largest data project ever undertaken by mankind.
What do you think are the possibilities for health data? Are there limits to what it can do?
Trotter: You are thinking about it backwards. There will always be limits to what a doctor can do for a patient. We need data to understand what those limitations are. Data that shows us what doesn’t work is just as valuable as data that shows us what does work. Sometimes the data is going to say “it’s too late to help here,” which sounds awful until you realize just how painful “help” at the end of life can be.
I much prefer a zen approach to data: The data is what the data is … thinking about it in terms of limitations creates the limitations.
What is the biggest hurdle you think exists when it comes to working with health-related data sets?
Trotter: Both patients and doctors really have no idea how important the Big Data approach is. So most of the problems are cultural and not technical. As personalized medicine and other Big Data approaches continue to rack up results, clinicians will take the technology more and more seriously.
The irony here is that very soon, we will not be asking why doctors need Big Data. We will be asking why we need doctors when we have Big Data. Take note: when our technology merits first names (e.g. Siri, Watson), a fundamental change is approaching. I can tell you without hesitation that medicine is culturally unprepared for this change.
But then if doctors did not need to be dragged kicking and screaming into the future, there would be much less work available for people like me and my fellow “doctor draggers.”
What are you working on now?
Trotter: Well, I am trying to fix HIPAA. http://www.medstartr.com/projects/188-hacking-hipaa
Generally, our Big Data tools for ACO formation and management seem to be doing well. http://notonlydev.com
And we will be making some announcements about new DocGraph data and projects soon: http://docgraph.org
Also, I am trying to learn how to grill vegetables.
What are you most excited about regarding your upcoming Learn It Live class with Health 2.0?
Trotter: It's really hard to cover anything like Big Data quickly. I am teaching a short course on a subject that people who are smarter than me have gotten PhDs in. What I really want to help people understand is how to separate the hope from the hype. I hope to provide a dose of critical thinking about Big Data. There is just no way to provide the answers that quickly, but I think I might be able to help people understand what the useful questions are.