Building Fortune-Teller Bots for Healthcare

Since the dawn of time, people have claimed the ability to predict the future by reading the stars, gazing into a crystal ball, or studying the palms of our hands. We tend to believe them because the value of knowing the future is obvious: we can avoid crises and seize opportunities. Today, many companies have already established fairly accurate predictive abilities. For example, Netflix predicts which movies you’ll want to watch, Amazon knows which books you’ll like, and Target can tell which items you will need before you realize it yourself. They have been doing it for years, and we are only starting to realize that power in healthcare.


IBM’s Watson famously winning on Jeopardy

This is being made possible by two things: the first is the increasing power of machines to learn statistical patterns and forecast outcomes at radically lower cost, and the second is the open data movement. The Centers for Disease Control and Prevention, CMS, and health departments are making very large datasets available about hospitalizations, claims, and disease patterns. In addition, spatiotemporal data about the environment, traffic, and demographics are readily available online. Blending these very different datasets can give us previously unimaginable insights into our health, a phenomenon known as the ‘Mosaic Effect’.
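The ‘Mosaic Effect’ boils down to joining datasets that share a common key, such as a geographic area. A minimal sketch of the idea, using entirely hypothetical data and field names rather than any actual public dataset:

```python
# Hypothetical hospitalization counts by census tract (open health data)
hospitalizations = {
    "tract_101": {"asthma_admissions": 42},
    "tract_102": {"asthma_admissions": 7},
}

# Hypothetical environmental readings for the same tracts (open air-quality data)
environment = {
    "tract_101": {"pm25_ugm3": 18.4},
    "tract_102": {"pm25_ugm3": 6.1},
}

def blend(*datasets):
    """Merge records that share a key across several datasets into one view."""
    merged = {}
    for dataset in datasets:
        for key, fields in dataset.items():
            merged.setdefault(key, {}).update(fields)
    return merged

blended = blend(hospitalizations, environment)
# Each tract now carries both health and environmental attributes,
# enabling questions neither dataset could answer alone.
print(blended["tract_101"])  # {'asthma_admissions': 42, 'pm25_ugm3': 18.4}
```

Real blends are far messier, as the interview below discusses, but the payoff is the same: combined records answer questions no single dataset can.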

To learn more, I got in touch with Jay Bhatt, internist, geriatrician, and Chief Health Officer at the Illinois Hospital Association. During his previous public health tenure as Managing Deputy Commissioner and Chief Innovation Officer, Jay created partnerships and brought together an eclectic team of individuals with the skills to harness machine learning, natural language processing, and predictive analytics, giving insight into how health departments across the nation can approach health issues more proactively. They built models for food inspections, lead poisoning, mammography, West Nile virus, and neighborhood assessment.


 Dr. Jay Bhatt

Hey Jay, thanks for being here. Can you start by telling us why you chose these topics specifically for predictive analytics?

It’s my pleasure to represent our collaborative effort, which continues despite my move to the Illinois Hospital Association. We started with food inspections and remediation of buildings for lead poisoning because you are dealing with a resource constraint: the amount of work that needs to be done versus the number of people you have. So you either have to hire more people, expend more resources, or become smarter with data.

The status quo was that food inspections only happened after a poisoning case was reported, and buildings were only remediated of lead after someone living in them had already presented with neurological symptoms and high blood lead levels.

These two efforts were also very reactive in nature. What we did was build models that predict which restaurants are most likely to be the source of food poisoning and which areas carry the highest risk of lead poisoning. This allows for proactive intervention before the problems happen.

Fascinating…so what kind of data did you use?

In the food inspection example, we used environmental open data exclusively, starting with many factors and narrowing them down to the 13 that were relevant. With lead poisoning we also added clinical data on blood lead levels, which the state provided us in real time from clinics. We then added data about premature deaths, neurological issues, and migration patterns to determine exposure and narrow down the areas that require remediation.
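Narrowing "many factors" down to a relevant handful is a standard feature-selection step. One simple way to do it (an illustrative assumption here, not necessarily the team's actual method) is to rank candidate factors by how strongly they correlate with the outcome and keep the top few. The feature names and data below are hypothetical:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def top_features(features, outcome, k):
    """Keep the k features most correlated (in magnitude) with the outcome."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], outcome)),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical candidate factors for failed food inspections
features = {
    "days_since_last_inspection": [10, 200, 35, 300, 15],
    "prior_critical_violations":  [0, 3, 1, 4, 0],
    "zip_code_last_digit":        [1, 2, 3, 4, 5],  # pure noise
}
failed = [0, 1, 0, 1, 0]  # 1 = inspection failed

print(top_features(features, failed, k=2))
```

In practice teams typically use richer methods (regularization, cross-validated model comparison), but the principle is the same: start wide, then keep only the factors that carry signal.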


The lead model turned the impossible effort on the left into a more focused and doable effort on the right

What kind of partnerships did you establish to make this project successful?

We partnered with the University of Chicago’s Center for Data Science and Public Policy. Our collaborative team created a model that helped us identify the women and children most at risk. As a result, inspectors could get out in front of the problem.

In the lead poisoning example, how can doctors and clinics make use of this data on an individual patient level?

We are actually seeking funding to integrate the model with electronic health records. I see this as a powerful decision-support tool: stratifying patients into high, medium, and low risk in a dashboard inside the EMR. Doctors may not order a blood lead level until later in a child’s life, but if a pregnant woman or her child walks in and is flagged as high risk, the doctor might reconsider waiting that long. This can keep us a few years ahead of the curve. On a population health level, the doctor may identify a few high-risk individuals who have not come in for a checkup in a while and call to follow up with them.
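The stratification Jay describes can be sketched very simply: the model emits a risk score per patient, and the dashboard buckets scores into tiers. The thresholds and scores below are illustrative assumptions, not the actual model's:

```python
def risk_tier(score, high=0.7, medium=0.3):
    """Map a model probability in [0, 1] to a dashboard tier.

    Thresholds are hypothetical; a real deployment would calibrate them
    against outcomes and clinical capacity.
    """
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"

# Hypothetical model scores keyed by patient ID
patients = {"patient_a": 0.91, "patient_b": 0.45, "patient_c": 0.12}
dashboard = {pid: risk_tier(score) for pid, score in patients.items()}
print(dashboard)  # {'patient_a': 'high', 'patient_b': 'medium', 'patient_c': 'low'}
```

The value of the tiering is operational: "high" can trigger an earlier blood lead test, while "low" keeps scarce clinical attention where it matters.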


Dashboards could guide the physician’s proactive approach

That does sound like a game changer. What are some of the challenges you face?

Connecting different kinds of datasets is tough and requires different techniques. Home addresses may contain errors, for example, and you have to clean and match them. Clinical data is tricky because you need to make sure that the data matches appropriately and that HL7 integration is done.
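The address problem Jay mentions is a classic record-linkage issue: exact joins fail when the same address is written two different ways. One common approach (an assumption here, not necessarily the team's) is to normalize addresses first, then fall back to fuzzy matching. The registry entries and abbreviation table below are hypothetical:

```python
import difflib
import re

def normalize(address):
    """Lowercase, strip punctuation, and expand a few common abbreviations."""
    addr = re.sub(r"[^\w\s]", "", address.lower())
    replacements = {"st": "street", "ave": "avenue", "n": "north"}
    return " ".join(replacements.get(word, word) for word in addr.split())

def match(address, known, cutoff=0.85):
    """Return the best known address above a similarity cutoff, else None."""
    candidates = {normalize(k): k for k in known}
    best = difflib.get_close_matches(normalize(address), candidates,
                                     n=1, cutoff=cutoff)
    return candidates[best[0]] if best else None

registry = ["123 N Clark St", "456 W Madison Ave"]
print(match("123 North Clark Street", registry))  # 123 N Clark St
print(match("999 Elm Rd", registry))              # None
```

Production systems typically go further (geocoding, phonetic keys, probabilistic linkage), but even this two-step pattern of normalize-then-fuzzy-match recovers many joins that a naive exact match would drop.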

How do you see this playing out in the future of healthcare over the coming years?

These kinds of technologies that tap into big data are going to become the norm, and decision support, whether through Watson or other approaches integrated with the EMR, will be easier to access thanks to big data methodologies. We will be able to anticipate what happens to a patient, and that will change our approach. Doctors will become more adept at using data to make decisions, and that will impact the quality of care. I also think the clinical realm will see an integration of other data sources such as social media, wearables, education, and even the criminal justice system.

Open platforms like Hadoop will become more prominent, and I’m intrigued to see whether clinic-level staff will have the bandwidth to use them. There is also more integration of data, such as what is happening between retail groceries and pharmacies like CVS or Walgreens. Independent medical groups are also partnering with Blue Cross Blue Shield, which is sharing claims data because it helps their members’ bottom line. It will be interesting to see what kinds of unusual suspects partner together with data as the anchor. I would also love to see de-identified patient data made open and available to the innovator community.

In the end, the patient or consumer should really be at the center, and the question is: how can we use data to make the healthy choice the easy choice?


Jay speaks about the future at the Health 2.0 Fall Conference


We live in an increasingly complex and yet beautiful world of data. While pessimists raise objections, futurists such as Jay and his team are figuring out how to derive the most value from these data streams, securely, and do things we never imagined possible. Falling prices of ever-stronger processors, more accurate sensors, and the open data movement are all exponential enablers of what we can do with this sea of information. We can’t know the future, but we can certainly predict it much more accurately than we could five years ago.



Omar is a physician, writer, and data analyst. After realizing the potential of exponential technologies to reshape the inefficiencies of healthcare, he left medicine and moved to San Francisco to immerse himself in the network of entrepreneurs in Silicon Valley while working on technology projects of his own. Omar frequently writes for Health 2.0 News while consulting for major organizations with the Healthcare Practice of Clarity Solution Group. View all posts by OMAR SHAKER →
