Episode 52 - Predicting Hospitalization with the Hilltop Institute
2:51PM Feb 13, 2024
Speakers:
Dr. Ian Anson
Campus Connections
Dr. Morgan Henderson
Jean Kim
Dr. Leigh Goetschius
Dr. Fei Han
Keywords:
model
hilltop
data
predicting
social science
morgan
literature
primary care
umbc
institute
social sciences
research
great
maryland
team
risk factors
work
people
program
cs3
Hello and welcome to Retrieving the Social Sciences, a production of the Center for Social Science Scholarship. I'm your host, Ian Anson, Associate Professor of Political Science here at UMBC. On today's show, as always, we'll be hearing from UMBC faculty, students, visiting speakers, and community partners about the social science research they've been performing in recent times. Qualitative, quantitative, applied, empirical, normative. On Retrieving the Social Sciences, we bring the best of UMBC's social science community to you.
With the Super Bowl having just occurred over the weekend, I'm sure that some of you were excited to sit down with some chips and dip, turn on the TV, and watch several uninterrupted hours of television. That's right, two groups of superhuman athletes battling across the gridiron at their peak performance levels for hours upon hours of action. Or, if you've ever watched a football game on TV before Sunday, you were likely settling into your couch with the full awareness that, according to the Pro Football Network, you only saw between 15 and 20 minutes of actual football. The rest of the four-hour Super Bowl broadcast consists of the halftime show, timeouts, breaks between plays, and, of course, commercials. For many viewers, in fact, the commercials are the main event. Advertisers know that to reach viewers most effectively, they should debut brand new ads with celebrity appearances, gimmicks, or memorable hooks. But the job of drawing in viewers has become increasingly difficult for companies in the tech space, because they have to explain to their audience just what it is they're doing and how. Over the past few Super Bowls, one clear theme has emerged. Companies love to tell their audiences about how they're using data. A few years ago, the Super Bowl broadcast was packed with references to the cloud and big data. And now data's invoked in service of AI and other kinds of algorithms. It's clear that the public has developed an understanding of our new era of massive data availability, in part through commercials like these. But how do large datasets actually help the public?
On today's broadcast, we turn once again to the Hilltop Institute for answers. In today's conversation I speak with three members of the Hilltop Institute's Analytics and Research Team. Our conversation concerns the Institute's development of predictive models to identify and flag patients who have a high risk of hospitalization based on a wide variety of factors. Essentially, the team is using a very large amount of data and an algorithmic approach to analyzing that data to help medical professionals intervene to prevent costly and labor intensive hospital stays, saving money and improving outcomes for patients. But as we'll soon learn, just like a winning Super Bowl team, the true performance of this model comes through teamwork itself. In this case, social science teamwork.
On today's episode, I have the privilege of speaking with Dr. Leigh Goetschius, Data Scientist Advanced; Dr. Fei Han, Principal Data Scientist and affiliate assistant professor in the UMBC Department of Computer Science and Electrical Engineering; and Dr. Morgan Henderson, Principal Data Scientist and affiliate assistant professor in the UMBC Department of Economics, working together as the Analytics and Research Team at the Hilltop Institute. All three of these scientists have a unique and important role to play in achieving success in a data analytics approach. Let's jump in to hear what this team has to say. Hut, hut, hike.
Today, I'm really excited to have a conversation with three researchers who are members of the Hilltop Institute at UMBC, specifically members of the Analytics and Research Team. As you can imagine, as the host of Retrieving the Social Sciences and somebody who, if you've been listening to this podcast for a while, you know that I really like analytics and I really like research. So I'm thrilled to be able to talk to these three folks today. We have today Morgan Henderson, Fei Han, and Leigh Goetschius. If you wouldn't mind just briefly saying hi, and we can get into some of the questions that I'd like to ask you,
Hi, Professor Anson, how're you doing today?
Good, thanks, Morgan. That's Morgan Henderson on the mic.
Hi, Leigh Goetschius here.
Yeah, this is Fei.
Hi, Fei. Awesome. Great to get to know all three of you. So obviously the topic of today's episode is the work that the Hilltop Institute has been doing in predictive modeling, especially as it relates to various health care programs at the federal level. And I was really interested, first of all, in asking you all, you know, not everybody who's listening to this podcast knows that much about the Hilltop Institute. Certainly, we have done an episode that features some Hilltop Institute content in the past; we had Morgan, along with another Morgan, on in a previous episode. But for those listeners who don't know what the Hilltop Institute might be, if you wouldn't mind just telling us a bit about what this is, and then, critically, how it is that the Hilltop Institute got involved in this topic of predictive modeling.
Yeah, love to. So Hilltop is a research center located currently on campus at UMBC. Our offices are on the third floor of Sondheim. And we've been on campus, I believe, since 1994. And so this is probably long before many of your listeners were even born. But we've had a presence since then. We keep a reasonably low profile. We are a small but growing organization, and so we currently have around 55 staff, I'd say, and of that, 10 to 15 active researchers who are pursuing research in various different health-related fields. And so Hilltop is a research center, and we do a mix of research and client-related work. Our big clients are state agencies from around Maryland: the Maryland Department of Health, the Maryland Health Benefit Exchange, HSCRC, which is the state agency that regulates hospitals, and a lot of other state agencies that I had never even heard of before joining Hilltop, and I'm from Maryland. And so we do a lot of operational work for these agencies. And then we do research work as well. One of our core missions is to really try to blend those two, and really practice engaged scholarship, where we take the really interesting, impactful operational work we do, and we try to put that out into the research world as well. So that's one of the big things that Hilltop focuses on.
So this "Analytic Research Team engaging in predictive modeling," so how was it that you all came to this particular set of projects or projects as the Hilltop Institute?
Fantastic question. This all began, actually, with one lunch held probably five years ago, where the former director of the Analytics and Research Team, Dr. Ian Stockwell, who is now faculty in Information Systems here at UMBC, had lunch with the then executive director of this new program in Maryland called the Maryland Primary Care Program. And Chad Perman as well was a leader in that program. And they really tried to talk about the goals of this new program. And this new program, which is partially federally funded, really aims to help primary care practices around the state transform, modernize, do a really, really good job of providing advanced primary care, and I'm gonna do air quotes. And one of the ways that Ian, the Hilltop Ian, or the former Hilltop Ian, said we could help is: hey, well, what if doctors all around the state could identify those patients who are at the greatest risk of having an avoidable hospitalization in the next month? What if they had that list? Then, before that bad hospitalization happens, those doctors can go down that list, call the highest-risk patients, say, you know, "Hello, Mr. Jones, are you taking your medication?" You know, to give the providers tools to practice proactive outreach. And that's how Hilltop's work in this space was born.
Now, Dr. Henderson, you're getting me very excited, because what you're describing here, trying to make predictions about people's health outcomes, sounds a little bit like you're gonna need the tools of social science to do a really good job. And, you know, long-term listeners are probably well aware that I'm going to pivot the conversation in that direction. So this seems like a really worthy thing to research. I think that we can all agree that there's a pretty important sort of public-facing component to this. How do you do it, though? I mean, how do you actually make good predictions? I'm sure that that's something that hospitals, public policy officials, and the broader research public are pretty desperate to understand. So are there some tools that we might be able to use to most effectively predict these outcomes for primary care?
Well, that's a great question. And I'm going to punt that to one of my team members. I would say one of the strengths of our team, we are small but we are mighty, is that we all have slightly different backgrounds, and we bring different perspectives to bear on predictive modeling. Me, I have my PhD in economics, which is, you know, there's a good amount of computational stuff, statistical inference, but it's really more theoretical costs and benefits. And Dr. Goetschius, who I'm gonna pass the mic to, she has her PhD in, I believe, clinical psychology. Developmental, pardon me. And so she brings an amazing skill set that we didn't have on the team before then. And Dr. Han, he has his PhD in -
Computational analysis and modeling.
So I think we have a great team, and we use several different tools to try to make good predictions. We have a lot of processes in place to try to check that what we're predicting is accurate, and that it's accurate across groups. So I'll pass it over to my colleague Leigh here to speak on that point.
Yeah. For at least the predictive models that we've worked on, we sort of have, I guess, a recipe for the development of these models. We start with working with our clients. So in this case, like Morgan was describing, the Maryland Primary Care Program, where we're trying to enhance primary care for Medicare beneficiaries in Maryland. And they have these targets, like Morgan mentioned, avoidable hospital events that they would like to reduce, because these are inpatient hospitalizations or ED visits for conditions that theoretically should be well controlled by enhanced primary care, or by effective, timely primary care. So we have this target of an event that should be able to be helped by primary care, which is who the model is going to be used by. Then we go from there into a quite extensive literature review, where we look at, you know, peer-reviewed journal articles, reports and evaluations, other white papers, and anything we can really get our hands on, plus feedback from stakeholders, like the Maryland Primary Care Program executives, but also some user groups that we can get feedback from. And we take all of that and come up with a set of conceptual risk factors that we think will be really helpful. And then we move it into the Medicare claims data, where we really try to use standardized definitions of risk factors. For example, for a diabetes diagnosis, the set of diagnosis codes that we use comes from really standardized sources, so that when we say that somebody has diabetes, it fits with what other people find, or with how other people define it. And then we have Fei, who is our super awesome modeler, who takes all of the risk factors and the outcomes that we define and runs, you know, hours and hours of programming to get our predictive models. And Morgan and I run sort of a shadow validation model that makes sure that everything that comes out of Fei's code matches, or that we can recreate it, so that we know that what we're sending out is actually what's true. And then we cap it off with just a lot of really transparent documentation, so that people who are using the model know what went into it and how it's supposed to be used.
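To make the claims step a bit more concrete, here is a minimal sketch, not the team's actual code, of how a claims-based risk factor such as a diabetes diagnosis might be flagged from a standardized set of diagnosis codes. The column names and the handful of ICD-10 codes shown here are illustrative assumptions only; real work would use a full, published value set.

```python
import pandas as pd

# Hypothetical subset of ICD-10-CM diabetes codes; a real risk factor would use
# a full, standardized value set rather than this hand-picked handful.
DIABETES_CODES = {"E109", "E119", "E1165", "E1122"}

def flag_diabetes(claims: pd.DataFrame) -> pd.DataFrame:
    """Return one row per beneficiary with a 0/1 diabetes risk-factor flag.

    Assumes `claims` has columns `bene_id` and `dx_code` (one diagnosis per row).
    """
    flagged = claims.assign(
        has_diabetes=claims["dx_code"].isin(DIABETES_CODES).astype(int)
    )
    # A beneficiary gets the flag if any of their claims carries a qualifying code.
    return flagged.groupby("bene_id", as_index=False)["has_diabetes"].max()

# Toy example
claims = pd.DataFrame({"bene_id": [1, 1, 2], "dx_code": ["E119", "I10", "I10"]})
print(flag_diabetes(claims))
#    bene_id  has_diabetes
# 0        1             1
# 1        2             0
```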
Dr. Goetschius, I really love this idea that this process begins in the literature, that it begins theoretically, because you can't just create a model and throw things into it and get answers, right? You have to figure out how to construct a model that makes sense of the world in some way. And for those of our listeners who might be students, who might be thinking about the social sciences from that perspective, it's a really reaffirming point to say: look, you can't just start jumping into the numbers, right, without really knowing what you're doing. And when we're seeing this project, which has very practical implications, you can't just come at it with your own assumptions. You have to really rest on sort of the body of literature across fields, it seems, right? I mean, are you drawing from just one strand of that literature, or are you diving into a variety of different fields to get those insights?
Yeah, definitely a variety of fields. Because, as I mentioned, we use risk factors in the claims, so we use a lot of healthcare-related literature. But we also pull in risk factors related to social determinants of health that we pull from publicly available data sources, you know, the census, the American Community Survey, the EPA, all kinds of publicly available sources about the lived environment of the beneficiaries and sort of what's going on in their neighborhood environment. So we then pull from all different kinds of literature areas to cover what to look for in those other areas.
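As a rough illustration of that second ingredient, the sketch below joins hypothetical area-level measures, of the kind published in sources like the American Community Survey, onto beneficiaries by census tract. The column names and values are made up for illustration; they are not Hilltop's actual variables or data.

```python
import pandas as pd

# Hypothetical beneficiary table with a census-tract identifier derived from address.
beneficiaries = pd.DataFrame(
    {"bene_id": [1, 2], "census_tract": ["24005400100", "24005401001"]}
)

# Hypothetical area-level measures of the kind available from public sources
# such as the American Community Survey; the values here are made up.
acs_tract = pd.DataFrame(
    {
        "census_tract": ["24005400100", "24005401001"],
        "pct_below_poverty": [0.18, 0.07],
        "pct_no_vehicle": [0.22, 0.05],
    }
)

# Left join so every beneficiary keeps a row even if area data is missing.
model_input = beneficiaries.merge(acs_tract, on="census_tract", how="left")
print(model_input)
```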
I'm a little bit jealous, because that sounds like a lot of fun, actually. And maybe that's just speaking to myself as a social science nerd. But that sounds like (Dr. Goetschius: it is fun) fun to scour those literatures and try to put all those different puzzle pieces together. But, Dr. Han, once we've got all these puzzle pieces, at least theoretically, and we've got some data, how is it that we actually do this modeling? Can you tell us at least a little bit, maybe from a broad perspective, we don't need to get too technical, but what's in this black box? How do we actually get a model to give us these predictions?
Yeah, thank you for this great question. Actually, we could start from a broad perspective. We all know that we are currently in the data science era. We have data, and a lot of high-quality data. We also have computing power, and we have excellent algorithms. Now, with these three components together, we can solve a lot of big problems with predictive models, and the Medicare avoidable hospitalization problem Morgan described is just one case. For that problem, we add our domain knowledge to these three components, and we build this model. That's a high-level summary. And actually, I want to share a little bit of my personal story, because I believe it will be helpful to the students. I came to the US in 2009 to study for my PhD degree. I was a trained mathematician before, in China; I have a bachelor's degree and a master's degree in mathematics from China. And here in the US, I got my PhD in CAM, computational analysis and modeling. It actually combines mathematics, statistics, and computer science, so it's a perfect match for data science. And after that, I still felt not very strong, so I got another degree, a Master of Science degree in data analytics. With this, I feel much more comfortable in this area. Here I really want to cite a very famous quote from Dr. Rao, who was a very famous statistician. The quote goes like this: "All knowledge is, in final analysis, history. All sciences are, in the abstract, mathematics. All judgments are, in their rationale, statistics." So we want to go deeper and deeper and know the basic logic running underneath the domain knowledge. Now I feel very comfortable, because I know the principles, and everything else is just running on top of them; the domain knowledge is actually ruled by these principles. So if you are a student studying mathematics or statistics, study harder. It will make your life much easier.
You know, usually at the end of these episodes, I ask for advice for any students that want to go pro in the social sciences, but I don't think that we're going to be able to top that. That's wonderful. And certainly some lessons that I'm taking to heart as well. That's a great perspective. And so, thinking about these models that you've created, what are they telling us? Are there specific kinds of lessons that we can learn from this modeling exercise that contribute to this literature, that maybe we hadn't understood quite as well previously?
That is a great question. So I would say the number one lesson, I think, that we've all learned is that the models perform, honestly, surprisingly well. I'm remembering back to that day in September of 2019, when Fei and I ran the first iteration of our first model for the Maryland Primary Care Program. And so this model scored, back then, about 210,000 people, so it creates basically a spreadsheet with 210,000 rows and a number between zero and one; everyone gets a number. And then the scores go out to doctors' offices all over the state: 380 back then, currently about 500, because the program's grown. And so that first time we ran the model, I think we were really surprised at how good the model was. We were surprised, we were really relieved, at how good the model was. So this to me is the first lesson: it is possible to predict this outcome, these avoidable hospital events in the following month, and it's possible to predict it very well. We're looking at the 10% of people who we say are the riskiest, because this is supposed to be used by practices to help them identify their riskiest patients. So the way we judge the quality of the models is that we line all the people up, riskiest to least risky, and then we just look: okay, what fraction of true hospital events in the next month actually occurs among these 10% riskiest people? And if it's only 10% of the hospital events in the next month in the top 10% risk, that's a bad model; it's not predicting anything. But we're seeing that, reliably, about 50% of the avoidable hospital events in the next month are concentrated in those people that we find are in the top 10% of risk. And so we take that to be good model performance. And the other models that we have since rolled out, so we have a model predicting severe diabetes complications, we have a model predicting all-cause mortality within six months, they perform even better than our avoidable hospital events model. But I would say that that's the first, and to us the most important, fact: these models perform well. Now, I would say the second, and honestly equally important, result that we're seeing is that the models, by all indications, look like they perform well across groups. So this is a very important topic, because this touches on this idea of algorithmic fairness, where some predictive models out there, and there are so many predictive models in health that we don't even see, we as consumers of health care don't even see them, but they're there. A lot of these models, I shouldn't say a lot, some of these models are known to encode systematic biases, typically racial biases, because of the way they're constructed and because of what they're trying to predict. And so this is a very, very active and important area of discussion at all levels of the government and research communities right now. We take this really seriously; we check our models for not only performance, but also bias. And we currently don't see any evidence that there is any bias. And if we did, we would very much take steps to tweak the modeling to address that. So those are the two big takeaways, I would say.
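Here is a minimal sketch of the evaluation Dr. Henderson describes: rank everyone by predicted risk, take the top 10%, and ask what share of the true next-month avoidable hospital events occurred in that group. The function and toy data below are illustrative, not the team's production code.

```python
import numpy as np

def top_decile_capture(risk_scores: np.ndarray, had_event: np.ndarray) -> float:
    """Share of true next-month events that fall among the 10% highest-risk people.

    A value near 0.10 means the model is doing no better than chance;
    the team reports roughly 0.50 for their avoidable hospital events model.
    """
    n_top = max(1, int(0.10 * len(risk_scores)))
    top_idx = np.argsort(risk_scores)[::-1][:n_top]  # indices of the riskiest 10%
    return had_event[top_idx].sum() / had_event.sum()

# Toy example: 10 people, 2 true events, one of which is in the riskiest decile.
scores = np.array([0.9, 0.1, 0.8, 0.2, 0.05, 0.3, 0.15, 0.6, 0.4, 0.02])
events = np.array([1, 0, 0, 0, 0, 0, 0, 1, 0, 0])
print(top_decile_capture(scores, events))  # 0.5
```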
So, Dr. Goetschius and Dr. Han, if you wouldn't mind: what's the secret sauce? Are you able to talk about why these models seem to perform better? What is it that you're doing, perhaps, that other models are not doing, especially in light of this observation, which I think is a really critical one, that this model might do better across groups and avoid some of these algorithmic biases that come in? Is there a way of knowing sort of what this model does that's superior?
I would say all the legwork that goes into the prep work, before the model is trained and scored, I think helps a lot. All of the literature review, all the stakeholder feedback and input really helps start the model off on the right foot. And so we think that that helps improve performance downstream, and hopefully keeps bias out, or low, across all the models, so that we're helping all patients as best we can.
Yeah, I just want to add one more comment for your question, Ian: why? I think I could summarize it with three points. The first is that our research team has a special structure: Morgan is from economics, and Leigh is from developmental psychology, so we have several perspectives on the same problem. We can see different things, and we always compare our opinions and cross-check. The second point is the data. Hilltop hosts Maryland Medicaid data from 1999, and the data is very high quality; we have a special stats programming team that works hard on it to keep the quality very high. And the third point is the size of the data, because even with the same approach, when the data size is big, the estimates, or the learned results, will be much more accurate. So with these three points, I think I can answer your question of why our models perform well.
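To illustrate the third point about data size, the toy simulation below shows how an estimate computed from more observations varies less around the true value, shrinking roughly as one over the square root of the sample size. It is a generic statistical illustration, not anything specific to the Hilltop data or models.

```python
import numpy as np

rng = np.random.default_rng(0)

# With the same underlying data-generating process (true mean 0.2), an estimate
# computed from more observations is much less variable around the true value.
for n in (1_000, 100_000):
    estimates = [rng.normal(loc=0.2, scale=1.0, size=n).mean() for _ in range(200)]
    print(n, round(float(np.std(estimates)), 4))
# The spread shrinks roughly as 1 / sqrt(n): about 0.03 at n=1,000 and 0.003 at n=100,000.
```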
I really love this story; it's emblematic of what I would consider to be really successful social science. And I think if there's one thing I've learned over the span of doing this podcast for now more than 50 episodes, it's that there are certain hallmarks or features of really good social science projects, right? And one is that they're diverse, that this is a team that brings together disciplinary perspectives from various realms, and they're thinking about not just disciplinary diversity, but also social diversity, right? That your attention is also to these sorts of indicators of social diversity in your outcomes, in your model, right? You know, the second thing is that it's collaborative, right, that you're actually having these really great conversations, and they're actually leading to insights that make the model different and better. And then the third is, right, quoting Sherlock Holmes in his conversation with Watson: "Alas, I have no data, I cannot tell." Social science needs to be able to leverage a lot of evidence, basically, to be able to come to any conclusion, whatever that evidence might be, either qualitative or quantitative, you know, anthropological, or a gigantic table of, you know, millions of records of patients from '99 to the present, or something like that, right? And so I just love the story. I mean, it seems like this is such a great example of a project that's having real positive benefits for our society, and that comes from, like, researchers getting together and having great ideas and, like, nerding out over this literature. To me, that is a really exciting thing, and I'm so grateful that you're able to share this success story with us. And so, I guess in wrapping up our discussion, I wanted to ask: what's next? What are some other things that you think you're going to be able to accomplish in the future, or that you have your eyes on, as you continue to build out this excellent teamwork and collaborative sort of success in the social sciences?
Well, thank you for those kind words, first of all, and thanks very much for having us on. We love talking about this, the modeling work. What's next? I would say there are so many potentially worthwhile models that are just kind of waiting to be built. And I would say that our partnerships with the state give us such a great place in which to do that, because we work closely with the Maryland Department of Health, which currently has about 1.5 million people in the Medicaid program. That's a quarter of Marylanders. We work closely with the Maryland Primary Care Program; they have about 400,000 Medicare beneficiaries in their program. So that's almost 2 million people whose lives we, in principle, get to touch through our models. And we take that as a great privilege and a great responsibility. And so, you know, we want to be responsive to the needs of the state, we're always thinking about how to tweak the models, how to improve them, and we have a couple of papers right now under consideration. So we're always trying to find time to not just run these amazing models, but also write up what we're finding and share it with the world, because we very much stood on the shoulders of giants to create these models. We really combed the academic literature, and we really want to give back as well. So in terms of what's next: more of the same, hopefully, just good, careful, high-quality work.
Yeah, we have several projects ongoing, and we will still keep going, keep the quality and the good work going. Thank you.
Yeah, the only thing I would add to what Morgan and Fei have said is that I think I've mentioned stakeholders a couple of times, and one of the things that we're really interested in, I guess, we know our models perform really well in the data. And like, you know, it's predicting what we want it to and it's doing it well. But what we don't know a lot about is once we send our predictive model scores out into clinical practice, how are they being used? We get a little bit of information back from some of the stakeholder groups that we work with already. But I think that an exciting future project would be to really gather some qualitative and survey data about how practitioners and providers are using the risk scores that we send out, and really, how can we make them better to do an even better job of preventing avoidable events.
I'm sure I speak for all of our listeners in wishing you well on this project, because certainly, I think that it has some incredible positive benefits for Marylanders and obviously for the scholarly community at large. Dr. Fei Han, Dr. Morgan Henderson, Dr. Leigh Goetschius, I want to thank all three of you again so much for taking the time to chat with us today about this project, and best wishes as you move forward. Hopefully, we'll be able to see the continued fruits of your labor in the near future.
Now it's time for Campus Connections, the part of the broadcast where we connect today's content to other work happening at UMBC. And as always, it's time to hand the ball off to our production assistant Jean, who's ready to take this discussion way downfield. Jean, what's today's connection?
Hi, Dr. Anson, for today's Campus Connection, we're going to look at the work of Professor Zoe McLaren, an associate professor in the School of Public Policy and Department of Economics here at UMBC. Recently, Dr. McLaren published an article entitled "Data-Driven COVID-19 policy is more than a one-size-fits-all approach," which discusses how some COVID-19 policies neglected evidence based data driven sources and instead used outdated one-size-fits-all rules, which offered a less tailored approach to mitigating the pandemic. For example, the article discusses how a data driven guideline would recommend rapid antigen tests to determine the length of an isolation period, but also acknowledged that antigen tests can be pricey and sometimes inaccessible. So symptoms or duration based guidance can be used when one doesn't have access to a test. However, the CDC's one-size-fits-all symptom based, five day isolation guideline is less effective, because according to evidence based data, COVID symptoms are poorly correlated with infectiousness. In essence, the article suggests that tailoring policies based on specific data can lead to better outcomes in managing the pandemic. Just like the work being done at the Hilltop Institute, Dr. McLaren is showing us just how valuable and important data and informed analysis can be by looking out for the health and well being of us and our communities. And that's it for today's Campus Connection. Passing the ball back to you, Dr. Anson.
And it's another touchdown for Jean, who continues to add to her most valuable production assistant campaign. Thanks as always, Jean. And thanks to you for listening to our show today. While we may not have the production value of the NFL, we like to think that you can learn a lot more on our broadcast. And of course, Retrieving the Social Sciences has no commercials. As always, keep questioning.
Retrieving the Social Sciences is a production of the UMBC Center for Social Science Scholarship. Our director is Dr. Christine Mallinson, our Associate Director is Dr. Felipe Filemeno, and our undergraduate production assistant is Jean Kim. Our theme music was composed and recorded by D'Juan Moreland. Find out more about CS3 at socialsciences@umbc.edu and make sure to follow us on Twitter, Facebook, Instagram, and YouTube, where you can find full video recordings of recent CS3 sponsored events. Until next time, keep questioning.