How data can transform healthcare
Dr. Julie Rosen is the Chief Scientist of the Leidos Health Group and Chair of the Leidos Technical Fellows, a program that celebrates technology, science, and engineering across Leidos.
In this latest episode of MindSET, Julie shares her thoughts on how data are transforming healthcare, what AI and ML mean in the healthcare space, as well as what the future looks like for the Health Group and how, as a mathematician, she’s become its Chief Scientist.
“Many people think data science is all about the mathematical algorithms used to crunch data. The true challenge lies in creating and applying the right model to the data available at a given time, and ensuring those data are relevant to investigating the mission need.”
So, if you're a future technologist or you’re just starting out in your career, Julie is someone you want to listen to.
“We want to make sure we cultivate the next generation of scientists and engineers at Leidos and in the larger technical community. On behalf of my fellow Tech Fellows, please reach out to any of us if you have questions about your technical career, or you want to talk about a particularly challenging technical problem.”
On today’s podcast:
- How a mathematician gets into health
- How data are transforming the healthcare space
- Why big data isn’t at the heart of 21st century data science
- How AI and ML are impacting data analysis
Transcript
Julie Rosen (00:00): Since that moment, whether I'm performing on an internal R&D project or in a contract with a federal or commercial health agency, I'm like a mother tiger in protecting access to and use of private information. Only with such protections can the researchers and domain experts work as a team to develop data models that accurately reflect and learn from collected data.
Bridget Bell (00:31): Welcome to MindSET, a Leidos podcast. I'm your host, Bridget Bell.
Meghan Good (00:35): And I'm your host, Meghan Good. Join us as we talk with pioneers in science, engineering and technology to understand their creative mindset and share their stories of innovation.
Meghan Good (00:51): So on this episode of MindSET, we invited one of my favorite mentors from Leidos, Julie Rosen, who is the Chief Scientist of the Leidos Health Group. She's also the chair of our Leidos Technical Fellows, a program that celebrates technology, science, and engineering across Leidos.
Bridget Bell (01:09): So we asked her about how data is really transforming health care, and I thought it was really interesting, because she got into defining what AI and ML mean in the healthcare space. And she used a great example of the stackable dolls, and how you can dig into different layers.
Meghan Good (01:29): I love when Julie talks about her background and how she got to where she is as a mathematician, but where she really wanted to go as well. I always enjoy working with her on data analysis and data science projects. Even if it's out of her field, she'll sit there, she'll tell you what's new. She'll tell you what's coming up next, and she'll help shape that. And we hear a lot of that in this conversation.
Bridget Bell (01:52): Definitely. In this conversation, she talks about what are the challenges, or what's going well, or what she sees in the future, and also brings in her experience as that Technical Fellow chair, so I hope our listeners enjoy this episode with Julie Rosen.
Bridget Bell (02:16): Welcome to MindSET. Today, we're speaking with Julie Rosen, Chief Scientist with the Leidos Health Group. Welcome, Julie.
Julie Rosen (02:23): Hello, Bridget. And thank you and Meghan for inviting me to this podcast.
Bridget Bell (02:28): We're so looking forward to the conversation. So let's start with a brief introduction about your current role and background.
Julie Rosen (02:35): Yes. I am Chief Scientist with Leidos Health Group, and in that role, I report to the group's CTO, Doug Barton, on most things dealing with exploration and analysis of healthcare data. In addition to that role in the Health Group, I also serve as chair of the Leidos Technical Fellows Community, which is sponsored by our corporate CTO, Jim Carlini. But my current role comes after quite the journey, most of which has been spent as a member of the technical community at Leidos, formerly SAIC.
Julie Rosen (03:04): During my university education, I earned a bachelor's degree in chemistry, but didn't really want to spend a career washing test tubes, which would have been my fate back in that day. Plus, I did enjoy the physical sciences. Unfortunately, my aura, which is well known by my colleagues over the years as something to step away from, led to my physics professors banning me from taking any more physics labs. And that happened because things blow up when I so much as enter the room. Thankfully, my chemistry professor didn't disown me after the $10,000 thermometer blew up. And nope, I wasn't touching it at the time. That inadvertent ability to have physical components explode has been the bane of my adult career. And yes, even computers have been known to smoke out when I enter a room. But as long as I have adult supervision from my extraordinary colleagues, the laptops seem to be stronger than my aura most of the time. And today's computer bugs must know that I can get myself into more trouble than they will.
Bridget Bell (04:07): Well, Julie, I think you've missed your calling as a tester maybe, but I have always wondered in the years that we've worked together, how does a mathematician get into health? What was your career journey like to get you there?
Julie Rosen (04:22): Well, it was long and winding. I do feel like I have reached the apex of my career, trying to help a very impactful market: health. After being banned from those physics and chemistry laboratories, I was encouraged to take up more theoretical pursuits. Thankfully, one of my senior-year physics courses involved an introduction to probability and statistical dynamics. I very much enjoyed, and showed proficiency at, inferring possible outcomes from collected measurements of phenomena, what we today would call data analysis. That led me to my master's degree in mathematics, and on to a PhD in applied mathematics. Looking back on that journey, I now realize the advantage of combining theory with the application of good theory. The former alone can lead to toy problems, and the latter alone can lead to unfortunate outcomes when accepted methods are applied in an irrelevant or infeasible manner. The combination of the two, however, has the greatest impact: good science applied to a mission need. That expanded my skills and gave me great satisfaction that I could contribute to a valued real-world challenge.
Julie Rosen (05:39): As an example, I was able to apply my mathematical left-brain little gray cells in the national security arena for over 26 years here at Leidos, doing what is called multi-source multi-hypothesis correlation and tracking, which is a very long phrase for what today is called multimodal data fusion. Back in the day, the bulk of the data came from electronic emissions, what those in the intelligence community know as ELINT, and also readings used for geolocation. Of course, over the decades during which I've been creating and applying ever more efficient probabilistic models to an ever-expanding set of collected data, the methods, the models, and the libraries of data analysis tools also have matured and expanded into what today is called data science. Unfortunately, while the national security and financial worlds have modernized their data collection, analysis, and communication of those analytic findings over the years, the healthcare market has not.
Julie Rosen (06:41): I'm sure your listeners are well aware of the challenges in finding a healthcare provider, scheduling an appointment, providing your health history over and over and over again, and then, of course, figuring out the entire medical claims process. And in this time of the COVID-19 health emergency, we know the infrastructure supporting our healthcare providers is insufficient to forecast, treat, and follow up with patients in their communities. Thankfully, we have the technology today to support the healthcare providers, and in this sense I'm referring to what the healthcare market calls IT, but it includes plenty of data science and data engineering. It's for that mission that Doug asked me to transfer from the national security business unit over to the Health Group about six years ago, and to focus on the creation and application of data analytic models, methods, and tools to improve healthcare outcomes and claims processing performance for our customers in both the federal and commercial space.
Bridget Bell (07:46): Well, it sounds like you've had quite the journey to have experience in the intel side and now being pulled over in healthcare. And I'm really interested in throughout this journey, in your last six years in healthcare, how is data really transforming the healthcare space?
Julie Rosen (08:04): Well, Bridget, thanks for this question. While many people think data science is all about the mathematical algorithms used to crunch data, the true challenge lies in creating and applying the right model to the data available at a given time, and ensuring that those data are relevant to investigating the mission need. People typically think big data is at the heart of 21st century data science, but nope. The practiced data scientist knows that we must start with the mission need, which might be a technical or a cultural gap, and determine if the relevant data are available, or, if they're not available today, whether they are collectible.
Julie Rosen (08:43): As an example of a nontechnical need, consider weather forecasts. One might be able to develop a first-class, very precise forecasting algorithm for the weather patterns over the next few days. But if no one wants to carry an umbrella, then they'll still get wet. Similarly, in healthcare, if the findings of highly accurate risk models are not communicated in a timely and culturally aware fashion, then public health guidance will not reach its potential to improve healthcare. So as a second example, consider the process of being admitted to a hospital, and a person, a patient, you, your mother, your child, that patient transfers from one unit to another during the hospital stay. To avoid increasing the risk of, say, a hospital-acquired infection or a dangerous fall, best practices should be followed. Those practices assume that the patient's status is measured throughout the hospital stay, and if those data can be made available to the on-call provider at the right time, for the right patient and the right therapy, then the practitioner is better informed about point-of-care decisions.
Julie Rosen (09:54): So yes, technologies are promising a gob of data from emerging devices at the bedside and from wearables, but toward what end? In healthcare, we have a diversity of types of data, ranging from text-based notes on patient history and current complaints, to the images from radiology, and on to the numbers from lab tests and medical devices. These data typically are captured in the patient's electronic health record, or EHR. But today, healthcare providers use different EHR systems within their own hospital system, when communicating across hospitals to other systems, and, of course, when reporting back to state and federal regulatory agencies. All of these miscommunications result in the prime directive challenge: do we have relevant, patient-specific, timely, and trusted data?
Julie Rosen (10:52): The applications of data analytics also are wide ranging. We can think of everything from the early detection of suicide ideation using those precious clinical and patient history notes, to early detection of breast cancer and other genetically linked diseases using imagery and genomic and proteomic data, and on to the detection of possible adverse health events, such as those falls and hospital-acquired infections I mentioned earlier, using the continuously tracked vitals, or the numbers that come from those devices. Imagery and signal processing models are more mature than today's very challenging text-based models, but all data can benefit when employed in a blend of artificial intelligence and the traditional rule-based and statistical approaches. Think about the multiple modalities and patient-specific tailoring involved in creating effective robotic limbs. An image sensor must know the distance and angle from the limb to the step in the staircase on the floor. An electrical sensor embedded in the brain must know which muscles to stimulate, or which brain cells to activate. Traditional statistical models, which rely on population averages, would not have made such healthcare advances thinkable, much less possible today.
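To make that blend concrete, here is a minimal sketch of pairing a fixed, rule-based check with a simple patient-specific statistical check on a stream of vitals. The thresholds, function name, and heart-rate readings are hypothetical and purely illustrative, not clinical guidance or an actual Leidos model; the only assumptions are Python with NumPy.

```python
# Minimal sketch: blend a rule-based check with a statistical check on tracked vitals.
# All thresholds and readings are hypothetical, for illustration only.
import numpy as np

def flag_heart_rate(readings, rule_max=120, z_threshold=3.0, baseline_window=20):
    """Return indices of readings that break a fixed rule or deviate sharply
    from this patient's own recent baseline."""
    readings = np.asarray(readings, dtype=float)
    flags = []
    for i, value in enumerate(readings):
        # Rule-based check: a hard upper limit, as a subject matter expert might set.
        rule_hit = value > rule_max

        # Statistical check: z-score against the patient's rolling baseline.
        baseline = readings[max(0, i - baseline_window):i]
        stat_hit = False
        if len(baseline) >= 5:
            mean, std = baseline.mean(), baseline.std()
            stat_hit = std > 0 and abs(value - mean) / std > z_threshold

        if rule_hit or stat_hit:
            flags.append(i)
    return flags

# Hypothetical heart-rate stream with one abrupt jump.
hr = [72, 74, 73, 75, 71, 70, 74, 76, 73, 72, 74, 73, 118, 72, 75]
print(flag_heart_rate(hr))  # flags the abrupt jump to 118
```

The fixed rule catches values that would be concerning in any patient, while the statistical check catches values that are abnormal for this particular patient, which is the kind of patient-specific tailoring described above.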
Meghan Good (12:16): Hmm. So you brought it up, as far as artificial intelligence goes, and I feel like we can't have any kind of question or any discussion of data analysis these days and not ask. How do you see AI and ML impacting these areas of data analysis?
Julie Rosen (12:32): Well, Meghan, that's great. And you're right. Firstly, I'm going to play the mathematician's card. Our first principle is always definitions, so let me start with those. The concept of artificial intelligence, or AI, is as old as computing, and its definition has changed over time. In terms of modern technology, AI is the broad domain of digital devices programmed to mimic human behaviors and intelligence. In the past decade or so, computing technology has advanced enough to allow for greater computational investigation into the gobs of data at our disposal, which is mostly what we mean today when we refer to AI. Machine learning, or ML, is a subset of AI, but there are other flavors of AI as well, including rules-based expert systems and knowledge graphs. And it's important not to conflate AI with ML.
Julie Rosen (13:29): Methods of AI can be thought of like a little Russian doll set. AI is the outside doll, then ML is the doll inside, and then deep learning, or DL, is even further inside that babushka doll. A true data scientist should know the full range of models and methods to consider for a given purpose and available data. Think of ML as the reasoning engine that consumes data, applies algorithmic logic to recognize patterns in the data, and then sends the analytic results to a human or another computing component. General ML models work with a layer of observed data, what we all call the input, and then they go to the layer of the analytic output, such as forecasts and recommended matches, like those you see when you shop online. But in between those two layers is a hidden, or what we mathematicians call a latent, layer, where the attributes are connected to each other and to the input and output layers.
Julie Rosen (14:34): If the data are appropriately curated for like comparisons, apples to apples, then ML techniques have the ability to learn new patterns by operating within a few initial rules that probably came from subject matter experts. But those initial rules then will mature algorithmically as data are processed over time, and new rules will be learned. The beauty of ML is that it allows us to characterize situational reasoning without the need for a full complement of programmed rules and instructions, which are dangerously brittle in real-time decision making with uncertain or missing data. And one more item to complete these definitions. I mentioned DL, or deep learning. DL models are applicable when you're trying to forecast outcomes or recognize patterns within the data, where each measurement associates with many attributes, which we also call features. In these cases, we need more advanced models with deeper and deeper, more and more layers to learn all those many features among all those many patterns in those data sets.
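As a concrete picture of that input, hidden (latent), and output layering, here is a minimal sketch using scikit-learn's MLPClassifier. The tiny dataset and the "fall risk" labels are invented purely to show the mechanics; a real clinical model would need far more data, features, and validation.

```python
# Minimal sketch of an input -> hidden (latent) -> output model with scikit-learn.
# The data and labels below are hypothetical, for illustration only.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Input layer: two observed attributes per patient, e.g. [age in decades, mobility score].
X = np.array([[8, 2], [7, 3], [9, 1], [3, 9], [4, 8], [2, 9], [8, 3], [3, 8]])
# Output layer: the analytic result we want, here 1 = elevated fall risk (hypothetical).
y = np.array([1, 1, 1, 0, 0, 0, 1, 0])

# One hidden (latent) layer of 4 units sits between input and output;
# its learned weights encode the patterns connecting the two.
model = MLPClassifier(hidden_layer_sizes=(4,), solver="lbfgs",
                      max_iter=2000, random_state=0)
model.fit(X, y)

print(model.predict([[9, 2], [2, 10]]))  # likely [1 0]: higher risk, lower risk
```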
Julie Rosen (15:44): When you're working with unstructured text and imagery, for example, these features become deeply interrelated, and the multiple layers of the deep network can start to resemble a big, hairy fur ball of connections. Now, you've got to start digging deeper into the possible connections to determine what patterns you can infer from them. Such digging can get computationally intense, which is why a specialized kind of chip called a GPU is in demand to train new deep learning models.
Julie Rosen (16:14): And I just want to make one counterintuitive remark here for folks. Everybody's into big data, and they think machine learning and DL are very important because you've got to process all of those data points. Well, we at Leidos say it's not so much big data as big workflow. We don't get a whole lot of data. At least, we don't get the very large volumes of data that people normally do when they're in the intelligence community. What we have are data in which we have very weak confidence, or that are sometimes missing and sparse. And so in that case, these DL models and ML models are very deep because we've got to infer and impute from the weak and shallow data, but we don't have a lot of it. And so that's why it takes a little longer for health problems to be resolved with machine learning.
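One way to picture that "infer and impute" step for sparse health data is a simple nearest-neighbor imputation. This is only a sketch with an invented vitals table, assuming scikit-learn's KNNImputer; it is not a Leidos pipeline.

```python
# Minimal sketch of imputing missing values in a sparse (hypothetical) vitals table.
import numpy as np
from sklearn.impute import KNNImputer

# Columns: [heart rate, systolic BP, temperature F]; np.nan marks missing readings.
vitals = np.array([
    [72, 120, 98.6],
    [75, np.nan, 98.4],   # missing blood pressure
    [118, 145, np.nan],   # missing temperature
    [70, 118, 98.7],
    [74, 122, 98.5],
])

# Each gap is filled from the two most similar records.
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(vitals))
```

Filling gaps from similar records is workable when data are sparse but correlated, though imputed values should always be flagged as inferred rather than measured.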
Meghan Good (17:09): I really like the comparison of the stacking doll, and how you dig in from AI to ML to DL, and the understanding that it's not all about big data, because a lot of our conversations on this podcast so far have been focused on the importance of data and how it comes into play with software factories and with AI/ML. But I think your point stands that it's not always big data; it's that big workflow and going deep into the data. So I want to continue on that AI/ML thread and ask you, can you explain how AI/ML analysis of healthcare data could help the healthcare mission?
Julie Rosen (17:52): Sure. I've mentioned several cases, applications, in the healthcare domain: using imagery to detect cancerous cells, using bedside device data to forecast an adverse reaction, using clinical notes to detect possible suicide risk. Add to that the many genomics pattern recognition projects that forecast genetically linked diseases. I for one am collaborating with Leidos colleagues in support of the Veterans Administration's mission to process veterans' medical claims, in order to identify the primary medical specialty most aligned with the clinician's notes on the pages of the patient's history and current complaints. Our next step in this R&D project is to create deeper data models that would identify patterns within those notes to detect comorbidities and link those comorbidities with what are called the ICD-10 codes, or indexes, so that patients' records might be easily shared with other treating clinicians in a unified fashion.
Julie Rosen (18:58): For that kind of a problem, natural language processing, or NLP, is a wonderful blending of the ancient study of linguistics with 21st century advances in computing, apropos of your podcast topic. NLP is the automated investigation of text datasets. This field of study goes back several decades, starting in the post-World War II years with the automated translation of Russian sentences into English; for that example, a rule-based approach was effective. But now we're in the 21st century, when jargon, dialects, acronyms, even misspellings abound in the text we rely on. As such, rules, and in many cases traditional statistics, are insufficient. In fact, as language evolves rapidly, we require models that can learn new patterns as the topics and tone in the text data evolve. And that's where AI, especially powerful machine-learning data models, comes in. To complete the set of definitions I started earlier, NLP is the great beneficiary of two types of ML algorithms, supervised learning and unsupervised learning.
Julie Rosen (20:12): And right in the middle of that is a cross between the two called semi-supervised learning. Supervised learning is when the domain specialists, the humans, label the text with what we call the ground truth, against which the algorithm's findings are compared for accuracy. While this approach requires significant time and effort from the domain experts, this is where data models begin their life, and it results in high accuracy when the trained and tuned algorithm is applied to incoming unlabeled data. But as increasingly available gobs of data are released, we can't expect the subject matter experts to label all of those data. As such, researchers try for semi- or completely unsupervised learning to perform elements of the larger NLP process. For example, we might use historical patterns of syntax in one domain and apply them to a different domain, which is called transfer learning. I mentioned my ongoing work in auto-classifying the specialty associated with clinicians' notes on a page in a veteran's medical claim. The accuracy and throughput performance found in our IRADs for the last two rounds of testing against ground truth are showing great promise.
Julie Rosen (21:30): And while the full set of classifiers is not yet developed in a production environment, we have already deployed two classifiers, for audiology and psychological specialties, and we are getting closer and closer each time we come back and look at the data.
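In the spirit of that specialty-routing work, here is a minimal sketch of supervised text classification with TF-IDF features and logistic regression in scikit-learn. The handful of labeled note snippets are invented, and this is in no way the actual VA classifier; real clinical NLP requires far more data, de-identification, and domain expert review.

```python
# Minimal sketch of supervised learning for note classification.
# The notes and labels below are invented, for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "patient reports ringing in ears and gradual hearing loss",
    "hearing aid evaluation requested after failed audiometry screen",
    "veteran describes persistent nightmares and heightened anxiety",
    "follow-up for depressed mood and trouble sleeping since deployment",
]
specialties = ["audiology", "audiology", "psychology", "psychology"]  # expert-labeled ground truth

# Supervised learning: fit on labeled notes, then classify new, unlabeled text.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(notes, specialties)

print(clf.predict(["gradual hearing loss and ringing in ears after noise exposure"]))
# likely ['audiology']
```

The expert labels play the role of the ground truth described above; once a model like this is trained and validated against that ground truth, it can route new, unlabeled notes to the most likely specialty.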
Julie Rosen (21:46): A similar approach, albeit much more detailed in scope, is at the core of the VA's medical adjudication program, another R&D project funded in the Health Group. The research is still a long way off, but envision how much more quickly the VA might approve a veteran's benefit claim if a proven, accurate, and trusted automated system could identify the medical specialty or diagnosis, perhaps a death benefit need, an education need, and so much more.
Meghan Good (22:18): So given all of those definitions, and I agree with your mathematician's perspective that it's really important to know exactly what we're talking about, I'm wondering, over the last six years that you've spent looking at this health data, what have you been most surprised by among those different categories, and which really seems promising going forward? Your last answer pointed to NLP, but across the whole spectrum that you presented, what do you think is really going to show promise going forward?
Julie Rosen (22:51): Well, if we're talking in the larger scientific community, no question that genomics and proteomics and all of the omics out there have the greatest potential to improve targeted medical therapy. And in that field of genomics and proteomics, we do have a very, very impactful set of folks working up at the Frederick office. We not only have the Frederick National Labs that everybody's familiar with, but we have what we call the Life Sciences Division in the Health Group. And in that group, we are looking at peptides, and those peptides now are being used for targeting, if you will. They're being used as biological missiles, and they're going after these viruses.
Julie Rosen (23:34): They've made great strides over the last 10 years going after malaria, and we really are contributing to tamping that down. We're using that same kind of protein material to see if we can go after other viruses, importantly, orphaned viruses. The genomics that one takes from those proteins and from a patient now might be analyzed to see if a given therapy can be targeted with high precision to help heal the patient without harming the patient. That said, the bulk of what we do in the health group is not genomics. We do more categorical comparisons. And in that case, the text-based modeling, I believe, has the greatest potential to heal patients, improve claims processing and help improve the health community at large.
Meghan Good (24:32): As one of our tech fellows, I know that a passion of yours is cultivating the next generation of technologists, or techies as you call them. So I want to know, what guidance do you have for future technologists and those just starting out in their careers?
Julie Rosen (24:49): Thanks, Meghan. I appreciate you extending the question out to our colleagues, and good on you for having been inducted into the Technical Fellows two years ago. As for me, I'll start with my strength and depth, which continues to provide a very promising career here at Leidos. Today, the term data scientist refers to researchers who create data models and algorithmic methods to analyze big, highly complex datasets in order to find patterns. Often, these patterns are not yet prescribed, and most often they're too difficult to detect with traditional rules and static statistical methods. These professionals typically operate at the basic R&D end of the maturity spectrum, proving the accuracy and efficacy of an AI model or method. At the opposite end of the technical maturity curve are data analysts, who employ these proven models, methods, and tools, which could be third-party tools, not just ones developed by Leidos, to investigate mission-specific data and support decision-making.
Julie Rosen (25:59): Data analysts' mission is to impact the bottom line. As such, it is very important these professionals know their consumers' domain, the all-important mission, and what questions are addressable with exploration of the data. If we thought of these career paths as a Venn diagram, the Leidos data engineer overlaps with the data scientist, to get the models and methods matured to scale and deployable in an end-to-end pipeline, and with the data analyst, to make the pipeline usable and maintainable. Data engineers have similar skills to generalized systems engineers and solution architects. They start off with specs from a customer and work with third-party vendors to integrate commodity tools with custom-developed and optimized models and methods. And quite often, Leidos folks will be the ones to go in and optimize those models and methods. Data engineers have the additional charge of understanding the impacts of design options to ingest and curate these big, dynamic, fast-arriving, diverse, unclean, mission-specific data sets.
Julie Rosen (27:12): The data engineer is charged with making the data scientists' accuracy-proven models run faster, at the scale of the incoming data, and output the analytic findings through a visualization, explanation, and/or communications component. Now, let's expand that lens to include the many fields of study represented by the Tech Fellows. I'm sure your listeners know that data science is a new term for things we've been doing for decades, but Leidos, both the former SAIC and the IS&GS elements, has been doing science for a long time, in physics, in biology, in chemistry, in engineering, in specialty engineering, and all of these align with the company's technical core competencies, or TCCs. In my belief, the 21st century can be your oyster if you stay current in formal education, and that includes both university-level degrees and tech-specific certifications.
Julie Rosen (28:20): Contrary to the world I entered after graduation, your journey doesn't have to be straight along a given field of study. In fact, 21st century businesses like Leidos want to hire and retain intellectually curious technical staff members.
Meghan Good (28:35): Yep.
Julie Rosen (28:36): A foundation of understanding of science and engineering is critical to remaining on top of your fields. But since the half-life of a given technology may be as long as a little teeny tiny fly's lifespan, your interest in and your ability to self-teach emerging fields of practice is important. This ability to expand your knowledge and hands-on experience, with a full appreciation of the fundamentals, leads to readily understandable articulation of a Leidos concept and differentiated technical capability, and that is where the Tech Fellows' value comes to the business. I hope your listeners are aware of our four Cs charter, but if not, I'll tell you what they are. The first C is the creation and vetting of new concepts. That gets followed by the communication of those concepts, and the larger Leidos portfolio, to potential customers, potential teammates, and vendors.
Julie Rosen (29:35): Then, we go through a collaboration process across the company, with teammates and academics, to mature the concept through the prototype stage and into production. And then, we want to make sure we cultivate the next generation of scientists and engineers at Leidos and in the larger technical community. On behalf of my fellow Tech Fellows, please reach out to any of us if you have questions about your technical career, or if you want to chalk-talk a particularly challenging technical problem. Check out our technical core competencies on leidos.com and we'll put you in contact with a Fellow.
Meghan Good (30:14): I really think our technical core competencies and our Tech Fellows program are among the things that set Leidos apart, and that could be a whole other episode in itself. But as we wrap up our conversation today, I'm curious, what final advice do you have for our listeners?
Julie Rosen (30:32): Sure. Again, Meghan and Bridget, I thank you for inviting me to talk to your podcast audience. I do think there is one final and yet crucial point I want to make about healthcare data and their appropriate use to help healthcare policymakers and providers. It's all about trust, which relies on these things we call personally identifiable information, PII, and protected health information, PHI. I've spoken about the relevant and culturally aware communication of the findings of data analysis. A tremendously important part of that communication is the trust dimension. Yes, we must confirm we use the right data in a clinically relevant investigation, but, and this is a really big but, the ramifications of access to private information are a huge challenge for gaining the trust of the consumer, as well as trust from the data provider. And those data providers, remember, in the case of the VA, are the veterans and their family members.
Julie Rosen (31:37): But the PHI trust lens is at the heart of the current planning on COVID-19 contact tracing. In my work with VA data, I was exposed to the concerns of the veteran population, who are experiencing a never-before-seen suicide rate. New research is underway to develop models to detect increased risk over time at the individual level. One of the most impactful statements I've ever heard in my professional career was from the Health Group's former chief information security officer, who said, "If we data scientists and data engineers don't treat the data and the computing environments with absolute caution, then military members and veterans will not provide their data." And then health concerns, including suicidal ideation parameters, could not be detected, treated, or mitigated. Since that moment, whether I'm performing on an internal R&D project, or in a contract with a federal or commercial health agency, I'm like a mother tiger in protecting access to and use of private information.
Julie Rosen (32:45): Only with such protections can the researchers and domain experts work as a team to develop data models that accurately reflect and learn from collected data. Remember, analysis models, while potentially powerful at early detection and discovery of patterns, require many more data points, and as many features as those data points can provide, than the traditional statistical models do.
Julie Rosen (33:10): Retaining those PHI-specific parameters has inherent risks if the source data, the aligned data, the personally specific analyzed data, and the analytic findings are communicated outside of the patient, the clinician, and other approved members. I won't go into detail here, but I do believe that proper handling of data and computing-system-enabled safeguards can reduce, if not eliminate, the risk of compromise. In parallel, encoding of the individual's identity addresses inadvertent authorizations and disclosures.
Julie Rosen (33:49): One final emerging field I'll mention regarding this encoding is the field of research called homomorphic encryption, which has the potential to enable the analysis of encrypted data, generating encrypted analytic findings, and that field of study is showing promise for future implementations at scale. Again, thank you so much for having me on your podcast. I hope you have a whole lot more.
Meghan Good (34:18): And thank you, Julie, and thanks to our audience for listening to MindSET. If you enjoyed this episode, please share with your colleagues and visit Leidos.com/MindSET.