Stitching the Pieces Together

What $100 Million, 600 Terabytes, and a Lot of Careful Thought Will Do for Pittsburgh
Summer 2014

Some recent headline-making news:

  • In November 2013, the FDA sent a letter asking 23andMe to stop assessing health risks for the genes it decodes.
  • In March, Illumina (a biotechnology company) began shipping a machine that can sequence a human genome for less than $1,000.
  • The health exchanges mandated in the Affordable Care Act survived an infamously bumpy launch.

These events will play into or add texture to the issues surrounding the rollout of UPMC’s five-year, $100 million plan to bring personalized medicine into its clinics, announced in fall 2012. That effort, which will use patient data and analytics to develop and optimize treatments, will also guide and inform research at the University and beyond. As the undertaking entered its second year, Pitt Med put together a floating roundtable with some of the key people shaping this historic effort: Steven Shapiro, an MD who is chief medical and scientific officer for UPMC and professor of medicine at Pitt; Jeremy Berg, a PhD who is director of Pitt’s Institute for Personalized Medicine, associate senior vice chancellor for science strategy and planning, health sciences, and Pittsburgh Foundation Professor of Personalized Medicine and of computational and systems biology; Adrian Lee, who directs the Women’s Cancer Research Center (for the University of Pittsburgh Cancer Institute and the Magee-Womens Research Institute) and is a PhD professor of pharmacology and chemical biology; Lisa Parker, director of Pitt’s Master of Arts in Bioethics program and a PhD associate professor of human genetics and of behavioral and community health sciences at the Graduate School of Public Health; and Lisa Khorey, who recently stepped down as UPMC’s vice president of enterprise systems and data management. Their comments are condensed here.


We’re more than a year into this five-year project. What’s happened?

Khorey: The bulk of the work we’ve been doing in the last year is system implementation. We’re building an information management factory. [There’s] a lot of digital data. It’s 29 applications from three different vendors, 47 different servers, two hardware appliances (one for loading multimillions of records, one for real-time data integration). It took 11 months, but that’s all done. Check the box.

Berg: Adrian [Lee] has used the term “data graveyards,” because all that data takes up a lot of disk space, and nobody can figure out what to do with it. You have to decide which genomic data you actually collect—you don’t want to collect it first and then sort it all out. The biggest things that I’ve been focusing on in the last year are data management and manipulation—and usability.

Shapiro: The data is starting to get moved, and we’ve done some small use cases. Our high-volume academic cardiologists were using thrombectomy catheters, for treating acute heart attacks, pretty routinely. Then a big New England Journal of Medicine article came out saying there’s no evidence that these catheters work. They looked at only one month’s worth of data. We didn’t see anything in one month either. But looking at the first three months of data at the end of last year, we found that mortality rates dropped from 15 percent to 10 percent, and the stenosis rate went from 15 percent to 5 percent. It’s six months of data we want, but we’re getting close to saying, Hey, we should use this thing. We think the NEJM study has it wrong. We think the analytics will give us the right answer.


What are the practical challenges of having all this data?

Berg: It’s hard to manage. There are already more than a million files—six or seven hundred terabytes of data. [For comparison, the Library of Congress’ Web archives take up about 525 terabytes.] It’s a hugely complicated informatics and computer science challenge to store the data and track which version is which. We’re involved with a project [a partnership between Pitt, UPMC, and the Pittsburgh Supercomputing Center], … what’s called the Pittsburgh Genome Resource Repository, to get all the data organized and manageable. We want to get to a point where it’s sort of like writing a Google request and getting the answer quickly, rather than having a year-long project to get the answer and then find out that you need to ask a different question.

Lee: Your genome has 3.2 billion base pairs, so in sequencing, an error rate of even 0.1 percent is a problem. Data governance, data control, data versioning become unbelievably important. Also, we’re changing the way we share data and the way we work. My lab, which is a biology lab, now spends a lot of time working on the high performance computer. We need to increase our storage. Data transfer is a problem because we can’t move these large data sets around. Our network wasn’t built for that.
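
[To put Lee’s figure in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough illustration, not a model of any real sequencing pipeline: it assumes a single pass over the genome with a uniform, independent per-base error rate, and the function name expected_miscalls is hypothetical. Only the 3.2 billion base pairs and the 0.1 percent rate come from the interview; the two smaller rates are added for comparison.]

```python
from fractions import Fraction

# Genome size quoted by Lee in the interview.
GENOME_BASE_PAIRS = 3_200_000_000


def expected_miscalls(error_rate_percent: str,
                      genome_length: int = GENOME_BASE_PAIRS) -> int:
    """Expected number of wrongly read bases for a per-base error rate.

    The rate is passed as a decimal string in percent (e.g. "0.1") so that
    Fraction can represent it exactly and avoid floating-point rounding.
    Assumption: every base is read once with the same, independent error rate.
    """
    rate = Fraction(error_rate_percent) / 100
    return round(rate * genome_length)


# 0.1 percent is the rate Lee mentions; the smaller rates are illustrative.
for pct in ("0.1", "0.01", "0.001"):
    print(f"{pct}% error rate -> about {expected_miscalls(pct):,} miscalled bases")
```

[Under these assumptions, a 0.1 percent error rate corresponds to roughly 3.2 million miscalled bases in a single genome, which is why data governance and versioning matter so much.]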

Khorey: In terms of loading data, we started with the last two years’ worth. There are 250 million lab tests and about 38 million conditions [in our records]. So far we have incorporated data from 15 clinical systems. We need to make some decisions around what is the most valuable data versus our impulse to load all the data. The hungry man is starving and wants everything at the buffet, but that’s wasteful.


What’s next for UPMC?

Shapiro: In the thrombectomy catheter example, if the data show what we think [they] will, then the question is, It costs $900; do we really need it for everyone? It looks like if you have it, your hospital stay is two days shorter; so it’s already cost effective. Then we can look and see if everyone needs it, or if it only works for some people. That’s the concept. [In late winter, we were just] preparing for the data. We [started] moving the really large data in April.

Berg: There are close to 100 drugs for which there’s reasonably good information about the genetic variations that are important in terms of how a patient will react to the drug. Different people respond to the same drug in different ways, in part due to genetic background. The question is, How to get that into practice? If the genetic information were already in the patient’s chart, physicians would absolutely use it in prescribing drugs. If they have to order an additional test, that becomes a much different proposition. The challenge for the whole field is how to collect the relevant information for people who are likely to get specific drugs prescribed.

In the planning stages is a project being driven by David Whitcomb [MD/PhD chief of gastroenterology, hepatology, and nutrition; see “Trying on Personalized Medicine”]. He’s working on a research study that would use genetic and other information to try to more accurately assess the risk of disease progression for patients who come in with pancreatitis. Many patients have one episode and never have another after that. Other patients develop recurring episodes and then chronic pancreatitis, with irreversible tissue damage. We’re looking at kidney disease and other areas with different disease conditions that have a commonality to see if new tests may be relevant in a clinical setting.

Khorey: Our next six months will be spent refining operational processes involved in data movements, data interpretations, and analytics. We will prepare additional data source systems in collaboration with targeted user groups across UPMC, such as clinics, the health plan, and finance.


What are the ethical issues you’ve identified?

Shapiro: Jeremy and Lisa [Parker] are working on [various issues], from Is it opt in or opt out when we sign patients up for genomic sequencing? to If we do sequencing, what needs to happen if we find something? [Like, what if the doctor learns that the patient has a risk for a disorder that wasn’t the purpose of the visit?]

There are many ethical issues around getting consent, incidental findings, and preparing patients for this.

Parker: There really are two issues with regard to privacy: hackers and risk of exposure. If the data is exposed, what’s the risk? The Genetic Information Nondiscrimination Act, passed in 2008 at the federal level, means health insurers cannot refuse to cover you because of a preexisting condition. And you can’t be charged a higher premium in light of having a genetically based increased risk. Employers are not allowed to use genetic risk information. Life [insurers], long-term [care] insurance, and disability [insurers] could still make use of the information; they want to know the appropriate actuarial pool to put you into.

Berg: If you collect genetic information to test for one condition, for some people you will see things that don’t have anything to do with why you originally were doing the test. [Yet the genetic information the doctor uncovers may have] health consequences some people might want to know about.

The American College of Medical Genetics and Genomics issued a report [stating that doctors have] an ethical obligation to tell people [such information]. That goes against the grain of a lot of ethicists who place more emphasis on patient autonomy.

That’s the tip of a big iceberg. I remember when the gene for Huntington’s disease was identified. If you’ve got the bad form of it, you’re very likely to develop Huntington’s, an awful disease. I couldn’t imagine not wanting to know that. The reality is that something like 80 percent of people don’t want to know.

What about the risk of a class of genomic fortune tellers emerging?

Parker: You can’t read genomes like tea leaves. Or maybe you can; tea leaves are not particularly reliable. The biggest risk of harm is that people themselves will, in some sense, misuse their own information. Or they’ll get their genome on a chip at age 30, find out their risk, and not realize that this is an evolving science. That genome needs to be reinterpreted again at 35 and 40, because we’re [continually] learning new things.

Otherwise, it’s not obvious what we’d want to do with someone’s genetic information. Maybe you could embarrass me, especially if I’m a celebrity or running for public office. People [might ask questions like], Do we want a president who might become diabetic while in office? But that’s not so much [an issue of] the genetic information as it is the social structures.


Are we seeing personalized medicine yet?

Berg: You know, the first thing I did when I got here was try to find a better term than “personalized medicine.” If I were a clinician I would find it insulting: “Wow, we’re treating patients as individuals; wish we’d have thought of that 5,000 years ago.” “Precision medicine” is the other term, but it’s still a little bit presumptuous at this point.

It’s in its infancy. There are some things for which the genetics are relatively clear and the research is relatively strong and the information you get from the genetic test is actually pretty deterministic. For other things, there are literally hundreds of different genes that contribute to disease risks. And how they interact with each other is not really well understood. Medicine is always going to be stuck with these unpleasant probabilistic outcomes. Our percentages will be better, but there will still be uncertainty.

Lee: Pretty much every tumor is different. We have new tests that utilize these new technologies to screen for mutations so we can give targeted therapies specifically for genetic manipulation in that tumor. Unfortunately, cancer’s pretty clever and most of the time finds its way around what we do to it. We have a long way to go.


Will it cut costs?

Shapiro: I would like to be able to tell you how we’ll determine who’s at risk for readmission, what makes someone need to come to the hospital so frequently. It’s probably too early. [For example], we’d love to know what characteristics of breast cancer tell us [which] 25 percent of patients … will do badly with minimal treatment.


How will patients respond to genomic/personalized medicine?

Berg: That’s another area of research: Does genomic knowledge motivate people or not motivate people? If you find out that you have susceptibility to a disease, do you lose 20 pounds? Or do you say, Oh well, it’s in the genes. I’m going to have a donut.

Lee: In our high-risk clinic, people often refuse genetic testing, and there’s a lot to be said for that. It will take a while for it to become routine, as people become comfortable [with the idea] and insurance figures out what it wants to pay for. It’s causing a revolution in diagnostics and therapies, and the system needs to adapt to that.


What will it mean for doctors?

Shapiro: Every day we’re finding more and more genetic and genomic markers for tumors. At some point we will use them clinically. When we do that, the average oncologist [won’t] know what to do with this genetic information. The analytics themselves need to give them actionable information. The future is to allow the analytics to be a guide for clinicians as opposed to [telling them], Hey, go read these articles. It needs to be bedside.

Lee: We have huge capacity to create data but limited ability to turn the data into knowledge. You used to have your cholesterol and your height and weight [for a physician to assess]. Now, you’ve got 3.2 billion base pairs. If you walk in with your genome, what does the physician do with that? What about when people will come in and say, Oh, I had it sequenced somewhere. Do we accept that? Do we resequence? The current idea is it’s so expensive to store this stuff, and so cheap to [sequence], we’d just redo it.


What does the future hold?

Berg: These things are going to take time. We have to be aggressively patient. Personalized medicine is going to be a major revolution in health care and society in general.

Shapiro: In the last five years we’ve doubled the amount of things we knew in medicine. The textbooks are large enough and medical school is long enough. In some ways, it’ll be easier to teach students; there won’t be as much need to memorize. As we get more information, I hope we’ll simplify the pathways we’re teaching. Research is starting to change. We’re moving from an era where we have a specific hypothesis, a very reductionist approach of looking at a candidate gene and seeing what it does, to one where we say, Let’s generate hypotheses and let the data look at everything so we can come up with better things to ask.

Lee: The human genome now takes roughly a day and $1,000 to sequence. When people received the microscope and could see bacteria, that transformed medicine. The sequencer is like that microscope for the genome.

Shapiro: This is a long-term process. We are doing the hard work now without the glory. … We don’t have a lot of results now, but they’re coming soon. With all the challenges in health care, we’re seeing this as a big investment. Everyone needs to make it.