Data Warehouse Offers Priceless Insights, University-wide Access

Could friendship be making Americans fat?

That was the question Harvard researchers tried to answer in 2007 as they reviewed the data trails of some 12,000 people. Their conclusion: Having an obese friend increases the risk of a person being obese by 57 percent.

Ten years later, Daniel Abrams, engineering sciences and applied mathematics, is exploring how American obesity might have changed in a cohort 100 times larger. The million-person data pool is thought to be the most expansive look yet at the effect a person’s social world has on their weight. The research project is one made possible by an exceptional Core Facility — Northwestern Medicine’s renowned Enterprise Data Warehouse (NMEDW).

The Office for Research oversees 45 Cores, with many more, like the NMEDW, housed within specific schools. The facilities play a pivotal role in Northwestern’s research infrastructure by providing access to ultramodern instrumentation that is often too expensive for any single researcher to purchase.

“I’ve had a longstanding interest in human social behavior models, but I likely would not have pursued this project if I hadn’t previously been exposed to the NMEDW,” says Abrams, who has published more than two dozen papers on topics ranging from the prevalence of smoking to the movement caused when people walk on bridges. “At the heart of my research is mathematical modeling. For this ongoing obesity project, the NMEDW provided the vast amount of data I required to statistically test new models.”

The NMEDW, a joint effort between the Feinberg School of Medicine and Northwestern Memorial HealthCare, consists of electronic health records (EHRs) from more than 6 million people. Its 100 billion or so data points make up a 65-terabyte data warehouse that spans the mid-1990s through yesterday; new information is added nightly with a 24-hour lag.

“The NMEDW is already one of Northwestern’s greatest core assets, and an ongoing multimillion-dollar modernization further advances the analytical capabilities of the institution,” says Shakeeb Akhter, NMEDW director, of upgrades scheduled for completion by the end of 2017. He notes that the modern EDW will support advanced use cases, such as predictive modeling, text analytics, and machine learning. “Along with modernizing the analytics platform, the EDW team is also implementing a self-service business intelligence tool to democratize usage of data across Northwestern. In addition, the EDW Reporting Portal is undergoing an end-to-end redesign to provide a seamless user experience when interacting with data.”

Although the NMEDW is housed within the Northwestern University Clinical and Translational Sciences Institute (NUCATS) at Feinberg, it is a powerful resource whose data trove is accessible to any Northwestern investigator — for as little as $75 an hour. 

“It’s not just clinical data (physician notes, diagnosis codes, test results), either,” says Akhter. “There are financial and vast amounts of text data being leveraged by research teams within the medical school, and beyond.”

Current projects include studying patterns and outcomes of Medicare patients (Kellogg School of Management) and a proposal to research the impact of healthcare on human capital (Pritzker School of Law).

“The NMEDW could easily be tapped by social scientists such as anthropologists, psychologists, and sociologists,” says Kathleen Murphy, Institutional Review Board manager. “The sheer number of health records also makes it a remarkable tool for recruiting research participants.”

There is no upfront cost to use the NMEDW, and a web-based feasibility tool known as I2B2 allows faculty, students, and staff to develop and run simple queries against a subset of data. The goal is to let users quickly determine the feasibility of conducting a specific study.

If investigators opt not to use the self-service tool, or have already completed a cohort search, they can send a request to the NMEDW support team for further assistance. An analyst will review the request within 24 to 48 hours and set up a one-on-one meeting with the user before establishing a data abstract that details the scope of the work.

The NMEDW research analytics group — five analysts and two data architects — helps investigators navigate and extract the vast amounts of data that comprise each electronic medical record within the data warehouse.

“Once we have identified what data is of interest to the investigator, we are then able to convert it into whatever format best suits their analytical needs,” says Daniel Schneider, manager of research analytics at NUCATS.

Because EHRs within the NMEDW come from various electronic medical record systems (e.g., Epic, Cerner, and Athena), querying data is complex. As part of a new initiative called Project One, the EDW is developing integrated data structures that will consolidate medical data and remove redundancies. This will make querying EDW data for research or healthcare operational purposes more efficient.

“With these modernizations, the NMEDW will serve as a one-stop shop for data,” says Akhter. “And we will continue to be among the world’s most comprehensive and integrated repositories dedicated to facilitating research, clinical quality, healthcare operations, and medical education.”

By Roger Anderson
March 13, 2017