Article | January 30, 2017

Why A New Approach To Data Aggregation Could Mean Big Gains For Study Sponsors And CROs

Source: ThoughtSphere

By Sudeep Pattnaik, Co-founder & CEO of ThoughtSphere

Data can change the world, but only with the right tools specifically designed for collecting, synthesizing, and analyzing it.

Data’s ability to improve clinical research cannot be overstated, but those improvements are only possible with the right technology. Clinical trials have become increasingly complex, both generating and drawing on a variety of data sets that reflect advances in technology and social media, such as data from patients’ wearable devices or patient-reported outcomes shared via social media.

For study sponsors and CROs, collecting and managing data is often a pain point. They spend too much time aggregating data, integrating siloed, disparate data sets, and identifying and profiling data, all of which can lead to an error-prone clinical medical review process. But all this data also represents opportunity, because the insights gleaned from it can lead to better decision-making. By harnessing the power of data, the people running clinical trials can improve various aspects of trial management, including risk management, early signal detection, clinical oversight, and quality management.

The good news is that the technology needed to facilitate a better, more holistic approach to data aggregation exists and is already in use across a variety of industries. A modern approach to data aggregation has the potential to drive tremendous gains in efficiency, accuracy, and cost savings for clinical trial sponsors and CROs, and ultimately help us get new treatments to market faster.

The Disparate Data Sources Challenge

Clinical trial data comes from many different systems and is stored in many different source formats. Sources include EDC systems, clinical trial management systems (CTMSs), IXRS, ePRO systems, labs, eTMF, safety systems (Argus, ARISg), Excel spreadsheets, project management systems, contracts, and patient records, as well as unstructured data sources such as publications, social media, EMR/EHR, and data captured in comments in the clinical system.

But most of the tools available today cannot easily integrate different data sets captured in different formats. Data is typically stored in silos and cannot be cross-analyzed or synthesized. Clinical trial sponsors and CROs are then forced to rely on a combination of software and paper-based processes. Managing data is typically time-consuming and costly.

A superior approach to data aggregation would afford near real-time, proactive, and continuous analysis of data by making it easier to work with different data sources. This would help study sponsors and CROs better manage clinical trial processes and gain a holistic view of the trial. The approach needed to do so is already being used in many other industries, where it powers data integration platforms that enable users to aggregate data from a variety of sources in near real-time. This type of platform is built on a data lake rather than a data warehouse, the architecture used by many clinical trial data management solutions today.

A More Flexible Approach To Data Management

Some of the world’s most innovative companies have already adopted data lake-based systems. A data lake-based system ingests data as is, in its raw format, whether structured, semi-structured or unstructured. Data warehouses are less flexible about what type of data they accept. Data requirements are established before data is ingested, and only data that meets those standards can be added to the system.
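The difference between the two ingestion models can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular product works: the schema, the field names (subject_id, visit, systolic_bp), the source labels, and the function names are all hypothetical. The warehouse-style loader checks each record against a predefined schema before accepting it, while the lake-style ingester stores every payload as is and applies structure only when the data is read.

```python
import json

# Hypothetical schema fixed in advance, as a warehouse would require.
WAREHOUSE_SCHEMA = {"subject_id": str, "visit": str, "systolic_bp": int}


def load_into_warehouse(table, record):
    """Warehouse style: accept only records that match the predefined schema."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        mismatched = sorted(set(record) ^ set(WAREHOUSE_SCHEMA))
        raise ValueError(f"fields do not match schema: {mismatched}")
    for field, expected_type in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    table.append(record)


def ingest_into_lake(lake, source, raw_payload):
    """Lake style: store the payload unchanged, tagged with its source."""
    lake.append({"source": source, "raw": raw_payload})


def read_blood_pressure(lake):
    """Apply structure at read time, using whatever entries can supply it."""
    rows = []
    for entry in lake:
        try:
            doc = json.loads(entry["raw"])
        except json.JSONDecodeError:
            continue  # free text stays in the lake for other kinds of analysis
        if isinstance(doc, dict) and "systolic_bp" in doc:
            rows.append((doc.get("subject_id"), doc["systolic_bp"]))
    return rows


lake = []
ingest_into_lake(lake, "edc", '{"subject_id": "S001", "visit": "V1", "systolic_bp": 120}')
ingest_into_lake(lake, "wearable", '{"subject_id": "S002", "systolic_bp": 135, "heart_rate": 72}')
ingest_into_lake(lake, "comments", "Patient reported dizziness")
print(read_blood_pressure(lake))
```

In this sketch the wearable record, which carries an extra heart_rate field and no visit, would be rejected by load_into_warehouse but is ingested by the lake without any change to the ingestion code.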

The rigidity of data warehouses is particularly troublesome in the biopharmaceutical space because clinical trial software vendors frequently roll out new platform capabilities, which makes keeping the data warehouse up to date challenging and time-consuming. Data lake-based systems alleviate this headache because they can ingest data before it is transformed to fit into a predefined model. A non-relational NoSQL database provides the scalability and flexibility to accommodate changes on a near real-time basis.

A more flexible data aggregation solution would also have advantages over point-to-point integration, in which a vendor connects directly to clinical and non-clinical sources via a connector. The trouble with these hardcoded connectors is that they are built to connect specific point systems (e.g., Medidata RAVE, Oracle InForm, Oracle Argus, Veeva TMF, Oracle CTMS) to specific BI and analytics platforms (e.g., Oracle OBIEE, IBM Cognos, Spotfire, Tableau, QlikView). It is not a source system-agnostic approach. If a data source or analytics platform changes, so must the connector.
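A rough way to see the maintenance burden is to count connectors. The sketch below assumes, purely for illustration, that point-to-point integration needs one connector per source and analytics platform pair, and that a lake-based design needs one adapter per system on either side of the lake. The system names come from the examples above; the function names and the "data lake" label are hypothetical.

```python
from itertools import product

sources = ["Medidata RAVE", "Oracle InForm", "Oracle Argus", "Veeva TMF", "Oracle CTMS"]
bi_tools = ["Oracle OBIEE", "IBM Cognos", "Spotfire", "Tableau", "QlikView"]


def point_to_point_connectors(sources, targets):
    """Assumption: one hardcoded connector per (source, target) pair."""
    return list(product(sources, targets))


def hub_adapters(sources, targets, hub="data lake"):
    """Assumption: each system connects once, to the central store only."""
    return [(s, hub) for s in sources] + [(hub, t) for t in targets]


def affected_by_change(connectors, system):
    """Connectors that must be revisited when one system changes."""
    return [c for c in connectors if system in c]


p2p = point_to_point_connectors(sources, bi_tools)
hub = hub_adapters(sources, bi_tools)

print(len(p2p), len(hub))
print(len(affected_by_change(p2p, "Oracle InForm")),
      len(affected_by_change(hub, "Oracle InForm")))
```

Under these assumptions, five sources and five analytics platforms require 25 point-to-point connectors but only 10 adapters around a central store, and a change to a single source touches five connectors in the first design and one in the second.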

Because clinical trials draw on so many different data sets from a variety of vendors and platforms, adopting a data lake approach could afford tremendous gains, especially if the data architecture is coupled with an effective and easy-to-use data analysis tool. Then, researchers could easily make use of the myriad data streams at their fingertips and better keep up with changes.

A More Holistic Approach To Data

Improving the data aggregation process paves the way for better analytics. If trial sponsors and CROs could more easily access and analyze data from disparate sources, they could make smarter decisions based on a more holistic and accurate understanding of their clinical research.

The biopharmaceutical industry will continue to grow and change. Companies need a flexible data aggregation solution that easily grows and changes, too. A more flexible and scalable solution, such as one powered by a data lake, can improve clinical trial processes in a variety of ways, including reducing monitoring costs through a risk-based approach, improving the time-consuming and error-prone clinical medical review process, and improving data quality through performance-linked investigator payments. Improvements like these not only help companies succeed in the rapidly evolving biopharmaceutical industry; they also facilitate gains in clinical trial processes that ultimately benefit society at large.

To learn more about ThoughtSphere visit

About the Author

Sudeep Pattnaik is President & CEO of ThoughtSphere. Prior to founding ThoughtSphere, Sudeep was the global leader of products for Quintiles, the largest CRO Fortune 500 life science company in the world, creating and leading the strategy team behind a $60M integrated healthcare data hub. He also played a key role in defining the Risk-Based Modeling approach for optimizing the clinical development process and helped develop a best-of-breed RBM platform for the industry. He holds an MSc in Computer Science from Utkal University (India) and an MBA from Leeds School of Business at the University of Colorado, Boulder.

Email contact:

About ThoughtSphere

Founded by clinical information and technology industry experts with over 30 years of experience from the leading global CRO and eClinical providers, ThoughtSphere’s mission is to streamline and empower the clinical trials process by eliminating its two biggest challenges: integrating disparate clinical and operational data, and making that data accessible for use with existing tools for analytics and visualization. With ThoughtSphere’s platform, biopharma and medical device sponsors and CROs can reduce and optimize clinical development costs, aggregate operational and clinical data to enhance efficiency and effectiveness in clinical trial processes, and gain near real-time actionable insights. The product suite includes ClinDAP, the source-system agnostic next-generation data integration platform; ClinACT, the interactive visualization and analytics platform that enables RBM and CRO Quality Oversight; and SPACE, the integrated site budgeting, payments and contracting solution. At ThoughtSphere, we believe we can deliver on the promise of big data to drive health innovation. For more information, visit