How machine-learning and NLP are revolutionizing biopharma R&D

Machine learning and other artificial intelligence (AI) technologies are generating considerable excitement in the biopharmaceutical community because of their potential to revolutionize pattern identification, predict successes and failures, improve research decision-making, and ultimately accelerate the discovery of new drugs.

Contributing to the success of machine-learning efforts is the growing wealth of unstructured data from clinical records and pharma data. Thanks to the widespread adoption of electronic health records (EHRs), the increased sophistication of insurance claims databases, and the growth of social media, biopharma firms have access to an abundance of clinical data that can be leveraged to provide insights into the real-world impact of therapies on patients. With the right technology, biopharma firms have the potential to speed the development and commercialization of their offerings and advance efforts to improve the delivery of care.

Extracting new drug insights with machine-learning

New, practical use cases that apply machine learning and statistical models to drug discovery are further fueling interest in machine learning’s potential for the industry. For example, in a recently published journal paper, Eli Lilly researchers describe how they identified potential new uses for existing drugs by mining serious adverse event data in ClinicalTrials.gov to calculate ranking statistics for treatment-indication associations. The authors describe how insights were extracted from hundreds of unstructured ClinicalTrials.gov records using natural language processing (NLP) and then used to feed machine-learning predictions.

Similarly, in a 2016 publication, researchers from Roche and Humboldt University of Berlin describe how they used NLP to systematically identify all MEDLINE abstracts containing both the protein target and the specific disease indication of a known set of successfully approved or failed cancer therapeutics (for example, abstracts containing both “Her2” and “breast cancer”, or “c-Kit” and “gastrointestinal stromal tumor”). The researchers applied machine-learning classifiers and found that the NLP-extracted data insights successfully predicted the success or failure of target-indication pairs, and hence, approved or failed drugs.
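To make the co-occurrence idea concrete, the sketch below counts how many abstracts mention both a target and an indication. It is not the method used in the publication, which is not detailed here. It is plain case-insensitive string matching, the function names and sample abstracts are made up for illustration, and a real NLP system would also handle synonyms and ontology terms.

```python
import re


def mentions(text, term):
    """Return True if `term` appears in `text` as a whole term, ignoring case."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None


def count_cooccurrences(abstracts, pairs):
    """Count, for each (target, indication) pair, the abstracts that mention both."""
    return {
        (target, indication): sum(
            mentions(a, target) and mentions(a, indication) for a in abstracts
        )
        for target, indication in pairs
    }


# Invented abstracts, for illustration only.
abstracts = [
    "Her2 overexpression was examined in a breast cancer cohort.",
    "c-Kit mutations were profiled in gastrointestinal stromal tumor samples.",
    "Her2 signalling was studied in gastric cell lines.",
]
pairs = [("Her2", "breast cancer"), ("c-Kit", "gastrointestinal stromal tumor")]

counts = count_cooccurrences(abstracts, pairs)
for pair, n in counts.items():
    print(pair, n)
```

Counts like these could serve as one input feature for a classifier of target-indication pairs. Which features work best in practice is an open question that this sketch does not address.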

NLP: an essential ingredient making unstructured data accessible for machine-learning

Machine learning offers great promise for the advancement of biopharma research efforts. However, most machine-learning methods require large volumes of high-quality data in order to identify patterns or make predictions. Many machine-learning projects currently fail to go ahead because of a lack of structured data. It is thus critical that organizations make effective use of unstructured data as well as structured data, whether from clinical records or pharma data. NLP text-mining tools allow users to extract insights from unstructured data using relevant ontologies and focused queries.

For example, queries can be written to extract information on treatment patterns to identify drug switching or discontinuation. Queries can extract and normalize numerical data such as lab values and dosage information, as well as patient-specific details such as history of disease, problem lists, demographics, social factors, and lifestyle. Different business rules may be used for particular data sets, for example when analyzing adverse events from drug labels versus tweets versus EHRs.
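As a rough illustration of extracting and normalizing numerical data, the sketch below pulls dose mentions out of free text and converts them to milligrams. It is a simplified regular-expression approach, not the query language of any particular NLP product. The unit table, function name, and sample note are assumptions made for this example.

```python
import re

# Conversion factors to milligrams. The units covered are an illustrative subset.
TO_MG = {"mcg": 0.001, "ug": 0.001, "mg": 1.0, "g": 1000.0}

# A number (optionally with decimals) followed by one of the known units.
DOSE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mcg|ug|mg|g)\b", re.IGNORECASE)


def extract_doses_mg(text):
    """Return every dose found in `text`, normalized to milligrams."""
    return [
        float(value) * TO_MG[unit.lower()]
        for value, unit in DOSE_PATTERN.findall(text)
    ]


# Invented clinical note, for illustration only.
note = "Started drug A 500 mg twice daily; drug B 50 mcg each morning; drug C 1 g as needed."
doses = extract_doses_mg(note)
print(doses)
```

Normalizing to a single unit at extraction time means downstream models see comparable values regardless of how the dose was written in the source text.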

The use of NLP text-mining tools is essential for organizations seeking to jumpstart their machine-learning process and minimize the need for manual data coding. By leveraging appropriate NLP text-mining tools, biopharma companies can enhance the value of their machine-learning efforts. For example:

• Semi-supervised NLP eliminates the need for analyst annotations of raw data

Semi-supervised approaches to NLP work directly from the raw data. This contrasts with fully supervised NLP methods, which require high-quality hand-annotated data. Hand-annotated data is difficult to obtain for most tasks, and is costly and time-consuming to generate because it requires expert scientific knowledge. Because of this expense, annotated data sets are often too small and contain few instances of the relevant concepts or relationships. Semi-supervised techniques using an agile NLP system can produce results in much less time, since evaluating samples from NLP text-mined query results and comparing differences between query runs requires just a few hours of specialist time, rather than weeks or months. Semi-supervised techniques can also provide better quality, since relevant samples can be generated from very large sets of unannotated data.

• Agile NLP can provide flexibility when dealing with uncertain business problems

Agile approaches to NLP can enhance the flexibility of machine-learning models by allowing researchers to quickly make modifications to existing models. This is especially beneficial when users need to expand a list of features once they begin analyzing real data. An agile NLP system can extract new features from the unstructured data within minutes, allowing fast evaluation of whether the features are likely to be useful or not. If the features are found useful, queries can then be refined to improve the precision and recall of the extraction, using a data-driven, semi-supervised approach.
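The precision and recall mentioned above can be estimated by comparing what a query extracted against a sample that a specialist has reviewed. The sketch below shows the standard calculation. The document identifiers and the function name are hypothetical.

```python
def precision_recall(extracted, relevant):
    """Compute precision and recall of `extracted` items against `relevant` items."""
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical results: records returned by a query vs. records a reviewer marked relevant.
extracted = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc3", "doc4", "doc5", "doc6"}

precision, recall = precision_recall(extracted, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.60
```

Re-running this comparison after each query refinement gives a quick, data-driven signal of whether the change helped.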

The growing wealth of clinical data and new machine-learning technologies are benefiting biopharma firms as they seek to discover new drugs and find new uses for existing drugs. With the addition of NLP text-mining tools, biopharma companies can accelerate their efforts and reduce the time and money required to achieve research and development goals.

Jane Reed is Head of Life Science Strategy at Linguamatics. She is responsible for developing the strategic vision for Linguamatics’ growing product portfolio and business development in the life science market. Jane has extensive experience in life science informatics. She has worked for more than 20 years in vendor companies supplying data products, data integration and analysis, and consultancy to pharma and biotech—with roles at Instem, BioWisdom, Incyte, and Hexagen. Before moving into the life science industry, Jane worked in academia with post-docs in genetics and genomics.

The views, opinions and positions expressed within these guest posts are those of the author alone and do not represent those of Becker's Hospital Review/Becker's Healthcare. The accuracy, completeness and validity of any statements made within this article are not guaranteed. We accept no liability for any errors, omissions or representations. The copyright of this content belongs to the author and any liability with regards to infringement of intellectual property rights remains with them.

Copyright © 2024 Becker's Healthcare. All Rights Reserved.