We have developed a machine learning and rule-based tool, using natural language processing, that allows a user to upload a trial protocol and categorises the protocol as being at high, medium or low risk of ending uninformatively. The tool is available at https://clinicaltrialrisk.org/tool and is open-sourced on GitHub. You can read an explanation of how the tool works here, and a description of how we validated its accuracy here.
There are several indicators of a high risk of uninformativeness that can be identified in a protocol, such as a missing or inadequate statistical analysis plan, the use of non-standard endpoints, or the use of cluster randomisation. One of the most common causes of a trial ending uninformatively is underpowering. Conversely, low-risk trials are often run by well-known institutions with external funding and an international or intercontinental array of sites. These indicators can be referred to as features or parameters.
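The text above does not say how the tool combines these indicators, so what follows is only a minimal sketch, assuming each indicator has already been extracted from the protocol as a true/false value. The `ProtocolFeatures` fields, the `risk_category` function, and every weight and threshold are hypothetical. They illustrate how features such as underpowering or cluster randomisation could be mapped to a high, medium or low category.

```python
from dataclasses import dataclass


@dataclass
class ProtocolFeatures:
    """Hypothetical indicators extracted from a protocol (names are illustrative)."""
    has_adequate_sap: bool        # statistical analysis plan present and adequate
    non_standard_endpoints: bool
    cluster_randomised: bool
    adequately_powered: bool
    well_known_sponsor: bool
    externally_funded: bool
    international_sites: bool


def risk_category(f: ProtocolFeatures) -> str:
    """Combine indicators into a crude additive score.

    Weights and cut-offs are illustrative assumptions, not the tool's values.
    """
    score = 0
    # Indicators that raise the risk of an uninformative result
    score += 2 if not f.has_adequate_sap else 0
    score += 1 if f.non_standard_endpoints else 0
    score += 1 if f.cluster_randomised else 0
    score += 3 if not f.adequately_powered else 0  # underpowering weighted most heavily
    # Indicators associated with lower risk
    score -= 1 if f.well_known_sponsor else 0
    score -= 1 if f.externally_funded else 0
    score -= 1 if f.international_sites else 0
    # Map the score onto the three risk bands
    if score >= 4:
        return "high"
    if score >= 1:
        return "medium"
    return "low"


# Underpowered trial with no adequate statistical analysis plan
print(risk_category(ProtocolFeatures(False, False, False, False, False, False, False)))  # → high
```

Under these invented weights, a protocol lacking both an adequate analysis plan and sufficient power scores 5 and is labelled high risk. A learned model would estimate such weights from data rather than fixing them by hand.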
This project is an initial proof of concept (POC) intended to showcase what is possible with natural language processing, with a view to moving towards a more comprehensive main project that may identify a more complete set of cost, complexity, or uninformativeness risk factors.
The tool is designed with a feedback form so that inaccurate data extractions can be reported back to the developers.
In addition, because the tool is released under the MIT License, you are free to add features or extend its scope.
We hope that researchers who are considering submitting a trial protocol to a prospective funder will be able to use the tool as a kind of checklist, to ensure that their trial is designed to reduce risk and to improve its prospects of being funded.
Introduction
People have often asked us: how was the Clinical Trial Risk Tool trained? Does it just throw documents into ChatGPT? Or, conversely, is it just an expert system, where we have painstakingly crafted keyword-matching rules to find important snippets of information in unstructured documents? In fact, most of the tool is built using machine learning techniques. We either hand-annotated training data or took training data from public sources.
How We Trained the Models inside the Clinical Trial Risk Tool
The different models inside the Clinical Trial Risk Tool have been trained on real data, mostly taken from clinical trial repositories such as clinicaltrials.
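To make the contrast between hand-written keyword rules and a trained model concrete, here is a minimal sketch of the second approach. It is not the tool's code. It uses a multinomial Naive Bayes classifier, chosen here purely for illustration, and the `NaiveBayes` class, the six annotated sentences and the `cluster`/`individual` labels are all invented. Rather than following a rule such as "flag the word cluster", the model learns word statistics from labelled examples.

```python
import math
import re
from collections import Counter


def tokenise(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for the prior
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenise(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, text: str) -> str:
        tokens = tokenise(text)
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus the sum of smoothed log likelihoods of each token
            score = math.log(self.doc_counts[c] / n_docs)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Invented, hand-annotated training sentences (hypothetical data)
TEXTS = [
    "Villages will be randomised as clusters to the intervention arm",
    "Households are the unit of randomisation in this cluster randomised trial",
    "Each cluster of clinics is allocated to treatment or control",
    "Participants will be individually randomised in a 1:1 ratio",
    "Patients are randomly assigned to drug or placebo",
    "Individual participants are allocated by computer-generated sequence",
]
LABELS = ["cluster"] * 3 + ["individual"] * 3

model = NaiveBayes().fit(TEXTS, LABELS)
print(model.predict("Schools will be randomised as clusters"))  # → cluster
```

With real hand-annotated or repository-derived data, the same fit/predict pattern applies, although a practical system would normally need far more examples and a stronger model.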
Over the years, the overall cost of the drug development process has increased exponentially, prompting the adoption of adaptive clinical trial design software. Although there are practical difficulties and barriers to implementing such software, these problems are being addressed as they arise. With advances in software technology, adaptive clinical trial design software continues to improve. Despite this progress, only a handful of well-established software packages supporting various types of clinical trial adaptation are currently available.
A clinical trial protocol is a document that serves as the step-by-step playbook for running a trial. It guides the study researchers in running the trial effectively within a stipulated period, and its prime focus is to ensure patient safety and data security. [1, 2] Because the protocol is central to the seamless execution of the trial, peer review of the protocol is needed to ensure its scientific validity, viability and quality.