Frequently Asked Questions


When a pharmaceutical company develops a drug, it needs to pass through several phases of clinical trials before it can be approved by regulators.

Before the trial is run, the drug developer writes a document called a protocol. This contains key information such as how long the trial will run, what the risks to participants are, and what treatment is being investigated.

The problem is that each protocol can be up to 200 pages long, and the structure varies. There is no standardised way of recording the intervention, number of participants, locations and so on, although many pharma companies have their own in-house standards.

The Clinical Trial Risk Tool helps funders and pharma companies identify risk factors in a trial protocol using natural language processing, and rates the trial as high, medium or low risk.

Wood TA and McNair D. Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness [version 1; peer review: awaiting peer review]. Gates Open Res 2023, 7:56. doi: 10.12688/gatesopenres.14416.1

A BibTeX entry for LaTeX users is:

@article{Wood_2023,
	doi = {10.12688/gatesopenres.14416.1},
	url = {},
	year = 2023,
	month = {apr},
	publisher = {F1000 Research Ltd},
	volume = {7},
	pages = {56},
	author = {Thomas A Wood and Douglas McNair},
	title = {Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness},
	journal = {Gates Open Research}
}

If you upload a protocol, the Clinical Trial Risk Tool does not store or save it. You can read more on our Privacy Policy page.

If you choose to create a user account, click Login and you will be directed to create an account and/or authenticate with our third-party authentication provider. Your email address is stored as your unique identifier while you use the app. We store it because it is needed for optional user authentication. If you want to use the application anonymously, all functionality is still available without logging in, except that you will not be able to save profiles and retrieve them later.

You can delete any configuration you have saved on the server using the Delete button in the application. You can also delete your account with our third-party authentication provider by logging into that service. In accordance with the Right to Be Forgotten (please see our Privacy Policy), you can also send a message via the contact form to ensure complete deletion of your data.

The Clinical Trial Risk Tool allows a user to upload a trial protocol in PDF format. The tool processes the PDF into plain text and identifies features which indicate high or low risk of uninformativeness.

The tool uses a series of machine learning algorithms, such as convolutional neural networks, combined with rule-based components, to identify key features of a protocol. You can download and run the source code from GitHub.
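To give a flavour of what a rule-based component can look like, here is a minimal sketch that pulls a candidate sample size out of protocol text with a regular expression. It is an illustration under our own assumptions, not code taken from the tool: the pattern, the list of participant words and the `extract_sample_size` function are hypothetical, and taking the largest count found is a simplifying heuristic (protocols also mention pilot and subgroup sizes, which are usually smaller).

```python
import re
from typing import Optional

# Hypothetical rule: a number (plain, or with comma thousands separators)
# immediately followed by a word commonly used for trial participants.
SAMPLE_SIZE_PATTERN = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(?:participants|patients|subjects|volunteers)\b",
    re.IGNORECASE,
)


def extract_sample_size(text: str) -> Optional[int]:
    """Return the largest participant count mentioned in `text`, or None.

    Taking the maximum is a simplifying assumption: protocols often also
    mention pilot or subgroup sizes, which are usually smaller.
    """
    counts = [
        int(match.group(1).replace(",", ""))
        for match in SAMPLE_SIZE_PATTERN.finditer(text)
    ]
    return max(counts) if counts else None


if __name__ == "__main__":
    protocol_text = (
        "We will enrol 1,200 participants across 12 sites. "
        "A pilot phase with 40 patients precedes the main study."
    )
    print(extract_sample_size(protocol_text))
```

On the example text this prints 1200: "12 sites" is ignored because "sites" is not in the participant word list. A rule like this is brittle on its own, which is one reason to combine rules with machine learning models.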

Please see this blog post for a summary of the tool’s accuracy in the different areas.

The tool gives a broad indication of risk using a traffic-light system (red, amber and green for high, medium and low risk respectively). For a finer-grained measure, the tool internally scores each protocol from 0 to 100 using a linear model, in which each feature extracted from the protocol contributes a fixed number of points. For example, a trial gains 20 points if it has a completed statistical analysis plan, 10 extra points if it has a large sample size, and so on. The points are simply summed, so it is easy to see how each feature contributes to the final score. You can adjust the weights (coefficients) of the different parameters extracted by the tool under the right-hand tab entitled “Configure thresholds and parameters”.
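The scoring described above can be sketched as a weighted sum followed by a traffic-light lookup. This is a hypothetical illustration, not the tool's own code: the feature names `has_completed_sap` and `large_sample_size` are invented labels for the two example features mentioned above (with their 20 and 10 points), and the cut-offs in `traffic_light` are made-up values chosen only so the example runs. It assumes, as the example suggests, that a higher score means lower risk.

```python
from typing import Dict, Mapping

# Hypothetical feature names. The point values for these two features are
# the ones given as examples in the FAQ; a real model would have more.
WEIGHTS: Dict[str, float] = {
    "has_completed_sap": 20.0,  # completed statistical analysis plan
    "large_sample_size": 10.0,  # sample size judged to be large
}


def risk_score(features: Mapping[str, bool], weights: Mapping[str, float]) -> float:
    """Linear score: add each feature's weight if present, clamped to 0-100."""
    total = sum((w for name, w in weights.items() if features.get(name, False)), 0.0)
    return max(0.0, min(100.0, total))


def traffic_light(score: float, green_min: float = 25.0, amber_min: float = 15.0) -> str:
    """Map a score to a colour. Cut-offs are illustrative; higher score = lower risk."""
    if score >= green_min:
        return "green"  # low risk
    if score >= amber_min:
        return "amber"  # medium risk
    return "red"  # high risk


if __name__ == "__main__":
    trial = {"has_completed_sap": True, "large_sample_size": False}
    score = risk_score(trial, WEIGHTS)
    print(score, traffic_light(score))
```

With these values, a trial with a completed statistical analysis plan but no large sample size scores 20.0 and is shown as amber. Editing the numbers in `WEIGHTS` or the cut-offs is the code-level analogue of changing them in the “Configure thresholds and parameters” tab.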

At this time the Clinical Trial Risk Tool does not give p-values. In future we hope to provide more statistical data to the Clinical Trial Risk Tool’s users.

The Python code of the Clinical Trial Risk Tool was written by Thomas Wood (Fast Data Science).

Our source code is on GitHub. If you would like to improve the tool, you are welcome to submit your changes as a pull request. Please contact us to discuss them.

At present the tool is designed to handle single documents only, but a future improvement may allow batch processing of multiple PDFs.

You are welcome to upload a protocol for a different pathology. Please be aware, however, that the thresholds for what counts as a small, medium or large trial may differ in another area such as oncology, since the tool was developed with a particular focus on HIV and TB. A future direction of the project could involve expansion to more pathologies.

The tool does not save personal data, except for your email address if you create an account, and is GDPR and HIPAA compliant. More information is available on our Privacy Policy page.