FAQ — AutoPrognosis

AutoPrognosis 2.0 is an open-source software package from the van der Schaar Lab that allows users to build diagnostic and prognostic models using machine learning. In addition, AutoPrognosis allows users to debug and understand models using state-of-the-art interpretability methods and to share their models as web-based applications.

To find out more about the capabilities of AutoPrognosis 2.0, please read our paper.

AutoPrognosis can be used to solve classification, regression, and survival/time-to-event problems.

For examples, see our tutorials on GitHub.

All code is fully open-sourced on GitHub and AutoPrognosis can be installed via PyPI.

A number of examples and tutorials demonstrating the core functionality of AutoPrognosis can be found on GitHub. Documentation and a short introductory video are also available.

As far as we are aware, AutoPrognosis is the only AutoML framework designed with medical applications in mind. This is reflected in its capabilities, which we believe are unique among publicly available frameworks.

Use-cases: AutoPrognosis can be used for classification, regression, and survival or time-to-event analysis.

Pipelines: AutoPrognosis not only trains predictive models, but builds end-to-end pipelines including imputation, preprocessing (such as feature scaling and dimensionality reduction), model selection and hyperparameter tuning, and ensembling.

Clinical investigation: AutoPrognosis does not stop at the model building step! AutoPrognosis also contains state-of-the-art interpretability methods to explain and debug models and enables users to readily share developed models as web-based applications.
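To make the idea of an end-to-end pipeline more concrete, here is a minimal sketch in plain Python that chains an imputation step and a feature-scaling step. It is an illustration only, not how AutoPrognosis is implemented: the function names (impute_mean, scale_min_max, run_pipeline) and the toy data are invented for this example, and the model selection and ensembling stages are left out.

```python
from statistics import mean


def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    cols = list(zip(*rows))
    fills = [mean(v for v in col if v is not None) for col in cols]
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def scale_min_max(rows):
    """Rescale each column to the [0, 1] range (constant columns become 0.0)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j, v in enumerate(row)]
        for row in rows
    ]


def run_pipeline(rows, steps):
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        rows = step(rows)
    return rows


# Toy data (made up): two features, one missing value.
raw = [[1.0, None], [3.0, 10.0], [5.0, 30.0]]
prepared = run_pipeline(raw, [impute_mean, scale_min_max])
print(prepared)
```

In a full pipeline, a predictive model would be the last step, and several such pipelines could be combined into an ensemble.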

AutoPrognosis is designed to work with tabular (cross-sectional) data. At this time, AutoPrognosis is not compatible with alternate data formats, such as images or general time-series data.

AutoPrognosis uses automated machine learning to develop powerful predictors, selecting models and optimising hyperparameters according to a user-defined objective.

For more information, check out our paper.
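As a rough illustration of selecting models and hyperparameters according to a user-defined objective, the sketch below tries every candidate in a tiny search space and keeps the best one. Everything here is an assumption made for the example: the two toy models, the search space, the data, and the use of exhaustive search, which stands in for whatever search strategy the library actually uses. For brevity the objective is scored on the training data; a real study would use held-out data or cross-validation.

```python
# Toy data (made up): one feature, binary label.
X = [0.1, 0.4, 0.35, 0.8, 0.9, 0.65]
y = [0, 0, 0, 1, 1, 1]


def threshold_model(threshold):
    """Predict 1 when the feature is at or above the threshold."""
    return lambda x: int(x >= threshold)


def constant_model(value):
    """Always predict the same label."""
    return lambda x: value


# Hypothetical search space: model family -> (factory, hyperparameter values).
SEARCH_SPACE = {
    "threshold": (threshold_model, [0.2, 0.5, 0.7]),
    "constant": (constant_model, [0, 1]),
}


def accuracy(model, X, y):
    """User-defined objective: fraction of correct predictions."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def search(space, objective, X, y):
    """Score every (model, hyperparameter) pair and return the best one."""
    best = None
    for name, (factory, grid) in space.items():
        for param in grid:
            score = objective(factory(param), X, y)
            if best is None or score > best[0]:
                best = (score, name, param)
    return best


best_score, best_name, best_param = search(SEARCH_SPACE, accuracy, X, y)
print(best_name, best_param, best_score)
```

Swapping accuracy for another function with the same signature changes what "best" means without touching the search loop, which is the point of a user-defined objective.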

AutoPrognosis is provided as a fully open-source Python package under the Apache 2.0 license on GitHub. The library can be installed locally, on a server, or in the cloud, either from PyPI using “pip install autoprognosis” or from source.

In addition, AutoPrognosis can be run using Google Colab! Try one of our tutorials now!
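Once the package is installed, a classification study might look roughly like the sketch below. It assumes the ClassifierStudy interface shown in the project README (the study_name, dataset, target, and classifiers arguments, and a fit() method that returns the fitted model); argument names and behaviour may differ between versions, so please check the documentation for the release you have installed. The wrapper function run_classifier_study is a hypothetical helper written for this example.

```python
def run_classifier_study(df, target, study_name="example", classifiers=None):
    """Fit an AutoPrognosis classification study and return the fitted model.

    df is a pandas DataFrame holding the feature columns and the target column.
    classifiers is an optional list of algorithm names to restrict the search;
    when it is None, the library's defaults are used.
    """
    # Imported lazily so this helper can be defined without the package installed.
    # Assumption: this import path matches the installed AutoPrognosis version.
    from autoprognosis.studies.classifiers import ClassifierStudy

    kwargs = {"study_name": study_name, "dataset": df, "target": target}
    if classifiers is not None:
        kwargs["classifiers"] = classifiers

    study = ClassifierStudy(**kwargs)
    return study.fit()


# Example usage (requires autoprognosis and scikit-learn to be installed):
#
#   from sklearn.datasets import load_breast_cancer
#   X, y = load_breast_cancer(return_X_y=True, as_frame=True)
#   df = X.copy()
#   df["target"] = y
#   model = run_classifier_study(df, target="target")
#   print(model.name())            # the selected ensemble of pipelines
#   print(model.predict_proba(X))  # predicted class probabilities
```

The regression and survival tutorials on GitHub follow the same pattern with their own study classes.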

To use AutoPrognosis, you will need to prepare a dataset consisting of features or covariates and a target outcome or label. AutoPrognosis handles both imputation of missing values and the model building step. Limited preprocessing is necessary.
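Before handing a dataset over, it can help to confirm that it has the expected shape: a set of feature columns plus one target column. The sketch below does this in plain Python. The helper summarise_dataset, the column names, and the values are all made up for illustration, and the rule that the target itself should have no missing values is our assumption rather than a documented requirement. Missing feature values are only reported, not filled, since imputation is handled by AutoPrognosis.

```python
def summarise_dataset(rows, target):
    """Check a list-of-dicts dataset and report missingness per feature.

    rows: list of dicts mapping column name -> value, with None for missing.
    target: name of the outcome/label column.
    """
    if not rows:
        raise ValueError("dataset is empty")
    columns = list(rows[0])
    if target not in columns:
        raise ValueError(f"target column {target!r} not found")
    # Assumption: every row needs an observed outcome to be usable for training.
    if any(row[target] is None for row in rows):
        raise ValueError("target column has missing values")

    features = [c for c in columns if c != target]
    missing = {c: sum(row[c] is None for row in rows) / len(rows) for c in features}
    return {"features": features, "missing_fraction": missing}


# Toy dataset (made up): two covariates and a binary outcome.
rows = [
    {"age": 63, "bmi": 27.1, "outcome": 1},
    {"age": 51, "bmi": None, "outcome": 0},
    {"age": None, "bmi": 31.4, "outcome": 0},
    {"age": 47, "bmi": 22.8, "outcome": 1},
]
summary = summarise_dataset(rows, target="outcome")
print(summary)
```

In practice the same data would be loaded into a pandas DataFrame, with the target column name passed to the study.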

Yes, AutoPrognosis can be used with R. While AutoPrognosis is primarily developed for Python, examples demonstrating how to use it as an R user are available.

Alternatively, you can use R for data visualisation, curation, etc., and then follow one of our many Python tutorials demonstrating AutoPrognosis!

You can see the ensemble of pipelines selected by AutoPrognosis by running “model.name()”.

We provide default values for the hyperparameters and settings that we believe are appropriate for the vast majority of use-cases. We recommend changing them only if you are an expert user or have specific requirements.

No, you do not have to include all of the available ML algorithms. You are free to choose whichever subset of algorithms you would like to be considered for the final ensemble.

That said, unless computational resources are an issue, we recommend including all methods for the best results.

Frequently Asked Questions

AutoPrognosis by the van der Schaar Lab

What is AutoPrognosis 2.0?

What problems can AutoPrognosis be used to solve?

How can I get started with AutoPrognosis?

How does AutoPrognosis differ from other AutoML frameworks (e.g. AutoGluon)?

What data can I use with AutoPrognosis?

How does AutoPrognosis work?

How can I use AutoPrognosis?

How should I prepare my dataset? What preprocessing do I need to do?

Can I use AutoPrognosis with R?

I’ve run a study with AutoPrognosis. How do I report the pipeline used, e.g. for a publication?

There are a number of hyperparameters/settings. How should these be chosen?

Do I have to include all (or almost all) of the different ML algorithms? Or is it better to select only a subset of algorithms?
