Eye2Gene: A Web Tool for Genetic Prediction of Inherited Retinal Disease Using Machine Learning

Mutations in over 180 genes are known to cause inherited retinal dystrophy (IRD), and a genetic diagnosis is a significant step towards managing, and possibly treating, a patient's sight loss.

Eye2Gene will make genetic diagnosis faster and more accessible to IRD-affected families. It will be trained on data from thousands of patients so that it can learn to distinguish between the different genetic causes. In the first stage of development, Eye2Gene will be trained on retinal autofluorescence images: you will be able to upload a retinal image, and Eye2Gene will report which genes are most likely to be causing the IRD.
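The "report which genes are most likely" step can be pictured as turning the model's per-gene scores into a ranked list. The sketch below is an illustration only: it assumes the model outputs one raw score (logit) per gene, and the gene list `GENES`, the helper names and the `top_k` parameter are hypothetical, not Eye2Gene's actual label set or interface.

```python
import numpy as np

# Hypothetical label set for illustration; the real Eye2Gene gene list is not specified.
GENES = ["ABCA4", "USH2A", "RPGR", "PRPH2", "BEST1"]

def softmax(logits):
    """Numerically stable softmax over a 1-D array of class scores."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # subtracting the max keeps exp() from overflowing
    e = np.exp(z)
    return e / e.sum()

def rank_genes(logits, genes=GENES, top_k=3):
    """Return the top_k (gene, probability) pairs, most likely gene first."""
    if len(logits) != len(genes):
        raise ValueError("exactly one score per gene is required")
    probs = softmax(logits)
    order = np.argsort(probs)[::-1][:top_k]  # indices by descending probability
    return [(genes[i], float(probs[i])) for i in order]
```

For example, `rank_genes([2.0, 0.5, 0.1, -1.0, 0.0])` lists ABCA4 first, then USH2A and RPGR. Reporting a ranked shortlist with probabilities, rather than a single label, lets a clinician prioritise which genes to test.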

Early-onset IRDs, such as Stargardt disease and retinitis pigmentosa (RP), can lead to permanent vision loss over a period of 10 to 30 years and are collectively the leading cause of blindness in the working-age population in the UK.

Moorfields Eye Hospital (MEH), Europe’s largest eye hospital (2M patients), has the world's largest and best-characterised cohort of IRD patients, both genetically and phenotypically (>9000 patients), including 800 with Stargardt disease and 600 with RP. Identifying the causal genetic mutations in IRDs is a prerequisite for determining prognosis and for inclusion in gene-directed clinical trials, such as those of gene therapy.

IRDs often show characteristic patterns of progression, reflecting when, and in which types of retinal cell, the causative gene is expressed. Experienced clinicians learn to recognise these patterns using various imaging modalities, longitudinal information on the patient’s symptoms, and genetic screening. However, the process is time-consuming and expensive: it requires access to specialist centres, costly clinical and genetic tests, and specialist training in electrophysiology, image interpretation and bioinformatics. Moreover, the spectrum of disease-causing mutations, some of which may be non-coding, and the genetic heterogeneity underlying similar clinical phenotypes are still poorly understood. Consequently, 40% of IRD patients remain without a genetic diagnosis because of a lack of data or insight.

My solution is to develop a Machine Learning (ML) system, trained on the wealth of high-dimensional patient data available at MEH, capable of (i) predicting the genetic diagnosis in IRD, (ii) predicting patient outcome, and (iii) recommending treatments and assessing their efficacy.

To achieve this, I will create a longitudinal dataset from the data available at MEH, incorporating imaging, electroretinograms, quantitative and qualitative phenotypes extracted from free-text clinical records, genetic data, and self-reported patient data. Human Phenotype Ontology (HPO) terms will be used to annotate patient records in a standardised manner. Much of the groundwork for this has already been laid: first, through my development of the Phenopolis platform (www.phenopolis.org) for analysing genetic and HPO data, and second, through my contribution to an image-processing pipeline for MEH.
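One way to picture a longitudinal, HPO-annotated record is as a patient holding a chronologically ordered list of visits. The sketch below is a minimal illustration under stated assumptions: the class and field names (`PatientRecord`, `Visit`, `logmar_acuity`, `image_paths`) are hypothetical and do not describe the actual MEH or Phenopolis schema. The only external convention it relies on is the HPO identifier format, "HP:" followed by seven digits.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

# HPO identifiers take the form "HP:" followed by seven digits.
HPO_ID = re.compile(r"HP:\d{7}")

@dataclass
class Visit:
    age_years: float                                       # patient age at this visit
    hpo_terms: List[str] = field(default_factory=list)     # standardised phenotype annotations
    image_paths: List[str] = field(default_factory=list)   # hypothetical: links to retinal images
    logmar_acuity: Optional[float] = None                  # hypothetical: visual acuity at this visit

    def __post_init__(self):
        # Reject anything that is not a well-formed HPO identifier.
        bad = [t for t in self.hpo_terms if not HPO_ID.fullmatch(t)]
        if bad:
            raise ValueError(f"not valid HPO IDs: {bad}")

@dataclass
class PatientRecord:
    patient_id: str
    gene: Optional[str] = None                             # causative gene, if genetically solved
    visits: List[Visit] = field(default_factory=list)

    def add_visit(self, visit: Visit) -> None:
        """Add a visit and keep the record in chronological order."""
        self.visits.append(visit)
        self.visits.sort(key=lambda v: v.age_years)

    def all_terms(self) -> Set[str]:
        """Union of HPO terms recorded across all visits."""
        return {t for v in self.visits for t in v.hpo_terms}
```

Validating term IDs at entry time keeps annotations extracted from free text comparable across patients, and keeping visits sorted by age makes the record directly usable as a time series for progression modelling.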

I will use this dataset to train Convolutional Neural Networks (CNNs), an ML approach that has successfully solved complex classification problems on high-dimensional and imaging data, to predict causative genetic mutations. I first propose to evaluate my methods on three categories of IRD patients with known causative genes: i) ABCA4 in Stargardt disease, ii) USH2A in RP, and iii) ten other genes associated with similar IRDs. The networks will be tested first on the two-class problem i) vs ii), and later on the harder three-class problem i) vs ii) vs iii). I will analyse the predictions to discover which clinical information the network relies on, and whether this sheds new light on the phenotypic and genetic heterogeneity of these diseases, in particular with regard to incomplete or age-related penetrance.
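The staged classification and the "which information does the network use" analysis can be sketched in plain NumPy. This is an untrained toy, assuming a single grayscale image and one convolutional layer; the architecture, the helper names (`predict`, `init_params`, `occlusion_map`) and the `OTHER` label are hypothetical. Real training would use a deep-learning framework with backpropagation, which is not shown. Occlusion sensitivity is used here as one common interpretability technique, not necessarily the one the project will adopt.

```python
import numpy as np

# Staged label sets: first ABCA4 vs USH2A, then a third "other genes" class.
STAGE1 = ["ABCA4", "USH2A"]
STAGE2 = ["ABCA4", "USH2A", "OTHER"]

def conv2d_valid(image, kernels):
    """'Valid' 2-D cross-correlation of an (H, W) image with (K, kh, kw) kernels."""
    K, kh, kw = kernels.shape
    H, W = image.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for r in range(out.shape[1]):
        for c in range(out.shape[2]):
            patch = image[r:r + kh, c:c + kw]
            # Dot each kernel with the patch, giving one value per kernel.
            out[:, r, c] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out

def predict(image, params):
    """Conv -> ReLU -> global average pooling -> dense -> softmax probabilities."""
    feats = conv2d_valid(image, params["kernels"]) + params["conv_bias"][:, None, None]
    feats = np.maximum(feats, 0.0)                 # ReLU
    pooled = feats.mean(axis=(1, 2))               # one value per kernel
    logits = params["W"] @ pooled + params["b"]    # one score per class
    logits = logits - logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def init_params(n_classes, n_kernels=4, k=3, seed=0):
    """Random (untrained) weights, for illustration only."""
    rng = np.random.default_rng(seed)
    return {
        "kernels": rng.normal(0.0, 0.1, (n_kernels, k, k)),
        "conv_bias": np.zeros(n_kernels),
        "W": rng.normal(0.0, 0.1, (n_classes, n_kernels)),
        "b": np.zeros(n_classes),
    }

def occlusion_map(image, params, target, patch=4):
    """Drop in the target-class probability when each patch x patch block is zeroed.
    Larger values mark image regions the prediction depends on more."""
    base = predict(image, params)[target]
    heat = np.zeros((image.shape[0] // patch, image.shape[1] // patch))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            occluded[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = 0.0
            heat[i, j] = base - predict(occluded, params)[target]
    return heat
```

Moving from the first stage to the second only changes the size of the output layer (`init_params(len(STAGE2))`). Overlaying the occlusion map on the retinal image is one way to check whether the network attends to clinically meaningful regions such as the macula.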

Building on this, I will develop models of disease progression to predict how a patient’s condition will evolve, using Long Short-Term Memory (LSTM) networks or other time-series models, and identify patient trajectories (e.g. fast or slow decliners). From these models, I will be able to identify biomarkers of disease stage, predict progression and estimate future changes in visual acuity for diagnosed patients. I will also seek to develop a score or index to predict patient outcome.
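Before an LSTM is trained, a simple per-patient baseline helps illustrate what "fast or slow decliners" means. The sketch below deliberately substitutes a linear trend for the LSTM: it fits visual acuity against age for one patient, labels the trajectory, and extrapolates. It assumes acuity is recorded in logMAR (higher values mean worse vision), and the 0.05 logMAR/year cut-off and all function names are hypothetical, not clinically validated values.

```python
import numpy as np

def acuity_slope(ages, logmar):
    """Least-squares slope and intercept of visual acuity against age.
    Slope is in logMAR units per year; positive means vision is worsening."""
    ages = np.asarray(ages, dtype=float)
    logmar = np.asarray(logmar, dtype=float)
    if len(ages) < 2 or ages.max() == ages.min():
        raise ValueError("need visits at two or more distinct ages")
    slope, intercept = np.polyfit(ages, logmar, deg=1)
    return slope, intercept

def classify_decliner(slope, threshold=0.05):
    """Hypothetical cut-off: above `threshold` logMAR/year counts as a fast decliner."""
    return "fast" if slope > threshold else "slow"

def predict_acuity(ages, logmar, future_age):
    """Extrapolate the fitted linear trend to a future age."""
    slope, intercept = acuity_slope(ages, logmar)
    return slope * future_age + intercept
```

A linear fit cannot capture plateaus or accelerating decline, which is one motivation for sequence models such as LSTMs; the baseline is nonetheless useful as a reference that any learned progression model should outperform.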

Finally, improved genetic diagnosis and models of disease progression will allow more precise selection of patients who are likely to benefit from novel treatments, such as gene therapy, while permitting a more accurate assessment of their efficacy.