Roslin Bioinformatics - VIPER

A Desktop Application for Exploring and Cleaning Inheritance Inconsistencies within Pedigree/Genotype Data

Introduction

The VIPER-Project is a collaboration between Computer Scientists at The Institute for Informatics and Digital Innovation, Edinburgh Napier University and The Bioinformatics Group of The Roslin Institute, The Royal (Dick) School of Veterinary Studies, The University of Edinburgh.

VIPER combines an improved ResSpecies algorithm for genotype inheritance checking and inference with a novel, space-efficient visualisation of pedigree structure, in a desktop tool for exploring and then cleaning data errors in pedigree/genotype datasets.

Datapoint errors in pedigree genotype datasets are difficult to identify and adversely affect downstream genetic analyses. Errors that are inconsistent with the rules of Mendelian inheritance typically invalidate linkage analysis algorithms and cause such analyses to fail. Genotype errors may arise from a variety of systematic or sporadic faults, either in the genotyping assay or in the recording of pedigree or genotype information.

By applying an inheritance-checking algorithm for markers across the pedigree and visualising the inheritance data in an exploratory user interface, VIPER allows the sources of data inconsistency to be resolved.

VIPER displays the structure of the study population in a novel pedigree visualisation of generation sandwiches. Error rates reported by the inheritance algorithm are overlaid on the pedigree structure, allowing the inheritance pattern of reported errors to be explored, and the likely underlying bad datapoint resolved.

Applying the inheritance-checking algorithm across the pedigree for each marker both infers missing genotype data and highlights genotypes that are not consistent with the basic rule that an offspring's genotype must derive from alleles inherited from each parent.
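The basic rule can be illustrated for a single marker and a single sire/dam/offspring trio. The following is a minimal Java sketch, not the ResSpecies or VIPER implementation: the class TrioCheck, the nested Genotype type and the method isConsistent are hypothetical names, and the sketch assumes an autosomal marker with all three genotypes observed, so it performs no inference of missing data.

```java
/** Minimal Mendelian consistency check for one marker in one trio (illustrative only). */
final class TrioCheck {

    /** An unordered diploid genotype; alleles are single characters such as 'A' or 'G'. */
    static final class Genotype {
        final char a1;
        final char a2;

        Genotype(char a1, char a2) {
            this.a1 = a1;
            this.a2 = a2;
        }

        /** True if either copy of this genotype is the given allele. */
        boolean carries(char allele) {
            return a1 == allele || a2 == allele;
        }
    }

    /**
     * Returns true if the offspring could have received one allele from the sire
     * and its other allele from the dam.
     */
    static boolean isConsistent(Genotype sire, Genotype dam, Genotype offspring) {
        boolean firstFromSire = sire.carries(offspring.a1) && dam.carries(offspring.a2);
        boolean firstFromDam = dam.carries(offspring.a1) && sire.carries(offspring.a2);
        return firstFromSire || firstFromDam;
    }

    public static void main(String[] args) {
        Genotype sire = new Genotype('A', 'A');
        Genotype dam = new Genotype('A', 'G');
        // A/G offspring: 'A' from the sire and 'G' from the dam.
        System.out.println(isConsistent(sire, dam, new Genotype('A', 'G'))); // prints true
        // G/G offspring: the sire carries no 'G' allele to pass on.
        System.out.println(isConsistent(sire, dam, new Genotype('G', 'G'))); // prints false
    }
}
```

A full checker would repeat this test for every trio in the pedigree and for every marker, and would also have to handle missing genotypes, which this sketch deliberately leaves out.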

Because of the way the algorithm traverses the pedigree data, the exact position at which an error is reported can vary: an error reported in an individual's genotype may in fact be caused by an error in a close relative, e.g. a parent. In practice, error reporting is pushed down the generations, so an error reported in generation 4 might be due to the mistyping or misrecording of a genotype in a preceding generation. Individuals for whom genotype data is missing are also highlighted, because genotype inference by the algorithm can displace the reporting of errors further in these circumstances.
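One way to see why the reported position is ambiguous is to take an inconsistent trio and count, for each member, the alternative genotypes that would make the trio consistent again. The Java sketch below does this for a bi-allelic marker. It is an illustration under stated assumptions, not the traversal used by the ResSpecies algorithm: the class ErrorPlacement and its methods are hypothetical, and genotypes are coded simply as the number of 'B' alleles carried (0 = AA, 1 = AB, 2 = BB).

```java
/** Shows that one inconsistent trio does not identify which genotype is wrong (illustrative only). */
final class ErrorPlacement {

    /**
     * Genotypes are coded as the number of 'B' alleles: 0 = AA, 1 = AB, 2 = BB.
     * A parent passes on at least count / 2 and at most (count + 1) / 2 'B' alleles,
     * so the offspring's count must lie between the summed minima and maxima.
     */
    static boolean consistent(int sire, int dam, int child) {
        int min = sire / 2 + dam / 2;
        int max = (sire + 1) / 2 + (dam + 1) / 2;
        return child >= min && child <= max;
    }

    /**
     * Counts the alternative genotypes for one trio member
     * (0 = sire, 1 = dam, 2 = offspring) that make the trio consistent.
     */
    static int repairsFor(int member, int sire, int dam, int child) {
        int[] trio = {sire, dam, child};
        int original = trio[member];
        int count = 0;
        for (int candidate = 0; candidate <= 2; candidate++) {
            if (candidate == original) {
                continue; // only consider genotypes that differ from the recorded one
            }
            trio[member] = candidate;
            if (consistent(trio[0], trio[1], trio[2])) {
                count++;
            }
        }
        return count;
    }
}
```

For a recorded trio of AA sire, AA dam and AB offspring, repairsFor returns 2 for the sire, 2 for the dam and 1 for the offspring: a single changed genotype in any of the three individuals would remove the inconsistency, so the check alone cannot say which record is at fault.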

The Improved ResSpecies Algorithm and Genetic Model

The original inheritance-checking algorithm and API were developed for the ResSpecies data system and used by GenotypeChecker. However, the underlying Java data model proved too heavyweight for efficient processing of large datasets.

The remodelling replaced Genotype objects with a 32-bit integer bitmap for each individual (diploid) genotype, encoding information about the alleles present, parental inheritance, genotype inference and inheritance inconsistencies. The new representation was also restricted to bi-allelic SNP markers (whilst allowing for the possibility of sex-linkage by using a third, null ‘allele’).
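As an illustration of this kind of encoding, the Java sketch below packs the four categories of information into a single int. The bit layout is an assumption made for this example only, since the actual VIPER layout is not described here, and the class GenotypeBits with all of its constants and methods is hypothetical.

```java
/** Hypothetical bit layout for one diploid genotype stored in a 32-bit int (illustrative only). */
final class GenotypeBits {
    // Bits 0-2: alleles present in the genotype (assumed layout).
    static final int ALLELE_A = 1;
    static final int ALLELE_B = 1 << 1;
    static final int ALLELE_NULL = 1 << 2; // third 'allele', allowing for sex-linkage
    static final int ALLELE_MASK = ALLELE_A | ALLELE_B | ALLELE_NULL;

    // Bits 3-5 and 6-8: alleles attributed to the sire and to the dam (assumed layout).
    static final int SIRE_SHIFT = 3;
    static final int DAM_SHIFT = 6;

    // Single-bit flags (assumed layout).
    static final int INFERRED = 1 << 9;      // genotype was inferred, not observed
    static final int INCONSISTENT = 1 << 10; // genotype breaks the inheritance rule

    /** Packs the allele set and the two parental allele sets into one int. */
    static int encode(int alleles, int fromSire, int fromDam) {
        return (alleles & ALLELE_MASK)
                | ((fromSire & ALLELE_MASK) << SIRE_SHIFT)
                | ((fromDam & ALLELE_MASK) << DAM_SHIFT);
    }

    static int alleles(int genotype) {
        return genotype & ALLELE_MASK;
    }

    static int fromSire(int genotype) {
        return (genotype >>> SIRE_SHIFT) & ALLELE_MASK;
    }

    static int fromDam(int genotype) {
        return (genotype >>> DAM_SHIFT) & ALLELE_MASK;
    }

    static boolean isInferred(int genotype) {
        return (genotype & INFERRED) != 0;
    }

    static boolean isInconsistent(int genotype) {
        return (genotype & INCONSISTENT) != 0;
    }
}
```

With this layout, an inferred heterozygote that received A from its sire and B from its dam would be written as encode(ALLELE_A | ALLELE_B, ALLELE_A, ALLELE_B) | INFERRED, and marking it as an error later is a single OR with INCONSISTENT that leaves the other fields untouched.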

The use of bitmaps to represent genotype and inheritance data both dramatically reduces the memory footprint of the API and increases processing speed, since processing now uses lower-level, more efficient bit-wise operations and masks. On a 32-bit Windows platform these improvements produced a 72-fold increase in processing speed and increased the data size limit 30-fold, from 2 to 60 million genotypes, whilst on a 64-bit Linux platform with 3 GB of RAM, datasets of 200 million genotypes could be analysed in under a minute.
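The Java sketch below gives a flavour of how an inheritance check reduces to a few mask operations once genotypes are allele-presence bitmaps, and how the storage cost of such a representation can be estimated. It is a sketch under stated assumptions rather than the VIPER code: the class BitwiseCheck, its constants and its methods are hypothetical, only the two SNP alleles are modelled (no null allele), and all three genotypes are assumed to be observed.

```java
/** Mask-based trio check on allele-presence bitmaps (illustrative only). */
final class BitwiseCheck {
    // Assumed allele-presence bits: AA = 0b01, BB = 0b10, AB = 0b11.
    static final int ALLELE_A = 0b01;
    static final int ALLELE_B = 0b10;

    /**
     * True if the offspring's alleles can be split so that one comes from each parent.
     * Each parent must share at least one allele with the offspring, and between them
     * the parents must supply every allele the offspring carries.
     */
    static boolean consistent(int sire, int dam, int child) {
        int sharedWithSire = sire & child;
        int sharedWithDam = dam & child;
        return sharedWithSire != 0
                && sharedWithDam != 0
                && (sharedWithSire | sharedWithDam) == child;
    }

    /** Bytes needed to hold the given number of genotypes at one int per genotype. */
    static long estimatedBytes(long genotypeCount) {
        return genotypeCount * Integer.BYTES;
    }
}
```

For example, consistent(0b01, 0b11, 0b11) is true (AA x AB can produce AB), whereas consistent(0b01, 0b01, 0b11) is false (AA x AA cannot). Assuming one 32-bit integer per genotype and ignoring all other overhead, estimatedBytes(200_000_000L) gives 800,000,000 bytes, roughly 800 MB for the genotype bitmaps alone.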

Publications

GenotypeChecker: an interactive tool for checking the inheritance consistency of genotyped pedigrees.

Paterson,T. and Law,A. (2011) Animal Genetics, 42, 560–562.

Visualising Errors in Animal Pedigree Genotype Data.

Graham,M., Kennedy,J., Paterson,T. and Law,A. (2011) Computer Graphics Forum, 30 (3), 1011–1020.

Evaluating the VIPER pedigree visualisation: detecting inheritance inconsistencies in genotyped pedigrees.

Paterson,T., Graham,M., Kennedy,J. and Law,A. (2011) In Proc. of IEEE BioVis 2011. IEEE Computer Society Press, Providence, RI, USA, pp. 119–126.

VIPER: a visualisation tool for exploring inheritance inconsistencies in genotyped pedigrees.

Paterson,T., Graham,M., Kennedy,J. and Law,A. (2012) BMC Bioinformatics, 13 (Suppl 8), S5.