Roslin Bioinformatics - VIPER

Loading Data

Paired pedigree and genotype data files are loaded using the 'Load Pedigree' button followed by the 'Load Genotype' button.

When the pedigree data is loaded, the pedigree structure is displayed (a single 'dummy' marker is reported in the absence of real genotype data). Any fatal errors in the data format or pedigree structure will cause loading to fail, with errors reported to the screen and optionally saved to file.

When the genotype file is loaded, it is parsed and checked for validity. Again, any fatal errors in the data format will cause loading to fail, with errors reported to the screen and optionally saved to file. Only markers with 2 or 3 alleles are analysed. An information panel details the data files loaded and the counts of individuals, markers and errors.

Data parsing can be slow for very large data sets (see Large Data Sets).

Reloading Data

A new genotype file can be loaded at any point without reloading the pedigree. Reloading destroys all current work, so save it before reloading (see Saving Cleaned Data). Loading a new pedigree file removes both the current pedigree and the current genotype data.

File Formats

Data is loaded as separate Pedigree and Genotype text files.

Pedigree File Format

ma1[SPACE] 0[SPACE] 0[SPACE] F[SPACE]  [CR]
pa1 0 0 M  
bob pa1 ma1 M 001
bill pa1 ma1 M 001
ma2 0 0 F  
jill pa1 ma1 F 002
gus pa1 ma2 M 003

Notes

  • Each line represents: individualID | sireID | damID | sex (M/F/U) | [litterID]
  • The first four values are required (lines that do not contain four or five values are ignored)
  • Values are separated by white space
  • Individuals should have unique names (with no spaces)
  • Unknown parent IDs must be recorded as '0' (zero)
  • An optional fifth column can be added with a litterID (again with no spaces)
  • Various pedigree inconsistencies (e.g. female sires, pedigree loops) are detected and reported, and must be fixed before the pedigree can be used
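
The format rules above can be sketched in a few lines of Python. This is only an illustration of the file layout, not VIPER's own loader: the names Individual, parse_pedigree and check_parent_sexes are hypothetical, and the sketch assumes that a later line silently replaces an earlier one with the same individualID (VIPER itself may treat this differently). Only one of the pedigree checks (the sex of recorded sires and dams) is shown.

```python
from typing import NamedTuple, Optional


class Individual(NamedTuple):
    ind_id: str
    sire_id: Optional[str]    # None when the file records '0'
    dam_id: Optional[str]     # None when the file records '0'
    sex: str                  # 'M', 'F' or 'U'
    litter_id: Optional[str]  # optional fifth column


def parse_pedigree(text):
    """Parse pedigree text into a dict of individualID -> Individual."""
    pedigree = {}
    for line in text.splitlines():
        fields = line.split()          # values are separated by white space
        if len(fields) not in (4, 5):  # other lines are ignored
            continue
        ind_id, sire, dam, sex = fields[:4]
        litter = fields[4] if len(fields) == 5 else None
        # Assumption: a repeated individualID overwrites the earlier record.
        pedigree[ind_id] = Individual(
            ind_id,
            None if sire == "0" else sire,
            None if dam == "0" else dam,
            sex,
            litter,
        )
    return pedigree


def check_parent_sexes(pedigree):
    """Return messages for sires recorded as female or dams recorded as male."""
    problems = []
    for ind in pedigree.values():
        sire = pedigree.get(ind.sire_id)
        dam = pedigree.get(ind.dam_id)
        if sire is not None and sire.sex == "F":
            problems.append(f"{ind.ind_id}: sire {sire.ind_id} is recorded as female")
        if dam is not None and dam.sex == "M":
            problems.append(f"{ind.ind_id}: dam {dam.ind_id} is recorded as male")
    return problems
```

Calling parse_pedigree on the example file above and passing the result to check_parent_sexes should return an empty list, since every sire is recorded as M and every dam as F.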

Genotype File Format

ma1[SPACE] IQ[SPACE] A[SPACE] A[CR]
pa1 IQ T T
bob IQ A T
bill IQ A ?
ma1 eyes A A
pa1 eyes G
bob eyes A G
bill eyes ?
ma1 sexL C C
pa1 sexL A Y-null
bob sexL ?
bill sexL Y-null ?
bill sexL Y-null

Notes

  • Each line represents: individualID | markerID | allele1 | allele2
  • Lines with fewer than 3 values are ignored (as there is no genotype information presented)
  • Values are separated by white space
  • There is no significance in the order of alleles
  • '?' indicates an unknown allele value
  • A single allele value is interpreted as homozygous
  • Alleles should be recorded as single-letter nucleotide codes
  • Only bi-allelic SNPs are valid (plus the possibility of sex-linkage)
  • Sex-linked genotypes should have a null marker (e.g. 'Y', 'W', 'Y-null', 'null')
    asserted in the heterogametic individuals.
    Take care not to assert homozygosity for the null marker:
    a single value is read as homozygous, so the last line of the example above is wrong.
  • Duplicate genotype entries are reported as fatal errors
  • Markers with fewer than 2 or more than 3 alleles are ignored (with warnings)
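
As a rough illustration of these rules, the Python sketch below reads genotype lines, expands single-allele entries to homozygotes, collects duplicate entries, and keeps only markers with 2 or 3 distinct alleles. It is not VIPER's implementation: the function names are hypothetical, and two details are assumptions not stated above, namely that '?' does not count towards a marker's allele total and that any values after the fourth on a line are discarded.

```python
from collections import defaultdict

UNKNOWN = "?"


def parse_genotypes(text):
    """Return ({(individualID, markerID): (allele1, allele2)}, duplicate_keys)."""
    genotypes = {}
    duplicates = []
    for line in text.splitlines():
        fields = line.split()  # values are separated by white space
        if len(fields) < 3:    # no genotype information: line is ignored
            continue
        ind_id, marker_id, allele1 = fields[:3]
        # A single allele value is interpreted as homozygous.
        # Assumption: values beyond the fourth are discarded.
        allele2 = fields[3] if len(fields) > 3 else allele1
        key = (ind_id, marker_id)
        if key in genotypes:
            duplicates.append(key)  # VIPER reports these as fatal errors
            continue
        # Allele order carries no meaning, so store the pair sorted.
        genotypes[key] = tuple(sorted((allele1, allele2)))
    return genotypes, duplicates


def alleles_per_marker(genotypes):
    """Map each markerID to the set of known alleles observed for it."""
    observed = defaultdict(set)
    for (_, marker_id), alleles in genotypes.items():
        # Assumption: '?' is not counted as an allele.
        observed[marker_id].update(a for a in alleles if a != UNKNOWN)
    return observed


def usable_markers(genotypes):
    """Return the markerIDs that have 2 or 3 distinct alleles."""
    return {
        marker_id
        for marker_id, alleles in alleles_per_marker(genotypes).items()
        if 2 <= len(alleles) <= 3
    }
```

Applied to the example file above, the two 'bill sexL' lines would be collected as a duplicate entry, which matches the rule that such entries are fatal errors.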