R: Percent Sequence Identity

pid {Biostrings}

R Documentation

Percent Sequence Identity

Description

Calculates the percent sequence identity for a pairwise sequence alignment.

Usage

pid(x, type="PID1")

Arguments

`x`	a `PairwiseAlignments` object.
`type`	the type of percent sequence identity to compute. One of `"PID1"`, `"PID2"`, `"PID3"`, or `"PID4"`. See Details for more information.

Details

Since there is no universal definition of percent sequence identity, the `pid` function can calculate this statistic in any of the following ways, selected by `type`:

"PID1":: 100 * (identical positions) / (aligned positions + internal gap positions)
"PID2":: 100 * (identical positions) / (aligned positions)
"PID3":: 100 * (identical positions) / (length of the shorter sequence)
"PID4":: 100 * (identical positions) / (average length of the two sequences)
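
The arithmetic behind the four definitions can be sketched in base R. This is only an illustration: `pid_from_counts` and its arguments are hypothetical names that are not part of Biostrings, the counts are made up, and "aligned positions" is assumed to mean alignment columns in which both sequences have a letter.

```r
# Hypothetical helper (not part of Biostrings): applies the four formulas
# from Details to plain counts taken from a pairwise alignment.
pid_from_counts <- function(n_identical, n_aligned, n_internal_gaps,
                            len1, len2) {
  c(
    PID1 = 100 * n_identical / (n_aligned + n_internal_gaps),
    PID2 = 100 * n_identical / n_aligned,
    PID3 = 100 * n_identical / min(len1, len2),
    PID4 = 100 * n_identical / mean(c(len1, len2))
  )
}

# Made-up counts: 12 identical positions among 14 aligned positions,
# 2 internal gap positions, and sequences of length 15 and 25.
pid_values <- pid_from_counts(n_identical = 12, n_aligned = 14,
                              n_internal_gaps = 2, len1 = 15, len2 = 25)
pid_values
```

With these counts the same alignment scores 75 under PID1, about 85.7 under PID2, 80 under PID3, and 60 under PID4, which is why the `type` used should always be reported alongside the value.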

Value

A numeric vector containing the specified sequence identity measures.

Author(s)

P. Aboyoun

References

A. May, Percent Sequence Identity: The Need to Be Explicit, Structure 2004, 12(5):737.

G. Raghava and G. Barton, Quantification of the variation in percentage identity for protein sequence alignments, BMC Bioinformatics 2006, 7:415.

Examples

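  library(Biostrings)

  # Two DNA sequences of different lengths
  s1 <- DNAString("AGTATAGATGATAGAT")
  s2 <- DNAString("AGTAGATAGATGGATGATAGATA")

  # Align with the default scoring and report the default measure, PID1
  palign1 <- pairwiseAlignment(s1, s2)
  palign1
  pid(palign1)

  # Align again with a custom nucleotide substitution matrix and report PID4
  palign2 <-
    pairwiseAlignment(s1, s2,
      substitutionMatrix =
        nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))
  palign2
  pid(palign2, type = "PID4")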

[Package Biostrings version 2.46.0]