Analysis


What is the shortest peptide length that ensures sampling the entire (self) proteome?

In principle all human peptides can be recognized from sequences of length 5-6.
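A rough counting argument makes this plausible: with 20 amino acids there are 20^N distinct peptides of length N, while a proteome of L residues contains at most L windows of that length. The sketch below finds the smallest N for which 20^N reaches L. The proteome sizes used (a few million to about ten million residues) are order-of-magnitude assumptions, not figures from the slides, and `min_peptide_length` is a hypothetical helper name.

```python
def min_peptide_length(proteome_residues: int, alphabet_size: int = 20) -> int:
    """Smallest N such that alphabet_size**N >= proteome_residues."""
    n = 1
    # Increase the peptide length until the number of possible peptides
    # is at least the number of positions in the proteome.
    while alphabet_size ** n < proteome_residues:
        n += 1
    return n


if __name__ == "__main__":
    # Assumed proteome sizes, for illustration only.
    for residues in (3_000_000, 10_000_000):
        print(residues, min_peptide_length(residues))
```

This is only a necessary counting bound; it does not show that every window of that length is actually unique in a real proteome.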


The selection condition is equivalent to the choice of the Extreme Value:

Probability distribution of the extremum of M objects:
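
For reference, the standard result for the smallest of M independent, identically distributed values can be written as below. The symbols \rho(x) (density of a single value) and \Pi(x) (density of the extremum) are notation assumed here, as is the choice of the minimum rather than the maximum as the relevant extremum.

```latex
\[
\Pi(x) \;=\; M\,\rho(x)\,\bigl[\,1 - P(x)\,\bigr]^{M-1},
\qquad
P(x) \;=\; \int_{-\infty}^{x} \rho(y)\,\mathrm{d}y .
\]
```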

Characteristics of the Extreme Value Distribution:

Mean value:                            ~    

Standard deviation:                             ~   

Scaling in the large N limit:                       
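
If the individual values are assumed to be Gaussian with mean \mu and variance \sigma^2 (an assumption made here for illustration), the leading large-M behavior of the minimum is the textbook result in the first line below. The second line records one scaling under which the shift of the extremum grows linearly with N while its width stays of order one; the proportionality \sigma^2 \propto N and the constant \alpha are assumptions, not content taken from the slides.

```latex
\[
\begin{aligned}
\langle x_{\min} \rangle &\simeq \mu - \sigma\sqrt{2\ln M},
&\qquad
\sqrt{\langle x_{\min}^2 \rangle_c} &\simeq \frac{\pi\,\sigma}{\sqrt{12\,\ln M}}, \\[4pt]
\sigma^2 \propto N,\ \ \ln M = \alpha N
\;&\Longrightarrow\;
\sigma\sqrt{2\ln M} \propto N,
&\qquad
\frac{\pi\,\sigma}{\sqrt{12\,\ln M}} &= O(1).
\end{aligned}
\]
```

With this scaling the width relative to the shift decreases as 1/N, which is the sense in which the distribution becomes sharp.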

Due to the sharpness of the distribution in the large N limit, the selection condition can be written as:

Here the two quantities entering the condition are the mean and the variance of the interactions of the candidate TCR sequence.
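
Assuming Gaussian-distributed interactions, a sharply peaked extremum can be replaced by its typical value, which gives a condition of the following form. The two-sided window and the threshold names E_n and E_p are assumptions made here, and \mathcal{E}_{\mathrm{av}}(\vec t\,) and V(\vec t\,) are placeholder symbols for the mean and variance of the interactions of a candidate sequence \vec t.

```latex
\[
E_n \;<\; \mathcal{E}_{\mathrm{av}}(\vec t\,) \;-\; \sqrt{2\,V(\vec t\,)\,\ln M} \;<\; E_p .
\]
```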

The above condition is reminiscent of the micro-canonical constraints in Statistical Physics,

and it can be proved rigorously that in the large N limit, the probability to select a sequence is

where the two parameters of this probability have to be obtained self-consistently from

         and       

Graphical solution:

How well does this work for finite N? (N = 5 and M = 10,000)
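
One ingredient that can be checked numerically without any model details is the extreme-value step itself at M = 10,000. The sketch below repeatedly draws M standard normal values and compares the empirical mean and spread of the minimum with the leading large-M estimates -sqrt(2 ln M) and pi/sqrt(12 ln M). The Gaussian draws, the number of trials, and the name `simulate_minimum` are assumptions for illustration; this does not test the full selection probability at N = 5.

```python
import math
import random
import statistics


def simulate_minimum(m: int, trials: int, seed: int = 0) -> tuple[float, float]:
    """Return (mean, standard deviation) of the minimum of m standard normals."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    minima = [min(rng.gauss(0.0, 1.0) for _ in range(m)) for _ in range(trials)]
    return statistics.fmean(minima), statistics.stdev(minima)


if __name__ == "__main__":
    m = 10_000
    mean_min, std_min = simulate_minimum(m, trials=200)
    # Leading-order large-M estimates for a standard normal parent distribution.
    mean_est = -math.sqrt(2 * math.log(m))
    std_est = math.pi / math.sqrt(12 * math.log(m))
    print("empirical mean of minimum:", round(mean_min, 3))
    print("leading estimate of mean: ", round(mean_est, 3))
    print("empirical std of minimum: ", round(std_min, 3))
    print("leading estimate of std:  ", round(std_est, 3))
```

At this value of M the leading estimate of the mean is expected to overshoot somewhat, since the corrections to it decay only logarithmically in M.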