Analysis
What is the shortest peptide length that ensures sampling the entire (self) proteome?
In principle, all human peptides can be recognized from sequences only 5-6 amino acids long.
The selection condition is equivalent to a condition on the Extreme Value, i.e. on the strongest interaction of a TCR with the M self-peptides:
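A back-of-envelope counting check of this statement: with 20 amino acids, sequences of length N can label at most 20^N distinct peptides, so N must satisfy 20^N >= M for M distinct self-peptides. This is only a necessary condition. The values of M used below (10^6 and 10^7) are illustrative assumptions, not figures from the source.

```python
def min_peptide_length(num_peptides: int, alphabet_size: int = 20) -> int:
    """Smallest N with alphabet_size**N >= num_peptides (pure counting bound)."""
    if num_peptides < 1:
        raise ValueError("num_peptides must be positive")
    n = 1
    while alphabet_size ** n < num_peptides:
        n += 1
    return n


# Hypothetical proteome sizes, chosen only for illustration.
for m in (10**6, 10**7):
    print(m, min_peptide_length(m))
```

With these inputs the bound gives N = 5 and N = 6 respectively, in line with the 5-6 quoted above.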
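Probability distribution of the extremum of M objects: for M independent energies drawn from a density $p(E)$, the minimum (the strongest interaction) is distributed as
$$p_M(E_{\min}) = M\,p(E_{\min})\left[\int_{E_{\min}}^{\infty} p(E)\,dE\right]^{M-1}$$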
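Characteristics of the Extreme Value Distribution (large M, approximately Gaussian $p(E)$ with mean $\langle E\rangle$ and standard deviation $\sigma$):
Mean value: $\langle E_{\min}\rangle \approx \langle E\rangle - \sigma\sqrt{2\ln M}$
Standard deviation: $\sigma_{\min} \sim \sigma/\sqrt{2\ln M}$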
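A quick Monte Carlo sketch of these characteristics, assuming Gaussian energies with $\langle E\rangle = 0$ and $\sigma = 1$ (an illustrative choice): it samples the minimum of M = 1000 energies many times and compares the sample mean and spread with the leading-order estimates $\langle E\rangle - \sigma\sqrt{2\ln M}$ and $\sigma/\sqrt{2\ln M}$.

```python
import math
import random
import statistics


def sample_minima(m: int, trials: int, mu: float = 0.0, sigma: float = 1.0, seed: int = 0):
    """Draw `trials` independent minima of m Gaussian energies N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [min(rng.gauss(mu, sigma) for _ in range(m)) for _ in range(trials)]


m = 1000
minima = sample_minima(m, trials=400)
# Leading-order estimates with <E> = 0 and sigma = 1.
predicted_mean = -math.sqrt(2 * math.log(m))      # <E> - sigma * sqrt(2 ln M)
predicted_width = 1 / math.sqrt(2 * math.log(m))  # sigma / sqrt(2 ln M), up to an O(1) factor

print(f"sampled mean {statistics.mean(minima):.2f}  vs leading-order {predicted_mean:.2f}")
print(f"sampled std  {statistics.stdev(minima):.2f}  vs scale {predicted_width:.2f}")
```

At moderate M the sampled mean is noticeably closer to zero than $-\sqrt{2\ln M}$: the leading-order formula neglects corrections of relative order $\ln\ln M/\ln M$, which decay very slowly. The spread matches $\sigma/\sqrt{2\ln M}$ only up to an O(1) factor ($\pi/\sqrt{6}$ for the limiting Gumbel form).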
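Scaling in the large N limit: the interaction energy is a sum of N site contributions, so its mean $\langle E\rangle$ grows like $N$ and its standard deviation $\sigma$ like $\sqrt{N}$. The width of the extreme value distribution, $\sigma_{\min} \sim \sigma/\sqrt{2\ln M}$, therefore becomes negligible compared with its mean (for $\ln M$ growing no faster than $N$):
$$\langle E_{\min}\rangle \propto N, \qquad \frac{\sigma_{\min}}{|\langle E_{\min}\rangle|} \sim \frac{1}{\sqrt{N\ln M}} \to 0 .$$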
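Due to the sharpness of the distribution in the large N limit, the selection condition can be written as:
$$E_n < E_t - \sqrt{2\,V_t \ln M} < E_p ,$$
where $E_t$ and $V_t$ are the mean and variance of the interactions of the candidate TCR sequence $t$ with random peptides, and $E_n < E_p$ are the negative- and positive-selection thresholds.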
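A minimal sketch of evaluating the large-N selection window $E_n < E_t - \sqrt{2V_t\ln M} < E_p$ for one candidate TCR. It assumes an additive energy, so that $E_t$ and $V_t$ are sums of per-site means and variances over random peptide residues. The function names, the thresholds `e_n` < `e_p` and all numerical inputs are hypothetical.

```python
import math
from typing import Sequence


def extreme_value_estimate(site_means: Sequence[float], site_vars: Sequence[float], m: int) -> float:
    """Large-N estimate of the strongest (lowest) interaction energy of one TCR with m peptides.

    Assumes an additive energy: E_t and V_t are sums of per-site means and variances.
    """
    e_t = sum(site_means)
    v_t = sum(site_vars)
    return e_t - math.sqrt(2 * v_t * math.log(m))


def is_selected(site_means: Sequence[float], site_vars: Sequence[float],
                m: int, e_n: float, e_p: float) -> bool:
    """Selection window E_n < E_t - sqrt(2 V_t ln M) < E_p (hypothetical thresholds)."""
    return e_n < extreme_value_estimate(site_means, site_vars, m) < e_p


# Hypothetical TCR: N = 5 sites, each with mean -1.0 and variance 1.0; M = 10,000 peptides.
estimate = extreme_value_estimate([-1.0] * 5, [1.0] * 5, 10_000)
print(round(estimate, 2))  # prints -14.6
```

With thresholds -20 and -10 this TCR passes; raising the negative-selection threshold to -12 rejects it, because its estimated strongest interaction (about -14.6) is then too strong.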
The above condition is reminiscent of the micro-canonical constraints of statistical physics,
and it can be proved rigorously that in the large N limit the probability to select a sequence takes the corresponding canonical (Boltzmann-like) form,
with two parameters that have to be obtained self-consistently from two coupled conditions.
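One way to see where a canonical form with two self-consistent parameters can come from is a maximum-entropy argument. The sketch below is a reconstruction under stated assumptions, not necessarily the proof or notation of the original result: per-site quantities $e(a)$ and $v(a)$ with $E_t = \sum_i e(t_i)$ and $V_t = \sum_i v(t_i)$, background amino-acid frequencies $f_a$, $\ln M = \alpha N$, and the selection window active at one of its edges, $E_* = N\epsilon_*$ (all symbols hypothetical). Maximizing the sequence entropy on the boundary $\bar e - \sqrt{2\alpha\bar v} = \epsilon_*$ gives Boltzmann weights with multipliers $\beta$, $\gamma$, and the stationarity (tangency) condition fixes their ratio.

```latex
% Maximum-entropy sketch; f_a, e(a), v(a), alpha = ln M / N, epsilon_* = E_*/N are assumptions
\begin{aligned}
\rho(a) &= \frac{f_a\, e^{-\beta e(a) - \gamma v(a)}}{\sum_b f_b\, e^{-\beta e(b) - \gamma v(b)}},
  \qquad P(t) \simeq \prod_{i=1}^{N} \rho(t_i), \\
\bar e &= \sum_a \rho(a)\, e(a), \qquad \bar v = \sum_a \rho(a)\, v(a), \\
\bar e - \sqrt{2\alpha\,\bar v} &= \epsilon_*, \qquad
\frac{\gamma}{\beta} = -\sqrt{\frac{\alpha}{2\bar v}} .
\end{aligned}
```

Both conditions depend on $\beta$ and $\gamma$ through $\bar e$ and $\bar v$, which is why they must be solved self-consistently, for instance by intersecting the two curves in the $(\beta, \gamma)$ plane.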
Graphical solution:
How well does this work for finite N? (N = 5 and M = 10,000)
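One way to probe this question numerically is to draw a random interaction matrix and compare the sampled strongest interaction of a single TCR with M = 10,000 random peptides against the large-N estimate $E_t - \sqrt{2V_t\ln M}$. The Gaussian random matrix `J`, the uniform residue frequencies, the function name and the seed are all hypothetical choices for illustration, not the model or data behind the original comparison.

```python
import math
import random


def finite_n_check(n: int = 5, m: int = 10_000, alphabet: int = 20, seed: int = 1):
    """Compare the sampled strongest interaction of one random TCR with m random peptides
    against the large-N estimate E_t - sqrt(2 V_t ln M).

    J is a hypothetical Gaussian random interaction matrix and peptide residues are drawn
    uniformly; neither choice is taken from the source.
    """
    rng = random.Random(seed)
    J = [[rng.gauss(0.0, 1.0) for _ in range(alphabet)] for _ in range(alphabet)]
    tcr = [rng.randrange(alphabet) for _ in range(n)]

    # Per-site mean and variance of J(t_i, a) over a uniformly random residue a.
    site_means = [sum(J[t]) / alphabet for t in tcr]
    site_vars = [sum(x * x for x in J[t]) / alphabet - mu * mu
                 for t, mu in zip(tcr, site_means)]
    e_t, v_t = sum(site_means), sum(site_vars)
    predicted = e_t - math.sqrt(2 * v_t * math.log(m))

    # Brute force: strongest (lowest) energy over m random peptides.
    sampled = min(sum(J[t][rng.randrange(alphabet)] for t in tcr) for _ in range(m))
    return e_t, predicted, sampled


e_t, predicted, sampled = finite_n_check()
print(f"mean {e_t:.2f}, predicted minimum {predicted:.2f}, sampled minimum {sampled:.2f}")
```

With only N = 5 terms the energy is far from Gaussian in its tails and can take at most 20^5 distinct values per TCR, so the two numbers need not agree closely; rerunning with different seeds gives a feel for the size of the finite-N discrepancy.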