The text file provided to solvers contains millions of lines of 8 characters each. This is a sizable data set, and a frequency analysis over the entire file will establish that only some letters of the alphabet are used (ABCDEFGHIKLMNORSTUVY), and that a few letters, such as K or V, seem to appear a little more often than you might expect.
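As a rough illustration of that first pass, the sketch below tallies letters across a list of lines. The function name `letter_frequencies` and the two sample lines are made up for the example; they are not taken from the puzzle file.

```python
from collections import Counter

def letter_frequencies(lines):
    """Tally how often each letter occurs across all lines."""
    counts = Counter()
    for line in lines:
        # Ignore newlines and any other non-letter characters.
        counts.update(ch for ch in line if ch.isalpha())
    return counts

# Hypothetical sample lines, not real puzzle data.
sample = ["BOMDTLYK\n", "IKICRRMO\n"]
freq = letter_frequencies(sample)
print(sorted(freq))          # which letters occur at all
print(freq.most_common(3))   # the most frequent letters
```

On the real file, the same function could be fed an open file handle, since iterating over a file yields its lines one at a time.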
If a solver tracks the appearances of an individual letter of the alphabet over, say, 100 or 200 lines, rather than trying to track all of the appearances of all the letters at once, regularities will begin to emerge. If the solver picks one of the letters that is relatively uncommon in the set and tracks its appearances, the solver will be able to identify a set of periodic reappearances of that letter, and the periodicity with which some instance of that letter appears. This should strongly suggest that there are a number of series of letters repeating on an ongoing basis.
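One way to look for these regularities is to record the line numbers on which a chosen letter appears and count the gaps between successive appearances. The sketch below assumes that each line holds one letter from each cycle; the helper `gap_histogram` and the two-cycle toy data are illustrative assumptions, not the puzzle's actual layout.

```python
from collections import Counter

def gap_histogram(lines, letter):
    """Count the gaps between successive lines that contain `letter`."""
    hits = [i for i, line in enumerate(lines) if letter in line]
    return Counter(b - a for a, b in zip(hits, hits[1:]))

# Toy data (an assumption): two cycles, one letter from each per line.
toy_lines = ["BINGO"[i % 5] + "OKLAHOMA"[i % 8] for i in range(200)]

print(gap_histogram(toy_lines, "B"))  # gaps between lines containing B
print(gap_histogram(toy_lines, "K"))  # gaps between lines containing K
```

A letter that belongs to more than one cycle produces a mixture of gaps, so in practice it may be easier to start with a letter whose gaps are dominated by a single value.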
The following letters appear at more than one identifiable periodicity, or more than once at the same periodicity, as listed below:
- B (5 and 7)
- C (7 (twice) and 11)
- D (7 and 12)
- F (10 and 12)
- G (5 and 10)
- H (8 and 12)
- K (8 and 11)
- N (5 and 10)
- R (Twice in a periodicity of 7)
- S (10 and 11)
- T (7 and 12)
- V (7 and 12)
- Y (7 (twice) and 11)
Multiple letters reappearing at the same periodicity should reinforce the conclusion that letters are repeating on a cyclical basis. The distribution of letters is also consistent with the cyclic patterns forming meaningful words and phrases.
To identify the cyclical patterns at those periodicities, a programmatic or statistical approach is possible. Either a manual word-list search using the known positions of letters for a given period length, for which
_ K _ _ H _ _ _ and
_ _ C K _ Y _ _ _ S _
are good candidates, or a brute-force interval-counting approach should eventually yield the contents of enough cycles (BINGO, OKLAHOMA, LIFE GOES ON, MICKEY MOUSE, METHOD OF LOVE) to suggest that the cycles all contain words or phrases prominently spelled out in songs, and to help disambiguate the remaining cycles (DIVORCE, TROUBLE, YMCA).
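The word-list search can be sketched as a simple pattern match, with underscores standing for unknown letters. The helper name `matches_pattern` and the short candidate list are assumptions for illustration; a real search would run against a much larger list of words and phrases.

```python
import re

def matches_pattern(pattern, candidates):
    """Return candidates whose letters fit a pattern such as '_ K _ _ H _ _ _'."""
    # Drop the spacing in the pattern and let each underscore match any letter.
    regex = re.compile(pattern.replace(" ", "").replace("_", "[A-Z]"))
    fits = []
    for cand in candidates:
        letters = re.sub(r"[^A-Z]", "", cand.upper())  # ignore spaces in phrases
        if regex.fullmatch(letters):
            fits.append(cand)
    return fits

# Hypothetical candidate list.
candidates = ["OKLAHOMA", "MICKEY MOUSE", "LIFE GOES ON", "BINGO"]
print(matches_pattern("_ K _ _ H _ _ _", candidates))
print(matches_pattern("_ _ C K _ Y _ _ _ S _", candidates))
```

Because a cycle has no marked starting point, a solver may also need to try each rotation of the pattern before a match appears.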
Analysis that brings the solver to this point should also have revealed that each cyclic pattern periodically skips a letter; this is most observable with YMCA (which skips every second A, leading to a repeating YMCAYMC sequence with an apparent period of 7), and with BINGO, which skips every fifth G. One letter is periodically skipped in each of the eight cyclical patterns: A is skipped every 2nd time from YMCA, B is skipped every 9th time from TROUBLE, C is skipped every 7th time from MICKEY MOUSE, D is skipped every 23rd time from DIVORCE, E is skipped (once only) every 8th time from METHOD OF LOVE, F is skipped every 5th time from LIFE GOES ON, G is skipped every 5th time from BINGO, and H is skipped every 12th time from OKLAHOMA.
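The effect of a skipped letter on a cycle can be reproduced with a small generator. This is only a sketch: `cycle_with_skip` is a hypothetical helper, and it assumes that every occurrence of the letter is counted toward the skip, which may not match how the puzzle file was generated for phrases where the letter appears more than once.

```python
from itertools import islice

def cycle_with_skip(phrase, letter, every):
    """Yield the letters of `phrase` forever, dropping every `every`-th `letter`."""
    seen = 0
    while True:
        for ch in phrase:
            if ch == letter:
                seen += 1
                if seen % every == 0:
                    continue  # this occurrence is skipped
            yield ch

ymca = "".join(islice(cycle_with_skip("YMCA", "A", 2), 14))
bingo = "".join(islice(cycle_with_skip("BINGO", "G", 5), 24))
print(ymca)   # YMCA with every second A dropped
print(bingo)  # BINGO with every fifth G dropped
```

With every second A dropped, the YMCA output repeats in blocks of seven letters, which is why that cycle appears to have a period of 7.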
Taking these numbers in A-to-H order (2-9-7-23-8-5-5-12) and converting them to letters by alphabet position (A=1, B=2, and so on) spells the answer, BIG WHEEL.
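The final conversion is a plain alphabet-position lookup, which can be checked in a couple of lines. The variable names here are illustrative.

```python
# Skip intervals for the letters A through H, as listed above.
skips = [2, 9, 7, 23, 8, 5, 5, 12]

# Map 1 -> A, 2 -> B, ..., 26 -> Z.
answer = "".join(chr(ord("A") + n - 1) for n in skips)
print(answer)  # BIGWHEEL
```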