398. Random Pick Index
Given an array of integers with possible duplicates, randomly output the index of a given target number. You can assume that the given target number must exist in the array.
The array size can be very large. Solution that uses too much extra space will not pass the judge.
这种随机数的题目很少做。这个题目用到了蓄水池抽样算法(Reservoir Sampling). wikipedia里有详细说明。
For sample size 1:
Suppose we see a sequence of items, one at a time. We want to keep a single item in memory, and we want it to be selected at random from the sequence. If we know the total number of items (n), then the solution is easy: select an index i between 1 and n with equal probability, and keep the i-th element. The problem is that we do not always know n in advance. A possible solution is the following:
- Keep the first item in memory.
- When the i-th item arrives (for i>1):
- with probability 1/i, keep the new item (discard the old one)
- with probability 1-1/i, keep the old item (ignore the new one)
- when there is only one item, it is kept with probability 1;
- when there are 2 items, each of them is kept with probability 1/2;
- when there are 3 items, the third item is kept with probability 1/3, and each of the previous 2 items is also kept with probability (1/2)(1-1/3) = (1/2)(2/3) = 1/3;
- by induction, it is easy to prove that when there are n items, each item is kept with probability 1/n.
For sample k from n: