Binary Search (Part 2)

(1) Standard BS:

在有序（非降序）数组中查找一个target值，数组中元素没有重复，找到target则返回该元素对应的index，找不到则返回-1

注意1：如果是start < end，那么当target等于num[num.length-1]时，会找不到该值。

 注意2：因为num[mid] > target, 所以如果有num[index] == target, index一定小于mid，能不能写成end = mid呢？举例来说：num = {1, 2, 5, 7, 9}; 如果写成end = mid，当循环到start = 0, end = 0时（即num[start] = 1, num[end] = 1时），mid将永远等于0，此时end也将永远等于0，陷入死循环。也就是说寻找target = -2时，程序将死循环。


注意3：因为num[mid] < target, 所以如果有num[index] == target, index一定大于mid，能不能写成start = mid呢？举例来说：num = {1, 2, 5, 7, 9}; 如果写成start = mid，当循环到start = 3, end = 4时（即num[start] = 7, num[end] = 9时），mid将永远等于3，此时start也将永远等于3，陷入死循环。也就是说寻找target = 9时，程序将死循环。

int binarySearchStandard(vector<int> &num, int target){
    int start = 0;
    int end = num.length - 1;
    while(start <= end){ // 注意1
        int mid = start + ((end - start) >> 1);
        if(num[mid] == target)
            return mid;
        else if(num[mid] > target){
            end = mid - 1; //注意2
        }
        else{
            start = mid + 1; //注意3
        }
    }
    return -1;
}


(2) Variation BS 1 (二分查找变形一:)
在有序（非降序）数组中查找一个target值，数组中元素可能有重复，找到target在数组中对应的index的最小值，找不到则返回-1。

注意1：为了对左右相邻的两个数进行对比，循环结束的时候，start和end应该相隔1。

注意2：a). 当num[mid] == target时，此时并不能直接返回mid，因为有可能num[mid-1]或者num[mid-2]…也等于target，所以此时应该将end  = mid，然后继续循环。b). 当num[mid] > target时，如果有num[index] == target，那么index一定小于mid，本来按此道理应该令end = mid – 1，但是为什么此处可以end = mid呢？回头去看上一题（标准版）的注意2，只有当start = end时，才会出现死循环的情况。这里因为while循环的限制条件使得start和end永远不可能相等，所以end = mid也是正确的。由此，我们可将a) 和 b) 两种情况合并。

注意3：这里与上题一致，不用有变化。但是在这里写成start = mid也是成立的，为啥呢？回头看上题的注意3，同样的例子num = {1, 2, 5, 7, 9}; 在上题中，如果写成start = mid，当循环到start = 3, end = 4时（即num[start] = 7, num[end] = 9时），mid将永远等于3，此时start也将永远等于3，陷入死循环。但在此题中由于while中的限制条件的变化，避免了这种死循环的发生。

注意4：此处为什么要分别检验num[start]和num[end]是否等于target呢？因为这两个值都有可能等于target，取决于中间二分时start和end的更新过程。这里随便用几个简单的例子即可验证。如果循环结束时num[start] == num[end] == target，根据题意应该返回start，所以我们先验证start。如果两个值都不是target，则target不存在，返回-1。
public int binarySearchMinimumIndex(int[] num, int target){
    int start = 0;
    int end = num.length - 1;
    while(start + 1 < end){ //注意1
        int mid = start + ((end - start) >> 1);
        if(num[mid] >= target){ //注意2
            end = mid;
        }
        else{
            start = mid + 1; //注意3
        }
    }
    if(num[start] == target) //注意4
        return start;
    else if(num[end] == target)
        return end;
    else
        return -1;
}



(3)  Variation BS 2 (二分查找变形二)

在有序（非降序）数组中查找一个target值，数组中元素可能有重复，找到target在数组中对应的index的最大值，找不到则返回-1。

道理与上一题相同，只要在注意2和注意3处与上题反过来就行了，原理不重复了，拿不准的时候可以用几个简单的test case测试一下。
public int binarySearchMaximumIndex(int[] num, int target){
    int start = 0;
    int end = num.length - 1;
    while(start + 1 < end){ //注意1
        int mid = start + ((end - start) >> 1);
        if(num[mid] <= target){ //注意2
            start = mid;
        }
        else{
            end = mid - 1; //注意3
        }
    }
    if(num[end] == target)
        return end;
    else if(num[start] == target) //注意4
        return start;
    else
        return -1;
}



(4)  Variation BS 3 (二分查找变形三：)
在有序（非降序）数组中查找一个可能的”最大”index，使得num[index] < target，数组中元素可能有重复，找不到则返回-1。

注意1：我们可以对此步分开考虑，当num[mid] == target时，因为我们要求num[index] < target，所以index必须在mid左边。当num[mid] > target时，target肯定在num[mid]的左边，所以index肯定也要在mid的左边。那么再多想一步，假如num[mid] >= target时，我们进行了end = mid的操作，会产生什么后果呢？经我的有限测试，好像并没有什么问题，因为while中的限制条件导致了每当start + 1 == end的时候，循环就结束了，所以标准版二分查找中出现的死循环的情况并不存在。

注意2：当num[mid] < target时，为什么此处用start = mid而不是start = mid + 1？因为有可能num[mid+1] == target，此时如果start = mid + 1, 则num[start] == target，明显错过了正解。
public int binarySearchClosestFromLeft(int[] num, int target){
    int start = 0;
    int end = num.length - 1;
    while(start + 1 < end){
        int mid = start + ((end - start) >> 1);
        if(num[mid] >= target){ //注意1
            end = mid - 1;
        }
        else{ //注意2
            start = mid;
        }
    }
    if(num[end] < target){
        return end;
    }
    else if(num[start] < target){
        return start;
    }
    else{
        return -1;
    }
}




(5)  Variation BS 3 (二分查找变形四：)
在有序（非降序）数组中查找一个可能的”最小”index，使得num[index] > target，数组中元素可能有重复，找不到则返回-1。
道理与上题完全相同，稍作调整即可。

注意1：此处同上题注意1原理相同，写成start = mid也应该没有问题。
public int binarySearchClosestFromRight(int[] num, int target){
    int start = 0;
    int end = num.length - 1;
    while(start + 1 < end){
        int mid = start + ((end - start) >> 1);
        if(num[mid] <= target){
            start = mid + 1; //注意1
        }
        else{
            end = mid;
        }
    }
    if(num[start] > target){
        return start;
    }
    else if(num[end] > target){
        return end;
    }
    else{
        return -1;
    }
}






综上所述，总结规律：
1. 当数组中没有重复元素时：while循环判定条件是(start <= end)，每次start更新为mid + 1，end更新为mid – 1。
2. 当数组中含有重复元素时或者要找的值是target的相邻数时，判定条件是(start + 1 < end)，当num[mid] == target时，并不返回mid，而是根据情况更新start或者end。

My extension:

 (1) Find the first index that nums[index] >= target (lower_bound) with dup.

      int my_lower_bound(vector<char>& letters, char& target) {
        int left = 0, right = letters.size() - 1;   
        while (left < right) {
            int mid = (right + left) / 2;  // lower_median
            if (letters[mid] >= target)
                right = mid;
            else
                left = mid + 1;
        }
        
        return left;
    }

 (2) Find the first index that nums[index] > target (upper_bound) with dup.

  int my_upper_bound(vector<char>& nums, char& target) {
        int left = 0, right = nums.size() - 1;   
        while (left < right) {
            int mid = (right + left + 1) / 2;  // upper median
            if (nums[mid] <= target)
                left = mid;
            else
                right = mid - 1;
        }
        

        return nums[left] <= target ? left + 1 : left;
    }

Index Binary Search (Need Sorting)

349	Intersection of Two Arrays

(1) Brute force. O(N^2) time and O(N) space. Try every pair of number from nums1 and nums2, if they are equal put it in the result set. In the end, transfer the result from set to vector and return it.

(2) Slightly better solution. O(NlgN) time and O(N) space. (2.1) Sort one array. (2.2) For loop another array and try to do BS in the sorted array.

(3) Best algorithm. O(N) time and O(N) space. (3.1) Copy the elements from one array to a set to get rid of duplicates. (3.2) For loop each element in another array, try to erase every element in the set, if erase successfully ( return 1, meaning erase 1 element), then put that element into the result array, otherwise skip it.

300	Longest Increasing Subsequence

Utilize the fact that a smaller number comes after a greater number which still larger than its predecessors in the current LIS wouldn't make the LIS smaller (it could be equal or larger, but can never be smaller).

int lengthOfLIS(vector<int>& nums) {
vector<int> res;
for (int i = 0; i < nums.size(); i++) {
auto iter = lower_bound(res.begin(), res.end(), nums[i]);
if (iter == res.end())
res.push_back(nums[i]);
else
*iter = nums[i];
}

return res.size();

O(NlgN) time and O(N) space.

658	Find K Closest Elements

The idea is to find the first number which is equal to or greater than x in arr. Then, we determine the indices of the start and the end of a subarray in arr, where the subarray is our result. The time complexity is O(logn + k).

In the following code, arr[index] is the first number which is euqal to or geater than x (if all numbers are less than x, index is arr.size()), and the result is arr[i+1, i+2, ... j].

350	Intersection of Two Arrays II

(1) Sort both arrays and call STL a.erase(set_intersection(a.begin(), a.end(), b.begin(), b.end(), a.end()), a.end()). O(NlgN) time and O(1) space

(2) (2.1) Count the frequency of each number for array1 and save the result in a hash map. (2.2) For loop each number in the second array if found in the hash map and its frequency is still larger than 0, put it in the result array and decrease the frequency by 1. (2.3) In the end, return the result array.
O(M+N) time and O(M) space.

167	Two Sum II - Input array is sorted

(1) Two pointers. Assign front pointer and back pointer to 0 and array size minus 1 respectively. Check whether nums[left] + nums[right] == target, if so return the two indices. Otherwise if the sum is > target, move the right pointer left by 1. Else if the sum < target, move the left pointer right by 1. Keep doing this until find the pair that has sum to target. O(N) time and (1) space.

(2) Improve based on (1). When mismatch, instead of moving left pointer right by 1 or right pointer left by 1. We could do binary search to find the next number >= target - nums[right] (when sum < target), or binary search to find the next number that is <= target - nums[left] (when sum > target). Note the when doing the binary search, find the next greater or equal number, you should return the right, find the next smaller or equal number you should return the left.

34	Search for a Range

(1) STL version. O(lgN)
vector<int> searchRange(vector<int>& nums, int target) {
auto range = equal_range(nums.begin(), nums.end(), target); // lg(N)
if (range.first == range.second)
return {-1, -1};
return {range.first - nums.begin(), range.second - nums.begin() - 1};

(2) Implement your own Lower_bound to return int value instead of iterator.

int lower_bound(vector<int>& nums, int target) {
int l = 0, r = nums.size() - 1;
while (l <= r) {
int m = l + (r - l) / 2;
if (nums[m] >= target) // consider equal like greater than to keep l still usable
r = m - 1;
else
l = m + 1;
}

return l;
}

vector<int> searchRange(vector<int>& nums, int target) {
int idx1 = lower_bound(nums, target);
int idx2 = lower_bound(nums, target+1) - 1;
if (idx1 < nums.size() && nums[idx1] == target)
return {idx1, idx2};
else
return {-1, -1};
}

744	Find Smallest Letter Greater Than Target

Similar to 34 Search for a Range. You can call upper_bound(letters.begin(), letters.end(), target) directly. Or implement your own version of upper_bound (either by mid = int((l + r) / 2 + 0.5), or call lower_bound(), and skip the duplicates after lower bound to get the upper bound.

(a) Sort the heaters array.
(b) For each house in the houses array. Find the lower_bound in the heaters array.
(c) If the lower_bound is not the vector end. Calculate distance1 as the distance from the house to the heater at lower bound. If the lower_bound is not the first in the vector. Also calculate the distance2 as the distance from the house to the heater at lower_bound - 1. Get the minimum of the two distance.
(d) get the maximum of all the minimum distance calculated in (b) (c)

O(House number *lg(Heater number))

275	H-Index II

Hard part is what value to return after the while. Think this way, when equal we return N - mid. After while we should return something similar. either N - low or N - high. So far we only see return f(low), use a simple example (like has 1 element) to confirm that.

int hIndex(vector<int>& citations) {
int N = citations.size();
int low = 0, high = N - 1;
while (low <= high) {
int mid = low + (high - low) / 2;
if (citations[mid] == N - mid)
return citations[mid];
else if (citations[mid] > N - mid)
high = mid - 1;
else
low = mid + 1;
}

return N - low;
}

454	4Sum II

(1) For loop array A and array B, save all the sum of a and b into a new array (has N^2 elements) and sort the sum array in ascending order (this sort is optional).

(2) For loop array C and array D, save all the negative sum of c + d (i.e : -(c + d)) into a new array (has N^2 elements) and sort the sum array in ascending order.

(3) For each element in sum array AB, find the equal_range in negative sum array CD and add the number of equal range numbers to the total count.

(4) Return total count in the end.

O(N^2*lgN) time and O(N^2) space.

230	Kth Smallest Element in a BST

(a) Pass a count value to in order DFS traversal and keep increase it until hit the kth element
and save that value in the out parameter and return. O(N) time complexity

(b) CountNodes of left tree and utilize BST property to do binary search, this is still not
O(lgN) since the coutNode function is still O(N)

(c). Build a TreeNodeWithCount data structure for each node in O(N) time then the future
search and insert / delete could be done in O(lgN) time which is suitable for
frequently query/insert/delete in the tree

After the TreeNodeWithCount is built in O(N) time. For future search and insert delete we can
do :
(c.1) If left is not NULL
if (rootWithCount->left->count >= k)
return kthSmallest(rootWithCount->left, k);
else if (rootWithCount->left->count == k - 1)
return rootWithCount->val;
else // rootWithCount->left->count < k -1
return kthSmallest(rootWithCount->right, k - rootWithCount->left->count - 1)
(c.2) Else left is NULL
if (k == 1)
return rootWithCount->val;
else
return kthSmallest(rootWithCount->right, k - 1)

392	Is Subsequence

Indexing + Binary Search, O(T) + O(S * log(T)), space O(T)
Map T to {char : [indices]}
"acbca" ==> pos = {
'a' : [0, 4]
'b' : [2]
'c' : [1, 3]
prev = -1
index = lower_bound(pos[c], prev)
if (index == len(pos[c])) return false
prev = pos[c][index];

Related: Number of Matching Subsequences

436	Find Right Interval

(a) Use a std::map to save each interval start and its index.
(b) For loop each interval in the array. Get its lower_bound iter in the map. If iterate is map.end(), there is no minimum-"right" for that interval. Otherwise, *iter is the minimum right for that interval.

718	Maximum Length of Repeated Subarray

DP:
vector<vector<int> > dp(sizeA + 1, vector<int>(sizeB + 1, 0));

if A[i] == B[j]
dp[i][j] = dp[i + 1][j + 1] + 1;

res = max(res, dp[i][j]);

Binary Search with Rolling Hash

Kind of complicated, need to learn

Range Binary Search (No Need for Sorting)

378	Kth Smallest Element in a Sorted Matrix

(1) Priority queue, since C++ priority queue is implemented as maximum heap, we just need to maintain the heap at most size k, whether we see the heap size exceeds k, pop the top element of the heap (( k+1) th smallest element). In the end return the top element which will be the kth smallest element.

(2) Range Binary search (not Index Binary search). Let left be the matrix[0][0] and right be matrix[n - 1][n - 1], midv = left + (right - left) / 2. For loop each row and count the number that less or equal than midv (using upper_bound return value minus row.begin()). After counting, if count < k, left = midv + 1, else right = midv. In the end return left.

287	Find the Duplicate Number

The range binary search is utilizing the fact that the duplicate number could only occur in the “denser” half of the array (This is only true, we have no missing numbers from 1 to n). So we should set low or high to move towards the denser half. Eventually when low exceeds high we will get the duplicated number.

240	Search a 2D Matrix II

(1) Saddle search from top right corner. While row < M && col >= 0,If current value equal target return it. Else if it is smaller than target row++, otherwise col--.

(2) O(MlgN) time and O(1) space. Worst case search all the rows.
int mid = top + (bottom - top) / 2;
if (target >= matrix[mid].front() && target <= matrix[mid].back())
if (serachVector(matrix[mid], target)) // row binary search
return true;

if (searchMatrix(matrix, target, top, mid - 1))
return true;

if (searchMatrix(matrix, target, mid + 1, bottom))
return true;

return false;

Other

222	Count Complete Tree Nodes

This problem can be solved in O(N) time using DFS (preOrder/InOrder/postOrder) or BFS (level traversal using queue). But they both TLE. Clearly, the problem needs some algorithm more clever which utilize the complete tree property.

Note 1: the height of a complete tree is the left most tree node, we get the the height of a complete tree in O(lgN) time. 


Note 2: for any  given complete tree. At least one of the left and the right sub-tree is full binary tree. If the height of the left tree is equal to the height of the right tree, then the left tree must be full tree. Else if the height of the left is NOT equal (great than actually) to the height of the right tree. Then the right tree more be full tree. 


Note 3:  For a given full tree, its nodes  number is 2^height - 1 (single node is count as height 0).


Therefore we could count the total number nodes of a complete tree in O(lgN) time recursively:

       int lh = completeTreeHeight(root->left);
       int lr = completeTreeHeight(root->right);
        
        if (lh == lr) // left tree must be full tree, and right treee may not
            return (1 << lh) + countNodes(root->right);
        else // right tree must be full tree, and left tree may not
            return (1 << lr) + countNodes(root->left);

HERE

29	Divide Two Integers

The idea is to find the target value m = divisor * 2^n that is still less or equal to dividend. And and n to the res. And subtract divisor * 2^n from dividend. Keep doing this until the remainder (updated dividend) is less than divisor. Then res will be the quotient.

This is O(lgN) time O(1) clever binary search algorithm.

e.g: 100 / 3, dividend = 100, divisor = 3
m = 3 * 2^5 = 96, res += 2^5 = 32
dividend = 100 - 96 = 4, divisor = 4
m = 3 * 2 ^ 1 = 3, res += 1 (i.e: res = 33)
dividend = 4 - 3 = 1 < divisor = 3, end.
return 33.

Search This Blog

leetcode solution