LeetCode 4. Median of Two Sorted Arrays

4. Median of Two Sorted Arrays

Difficulty: hard

There are two sorted arrays nums1 and nums2 of size m and n respectively.

Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

You may assume nums1 and nums2 cannot be both empty.

Example 1:

nums1 = [1, 3]
nums2 = [2]

The median is 2.0

Example 2:

nums1 = [1, 2]
nums2 = [3, 4]

The median is (2 + 3)/2 = 2.5

这题给我的第一反应就是, 两个序列, 从后往前, 一个一个来, 排到中间为止, 把中位数算出来就好了. 代码如下:

1st Version:

class Solution {
public:
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
        auto sz = nums1.size() + nums2.size();
        int pre = numeric_limits<int>::max();
        for (int i = 0; i < (sz + 1) / 2; i++) {
            if (nums2.empty()) {
                pre = nums1.back();
                nums1.pop_back();
                continue;
            }
            if (nums1.empty()) {
                pre = nums2.back();
                nums2.pop_back();
                continue;
            }
            auto& n1 = nums1.back();
            auto& n2 = nums2.back();
            if (n1 < n2) {
                pre = n2;
                nums2.pop_back();
            } else {
                pre = n1;
                nums1.pop_back();
            }
        }
        
        if (sz % 2 == 0) {
            int m;
            if (nums1.empty()) {
                m = nums2.back();
            } else if (nums2.empty()) {
                m = nums1.back();
            } else {
                m  = max(nums1.back(), nums2.back());
            }
            return double(m + pre) / 2;
        } else {
            return double(pre);
        }
    }
};

写的好累赘, 但是我也不想优化逻辑了, 大致道理刚才也说了, 两个序列, 从后往前, 比较两序列的尾部, 把较大的先pop出来, 然后到位中的时候停就行了.

Rate:
Runtime: 24 ms
Memory Usage: 9.5 MB

不过闲来无事, 还是把上面的逻辑优化了一下, 结果…
2nd Version:

class Solution {
public:
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
        auto int_min = numeric_limits<int>::min();
        auto sz = nums1.size() + nums2.size();
        int pre = numeric_limits<int>::max();
        for (int i = 0; i < (sz + 1) / 2; i++) {
            int& n1 = !nums1.empty() ? nums1.back() : int_min;
            int& n2 = !nums2.empty() ? nums2.back() : int_min;
            if (n1 < n2) {
                pre = n2;
                nums2.pop_back();
            } else {
                pre = n1;
                nums1.pop_back();
            }
        }
        
        if (sz % 2 == 0) {
            int m;
            if (nums1.empty()) {
                m = nums2.back();
            } else if (nums2.empty()) {
                m = nums1.back();
            } else {
                m  = max(nums1.back(), nums2.back());
            }
            return double(m + pre) / 2;
        } else {
            return double(pre);
        }
    }
};

Rate:
Runtime: 32 ms, faster than 11.09% of C++ online submissions for Median of Two Sorted Arrays.
Memory Usage: 9.4 MB, less than 100.00% of C++ online submissions for Median of Two Sorted Arrays.

甚至还没有刚才理想…

嘛, 不过这题AC确实是AC了, 但是, 完全没有按照题意哦.

The overall run time complexity should be O(log (m+n)).

原题中要求总时间复杂度得是指数级的, 意思就是这题, 得用二分.

没想出来, 参考了讨论区的一个解法, 非常详细, 解释的也非常容易懂.
https://leetcode.com/problems/median-of-two-sorted-arrays/discuss/345042/C%2B%2B-8ms-99-short-binary-search-solution-w-detailed-explanation

简单说一下他的解法, 因为两个序列都是排序好的, 那么加入把中位数左部分和右部分表示出来的话, 必然是下面这样的: (这里设nums1为a, nums2为b, nums1的size为m, nums2的size为n, m < n, 否则swap):
left:  nums1[0..i], nums2[0..j]
right: nums2[i..m], nums2[j..n]
(注: 两个点表示的range是左闭右开的区间)

很容易想明白确实就是这样的.

接着开始推公式:

  1. 当$(m+n)\%2=0$时
    让它们满足, $i+j=m+n-i-j$
    也就是使两个两边的元素数量相等.
    移项, 化简可以得到$j=(m+n)/2-i$
    考虑到C++整数整除的性质, 上式还可以继续写成:
    $j=(m+n+1)/2-i$
    因为$m+n$为偶数嘛, 比方说$4/2=2$, 那么$(4+1)/2=5/2=2$
  2. 当$(m+n)\%2=1$时
    根据答主的建议, 为了方便之后的编码, 采用左部分比右部分多一个元素的割法, 也就是$i+j=m+n-i-j+1$
    移项, 化简可以得到$j=(m+n+1)/2-i$

所以写到最后, 发现两个的最终的式子其实是一样的, 这就是刚才选择左部分多一个的分割法的原因.

根据原题的序列有序, 理应存在a[i-1] <= a[i]b[j-1] <= b[j]. 当在i处分割时, 不能取得中位数的话, 那么b[j-1] > a[i]a[i-1] > b[j]至少会满足一个. 注意这里是从a[i-1]b[i-1]中取中位数.
简单思考一下, b[j-1] > a[i]的情况, 一定是分割i太靠左了; a[i-1] > b[j]的情况, 一定是分割i太靠右了. 于是我们需要针对这两种情况进行微调.

有了这些前戏的思考之后, 就可以考虑二分查找算法了, 下面摘抄了原答主的总结:

Summary of algorithm:

  1. Pick a cut i, in the middle of lo and hi, where lo and hi are the lowest and maximum candidate cut positions.
  2. If we have j > 0 and i < m and b[j-1] > a[i], then we moved the cut too far left, and so the cut is somewhere in the right half. Thus, we pick new lo = i + 1.
  3. If we have i > 0 and j < n and a[i-1] > b[j], then we moved the cut too far right, so the cut is somewhere in the left half. Thus, we pick hi = i - 1.
  4. If both conditions are passed, then there is no violation, so we have the median position (yay!). Based on our definition of cut i relative to the median, we know that if size of array is even, median is (max of left + max of right)/2. If size of array is odd, then max of left is the median. We can proceed to calculate that, and return. Because there is at least one element in either left side or right side, we know that the median is always valid, (for example, if i == 0, so no a elements in the left, there must be at least 1 b element in the left, so b[j-1] must be defined, so j > 0).

大致解释一下, 出现了刚才没出现的东西, lohi, 这俩是二分搜索的老常客了, 按照LeetCode3的说法, 这两就是夹逼的用俩”筷子”, 最后相等时, 也就能获得我们要的结果了.
接着就是想办法构造一个cut i出来的问题了. 这部分留给下面的代码吧:
3rd Version:

class Solution {
public:
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
        if (nums1.size() > nums2.size())
            swap(nums1, nums2);
        
        int m = nums1.size(), n = nums2.size(),
            mid = (m + n + 1) / 2;
        
        if (m == 0)
            return (m + n) % 2 ? nums2[mid - 1] : (nums2[mid - 1] + nums2[mid]) / 2.;
        
        int lo = 0, hi = m, i, j;
        while (lo <= hi) {
            i = lo + (hi - lo) / 2;
            j = mid - i;  // (m + n + 1) / 2 - i
            if (j > 0 and i < m and nums2[j-1] > nums1[i]) {
                // nums1[i] is too small => i is too small
                lo = i + 1;
            } else if (i > 0 and j < n and nums1[i-1] > nums2[j]) {
                // nums2[j] is too small => j is too small => i is too large
                hi = i - 1;
            } else {
                // left part max (`even' cond's median or `odd' cond's one half of median)
                int lmax;
                if (i == 0)
                    lmax = nums2[j-1];
                else if (j == 0)
                    lmax = nums1[i-1];
                else
                    lmax = max(nums1[i-1], nums2[j-1]);
                
                // right part min (`odd' cond's one half of median)
                int rmin;
                if (i == m)
                    rmin = nums2[j];
                else if (j == n)
                    rmin = nums1[i];
                else
                    rmin = min(nums1[i], nums2[j]);
                
                if ((m + n) % 2) {
                    return lmax;
                } else {
                    return (lmax + rmin) / 2.;
                }
            }
        }
        return 0;
    }
};

Rate:
Runtime: 16 ms, faster than 90.91% of C++ online submissions for Median of Two Sorted Arrays.
Memory Usage: 9.6 MB, less than 92.81% of C++ online submissions for Median of Two Sorted Arrays.

Postscript: 效率有点低哦, 这题搞了俩小时.