[Python] Image Processing – Depthmap 001 – Introduction to Depth Maps and Generating Them

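【Preface】

Perceiving depth is effortless for the human eye, but how do we get a computer to handle the same scene? The answer is a depth map, which marks bright points (foreground) and dark points (background) to convey a sense of space. Below is a brief introduction to depth maps and how to produce them with Python.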

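【Key Points】

Objects closer to the viewer appear brighter (white); objects farther away appear darker (black)
There are two main approaches, epipolar geometry and machine learning, which work from an image pair and from a single image respectively
Feeding the image as BGR or as RGB gives different results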

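【Planned Contents】

1. Introduction to depth maps and their applications
2. Generating depth maps in code
- 2.1 Depth Map from Stereo Image
  - How the method works
  - Running it and the results
- 2.2 Depth Map from Single Image with MiDaS Model
  - Installing PyTorch
  - Running it and the results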

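【Main Content】

1. Introduction to depth maps and their applications

In practice, depth maps are mainly used with 3D models: they record what lies in front of what from a given viewpoint, which allows a more accurate analysis of how objects relate to the scene.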
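The convention that nearer objects are brighter can be sketched in a few lines. This is a minimal illustration, assuming we already have a distance for each pixel; the function name `distance_to_gray` and the sample distances are hypothetical and not part of any library.

```python
import numpy as np

def distance_to_gray(distance):
    """Map distances to 0-255 so the nearest point is white and the farthest is black."""
    distance = np.asarray(distance, dtype=np.float32)
    near, far = distance.min(), distance.max()
    if far == near:
        # A flat scene has no depth contrast; mid-gray is an arbitrary choice here.
        return np.full(distance.shape, 128, dtype=np.uint8)
    # Invert the scale so that small distances become large (bright) values.
    scaled = (far - distance) / (far - near)
    return np.round(scaled * 255).astype(np.uint8)

# Hypothetical distances in metres: 1 m (near), 3 m, 5 m (far).
sample = np.array([[1.0, 3.0, 5.0]])
gray = distance_to_gray(sample)
```

Here the nearest pixel becomes 255 (white) and the farthest becomes 0 (black), matching the depth maps shown in this post.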

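That said, why convert a 2D picture into a depth map? Once the computer understands depth, we can, as the next post will show, build a point cloud to produce a more three-dimensional image, or try using the depth map as a background-removal tool.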
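As a rough illustration of the background-removal idea, the sketch below keeps only the pixels whose depth-map brightness reaches a threshold. The helper names `foreground_mask` and `remove_background`, the threshold of 128, and the tiny sample arrays are all assumptions for demonstration; a real image would need a threshold tuned per scene.

```python
import numpy as np

def foreground_mask(depth_gray, threshold=128):
    """True where the depth map is bright enough to count as foreground."""
    return depth_gray >= threshold

def remove_background(image, depth_gray, threshold=128):
    """Return a copy of `image` with background pixels set to black."""
    mask = foreground_mask(depth_gray, threshold)
    result = image.copy()
    result[~mask] = 0  # the 2D mask applies to every colour channel
    return result

# Hypothetical 2x2 depth map (bright = near) and a uniform gray colour image.
depth_gray = np.array([[250, 40], [200, 10]], dtype=np.uint8)
image = np.full((2, 2, 3), 90, dtype=np.uint8)
cutout = remove_background(image, depth_gray)
```

Only the left column, where the depth map is bright, keeps its original colour.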

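2. Generating depth maps in code

Next, we will use epipolar geometry and a machine learning model, which generate depth maps from an image pair and from a single image respectively.

2.1 Depth Map from Stereo Image [2]

We perceive depth thanks to the parallax between our left and right eyes. Stereo imaging works the same way, and it is the method we use next. First, pick a subject and take a photo (left), then take one step to the right and take another (right). If taking photos is inconvenient, the sample images provided by [3] can be used instead.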
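To show how parallax turns into a number, here is a naive block-matching sketch in plain NumPy. It is not how OpenCV implements StereoBM internally; it only illustrates the idea of sliding a small window along the same row of the other image and keeping the shift with the smallest difference. The function name `block_match_at` and the synthetic image pair are assumptions made for this example.

```python
import numpy as np

def block_match_at(left, right, y, x, block=5, max_disp=16):
    """Estimate the disparity at (y, x) by minimising the sum of absolute differences."""
    h = block // 2
    ref = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.float32)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp):
        if x - d - h < 0:
            break  # the candidate window would leave the image
        cand = right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(np.float32)
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic pair: the right view is the left view shifted 6 pixels to the left.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(40, 80), dtype=np.uint8)
true_shift = 6
right = np.roll(left, -true_shift, axis=1)
estimated = block_match_at(left, right, y=20, x=40)
```

On this synthetic pair the estimated disparity equals the shift that was applied. In a real photo, nearby objects shift more between the two views than distant ones, which is what makes the disparity usable as depth.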

Before running the code below, remember to install the dependencies with pip install opencv-python matplotlib

Python

import cv2
from matplotlib import pyplot as plt

# Load the left and right images in grayscale
imgL = cv2.imread('bike_l.png', cv2.IMREAD_GRAYSCALE)
imgR = cv2.imread('bike_r.png', cv2.IMREAD_GRAYSCALE)
# cv2.imread returns None instead of raising when a file cannot be read
if imgL is None or imgR is None:
    raise FileNotFoundError('Could not read bike_l.png or bike_r.png')

# Create the Block Matching object (numDisparities=0 is the API's default argument)
stereo = cv2.StereoBM.create(numDisparities=0, blockSize=9)
# Compute the disparity map from the spatial differences between the two views
disparity = stereo.compute(imgL, imgR)

# Display the result with a gray colourmap
plt.imshow(disparity, 'gray')
plt.axis('off')
plt.show()

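A quick explanation: the BM in StereoBM stands for Block Matching. numDisparities sets the disparity search range and must be divisible by 16, while blockSize is the odd-sized window being compared: a smaller blockSize gives a more detailed but noisier map, and a larger one gives a smoother but less accurate map. Because the method works by comparing parallax, using the same image for both left and right produces an all-black result (meaning no spatial difference). OpenCV also offers StereoSGBM, a more refined algorithm for the same task.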

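This method has one more problem: regions that appear in only one of the two images (cut off on one side, extra on the other) cannot be matched, so a black band may show up along the left or right edge of the result.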
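Those unmatched pixels can be handled explicitly. StereoBM returns disparities as 16-bit fixed-point values scaled by 16, and pixels it could not match come back negative. The sketch below converts such output to depth with Z = f * B / d (the relation given in [2]) while masking the unmatched pixels. The function name `disparity_to_depth`, the focal length, and the baseline are made-up values for illustration, since the real ones depend on your camera and on how far you stepped sideways.

```python
import numpy as np

def disparity_to_depth(disparity_fixed, focal_px, baseline_m):
    """Convert fixed-point disparities (scaled by 16) to depth; unmatched pixels become NaN."""
    disparity = disparity_fixed.astype(np.float32) / 16.0
    depth = np.full(disparity.shape, np.nan, dtype=np.float32)
    valid = disparity > 0  # zero or negative disparity carries no usable depth
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical raw output: disparities of 10 px, 20 px, and one unmatched pixel.
raw = np.array([[160, 320, -16]], dtype=np.int16)
depth = disparity_to_depth(raw, focal_px=700.0, baseline_m=0.1)
```

The larger disparity maps to the smaller depth, which is why nearer objects end up brighter when the disparity itself is displayed.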

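2.2 Depth Map from Single Image with MiDaS Model [4]

For paintings or photographs, however, we usually have only one image, so there is no way to recover the 3D space by matching views with epipolar geometry. In that case we have to rely on a machine learning model. I use MiDaS v2 from PyTorch Hub [4] (the official GitHub [5] had already released v3.1 on Dec 24, 2022). Install PyTorch according to your hardware and the GPU version it supports, then install timm, a package MiDaS depends on, with pip install timm.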

Python

import cv2
import torch
from matplotlib import pyplot as plt

# Load the BGR image
image = cv2.imread('demo.png')
# Turn it into RGB
imageRGB = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Load the MiDaS Model and decide if use GPU or CPU
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
device = torch.device(
    "cuda") if torch.cuda.is_available() else torch.device("cpu")
midas.to(device)
midas.eval()

# Transform the image for input to MiDaS
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform if hasattr(
    midas_transforms, 'dpt_transform') else midas_transforms.default_transform
input_batch = transform(imageRGB).to(device)

# Predict depth map
with torch.no_grad():
    prediction = midas(input_batch)
    prediction = torch.nn.functional.interpolate(prediction.unsqueeze(
        1), size=imageRGB.shape[:2], mode="bicubic", align_corners=False).squeeze()

# Store the result
depth_map = prediction.cpu().numpy()

# Show the grayscale of depthmap
plt.imshow(depth_map, 'gray')
plt.axis('off')
plt.show()

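OpenCV loads images in BGR channel order by default. Swapping red and blue shifts the colour warmth the model sees, which affects the final result, so we first convert to RGB. Then, following the usage example on the PyTorch page, we resize the prediction into a depth map with the same width and height as the original image and display it in grayscale with pyplot.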
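To see what the conversion does, the NumPy-only sketch below reverses the channel axis, which for a 3-channel 8-bit image has the same effect as cv2.cvtColor(image, cv2.COLOR_BGR2RGB). The single-pixel image is a made-up example.

```python
import numpy as np

# One pure-blue pixel as OpenCV would load it: channels ordered B, G, R.
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis swaps the blue and red channels, giving R, G, B order.
rgb = bgr[..., ::-1].copy()
```

If the BGR array were passed to the model unchanged, this blue pixel would be read as red, which is the kind of colour shift that changes the prediction.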

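【Afterword】

That covers the basics of depth maps and two simple ways to generate them. In the follow-up posts, we will use the depth maps produced by MiDaS to build a point cloud 3D model and to try removing backgrounds.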

【References】

[1] Pelagotti, Anna & Uccheddu, Francesca & Picchioni, F. (2011). Automated texture registration on 3D models. Proceedings of SPIE – The International Society for Optical Engineering. 8180. 6-. doi:10.1117/12.898605.

[2] Depth Map from Stereo Images, Official Documentation for v3.4.20.
https://docs.opencv.org/3.4/dd/d53/tutorial_py_depthmap.html

[3] ArtyZiff35. ‘3D_Reconstruction_From_Stereo_Images,’ Dec 31, 2018, GitHub.
https://github.com/ArtyZiff35/3D_Reconstruction_From_Stereo_Images/tree/master/sampleRectified

[4] ‘MiDaS’, PyTorch.
https://pytorch.org/hub/intelisl_midas_v2/

[5] ‘MiDaS’, GitHub.
https://github.com/isl-org/MiDaS