G-Drive 智慧工坊 – 檔案偵測 002 – 向下探索與基礎分類

0. 前言


我們上次完成母資料夾內的掃瞄,然而為了區分畫風與標註作家,我們將預期有許多的子資料夾。我們也不可能手動新增 Google Spreadsheet 在各個子料夾,且若如此我們也難統一管理。

因此這次的目標就是透過迴圈與自我呼叫,掃描所有的子資料夾,並透過資料夾名稱或檔名進行分類。

1. 重點


透過迴圈搜尋檔案與子資料夾,再度呼叫方法在子資料夾向下迴圈掃描
善用 regex 或 split 進行文字處理

2. 內容


2.1. 確立分類依據

由於我目前的計畫,是針對與朋友共享的繪畫素材庫,那我們範例中的依據就是畫風。以下我先建立好幾種常見的畫風分類作為判斷依據,而圖的部分當然還是得之後人工上傳。

那由於我所下載的繪畫素材,有經過一定的命名處理為 {作者} – {作品} – {相簿順序} – {作品編號},由於是利用 – 分開屬性,並有兩個空格包住而非單個特殊符號,因此我夠避開例外並透過切割標註所有的作者名稱。

另外我們前面說過 G-Drive 的 Lazy-Loading 會讓找圖片不太方便,正巧多數畫家發布的作品都有自己的獨特風格,因此又另外建立如 @ {作家名稱} 的資料夾。由於我們給予 @ 標記,我們之後再分類作品風格時能依靠 regex 向上翻找直到沒有 @ 符號為止。

2.2. 建立迴圈掃瞄所有子資料夾

確立好索引後我們就可以開工了。那向下掃描的方法,其實也沒有各位想得困難,我們在母子料夾中除了掃描檔案外,再加上掃描所有子資料夾的步驟,並在過程中呼叫方法自身持續下去掃描子資料夾即可。

那我們取代舊有的方法,並稍微解釋下。

在最開始從 Google Spreadsheet 所在的母資料夾開始迴圈掃描,先檢查資料夾是否有新的檔案,若無則檢查是否有子資料夾。當有子資料夾,他們的 folderId 會被推到 stack,也就是會優先比他母資料夾同輩先進行掃描,直到整個分支掃完後才換到其他風個的資料夾。

JavaScript
// Search Files Newly Created
// 
// - Loop through stack to see if any folder need to be scan
// -- Comparing Files CreatedDate and Latestest File Created Time on last execution to see if is new
// -- Look into the subfolder
// ---- Push subfolder to stack to continue while-loop
// - Resort full fileList by file created date
// - Write attributes to spreadsheet
// - Reset property "lastSearchFilesCreatedTime" to latest
function searchFilesCreated() {
  // Getting the spreadsheet for data writing
  const sheetFilesDetection = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(PropertiesService.getScriptProperties().getProperty('sheetFilesDetection_sheetName'));

  // Latest file created time on last execution
  const lastSearchFilesCreatedTime = PropertiesService.getScriptProperties().getProperty('lastSearchFilesCreatedTime');

  // Init first folder path
  let folderStack = [DriveApp.getFileById(SpreadsheetApp.getActiveSpreadsheet().getId()).getParents().next().getId()];

  // List for search result
  let fileList = [];

  while (folderStack.length > 0) {
    // // Debugging
    // console.log(folderStack);

    // Choosing folder to run process
    let folderId = folderStack.pop();
    // Query
    let query = `createdTime > "${lastSearchFilesCreatedTime}" and trashed=false and mimeType != "application/vnd.google-apps.folder" and "${folderId}" in parents`;

    // Checkpoint
    let pageToken = "";

    do {
      try {
        const url = encodeURI(
          `https://www.googleapis.com/drive/v3/files?q=${query}&pageSize=1000&pageToken=${pageToken}`
        ); // Include driveId and includeItemsFromAllDrives parameters
        const res = UrlFetchApp.fetch(url, {
          headers: { authorization: "Bearer " + ScriptApp.getOAuthToken() },
        });
        const obj = JSON.parse(res.getContentText());
        if (obj.files && obj.files.length > 0) {
          fileList = [...fileList, ...obj.files];
        }
        pageToken = obj.nextPageToken;
      } catch (error) {
        console.error(`[ERROR] Error fetching files: ${error}`);
        break;
      }
    } while (pageToken);

    // Find all subfolders
    const folders = DriveApp.getFolderById(folderId).getFolders();
    // Add subfolders to stack
    while (folders.hasNext()) {
      folderStack.push(folders.next().getId());
    }
  }

  // Sorting by createdDate
  fileList.sort((a, b) => new Date(DriveApp.getFileById(a.id).getDateCreated()) - new Date(DriveApp.getFileById(b.id).getDateCreated()));

  // Debugging
  console.log(fileList);

  // Write attributes to spreadsheet on sequence
  fileList.forEach(json => {
    // Use ID to get Object of File Class
    file = DriveApp.getFileById(json.id);
    // Write to spreadsheet
    writeToSpreadsheet(sheetFilesDetection, file);
  });

  // There's new file
  if (fileList.length > 0) {
    // Store time of lastest file created time
    PropertiesService.getScriptProperties().setProperty('lastSearchFilesCreatedTime', DriveApp.getFileById((fileList[fileList.length - 1].id)).getDateCreated().toISOString());
  }
}

那接著,我將幾張新下載的圖片素材放入兩個子資料夾之中。再來就透過手動測試上述的 searchFilesCreated 方法,應該會最終只會看到的整個陣列結果。

2.3. 修改資料回傳模板

在上一步的操作中,我們還是維持上次的資料型態,那接著我們就需要新增畫風與作家的分類了。這邊就需要各位依據需求自行改動了。

那首先我要透過資料夾的名稱作為繪畫風格的依據,可以看到我多寫一個 getArtStyle() 用來向上搜尋,直到母資料夾名字中不包含 @ 標籤。並利用處理過的檔名分割出作品名稱與作家。我也有稍微修改以前的變數名稱增加對我自己的識別度。

JavaScript
// Write attributes back to spreadsheet
// 
// @ Artstyle
// @ Author Name
// @ File Name
// @ File URL
// @ Folder URL
// @ File Created Date
// @ File Modified Date 
function writeToSpreadsheet(sheet, file) {
  // Get attributes
  // -- File Name
  const parts = file.getName().split(' - ');
  // ---- Author Name
  const authorName = parts[0];
  // ---- Work Name
  const workName = parts[1];
  // -- File URL
  const workURL = file.getUrl();
  // -- Folder URL
  const folderURL = file.getParents().next().getUrl();
  // -- File Created Date
  const formattedDateCreated = Utilities.formatDate(file.getDateCreated(), Session.getScriptTimeZone(), "yyyy-MM-dd HH:mm:ss");
  // -- File Modified Date
  const formattedDateModified = Utilities.formatDate(file.getLastUpdated(), Session.getScriptTimeZone(), "yyyy-MM-dd HH:mm:ss");

  // Get the header row
  const columnHeaders = sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues()[0];

  // Index
  const indexWorkCategory = columnHeaders.indexOf('繪畫風格');
  const indexWorkAuthor = columnHeaders.indexOf('作家名稱');
  const indexDateCreated = columnHeaders.indexOf('建檔時間');
  const indexDateModified = columnHeaders.indexOf('更新時間');
  const indexFolderURL = columnHeaders.indexOf('母資料夾');
  const indexWorkName = columnHeaders.indexOf('作品名稱');
  const indexWorkURL = columnHeaders.indexOf('檔案位置');

  // Prepare the data array for the new row
  let row = new Array(columnHeaders.length).fill('');

  row[indexWorkCategory] = getArtStyle(file.getParents().next()); 
  row[indexWorkAuthor] = authorName || "Unknown";
  row[indexDateCreated] = formattedDateCreated;
  row[indexDateModified] = formattedDateModified;
  row[indexFolderURL] = folderURL;
  row[indexWorkName] = workName || "Undefined";
  row[indexWorkURL] = workURL;

  // Write to last row in sheet
  sheet.appendRow(row);
}

// Get artstyle based on folder name
// 
// - Look up parent folder until no more @ in folder name
function getArtStyle(folder) {
    // With @ -> Artist portfolio
    if (folder.getName().indexOf('@') !== -1) {
        // Find it's parent 
        do {
            folder = folder.getParents().next();
        } while (folder.getName().indexOf('@') !== -1);
    }
    // Art style
    return folder.getName();
}

我們刪除步驟二與之前寫入的試算表資料,並撥前 Project Properties 的時間紀錄,再度重新測試。接著檢查試算表,應該會依照建立時間排序,並能看到確實有依照新的模板與分類進行。

3. 後話


到這邊我們就成功完成迴圈掃描所有新增的底層資料並進行分配了!但我們可以想像可能會有搬遷予刪除的情況,我們也該因應而修改或刪除紀錄。

那下一回我們就要依靠類似的架構,新增查詢修改與刪除的方法。

4. 參考


[1] Sorting Date format
https://stackoverflow.com/questions/10123953/how-to-sort-an-object-array-by-date-property

[2] Drive API — Official Doc
https://developers.google.com/drive/api/reference/rest/v3/files/list

5. 素材


[1] QuAn_ — pixiv
https://www.pixiv.net/users/6657532

[2] P.art — pixiv
https://www.pixiv.net/en/users/35929537

6. 成品


JavaScript
// Find parent folder by where the spreadsheet is stored
function getParentFolder() {
  // Get the active spreadsheet
  var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  // Get the ID of the spreadsheet
  var spreadsheetId = spreadsheet.getId();

  // Get the parent folders of the spreadsheet
  var parentFolders = DriveApp.getFileById(spreadsheetId).getParents();

  // Check if there are any parent folders
  if (parentFolders.hasNext()) {
    // Assuming we only have a parent folder
    var parentFolder = parentFolders.next();

    // For testing purpose
    console.log(`Parent Folder: ${parentFolder.getUrl()}`);

    // Return the folder object
    return parentFolder;
  }
  // If there are no parent folders, return null 
  else {
    console.error("[ERROR] Critical Error on Getting Parent URL");

    return null;
  }
}

// Init setting in project propeties
function init() {
  // Sheet Name
  const sheetFilesDetection_sheetName = "總表";

  try {
    PropertiesService.getScriptProperties().setProperties({
      'sheetFilesDetection_sheetName': sheetFilesDetection_sheetName,
      // Init lastSearchFilesCreatedTime
      'lastSearchFilesCreatedTime': new Date().toISOString()
    });
    console.info("[ O K ] Successfully initalize the app.");
  } catch (error) { console.error(`[ERROR] Failed to initialize the app: ${error}`); }
}

// Search Files Newly Created
// 
// - Loop through stack to see if any folder need to be scan
// -- Comparing Files CreatedDate and Latestest File Created Time on last execution to see if is new
// -- Look into the subfolder
// ---- Push subfolder to stack to continue while-loop
// - Resort full fileList by file created date
// - Write attributes to spreadsheet
// - Reset property "lastSearchFilesCreatedTime" to latest
function searchFilesCreated() {
  // Getting the spreadsheet for data writing
  const sheetFilesDetection = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(PropertiesService.getScriptProperties().getProperty('sheetFilesDetection_sheetName'));

  // Latest file created time on last execution
  const lastSearchFilesCreatedTime = PropertiesService.getScriptProperties().getProperty('lastSearchFilesCreatedTime');

  // Init first folder path
  let folderStack = [DriveApp.getFileById(SpreadsheetApp.getActiveSpreadsheet().getId()).getParents().next().getId()];

  // List for search result
  let fileList = [];

  while (folderStack.length > 0) {
    // // Debugging
    // console.log(folderStack);

    // Choosing folder to run process
    let folderId = folderStack.pop();
    // Query
    let query = `createdTime > "${lastSearchFilesCreatedTime}" and trashed=false and mimeType != "application/vnd.google-apps.folder" and "${folderId}" in parents`;

    // Checkpoint
    let pageToken = "";

    do {
      try {
        const url = encodeURI(
          `https://www.googleapis.com/drive/v3/files?q=${query}&pageSize=1000&pageToken=${pageToken}`
        ); // Include driveId and includeItemsFromAllDrives parameters
        const res = UrlFetchApp.fetch(url, {
          headers: { authorization: "Bearer " + ScriptApp.getOAuthToken() },
        });
        const obj = JSON.parse(res.getContentText());
        if (obj.files && obj.files.length > 0) {
          fileList = [...fileList, ...obj.files];
        }
        pageToken = obj.nextPageToken;
      } catch (error) {
        console.error(`[ERROR] Error fetching files: ${error}`);
        break;
      }
    } while (pageToken);

    // Find all subfolders
    const folders = DriveApp.getFolderById(folderId).getFolders();
    // Add subfolders to stack
    while (folders.hasNext()) {
      folderStack.push(folders.next().getId());
    }
  }

  // Sorting by createdDate
  fileList.sort((a, b) => new Date(DriveApp.getFileById(a.id).getDateCreated()) - new Date(DriveApp.getFileById(b.id).getDateCreated()));

  // Debugging
  console.log(fileList);

  // Write attributes to spreadsheet on sequence
  fileList.forEach(json => {
    // Use ID to get Object of File Class
    file = DriveApp.getFileById(json.id);
    // Write to spreadsheet
    writeToSpreadsheet(sheetFilesDetection, file);
  });

  // There's new file
  if (fileList.length > 0) {
    // Store time of lastest file created time
    PropertiesService.getScriptProperties().setProperty('lastSearchFilesCreatedTime', DriveApp.getFileById((fileList[fileList.length - 1].id)).getDateCreated().toISOString());
  }
}

// Write attributes back to spreadsheet
// 
// @ Artstyle
// @ Author Name
// @ File Name
// @ File URL
// @ Folder URL
// @ File Created Date
// @ File Modified Date 
function writeToSpreadsheet(sheet, file) {
  // Get attributes
  // -- File Name
  const parts = file.getName().split(' - ');
  // ---- Author Name
  const authorName = parts[0];
  // ---- Work Name
  const workName = parts[1];
  // -- File URL
  const workURL = file.getUrl();
  // -- Folder URL
  const folderURL = file.getParents().next().getUrl();
  // -- File Created Date
  const formattedDateCreated = Utilities.formatDate(file.getDateCreated(), Session.getScriptTimeZone(), "yyyy-MM-dd HH:mm:ss");
  // -- File Modified Date
  const formattedDateModified = Utilities.formatDate(file.getLastUpdated(), Session.getScriptTimeZone(), "yyyy-MM-dd HH:mm:ss");

  // Get the header row
  const columnHeaders = sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues()[0];

  // Index
  const indexWorkCategory = columnHeaders.indexOf('繪畫風格');
  const indexWorkAuthor = columnHeaders.indexOf('作家名稱');
  const indexDateCreated = columnHeaders.indexOf('建檔時間');
  const indexDateModified = columnHeaders.indexOf('更新時間');
  const indexFolderURL = columnHeaders.indexOf('母資料夾');
  const indexWorkName = columnHeaders.indexOf('作品名稱');
  const indexWorkURL = columnHeaders.indexOf('檔案位置');

  // Prepare the data array for the new row
  let row = new Array(columnHeaders.length).fill('');

  row[indexWorkCategory] = getArtStyle(file.getParents().next());
  row[indexWorkAuthor] = authorName || "Unknown";
  row[indexDateCreated] = formattedDateCreated;
  row[indexDateModified] = formattedDateModified;
  row[indexFolderURL] = folderURL;
  row[indexWorkName] = workName || "Undefined";
  row[indexWorkURL] = workURL;

  // Write to last row in sheet
  sheet.appendRow(row);
}

// Get artstyle based on folder name
// 
// - Look up parent folder until no more @ in folder name
function getArtStyle(folder) {
  // With @ -> Artist portfolio
  if (folder.getName().indexOf('@') !== -1) {
    // Find it's parent 
    do {
      folder = folder.getParents().next();
    } while (folder.getName().indexOf('@') !== -1);
  }
  // Art style
  return folder.getName();
}

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.