Pro Git 7.4: An Example Git-Enforced Policy

From DILA Wiki

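In this section, we apply what we have learned so far to build a Git workflow that checks commit messages for a required format, accepts only fast-forward pushes, and lets each user modify only certain subdirectories of the project. We will write client-side scripts that tell developers whether their push is going to be rejected, and server-side scripts that actually enforce the policies.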

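These scripts are written in Ruby, partly because it is the author's preferred scripting language and partly because the author finds it the most pseudocode-like of the scripting languages, so you should be able to follow the code roughly even if you don't use Ruby. Any other language works just as well. All the sample hook scripts distributed with Git are written in Perl or Bash, so you can find plenty of hook examples in those two languages by looking at the samples.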

Server-Side Hook

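All the server-side work goes into the update script in the hooks directory. The update script runs once for each branch being pushed and takes three arguments: the reference being pushed to, the old revision that the branch pointed to, and the new revision being pushed. If the push is made over SSH, you also have access to the user who is pushing. If you have set things up so that everyone connects as a single account (such as "git") through public-key authentication, you need a shell wrapper that works out which user is connecting from the public key and sets an environment variable identifying that user. Here we assume the connecting user is stored in the $USER environment variable, so the update script begins by gathering all the information it needs: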

#!/usr/bin/env ruby

$refname = ARGV[0]
$oldrev  = ARGV[1]
$newrev  = ARGV[2]
$user    = ENV['USER']

puts "Enforcing Policies... \n(#{$refname}) (#{$oldrev[0,6]}) (#{$newrev[0,6]})"

Yes, I'm using global variables. Don't judge me; it makes the demonstration easier to follow.
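One case the variables above do not distinguish: Git passes an all-zero object name as the old value when a ref is being created, and as the new value when a ref is being deleted. The following is a minimal sketch of how a hook could tell these cases apart before running any checks. The name classify_update is a hypothetical helper, not part of the original script.

```ruby
# Hypothetical helper, not part of the original update script.
# Assumes Git passes an all-zero object name for a ref that does not
# exist yet (creation) or that is being removed (deletion).
def classify_update(oldrev, newrev)
  zero = /\A0+\z/
  return :create if oldrev =~ zero
  return :delete if newrev =~ zero
  :update
end

puts classify_update('0' * 40, 'a' * 40)  # prints "create"
```

For a newly created branch, a range such as #{$oldrev}..#{$newrev} has no real commit on its left side, so the checks in the rest of this section would probably need a different range in that case.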

Enforcing a Specific Commit-Message Format

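Our first task is to require that every commit message follow a particular format. For the sake of demonstration, assume that each message has to include a string that looks like “ref: 1234”, because we want every commit to link to a work item in the project's issue-tracking system. (The regular expression used below expects the marker in square brackets, as in “[ref: 1234]”.) We have to look at each commit being pushed, see whether its message contains that string, and, if the string is missing from any commit, exit with a nonzero status so that the push is rejected.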

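You can get a list of the SHA-1 values of all the commits being pushed by passing the $newrev and $oldrev values to a Git plumbing command called git rev-list. It is basically the git log command, but by default it prints only the SHA-1 values and nothing else. So, to list all the commit SHAs introduced between one commit and another, you can run something like this: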

$ git rev-list 538c33..d14fc7
d14fc7c847ab946ec39590d87783c69b031bdfb7
9f585da4401b0a3999e84113824d15245c13f0be
234071a1be950e2a8d078e6141f5cd20c1e61ad3
dfa04c9ef3d5197182f13fb5b9b1fb7717d2222a
17716ec0f1ff5c77eff40b7fe912f9f6cfd0e475

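You can take that output, loop through each of the commit SHAs, grab the corresponding commit message, and test the message against a regular expression that looks for the required pattern.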

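Next, you have to figure out how to extract the commit message from each of these commits. Another plumbing command, git cat-file, gives you the raw commit data. We will go over these plumbing commands in detail in Chapter 9; for now, here is what the command prints: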

$ git cat-file commit ca82a6
tree cfda3bf379e4f8dba8717dee55aab78aef7f4daf
parent 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
author Scott Chacon <schacon@gmail.com> 1205815931 -0700
committer Scott Chacon <schacon@gmail.com> 1240030591 -0700

changed the version number

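A simple way to get the commit message from a commit when you have its SHA-1 value is to find the first blank line and take everything after it. You can do that with the sed command on Unix systems: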

$ git cat-file commit ca82a6 | sed '1,/^$/d'
changed the version number
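The same "everything after the first blank line" rule can also be expressed directly in Ruby, which avoids shelling out to sed. This is only a sketch under that assumption about the raw commit layout; extract_message is a hypothetical name, and the sample data is the git cat-file output shown above.

```ruby
# Sketch: split raw `git cat-file commit` output into headers and message.
# Everything after the first blank line is treated as the commit message.
def extract_message(raw_commit)
  _headers, message = raw_commit.split("\n\n", 2)
  message.to_s
end

raw = <<~COMMIT
  tree cfda3bf379e4f8dba8717dee55aab78aef7f4daf
  parent 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
  author Scott Chacon <schacon@gmail.com> 1205815931 -0700
  committer Scott Chacon <schacon@gmail.com> 1240030591 -0700

  changed the version number
COMMIT

puts extract_message(raw)  # prints "changed the version number"
```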

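You can use that sed incantation to grab the commit message from each commit being pushed, and exit if any message doesn't match. To exit the script and reject the push, exit with a nonzero status. The whole method looks like this: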

$regex = /\[ref: (\d+)\]/

# enforce the custom commit message format
def check_message_format
  missed_revs = `git rev-list #{$oldrev}..#{$newrev}`.split("\n")
  missed_revs.each do |rev|
    message = `git cat-file commit #{rev} | sed '1,/^$/d'`
    if !$regex.match(message)
      puts "[POLICY] Your message is not formatted correctly"
      exit 1
    end
  end
end
check_message_format

Putting that in your update script rejects any push that contains a commit whose message does not follow the rule.
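The pattern above contains a capture group, (\d+), that the hook never uses. As a small illustration of what the pattern accepts and rejects, here is a sketch that returns the captured issue number. The names REF_PATTERN and issue_number are hypothetical and do not appear in the original script.

```ruby
REF_PATTERN = /\[ref: (\d+)\]/  # same pattern as $regex in the hook

# Hypothetical helper: returns the referenced issue number as a string,
# or nil when the message does not contain a well-formed marker.
def issue_number(message)
  match = REF_PATTERN.match(message)
  match && match[1]
end

p issue_number('test [ref: 132]')  # prints "132"
p issue_number('test ref: 132')    # prints nil (no square brackets)
```

Note that a message with the marker but without the square brackets does not match, so it would be rejected by the hook as written.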

Enforcing a User-Based ACL System

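Suppose you want to add a mechanism that uses an access control list (ACL) to specify which users may push changes to which parts of the project. Some users have full access, while others may push changes only to certain subdirectories or specific files. To enforce this, you write the rules to a file named acl that lives in the bare Git repository on the server. The update hook reads those rules, looks at which files are modified by all the commits being pushed, and determines whether the user doing the push has access to every one of those files.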

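The first thing to do is write the ACL. The format used here is very much like the CVS ACL mechanism: it is a series of lines in which the first field is avail or unavail, the next field is a comma-delimited list of the users the rule applies to, and the last field is the path the rule applies to (blank meaning open access). The fields are delimited by a pipe (|) character.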

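In this example, there are a few administrators, some documentation writers with access to the doc directory, and one developer with access to the lib and tests directories. The corresponding ACL file looks like this: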

avail|nickh,pjhyett,defunkt,tpw
avail|usinclair,cdickens,ebronte|doc
avail|schacon|lib
avail|schacon|tests

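You begin by reading this data into a structure you can use. To keep the example simple, only the avail directives are enforced (translator's note: the unavail rules are ignored). The following method returns an associative array in which the key is the user name and the value is an array of the paths that user may write to: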

def get_acl_access_data(acl_file)
  # read in ACL data
  acl_file = File.read(acl_file).split("\n").reject { |line| line == '' }
  access = {}
  acl_file.each do |line|
    avail, users, path = line.split('|')
    next unless avail == 'avail'
    users.split(',').each do |user|
      access[user] ||= []
      access[user] << path
    end
  end
  access
end

For the ACL file shown earlier, this get_acl_access_data method returns a data structure that looks like this:

{"defunkt"=>[nil],
 "tpw"=>[nil],
 "nickh"=>[nil],
 "pjhyett"=>[nil],
 "schacon"=>["lib", "tests"],
 "cdickens"=>["doc"],
 "usinclair"=>["doc"],
 "ebronte"=>["doc"]}

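Now that the permissions are sorted out, you need to determine which paths the pushed commits have modified, so you can make sure the user who is pushing has access to all of them.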

You can easily see which files a single commit modified by using the --name-only option of the git log command (mentioned briefly in Chapter 2):

$ git log -1 --name-only --pretty=format:'' 9f585d

README
lib/test.rb

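By checking the ACL structure returned by get_acl_access_data against the files listed in each commit, you can determine whether the user has permission to push all of their commits: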

# only allows certain users to modify certain subdirectories in a project
def check_directory_perms
  access = get_acl_access_data('acl')

  # see if anyone is trying to push something they can't
  new_commits = `git rev-list #{$oldrev}..#{$newrev}`.split("\n")
  new_commits.each do |rev|
    files_modified = `git log -1 --name-only --pretty=format:'' #{rev}`.split("\n")
    files_modified.each do |path|
      next if path.size == 0
      has_file_access = false
      access[$user].each do |access_path|
        # a nil access_path means the user has full access;
        # otherwise the modified path must begin with access_path
        if !access_path || (path.index(access_path) == 0)
          has_file_access = true
        end
      end
      if !has_file_access
        puts "[POLICY] You do not have access to push to #{path}"
        exit 1
      end
    end
  end
end

check_directory_perms

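Most of that should be easy to follow. You get the list of new commits being pushed to the server with git rev-list. Then, for each of them, you find which files it modifies and make sure the user who is pushing has access to all of those paths. One Rubyism that may not be obvious is path.index(access_path) == 0, which is true if path begins with access_path. This ensures that access_path is not merely found somewhere inside the modified path, but that the modified path starts with an allowed path.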
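Because the check is a plain string-prefix test, an ACL entry such as lib also matches a sibling directory such as library. The sketch below shows that behavior next to a stricter variant that only matches whole directory components. Both prefix_match? and directory_match? are hypothetical names introduced for illustration, and the stricter rule is an assumption about what a site might want, not part of the original hook.

```ruby
# The check used in the hook: true when path starts with access_path.
def prefix_match?(path, access_path)
  path.index(access_path) == 0
end

# Hypothetical stricter variant: access_path must be the path itself
# or one of its leading directories.
def directory_match?(path, access_path)
  dir = access_path.chomp('/')
  path == dir || path.start_with?(dir + '/')
end

p prefix_match?('lib/test.rb', 'lib')         # prints true
p prefix_match?('library/test.rb', 'lib')     # prints true
p directory_match?('library/test.rb', 'lib')  # prints false
```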

Now your users can't push commits with badly formed messages, and they can't modify files outside their designated paths.

Enforcing Fast-Forward-Only Pushes

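The only thing left is to enforce fast-forward-only pushes. In Git 1.6 or newer, you can simply set the receive.denyDeletes and receive.denyNonFastForwards options. Enforcing the rule with a hook, however, also works on older versions of Git, and with some modification it can be applied only to certain users or extended with whatever other rules you come up with later.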

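The logic for this check is to see whether any commits are reachable from the old revision but not from the new one. If there are none, the push is a pure fast-forward; if there are any, the push is rejected: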

# enforces fast-forward only pushes
def check_fast_forward
  missed_refs = `git rev-list #{$newrev}..#{$oldrev}`
  missed_ref_count = missed_refs.split("\n").size
  if missed_ref_count > 0
    puts "[POLICY] Cannot push a non fast-forward reference"
    exit 1
  end
end

check_fast_forward
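To see why an empty newrev..oldrev range means fast-forward, the reachability rule can be modeled on a toy commit graph without calling Git at all. This is a simplified model, not how Git itself is implemented. The graph, the commit names, and the helpers reachable, rev_list, and fast_forward? are all made up for illustration.

```ruby
# Toy commit graph: each commit maps to the list of its parents.
GRAPH = {
  'a' => [],
  'b' => ['a'],
  'c' => ['b'],  # c builds on b
  'x' => ['a']   # x was rebuilt on top of a and no longer contains b
}

# All commits reachable from rev, including rev itself.
def reachable(graph, rev)
  seen = []
  stack = [rev]
  until stack.empty?
    commit = stack.pop
    next if seen.include?(commit)
    seen << commit
    stack.concat(graph.fetch(commit))
  end
  seen
end

# Models `git rev-list from..to`: reachable from `to` but not from `from`.
def rev_list(graph, from, to)
  reachable(graph, to) - reachable(graph, from)
end

# Fast-forward when nothing reachable from oldrev is missing from newrev.
def fast_forward?(graph, oldrev, newrev)
  rev_list(graph, newrev, oldrev).empty?
end

p fast_forward?(GRAPH, 'b', 'c')  # prints true
p fast_forward?(GRAPH, 'b', 'x')  # prints false
```

Moving the branch from b to x would drop commit b from history, which is exactly the situation the hook rejects.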

Everything is set up. If you run chmod u+x .git/hooks/update to make the file containing all this code executable, and then try to push a non-fast-forward reference, you get something like this:

$ git push -f origin master
Counting objects: 5, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 323 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
Enforcing Policies... 
(refs/heads/master) (8338c5) (c5b616)
[POLICY] Cannot push a non fast-forward reference
error: hooks/update exited with error code 1
error: hook declined to update refs/heads/master
To git@gitserver:project.git
 ! [remote rejected] master -> master (hook declined)
error: failed to push some refs to 'git@gitserver:project.git'

There are a few interesting things here. First, you can see where the hook starts running:

Enforcing Policies... 
(refs/heads/master) (8338c5) (c5b616)

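Notice that this is what the update script printed to stdout at the very beginning. It is important to know that anything your script prints to stdout is transferred to the client.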

The next thing to notice is the error message:

[POLICY] Cannot push a non fast-forward reference
error: hooks/update exited with error code 1
error: hook declined to update refs/heads/master

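The first line was printed by our script; the other two are Git telling us that the update script exited with a nonzero status and that this is why the push was declined. Lastly, there is this: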

To git@gitserver:project.git
 ! [remote rejected] master -> master (hook declined)
error: failed to push some refs to 'git@gitserver:project.git'

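You see a remote rejected message for each reference that the hook declined, and it tells you that the reference was declined specifically because of a hook.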

Furthermore, if any of the pushed commits lacks the ref marker, you see the error message that the script prints for that case:

[POLICY] Your message is not formatted correctly

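Or if someone edits a file they don't have access to and pushes a commit containing that change, they see something similar. For instance, if a documentation author tries to push a commit that modifies something in the lib directory, they see: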

[POLICY] You do not have access to push to lib/test.rb

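That's all. From now on, as long as the update script is in place and executable, your repository will never be rewound and will never contain a commit whose message lacks the required pattern, and your users will be sandboxed.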

Client-Side Hooks

The downside to this approach is the whining that will inevitably result when your users’ commit pushes are rejected. Having their carefully crafted work rejected at the last minute can be extremely frustrating and confusing; and furthermore, they will have to edit their history to correct it, which isn’t always for the faint of heart.

The answer to this dilemma is to provide some client-side hooks that users can use to notify them when they’re doing something that the server is likely to reject. That way, they can correct any problems before committing and before those issues become more difficult to fix. Because hooks aren’t transferred with a clone of a project, you must distribute these scripts some other way and then have your users copy them to their .git/hooks directory and make them executable. You can distribute these hooks within the project or in a separate project, but there is no way to set them up automatically.
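If the hook scripts are kept in a tracked directory of the project, the copy-and-make-executable step can itself be scripted, although each user still has to run it once by hand. This is a minimal sketch under that assumption. The install_hooks name and the hooks source directory are hypothetical, not a Git convention.

```ruby
require 'fileutils'

# Hypothetical installer: copies every regular file from source_dir
# (for example a tracked "hooks" directory, an assumed name) into
# <git_dir>/hooks and makes each copy executable.
# Returns the list of installed paths.
def install_hooks(source_dir, git_dir = '.git')
  target_dir = File.join(git_dir, 'hooks')
  FileUtils.mkdir_p(target_dir)
  scripts = Dir.glob(File.join(source_dir, '*')).select { |f| File.file?(f) }
  scripts.sort.map do |script|
    target = File.join(target_dir, File.basename(script))
    FileUtils.cp(script, target)
    File.chmod(0755, target)
    target
  end
end
```

A user would call install_hooks('hooks') from the top of their working directory after cloning.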

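First, you should check your commit message just before each commit is recorded, so you know the server won't reject your changes because of a badly formatted message. To do this, you can add the commit-msg hook. If you have it read the message from the file passed as its first argument and compare it to the pattern, you can make Git abort the commit when there is no match: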

#!/usr/bin/env ruby
message_file = ARGV[0]
message = File.read(message_file)

$regex = /\[ref: (\d+)\]/

if !$regex.match(message)
  puts "[POLICY] Your message is not formatted correctly"
  exit 1
end

If that script is in place (in .git/hooks/commit-msg) and executable, and you commit with a message that isn't properly formatted, you see this:

$ git commit -am 'test'
[POLICY] Your message is not formatted correctly

No commit was completed in that instance. However, if your message contains the proper pattern, Git allows you to commit:

$ git commit -am 'test [ref: 132]'
[master e05c914] test [ref: 132]
 1 files changed, 1 insertions(+), 0 deletions(-)

Next, you want to make sure you aren’t modifying files that are outside your ACL scope. If your project’s .git directory contains a copy of the ACL file you used previously, then the following pre-commit script will enforce those constraints for you:

#!/usr/bin/env ruby

$user    = ENV['USER']

# [ insert get_acl_access_data method from above ]

# only allows certain users to modify certain subdirectories in a project
def check_directory_perms
  access = get_acl_access_data('.git/acl')

  files_modified = `git diff-index --cached --name-only HEAD`.split("\n")
  files_modified.each do |path|
    next if path.size == 0
    has_file_access = false
    access[$user].each do |access_path|
      if !access_path || (path.index(access_path) == 0)
        has_file_access = true
      end
    end
    if !has_file_access
      puts "[POLICY] You do not have access to push to #{path}"
      exit 1
    end
  end
end

check_directory_perms

This is roughly the same script as the server-side part, but with two important differences. First, the ACL file is in a different place, because this script runs from your working directory, not from your Git directory. You have to change the path to the ACL file from this

access = get_acl_access_data('acl')

to this:

access = get_acl_access_data('.git/acl')

The other important difference is the way you get a listing of the files that have been changed. Because the server-side method looks at the log of commits, and, at this point, the commit hasn’t been recorded yet, you must get your file listing from the staging area instead. Instead of

files_modified = `git log -1 --name-only --pretty=format:'' #{rev}`

you have to use

files_modified = `git diff-index --cached --name-only HEAD`

But those are the only two differences — otherwise, the script works the same way. One caveat is that it expects you to be running locally as the same user you push as to the remote machine. If that is different, you must set the $user variable manually.
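One way to set $user manually without editing the script each time is to let an environment variable override the local login name. The sketch below assumes a variable called GIT_POLICY_USER. That name, like the policy_user helper, is invented for this example and is not something Git defines.

```ruby
# Hypothetical override: GIT_POLICY_USER is an assumed variable name,
# not something Git or the original script defines.
def policy_user(env = ENV)
  override = env['GIT_POLICY_USER']
  (override.nil? || override.empty?) ? env['USER'] : override
end

$user = policy_user
```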

The last thing you have to do is check that you’re not trying to push non-fast-forwarded references, but that is a bit less common. To get a reference that isn’t a fast-forward, you either have to rebase past a commit you’ve already pushed up or try pushing a different local branch up to the same remote branch.

Because the server will tell you that you can’t push a non-fast-forward anyway, and the hook prevents forced pushes, the only accidental thing you can try to catch is rebasing commits that have already been pushed.

Here is an example pre-rebase script that checks for that. It gets a list of all the commits you’re about to rewrite and checks whether they exist in any of your remote references. If it sees one that is reachable from one of your remote references, it aborts the rebase:

#!/usr/bin/env ruby

base_branch = ARGV[0]
if ARGV[1]
  topic_branch = ARGV[1]
else
  topic_branch = "HEAD"
end

target_shas = `git rev-list #{base_branch}..#{topic_branch}`.split("\n")
remote_refs = `git branch -r`.split("\n").map { |r| r.strip }

target_shas.each do |sha|
  remote_refs.each do |remote_ref|
    shas_pushed = `git rev-list ^#{sha}^@ refs/remotes/#{remote_ref}`
    if shas_pushed.split("\n").include?(sha)
      puts "[POLICY] Commit #{sha} has already been pushed to #{remote_ref}"
      exit 1
    end
  end
end

This script uses a syntax that wasn’t covered in the Revision Selection section of Chapter 6. You get a list of commits that have already been pushed up by running this:

git rev-list ^#{sha}^@ refs/remotes/#{remote_ref}

The SHA^@ syntax resolves to all the parents of that commit. You’re looking for any commit that is reachable from the last commit on the remote and that isn’t reachable from any parent of any of the SHAs you’re trying to push up — meaning it’s a fast-forward.
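One practical wrinkle in the script above: the output of git branch -r can include a symbolic entry such as origin/HEAD -> origin/master, which would be passed to git rev-list as part of a ref name. The sketch below filters such lines out before the loop. The parse_remote_refs name is hypothetical and the sample output is made up for illustration.

```ruby
# Hypothetical helper: turns `git branch -r` output into plain ref names,
# skipping symbolic entries of the form "origin/HEAD -> origin/master".
def parse_remote_refs(branch_output)
  lines = branch_output.split("\n").map { |line| line.strip }
  lines.reject { |line| line.empty? || line.include?(' -> ') }
end

sample = "  origin/HEAD -> origin/master\n  origin/master\n  origin/topic\n"
p parse_remote_refs(sample)  # prints ["origin/master", "origin/topic"]
```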

The main drawback to this approach is that it can be very slow and is often unnecessary — if you don’t try to force the push with -f, the server will warn you and not accept the push. However, it’s an interesting exercise and can in theory help you avoid a rebase that you might later have to go back and fix.