Work Manual for the Database of Medieval Buddhist Manuscripts 中古佛教寫本資料庫工作手冊

From DILA Wiki
Revision as of 16:23, 7 December 2015 (Monday), by imported>Blueve.tw

The Database on the Grammar of Medieval Chinese
Date: 2015-7-01
Transcription, Encoding: Lin Ching-hui 林靜慧
Training, Administration: Zhang Boyong 張伯雍
Co-director: Hung Jen-jou 洪振洲, Ann Heirman, Marcus Bingenheimer 馬德偉

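本計畫為中華佛研所、馬德偉博士(Dr. Marcus Bingenheimer, Temple University)、太史文博士 (顧問)(Dr. Stephen F. Teiser, Princeton University)與大英圖書館國際敦煌專案(International Dunhuang Project, IDP)的合作專案。專案說明詳見計畫書及本計畫網站。 This project is a collaboration between the Chung-Hwa Institute of Buddhist Studies (中華佛研所), Dr. Marcus Bingenheimer (Temple University), Dr. Stephen F. Teiser (Princeton University, adviser), and the International Dunhuang Project (IDP) of the British Library. For details, see the project proposal and the project website.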

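Manuscript digitization 寫卷數位化

數位化的寫卷標題範例:lengqieszj-S-4272.xml。S-4272 為敦煌文獻號碼。Example file name of a digitized manuscript: lengqieszj-S-4272.xml, where S-4272 is the Dunhuang manuscript number.
所使用的標記規範為 TEI P5。The markup follows TEI P5.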

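Manuscript structure and markup 寫卷結構與標記

  • 以敦煌寫卷「件」為單位,即每一個檔案即為一件敦煌寫卷(如 S.4272)。The unit is the individual Dunhuang manuscript item (件): each file holds exactly one manuscript (e.g. S.4272).
行號標記例 Line number <lb xml:id="S-4272-0001"/>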
空格標記例 Space <space type="honorific" unit="char" extent="1"/> Honorific.jpg

P-3664-0662

<space type="punctuation" unit="char" extent="1"/> 空格.jpg

S-4272-0008

<space type="bindingHole" unit="char" extent="1"/> BindingHole.jpg

P-4646-01-03

<space type="simpleSpace" unit="char" extent="1"/> SimpleSpace.jpg

S-2054-0192

<space type="verseSpacing" unit="char" extent="1"/> VerseSpacing.jpg

P-2634-0002

異寫字標記例 Choice <orig reg="偽"><g ref="#S4272-005-11"/></orig>(專案新增 added by the project) 異寫1.jpg

S-4272-0005

<orig reg="障"><g ref="#A04441-003"/></orig>(教育部異體字字典 MoE Dictionary of Chinese Character Variants) 異寫2.jpg

S-4272-0013

取代標記例 Substitute <subst><del>无</del><add>有</add></subst> 取代1.jpg

S-4272-0005

<subst><del unit="char" extent="1"/><add>心</add></subst> 取代2.jpg

S-4272-0021

<subst><del><orig reg="薩"><g ref="#A03580-001"/></orig></del><add place="inline-right">提</add></subst> 取代3.jpg

P-3436-0037

<subst><del hand="2">然見性</del><add place="inline-right" hand="2">明</add></subst> 取代4.jpg

P-3777-0540

插入標記例 Addition <add place="inline-right">性</add> 插入.jpg

S-4272-0009

倒乙標記例 Reverse <orig reg="不出">出<add place="inline-right">㆑</add>不</orig> 倒乙符.jpg(不出)

P-3436-0037

<lb xml:id="P-3436-0206"/><orig type="CJK" reg="坐"><add place="inline-right">㆑</add>浄</orig> 行首倒乙.jpg 行首倒乙1.jpg

P-3436-0206

<orig reg="苐二魏朝"><g ref="#A04688-002"/>朝<add place="inline-right"><note resp="hand2">向上</note></add>苐二</orig> 倒乙說明.jpg(苐二魏朝)

P-3436-0110

補充修改標記例 Addition <add place="margin-bottom">軰</add> 修改補充.jpg

P-3436-0056

破損標記例 Damage <damage>使鬼神</damage> 破損1.jpg

P-3436-0057

<damage unit="char" extent="1"/>

<choice><unclear><damage unit="char" extent="1"/></unclear><reg>諸</reg></choice>

破損2.jpg

P-3436-0011

字跡不清標記例 Unclear <unclear>斷</unclear> 字跡不清.jpg

P-3436-0070

難辨字標記例 Gap <gap unit="char" extent="1"/> 難辨.jpg

P-3703-0011

省書例 Abbreviations <choice>卄卄<expan>菩薩</expan></choice> 省書.jpg

P-2634-0010

<choice>阿〻難〻<expan>阿難阿難</expan></choice> 重文例2.png

P-3664-0511

重文例 Repeat sign <choice>種〻<expan>種種</expan></choice> 重文例3.png

P-3664-0500

雙行夾注例 Inline-para <note resp="hand1" rendition="#inline-para">在舒州一名思空山</note> 雙行夾注.jpg

P-3559-0567

副標例 Subtitle <hi rendition="#subtitle">并序</hi> 副標.jpg

P-2634-0001

廢字例 Deletion

(感謝 汪娟教授來函建議 With thanks to Prof. Wang Juan 汪娟 for suggesting this by letter)

者<del>者</del>非 廢字.jpg

P-2460-0068v

<del>清浄</del>解 廢字2.jpg

P-4646-08-04r

專案訂正例 Corrections by project <choice><sic>光濡</sic><corr>先儒</corr></choice><note>見《左傳‧春秋序》。</note>不取 專案訂正例.jpg

P-2634-0038r

偈文例 Verse line <lg><l><choice><orig><g ref="#A02941-036"/></orig><reg>稽</reg></choice>首<choice><orig><g ref="#A03222-001"/></orig><reg>善</reg></choice>知識<space type="verseSpacing" unit="char" extent="1"/><damage><choice><orig type="Ext-A">䏻</orig><reg>能</reg></choice>令<choice><orig><g ref="#P2634-002-08"/></orig><reg>護</reg></choice></damage>本心</l></lg> 偈文.jpg

P-2634-0002r
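
The markup listed above lends itself to simple automated checks. The following is a minimal sketch in Python, assuming a fragment without the TEI namespace; the `<ab>` wrapper, the placeholder characters 甲乙丙丁戊 and the function name `summarize` are illustrative and not part of the project files. It collects the `xml:id` of every `<lb/>` and counts `<space>` elements by `@type`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment built from the markup patterns in the list above.
SAMPLE = (
    '<ab>'
    '<lb xml:id="S-4272-0001"/>甲<space type="honorific" unit="char" extent="1"/>乙'
    '<lb xml:id="S-4272-0002"/>丙<space type="punctuation" unit="char" extent="1"/>丁'
    '<space type="punctuation" unit="char" extent="1"/>戊'
    '</ab>'
)

def summarize(xml_text):
    """Return the list of line ids and a count of <space> elements per type."""
    root = ET.fromstring(xml_text)
    line_ids = [lb.get(XML_ID) for lb in root.iter("lb")]
    space_types = Counter(sp.get("type", "untyped") for sp in root.iter("space"))
    return line_ids, space_types

line_ids, space_types = summarize(SAMPLE)
print(line_ids)
print(dict(space_types))
```

A real project file would declare the TEI namespace, so the tag names passed to `iter()` would need the namespace prefix.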

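Advanced notes: principles of transcription 進階說明:文字迻錄原則

  • 原則一、不論原文使用何種字體(楷書、行書、草書等),皆迻錄為楷書(楷化)。Principle 1: Whatever script the original is written in (regular, semi-cursive, cursive, etc.), transcribe it in regular script (楷書).
  • 原則二、Unicode 有提供字型者,按原字形迻錄。如:㘴,不改成為教育部標準字體(正字)「坐」。Principle 2: Where Unicode encodes the form found in the manuscript, transcribe that form as written. For example, 㘴 is kept and not changed to the MoE standard form (正字) 坐.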
Non-Unicode Variants - attested 萬國碼未收之異體字──已確認
  1. 萬國碼未收之字形。The variant character is not in Unicode.
  2. 教育部異體字字典有收錄者。It is attested in the "Dictionary of Chinese Character Variants" 教育部異體字字典 (Ministry of Education, RoC, 2012). Current Query Interface: http://dict2.variants.moe.edu.tw/variants/.
  3. 教育部異體字字典對該字的編號會記錄在<g>@ref標記中。 <g>@ref points to a header item which references the character number of the variant in the MoE Dictionary.
  4. 能夠以所對應的正字表達出來。 It can be represented by a semantically equivalent common character (通用字).
Ex.1: S-4272-0002:

為除忘相<choice><orig><g ref="#A03335-004"/></orig><reg>修</reg></choice>行六度

OrigRegChoice.png
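
The `<choice>` pattern in Ex.1 keeps the original glyph and its regularization side by side, so either reading can be derived from the same file. Here is a minimal sketch, assuming no TEI namespace and a hypothetical `<ab>` wrapper; the function name `reading` and the bracketed rendering of `<g>` references are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET

def reading(xml_text, prefer="reg"):
    """Return the text of a fragment, keeping one child of each <choice>."""
    root = ET.fromstring(xml_text)

    def walk(node):
        parts = [node.text or ""]
        for child in node:
            if child.tag == "choice":
                chosen = child.find(prefer)
                if chosen is not None:
                    parts.append(walk(chosen))
            elif child.tag == "g":
                # A non-Unicode glyph has no text content; show its reference.
                parts.append("[" + child.get("ref", "") + "]")
            else:
                parts.append(walk(child))
            parts.append(child.tail or "")
        return "".join(parts)

    return walk(root)

# Ex.1 (S-4272-0002) wrapped in an illustrative <ab> element.
SAMPLE = ('<ab>為除忘相<choice><orig><g ref="#A03335-004"/></orig>'
          '<reg>修</reg></choice>行六度</ab>')

print(reading(SAMPLE, "reg"))   # 為除忘相修行六度
print(reading(SAMPLE, "orig"))  # 為除忘相[#A03335-004]行六度
```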
Non-Unicode Variants - unattested 萬國碼未收之異體字──未確認(專案新增)
  1. 萬國碼與教育部異體字字典均未收錄。The character is neither in Unicode nor in the MoE Dictionary.
  2. 但字形結構上能夠分辨者。 Use this only for characters whose stroke structure is clearly legible.
Ex.1: S-4272-0022:

度眾生過去<choice><orig><g ref="#S4272-022-14"/></orig><reg>逢</reg></choice>无量恒

Reg1.png
"Unclear" Characters 模糊字
  1. <unclear> 是一個較鬆散的解釋,此類字多受到摹寫字跡以及古代字形的影響。<unclear> is very much open to interpretation. It depends strongly on the quality of the facsimile and on the encoder's paleographic skills.
  2. 標記此類文字時通常需借助其他版本的文獻,而不能由文本直接辨認出來。We use it in this project when the character and its stroke structure cannot be recognized on their own, but only by comparison with other versions.
  3. 所有的<unclear>都能理解為某個正字,但與<reg>不同的是<unclear>文字結構模糊,而<reg>的文字結構清晰。All <unclear> characters are given as 通用字. This form of regularization differs from <reg>, however: with <unclear> the intended variant is unknown, whereas with <reg> the shape and stroke structure of the variant character can be seen.
Ex.1: P-3703-0002:

無有邊<unclear>畔坐</unclear>

Unclear1.png
Significant spaces 文中的空格
  1. 另起一段或徵引文獻時。Intentional, significant space before new sections (Ex. 1) or quotations (Ex.1).
  2. 文獻末尾空格不標記。No <space> needed at end of a Ms folio.
Ex.1: S-4272-0008 - S-4272-0010:

為中道<space unit="char" extent="2"/>苐三齊朝 人年十四遇達摩禪師 真登佛果<space unit="char" extent="1"/>楞伽経云

Space1.png
Character(s) added in the Ms. 插入字
  1. 文中有人插入文字。Character(s) added by a scribe in the Ms.
  2. 大致描述插入字位置。@place gives a rough description of where the addition is found.
Ex.1: S-4272-0009:

禪師俗<add place="inline-right">性</add>姖武窂人

Add1.png
Character(s) Overwrite other Character(s): 覆蓋字

被覆蓋的字若不清楚則使用空的 <del>(如下例),清楚則轉錄出來,不確定則使用<unclear>。If the overwritten character is illegible, use an empty <del> that records only its extent (as in the example below); if it is legible, transcribe it; if unsure, use <unclear>.

Ex.1: S-4272-0021:

為是知眾生識<subst><del unit="char" extent="1"/><add>心</add></subst>自度

Subst1.png
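
Because `<subst>` pairs what was deleted with what was added, both states of the line can be recovered from one encoding. The sketch below is illustrative only: it assumes no TEI namespace and a hypothetical `<ab>` wrapper, and the placeholder □ for an illegible deleted character is a convention of this example, not of the project.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "□"  # stands in for one illegible character (illustrative)

def subst_readings(xml_text):
    """Return (before, after): the text before and after scribal changes."""
    root = ET.fromstring(xml_text)

    def walk(node, state):
        parts = [node.text or ""]
        for child in node:
            if child.tag == "del":
                if state == "before":
                    text = walk(child, state)
                    if not text:
                        # Empty <del extent="n"/>: n illegible characters.
                        text = PLACEHOLDER * int(child.get("extent", "1"))
                    parts.append(text)
            elif child.tag == "add":
                if state == "after":
                    parts.append(walk(child, state))
            else:
                parts.append(walk(child, state))
            parts.append(child.tail or "")
        return "".join(parts)

    return walk(root, "before"), walk(root, "after")

# Ex.1 (S-4272-0021) wrapped in an illustrative <ab> element.
SAMPLE = ('<ab>為是知眾生識<subst><del unit="char" extent="1"/>'
          '<add>心</add></subst>自度</ab>')

before, after = subst_readings(SAMPLE)
print(before)  # 為是知眾生識□自度
print(after)   # 為是知眾生識心自度
```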
Damaged but recognizable characters 破損字

<damage>與<unclear>近似,標記中直接使用正字。(範例中的字也可以識別為「忕」或「𢗗」,此處依另一版本。)<damage> is similar to <unclear> in that the text provided should be considered 通用字, as the variant cannot be distinguished clearly. (The character in the example could also be read as 忕 or 𢗗; the reading here follows another version.)

Ex.1: P-3703-0001:

時<damage>狀</damage>𠰥

Damage1.png
Unrecognizable characters due to accidental damage (tearing, breaking, smearing, blotting, smudging etc.) with later annotation 因意外而造成無法判讀(如撕裂、破損、磨滅、髒汙等),後來新增者
  1. 背面墨水透出使「法」字部分不清,另一個字則完全不清。Seeping ink renders the character 法 partially illegible and another character completely illegible.
  2. 可以推論第二個字可能被 (hand="1") 塗改為「有」,但又暈墨。Probably the latter character was originally deleted, and the first scribe (hand="1") added a 有 next to the line. This addition also became blotted as the ink seeped through, but it can still be inferred.
  3. 不清的「法」字旁潦草寫了一個「法」。(hand="2") 又另在前次暈墨的「有」下方再寫一個「有」,這必然是在背面抄寫後才發生的,這份手卷發現數次這樣因背面的墨透背後,才進行的補救。 Later someone adds a quickly written 法 next to the partially damaged 法, and a 有 below the damaged first addition inline-right. This probably was someone else (@hand="2") because it must have occurred after the verso text had been written and there are several other cases of clarifying damaged characters elsewhere in the Ms.
  4. 假定這髒污是由背面的墨透過來的,那事情發生的順序應是:先抄寫了正面,而背面又抄寫了其他文稿,結果導致墨暈至背面。後來在讀正面時(的人),又將模糊的字重書在右方。Assuming the blotting is due to ink seeping through the paper the series of events was: someone wrote the text, then something else was written on verso, and the ink seeping through blotted the recto text. A later reader clarifies unclear characters recto with a dry brush.
Ex.1: P-3703-0007:

In the header:

<profileDesc>
  <creation>
    <listChange>
      <change xml:id="stage1">The manuscript is written, corrections were made by the scribe.寫卷抄錄時的修正</change>
      <change xml:id="stage2">The verso is written. Ink seeps through blotting some characters.背面抄寫時的墨透背後所汙染者</change>
      <change xml:id="stage3">A later hand clarifies characters that were blotted out.在汙處外再次訂正</change>
    </listChange>
  </creation>
</profileDesc>

非<unclear>離</unclear>生<damage change="#stage2">法</damage><add change="#stage3" hand="2" place="inline-right">法</add><damage change="#stage2"><del change="#stage1" hand="1" unit="char" extent="1"/><add change="#stage1" place="inline-right" hand="1">有</add></damage><add change="#stage3" hand="2" place="inline-right">有</add>无生龍
Ex.2: P-3703-0010:

一切圡木<damage change="#stage2">瓦</damage><add place="inline-right" change="#stage3">瓦</add>石

  1. The original character (probably 瓦) becomes illegible because of ink seeping through. 原來的字(應是「瓦」)被透背的墨所暈。
  2. A later hand clarifies the illegible section and writes 瓦 next to it. 後來的讀者重書「瓦」在右方。
DamageAdd1.png

DamageAdd2.png
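
The `@change` pointers make it possible to list what happened to a line at each stage declared in the header. The following is a minimal sketch, assuming no TEI namespace and a hypothetical `<ab>` wrapper around Ex.2; the function name `events_by_stage` is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Ex.2 (P-3703-0010) wrapped in an illustrative <ab> element.
SAMPLE = ('<ab>一切圡木<damage change="#stage2">瓦</damage>'
          '<add place="inline-right" change="#stage3">瓦</add>石</ab>')

def events_by_stage(xml_text):
    """Group elements carrying @change by the stage id they point to."""
    root = ET.fromstring(xml_text)
    stages = defaultdict(list)
    for el in root.iter():  # document order
        change = el.get("change")
        if change:
            stage_id = change.lstrip("#")
            stages[stage_id].append((el.tag, "".join(el.itertext())))
    return dict(stages)

events = events_by_stage(SAMPLE)
for stage_id in sorted(events):
    print(stage_id, events[stage_id])
```

For this line the result has one `damage` event at stage2 and one `add` event at stage3, matching the sequence described in the two notes above.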

Reverse Mark 倒乙符號 (レ-点)
  1. 以萬國碼「雁點」(レ点 U+3191)為倒乙符號。Use Unicode Character 'IDEOGRAPHIC ANNOTATION REVERSE MARK' (U+3191) within <add place="inline-right"> </add>

參見 See also: 媒體:敦煌古代的標點符號.pdf (media file on punctuation marks in old Dunhuang manuscripts); 維基百科:訓読 (Wikipedia: Kundoku)

Ex.1: P-3436-0037:

亦出<add place="inline-right">㆑</add>不扵有

Retten1.png
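
For checking a transcription against its regularized reading, the effect of the reverse mark can be simulated on plain text. This sketch assumes that the markup has already been stripped, so the mark sits between the two characters it transposes (as in 出㆑不, read 不出); the function name `apply_reverse_marks` is hypothetical. A mark with no preceding character in the string, as at the start of a line, is left untouched.

```python
REVERSE_MARK = "\u3191"  # IDEOGRAPHIC ANNOTATION REVERSE MARK

def apply_reverse_marks(text):
    """Swap the characters on either side of each reverse mark."""
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if chars[i] == REVERSE_MARK and out and i + 1 < len(chars):
            previous = out.pop()
            out.append(chars[i + 1])  # the following character moves forward
            out.append(previous)      # the preceding character moves back
            i += 2
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)

# Ex.1 (P-3436-0037) with the <add> markup stripped.
line = "亦出" + REVERSE_MARK + "不扵有"
print(apply_reverse_marks(line))  # 亦不出扵有
```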
Repetition / Iteration Mark 叠字符號
  1. 以萬國碼「疊字符號」(踊り字 U+303B)為叠字符號(重文)。Use Unicode Character 'VERTICAL IDEOGRAPHIC ITERATION MARK' 〻 (U+303B).

參見 See also: Iteration marks

重文例3.png

P-3664-0500
Abbreviations 省書符號
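  1. 以萬國碼「疊字符號」(踊り字 U+303B)為省書符號。Use the Unicode iteration mark 〻 (U+303B) as the abbreviation sign, e.g. <choice><abbr>阿〻難〻</abbr><expan>阿難阿難</expan></choice>
  2. 所有的 <expan> 標記中已視為「正規化」,故不再有如 <unclear> 等標記。All <expan>sions are understood to be regularized and cannot contain further <unclear> etc.
  3. 省書符號前有行號時的標法:只標省書符號。When a line break (<lb/>) immediately precedes the abbreviation sign, mark up only the sign itself (see Ex 3).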
Ex 1:P-3664-0511

<choice><abbr>阿〻<reg>難</reg>〻</abbr><expan>阿難阿難</expan></choice>

重文例2.png
Ex 2:P-2634-0010

<choice><abbr>卄卄</abbr><expan>菩薩</expan></choice>

省書.jpg
Ex 3:S-2054-0325

<lb xml:id="S-2054-0325"/><choice><abbr>〻</abbr><expan>色</expan></choice>

重文例3.jpg
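
The examples above suggest why the expansion is recorded explicitly in `<expan>` rather than derived mechanically. The sketch below uses a deliberately naive rule, replacing each 〻 with the character before it; this rule is an assumption made for the illustration, not a project convention. It gives the right result for 種〻, but not for 阿〻難〻, where the whole phrase is repeated, and it cannot resolve a mark at the start of a line (Ex 3), where the repeated character stands on the previous line.

```python
ITERATION_MARK = "\u303b"  # VERTICAL IDEOGRAPHIC ITERATION MARK

def naive_expand(text):
    """Replace each iteration mark with the character just before it."""
    out = []
    for ch in text:
        if ch == ITERATION_MARK and out:
            out.append(out[-1])
        else:
            out.append(ch)  # a leading mark has nothing to repeat
    return "".join(out)

simple = naive_expand("種" + ITERATION_MARK)
phrase = naive_expand("阿" + ITERATION_MARK + "難" + ITERATION_MARK)

print(simple)  # 種種, matches <expan>種種</expan>
print(phrase)  # 阿阿難難, whereas the manuscript reading is 阿難阿難
```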

Font tools 字型工具

  • 最好安裝 Unicode Super-CJK Fonts v6.0。Installing Unicode Super-CJK Fonts v6.0 is recommended.