Working Manual for the Critical Editions of Chinese Buddhist Dunhuang Manuscripts (敦煌漢文佛教寫卷點校本工作手冊)

From DILA Wiki
Revision as of 18:08, 21 July 2016 (Thu) by imported>Putitz (→ XML 轉 InDesign 規則)

Critical Editions of Chinese Buddhist Dunhuang Manuscripts (Markup Manual)
Date: 2016-4-14 Author: Zhang Boyong 張伯雍, Marcus Bingenheimer 馬德偉

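This project is a collaboration among the Chung-Hwa Institute of Buddhist Studies (中華佛研所), Dr. Marcus Bingenheimer (馬德偉, Temple University), Dr. Stephen F. Teiser (太史文, adviser, Princeton University), and the International Dunhuang Project (IDP) at the British Library. For a description of the project, see the project proposal and the project website.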

Digitization of the Manuscripts

Example file name of a digitized manuscript: lengqieszj-S-4272.xml, where S-4272 is the Dunhuang shelf number.
The markup follows TEI P5.

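Manuscript Structure and Markup

  • The unit is the individual Dunhuang manuscript item (件): each file holds exactly one manuscript (e.g. S.4272).
Line number: <lb xml:id="S-4272-0001"/>
Sheet or page break: <milestone unit="sheet" n="P-3664r-15"/> Milestone.jpg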

P-3664-0310

Significant Spaces: <space type="honorific" unit="char" extent="1"/> Honorific.jpg

P-3664-0662

<space type="punctuation" unit="char" extent="1"/> 空格.jpg

S-4272-0008

<space type="bindingHole" unit="char" extent="1"/> BindingHole.jpg

P-4646-01-03

<space type="simpleSpace" unit="char" extent="1"/> SimpleSpace.jpg

S-2054-0192

<space type="verseSpacing" unit="char" extent="1"/> VerseSpacing.jpg

P-2634-0002

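Character Variation: <orig reg="偽"><g ref="#S4272-005-11"/></orig> (glyph newly added by the project) 異寫1.jpg

S-4272-0005

<orig reg="障"><g ref="#A04441-003"/></orig> (Dictionary of Chinese Character Variants, Ministry of Education) 異寫2.jpg

S-4272-0013

<orig type="CJK" reg="猶">由</orig> (Dictionary of Chinese Character Variants, Ministry of Education) 異寫3.jpg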

P-3434-0079

Damage: <damage unit="char" extent="1"/> 破損例.jpg

P-2460v-0001

<damage>使鬼神</damage> 破損1.jpg

P-3436-0057

<unclear reason="damage">諸</unclear> 破損2.jpg

P-3436-0011

Unclear: <unclear>斷</unclear> 字跡不清.jpg

P-3436-0070

<unclear unit="char" extent="1"/> 難辨.jpg

P-3703-0011

Substitutions: <subst><del>无</del><add>有</add></subst> 取代1.jpg

S-4272-0005

<subst><del><unclear reason="illegible" unit="char" extent="1"/></del><add>心</add></subst> 取代2.jpg

S-4272-0021

<subst><del><orig reg="薩"><g ref="#A03580-001"/></orig></del><add place="inline-right">提</add></subst> 取代3.jpg

P-3436-0037

<subst><del resp="hand2">然見性</del><add place="inline-right" resp="hand2">明</add></subst> 取代4.jpg

P-3777-0540

Deletion Marks (廢字)

(With thanks to Prof. 汪娟 for suggesting this by letter.)

者<del>者</del>非 廢字.jpg

P-2460-0068v

<del>清浄</del>解 廢字2.jpg

P-4646-08-04r

Additions (insertions and supplementary corrections): <add place="inline-right">性</add> 插入.jpg

S-4272-0009

<add place="margin-bottom">軰</add> 修改補充.jpg

P-3436-0056

Reverse Mark (倒乙): <orig reg="不出">出<pc resp="hand">㆑</pc>不</orig> 倒乙符.jpg (reads 不出)

P-3436-0037

<lb xml:id="P-3436-0206"/><orig type="CJK" reg="坐"><pc resp="hand">㆑</pc>浄</orig> 行首倒乙.jpg 行首倒乙1.jpg

P-3436-0206

<orig reg="苐二魏朝"><g ref="#A04688-002"/>朝<add place="inline-right"><note resp="hand">向上</note></add>苐二</orig> 倒乙說明.jpg(苐二魏朝)

P-3436-0110

Abbreviations (省書): <choice><orig reg="卄卄"><g ref="#P2634-010-01"/></orig><expan>菩薩</expan></choice> 省書.jpg

P-2634-0010

<choice>阿〻難〻<expan>阿難阿難</expan></choice> 重文例2.png

P-3664-0511

Repetition Mark (重文): <choice>種〻<expan>種種</expan></choice> 重文例3.png

P-3664-0500

Notes inline or marginal in the Ms (雙行夾注, double-column interlinear notes): <note resp="hand1" rendition="#inline-para">在舒州一名思空山</note> 雙行夾注.jpg

P-3559-0567

Subtitles: <hi rendition="#subtitle">并序</hi> 副標.jpg

P-2634-0001

Corrections by project: <choice><sic>光濡</sic><corr>先儒</corr></choice><note>見《左傳‧春秋序》。</note>不取 專案訂正例.jpg

P-2634-0038r

<choice><sic>期</sic><corr>斯</corr></choice> (error due to similar shape) 形近而誤.jpg

P-2640-0112v

<orig reg="妄"><g ref="#A01309-003"/></orig> (error due to similar sound; phonetic loan) 假借.jpg

P-2640-0105v

Verse lines (偈文): <lg><l><choice><orig><g ref="#A02941-036"/></orig><reg>稽</reg></choice>首<choice><orig><g ref="#A03222-001"/></orig><reg>善</reg></choice>知識<space type="verseSpacing" unit="char" extent="1"/><damage><choice><orig type="Ext-A">䏻</orig><reg>能</reg></choice>令<choice><orig><g ref="#P2634-002-08"/></orig><reg>護</reg></choice></damage>本心</l></lg> 偈文.jpg

P-2634-0002r

Extra characters (衍字): <surplus><add place="inline-right">元</add></surplus> 衍字.jpg

P-3664-0332

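Advanced Notes: Principles of Transcription

  • Principle 1. Whatever script the original uses (regular, running, cursive, etc.), the text is transcribed in regular script (楷書).
  • Principle 2. Where Unicode provides the glyph, the original form is transcribed as written. For example, 㘴 is kept and not changed to the Ministry of Education standard form (正字) 坐.
  • Principle 3. Differences that arise from handwriting are distinguished only where the MoE Dictionary of Chinese Character Variants records them. For example, for the component 工 the dictionary distinguishes 空 from the variant shown in A02955-004.png, but it records no such distinction for 差 or 江, so none is made for those characters.
  • Principle 4. Where damage to the document touches a character but does not hinder reading, no markup is added. Where the damage does affect the character but it can still be identified (on its own or with the help of another witness), it is tagged, taking 達 as an example, <damage>達</damage>, which has its own printed form in the project's publications. If the character cannot be identified, use <unclear reason="illegible" unit="char" extent="1"/>, where extent="1" means one character; it is printed as ▯. If part of the document is missing, use <damage unit="char" extent="3"/>, where extent="3" means that other witnesses have three characters in the missing portion; it is printed as ▯▯▯. In short: damaged but legible characters are tagged <damage> around the character; damaged and illegible ones print as ▯.
  • Principle 5. Old punctuation added to the Ms by later readers is tagged <pc resp="hand2">.</pc>; where such punctuation has itself been deleted, it is tagged <del resp="hand2"><pc resp="hand2">.</pc></del>.
  • Principle 6. Encoding of characters missing from Unicode (缺字): if the MoE Dictionary of Chinese Character Variants lists the character, its number is used; if the dictionary does not list it but 全字庫 (CNS 11643) does, the 全字庫 number is used; if neither lists it, it is coded as shelf number, line number and character position, e.g. P3436-023-02. If such a character occurs only once in the project and is in neither reference, the editor may at discretion give <reg>達</reg> (taking 達 as an example) or substitute another project glyph; special cases are annotated. A project glyph such as <orig reg="詩"><g ref="#S-10484-01-09"/></orig> is created only if the form occurs more than once. Otherwise it is treated as calligraphic variation and given as <reg>.
  • Principle 7. Corrections to the content of a manuscript (<corr>) are used only for a unique manuscript or where all witnesses share the error. If the reader can recover omissions, superfluous characters or errors by collating another witness, no correction is made.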
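The precedence described in Principle 6 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the ID layout (shelf mark without its hyphen, line padded to three digits, position padded to two) is inferred from examples in this manual such as P3436-023-02 and S4272-022-14.

```python
def project_gaiji_id(shelfmark, line, position):
    """Build a project-level glyph ID such as 'P3436-023-02'.

    Assumed layout, inferred from the manual's examples: shelf mark without
    hyphen or dot, line number padded to 3 digits, position padded to 2.
    """
    compact = shelfmark.replace("-", "").replace(".", "")
    return f"{compact}-{line:03d}-{position:02d}"


def choose_gaiji_ref(moe_id, cns_id, shelfmark, line, position):
    """Pick the value for <g ref="...">: MoE dictionary first, then 全字庫,
    and only then a project ID built from the manuscript location."""
    if moe_id:
        return f"#{moe_id}"
    if cns_id:
        return f"#{cns_id}"
    return f"#{project_gaiji_id(shelfmark, line, position)}"
```

For instance, `choose_gaiji_ref(None, None, "S-4272", 22, 14)` falls through to the project ID, while any MoE number passed as the first argument takes priority.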
Non-Unicode Variants - attested
  1. The variant character is not in Unicode.
  2. It is attested in the "Dictionary of Chinese Character Variants" 教育部異體字字典 (Ministry of Education, RoC, 2012). Current query interface: http://dict2.variants.moe.edu.tw/variants/.
  3. <g>@ref points to a header item which references the character number of the variant in the MoE Dictionary.
  4. It can be represented by a semantically equivalent common character (通用字).
Ex.1: S-4272-0002:

為除忘相<orig reg="修"><g ref="#A03335-004"/></orig>行六度

OrigRegChoice.png
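As a rough illustration of how this markup can be consumed, the following Python sketch lists each attested variant in a transcribed line together with its regular character. The function name and the wrapping `<ab>` element are assumptions made for the example; only the `<orig reg="…"><g ref="…"/></orig>` pattern comes from this manual.

```python
import xml.etree.ElementTree as ET


def variant_readings(fragment):
    """Return (regular character, glyph id) pairs for every
    <orig reg="..."><g ref="..."/></orig> in a line of transcription."""
    # Wrap the fragment so that mixed text and elements parse as one tree.
    root = ET.fromstring(f"<ab>{fragment}</ab>")
    pairs = []
    for orig in root.iter("orig"):
        g = orig.find("g")
        if g is not None and orig.get("reg"):
            pairs.append((orig.get("reg"), (g.get("ref") or "").lstrip("#")))
    return pairs
```

Applied to the line of Ex.1, this yields the pair of 修 and the MoE number A03335-004.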
Non-Unicode Variants - unattested (newly added by the project)
  1. The character is neither in Unicode nor in the MoE Dictionary.
  2. Use this only for characters whose structure (strokes) is clearly legible.
Ex.1: S-4272-0022:

度眾生過去<orig reg="逢"><g ref="#S4272-022-14"/></orig>无量恒

Reg1.png
"Unclear" Characters
  1. <unclear> is very much open to interpretation. It is influenced strongly by the quality of the facsimile and the level of paleographic skill.
  2. We use it in this project when the character and its stroke structure are not recognizable on their own, but only by comparing with other versions.
  3. All <unclear> characters are understood as 通用字. This form of regularization differs from <reg>, however: with <unclear> the intended variant is unknown, whereas with <reg> the shape and stroke structure of the variant character can be seen.
Ex.1: P-3703-0002:

無有邊<unclear>畔坐</unclear>

Unclear1.png
Significant spaces
  1. Intentional, significant space before new sections or quotations (both occur in Ex.1).
  2. No <space> is needed at the end of a Ms folio.
Ex.1: S-4272-0008 - S-4272-0010:

為中道<space unit="char" extent="2"/>苐三齊朝 人年十四遇達摩禪師 真登佛果<space unit="char" extent="1"/>楞伽経云

Space1.png
Character(s) added in the Ms.
  1. Character(s) added by a scribe in the Ms.
  2. @place gives a rough description of where to find the addition.
Ex.1: S-4272-0009:

禪師俗<add place="inline-right">性</add>姖武窂人

Add1.png
Character(s) Overwriting Other Character(s)

If the overwritten character is illegible, mark it with an empty element (Ex.1 uses <del unit="char" extent="1"/>); if it is legible, transcribe the character; if unsure, use <unclear>.

Ex.1: S-4272-0021:

為是知眾生識<subst><del unit="char" extent="1"/><add>心</add></subst>自度

Subst1.png
Damaged but recognizable characters

<damage> is similar to <unclear> in that the text provided should be considered 通用字, as the variant cannot be distinguished clearly. (The character in the example could also be read as 忕 or 𢗗; the reading given here follows another witness.)

Ex.1: P-3703-0001 :

時<damage>狀</damage>𠰥

Damage1.png
Unrecognizable characters due to accidental damage (tearing, breaking, smearing, blotting, smudging etc.) with later annotation
  1. Ink seeping through from the verso renders the character 法 partially illegible and another character completely illegible.
  2. Probably the latter character was originally deleted, and the first scribe (resp="hand1") added a 有 next to the line. This addition also became blotted as the ink seeped through, but is still inferable.
  3. Later someone added a quickly written 法 next to the partially damaged 法, and a 有 below the damaged first addition inline-right. This was probably someone else (@resp="hand2"), because it must have occurred after the verso text had been written, and the Ms contains several other cases where characters damaged by seeping ink were clarified.
  4. Assuming the blotting is due to ink seeping through the paper, the series of events was: someone wrote the text, then something else was written on the verso, and the ink seeping through blotted the recto text. A later reader clarified the unclear characters on the recto with a dry brush, rewriting them to the right.
Ex.1: P-3703-0007:

In the header:
<profileDesc>
  <creation>
    <listChange>
      <change xml:id="stage1">The manuscript is written; corrections were made by the scribe.</change>
      <change xml:id="stage2">The verso is written. Ink seeps through, blotting some characters.</change>
      <change xml:id="stage3">A later hand clarifies characters that were blotted out.</change>
    </listChange>
  </creation>
</profileDesc>

非<unclear>離</unclear>生<damage change="#stage2">法</damage><add change="#stage3" resp="hand2" place="inline-right">法</add><damage change="#stage2"><del change="#stage1" resp="hand1" unit="char" extent="1"/><add change="#stage1" place="inline-right" resp="hand1">有</add></damage><add change="#stage3" resp="hand2" place="inline-right">有</add>无生龍
Ex.2: P-3703-0010:

一切圡木<damage change="#stage2">瓦</damage><add place="inline-right" change="#stage3">瓦</add>石

  1. The original character (probably 瓦) becomes illegible through ink seeping through from the verso.
  2. A later hand clarifies the illegible section and writes 瓦 next to it, on the right.
DamageAdd1.png

DamageAdd2.png
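To see which interventions belong to which stage in Ex.1 and Ex.2, the @change pointers can be grouped mechanically. The Python sketch below is only an illustration: the function name and the wrapping `<ab>` element are assumptions, and the stage identifiers are those declared in the `<listChange>` of the header above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def interventions_by_stage(fragment):
    """Group every element carrying @change by the stage it points to.

    Returns a dict mapping stage id -> list of (tag, text content) pairs,
    in document order.
    """
    root = ET.fromstring(f"<ab>{fragment}</ab>")
    stages = defaultdict(list)
    for el in root.iter():
        change = el.get("change")
        if change:
            stages[change.lstrip("#")].append((el.tag, "".join(el.itertext())))
    return dict(stages)
```

For Ex.2 this returns one `damage` entry under `stage2` and one `add` entry under `stage3`, matching the two steps described for that example.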

Reverse Mark (倒乙符號, レ点)
  1. Use Unicode character 'IDEOGRAPHIC ANNOTATION REVERSE MARK' ㆑ (U+3191, 雁點 / レ点) as the reverse mark, within <add place="inline-right"> </add>.

See also: Media:敦煌古代的標點符號.pdf; Wikipedia: 訓読

Ex.1: P-3436-0037:

亦出<add place="inline-right">㆑</add>不扵有

Retten1.png
Repetition / Iteration Mark (叠字符號)
  1. Use Unicode character 'VERTICAL IDEOGRAPHIC ITERATION MARK' 〻 (U+303B, 踊り字) as the repetition mark (重文).

See also: Iteration marks

重文例3.png P-3664-0500
Abbreviations (省書符號)
  1. The same character 〻 (U+303B) serves as the abbreviation mark. Use <choice>阿〻難〻<expan>阿難阿難</expan></choice>
  2. All <expan> content is understood to be regularized and cannot contain further <unclear> etc.
  3. When a line break (<lb/>) immediately precedes the abbreviation mark, tag only the mark itself (see Ex 3).
Ex 1:P-3664-0511

<choice><abbr>阿〻<reg>難</reg>〻</abbr><expan>阿難阿難</expan></choice>

重文例2.png
Ex 2:P-2634-0010

<choice>卄卄<expan>菩薩</expan></choice>

省書.jpg
Ex 3:S-2054-0325

<lb xml:id="S-2054-0325"/><choice><abbr>〻</abbr><expan>色</expan></choice>

重文例3.jpg
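The `<choice>` pattern used in these examples lends itself to two readings of the same line: the marks as written, or their expansion. The following Python sketch shows one way to switch between them. The function names and the wrapping `<ab>` element are assumptions for the example, and only top-level `<choice>` elements of a line are handled.

```python
import xml.etree.ElementTree as ET


def _text_without(el, skip):
    """Text content of el, leaving out child elements named `skip`."""
    parts = [el.text or ""]
    for sub in el:
        if sub.tag != skip:
            parts.append("".join(sub.itertext()))
        parts.append(sub.tail or "")
    return "".join(parts)


def resolve_choices(fragment, expand=True):
    """Read a line either with <expan> content (expand=True) or with the
    abbreviation and repetition marks as written (expand=False)."""
    root = ET.fromstring(f"<ab>{fragment}</ab>")
    out = [root.text or ""]
    for child in root:
        expan = child.find("expan") if child.tag == "choice" else None
        if expan is None:
            out.append("".join(child.itertext()))
        elif expand:
            out.append("".join(expan.itertext()))
        else:
            out.append(_text_without(child, skip="expan"))
        out.append(child.tail or "")
    return "".join(out)
```

With `expand=False` the line of Ex 1 reads 阿〻難〻; with the default it reads 阿難阿難.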

Font Tools

  • Installing Unicode Super-CJK Fonts v6.0 is recommended.

Principles for the Published Editions

The project will ultimately publish the marked-up texts in book form, in two parts: A. a digital diplomatic transcription (數位文字摹本), which stays as close as possible to the original document in layout and character forms; B. a normalized, annotated transcription (標準字體標注本), in which all characters are converted to present-day standard forms and modern punctuation and notes are added.
For the sake of appearance and readability, the following principles apply:

  • Where a scribe corrects his own work, the diplomatic version shows the original and the correction (for complex sequences, only the state after the last correction), while the regularized output gives only the final version, the one intended by the scribe.
  • Where a second hand has made corrections or added reading marks (e.g. P.3664, l.580, l.619), we show such interventions in the diplomatic edition, but give only the text of the original scribe in the regularized aligned version. We therefore do not present a genetic edition of the text.
  • At times it is unclear whether a change was made by the original scribe or a later redactor. Here we have to decide from the context and from features such as the ink. Do we see the same ink? Are there many other changes by a later redactor in that Ms.?
  • Punctuation: our own modern punctuation is added to the regularized version only. Where a second hand has added punctuation to the Ms, this is shown in the diplomatic edition (e.g. P.3664, or P.3777, l.506 ff.) but not in the regularized version.
  • Where different reading marks occur together, they are not distinguished. In the diplomatic transcription the difference between the two hands is not expressed; we add a mark wherever either the red or the white hand made one.

XML to InDesign Rules (XML 轉 InDesign 規則)

Disambiguation

Print1 = Diplomatic Transcription (數位文字摹本)

Print2 = Normalized Transcription (標準字體標注本)


Damage

1. Characters are missing completely due to paper damage.

TEI: <damage unit="char" extent="2"/>

Print 1+2: Both transcriptions use a hollow-square place-holder (□). (Converted to <span>□□</span>)


2. Character is partly damaged, but legible on its own.

TEI: <damage>字</damage>

Since the exact form of the glyph cannot be ascertained, in such cases both transcriptions give the regularized character.

Print 1: Rendered in the reg character style. (Converted to <span rend="reg">字</span>)

Print 2: Rendered in the expan character style. (Converted to <span rend="expan">字</span>)

3. Due to paper damage, a character is not legible on its own, but can be ascertained from other witnesses.

TEI: <unclear reason="damage">字</unclear>

Print1: Use the hollow-square place-holder (□). (Converted to <span>□</span>)

Print2: Use the regularized transcription given in <unclear>, in the reg character style. (Converted to <span rend="reg">字</span>)


Two other uses of <unclear>

4. Characters that are written illegibly or unclearly, so that they can be ascertained only from other witnesses.

TEI: <unclear>字</unclear>

Print 1: Use the question mark (?). (Converted to <span>?</span>)

Print 2: Give the regularized character in the expan character style. (Converted to <span rend="expan">字</span>)


5. Characters that are written unclearly and cannot be ascertained even from other witnesses (e.g. because there are no other witnesses).

TEI: <unclear unit="char" extent="1"/>

Print 1+2: Both use the question mark (?). (Converted to <span>?</span>)
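The five rules above can be read as a small decision table. The Python sketch below is one possible rendering of it, not the project's actual conversion code: the function name is hypothetical, and repeating the place-holder once per character for multi-character content or for extent values above 1 is an assumption generalized from the `<span>□□</span>` example in rule 1.

```python
import xml.etree.ElementTree as ET


def damage_to_span(element_xml, print_version):
    """Convert one <damage> or <unclear> element to a <span> string
    for Print 1 (diplomatic) or Print 2 (normalized)."""
    el = ET.fromstring(element_xml)
    text = "".join(el.itertext())
    extent = int(el.get("extent", "1"))
    if el.tag == "damage":
        if not text:                                  # rule 1: characters lost
            return f"<span>{'□' * extent}</span>"
        rend = "reg" if print_version == 1 else "expan"   # rule 2
        return f'<span rend="{rend}">{text}</span>'
    if el.tag == "unclear":
        if not text:                                  # rule 5: unreadable
            return f"<span>{'?' * extent}</span>"
        if el.get("reason") == "damage":              # rule 3
            if print_version == 1:
                return f"<span>{'□' * len(text)}</span>"
            return f'<span rend="reg">{text}</span>'
        if print_version == 1:                        # rule 4
            return f"<span>{'?' * len(text)}</span>"
        return f'<span rend="expan">{text}</span>'
    raise ValueError(f"unsupported element: {el.tag}")
```

Calling it with `'<unclear reason="damage">字</unclear>'` gives the place-holder for Print 1 and the reg-styled character for Print 2, as in rule 3.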


Punctuation Characters/Marks <pc>

1. Our punctuation, added by the project (no @resp on <pc>)

TEI: <pc>,</pc>

Print1: Hidden, not shown.

Print2: Shown.


2. Reverse mark ㆑ in the Ms

TEI: <orig reg="念中">中<pc resp="hand">㆑</pc><orig type="Ext-D" reg="念">𫝹</orig></orig>

Print1: Follow the arrangement of the Ms: show the reverse mark in the reverse-mark character style (converted to <span rend="pc-reverse">㆑</span>) and the characters in the order and form in which they appear in the Ms.

Print2: Display the characters in the correct order and in normalized form.


3. Punctuation in the Ms

<pc resp="handPunct">.</pc>

Print 1: Little red dots. (Converted to <span rend="dot">.</span>)

Print 2: Not shown.


4. "Deleted" punctuation in the Ms.

<del resp="hand"><pc resp="handPunct">.</pc></del>

Print 1: Shown as ø, a crossed-out red dot. (Converted to <span rend="del-dot">ø</span>)

Print 2: Hidden, not shown.
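The four punctuation rules can likewise be sketched as a single function. Again this is a hedged illustration rather than the project's converter: the function name is hypothetical, project punctuation in Print 2 is assumed to be emitted as plain text, and the reordering of characters around a reverse mark in Print 2 is assumed to be handled by the enclosing `<orig reg="…">`, so the mark itself yields nothing there.

```python
import xml.etree.ElementTree as ET

REVERSE_MARK = "\u3191"  # ㆑ IDEOGRAPHIC ANNOTATION REVERSE MARK


def render_pc(element_xml, print_version):
    """Render a <pc> (or <del><pc/></del>) for Print 1 or Print 2.
    An empty string means the mark is hidden in that version."""
    el = ET.fromstring(element_xml)
    if el.tag == "del" and el.find("pc") is not None:      # rule 4
        return '<span rend="del-dot">ø</span>' if print_version == 1 else ""
    if el.tag != "pc":
        raise ValueError(f"unsupported element: {el.tag}")
    mark = el.text or ""
    if el.get("resp") is None:                              # rule 1: project punctuation
        return mark if print_version == 2 else ""
    if mark == REVERSE_MARK:                                # rule 2
        return f'<span rend="pc-reverse">{mark}</span>' if print_version == 1 else ""
    # rule 3: punctuation written in the Ms
    return f'<span rend="dot">{mark}</span>' if print_version == 1 else ""
```

Dispatching on the absence of @resp mirrors rule 1, which identifies project punctuation precisely by that missing attribute.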


Normalizations (direct use of the standard form)

Used in cases where the character is relatively clearly visible, but the writing is not clear enough to identify which variant is used or to describe its shape in terms of components, so that it cannot serve as the basis for a new glyph. Such characters are tagged <reg> and given directly in the standard form.

TEI: <reg>字</reg>

Print 1: Use the reg character style. (Converted to <span rend="reg">字</span>)

Print 2: Use the normalized version in the ordinary character style. (Converted to <span>字</span>)

(How does this differ from a non-Unicode character? No, it can't here.)

reg inside choice

(I don't think there are any //choice/reg left.)

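Surplus characters (衍字) surplus

Suspected to be stray scribbles in the scroll; shown in neither Print 1 nor Print 2.

Sheet or page break: milestone

<milestone unit="sheet" n="P-2460v-01"/>

Print 1: Shown as ‖ rotated by 90 degrees.

Print 2: Shown as ‖.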