




An Analysis of Data Corruption in the Storage Stack

Lakshmi N. Bairavasundaram, Garth R. Goodson, Bianca Schroeder, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
University of Wisconsin-Madison, Network Appliance, Inc., University of Toronto
laksh, dusseau, garth.goodson

Abstract

An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this paper, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they occur the most.

We find more than 400,000 instances of checksum mismatches over the 41-month period. We find many interesting trends among these instances including: (i) nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise class disk drives, (ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality, and (iii) checksum mismatches across different disks in the same storage system are not independent. We use our observations to derive lessons for corruption-proof system design.

1 Introduction

One of the biggest challenges in designing storage systems is providing the reliability and availability that users expect. Once their data is stored, users expect it to be persistent forever, and perpetually available. Unfortunately, in practice there are a number of problems that, if not dealt with, can cause data loss in storage systems.

One primary cause of data loss is disk drive unreliability [16]. It is well-known that hard drives are mechanical, moving devices that can suffer from mechanical problems leading to drive failure and data loss. For example, media imperfections, and loose particles causing scratches, contribute to media errors, referred to as latent sector errors, within disk drives [18]. Latent sector errors are detected by a drive's internal error-correcting codes (ECC) and are reported to the storage system.

Less well-known, however, is that current hard drives and controllers consist of hundreds-of-thousands of lines of low-level firmware code. This firmware code, along with higher-level system software, has the potential for harboring bugs that can cause a more insidious type of disk error: silent data corruption, where the data is silently corrupted with no indication from the drive that an error has occurred.

Silent data corruptions could lead to data loss more
often than latent sector errors, since, unlike latent sector errors, they cannot be detected or repaired by the disk drive itself. Detecting and recovering from data corruption requires protection techniques beyond those provided by the disk drive. In fact, basic protection schemes such as RAID [13] may also be unable to detect these problems.

The most common technique used in storage systems to detect data corruption is for the storage system to add its own higher-level checksum for each disk block, which is validated on each disk block read. There is a long history of enterprise-class storage systems, including ours, in using checksums in a variety of manners to detect data corruption [3, 6, 8, 22]. However, as we discuss later, checksums do not protect against all forms of corruption. Therefore, in addition to checksums, our storage system also uses file system-level disk block identity information to detect previously undetectable corruptions.

In order to further improve on techniques to handle corruption, we need to develop a thorough understanding of data corruption characteristics. While recent studies provide information on whole disk failures [11, 14, 16] and latent sector errors [2] that can aid system designers in handling these error conditions, very little is known about data corruption, its prevalence and its characteristics. This paper presents a large-scale study of silent data corruption based on field data from 1.53 million disk drives covering a time period of 41 months. We use the same data set as the one used in recent studies of latent sector errors [2] and disk failures [11]. We identify the fraction of disks that develop corruption, examine factors that might affect the prevalence of corruption, such as disk class and age, and study characteristics of corruption, such as spatial and temporal locality. To the best of our knowledge, this is the first study of silent data corruption in production and development systems.

We classify data corruption into three categories based on how it is discovered: checksum mismatches, identity discrepancies, and parity inconsistencies (described in detail in Section 2.3). We focus on checksum mismatches since they are found to occur the most. Our important observations include the following:

(i) During the 41-month time period, we observe more than 400,000 instances of checksum mismatches, 8% of which were discovered during RAID reconstruction, creating the possibility of real data loss. Even though the rate of corruption is small, the discovery of checksum mismatches during reconstruction illustrates that data corruption is a real problem that needs to be taken into account by storage system designers.

(ii) We find that nearline (SATA) disks and their adapters develop checksum mismatches an order of magnitude more often than enterprise class (FC) disks. Surprisingly, enterprise class disks with checksum mismatches develop more of them than nearline disks with mismatches.

(iii) Checksum mismatches are not independent occurrences both within a disk and within different disks in the same storage system.

(iv) Checksum mismatches have tremendous spatial locality; on disks with multiple mismatches, it is often consecutive blocks that are affected.

(v) Identity discrepancies and parity inconsistencies do occur, but affect 3 to 10 times fewer disks than checksum mismatches affect.

The rest of the paper is structured as follows. Section 2 presents the overall architecture of the storage systems used for the study and Section 3 discusses the methodology used. Section 4 presents the results of our analysis of checksum mismatches, and Section 5 presents the results for identity discrepancies, and parity inconsistencies. Section 6 provides an anecdotal discussion of corruption, developing insights for corruption-proof storage
system design. Section 7 presents related work and Section 8 provides a summary of the paper.

2 Storage System Architecture

The data we analyze is from tens-of-thousands of production and development Network Appliance™ storage systems (henceforth called the system) installed at hundreds of customer sites. This section describes the architecture of the system, its corruption detection mechanisms, and the classes of corruptions in our study.

2.1 Storage Stack

Physically, the system is composed of a storage-controller that contains the CPU, memory, network interfaces, and storage adapters. The storage-controller is connected to a set of disk shelves via Fibre Channel loops. The disk shelves house individual disk drives. The disks may either be enterprise class FC disk drives or nearline serial ATA (SATA) disks. Nearline drives use hardware adapters to convert the SATA interface to the Fibre Channel protocol. Thus, the storage-controller views all drives as being Fibre Channel (however, for the purposes of the study, we can still identify whether a drive is SATA or FC using its model type).

The software stack on the storage-controller is composed of the WAFL® file system, RAID, and storage layers. The file system processes client requests by issuing read and write operations to the RAID layer, which transforms the file system requests into logical disk block requests and issues them to the storage layer. The RAID layer also generates parity for writes and reconstructs data after failures. The storage layer is a set of customized device drivers that communicate with physical disks using the SCSI command set [23].

2.2 Corruption Detection Mechanisms

The system, like other commercial storage systems, is designed to handle a wide range of disk-related errors. The data integrity checks in place are designed to detect and recover from corruption errors so that they are not propagated to the user. The system does not knowingly propagate corrupt data to the user under any circumstance.

We focus on techniques used to detect silent data corruption, that is, corruptions not detected by the disk drive or any other hardware component. Therefore, we do not describe techniques used for other errors, such as transport corruptions reported as SCSI transport errors or latent sector errors. Latent sector errors are caused by physical problems within the disk drive, such as media scratches, "high-fly" writes, etc. [2, 18], and detected by the disk drive itself by its inability to read or write sectors, or through its error-correction codes (ECC).

In order
to detect silent data corruptions, the system stores extra information to disk blocks. It also periodically reads all disk blocks to perform data integrity checks. We now describe these techniques in detail.

| Corruption Class | Possible Causes | Detection Mechanism | Detection Operation |
| --- | --- | --- | --- |
| Checksum mismatch | Bit-level corruption; torn write; misdirected write | RAID block checksum | Any disk read |
| Identity discrepancy | Lost or misdirected write | File system-level block identity | File system read |
| Parity inconsistency | Memory corruption; lost write; bad parity calculation | RAID parity mismatch | Data scrub |

Table 1: Corruption classes summary.

[Figure 1 panels: (a) Format for enterprise class disks: a 4 KB file system data block and a 64-byte data integrity segment stored in eight 520-byte sectors. (b) Format for nearline disks: a 4 KB file system data block stored in eight 512-byte sectors, followed by a ninth 512-byte sector holding the 64-byte data integrity segment and 448 unused bytes. (c) Structure of the data integrity segment (DIS): checksum of data block, identity of data block, ..., checksum of DIS.]

Figure 1: Data Integrity Segment. The figure shows the different on-disk formats used to store the data integrity segment of a disk block on (a) enterprise class drives with 520B sectors, and on (b) nearline drives with 512B sectors. The figure also shows (c) the structure of the data integrity segment. In particular, in addition to the checksum and identity information, this structure also contains a checksum of itself.

2.2.1 Data Integrity Segment

In order to detect disk block corruptions, the system writes a 64-byte data integrity segment along with each disk block. Figure 1 shows two techniques for storing this extra information, and also describes its structure. For enterprise class disks, the system uses 520-byte sectors. Thus, a 4-KB file system block is stored along with 64 bytes of data integrity segment in eight 520-byte sectors. For nearline disks, the system uses the default 512-byte sectors and stores the data integrity segment for each set of eight sectors in the following sector. We find that the protection offered by the data integrity segment is well worth the extra space needed to store it.

One component of the data integrity segment is a checksum of the entire 4 KB file system block. The checksum is validated by the RAID layer whenever the data is read. Once a corruption has been detected, the original block can usually be restored through RAID reconstruction. We refer to corruptions detected by RAID-level checksum validation as checksum mismatches.

A second component of the data integrity segment is block identity information. In this case, the fact that the file system is part of the storage system is utilized. The identity is the disk block's identity within the file system (e.g., this block belongs to inode 5 at offset 100). This identity is cross-checked at file read time to ensure that the block being read belongs to the file being accessed. If, on file read, the identity does not match, the data is reconstructed from parity. We refer to corruptions that are not detected by checksums, but detected through file system identity validation as identity discrepancies.

2.2.2 Data Scrubbing

In order to proactively detect errors, the RAID layer periodically scrubs all disks. A data scrub issues read operations for each physical disk block, computes a checksum over its data, and compares the computed checksum to the checksum located in its data integrity segment. If the checksum comparison fails (i.e., a checksum mismatch), the data is reconstructed from other disks in the RAID group, after those checksums are also verified. If no reconstruction is necessary, the parity of the data blocks is generated and compared with the parity stored in the parity block. If the parity does not match the verified data, the scrub process fixes the parity by regenerating it from the data blocks. In a system protected by double parity, it is possible to definitively tell which of the parity or data block is corrupt.

We refer to these cases of mismatch between data and parity as parity inconsistencies. Note that data scrubs are unable to validate the extra file system identity information stored in the data integrity segment.
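
The detection mechanisms of Sections 2.2.1 and 2.2.2 can be illustrated with a minimal Python sketch. It is not the system's implementation. The text gives only the sizes (4 KB block, 64-byte segment, 520-byte and 512-byte sectors) and the three components of the segment. The field widths, the use of CRC-32 as the checksum, single XOR parity, and all function names below are assumptions made purely for illustration.

```python
import struct
import zlib
from functools import reduce

BLOCK_SIZE = 4096  # 4 KB file system block (from the text)
DIS_SIZE = 64      # 64-byte data integrity segment (from the text)

# Sector arithmetic from Section 2.2.1.
assert 8 * 520 == BLOCK_SIZE + DIS_SIZE        # enterprise class: eight 520-byte sectors
assert 9 * 512 == BLOCK_SIZE + DIS_SIZE + 448  # nearline: ninth sector leaves 448 bytes unused

# Hypothetical layout: block checksum (4 bytes), inode (8), offset (8),
# 40 zero pad bytes, followed by a 4-byte checksum of those 60 bytes.
_BODY = struct.Struct("<IQQ40x")
_TAIL = struct.Struct("<I")


def make_dis(data: bytes, inode: int, offset: int) -> bytes:
    """Build a 64-byte segment for one block (CRC-32 is an assumed checksum)."""
    body = _BODY.pack(zlib.crc32(data), inode, offset)
    return body + _TAIL.pack(zlib.crc32(body))


def checksum_ok(data: bytes, dis: bytes) -> bool:
    """RAID-level check: the segment must be intact and match the block data."""
    body = dis[:_BODY.size]
    (dis_sum,) = _TAIL.unpack(dis[_BODY.size:])
    if zlib.crc32(body) != dis_sum:
        return False  # assumed handling: a damaged segment counts as a mismatch
    return _BODY.unpack(body)[0] == zlib.crc32(data)


def file_read_check(data: bytes, dis: bytes, inode: int, offset: int) -> str:
    """File system read: checksum first, then the block identity cross-check."""
    if not checksum_ok(data, dis):
        return "checksum mismatch"
    _, got_inode, got_offset = _BODY.unpack(dis[:_BODY.size])
    if (got_inode, got_offset) != (inode, offset):
        return "identity discrepancy"
    return "ok"


def xor_parity(blocks: list) -> bytes:
    """Single XOR parity over equal-sized blocks (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub_stripe(blocks: list, dises: list, parity: bytes) -> str:
    """Scrub one stripe: verify every checksum, then compare the parity.

    The identity fields are not consulted here, matching the note in the
    text that scrubs cannot validate file system identity information.
    """
    if not all(checksum_ok(d, s) for d, s in zip(blocks, dises)):
        return "checksum mismatch"     # a real scrub would reconstruct the block
    if xor_parity(blocks) != parity:
        return "parity inconsistency"  # a real scrub would regenerate the parity
    return "ok"
```

Under these assumptions, a block whose data and segment are both valid but which is read for the wrong inode or offset, as after a misdirected write, passes `checksum_ok` and is caught only by the identity comparison in `file_read_check`. This is why the identity information complements the checksum rather than duplicating it.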