
Puzzled. How should RAID-1 and RAID-5 behave in case of read errors?


Postby tech17 » Wed May 31, 2017 1:05 am

Hi! I am using an AS3204T NAS with two WD Red 3 TB disks in RAID-1 (mirroring).

Recently I got plenty (around 100) of read errors on Disk 2, reported in the "System Log" both when I read some files on the NAS and when I ran a "Disk Doctor" scan.
Disk 1 still looks perfect, with no errors reported by Disk Doctor. Disk 2 is still part of the RAID array; it was not "dropped". The LED turns red when the NAS encounters an error, but after a reboot it turns green again (until the next error).

Before I replace Disk 2 with a new one and rebuild the RAID, I would like to understand something that troubles me about my NAS's capabilities in general. There are currently two recently written files on my RAID which I cannot read: reading them stalls (some sectors seem to take a long time to read) and finally ends with an I/O error.
I cannot read them even though Disk 1 has no errors at all (!). I expected that in mirroring mode (RAID-1), when the NAS encounters a read error, it would fetch the other copy of the sector from the other disk, so that no error is reported at the file system level.
I also expected that the NAS would then rewrite the broken sector with the correct data, effectively repairing the broken disk, at least for as long as the disk accepts rewrites, possibly with the disk controller remapping the broken sectors.

This is what the Linux RAID documentation says: "a read-error will [...] cause md to attempt a recovery by overwriting the bad block. i.e. it will find the correct data from elsewhere, write it over the block that failed, and then try to read it back again." ( https://linux.die.net/man/4/md ). So I expected the Asustor NAS to behave this way, but from what I can see in my particular case, this behavior does not seem to be present.

Any help? Can anyone clarify what may actually be happening in my case, and whether the Asustor NAS recovers broken sectors in RAID-1 (or in RAID-5, where, according to the Linux documentation, the data may also be recovered on the fly)? I feel really frustrated now, understanding neither what is happening nor to what extent my data is actually protected on the NAS. Say I get read errors on both Disk 1 and Disk 2 (at different sectors). Then all my data is in fact still present across the two disks, but I will not be able to recover the RAID: if I remove one of the disks, the other one will have errors and the RAID cannot be rebuilt. And... should I really throw a disk away after just one read error? I probably have some misconceptions here. Any help?

Re: Puzzled. How should RAID-1 and RAID-5 behave in case of read errors?

Postby orion » Wed May 31, 2017 10:41 am

tech17 wrote:The thing is, there are currently two files on my raid which I wrote recently which I cannot read (I get some delays when I read them (some sectors look to take time to be read, then finally an I/O error).
I cannot read them even though the Disk 1 has no any error (!). I expected that in the mirroring mode (RAID-1) when NAS encounters a read error, it is able to access another disk to fetch the other copy of the sector data and so no errors on the level of file system will be reported.

This should mean that you got read errors on both disks at the same time. If you know how to open an ssh session with PuTTY, you can run the "dmesg" command to see whether that is the case. However, I'm not sure why you see errors for only one disk in the ADM log. Anyway, "dmesg" should be the log to trust. You may post it here if you can.

Re: Puzzled. How should RAID-1 and RAID-5 behave in case of read errors?

Postby tech17 » Wed May 31, 2017 12:36 pm

orion wrote:This case should mean that you got read errors on those 2 disks at the same time. If you know how to use putty by ssh session, you can execute "dmesg" command to see if it's the case. However, I'm not sure why you see only one disk error in ADM log. Anyway, "dmesg" should be the trusted log. You may post it here if you can.


In fact, I did not realize what the actual problem was until I logged in over ssh and tried to copy the file from there. The RAID itself probably did not report an error; it was rather some other layer, maybe SMB or my Windows computer, which probably hit a timeout because reading the file hung for a long time, and reported an error to me.

Anyway, from ssh I was able to copy the file (with many long hangs, but the copy eventually finished). Sorry for misinforming the forum :oops: , there was probably no I/O error at the file system level (above the RAID) at all (but there were, of course, errors while reading the second disk).

Now I can clearly see with the help of 'dmesg' that the RAID indeed redirects to the first disk after encountering an error on the second. :o Thanks for the advice!

Still, I do not understand whether the RAID tries at all to repair broken sectors when it encounters them during regular disk access, as the 'md' man page seems to document. I made a number of attempts to copy the same problematic file, and I got hangs again and again; dmesg reports mostly the same sectors on each copy, as if they had not been recovered. Any help with that?

Also, whatever the answer to the previous question is, would you recommend using 'scrubbing' ( echo check > ...md/sync_action ) on an Asustor NAS to repair bad sectors periodically? Is it safe? I could not find any recommendations about that on the forum.
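
For reference, here is a minimal Python sketch of what requesting such a scrub amounts to. The md man page describes writing check (or repair) to md/sync_action in the array's sysfs directory, and says that a read error hit during the scrub goes through the normal read-error handling, so good data is written back over the bad block. The directory passed in (something like /sys/block/md1/md) is an assumption based on the array name md1 in the log below; whether ADM exposes it this way is not confirmed here. The function names are made up for illustration, and on a real system this needs root.

```python
from pathlib import Path


def start_scrub(md_dir: Path, action: str = "check") -> str:
    """Ask md to scrub the array whose sysfs directory is md_dir.

    md_dir is assumed to look like /sys/block/md1/md (the array name is a guess).
    Returns the sync_action value read back after the request.
    """
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    sync_action = md_dir / "sync_action"
    current = sync_action.read_text().strip()
    if current != "idle":
        # A resync, recovery or another scrub is already running; leave it alone.
        return current
    sync_action.write_text(action + "\n")
    return sync_action.read_text().strip()


def scrub_report(md_dir: Path) -> dict:
    """Read back the current sync state and the mismatch counter."""
    return {
        "sync_action": (md_dir / "sync_action").read_text().strip(),
        "mismatch_cnt": int((md_dir / "mismatch_cnt").read_text().strip()),
    }
```

Progress of a running scrub can be followed in /proc/mdstat, and mismatch_cnt is only meaningful once the scrub has finished.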

Thank you for the help!

BTW, here is one of my dmesg logs, taken while copying the problematic file once. I can send others if you need.

[97000.341913] ata2.00: exception Emask 0x0 SAct 0x180 SErr 0x0 action 0x0
[97000.348775] ata2.00: irq_stat 0x40000008
[97000.352888] ata2.00: failed command: READ FPDMA QUEUED
[97000.358194] ata2.00: cmd 60/00:38:a0:0e:dd/08:00:2c:01:00/40 tag 7 ncq 1048576 in
[97000.358194] res 41/40:00:27:12:dd/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97000.374471] ata2.00: status: { DRDY ERR }
[97000.378634] ata2.00: error: { UNC }
[97000.384589] ata2.00: configured for UDMA/133
[97000.389139] md/raid1:md1: sdb4: rescheduling sector 5038476960
[97000.395332] ata2: EH complete
[97010.549278] md/raid1:md1: redirecting sector 5038476960 to other mirror: sda4
[97018.798549] ata2.00: exception Emask 0x0 SAct 0x7e00 SErr 0x0 action 0x0
[97018.805541] ata2.00: irq_stat 0x40000008
[97018.809614] ata2.00: failed command: READ FPDMA QUEUED
[97018.814920] ata2.00: cmd 60/40:48:00:08:db/05:00:2c:01:00/40 tag 9 ncq 688128 in
[97018.814920] res 41/40:00:b0:0b:db/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97018.831050] ata2.00: status: { DRDY ERR }
[97018.835230] ata2.00: error: { UNC }
[97018.840028] ata2.00: configured for UDMA/133
[97018.844478] md/raid1:md1: sdb4: rescheduling sector 5038344192
[97018.850629] ata2: EH complete
[97025.862340] ata2.00: exception Emask 0x0 SAct 0x3e0000 SErr 0x0 action 0x0
[97025.869480] ata2.00: irq_stat 0x40000008
[97025.873591] ata2.00: failed command: READ FPDMA QUEUED
[97025.878984] ata2.00: cmd 60/60:a8:40:0d:db/01:00:2c:01:00/40 tag 21 ncq 180224 in
[97025.878984] res 41/40:00:80:0d:db/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97025.895227] ata2.00: status: { DRDY ERR }
[97025.899372] ata2.00: error: { UNC }
[97025.904107] ata2.00: configured for UDMA/133
[97025.908581] md/raid1:md1: sdb4: rescheduling sector 5038345536
[97025.914648] ata2: EH complete
[97032.926152] ata2.00: exception Emask 0x0 SAct 0x3c000000 SErr 0x0 action 0x0
[97032.933453] ata2.00: irq_stat 0x40000008
[97032.937506] ata2.00: failed command: READ FPDMA QUEUED
[97032.942807] ata2.00: cmd 60/40:d0:a0:0e:db/05:00:2c:01:00/40 tag 26 ncq 688128 in
[97032.942807] res 41/40:00:e7:0e:db/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97032.959069] ata2.00: status: { DRDY ERR }
[97032.963229] ata2.00: error: { UNC }
[97032.968019] ata2.00: configured for UDMA/133
[97032.972519] md/raid1:md1: sdb4: rescheduling sector 5038345888
[97032.978666] ata2: EH complete
[97088.661512] md/raid1:md1: redirecting sector 5038344192 to other mirror: sda4
[97125.790644] md/raid1:md1: redirecting sector 5038345536 to other mirror: sda4
[97142.421021] md/raid1:md1: redirecting sector 5038345888 to other mirror: sda4
[97159.360864] ata2.00: exception Emask 0x0 SAct 0x1fe0 SErr 0x0 action 0x0
[97159.367802] ata2.00: irq_stat 0x40000008
[97159.371874] ata2.00: failed command: READ FPDMA QUEUED
[97159.377241] ata2.00: cmd 60/10:28:00:00:c3/07:00:2c:01:00/40 tag 5 ncq 925696 in
[97159.377241] res 41/40:00:30:05:c3/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97159.393381] ata2.00: status: { DRDY ERR }
[97159.397549] ata2.00: error: { UNC }
[97159.402396] ata2.00: configured for UDMA/133
[97159.406926] md/raid1:md1: sdb4: rescheduling sector 5036769280
[97159.413037] ata2: EH complete
[97166.424668] ata2.00: exception Emask 0x0 SAct 0x3f8000 SErr 0x0 action 0x0
[97166.431819] ata2.00: irq_stat 0x40000008
[97166.435902] ata2.00: failed command: READ FPDMA QUEUED
[97166.441198] ata2.00: cmd 60/f0:a8:10:07:c3/00:00:2c:01:00/40 tag 21 ncq 122880 in
[97166.441198] res 41/40:00:50:07:c3/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97166.457474] ata2.00: status: { DRDY ERR }
[97166.461640] ata2.00: error: { UNC }
[97166.466416] ata2.00: configured for UDMA/133
[97166.470912] md/raid1:md1: sdb4: rescheduling sector 5036771088
[97166.477009] ata2: EH complete
[97224.810322] md/raid1:md1: redirecting sector 5036769280 to other mirror: sda4
[97243.259813] md/raid1:md1: redirecting sector 5036771088 to other mirror: sda4
[97254.197902] ata2.00: exception Emask 0x0 SAct 0x8000 SErr 0x0 action 0x0
[97254.204855] ata2.00: irq_stat 0x40000008
[97254.208945] ata2.00: failed command: READ FPDMA QUEUED
[97254.214283] ata2.00: cmd 60/00:78:a0:96:db/08:00:2c:01:00/40 tag 15 ncq 1048576 in
[97254.214283] res 41/40:00:37:97:db/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97254.230628] ata2.00: status: { DRDY ERR }
[97254.234757] ata2.00: error: { UNC }
[97254.239602] ata2.00: configured for UDMA/133
[97254.244092] md/raid1:md1: sdb4: rescheduling sector 5038380704
[97254.250306] ata2: EH complete
[97265.051589] md/raid1:md1: redirecting sector 5038380704 to other mirror: sda4
[97275.415282] ata2.00: exception Emask 0x0 SAct 0x70 SErr 0x0 action 0x0
[97275.422073] ata2.00: irq_stat 0x40000008
[97275.426151] ata2.00: failed command: READ FPDMA QUEUED
[97275.431458] ata2.00: cmd 60/a0:20:00:88:da/06:00:2c:01:00/40 tag 4 ncq 868352 in
[97275.431458] res 41/40:00:00:8a:da/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97275.447659] ata2.00: status: { DRDY ERR }
[97275.451820] ata2.00: error: { UNC }
[97275.456598] ata2.00: configured for UDMA/133
[97275.461071] md/raid1:md1: sdb4: rescheduling sector 5038311424
[97275.467217] ata2: EH complete
[97315.434511] md/raid1:md1: redirecting sector 5038311424 to other mirror: sda4
[97326.470166] ata2.00: exception Emask 0x0 SAct 0x3c00 SErr 0x0 action 0x0
[97326.477114] ata2.00: irq_stat 0x40000008
[97326.481227] ata2.00: failed command: READ FPDMA QUEUED
[97326.486587] ata2.00: cmd 60/40:50:a0:8e:dc/05:00:2c:01:00/40 tag 10 ncq 688128 in
[97326.486587] res 41/40:00:d0:90:dc/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97326.502940] ata2.00: status: { DRDY ERR }
[97326.507113] ata2.00: error: { UNC }
[97326.511907] ata2.00: configured for UDMA/133
[97326.516396] md/raid1:md1: sdb4: rescheduling sector 5038444192
[97326.522593] ata2: EH complete
[97357.419145] ata2.00: exception Emask 0x0 SAct 0x10 SErr 0x0 action 0x0
[97357.425904] ata2.00: irq_stat 0x40000008
[97357.430012] ata2.00: failed command: READ FPDMA QUEUED
[97357.435335] ata2.00: cmd 60/08:20:a0:92:dc/00:00:2c:01:00/40 tag 4 ncq 4096 in
[97357.435335] res 41/40:00:a0:92:dc/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97357.451326] ata2.00: status: { DRDY ERR }
[97357.455517] ata2.00: error: { UNC }
[97357.460322] ata2.00: configured for UDMA/133
[97357.464840] ata2: EH complete
[97357.476086] md/raid1:md1: read error corrected (8 sectors at 5038707360 on sdb4)
[97369.692494] md/raid1:md1: redirecting sector 5038444192 to other mirror: sda4
[97376.968260] ata2.00: exception Emask 0x0 SAct 0x60000000 SErr 0x0 action 0x0
[97376.975564] ata2.00: irq_stat 0x40000008
[97376.979632] ata2.00: failed command: READ FPDMA QUEUED
[97376.984981] ata2.00: cmd 60/00:e8:00:c0:da/08:00:2c:01:00/40 tag 29 ncq 1048576 in
[97376.984981] res 41/40:00:07:c5:da/00:00:2c:01:00/40 Emask 0x409 (media error) <F>
[97377.001340] ata2.00: status: { DRDY ERR }
[97377.005496] ata2.00: error: { UNC }
[97377.010279] ata2.00: configured for UDMA/133
[97377.014739] md/raid1:md1: sdb4: rescheduling sector 5038325760
[97377.020884] ata2: EH complete
[97385.351982] md/raid1:md1: redirecting sector 5038325760 to other mirror: sda4
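
To see how long each failed read stalls before md falls back to the healthy mirror, the "rescheduling" and "redirecting" lines in a log like the one above can be paired by sector number. The following is a small Python sketch; the regular expressions are written against the message wording in this particular log and may need adjusting for other kernel versions, and the function name is made up for illustration.

```python
import re

# Patterns follow the md/raid1 messages exactly as they appear in the log above.
RESCHED = re.compile(
    r"\[\s*(\d+\.\d+)\] md/raid1:(\w+): (\w+): rescheduling sector (\d+)")
REDIRECT = re.compile(
    r"\[\s*(\d+\.\d+)\] md/raid1:(\w+): redirecting sector (\d+) to other mirror: (\w+)")


def redirect_delays(dmesg_text):
    """Return [(sector, seconds)]: time from 'rescheduling' to 'redirecting' per sector."""
    pending = {}   # sector -> timestamp of the 'rescheduling' message
    delays = []
    for line in dmesg_text.splitlines():
        m = RESCHED.search(line)
        if m:
            pending[int(m.group(4))] = float(m.group(1))
            continue
        m = REDIRECT.search(line)
        if m:
            sector = int(m.group(3))
            if sector in pending:
                started = pending.pop(sector)
                delays.append((sector, round(float(m.group(1)) - started, 6)))
    return delays
```

Fed with the output of dmesg, this lists one delay per redirected read. In the excerpt above the gaps range from about 10 seconds to well over a minute, which matches the long hangs seen while copying.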

Re: Puzzled. How should RAID-1 and RAID-5 behave in case of read errors?

Postby orion » Wed May 31, 2017 2:35 pm

According to the md man page: "If the md driver detects a write error on a device in a RAID1, RAID4, RAID5, RAID6, or RAID10 array, it immediately disables that device (marking it as faulty) and continues operation on the remaining devices." "a read-error will instead cause md to attempt a recovery by overwriting the bad block. i.e. it will find the correct data from elsewhere, write it over the block that failed, and then try to read it back again. If either the write or the re-read fail, md will treat the error the same way that a write error is treated, and will fail the whole device."
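
The read-error handling quoted above can be illustrated with a toy model in Python. This is only a sketch of the described behavior (find good data elsewhere, overwrite, re-read, fail the device if that does not work), not the actual md code. ToyDisk, mirrored_read and the rewrite_heals switch are invented for the illustration.

```python
class ToyDisk:
    """In-memory disk; sectors listed in `unreadable` raise on read until rewritten."""

    def __init__(self, name, data, unreadable=(), rewrite_heals=True):
        self.name = name
        self.data = list(data)
        self.unreadable = set(unreadable)
        self.rewrite_heals = rewrite_heals  # False models a sector that stays bad
        self.faulty = False

    def read(self, sector):
        if sector in self.unreadable:
            raise IOError(f"{self.name}: media error at sector {sector}")
        return self.data[sector]

    def write(self, sector, value):
        self.data[sector] = value
        if self.rewrite_heals:
            # Models a rewrite (or firmware remap) that makes the sector readable again.
            self.unreadable.discard(sector)


def mirrored_read(disks, sector):
    """Read one sector from a toy mirror and try to repair copies that failed."""
    failed = []
    for disk in disks:
        if disk.faulty:
            continue
        try:
            value = disk.read(sector)
            break
        except IOError:
            failed.append(disk)
    else:
        raise IOError(f"sector {sector} unreadable on every mirror")
    for disk in failed:
        disk.write(sector, value)      # overwrite the bad block with good data
        try:
            disk.read(sector)          # then read it back
        except IOError:
            disk.faulty = True         # re-read failed: fail the whole device
    return value
```

In this model, a disk whose sector heals on rewrite stays in the array, and one whose sector stays unreadable is marked faulty on the first repair attempt. Neither outcome matches a disk that stays in the array while the same sectors keep failing, which is what makes the log above puzzling.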

In your case, that means the md driver successfully writes good data back to the failing disk. I think the write actually goes to the disk cache (inside the HDD). Then the md driver reads the sector back successfully (that should come from the disk cache too). So your "sdb" disk is not kicked out by the md driver. After the disk cache is flushed, you will encounter the read error again on the same sector. Disk caches behave differently on different models. In my own case some time ago, the md driver kicked out my failed disk directly. In that degraded mode, I could read / write data at full speed without read-error delays.

Anyway, I think your sdb device is bad. You had better replace it as soon as possible.
Oh, I wouldn't do 'scrubbing' periodically. It impacts disk I/O performance, and fewer accesses might extend the HDD's life. However, that's only my personal opinion.

Re: Puzzled. How should RAID-1 and RAID-5 behave in case of read errors?

Postby tech17 » Thu Jun 01, 2017 2:47 am

Thank you for the reply. Very helpful!

orion wrote: In your case, that means md driver writes good data back to failed disk successfully. I think the write actually goes to disk cache (inside HDD). Then, md driver reads sector data back successfully (that should come from disk cache too). So your "sdb" disk is not kicked off by md driver. After disk cache flushes, you will encounter read error again on the same sector. Disk cache behaves differently on different model.


I see... But that would mean that the disk firmware does not remap a broken sector, but reuses it for the restoring write even though it cannot be read back reliably afterwards?! The standard story about sector remapping is that the disk firmware detects bad (unreadable) sectors during regular reads (or disk scans, if any are run) and marks them as pending reallocation (remapping). As soon as a read of such a sector succeeds (by some small chance) or new data is written to it, the firmware finally remaps the sector to reserved spare space and writes the data there. It looks like in my case no remapping was done at all; at least SMART reports no remapped sectors (while I have more than 100 broken sectors!). So, instead of suspecting Asustor or Linux, should I actually blame the WD disk for stubbornly reusing bad sectors instead of remapping them? Could it be buggy disk firmware, a deliberate feature of WD disks, or a feature of hard disks in general that I do not know about?

orion wrote:Anyway, I think your sdb device is bad. You'd better to replace it as soon as possible.


That's true. With so many broken sectors, the disk is probably going to die. But I still have one healthy disk in the RAID-1, my data is also backed up, and what I really want right now is to understand what my Asustor NAS (or the WD disk, or whatever) is actually capable of, so that I know what to expect in the future. Currently my HDs are under warranty, but in the future I do not think I can afford to throw away a disk from the RAID as soon as it gets one broken sector, so I want to learn how this situation is handled, or is supposed to be handled. Linux promises to fix it by rewriting the sector, WD promises to "cooperate" by remapping a broken sector, so why on earth does none of that work? Why does my RAID still get stuck for tens of seconds while reading these broken sectors? I planned to put four 8 TB disks in my RAID. For such big disks there is a non-zero probability that some sectors go bad, and it looks ridiculous to me to throw away such an expensive disk because of one broken sector. Then a sector on another disk goes bad, and then what? I would rather not buy 8 TB disks at all in that case. Or mount them as regular disks and resync them manually, emulating a "mirror" RAID mode; still better than RAID-1 as it turns out... :mrgreen:
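
The "non-zero probability" for big disks can be put into rough numbers with a back-of-the-envelope Python sketch. It assumes an unrecoverable read error rate of 1 per 10^14 bits read, a datasheet-style upper bound often quoted for consumer drives. That figure is not from this thread, and real drives frequently do better. It also assumes errors are independent, which is a strong simplification.

```python
import math


def p_at_least_one_ure(bytes_read, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading bytes_read bytes.

    Assumes independent errors at a constant per-bit rate (ure_per_bit is an
    assumed datasheet-style bound, not a measured value).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, via log1p/expm1 so the tiny per-bit rate is not rounded away.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# Reading one whole disk end to end, as a rebuild or scrub would:
for size_tb in (3, 8):
    print(size_tb, "TB:", round(p_at_least_one_ure(size_tb * 10**12), 3))
```

Under these assumptions, a full read of an 8 TB disk has roughly a 47% chance of hitting at least one unreadable sector, so occasional bad sectors are something the array has to cope with rather than a rare event.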


orion wrote:Oh, I won't do 'scrubbing' periodically. It impacts to disk IO performance. And it might be able to extend HDD life cycle with fewer accesses. However, it's only my personal opinion.


My concern is what happens if sectors go bad on both disks in RAID-1. I may have some file A with a broken sector on disk 1 and some file B with a broken sector on disk 2. No scrub is done and neither file A nor file B is read, so everything looks perfect to naive me. One day I read file B, see the log reporting that disk 2 has a bad sector, remove disk 2, insert a new disk and... my RAID fails to rebuild because of the other broken sector on disk 1! What do you see as a reasonable way to avoid such situations?
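
The scenario can be written down as simple set arithmetic. In the Python sketch below, the sector numbers are hypothetical and the function name is invented. It assumes that a sector which is bad on only one disk can be rewritten from the other copy, which is exactly the behavior in question in this thread.

```python
def mirror_exposure(bad_on_disk1, bad_on_disk2):
    """Classify bad sectors of a two-disk mirror (sector numbers are per-member offsets)."""
    bad1, bad2 = set(bad_on_disk1), set(bad_on_disk2)
    return {
        # Bad on both copies: no good data left anywhere.
        "lost_now": sorted(bad1 & bad2),
        # If a disk is pulled, every bad sector on the survivor becomes a hole.
        "lost_if_disk1_removed": sorted(bad2),
        "lost_if_disk2_removed": sorted(bad1),
        # Bad on exactly one copy: a scrub could rewrite these from the other mirror.
        "repairable_by_scrub": sorted(bad1 ^ bad2),
    }


# File A has a bad sector on disk 1, file B has one on disk 2 (made-up numbers).
print(mirror_exposure({100}, {200}))
```

With one latent bad sector on each disk, nothing is lost while both disks stay in the array, but pulling either disk turns the other disk's bad sector into data loss. That is the argument for finding and rewriting such sectors before replacing a disk.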

Re: Puzzled. How should RAID-1 and RAID-5 behave in case of read errors?

Postby orion » Thu Jun 01, 2017 11:57 am

tech17 wrote:I see... But that would mean that the disk firmware does not remap a broken sector but uses it again for a restoring write operation even though it cannot be read physically reliably after that at all?! I.e. the standard story told about disk sectors remapping is that the disk firmware detects bad (unreadable) sectors during regular read operations (or disk scans, if made) and marks them as pending relocation (remapping). As soon as a read operation on this sector succeeds (by any small chance) or a new data is written to the sector, the firmware finally remaps the sector to a new (reserved) space and writes the data there. Looks like in my case no remapping was done at all, at least SMART reports no sectors remapping (while I have more than 100 broken sectors!). So, instead of suspecting Asustor or Linux, should I actually blame WD disk for not remapping bad sectors, but reusing them stubbornly? Can it be a buggy disk firmware or some pre-designed feature of WD disk or a feature of any HD disk of which I do not know?

Nowadays, I believe HDDs do reallocate bad sectors. After all, dead-on-arrival units or production failures cost the vendors money. However, the reallocation pool is limited, and the mechanism / algorithm varies by vendor and by model. A SMART value is actually a statistical value: we can only know that it is bad if the SMART value is lower than the threshold value. What would you make of SMART reallocation = 120? My guess is that your case is caused by the reallocation pool running out. Of course, it could also be caused by buggy disk firmware; that is very hard to prove. Anyway, returning the disk is a good way to let them fix any design flaws (if that is the reason).
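
To make the difference between the normalized SMART value and the raw counter concrete, here is a small Python sketch that parses the attribute table printed by smartctl -A. The sample text is hypothetical and is not output from the disk in this thread. The column layout is the usual smartctl one, and the rule used (an attribute has failed when VALUE is at or below a non-zero THRESH) follows the smartctl documentation.

```python
# Hypothetical sample in the usual `smartctl -A` layout; NOT taken from the disk in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       104
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

WATCHED = {5: "reallocated", 197: "pending", 198: "offline_uncorrectable"}


def parse_smart_attributes(text):
    """Extract normalized value, threshold and raw count for the watched attributes."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header or unrelated line
        attr_id = int(parts[0])
        if attr_id in WATCHED:
            rows[WATCHED[attr_id]] = {
                "value": int(parts[3]),
                "thresh": int(parts[5]),
                "raw": int(parts[9]),  # plain integer for these three attributes
            }
    return rows


def has_failed(row):
    """True when the normalized value is at or below a non-zero threshold."""
    return row["thresh"] > 0 and row["value"] <= row["thresh"]


rows = parse_smart_attributes(SAMPLE)
for name, row in rows.items():
    print(name, "raw =", row["raw"], "failed =", has_failed(row))
```

In the made-up sample, more than a hundred pending sectors do not trip any threshold, and the reallocated count stays at zero until those sectors are actually rewritten. The raw counters for attributes 5 and 197 are therefore the numbers worth watching, rather than the pass/fail verdict.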

tech17 wrote:Or have them mounted just as regular disks and resync them manually emulating a "mirror" RAID mode - still better than having RAID-1 as it turns out... :mrgreen:

Life should be easy. The machine should do "that" thing. I wanna enjoy life. :lol:

tech17 wrote:My concern is what happens if some sectors get broken on both disks in RAID-1. I.e. I may have some file A for which some sector is broken on disk 1 and some file B for which some sector is broken on disk 2. No scrub is done, no files A or B are read so everything looks perfect for a naive me. One day I read the file B, see the log reporting my disk 2 has a bad sector, remove disk 2, insert a new disk and... my RAID will fail to rebuild because of another broken sector on disk 1! What do you see as a reasonable way to avoid such situations?

Ha, that's indeed an interesting case. I don't know whether the md driver has an option for this. Currently, the md driver drops an HDD when there is a write error. In the case you mentioned, file B is lost (if disk 1 is dropped because of the file A error), although you know you could get file B back. However, the md driver actually works at the device level. RAID-1 can only tolerate one disk (device) failure. If I were you, I would choose RAID-6 if I were worried about a 2-disk failure.
Last edited by orion on Fri Jun 02, 2017 10:14 am, edited 1 time in total.

Re: Puzzled. How should RAID-1 and RAID-5 behave in case of read errors?

Postby tech17 » Fri Jun 02, 2017 3:00 am

RAID-6 is supposed to protect me from two disks failing at the same time, but not from a slowly growing number of sector errors.
If there is no way to repair these sectors, either automatically or with some scrub utility, during the regular RAID lifetime, then one day data will be lost while the disks are still healthy, or the death of even one disk will leave the RAID-6 unable to rebuild. Yes, even a RAID-6. And before I finally lose data, during "normal" operation my RAID will hang trying to read bad sectors again and again, even though the information is accessible through the other disks. That is exactly what I am experiencing now... And the only alternative is to replace any disk as soon as it gets one bad sector???

Well, it looks like all this RAID business is not as simple as it first looked. :?

I would much appreciate comments from the Asustor staff on this topic, if possible.

I'll probably need to contact their support about this issue, but taking into account that the NAS manual recommends throwing away disks with bad sectors... what can I expect other than the same recommendation?

I think that for a regular customer who buys a consumer-class NAS like this for home use, it is financially impossible to throw away every 8-10 TB disk after a single error. And a disk manufacturer would probably not accept a disk with a single error for warranty exchange. What then? Do I have misconceptions about hard disks? Are they supposed to live for 3 years without a single error? Even big 10 TB disks? Even if I have four of them?

And you see, my NAS + WD disk are not even willing to relocate a broken sector, which is a perfectly normal operation, at least on a desktop computer. Isn't RAID a nightmare, after seeing that?

Re: Puzzled. How should RAID-1 and RAID-5 behave in case of read errors?

Postby core » Wed Jul 01, 2020 4:36 pm

tech17 wrote:Would much appreciate comments from the Asustor staff on this topic here, if possible.

I'll probably need to contact their support with this issue, but taking into account that the NAS manual recommends to throw away disks with bad sectors... what can I expect other than getting the same recommendation?

I think, for a regular customer who buys NAS of such a consumer class for home usage, it is financially impossible to throw away every disk of 8-10Tb after a single error. And a disk producer would probably not accept a disk with a single error for warranty exchange. What then? Do I have misconceptions about Hard Disks? Are they supposed to live for 3 years without any single error? Even big disks of 10Tb? Even if I have four of them?

And you see, my NAS+WD disk are not willing even to relocate a broken sector, which is perfectly normal operation at least for a desktop computer. Is RAID not a nightmare after I see that?


Good questions. Did you ever find more answers elsewhere?

For my Asustor NAS I bit the bullet and have two spare drives for an eventual failure in my RAID-5. I don't bother with RAID-6 since I maintain an automated backup of the RAID. I don't view a secondary failure during rebuild as a total disaster (yes, I'd have downtime as I start over). I'm counting on periodic bad block scanning to warn me of bad disks ahead of time and my backup to save me from simultaneous 2 drive failure.

I'm probably a bit naive and lucky. I ran my previous Intel Entry Storage System SS4200-E NAS (a.k.a. The Suitcase) for 12 years from 2007 ... with the original four 1 TB Hitachi Drives! It is actually still operational. For basic share drive storage it was great. But it just has no modern features or expandability.

I don't understand why Asustor staff isn't regularly in the forum giving expert answers.
AS6208T + AS6004U

Re: Puzzled. How should RAID-1 and RAID-5 behave in case of read errors?

Postby Nazar78 » Fri Jul 03, 2020 1:54 am

IIRC the hard disk firmware won't automatically relocate a sector on a read error, which gives you the chance to try reading the sector again. The pending relocation should occur when a write to the sector fails, so we somehow need to force a write to the failing sectors to kick-start the process, e.g. using dd or hdparm.

Interestingly though, as discussed above, mdadm should have kicked out the 2nd disk, marking it as failed.

To the OP: since you already have a backup, if you just want to test, you can manually have mdadm fail the second drive, then write to the affected sectors with dd or hdparm, or just do a destructive write test on the whole drive using badblocks. When done, run smartctl to check the relocation count. You can also try adding the drive back to the array while waiting for the replacement.
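
Tools like dd and hdparm address sectors by whole-disk LBA, while the sector numbers in the md/raid1 messages earlier in the thread refer to the member partition sdb4 and so are counted from a different origin. The whole-disk LBA of a failing sector can instead be decoded from the "res" line of the ATA error. The Python sketch below assumes the libata field order status/error:nsect:lbal:lbam:lbah/hob_error:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. As a sanity check on that layout, the count bytes of the first "cmd" line in the log decode to 0x0800 sectors, which matches the 1048576 bytes reported next to it. The device name /dev/sdb is an assumption, and the command is only built, not run.

```python
import re

# 'res' taskfile as printed by libata: SS/EE:nn:ll:mm:hh/EE:nn:LL:MM:HH/dd (assumed order).
RES = re.compile(
    r"res\s+([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})")


def failing_lba(res_line):
    """Decode the 48-bit LBA reported in an ATA 'res' line (whole-disk sector number)."""
    m = RES.search(res_line)
    if not m:
        raise ValueError("no ATA result taskfile found in line")
    low = m.group(2).split(":")    # error, nsect, lbal, lbam, lbah
    high = m.group(3).split(":")   # hob_error, hob_nsect, hob_lbal, hob_lbam, hob_lbah
    # Most significant byte first: hob_lbah, hob_lbam, hob_lbal, lbah, lbam, lbal.
    lba_bytes = [high[4], high[3], high[2], low[4], low[3], low[2]]
    return int("".join(lba_bytes), 16)


def read_sector_command(lba, device="/dev/sdb"):
    """Build (but do not run) a non-destructive hdparm call that reads one sector."""
    return ["hdparm", "--read-sector", str(lba), device]


line = "res 41/40:00:27:12:dd/00:00:2c:01:00/40 Emask 0x409 (media error) <F>"
lba = failing_lba(line)
print(lba, " ".join(read_sector_command(lba)))
```

Reading the sector this way only confirms whether it is still unreadable. The matching hdparm write option destroys the sector's contents, so it belongs after the drive has been failed out of the array, as described above.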
AS5304T - 16GB DDR4 - [40TB N300 RAID10 + 5 Bay USB: 8TB RAID5 & 480GB SSD for Apps]

Re: Puzzled. How should RAID-1 and RAID-5 behave in case of read errors?

Postby tech17 » Tue Jul 07, 2020 11:03 pm

core wrote:Good questions. Did you ever find more answers elsewhere?


Unfortunately, no. I did not make any progress in studying the subtleties of RAID, nor did I find any better explanation.

It looks to me like Asustor now supports a scrub utility in their GUI for RAID-5 and RAID-6 (apparently not for RAID-1).
To the best of my memory, I ran a scrub from the command line on my RAID-1 about 3 years ago, with very controversial results.

It's a shame, but I am afraid of using RAID-5 or RAID-6, at least on my Asustor NAS; I do not want to run into a failure to rebuild the RAID.
On the other hand, I now have trouble even rebuilding my RAID-1 (I thought this at least must be trivial). I am not alone; here is another user who failed to rebuild his RAID-1 in similar circumstances: viewtopic.php?f=27&t=9505&p=30594&hilit=larger+disks#p30594 - for me there are some very surprising observations and facts(?) in that post.

My gut feeling is, surprisingly, that all this RAID stuff is immature; it may well be that some features work better for RAID-5 than for RAID-1, and I have no idea whether it is Linux's fault or the fault of this particular NAS product's customization. But, again, maybe I should not draw conclusions, having little expertise in this field.
Still, as an end user, I am very surprised that some of the supposedly most trivial things RAID should provide are actually a total pain and disaster.
