🤯 When hardware problems become mental (damn Raspino)
By OctoSpacc
Caution
The content of this page has been entirely machine-translated into English, from Italiano. Therefore, it might contain any kind of errors.
Until about 2 months ago, my kingdom of Rasperino was at its greatest splendor: the Misskey instance, set up just 2 weeks earlier, was going great, and by then (almost) everything seemed destined to keep going well...
And instead, problems arose. Let's say it took me a little while to notice, because they developed in a strangely gradual way.
The initial cracks
I noticed the first truly strange thing towards the beginning of December, when I realized that the system could crash while trying to perform a very banal but specific operation: creating a large archive of files (compressed or not)... with any program.
This small inconvenience, in turn, caused a secondary problem... I'll get to that.
However, I didn't pay too much attention to it. How could I? Everything else, if left untouched, worked, apart from some slight performance degradation due to Misskey's own workload.
The first collapse
But then another 2 weeks of relative peace passed, and I woke up to find the server crashed, dying badly after each of my manual restarts (unplugging and replugging the power supply, the only way). After 2 days of very messy searching I had absolutely not understood the general cause of the problem, only its most serious symptom, and by then I was almost convinced that in some mystical way Misskey alone managed to take down the entire server, which went back to working properly without that particular software running. Well, there was some logic in my reasoning, given that average CPU and RAM usage was high anyway (even if not completely saturated).
In the following days, however, with a few tests I discovered that the server was not crashing because of the microblogging server itself, but because of what serves as its database: PostgreSQL (in Docker). If I ran Misskey on my PC but let it connect to the database on the Raspino, within a few seconds, with so many notes arriving, the fruity server died.
By then, in any case, it was clear to me that I needed to install something else, because I was convinced that Misskey was too heavy, and oh well.
For 2 days I tried Epicyon, a platform that is peculiar to say the least... and the experience was not exactly pleasant, but I think it was thorough, given that I squeezed four thousand words out of it in my dedicated article. Immediately afterwards I decided to try another piece of software that I had never seen before, namely GoToSocial. With the latter, despite it being declared alpha quality (and in fact it has some problems), I got along - past tense, because alas now it's all over... I'm getting there, I'm getting there - very well, but that's not the point.
Increasingly suspicious problems
Just a few days later, those strange crashes started appearing again, but this time they were decidedly suspicious, because overall system resource usage was low. I tried to read the system logs productively, but my patience had by then reached its limit, and with it my lucidity, so every day I looked for the slightest suspicious but legible error, fixating on that and completely ignoring the illegible error that was always in front of me.
In the end, just out of desperation, and not because I had reasoned my way to that being the problem, I decided to change the microSD card, and now that I've done it, I bitterly regret... not having tried it sooner! That was the problem, reckless Maremma!
The funny thing is that the day before I had run a check of the file systems (ext4), both on the card and on my USB HDD, and everything had come out (approximately) clean, so I had ruled out hardware problems a priori: "if the files are not corrupt..." I thought.
At about the same time (fate decided that help had to arrive late!), however, someone gave me a hand in understanding what the hell those indecipherable lines were saying, which looked something like this...
Dec 27 06:32:35 kernel: [27230.964650] INFO: task kworker/2:0:21874 blocked for more than 860 seconds.
Dec 27 06:32:35 kernel: [27230.964693] Tainted: G C 5.15.76-v7+ #1597
Dec 27 06:32:35 kernel: [27230.964709] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 27 06:32:35 kernel: [27230.964723] task:kworker/2:0 state:D stack: 0 pid:21874 ppid: 2 flags:0x00000000
Dec 27 06:32:35 kernel: [27230.964760] Workqueue: events_freezable mmc_rescan
Dec 27 06:32:35 kernel: [27230.964801] Backtrace:
Dec 27 06:32:35 kernel: [27230.964824] [<80a4ff38>] (__schedule) from [<80a50a7c>] (schedule+0x7c/0x134)
Dec 27 06:32:35 kernel: [27230.964868] r10:81f90800 r9:ffffe000 r8:00000000 r7:00000000 r6:60000013 r5:8d368000
Dec 27 06:32:35 kernel: [27230.964884] r4:ffffe000
Dec 27 06:32:35 kernel: [27230.964896] [<80a50a00>] (schedule) from [<8083f658>] (__mmc_claim_host+0xe0/0x238)
Dec 27 06:32:35 kernel: [27230.964929] r5:81f90a18 r4:00000002
Dec 27 06:32:35 kernel: [27230.964942] [<8083f578>] (__mmc_claim_host) from [<8083f7e8>] (mmc_get_card+0x38/0x3c)
Dec 27 06:32:35 kernel: [27230.964979] r10:baaf8205 r9:00000000 r8:baaf8200 r7:00000080 r6:baaf4b80 r5:00000000
Dec 27 06:32:35 kernel: [27230.964994] r4:81f91800
Dec 27 06:32:35 kernel: [27230.965007] [<8083f7b0>] (mmc_get_card) from [<80849238>] (mmc_sd_detect+0x24/0x7c)
Dec 27 06:32:35 kernel: [27230.965039] r5:81f90800 r4:81f90800
Dec 27 06:32:35 kernel: [27230.965052] [<80849214>] (mmc_sd_detect) from [<80841ca4>] (mmc_rescan+0xac/0x2d4)
Dec 27 06:32:35 kernel: [27230.965083] r5:81f90800 r4:81f90a7c
Dec 27 06:32:35 kernel: [27230.965096] [<80841bf8>] (mmc_rescan) from [<8013e158>] (process_one_work+0x250/0x57c)
Dec 27 06:32:35 kernel: [27230.965140] r9:00000000 r8:baaf8200 r7:00000080 r6:baaf4b80 r5:8e898f00 r4:81f90a7c
Dec 27 06:32:35 kernel: [27230.965153] [<8013df08>] (process_one_work) from [<8013e4e4>] (worker_thread+0x60/0x5c4)
Dec 27 06:32:35 kernel: [27230.965195] r10:baaf4b80 r9:81003d00 r8:baaf4b98 r7:00000008 r6:baaf4b80 r5:8e898f18
Dec 27 06:32:35 kernel: [27230.965210] r4:8e898f00
Dec 27 06:32:35 kernel: [27230.965223] [<8013e484>] (worker_thread) from [<80146804>] (kthread+0x178/0x194)
Dec 27 06:32:35 kernel: [27230.965264] r10:837c4000 r9:8d3a7e74 r8:00000000 r7:8e898f00 r6:8013e484 r5:8285ee00
Dec 27 06:32:35 kernel: [27230.965279] r4:8d0d3640
Dec 27 06:32:35 kernel: [27230.965291] [<8014668c>] (kthread) from [<801000d4>] (ret_from_fork+0x14/0x20)
Dec 27 06:32:35 kernel: [27230.965321] Exception stack(0x837c5fb0 to 0x837c5ff8)
Dec 27 06:32:35 kernel: [27230.965341] 5fa0: 00000000 00000000 00000000 00000000
Dec 27 06:32:35 kernel: [27230.965363] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Dec 27 06:32:35 kernel: [27230.965383] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Dec 27 06:32:35 kernel: [27230.965405] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:8014668c
Dec 27 06:32:35 kernel: [27230.965420] r4:8285ee00
Every time an error like this happened, the whole system died very badly: illness to the little bots, death to the HTTP server (nginx), injuries to my article and feed aggregators (wallabag and FreshRSS), and goodbye forever to anything that would let me open a console on Rasperino over the Internet (SSH, Telnet, and even a server set up with netcat). The only thing that kept working was the constant spitting of this exact type of error into the log file.
Now, I know that I'm strong, but with all these strange numbers in the way I absolutely couldn't see words like mmc_get_card or mmc_sd_detect! And so I really didn't understand that maybe, just maybe, the crappy microSD that I had chosen for the Raspi at the beginning of September (among the spare ones at home), when I put this poor little computer back to work as a server, was on its way out.
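For anyone else who goes blind in front of all those hex numbers, here is a minimal sketch of how the function names could be pulled out of such a backtrace, so that the mmc_ ones stand out. It is only an illustration: the helper names (backtrace_symbols, subsystem_hits) and the regular expression are my own assumptions, fitted to the line format of the log above, and not part of any existing tool.

```python
import re

# Matches backtrace frames in the format seen in the log above, e.g.:
#   [<8083f578>] (__mmc_claim_host) from [<8083f7e8>] (mmc_get_card+0x38/0x3c)
# The pattern is an assumption based on those lines only.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] \((\w+)\) from \[<[0-9a-f]+>\] \((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
)


def backtrace_symbols(log_text):
    """Return the function names found in backtrace frames, in order, without repeats."""
    seen = []
    for line in log_text.splitlines():
        match = FRAME.search(line)
        if not match:
            continue  # register dumps and other lines carry no function names
        for name in match.groups():
            if name not in seen:
                seen.append(name)
    return seen


def subsystem_hits(symbols, prefix="mmc_"):
    """Keep only the names that belong to one subsystem, ignoring leading underscores."""
    return [s for s in symbols if s.lstrip("_").startswith(prefix)]


# A few lines copied from the log above, including one register dump.
SAMPLE = """\
Dec 27 06:32:35 kernel: [27230.964896] [<80a50a00>] (schedule) from [<8083f658>] (__mmc_claim_host+0xe0/0x238)
Dec 27 06:32:35 kernel: [27230.964942] [<8083f578>] (__mmc_claim_host) from [<8083f7e8>] (mmc_get_card+0x38/0x3c)
Dec 27 06:32:35 kernel: [27230.965007] [<8083f7b0>] (mmc_get_card) from [<80849238>] (mmc_sd_detect+0x24/0x7c)
Dec 27 06:32:35 kernel: [27230.965039] r5:81f90800 r4:81f90800
"""

print(subsystem_hits(backtrace_symbols(SAMPLE)))
# prints ['__mmc_claim_host', 'mmc_get_card', 'mmc_sd_detect']
```

With the noise stripped away, the only names left are about the memory card, which is exactly the hint I kept missing.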
I don't want to have to resort to commonplaces, but this time there's little to be done! I mean, the photo speaks for itself:
The presence of a renowned brand is not a guarantee of quality, but the absence of a brand is certainly a promise of absent quality.
Although on the PC the old junk card still seems to work - I was able to confirm this because at least I managed to make a data dump - I no longer want to have to deal with stuff like this! So I'm filing it in my mind as bad.
Even more time was then wasted flashing the dump onto a new card, given that the only other two cards I had available at the time were 4 and 32 GB respectively, and I really wanted to fit everything on the 4 GB one (after deleting various logs and caches, because the previous card was 8 GB), but it couldn't be done; so in the end 32 GB it was.
The peace violated
The important thing is that, once I put the new SD in the raspberry server, those terrifying errors no longer occurred, and the big problems disappeared... or at least so I thought, wanted, hoped.
If this article, which should have literally been published at the end of last year, is only being published now, there are reasons. Immediately after I had changed the SD card, I preferred to wait a few days, to see if things had really calmed down, and to avoid claiming victory too soon. I did well!
The suffering disk
Alas, in fact, those other things seen in the logs over the previous days were not complete dead ends (the waters were still troubled), in particular the errors that I immediately recognized as related to the USB disk.
This is something that had already happened to me in the past with another USB adapter for 2.5" SATA disks, even on different machines (back when I used my Nintendo Switch console as a server...), but with the one I use now there had never been any problems. And yet now, as far as I can see, it disconnects from the host at random, killing all the processes that depend on the files on that disk, as if suddenly there were moments in which no combination of SATA adapters and USB cables (both short and long) receives enough power. The disk still works great on the PC, so the problem is clearly the Raspino... but go figure out why!
They tell me that the Raspi's USB-A ports suck by nature[citation needed (?)], but the point is that until recently they worked (all 4)! Has a diode broken in my power supply? Has a capacitor blown up on the board of this damned single-board computer? Is the electricity in my house no longer 230V, but 229V, so that the transformer outputs 4.98 volts instead of 5? ...But what do I know.
Returning to the real world, the only sensible hypothesis seems to me to be this: from repeatedly plugging and unplugging the power connector in its port (micro USB-B 2.0, that big shit!!!), the pins on one side or the pads on the other will have worn out, so their contact surface is smaller, so the electrical resistance is greater, and therefore the device is powered with a slightly lower voltage; when a peripheral needs to draw a lot of current, that's when disaster strikes.
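To put some numbers on that hypothesis, here is a tiny sketch based on Ohm's law (voltage lost across the connector = current × contact resistance). Every value in it is invented for illustration: I never measured the resistance of my connector or the current drawn by the disk, and the function name voltage_at_board is just a label I made up.

```python
def voltage_at_board(supply_volts, current_amps, contact_ohms):
    """Voltage left after the connector: supply minus the I * R drop across the contact."""
    return supply_volts - current_amps * contact_ohms


SUPPLY_VOLTS = 5.0  # nominal output of the power supply

# Hypothetical contact resistances and current draws, NOT measurements.
CONNECTORS = [("healthy connector", 0.05), ("worn connector", 0.30)]
LOADS = [("idle", 0.5), ("disk spinning up", 1.5)]

for connector_label, contact_ohms in CONNECTORS:
    for load_label, current_amps in LOADS:
        volts = voltage_at_board(SUPPLY_VOLTS, current_amps, contact_ohms)
        print(f"{connector_label:18} {load_label:17} {volts:.2f} V")
```

With these made-up figures, the worn connector loses 0.15 V at idle but 0.45 V when the disk pulls more current, which would explain why everything looks fine until a peripheral gets hungry.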
Trying to fix it
Not having another Raspone like it, and not having other 5V 3A power supplies, I will never discover the truth, but somehow I have to find a solution anyway.
After waiting so long that the server problems only got bigger, and the downtime much more frequent, I decided to buy a USB-A Y cable. At worst, even if it doesn't solve the problem, it's always handy to have a cable of this type because - despite violating the USB standard[1] - some devices cause a lot of trouble without it, and some manufacturers of shitty peripherals even recommend using cables of this type in case of problems (and yet they don't include one in the package, indecent!).
Once the cable arrived, I arranged all the connections and noticed something peculiar: the current that comes from the second USB power supply to power the disk can travel back up the cable until it enters the Pi. The problem is not so much the cable, which works and respects all the laws of physics (even if not those of the USB standard), but rather the fact that the Raspberry doesn't even have, I don't know, diodes on its USB-A ports. And it's not a problem I'm the first to discover, just read the official forum. In any case, with a circuit set up like this:
- There are no risks for the equipment or the surrounding environment if you use proper power supplies upstream, and mine should be[2];
- There are practical problems, but also solutions and workarounds: I could, as they suggest on the forum, apply insulating tape to the +5V pad of the USB connector that goes into the Raspantino; but for now there has been no real need, and the only thing I have to pay attention to is that things are powered in this order, those few times when I find myself having to do a hard reset of the system:
- USB disk (connected to the Y cable port);
- Raspi (from its power port);
- After waiting at least ~10 seconds, disk connected to the Raspberry (data connector of the Y cable connected to the Raspberry).
I don't know why, especially considering that it is not needed for soft reboots, but without this procedure the boot can fail.
Finally, rest
In the end, however, hell seems to be over, and the server is now working.
The flames did some damage, however: the databases of many of my hosted services became corrupt, and for 2 in particular (GoToSocial, which I mentioned before, and Peka, a chatbot based on a Markov chain) I only have backups that are too old (from weeks ago) because, with the server dying, my backup scripts never managed to run... and therefore these programs are still offline now, because I haven't yet had the strength to resign myself to restoring the ancient backups.
But buying the cable a little earlier, and turning off the server while waiting? Nope, of course not, eh?
Hoping that things like this won't happen again in the near future, otherwise I will go completely and irrecoverably crazy because of these damned hardware problems, I bid you farewell and wish that you never have to curse as much as I do. 😔
🏷️ Notes and References
- It was a surprise to me too, but the USB standard prohibits Y cables: see Update 72; in translation,
  The use of a "Y" cable (a cable with two A connectors) is prohibited on any USB device. If a USB peripheral requires more power than that allowed by the USB specification for which it was designed, it must be self-powered.
  Well, rules are nice and all, but then reality comes along and things go a little differently. The whole real world uses Y cables without making too much fuss.
- (Both 5V)
  - For the Pi, a 3A power supply (just above what the Raspberry Foundation suggests) which was included in a kit of accessories for the Raspante (computer not included), by Aukru. Oh well, after years it hasn't exploded, the reviews were good anyway, and this brand still sells new power supplies, so it's fine...
  - For additional power, a 1A block that was included in the package of my old low-end Huawei phone (also marketed in Europe), from 2017.