A Rack of One's Own: Difference between revisions
| (5 intermediate revisions by the same user not shown) | |||
| Line 3: | Line 3: | ||
i've never owned "enterprise-class" hardware, nor even really worked with it much. i'm of the Google school: buy COTS by lots, count on it breaking, and work around the failures. xeons and (more recently) epycs never seemed price-competitive (though i seriously considered the former for my [[Schwarzger%C3%A4t|2016 workstation build]]), especially given their reduced clocks. as amd's threadripper emerged, they weren't even holding it down on core count—only recently has sapphire rapids matched my 3970x's dotriacontacore setup (13th generation intel core processors topped out at 24 cores), and they still can't match that marvelous processor's LLC sizes. it just didn't make sense. the available motherboards furthermore always seemed a few years behind with regards to USB. if you didn't intend to take advantage of those fat support deals, the only value proposition seemed to be support for massive amounts of (ECC!) memory, and the server processors' heftier memory controllers (threadripper's cap at four memory controllers was the major reason why i went with a 3970X rather than the 64-core 3990X—how is one supposed to keep all those cores fed with only four DDR4 controllers?). | i've never owned "enterprise-class" hardware, nor even really worked with it much. i'm of the Google school: buy COTS by lots, count on it breaking, and work around the failures. xeons and (more recently) epycs never seemed price-competitive (though i seriously considered the former for my [[Schwarzger%C3%A4t|2016 workstation build]]), especially given their reduced clocks. as amd's threadripper emerged, they weren't even holding it down on core count—only recently has sapphire rapids matched my 3970x's dotriacontacore setup (13th generation intel core processors topped out at 24 cores), and they still can't match that marvelous processor's LLC sizes. it just didn't make sense. the available motherboards furthermore always seemed a few years behind with regards to USB. 
if you didn't intend to take advantage of those fat support deals, the only value proposition seemed to be support for massive amounts of (ECC!) memory, and the server processors' heftier memory controllers (threadripper's cap at four memory controllers was the major reason why i went with a 3970X rather than the 64-core 3990X—how is one supposed to keep all those cores fed with only four DDR4 controllers?). | ||
[[File:Racksolo.jpg]] | [[File:Racksolo.jpg|right|thumb]] | ||
with the advent of 2023, however, my storage situation was becoming untenable. i've already pushed the workstation to its limits; it might be a monstrous CaseLabs Magnum T10, but it's still hardly suitable for more than 24 3.5" drives, and the lack of physical hotswap support for half those drives was really beginning to give me a rash. seagate exos drives are outstanding from a performance and price perspective, but they're <b>loud</b> little fuckers, and especially during a zfs scrub or resilver they were pretty annoying in chorus (watercooling has otherwise silenced my workstation). with another hot atlanta summer in the post, i didn't look forward to keeping all 24 drives below the 60℃ mark in this configuration. furthermore, my current dayjob project benefits greatly from [[DDIO]], present on xeons of the second and later generations (essentially, DDIO allows PCIe devices to stream directly into a directly-connected socket's LLC, bypassing DRAM entirely). i loathe lacking local access to hardware i'm using at work almost as much as i do actually visiting the office; call me old-fashioned, but i want to be able to get my hands on the pieces slinging my bits around, and more importantly to be able to experiment with system firmware settings and topologies. i additionally needed at least three 10G+ nodes in my local network, and two [[100GbE|100G+]] nodes. | with the advent of 2023, however, my storage situation was becoming untenable. i've already pushed the workstation to its limits; it might be a monstrous CaseLabs Magnum T10, but it's still hardly suitable for more than 24 3.5" drives, and the lack of physical hotswap support for half those drives was really beginning to give me a rash. seagate exos drives are outstanding from a performance and price perspective, but they're <b>loud</b> little fuckers, and especially during a zfs scrub or resilver they were pretty annoying in chorus (watercooling has otherwise silenced my workstation). 
with another hot atlanta summer in the post, i didn't look forward to keeping all 24 drives below the 60℃ mark in this configuration. furthermore, my current dayjob project benefits greatly from [[DDIO]], present on xeons of the second and later generations (essentially, DDIO allows PCIe devices to stream directly into a directly-connected socket's LLC, bypassing DRAM entirely). i loathe lacking local access to hardware i'm using at work almost as much as i do actually visiting the office; call me old-fashioned, but i want to be able to get my hands on the pieces slinging my bits around, and more importantly to be able to experiment with system firmware settings and topologies. i additionally needed at least three 10G+ nodes in my local network, and two [[100GbE|100G+]] nodes. | ||
| Line 15: | Line 15: | ||
i'd hoped not to require watercooling, but with that dream dashed i set to work. the idea of in-case radiators was laughable, and anyway why have a rack if you're not gonna stuff it with shit? for a moment i considered immersion cooling, but due to other projects i already had two [[MO-RA3]] 420x420mm external radiators sitting around unused, whereas i had neither knowledge of nor hardware for immersion. there aren't very many LGA2011-compatible waterblocks; i was hoping for HEATKILLER IVs, but neither PerformancePCs nor TitanRig seem to be actually stocking parts as of late, so i settled for Alphacool Eisblock XPX Pro Aurora waterblocks at $50 a throw. as it turned out, i hardly needed the capability of the HEATKILLERS (my processors are rated at a mere 105W), so this was a win cashwise, and the blocks boast lovely ARGB. i went to install the XPXs, only to discover that the squarish LGA2011 socket has a rectangular cousin, the "LGA2011-Narrow", which despite its physical incompatibility apparently didn't rate a new socket designation. i raged skyward for a few moments, and set to designing [https://github.com/dankamongmen/openscad-models/blob/master/lga2011-narrow.scad custom LGA2011-Narrow mounts] in OpenSCAD. i printed them up on my SLA Elegoo Saturn, and—miracle of miracles!—they worked on the first try. that's the power of actually measuring things instead of just trying to eyeball millimeter distances, i guess! i then suffered the first of three major leaks, all of them due to shoddy manufacturing on some no-name G¼ compression fittings. no matter where in my loop i employed them, no matter how solid the connections looked to the eye, these pieces of shit lived to spray neon green coolant all over my electronics, my rugs, and myself. i was stupidly ready to give them one more try (see above regarding cheap bastardhood), when i noticed they didn't even have fucking o-rings. hell, even Challenger had goddamn o-rings...until it didn't, anyway. 
into the trash they went, and a week later i had a phat sack of chungal (chungusy? chungy? chungesque perhaps?) Koolance Blacks, which have served without fail since installation. | i'd hoped not to require watercooling, but with that dream dashed i set to work. the idea of in-case radiators was laughable, and anyway why have a rack if you're not gonna stuff it with shit? for a moment i considered immersion cooling, but due to other projects i already had two [[MO-RA3]] 420x420mm external radiators sitting around unused, whereas i had neither knowledge of nor hardware for immersion. there aren't very many LGA2011-compatible waterblocks; i was hoping for HEATKILLER IVs, but neither PerformancePCs nor TitanRig seem to be actually stocking parts as of late, so i settled for Alphacool Eisblock XPX Pro Aurora waterblocks at $50 a throw. as it turned out, i hardly needed the capability of the HEATKILLERS (my processors are rated at a mere 105W), so this was a win cashwise, and the blocks boast lovely ARGB. i went to install the XPXs, only to discover that the squarish LGA2011 socket has a rectangular cousin, the "LGA2011-Narrow", which despite its physical incompatibility apparently didn't rate a new socket designation. i raged skyward for a few moments, and set to designing [https://github.com/dankamongmen/openscad-models/blob/master/lga2011-narrow.scad custom LGA2011-Narrow mounts] in OpenSCAD. i printed them up on my SLA Elegoo Saturn, and—miracle of miracles!—they worked on the first try. that's the power of actually measuring things instead of just trying to eyeball millimeter distances, i guess! i then suffered the first of three major leaks, all of them due to shoddy manufacturing on some no-name G¼ compression fittings. no matter where in my loop i employed them, no matter how solid the connections looked to the eye, these pieces of shit lived to spray neon green coolant all over my electronics, my rugs, and myself. 
i was stupidly ready to give them one more try (see above regarding cheap bastardhood), when i noticed they didn't even have fucking o-rings. hell, even Challenger had goddamn o-rings...until it didn't, anyway. into the trash they went, and a week later i had a phat sack of chungal (chungusy? chungy? chungesque perhaps?) Koolance Blacks, which have served without fail since installation. | ||
[[File:Penguinrack.jpg|every rack needs a penguin!]] | [[File:Penguinrack.jpg|thumb|every rack needs a penguin!]] | ||
9x Arctic P14s on the MO-RA at 1200rpm plus an EKWB Dual XTOP with 2x D5s keep my ~3L of coolant barely suprambient, moving fluid at about 2.2Lpm at 4Krpm of pumping, with the four Xeons below 30℃ at idle, all of it silent. w00t! i'm using an ESP8266-based control, similar to my existing [[InaMORAta|inaMORAta]] solution (but slightly improved). three sets of Koolance Black QD4 quick disconnects, an XCPC filter, and an EKWB Quantum Torque drainage fitting ensure loop maintainability across almost six meters of EKWB ZMT soft tubing. distilled water plus three bottles of EKWB Acid Green CryoFuel concentrate and a Singularity Computing Protium reservoir complete the successful cooling story. i've drilled two ¾" (technically 20mm) holes through the roof through which the ZMT protrudes, though i'm likely to replace that steel with glass or Lexan so that the internals are visible (i've got a CODI6 lighting up the Aurora LEDs, but it's currently disabled, which is unfortunate as they look fucking awesome). | 9x Arctic P14s on the MO-RA at 1200rpm plus an EKWB Dual XTOP with 2x D5s keep my ~3L of coolant barely suprambient, moving fluid at about 2.2Lpm at 4Krpm of pumping, with the four Xeons below 30℃ at idle, all of it silent. w00t! i'm using an ESP8266-based control, similar to my existing [[InaMORAta|inaMORAta]] solution (but slightly improved). three sets of Koolance Black QD4 quick disconnects, an XCPC filter, and an EKWB Quantum Torque drainage fitting ensure loop maintainability across almost six meters of EKWB ZMT soft tubing. distilled water plus three bottles of EKWB Acid Green CryoFuel concentrate and a Singularity Computing Protium reservoir complete the successful cooling story. 
i've drilled two ¾" (technically 20mm) holes through the roof through which the ZMT protrudes, though i'm likely to replace that steel with glass or Lexan so that the internals are visible (i've got a CODI6 lighting up the Aurora LEDs, but it's currently disabled, which is unfortunate as they look fucking awesome). | ||
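a rough sanity check on the loop figures above, sketched in python. it assumes the coolant behaves like plain water (the concentrate will shift density and specific heat a little), and that all four 105W Xeons dump their full rated power into the loop at once. the constants and function names are illustrative, not taken from the fan/pump controller.

```python
# Back-of-the-envelope loop thermals. Assumption: coolant ~ plain water.
WATER_DENSITY_KG_PER_L = 0.997          # assumed, water near room temperature
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # assumed, plain water


def coolant_delta_t(heat_watts, flow_lpm,
                    density=WATER_DENSITY_KG_PER_L,
                    specific_heat=WATER_SPECIFIC_HEAT_J_PER_KG_K):
    """Steady-state coolant temperature rise (kelvin) across the heat sources."""
    if flow_lpm <= 0:
        raise ValueError("flow rate must be positive")
    mass_flow_kg_per_s = flow_lpm / 60.0 * density
    return heat_watts / (mass_flow_kg_per_s * specific_heat)


def loop_turnover_seconds(volume_l, flow_lpm):
    """Time for the pumps to move one full loop volume."""
    if flow_lpm <= 0:
        raise ValueError("flow rate must be positive")
    return volume_l / flow_lpm * 60.0


if __name__ == "__main__":
    cpu_heat_w = 4 * 105  # four Xeons at their rated 105W each
    print(f"delta-T at 2.2Lpm: {coolant_delta_t(cpu_heat_w, 2.2):.2f} K")
    print(f"turnover of ~3L:   {loop_turnover_seconds(3.0, 2.2):.0f} s")
```

under those assumptions the coolant picks up only about 2.7 K per pass at full CPU load, and the whole ~3L volume cycles in under a minute and a half, which squares with the loop staying close to ambient.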
| Line 24: | Line 24: | ||
the beast at this point will still consume less than a kilowatt at full juice. i have no idea why the chassis ships with 1620W supplies, unless perhaps SAS drives consume ridiculous amounts of power? which, speaking of storage... | the beast at this point will still consume less than a kilowatt at full juice. i have no idea why the chassis ships with 1620W supplies, unless perhaps SAS drives consume ridiculous amounts of power? which, speaking of storage... | ||
[[File:Rackpower.png|thumb|Grafana time series for various power draws]] | |||
the original point of this hoss was to get hard drives out of my workstation, and it was time they went. sixteen 18TB exosen were removed from use, and joined eight more virgin 18TBs. the result is 3x octadrive raidz2s, each boasting 144TB raw and 108TB usable, for a total of 324TB accessible at any given time. at full jam, this is about another 200W of mixed 5V+12V (see "[[Schwarzger%C3%A4t_III#The_drive_problem|The Drive Problem]]" for more details), and generates the system's only noise (though that'll change once the K80s arrive). my office is now several degrees cooler, a happy change i thoroughly welcome. | the original point of this hoss was to get hard drives out of my workstation, and it was time they went. sixteen 18TB exosen were removed from use, and joined eight more virgin 18TBs. the result is 3x octadrive raidz2s, each boasting 144TB raw and 108TB usable, for a total of 324TB accessible at any given time. at full jam, this is about another 200W of mixed 5V+12V (see "[[Schwarzger%C3%A4t_III#The_drive_problem|The Drive Problem]]" for more details), and generates the system's only noise (though that'll change once the K80s arrive). my office is now several degrees cooler, a happy change i thoroughly welcome. | ||
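the capacity arithmetic above can be sketched in a few lines of python. this is a minimal model that assumes double parity per eight-drive group (two drives' worth of space lost per group) and ignores ZFS metadata, padding, and the TB-versus-TiB gap, so real pools will report somewhat less. the function name and dictionary keys are made up for illustration.

```python
def pool_capacity_tb(groups, drives_per_group, drive_tb, parity_drives=2):
    """Nominal capacity of a pool built from identical parity groups.

    Assumption: each group loses exactly `parity_drives` drives of space
    to parity; filesystem overhead is not modeled.
    """
    if drives_per_group <= parity_drives:
        raise ValueError("a group needs more drives than parity devices")
    raw_per_group = drives_per_group * drive_tb
    usable_per_group = (drives_per_group - parity_drives) * drive_tb
    return {
        "raw_per_group": raw_per_group,
        "usable_per_group": usable_per_group,
        "usable_total": groups * usable_per_group,
    }


if __name__ == "__main__":
    # three groups of eight 18TB drives, double parity
    print(pool_capacity_tb(groups=3, drives_per_group=8, drive_tb=18))
```

plugging in three groups of eight 18TB drives reproduces the 144/108/324 figures quoted in the text.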
| Line 31: | Line 33: | ||
so, that's the story of my [[Strangelet|living room supercomputer]]. 96 watercooled Xeon cores, 9984 CUDA cores, 2.3TiB DDR3, 2.8TB NVMe SSD, 324TB SATA3 rust, 280Gb/s networking on a mix of fiber and copper, and 4x 80PLUS Platinum 1280W PSUs, drawing a maximum of ~1200W. the entire rack (machine, lighting, cooling, and all externals) consumes under 600W most of the time: | so, that's the story of my [[Strangelet|living room supercomputer]]. 96 watercooled Xeon cores, 9984 CUDA cores, 2.3TiB DDR3, 2.8TB NVMe SSD, 324TB SATA3 rust, 280Gb/s networking on a mix of fiber and copper, and 4x 80PLUS Platinum 1280W PSUs, drawing a maximum of ~1200W. the entire rack (machine, lighting, cooling, and all externals) consumes under 600W most of the time: | ||
do you think this is some kind of game, son? | |||
'''update 2023-03-14''': it has been made clear to me that this board supports v3 and v4 Xeons, contrary to the documentation (which claims support only up through v2 when using MEM 1.01 boards, as i am). well, excellent. time to get 4x v4 xeons. | |||
'''previously: "[[Transfiguration|transfiguration]]" 2023-02-11''' | '''previously: "[[Transfiguration|transfiguration]]" 2023-02-11''' | ||
[[Category:Blog]] | [[Category:Blog]] | ||