Retr0id
11 hours ago
So what's the overall hashrate with this approach?
I'll try to calculate it from the information given. 12 parallel instances at a clock speed of 62.5MHz, with 68 clock cycles per hash.
62.5MHz * 12 / 68 = ~11MH/s
That seems... slow? Did I do the math right? How big of an FPGA do you need before this would compete with a GPU, and how much would it cost?
For reference, an RTX 4090 can do 21975.5 MH/s according to hashcat benchmarks.
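The arithmetic above can be checked with a few lines of Python. This is only a sketch using the figures quoted in this comment (12 instances, 62.5 MHz, 68 cycles per hash, and the hashcat number for the RTX 4090); the function and variable names are made up for illustration.

```python
def hashrate_hps(clock_hz: float, instances: int, cycles_per_hash: int) -> float:
    """Hashes per second for `instances` parallel cores, each needing
    `cycles_per_hash` clock cycles to finish one hash."""
    return clock_hz * instances / cycles_per_hash

# Figures quoted in the comment above.
fpga_hps = hashrate_hps(clock_hz=62.5e6, instances=12, cycles_per_hash=68)
gpu_hps = 21975.5e6  # RTX 4090, per the hashcat benchmark figure quoted above

print(f"FPGA: {fpga_hps / 1e6:.2f} MH/s")
print(f"GPU is roughly {gpu_hps / fpga_hps:.0f}x faster")
```

So the ~11 MH/s estimate holds, and the gap to the GPU figure is on the order of 2000x.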
picture
11 hours ago
Quite slow. It's largely due to the author using FPGAs wrong. Clocking down a 7-series Artix to 62.5 MHz means the design is not pipelined correctly/enough. My friend got 1 SHA256 hash per cycle at 300 MHz on 7 series, but slightly fewer instances of the design fit on a chip. Throughput would easily be in the GH/s range.
Keep in mind RTX4090 is 5 nm process node and has a lot more transistors and memory than XC7A100T, which is 28 nm. That's a huge difference in terms of dynamic performance. Also, the two were released about 10 years apart. If you compare RTX4090 against a similarly modern UltraScale part from Xilinx, I believe the FPGA can be notably faster than RTX4090.
benlivengood
11 hours ago
I'm assuming this space has already been heavily optimized by the Bitcoin miners on their way to ASICs.
15155
10 hours ago
Yes, but a designed-for-FPGA SHA256 implementation looks very different than an ASIC SHA256 implementation - the ASIC has far greater routing flexibility and density, and can therefore use far more combinatorial logic between register stages.
(ASIC simulation on an FPGA will retain the combinatorial stages but run at dramatically lower fMax)
benlivengood
4 hours ago
I should have been a little clearer. I meant that the miners spent a brief period optimizing FPGAs before they abandoned them entirely for ASICs, but during that brief period I'm guessing they squeezed as many hashes/watt out of the FPGAs as they could.
picture
11 hours ago
Yes, hard silicon will be another order of magnitude more performant than FPGAs and GPUs, but ASICs properly take on negative value when they're no longer profitable to mine with. (Note that efficiency won't be much better at the same process node. You can just pump more power through each ASIC die.)
Edit - I misread your comment. ASIC designers will use FPGAs to test their design, but it won't be optimized for FPGAs, which have different logic and memory characteristics than ASICs. There aren't many great SHA256 FPGA implementations, largely because there's not that much demand for one.
the8472
10 hours ago
> but ASICs properly take on negative value when they're no longer profitable to mine with
No matmul coin where the hardware could be repurposed for AI stuff?
15155
10 hours ago
Modern BTC ASICs consist of 1600-3200 SHA256 cores and only output nonces for sha256(sha256(btcBlockHeader)) - there's no memory or ability to obtain other output.
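For anyone unfamiliar with what those cores compute, here is a minimal software sketch of the sha256(sha256(header)) nonce search using Python's standard `hashlib`. The all-zero 76-byte header prefix and the very easy 8-bit target are toy values chosen so the loop finishes quickly; they are assumptions for illustration, not real block data, and `find_nonce` is a hypothetical helper name.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """sha256(sha256(data)), the hash applied to Bitcoin block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def find_nonce(header_prefix: bytes, target: int, max_nonce: int = 1 << 20):
    """Try 32-bit nonces in order; return the first whose double-SHA256,
    read as a little-endian integer, is below `target` (None if not found)."""
    for nonce in range(max_nonce):
        header = header_prefix + struct.pack("<I", nonce)  # nonce is the last 4 bytes
        if int.from_bytes(double_sha256(header), "little") < target:
            return nonce
    return None

# Toy inputs (assumptions): 76 zero bytes stand in for the first 76 bytes of an
# 80-byte header, and the target only requires the top 8 bits to be zero.
TOY_PREFIX = bytes(76)
TOY_TARGET = 1 << (256 - 8)

nonce = find_nonce(TOY_PREFIX, TOY_TARGET)
print("found nonce:", nonce)
```

The ASIC does the same loop in hardware and, as described above, reports only the winning nonce rather than the hash itself.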
throwawaymaths
8 hours ago
always thought it might be cool to repurpose fast double sha engines for error detection in storage arrays
throwawaymaths
8 hours ago
matmul isn't a trapdoor function
Retr0id
11 hours ago
Unfortunately I think most of that innovation happened behind closed doors, because everyone wanted to maintain their competitive advantages.
sMarsIntruder
8 hours ago
Yes, ASICs are definitely very closed source for that specific reason.
15155
10 hours ago
SHA256 is extremely FF-heavy: you need around 200k FFs for an optimized, unrolled, pipelined implementation.
UltraScale+ chips will run a proper design at 600MHz-800MHz, big chips might be able to fit 24 cores. The Artix chip OP used is extremely slow and too small to fit this style of implementation.
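As a rough back-of-the-envelope check on those numbers, the sketch below assumes each fully unrolled, pipelined core retires one hash per clock cycle (the figure quoted elsewhere in the thread for a 7-series design; applying it here is an assumption). It does not account for whether the GPU benchmark and the FPGA core hash the same amount of data per operation, so treat the comparison as indicative only. The names are made up for illustration.

```python
def pipelined_hashrate_hps(clock_hz: float, cores: int,
                           hashes_per_cycle: float = 1.0) -> float:
    """Aggregate hashes per second for fully pipelined cores.
    hashes_per_cycle=1.0 is an assumption: one result per clock per core."""
    return clock_hz * cores * hashes_per_cycle

RTX_4090_HPS = 21975.5e6  # hashcat figure quoted earlier in the thread

low = pipelined_hashrate_hps(600e6, cores=24)   # 600 MHz end of the range
high = pipelined_hashrate_hps(800e6, cores=24)  # 800 MHz end of the range

print(f"24 cores: {low / 1e9:.1f} to {high / 1e9:.1f} GH/s")
print(f"vs RTX 4090: {low / RTX_4090_HPS:.2f}x to {high / RTX_4090_HPS:.2f}x")
```

Under those assumptions a big UltraScale+ part lands in the same ballpark as the GPU figure, which is three orders of magnitude above the ~11 MH/s from the Artix design.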
ethan_smith
10 hours ago
[flagged]
Retr0id
9 hours ago
I was confused by this reply, but it would appear ethan_smith is a (rather good!) LLM bot: