llm devs are weird....
they release these fp16 models one after another, which need at least an A100 to run
and nobody bothers to put out q4 versions, which give ~90% of the quality for a quarter of the VRAM, and in hardware cost probably 1/8th, since you can run them on a consumer-level gpu.
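to put the VRAM claim in rough numbers, here's a back-of-envelope sketch, assuming weights dominate memory use and a flat ~20% overhead factor (the function name and numbers are illustrative, not measured):

```python
def vram_gb(params_billions: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: parameter count * bytes per weight, plus overhead
    for activations / KV cache (the 1.2 factor is a hand-wavy assumption)."""
    return params_billions * bits_per_weight / 8 * overhead

# a hypothetical 70B model:
fp16 = vram_gb(70, 16)  # ~168 GB -> multiple A100/H100s
q4 = vram_gb(70, 4)     # ~42 GB  -> reachable with a couple of consumer gpus
print(fp16, q4)
```

with this simple model, q4 is exactly a quarter of fp16 for the same parameter count, which is where the "1/4th of the vram" figure comes from; real quantized formats carry a bit of extra metadata per block, so the true ratio is slightly worse.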
idk,
they just love H100s so much