llm devs are weird....
they release these fp16 models one after another, which need at least an A100 to run
and nobody bothers to put out q4 versions, which give ~90% of the quality for a quarter of the VRAM, and in hardware cost probably 1/8th, since you can run them on a consumer-level gpu.
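to put the VRAM claim in rough numbers, here's a back-of-envelope sketch, assuming weights dominate memory use and a flat ~20% overhead factor (the function name and numbers are illustrative, not measured):

```python
def vram_gb(params_billions: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: parameter count * bytes per weight, plus overhead
    for activations / KV cache (the 1.2 factor is a hand-wavy assumption)."""
    return params_billions * bits_per_weight / 8 * overhead

# a hypothetical 70B model:
fp16 = vram_gb(70, 16)  # ~168 GB -> multiple A100/H100s
q4 = vram_gb(70, 4)     # ~42 GB  -> reachable with a couple of consumer gpus
print(fp16, q4)
```

with this simple model, q4 is exactly a quarter of fp16 for the same parameter count, which is where the "1/4th of the vram" figure comes from; real quantized formats carry a bit of extra metadata per block, so the true ratio is slightly worse.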
idk,
they just love H100s so much