The chipmaker, now the most valuable public company in the world, said strong demand for its chips should continue this quarter.

  • brucethemoose@lemmy.world
    22 hours ago

    On the training side, it’s mostly:

    • Paying devs to prepare the training runs with data, software architecture, frameworks, smaller scale experiments, things like that.

    • Paying other devs to get the training to scale across 800+ nodes.

    • Building the data centers, where the construction and GPU hardware costs kind of dwarf power usage in the short term.

    On the inference side:

    • Sometimes building optimized deployment frameworks like DeepSeek's, though many seem to use something off the shelf like SGLang.

    • Renting or deploying GPU servers individually. They don’t need to be networked at the scale training requires; the largest deployment I’ve heard of (DeepSeek’s optimized framework) spans around 18 servers. And again, the sticker price of the GPUs is the big cost here.

    • Developing tool use frameworks.

    On both sides, the biggest players burn billions on Tech Bro “superstar” developers who, frankly, seem to tweet more than they develop interesting things.

    Microsoft talks up nuclear power and such because it wants to cut the middleman out of the grid, reduce power costs, and reduce the risk of outages, not because there’s physically not enough power on the grid. It’s corporate cheapness, not an existential need.