Yes. No article needed.
Yes. That’s why everyone is scrambling to create new interoperable model formats and frameworks that run on more efficient hardware.
Almost everything that is productized right now stems from work done in the Python world years ago. Adoption took off once Nvidia made it easy to run compiled models on their hardware, but now everyone wants more efficient options.
The big upside of FPGAs is not being locked into a specific vendor, so some people are going that route. Others are just making their frameworks more modular to support the numerous TPU/NPU processors that everyone and their brother needlessly keeps building into things.
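To make the interoperability point concrete, here's a minimal sketch of the common pattern (not anyone's specific pipeline, just an illustration): export a model to a vendor-neutral format like ONNX, then let the deployment target pick its own execution provider. The toy model, file name, and shapes are made up.

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# Toy model purely for illustration.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
dummy = torch.randn(1, 16)

# Export to ONNX so the graph isn't tied to one vendor's runtime.
torch.onnx.export(model, dummy, "toy.onnx", input_names=["x"], output_names=["y"])

# The same .onnx file can be handed to whichever execution provider the
# hardware supports (CPU here; CUDA, TensorRT, and NPU-style providers
# exist depending on the onnxruntime build).
session = ort.InferenceSession("toy.onnx", providers=["CPUExecutionProvider"])
out = session.run(["y"], {"x": dummy.numpy()})[0]
print(out.shape)  # (1, 4)
```

Swap the providers list for a GPU or NPU provider and the exported graph stays the same; that's the vendor neutrality people are chasing.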
Something will come out of all of this, but right now the community shift is to do things without needing so much goddamn power draw. More efficient modeling will come as well, but that’s less important since everything is compiled down to something the devices themselves support. At the end of the day, this is all compilation and logic, and we just need to do it MUCH leaner and faster than where the current ecosystem is heading. It’s not only detrimental to the environment, it’s also not as profitable. Hopefully the latter makes OpenAI and Microsoft get their shit together instead of building more power plants.
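For what "compiled down to something the devices support" looks like in practice, here's a minimal sketch using torch.compile (PyTorch 2.x), which lowers the graph to backend-specific kernels. It's just one example of the compilation path, not the only one, and the toy model and shapes are made up.

```python
import torch
import torch.nn as nn

# Toy model purely for illustration.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# torch.compile traces the graph and lowers it to kernels the target
# device actually supports (Triton kernels on GPU, C++/OpenMP on CPU).
compiled = torch.compile(model)

x = torch.randn(32, 64)
with torch.no_grad():
    out = compiled(x)  # first call triggers compilation; later calls reuse it
print(out.shape)  # torch.Size([32, 10])
```

That lowering step is where most of the "leaner and faster" has to come from, whatever framework or chip ends up winning.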