Data Infrastructure Report: How GPU Spending Created a Networking Crisis
Power, cooling, and legacy networks become the real bottlenecks for AI deployments
Key Takeaways
AI at scale is fundamentally a networking problem. This creates a massive growth cycle for data center infrastructure that is still in its early innings.
Converging data warehouses and data lakes into a single "data cloud" is the most important architectural trend in the data stack. This convergence removes data silos and makes data access faster and cheaper.
Physical infrastructure constraints are becoming the primary bottleneck for AI deployments, not capital availability.
While AI hardware sees strong demand and AI-native companies show hyper-growth, traditional enterprises are buying but not fully consuming AI software due to migration complexity and unclear strategies.
Data and AI infrastructure is fundamentally about networking, creating a massive growth opportunity as companies scale to 100,000+ GPU clusters while wrestling with power, cooling, and legacy network constraints. Traditional enterprises are buying AI tools but struggling with actual consumption, creating a lag between infrastructure investment and software usage. The convergence toward unified data clouds and the split between speed-focused vendors like Super Micro and enterprise-scale players like Dell are defining the competitive landscape for the next 12-18 months.
Market & Economic Outlook
Featured Companies
Arista Networks (ANET), Datadog (DDOG), Dell Technologies (DELL), Informatica (INFA), MongoDB (MDB), Snowflake (SNOW), Super Micro Computer (SMCI)
The research for this report is based on August 2025 SEC filings, earnings calls, equity research reports, and company announcements.
Market Momentum
The largest consumers of AI are increasing their Capex budgets for 2025 into 2026. Two of Arista's cloud titan customers are at or near the 100,000 GPU cluster size and are expected to grow past that mark in 2026. Along with GPU demand, Arista's primary growth narrative is the explosion in AI networking, which its CEO calls a "unique once-in-a-lifetime opportunity."
Companies are recognizing that a governed and secure data foundation is a prerequisite for a successful AI strategy. This is driving demand for data clouds since AI agents need access to all of a company’s data in one place.
AI applications are expanding the definition of observability beyond infrastructure and logs to include GPU monitoring, LLM performance, AI agent behavior, and data pipeline reliability.
Market Dynamics
The AI server market is splitting into two primary approaches. Super Micro represents the customization model, appealing to customers who need the newest technology immediately. Dell represents the enterprise-focused model, appealing to large corporations that prioritize end-to-end solutions, global support, and financing.
Customers are actively seeking to consolidate IT tools. Vendors with broad, integrated platforms that span multiple domains like observability, security, and networking are winning larger deals as enterprises want to reduce complexity and cost.
Leading platforms are embracing open source standards. Snowflake's heavy investment in supporting open data formats like Apache Iceberg is a key example, allowing customers to use Snowflake's engine on data stored externally.
Datadog reported that usage growth among large enterprises stayed flat in Q2 while smaller companies increased usage. This suggests the investment wave is expanding beyond large enterprises to mid-market players.
Key Issues & Challenges
Infrastructure Constraints
Data center development is constrained by power and cooling availability. Even with Capex budgets available, physical infrastructure issues will impact AI deployments.
Heavy GPU spending has outpaced investment in the network infrastructure that connects those GPUs. Clusters of 100,000+ GPUs introduce significant challenges for network architecture, and legacy networks cannot carry the resulting traffic, so companies are now having to reinvest in their networking layer.
Enterprise data is often fragmented across multiple cloud and on-premise systems, creating significant complexity for data integration and management. A key challenge for customers is ensuring their data is clean, secure, and governed before it's used for training LLMs. A failure in data quality can lead to flawed or risky AI models.
Enterprise Adoption
Snowflake and MongoDB reported continued customer cost optimization affecting their consumption-based revenue. This contrasts with strong demand for AI hardware and hyper-growth of AI-native software. Datadog's CEO observes that while enterprises are buying new products, actual usage growth remains "moderate," indicating a lag between infrastructure investment and software consumption. This likely stems from the complexity of migrating workloads and unclear AI strategies.
Notable Product Releases & Announcements
Platform Evolution & Advancements
Snowflake Cortex is a service that brings LLMs and AI to customers' data within the Snowflake platform. It allows users to perform tasks like sentiment analysis, translation, and summarization using simple SQL or Python functions, without needing to manage complex AI infrastructure.
MongoDB’s Atlas Vector Search is a key feature that allows developers to build AI-powered applications, such as semantic search and retrieval-augmented generation (RAG), directly on their operational data. Common examples include internal chatbots, real-time customer support agents, data analysis tools, and product recommendations and troubleshooting.
Datadog's Bits AI agents aim to investigate alerts and fix production issues automatically, representing a significant shift from passive monitoring to proactive remediation.
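The retrieval pattern behind a feature like Atlas Vector Search can be illustrated with a minimal, self-contained sketch: toy embedding vectors and a cosine-similarity ranking stand in for a real embedding model and the managed index (all data, names, and dimensions here are hypothetical, not MongoDB's API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "documents" with pretend embeddings. A real system would generate
# these with an embedding model and store them alongside operational data.
documents = {
    "reset password":  [0.9, 0.1, 0.0],
    "billing dispute": [0.1, 0.9, 0.2],
    "api rate limits": [0.0, 0.2, 0.9],
}

def vector_search(query_vec, docs, k=2):
    """Rank documents by similarity to the query vector: the retrieval
    step that feeds context into a RAG application."""
    ranked = sorted(docs.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

print(vector_search([0.85, 0.15, 0.05], documents))
# -> ['reset password', 'billing dispute']
```

A managed service adds an approximate-nearest-neighbor index so this lookup stays fast at millions of documents, but the ranking idea is the same.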
Infrastructure Innovation
Super Micro has established itself as a leader in Direct Liquid Cooling (DLC) technology, which is essential for cooling high-power GPUs and CPUs. Liquid cooling is becoming the standard for high-density AI computing.
Dell's APEX as-a-Service provides a cloud-like consumption model for on-premises infrastructure. It's designed for customers who want cloud flexibility but need to keep infrastructure on-premises for security, governance, or performance reasons.
Key Insights
For IT/Data Teams
Infrastructure Planning
Having a strategy in place for GPU servers is crucial because they demand high-speed networking, high-performance storage, and significant power and cooling.
Teams should evaluate emerging GPU options beyond NVIDIA, such as AMD and Intel, to avoid single-vendor lock-in and potentially find cost-effective solutions for specific inference or fine-tuning workloads.
AI projects are being built with 800G technology. Teams must plan infrastructure (cabling, optics) for 800G now but understand the roadmap to 1.6T to avoid costly rip-and-replace cycles as the pace of technology accelerates.
Ethernet should be the default choice for AI networks due to its openness, scalability, and operational familiarity, which also help avoid proprietary lock-in and preserve multi-vendor flexibility.
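A back-of-the-envelope calculation makes the link-speed planning point above concrete. Assuming an idealized ring all-reduce (each GPU moves roughly 2*(N-1)/N of the payload over its own link, ignoring latency), doubling link speed from 800G to 1.6T halves gradient-synchronization time. The payload and cluster sizes below are hypothetical:

```python
def allreduce_time_seconds(payload_gb, link_gbps, num_gpus):
    """Lower-bound ring all-reduce time: each GPU sends and receives
    about 2*(N-1)/N of the payload over its link (ideal, zero-latency model)."""
    payload_gbits = payload_gb * 8
    traffic_factor = 2 * (num_gpus - 1) / num_gpus
    return payload_gbits * traffic_factor / link_gbps

# Hypothetical 10 GB gradient exchange on a 1,024-GPU job:
t_800g = allreduce_time_seconds(10, 800, 1024)    # time per sync at 800G links
t_1600g = allreduce_time_seconds(10, 1600, 1024)  # halves with 1.6T links
print(round(t_800g, 3), round(t_1600g, 3))
```

Since training repeats this synchronization on every step, the per-sync savings compound into materially shorter job completion times, which is why the 800G-to-1.6T roadmap matters for cabling and optics planning.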
Data Strategy
Establish a foundation of trusted data. This requires a comprehensive data management strategy focused on cataloging, cleansing, and securing critical data assets.
Adopt open table formats like Apache Iceberg to avoid vendor lock-in and allow different compute engines to access the same data. For example, Snowflake's support for Iceberg Tables provides a powerful hybrid model, letting teams use Snowflake's performance, security, and governance on data stored externally in their own cloud storage like AWS S3.
AI can accelerate data discovery, automate data quality rule enforcement, and generate data pipelines, freeing up data engineers to focus on higher-value tasks.
Operations & Monitoring
IT teams should prioritize solutions that offer GPU monitoring across different environments (cloud, on-prem), LLM observability to track prompts and model performance, and AI agent monitoring.
As data volumes expand, costs can spiral. Use vendor-provided tools and pricing models designed for cost management.
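The LLM observability described above can be sketched locally with a small decorator that records latency and payload sizes per model call (the decorator, metric names, and stand-in model are hypothetical illustrations, not any vendor's API):

```python
import time
from collections import defaultdict

# Per-model metric store; a real platform would ship these to a backend.
metrics = defaultdict(list)

def observe_llm(model_name):
    """Decorator that records latency and prompt/response sizes for
    each call to a model, the core of LLM observability."""
    def wrap(fn):
        def inner(prompt):
            start = time.perf_counter()
            response = fn(prompt)
            metrics[model_name].append({
                "latency_s": time.perf_counter() - start,
                "prompt_chars": len(prompt),
                "response_chars": len(response),
            })
            return response
        return inner
    return wrap

@observe_llm("toy-model")
def fake_llm(prompt):
    # Stand-in for a real model call.
    return prompt.upper()

fake_llm("summarize this report")
print(metrics["toy-model"][0]["prompt_chars"])  # -> 21
```

Production tools extend this same pattern with token counts, cost attribution, and trace context so prompt-level regressions can be tied back to specific deployments.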
For Business Leaders
Investment Strategy
Networking infrastructure is vital for maximizing a return on GPU investments. An inefficient network can lead to 30-50% of GPU time being wasted on data exchange, directly impacting job completion times and operational costs.
A successful AI strategy depends entirely on the quality and accessibility of data. Position investments in data governance and management as enablers of business transformation, risk mitigation, and competitive advantage.
The real competitive advantage in AI will not come from using generic models, but from fine-tuning models on your own unique business data. Support and fund projects that create insights and applications that competitors cannot replicate.
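The 30-50% network-stall figure above translates into concrete dollars with simple arithmetic; the cluster size and hourly rate below are hypothetical, not sourced from any vendor:

```python
def wasted_gpu_cost(num_gpus, cost_per_gpu_hour, stall_fraction, hours):
    """Dollars of GPU spend consumed while GPUs sit idle on network data exchange."""
    return num_gpus * cost_per_gpu_hour * stall_fraction * hours

# Hypothetical 1,000-GPU cluster at $2/GPU-hour over a 720-hour month:
low = wasted_gpu_cost(1000, 2.0, 0.30, 720)   # 30% of GPU time stalled
high = wasted_gpu_cost(1000, 2.0, 0.50, 720)  # 50% stalled
print(f"${low:,.0f} to ${high:,.0f} lost to the network per month")
```

At this scale, even a few points of stall reduction can cover the cost of a network upgrade, which is the economic case for treating networking as part of the GPU investment rather than an afterthought.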
Operations Planning & Financial Management
Plan for data center sprawl and inter-site interconnect, since physical constraints on power, space, and cooling are forcing AI infrastructure to be distributed across multiple campuses.
Your teams are likely managing dozens, if not hundreds, of monitoring and security tools, leading to inefficiency and inflated costs. Support initiatives to consolidate into unified platforms to gain operational leverage, simplify management, and improve cross-team collaboration.
Choose AI server vendors strategically. Super Micro offers speed and access to innovative technology. Dell offers comprehensive support and supply chain stability. Align the choice with your AI strategy and risk tolerance.
The move to consumption-based software models offers flexibility but requires careful financial planning. Work with IT and finance teams to forecast usage and manage budgets to avoid unexpected costs as AI and data workloads scale.
For Investors
Immediate Opportunities (12-18 months)
AI server demand far outstrips supply. For example, Dell has $2.6 billion in AI server orders and a rapidly growing backlog. In contrast, traditional server and storage demand remains soft, with customers reprioritizing budgets toward AI initiatives.
Watch for the adoption of data observability tools and AI agents. As it gets easier for AI to write code, demand for DevOps and observability tools will expand.
Next Wave (18+ months)
As the AI hype cycle matures, focus will shift from building models to ensuring they’re accurate, compliant, and built on trusted data. This creates opportunities in data governance and model validation tools.
High-performance storage represents the critical second wave after compute servers, as AI models require vast amounts of data access.
Short Term Trends to Watch
The ability of NVIDIA and other chipmakers to ramp supply of their latest products will be the key catalyst for AI server market growth in the next 12-18 months. A major constraint is the availability of key components such as GPUs, CPUs, memory, and networking equipment.
Cloud titans will continue to grow 100,000+ GPU clusters into 2026. Some titans are discussing clusters that could potentially go to 1 million GPUs.
The US federal government is a key growth area. Monitor for announcements of new agency wins and expanded use cases.
Found value in this analysis? Subscribe for monthly insights into data infrastructure, the cybersecurity landscape, and enterprise applications.
Questions or feedback? Leave a comment or reach out to me on LinkedIn.
Sources: Analysis based on Q2 2025 earnings calls, SEC filings, and equity research reports from Arista Networks, Datadog, Dell Technologies, Informatica, MongoDB, Snowflake, and Super Micro Computer (August-September 2025).


