dbt’s AI Leap, Salesforce’s Informatica Play, and the Case for Open Table Formats
Biweekly Data & Analytics Digest: Cliffside Chronicle


dbt Just Rebuilt the Core and Layered in AI Features

dbt Labs just dropped two major announcements that reshape both the guts and the user experience of the dbt platform. First, the Fusion Engine replaces the legacy Python-only compilation with a new Rust-based engine that's fast, composable, and finally supports multi-language transformations. That means you can mix SQL with Python, dbt-Jinja, and even REST API calls, all in the same DAG. Second, they're introducing AI-powered onboarding tools to make dbt far more accessible to data analysts, not just engineers. These include an in-app copilot, AI code generation, and context-aware documentation support to help users ramp up faster without deep knowledge of dbt syntax or Jinja quirks.
The AI onboarding features show dbt Labs is serious about flattening the learning curve, a key bottleneck for data orgs trying to scale adoption across business units. We've seen many teams slowed down by the handoff between analytics engineers and less technical analysts, and these tools could reduce that friction dramatically.
dbt is evolving from a templated SQL tool into a true transformation platform. If you're investing in a modern data platform like Snowflake or Databricks and still treating dbt as a SQL bundler, it's time to revisit that assumption. This shift changes how teams should think about data modeling, testing, and orchestration in 2025 and beyond.
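To make the multi-language DAG idea concrete, here's a minimal sketch using dbt's existing Python-model convention (`def model(dbt, session)`). Fusion's own APIs may differ; the `stg_orders` model and its columns are hypothetical, and `.to_pandas()` assumes a Snowpark-backed warehouse.

```python
# models/customer_scores.py -- a dbt Python model living alongside SQL models
# in the same DAG. Model and column names here are illustrative only.
import pandas as pd

def model(dbt, session):
    # Configure materialization, just like a {{ config(...) }} block in SQL.
    dbt.config(materialized="table")

    # ref() works across languages: this pulls in an upstream SQL model.
    # (.to_pandas() is a Snowpark call; Spark-backed warehouses differ.)
    orders = dbt.ref("stg_orders").to_pandas()

    # Logic that's awkward in SQL/Jinja is plain Python here.
    scores = (
        orders.groupby("customer_id", as_index=False)["order_total"]
        .sum()
        .rename(columns={"order_total": "lifetime_value"})
    )
    scores["tier"] = pd.cut(
        scores["lifetime_value"],
        bins=[0, 100, 1000, float("inf")],
        labels=["bronze", "silver", "gold"],
    )
    # dbt materializes the returned DataFrame as a table in the warehouse.
    return scores
```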
Rethinking the Modern Data Stack: DuckLake’s Vision for a Unified Future

The team behind DuckLake just published a manifesto that critiques the modern data stack as fragmented, costly, and overly complex. Their argument? The modular tooling that once promised agility has devolved into an ecosystem of brittle integrations and overhead. Instead, they propose an integrated, opinionated platform that prioritizes developer experience and seamless functionality, minimizing the need for orchestration and glue code. It's a call to simplify, not just scale.
This manifesto taps into a growing sentiment we’ve seen across mid-market and enterprise data teams. Tool sprawl is slowing teams down. It’s no longer just about choosing the “best” tool. It’s about making the stack actually work end to end. DuckLake’s stance is strategic. By challenging the dominant Snowflake–dbt–Airflow–Looker pattern, they’re making a case for what the next-gen platform could look like: more unified, more developer-friendly, and less operationally taxing.
Whether they can deliver is still unknown, but they raise the question of whether we should keep stitching tools together, or rebuild with cohesion as the default.
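For a taste of what "integrated by default" might look like, here's a minimal sketch based on the attach pattern DuckLake has published for its DuckDB extension; the paths and table name are placeholders, and exact options may differ from the current release.

```python
import duckdb

con = duckdb.connect()

# The ducklake extension keeps table metadata in a catalog database and
# data as Parquet files -- no separate catalog service to operate.
con.sql("INSTALL ducklake")
con.sql("LOAD ducklake")
con.sql("ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'lake_files/')")

# From here it's ordinary SQL: tables get transactional semantics and
# snapshots without any orchestration glue.
con.sql("CREATE TABLE lake.events AS SELECT 1 AS id, 'signup' AS kind")
print(con.sql("SELECT * FROM lake.events").fetchall())
```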
Salesforce Acquires Informatica: An $8 Billion Bet on AI

The rumors can be put to bed: Salesforce has officially acquired Informatica for approximately $8 billion, offering $25 per share in cash. The acquisition aims to integrate Informatica's robust data management tools, including data integration, governance, and metadata management, into Salesforce's existing platforms like Data Cloud, Agentforce, MuleSoft, and Tableau. The goal is a unified architecture that supports agentic AI, enabling AI agents to operate with greater autonomy and reliability across enterprise environments.
This acquisition underscores the critical importance of data management in the deployment of effective AI solutions. By bringing Informatica’s capabilities in-house, Salesforce aims to enhance the quality and trustworthiness of data feeding into its AI systems. This move positions Salesforce to offer more comprehensive AI-driven services, particularly in sectors requiring stringent data governance. It’s a strategic step to solidify Salesforce’s role in the evolving landscape of AI-powered enterprise solutions.
A Pragmatic Guide to Building AI Agents

Mansurova explores the emerging role of code agents: AI systems that don't just chat or recommend, but write, test, and iterate on software autonomously. Unlike today's copilots, code agents operate in feedback loops, combining LLMs with tool use, memory, and environment awareness to actually build and ship working applications. It's a major shift from passive code generation to goal-driven, autonomous development workflows: the agent can reason about architecture, resolve errors, and improve code across iterations.
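Here's a minimal sketch of that feedback loop in plain Python. `call_llm` is a hypothetical placeholder for any code-generating model, and the subprocess runner stands in for the real sandboxed execution environments a production agent would need.

```python
import subprocess
import sys
import tempfile

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: swap in your model provider's SDK here.
    raise NotImplementedError

def run_candidate(code: str) -> subprocess.CompletedProcess:
    # Execute the candidate in a separate interpreter. A real agent needs
    # proper sandboxing (containers, resource limits, network policy).
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    return subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )

def agent_loop(task: str, max_iters: int = 5) -> str | None:
    prompt = f"Write a Python script that {task}."
    for _ in range(max_iters):
        code = call_llm(prompt)
        result = run_candidate(code)
        if result.returncode == 0:
            return code  # success: the environment confirmed it runs
        # Feed the error back so the next attempt can self-correct.
        prompt += f"\nYour last attempt failed with:\n{result.stderr}\nFix it."
    return None
```

The loop is the whole trick: each failure becomes context for the next attempt, which is exactly where observability and policy enforcement earn their keep.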
Code agents challenge a key bottleneck in software and data engineering: scaling execution without scaling headcount. We think this is especially relevant in analytics engineering, where most teams are stuck in reactive loops (tweaking dbt models, rebuilding DAGs, or managing brittle pipelines). Compared to today’s AI-assisted tools like GitHub Copilot or dbt Copilot, these agents promise autonomy, not just assistance. The tradeoff? You need robust sandboxing, observability, and policy enforcement to avoid turning your dev environment into the Wild West.
The shift from copilots to code agents is underway, and data teams should be thinking now about where this fits into their platform strategy.
The Strategic Case for Open Table Formats in Modern Data Architectures

Vu Trinh argues that while cloud warehouses (Snowflake, BigQuery) made analytics easier, they locked data into proprietary formats and limited flexibility. Open table formats flip that model. They bring ACID transactions, schema evolution, and time-travel to cloud object storage (like S3 or ADLS), enabling data teams to separate storage from compute and avoid vendor lock-in. It’s about unlocking optionality across engines—Spark, Trino, DuckDB, even Snowflake itself.
Open table formats are the foundation for a true "data lakehouse". They give teams the freedom to run diverse compute engines on the same data without duplicating pipelines or rewriting logic. This matters whether you're trying to run AI workloads on top of Parquet, power fast BI queries, or avoid the cloud tax of vertically integrated platforms. Cost savings matter, but so does future-proofing your architecture. What we've seen is that teams that embrace open formats gain leverage: they can choose best-fit tools without rebuilding everything. The catch is that you need to rethink your governance, cataloging, and performance tuning strategies.
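As a sketch of that optionality, here's one open format (Delta Lake, via the `deltalake` package) written by one engine and read by another (DuckDB), with time travel along the way. The path and schema are illustrative; Iceberg or Hudi would follow the same pattern with their own libraries.

```python
import duckdb
import pandas as pd
from deltalake import DeltaTable, write_deltalake

path = "lake/orders"  # a local dir for the sketch; s3://... works the same way

# Engine 1: write with delta-rs -- no warehouse involved, just files plus
# a transaction log on object storage.
write_deltalake(path, pd.DataFrame({"id": [1, 2], "total": [10.0, 25.0]}))
write_deltalake(path, pd.DataFrame({"id": [3], "total": [7.5]}), mode="append")

# Time travel: read the table as of an earlier committed version.
v0 = DeltaTable(path, version=0).to_pandas()

# Engine 2: query the very same files with DuckDB's delta extension.
duckdb.sql("INSTALL delta")
duckdb.sql("LOAD delta")
print(duckdb.sql(f"SELECT sum(total) FROM delta_scan('{path}')").fetchone())
```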
If you care about avoiding data silos and long-term agility, open formats are inevitable.
The Data + AI Event You Don’t Want to Miss

The Data + AI Summit is less than a week away! Join us for a data-driven evening with Joe Reis, renowned data strategist and co-author of “Fundamentals of Data Engineering”, as we network, learn, and unwind in the heart of San Francisco.
What topics interest you most in AI & Data? We’d love your input to help us better understand your needs and prioritize the topics that matter most to you in future newsletters.
“Where there is data smoke, there is business fire”