====== Streaming Tables ======

**Streaming Tables** is a Databricks capability that enables direct streaming data ingestion and transformation through dbt (data build tool), providing real-time or near-real-time processing of streaming data sources within dbt transformation workflows. The feature integrates Databricks' streaming infrastructure with dbt's declarative transformation language, allowing data engineers to build scalable streaming data pipelines without requiring separate stream processing frameworks.

===== Overview and Architecture =====

Streaming Tables represent an evolution in how organizations approach real-time data transformation, bridging the gap between batch transformation frameworks and continuous stream processing systems. Rather than requiring separate tools or custom code for handling streaming data, Streaming Tables enable dbt users to define transformations that automatically process data as it arrives from source systems (([[https://www.databricks.com/blog/open-platform-unified-pipelines-why-dbt-databricks-accelerating|Databricks - Open Platform Unified Pipelines (2026)]])).

The architecture leverages Databricks' Delta Lake as the underlying storage format, ensuring ACID transaction guarantees and schema evolution capabilities even for streaming workloads. This integration allows streaming transformations to maintain the same data quality and governance standards as batch pipelines, addressing a key challenge in real-time data architectures, where consistency and reliability often conflict with velocity requirements.

===== Technical Implementation =====

Streaming Tables operate within dbt's declarative SQL framework, requiring minimal additions to existing dbt workflows. Data engineers define streaming transformations using standard dbt model syntax, while the underlying Databricks infrastructure automatically handles the streaming execution engine, state management, and incremental processing logic.
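A minimal sketch of what such a model might look like, assuming the dbt-databricks adapter's ''streaming_table'' materialization; the source name, table, and column names here are illustrative, not part of any official example:

<code sql>
-- models/staging/stg_events.sql
-- Materialized as a streaming table: Databricks handles the
-- streaming execution, checkpointing, and incremental processing.
{{ config(materialized='streaming_table') }}

select
    event_id,
    user_id,
    event_type,
    cast(event_timestamp as timestamp) as event_ts
from stream({{ source('kafka_ingest', 'raw_events') }})
</code>

The model body remains ordinary dbt SQL; the only streaming-specific pieces are the materialization config and the streaming read from the source.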
The implementation supports multiple streaming sources, including Kafka, cloud object storage (S3, Azure Blob Storage, GCS), and Databricks' native connectors. Transformations execute continuously, processing new data as it becomes available while maintaining efficient resource utilization through Databricks' auto-scaling capabilities. The system manages checkpoint state automatically, ensuring exactly-once or at-least-once processing semantics depending on the transformation requirements.

Unlike traditional stream processing frameworks that require stateful computation code, Streaming Tables operate within the familiar dbt materialization paradigm, reducing the expertise required to implement streaming pipelines and enabling broader adoption across data teams already familiar with dbt's SQL-based approach.

===== Integration with dbt Workflows =====

Streaming Tables integrate seamlessly into existing dbt projects, allowing teams to combine streaming transformations with batch models within the same DAG (directed acyclic graph). This unified approach eliminates the operational complexity of maintaining separate batch and stream processing systems. Data lineage tracking, testing, and documentation features work identically across streaming and batch models, providing consistent visibility across the entire data pipeline.

The capability enables incremental refinement of raw streaming data through multiple dbt model layers, similar to traditional dimensional modeling patterns but applied to continuously arriving data. Downstream consumers can access transformed streaming data through standard Delta Lake tables, SQL APIs, or Databricks' business intelligence integrations.
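A sketch of how a downstream batch model might sit in the same DAG as a streaming model, assuming a hypothetical streaming model named ''stg_events'' with the columns referenced below:

<code sql>
-- models/marts/daily_event_counts.sql
-- A standard batch model that refs a streaming model; dbt resolves
-- both into one DAG, so lineage, tests, and docs span the pipeline.
{{ config(materialized='table') }}

select
    date_trunc('day', event_ts) as event_date,
    event_type,
    count(*) as event_count
from {{ ref('stg_events') }}
group by 1, 2
</code>

Because the streaming model is addressed through an ordinary ''ref()'', downstream models need no awareness of whether their upstream dependency is batch or streaming.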
===== Use Cases and Applications =====

Streaming Tables address several critical business scenarios:

* **Real-time analytics dashboards** that require sub-minute latency on operational metrics
* **Fraud detection systems** that process transaction streams for immediate pattern recognition
* **IoT data processing** from distributed sensors requiring immediate aggregation and anomaly detection
* **Log and event processing** from application infrastructure requiring rapid troubleshooting and monitoring
* **Market data pipelines** requiring low-latency transformation of financial instrument data

Organizations can implement these use cases without maintaining specialized stream processing expertise, reducing the operational burden and accelerating time-to-production for streaming initiatives.

===== Advantages and Considerations =====

The primary advantages of Streaming Tables include reduced complexity through a unified batch-stream infrastructure, automatic state management that eliminates manual checkpoint handling, and the ability to apply existing dbt expertise to streaming pipelines. The Delta Lake foundation provides strong consistency guarantees that are uncommon in traditional stream processing systems.

Considerations include latency characteristics that may not suit ultra-low-latency applications requiring sub-second response times, as the system optimizes for sustained throughput rather than minimum latency. Organizations with existing investments in Apache Spark Structured Streaming or Kafka Streams may need to assess migration costs and compatibility implications.

===== See Also =====

* [[system_tables|System Tables]]
* [[databricks|Databricks]]
* [[lakeflow|Lakeflow]]
* [[databricks_connector_for_google_sheets|Databricks Connector for Google Sheets]]
* [[lakebase|Lakebase]]

===== References =====