Data transformation in Real-Time Intelligence
Organizations increasingly rely on real-time intelligence to gain immediate insights and respond proactively to events as they unfold. The ability to transform data instantly is central to these capabilities, enabling systems to ingest, process, and analyze information in motion rather than waiting for batch intervals. Transforming and processing event-driven data empowers decision-makers to act on up-to-date information, supports operational agility, and improves business outcomes.
Choosing the right technology has a dramatic impact on the performance and cost of both ingestion AND consumption. Within Real-Time Intelligence in Microsoft Fabric, there is a rich set of powerful data transformation capabilities, from light, low-overhead tweaking to full model- and code-based transformations. How do you know when to transform data in each tool? Read on to find out!
ELT vs. Streaming: Understanding sets vs. payloads
One of the most significant differences between traditional Extract, Load, Transform (ELT) pipelines and streaming data processing lies in the granularity of transformation. ELT pipelines are designed for set-based operations; they typically ingest batches of data and process them collectively. This allows for complex transformations, aggregations, and joins across entire datasets, but at the cost of latency.
Back when I taught SQL Server Integration Services classes, I encouraged students to stop thinking of data as rows and to think of the entire set of data they wanted to move. If I had a stack of 10,000 post-it notes, with another post-it added to the pile every few minutes throughout the day, it was far more efficient to wait, pick up the entire stack, and move it across the room rather than move them one by one. As data sizes have increased, however, this has become harder to manage. It is one thing to move a few thousand rows across an on-prem network every day; it is an entirely different matter when you are moving multi-million-row sets across cloud environments that incur ingress/egress charges.
In contrast, streaming workloads process data by payload, transforming each event as it arrives. This enables near-instantaneous insights and actions but requires a different mindset regarding transformation logic; it is much closer to classic application-level design, where individual operations are processed one at a time. Using our post-it note example, instead of waiting to pick up the entire stack, each post-it is moved across the room as soon as it becomes available. This allows a much more rapid response than waiting for the entire stack, makes it easier to move data flexibly, keeps network traffic to a minimum, and integrates data into the fabric of the business.
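The set-versus-payload distinction can be sketched in a few lines of Python. This is a conceptual illustration only, not Fabric or Eventstream code; the function names and the post-it "transformation" are invented for this example:

```python
# Conceptual sketch: set-based (batch) vs. payload-based (streaming) handling.
# Function names are hypothetical, chosen to match the post-it analogy.

def process_batch(notes: list) -> list:
    """ELT-style: wait for the whole stack, then transform it as one set."""
    return [note.upper() for note in notes]  # one pass over the full set

def process_stream(note: str) -> str:
    """Streaming-style: transform each payload the moment it arrives."""
    return note.upper()  # acted on immediately, no waiting for the stack

stack = ["reorder stock", "call supplier", "ship order"]

# Batch: no result is available until the entire set has been processed.
batch_result = process_batch(stack)

# Streaming: each result is available as soon as its event arrives.
stream_results = [process_stream(note) for note in stack]

assert batch_result == stream_results  # same output, very different latency profile
```

The output is identical either way; what differs is when each transformed record becomes available to act on.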
The choice between set-based and event-based processing is driven by the requirements for latency and data actionability.
Real-Time Intelligence
Now that we've established why, let's look at what tools are available within Fabric's Real-Time Intelligence to process this data. Before we dive too deep, it's important to remind everyone of Roche's Maxim:
"Process data as far upstream as possible, and as far downstream as necessary."
This principle doesn't change when processing event-driven data. I might even argue that it is MORE relevant in event-driven architectures, because data can be transformed almost instantaneously after generation.
Remember that event processing is not like traditional ETL. You are operating on a particular payload, which may contain one or many rows, and the stream is typically aware only of what is happening in the context of that moment. One of the benefits of Real-Time Intelligence is that it contains both a stream processor (via the Eventstream engine) and a state store (via Eventhouse). Eventhouse can also serve as a very sophisticated transformation engine.
Eventstream Transformation Capabilities
With Eventstream, there are many capabilities available to transform data with no-code or low-code tools. Some examples of transformations you might perform in an Eventstream:
- Normalizing field formats
- Performing lightweight enrichment (such as geocoding based on a location field), or filtering out irrelevant records
- SQL transformations
- Time series windows (hopping, sliding, session, snapshot, tumbling)
- Content-based routing
- Schema Registration and Data Contracts
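As one example from the list above, a tumbling window groups events into fixed, non-overlapping time buckets and aggregates each bucket. Here is a minimal Python sketch of the idea; it simulates the concept in memory and is not Eventstream's actual windowing engine:

```python
from collections import defaultdict

def tumbling_window(events, window_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping windows
    and return the count and sum of values per window start time."""
    windows = defaultdict(lambda: {"count": 0, "total": 0.0})
    for ts, value in events:
        window_start = (ts // window_seconds) * window_seconds  # bucket floor
        windows[window_start]["count"] += 1
        windows[window_start]["total"] += value
    return dict(windows)

# Events as (epoch_seconds, reading) pairs; 10-second tumbling windows.
events = [(0, 1.0), (3, 2.0), (9, 3.0), (12, 4.0), (19, 5.0), (21, 6.0)]
result = tumbling_window(events, window_seconds=10)
# result[0]  -> {"count": 3, "total": 6.0}   (events at t=0, 3, 9)
# result[10] -> {"count": 2, "total": 9.0}   (events at t=12, 19)
# result[20] -> {"count": 1, "total": 6.0}   (event at t=21)
```

Hopping, sliding, session, and snapshot windows follow the same principle but differ in how bucket boundaries are drawn (overlapping, gap-based, and so on).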
Complementary Strengths
Eventstream excels at immediate transformations over events ingested during a reasonably recent time window. Eventhouse excels at super-quick insights over a much longer time period, well beyond the confines of a single event payload.
- Eventstream's sweet spot: Time window analytics over a few seconds to minutes
- Eventhouse's sweet spot: Anything from a few minutes to days/months
Since everything that happens is logged as an event, events can be compared to previous state, contextualized, or transformed via update policies. These transformations often leverage Kusto Query Language (KQL), allowing for sophisticated filtering, aggregation, and correlation across large volumes of data.
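The "compare events to previous state" pattern can be illustrated outside KQL as well. Below is a small Python sketch of stateful enrichment, where each incoming event is annotated with the delta from the previous event for the same key. This is a conceptual stand-in for the kind of contextualization an Eventhouse update policy might perform, not actual update-policy code, and the sensor names are invented:

```python
def enrich_with_delta(events):
    """Annotate each (key, value) event with the change from that key's
    previous value -- a simple form of comparing events to prior state."""
    last_seen = {}  # the "state store": previous value per key
    enriched = []
    for key, value in events:
        previous = last_seen.get(key)
        enriched.append({
            "key": key,
            "value": value,
            "delta": None if previous is None else value - previous,
        })
        last_seen[key] = value  # update state for the next event
    return enriched

readings = [("sensor-a", 20.0), ("sensor-b", 5.0), ("sensor-a", 23.5)]
out = enrich_with_delta(readings)
# out[0]["delta"] -> None  (no prior state yet for sensor-a)
# out[2]["delta"] -> 3.5   (23.5 - 20.0)
```

In Eventhouse, the equivalent logic would typically be expressed in KQL and could reach across far longer history than a stream processor holds in memory.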
Choosing the Right Transformation Approach
Selecting the optimal transformation strategy for real-time intelligence depends on several key factors:
1. Data Context
Context can be applied in many places in Eventstream. Use Roche's Maxim to help you decide what works best for your scenario.
2. Complexity of Logic
Simple normalization and enrichment tasks fit well within Eventstream. More advanced analytics or correlations may necessitate Eventhouse and KQL integration.
3. Scalability and Maintenance
Streaming transformations are generally easier to scale horizontally but may require careful state management. Eventhouse transformations can be more resource-intensive and complex to maintain, especially as reference datasets grow.
4. Integration and Ecosystem
Consider the broader data architecture—how Eventstream and Eventhouse fit into downstream analytics, reporting, and machine learning workflows. Will other users in your organization need to access the stream directly? Will they access via Eventhouse? Via the main database or follower databases?
5. Personal Preference
Sometimes, it's just personal preference! You can write the same transformation in either Eventstream or Eventhouse. Would you prefer to use SQL or KQL?
6. Cost Considerations
Something else to consider is cost. How many CUs (capacity units) within Fabric are consumed by the type of transformation you are looking to apply? There is a trade-off, but as your data volumes scale you want to ensure that your solution is as efficient as possible.
7. Schema Flexibility
In your inputs, are you expecting your schema to be relatively static, with few changes, and do you want to enforce schema-on-write? In that scenario, breaking the schema out during ingestion using content-based routing, array expansion, and similar operations makes it easy to do so.
Does your schema have a lot of variability, changing frequently? In that scenario, leveraging Kusto capabilities such as the DropMappedFields mapping transformation allows you to code defensively from the beginning.
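Defensive handling of a variable schema can be sketched in a few lines: keep the fields you have a contract for, and stash anything unexpected in a catch-all bag instead of failing ingestion. This Python sketch is loosely analogous to what DropMappedFields-style mappings enable in Kusto; the field names and the contract are invented for the example:

```python
import json

KNOWN_FIELDS = {"device_id", "temperature", "timestamp"}  # the "contract"

def parse_event(raw: str) -> dict:
    """Split an incoming JSON event into contracted fields plus a
    catch-all 'extras' bag, so new upstream fields never break ingestion."""
    record = json.loads(raw)
    row = {field: record.get(field) for field in KNOWN_FIELDS}
    row["extras"] = {k: v for k, v in record.items() if k not in KNOWN_FIELDS}
    return row

event = '{"device_id": "d1", "temperature": 21.5, "timestamp": 1700000000, "fw": "2.1"}'
row = parse_event(event)
# row["temperature"] -> 21.5
# row["extras"]      -> {"fw": "2.1"}  (unexpected field preserved, not fatal)
```

The design choice is the same one the schema question above is really asking: fail fast on contract violations (schema-on-write), or absorb variability and interpret it later (schema-on-read).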
Key Takeaways
The power of Real-Time Intelligence lies in choosing the right tool for the right transformation at the right time. Whether you're performing lightweight field normalization in Eventstream or complex historical correlation in Eventhouse, understanding the strengths of each approach ensures optimal performance, cost efficiency, and maintainability.
Remember Roche's Maxim: process data as close to the source as makes sense for your use case, but don't hesitate to leverage downstream capabilities when the context or complexity demands it.
If you're navigating AI applications of data, Fabric, or event-driven architectures and want a second opinion, feel free to reach out!