Insight
Understanding CU consumption of streaming systems in Microsoft Fabric
I really struggled with a title for this post, because it covers a lot of ground. It's a long read, but totally worth it. Ultimately I settled on "Understanding CU consumption of streaming systems in Fabric". I have been wanting to understand for a while the impact various architectural decisions can make on Fabric ingestion, and over the holidays here I have finally had time to play with it. The article below walks through architectural considerations for loading data into Fabric, and the decisions and tradeoffs.
I used the ubiquitous NYC Taxi data set and picked a single month (May of 2025). The average number of yellow trips per day in May of 2025 was 150K, which I used as the baseline rate. I tested two approaches:
- Streaming the data to an Eventstream at the day's average rate of 2 rows per second, then landing and refining the data in Eventhouse using update policies
- Using Spark Structured Streaming to stream the data from an Azure Event Hub to a notebook, applying the silver transformation in flight and landing the data directly to a silver table
Test Objectives
The goal was to provide an objective view of the various tools available to stream data in real time into Fabric, and the impact each option has on CU consumption. To be complete, for the 2 streaming solutions I also measured end-to-end latency, to see which solution delivered data faster.
Technical Clarifications
- Eventstreams used push (event processing before ingestion) to connect to Eventhouse. I did not set any special properties in the Eventstream.
- Solution scope is limited to ingesting data into the compute engine (either Lakehouse or Eventhouse). Refreshing semantic models, or using other action systems, is not included in scope. I wanted to focus on the pure cost of CU ingestion and transformation.
- Native Fabric services were leveraged as much as possible. I intentionally built the generation scripts as a separate solution to run locally, outside of Fabric. This avoids any "internal" backchannels that may exist, and gives a realistic sense of latency coming from an outside source.
I assumed the data coming in was raw and in a real-world scenario would need to be refined and transformed. Below are the 2 architectures I leveraged. For both the Eventstream and the Spark SS notebook, the data transformation step happens directly on the data while in motion.
Architecture Flows
For Real-Time Intelligence (RTI):
- Event Hub → Eventstream → Eventhouse (with transformation)
For Spark Structured Streaming:
- Event Hub → Spark Notebook → Lakehouse (with transformation in flight)
I felt this was a fair representation of the architectures customers may use to evaluate their streaming data stacks. While it does not encompass all scenarios, the goal was to be representative. I could have excluded the event hub in the RTI architecture and sent data directly to the Eventstream; this would have further decreased the latency and the real production workload cost. However, there were 2 reasons I did not do this:
- Batch comparison consistency - In testing, each batch run generated a unique identifier. By sending everything to the event hub, I was able to easily compare a single batch's performance across the various approaches
- Realistic latency modeling - I wanted to keep the latency as close to reality as possible.
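The batch-identifier approach can be sketched roughly like this — a hypothetical payload builder that stamps every event in a run with a shared batch id, so the same batch can be traced through each architecture (function and field names are my own illustration, not the actual generation script):

```python
import json
import uuid
from datetime import datetime, timezone

def build_batch(rows, batch_id=None):
    """Serialize a run of events, stamping each with one shared batch_id
    so the same batch can be compared across the streaming paths."""
    batch_id = batch_id or str(uuid.uuid4())
    events = []
    for row in rows:
        events.append(json.dumps({
            "batch_id": batch_id,
            # sent_at enables end-to-end latency measurement on arrival
            "sent_at": datetime.now(timezone.utc).isoformat(),
            **row,
        }))
    return batch_id, events

# Example: two taxi-trip rows share one batch_id for later comparison
batch_id, events = build_batch([
    {"trip_distance": 1.2, "fare_amount": 9.3},
    {"trip_distance": 4.8, "fare_amount": 21.0},
])
```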
Test Setup and Monitoring
That should cover the background. I prefer to pre-create most of the services; below is the structure of my workspace. I always create the monitoring database.
Before I show the results, I created a custom visualization on top of the Fabric Capacity metrics app (thanks Sandeep Pawar for your help on that!) that breaks down the CU usage of each artifact by hour. Then, I could easily calculate the following for each batch run:
- Latency (excluded from the batch runs)
- CU cost (for RTI, a combination of Eventstream and Eventhouse costs, so I calculated both)
- Estimated capacity sizing - a rough estimate of the equivalent Fabric capacity size you might need to run each workload. This is an estimate, not an official figure.
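The per-batch arithmetic behind those numbers is simple; here is a minimal sketch (the SKU lookup mirrors the post's rough, unofficial estimate, and the helper names are mine):

```python
def cu_per_second(cu_consumed, duration_seconds):
    """Average CU/s for a run: total CU consumed over the window length."""
    return cu_consumed / duration_seconds

def estimate_f_sku(cu_s):
    """Rough capacity estimate: smallest F SKU whose CU allocation covers
    the workload's average CU/s (an F-SKU number equals its CUs, e.g. F64 = 64)."""
    for size in (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048):
        if cu_s <= size:
            return f"F{size}"
    return ">F2048"

# Example: Spark SS default pool, 15-minute test window reported per hour
rate = cu_per_second(25202.8530, 3600)   # about 7.0 CU/s
sku = estimate_f_sku(rate)               # rough fit: F8
```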
Test Results
Initial 15-minute Tests
| Test Type | Run | Message Count | CU Consumed | CU Hour Duration | CU/s | Average Latency (ms) |
|---|---|---|---|---|---|---|
| Real-Time Intelligence | 1 | 2 events/sec | | 3600 | | 747 |
| - Eventstream (Push) | | | 4267.7367 | 3600 | 1.1855 | |
| - Eventhouse | | | 3105 | 3600 | 0.8625 | |
| Real-Time Intelligence | 2 | 2 events/sec | | 3600 | | 649.45 |
| - Eventstream (Push) | | | 7466.3367 | 3600 | 2.0740 | |
| - Eventhouse | | | 0.7232 | 3600 | 0.0002 | |
| Spark SS to Lakehouse (Default) | 1 | 2 events/sec | 25202.8530 | 3600 | 7.0008 | 2244 |
| Spark SS (Small Pool) | 2 | 2 events/sec | 8822.9515 | 3600 | 2.4508 | 1447 |
Key Observations from Initial Tests
I was surprised by a few results of the tests that I ran:
- Eventhouse efficiency - Eventhouse used far fewer CUs than I expected in both of my tests (not even averaging 1 CU/s)
- Compute pool sizing impact - When using Spark SS, setting the compute pool size to small helped dramatically (moving from 7 CU/s to 2.5 CU/s, while improving latency as well). This held up in the small-scale (15-minute) test, but not when I ran the same small compute for an extended period
Extended 2-Hour Validation Tests
Next, I wanted to validate that the short time frame wasn't causing any anomalies in the data, so I re-ran the streaming test again, this time allowing it to run for 2 hours:
| Test Type | Message Count | CU Consumed | CU Hour Duration | CU/s | Average Latency (ms) |
|---|---|---|---|---|---|
| Real-Time Intelligence | 2 events/sec | | 7200 | 3.672 | 668.7477 |
| - Eventstream (Push) | | 12809.6849 (H1: 5335.09 + H2: 7474.5949) | 7200 | 1.779 | |
| - Eventhouse | | 13635 (H1: 6615 + H2: 7020) | 7200 | 1.894 | |
| Spark SS (Small Pool) | 2 events/sec | 34388.7695 | 7200 | 4.77 | 15183 |
Extended Tests Observations
- Consumption attribution - Notebooks attribute all consumption to the hour in which the notebook started, whereas RTI workloads show a more even distribution. This creates a perceived spike for the notebook run.
- Small pool limitations - Many of the short-term gains from forcing the Spark cluster to run small were lost: back-pressure drove latency up sharply, and both total CU consumption and CU/s increased
- RTI consistency - Eventstream and Eventhouse were extremely consistent in their consumption, running between 1 and 2 CU/s during the time period.
Volume Scale Testing
I factored this into my future tests, running all of them for 2 hours instead of sampling small 15-minute intervals. Since the first tests were essentially a trickle (2 events per second, or EPS), I re-ran the test several more times at higher volumes:
10X Volume Test (20 EPS):
| Test Type | Message Count | CU Consumed | CU Hour Duration | CU/s | Average Latency (ms) |
|---|---|---|---|---|---|
| Real-Time Intelligence | 20 events/sec | | 7200 | 3.59 | 617.47 |
| - Eventstream (Push) | | 12881.0119 (H1: 5332; H2: 7549.0119) | 7200 | 1.79 | |
| - Eventhouse | | 12960 (H1: 4320; H2: 8640) | 7200 | 1.8 | |
| Spark SS (Default) | 20 events/sec | 98618.6795 | 7200 | 13.70 | 2329 |
100X Volume Test (200 EPS):
| Test Type | Message Count | CU Consumed | CU Hour Duration | CU/s | Average Latency (ms) |
|---|---|---|---|---|---|
| Real-Time Intelligence | 200 events/sec | | 7200 | 3.418 | 648.83 |
| - Eventstream (Push) | | 13312.2686 (H1: 7575.6415; H2: 5736.6271) | 7200 | 1.8489 | |
| - Eventhouse | | 11299.22 (H1: 3199.225; H2: 8100) | 7200 | 1.5693 | |
| Spark SS (Default) | 200 events/sec | 168839.453 | 7200 | 23.45 | 18999 |
Volume Analysis
The volume flow usage seems relatively consistent for Eventstream + Eventhouse, while the Spark notebook consumes CUs quite differently. Since it appears to "take out a loan and then pay it back", I can easily see throttling occurring as the notebook tries to provision something really high against the capacity and then runs into throttling and capacity limits. Contrast that with Eventstream, which "pays as it goes" and seems able to scale much more easily.
Key takeaway: The larger the volume, the more it pushes you into RTI.
It also presents me with an alarming fact: my SS notebook gets more expensive as the data volume grows. When running 2 EPS, CU/s was around 2.4 – 5. As the data volume increased, my CU consumption went to 13.7, then 23.45.
Maximum Volume Test (5,000 EPS)
For the last test, I decided to really ramp up the volume. I set the volume coming from my local notebook at the max achievable on a single thread: 5K messages per second (about 1.43 MB/sec, or ~120 GB/day).
| Test Type | Message Count | Batch_Id | CU Consumed | CU Hour Duration | CU/s | Average Latency (ms) |
|---|---|---|---|---|---|---|
| Real-Time Intelligence | 5,000 events/sec | 8a642bad-acaf-44b9-9da7-982569414154 | | 7200 | 7.17 | 1386 |
| - Eventstream (Push) | | | 35679.6676 (H1: 16445.673; H2: 19233.9946) | 7200 | 4.955 | |
| - Eventhouse | | | 15930 (H1: 7830; H2: 8100) | 7200 | 2.2125 | |
| Spark SS (Default) | 5,000 events/sec | | 91980.0215 | 7200 | 12.775 | 37976 |
Key Insights and Calculations
Understanding RTI Pricing Model
A key concept here is understanding how CU is consumed for RTI: it's based on volume, not number of rows. You will be charged the same whether you ingest 1 GB as 1,024 messages of 1 MB each or as 1,048,576 messages of 1 KB each. That means you should understand the average size of a row as you start any streaming project, for accurate estimation. This is similar to how other streaming platforms work as well, such as Amazon Kinesis or Confluent Kafka.
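The volume-based billing point can be sanity-checked with quick arithmetic (a minimal illustration of the equivalence, not the actual billing formula):

```python
KB = 1024            # bytes
MB = 1024 * KB

# 1,024 messages of 1 MB each vs 1,048,576 messages of 1 KB each:
big_messages   = 1024 * MB          # total bytes ingested
small_messages = 1_048_576 * KB     # total bytes ingested

# Both are exactly 1 GB, so under volume-based billing they cost the same
assert big_messages == small_messages == 1024 ** 3
```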
Capacity Planning Calculations
Knowing this, we can easily calculate how much capacity we would need for most RTI workloads. For example, the average row size for the NYC taxi data set is about 200-300 bytes per row. I always take the max, so we'll say 300. That means for a particular day at 2 events per second we are ingesting roughly 0.05 GB per day (2 × 86,400 × 300 = 51,840,000 bytes, or about 51.9 million bytes).
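The same back-of-the-envelope calculation as a small helper (function name is my own):

```python
def daily_ingest_bytes(events_per_second, bytes_per_row):
    """Daily ingestion volume: events/sec * seconds in a day * row size."""
    return events_per_second * 86_400 * bytes_per_row

# 2 EPS at 300 bytes/row
volume_bytes = daily_ingest_bytes(2, 300)   # 51,840,000 bytes
volume_gb = volume_bytes / 1024 ** 3        # roughly 0.05 GB/day
```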
Volume Extrapolation Table
| Category | Regular Volume | X10 | X100 | X2500 |
|---|---|---|---|---|
| Number of daily records | 173,000 | 1,500,000 | 15,000,000 | 432,000,000 |
| Bytes per row | 300 | 300 | 300 | 300 |
| Daily volume, in bytes | 45,000,000 | 450,000,000 | 4,500,000,000 | 129,600,000,000 |
| Daily volume, in KB | 43,945.31 | 439,453.13 | 4,394,531.25 | 126,562,500 |
| Daily volume, in MB | 42.92 | 429.15 | 4,291.53 | 123,596.19 |
| Daily volume, in GB | 0.04 | 0.42 | 4.19 | 121 |
| Events per second | 2 | 20 | 200 | 5,000 |
I stopped at 5,000 EPS. This is equivalent to 121 GB of data ingested per day, which fits nicely inside an F64 with RTI.
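The daily-volume figures in the extrapolation table follow directly from record counts and row size; here is a quick sketch (helper name is mine):

```python
def daily_gb(daily_records, bytes_per_row=300):
    """GB (1024^3 bytes) ingested per day for a record count and row size."""
    return daily_records * bytes_per_row / 1024 ** 3

# The scaled tiers from the table above (records/day at 300 bytes each)
tiers = {"X10": 1_500_000, "X100": 15_000_000, "5,000 EPS": 432_000_000}
volumes = {name: round(daily_gb(n), 2) for name, n in tiers.items()}
# 432M records/day works out to roughly 121 GB/day - inside an F64
```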
Production Recommendations
In a real production environment, there are a few callouts I'd make:
- Skip Event Hub when possible - I would not send the data to an event hub first unless I had to. Using a custom endpoint in Eventstream would let me skip it, and would result in lower CU usage. There may also be an improvement in latency, but I didn't test that.
- Use direct ingestion for transformations - Unless the data is being transformed within the Eventstream, don't use event processing before ingestion (push). Leverage direct ingestion, which uses fewer CUs if you are transforming the data in Eventhouse.
- RTI uses pay-as-you-go pricing - Rather than consuming a large chunk of CUs upfront, both core RTI services use a pay-as-you-go (SaaS-style) approach. Although at first glance they may look like they use a lot of CUs, broken down per hour or per second they are very cost-effective.
- Accurate forecasting is possible - We can accurately forecast our Fabric capacity costs based on what we are currently using and what we plan to use in the future as we scale.
If you're navigating AI applications of data, Fabric, or event-driven architectures and want a second opinion, feel free to reach out!