Insight
Understanding CU consumption of streaming systems in Microsoft Fabric
I really struggled with a title for this post, because it covers a lot of ground. It's a long read, but totally worth it. Ultimately I settled on "Understanding CU consumption of streaming systems in Fabric". I have been wanting to understand for a while the impact various architectural decisions can make on Fabric ingestion, and over the holidays here I have finally had time to play with it. The article below walks through architectural considerations for loading data into Fabric, and the decisions and tradeoffs.
I used the ubiquitous NYC Taxi data set and picked a single month (May of 2025). The average number of yellow trips per day in May of 2025 was 150K, which I used as the baseline rate. I tested two approaches:
- Streaming the data to an Eventstream at the day's average rate of 2 rows per second, then landing and refining the data in Eventhouse using update policies
- Using Spark Structured Streaming to stream the data from an Azure Event Hub to a notebook, applying the silver transformation in flight and landing the data directly to a silver table
Test Objectives
The goal was to provide an objective view of the various tools available to stream data in real time into Fabric, and the impact each option has on CU consumption. To be complete, for the 2 streaming solutions I also measured end-to-end latency, to see which solution delivered data faster.
Technical Clarifications
- Eventstreams used push (event processing before ingestion) to connect to Eventhouse. I did not set any special properties in the Eventstream.
- Solution scope is limited to ingesting data into the compute engine (either Lakehouse or Eventhouse). Refreshing semantic models, or using other action systems, is not included in scope. I wanted to focus on the pure cost of CU ingestion and transformation.
- Native Fabric services were leveraged as much as possible. I intentionally built the generation scripts as a separate solution to run locally, outside of Fabric. This avoids any "internal" backchannels that may exist, and gives a realistic sense of latency coming from an outside source.
I assumed the data coming in was raw and in a real-world scenario would need to be refined and transformed. Below are the 2 architectures I leveraged. For both the Eventstream and the Spark SS notebook, the data transformation step happens directly on the data while in motion.
Architecture Flows
For Real-Time Intelligence (RTI):
- Event Hub → Eventstream → Eventhouse (with transformation)
For Spark Structured Streaming:
- Event Hub → Spark Notebook → Lakehouse (with transformation in flight)
I felt this was a fair representation of the architectures customers may use to evaluate their streaming data stacks. While it does not encompass all scenarios, the goal was to be representative. I could have excluded the event hub in the RTI architecture and sent data directly to the Eventstream; this would have further decreased the latency and the real production workload cost. However, there were 2 reasons I did not do this:
- Batch comparison consistency - In testing, each batch run generated a unique identifier. By sending everything to the event hub, I was able to easily compare a single batch's performance across the various approaches
- Realistic latency modeling - I wanted to keep the latency as close to reality as possible.
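The batch-identifier approach can be sketched roughly like this — a hypothetical payload builder that stamps every event in a run with a shared batch id, so the same batch can be traced through each architecture (function and field names are my own illustration, not the actual generation script):

```python
import json
import uuid
from datetime import datetime, timezone

def build_batch(rows, batch_id=None):
    """Serialize a run of events, stamping each with one shared batch_id
    so the same batch can be compared across the streaming paths."""
    batch_id = batch_id or str(uuid.uuid4())
    events = []
    for row in rows:
        events.append(json.dumps({
            "batch_id": batch_id,
            # sent_at enables end-to-end latency measurement on arrival
            "sent_at": datetime.now(timezone.utc).isoformat(),
            **row,
        }))
    return batch_id, events

# Example: two taxi-trip rows share one batch_id for later comparison
batch_id, events = build_batch([
    {"trip_distance": 1.2, "fare_amount": 9.3},
    {"trip_distance": 4.8, "fare_amount": 21.0},
])
```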
Test Setup and Monitoring
That should cover the background. I prefer to pre-create most of the services; below is the structure of my workspace. I always create the monitoring database.
Before I show the results, I created a custom visualization on top of the Fabric Capacity metrics app (thanks Sandeep Pawar for your help on that!) that breaks down the CU usage of each artifact by hour. Then, I could easily calculate the following for each batch run:
- Latency (excluded from the batch runs)
- CU cost (for RTI, a combination of Eventstream and Eventhouse costs, so I calculated both)
- Estimated capacity sizing - a rough estimate of the equivalent Fabric capacity size you might need to run each workload. This is an estimate, not an official figure.
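The per-batch arithmetic behind those numbers is simple; here is a minimal sketch (the SKU lookup mirrors the post's rough, unofficial estimate, and the helper names are mine):

```python
def cu_per_second(cu_consumed, duration_seconds):
    """Average CU/s for a run: total CU consumed over the window length."""
    return cu_consumed / duration_seconds

def estimate_f_sku(cu_s):
    """Rough capacity estimate: smallest F SKU whose CU allocation covers
    the workload's average CU/s (an F-SKU number equals its CUs, e.g. F64 = 64)."""
    for size in (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048):
        if cu_s <= size:
            return f"F{size}"
    return ">F2048"

# Example: Spark SS default pool, 15-minute test window reported per hour
rate = cu_per_second(25202.8530, 3600)   # about 7.0 CU/s
sku = estimate_f_sku(rate)               # rough fit: F8
```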
Test Results
Initial 15-minute Tests
| Test Type | Run | Message Count | CU Consumed | CU Hour Duration | CU/s | Average Latency (ms) |
|---|---|---|---|---|---|---|
| Real-Time Intelligence | 1 | 2 events/sec | | 3600 | | 747 |
| - Eventstream (Push) | | | 4267.7367 | 3600 | 1.1855 | |
| - Eventhouse | | | 3105 | 3600 | 0.8625 | |
| Real-Time Intelligence | 2 | 2 events/sec | | 3600 | | 649.45 |
| - Eventstream (Push) | | | 7466.3367 | 3600 | 2.0740 | |
| - Eventhouse | | | 0.7232 | 3600 | 0.0002 | |
| Spark SS to Lakehouse (Default) | 1 | 2 events/sec | 25202.8530 | 3600 | 7.0008 | 2244 |
| Spark SS (Small Pool) | 2 | 2 events/sec | 8822.9515 | 3600 | 2.4508 | 1447 |
Key Observations from Initial Tests
I was surprised by a few results of the tests that I ran:
- Eventhouse efficiency - Eventhouse used far fewer CUs than I expected in both of my tests (not even averaging 1 CU/s)
- Compute pool sizing impact - When using Spark SS, setting the compute pool size to small helped dramatically (moving from 7 CU/s to 2.5 CU/s, while improving latency as well). This held up in the small-scale (15-minute) test, but not when I ran the same small compute for an extended period
Extended 2-Hour Validation Tests
Next, I wanted to validate that the short time frame wasn't causing any anomalies in the data, so I re-ran the streaming test again, this time allowing it to run for 2 hours:
| Test Type | Message Count | CU Consumed | CU Hour Duration | CU/s | Average Latency (ms) |
|---|---|---|---|---|---|
| Real-Time Intelligence | 2 events/sec | | 7200 | 3.672 | 668.7477 |
| - Eventstream (Push) | | 12809.6849 (H1: 5335.09 + H2: 7474.5949) | 7200 | 1.779 | |
| - Eventhouse | | 13635 (H1: 6615 + H2: 7020) | 7200 | 1.894 | |
| Spark SS (Small Pool) | 2 events/sec | 34388.7695 | 7200 | 4.77 | 15183 |
Extended Tests Observations
- Consumption attribution - Notebooks attribute all consumption to the hour in which the notebook started, whereas RTI workloads show a more even distribution. This creates a perceived spike for the notebook run.
- Small pool limitations - Many of the short-term gains from forcing the Spark cluster to run small were lost: back-pressure drove latency up sharply, and both total CU consumption and CU/s increased
- RTI consistency - Eventstream and Eventhouse were extremely consistent in their consumption, running between 1 and 2 CU/s during the time period.
Volume Scale Testing
I factored this into my future tests, running all of them for 2 hours instead of sampling small 15-minute intervals. Since the first tests were essentially a trickle (2 events per second, or EPS), I re-ran the test several more times at higher volumes:
10X Volume Test (20 EPS):
| Test Type | Message Count | CU Consumed | CU Hour Duration | CU/s | Average Latency (ms) |
|---|---|---|---|---|---|
| Real-Time Intelligence | 20 events/sec | | 7200 | 3.59 | 617.47 |
| - Eventstream (Push) | | 12881.0119 (H1: 5332; H2: 7549.0119) | 7200 | 1.79 | |
| - Eventhouse | | 12960 (H1: 4320; H2: 8640) | 7200 | 1.8 | |
| Spark SS (Default) | 20 events/sec | 98618.6795 | 7200 | 13.70 | 2329 |
100X Volume Test (200 EPS):
| Test Type | Message Count | CU Consumed | CU Hour Duration | CU/s | Average Latency (ms) |
|---|---|---|---|---|---|
| Real-Time Intelligence | 200 events/sec | | 7200 | 3.418 | 648.83 |
| - Eventstream (Push) | | 13312.2686 (H1: 7575.6415; H2: 5736.6271) | 7200 | 1.8489 | |
| - Eventhouse | | 11299.22 (H1: 3199.225; H2: 8100) | 7200 | 1.5693 | |
| Spark SS (Default) | 200 events/sec | 168839.453 | 7200 | 23.45 | 18999 |
Volume Analysis
The volume flow usage seems relatively consistent for Eventstream + Eventhouse, while the Spark notebook consumes CUs quite differently. Since it appears to "take out a loan and then pay it back", I can easily see throttling occurring as the notebook tries to provision something really high against the capacity and then runs into throttling and capacity limits. Contrast that with Eventstream, which "pays as it goes" and seems able to scale much more easily.
Key takeaway: The larger the volume, the more it pushes you into RTI.
It also presents me with an alarming fact: my SS notebook gets more expensive as the data volume grows. When running 2 EPS, CU/s was around 2.4 – 5. As the data volume increased, my CU consumption went to 13.7, then 23.45.
Maximum Volume Test (5,000 EPS)
For the last test, I decided to really ramp up the volume. I set the volume coming from my local notebook at the max achievable on a single thread: 5K messages per second (about 1.43 MB/sec, or ~120 GB/day).
| Test Type | Message Count | Batch_Id | CU Consumed | CU Hour Duration | CU/s | Average Latency (ms) |
|---|---|---|---|---|---|---|
| Real-Time Intelligence | 5,000 events/sec | 8a642bad-acaf-44b9-9da7-982569414154 | | 7200 | 7.17 | 1386 |
| - Eventstream (Push) | | | 35679.6676 (H1: 16445.673; H2: 19233.9946) | 7200 | 4.955 | |
| - Eventhouse | | | 15930 (H1: 7830; H2: 8100) | 7200 | 2.2125 | |
| Spark SS (Default) | 5,000 events/sec | | 91980.0215 | 7200 | 12.775 | 37976 |
Key Insights and Calculations
Understanding RTI Pricing Model
A key concept here is understanding how CU is consumed for RTI: it's based on volume, not number of rows. You will be charged the same whether you ingest 1 GB as 1,024 messages of 1 MB each or as 1,048,576 messages of 1 KB each. That means you should understand the average size of a row as you start any streaming project, for accurate estimation. This is similar to how other streaming platforms work as well, such as Amazon Kinesis or Confluent Kafka.
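The volume-based billing point can be sanity-checked with quick arithmetic (a minimal illustration of the equivalence, not the actual billing formula):

```python
KB = 1024            # bytes
MB = 1024 * KB

# 1,024 messages of 1 MB each vs 1,048,576 messages of 1 KB each:
big_messages   = 1024 * MB          # total bytes ingested
small_messages = 1_048_576 * KB     # total bytes ingested

# Both are exactly 1 GB, so under volume-based billing they cost the same
assert big_messages == small_messages == 1024 ** 3
```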
Capacity Planning Calculations
Knowing this, we can easily calculate how much capacity we would need for most RTI workloads. For example, the average row size for the NYC taxi data set is about 200-300 bytes per row. I always take the max, so we'll say 300. That means for a particular day at 2 events per second we are ingesting roughly 0.05 GB per day (2 × 86,400 × 300 = 51,840,000 bytes, or about 51.9 million bytes).
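The same back-of-the-envelope calculation as a small helper (function name is my own):

```python
def daily_ingest_bytes(events_per_second, bytes_per_row):
    """Daily ingestion volume: events/sec * seconds in a day * row size."""
    return events_per_second * 86_400 * bytes_per_row

# 2 EPS at 300 bytes/row
volume_bytes = daily_ingest_bytes(2, 300)   # 51,840,000 bytes
volume_gb = volume_bytes / 1024 ** 3        # roughly 0.05 GB/day
```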
Volume Extrapolation Table
| Category | Regular Volume | X10 | X100 | X2500 |
|---|---|---|---|---|
| Number of daily records | 173,000 | 1,500,000 | 15,000,000 | 432,000,000 |
| Bytes per row | 300 | 300 | 300 | 300 |
| Daily volume, in bytes | 45,000,000 | 450,000,000 | 4,500,000,000 | 129,600,000,000 |
| Daily volume, in KB | 43,945.31 | 439,453.13 | 4,394,531.25 | 126,562,500 |
| Daily volume, in MB | 42.92 | 429.15 | 4,291.53 | 123,596.19 |
| Daily volume, in GB | 0.04 | 0.42 | 4.19 | 121 |
| Events per second | 2 | 20 | 200 | 5,000 |
I stopped at 5,000 EPS. This is equivalent to 121 GB of data ingested per day, which fits nicely inside an F64 with RTI.
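The daily-volume figures in the extrapolation table follow directly from record counts and row size; here is a quick sketch (helper name is mine):

```python
def daily_gb(daily_records, bytes_per_row=300):
    """GB (1024^3 bytes) ingested per day for a record count and row size."""
    return daily_records * bytes_per_row / 1024 ** 3

# The scaled tiers from the table above (records/day at 300 bytes each)
tiers = {"X10": 1_500_000, "X100": 15_000_000, "5,000 EPS": 432_000_000}
volumes = {name: round(daily_gb(n), 2) for name, n in tiers.items()}
# 432M records/day works out to roughly 121 GB/day - inside an F64
```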
Production Recommendations
In a real production environment, there are a few callouts I'd make:
- Skip Event Hub when possible - I would not send the data to an event hub first unless I had to. Using a custom endpoint in Eventstream would let me skip it, and would result in lower CU usage. There may also be an improvement in latency, but I didn't test that.
- Use direct ingestion for transformations - Unless the data is being transformed within the Eventstream, don't use event processing before ingestion (push). Leverage direct ingestion, which uses fewer CUs if you are transforming the data in Eventhouse.
- RTI uses pay-as-you-go pricing - Rather than consuming a large chunk of CUs upfront, both core RTI services use a pay-as-you-go (SaaS-style) approach. Although at first glance they may look like they use a lot of CUs, broken down per hour or per second they are very cost-effective.
- Accurate forecasting is possible - We can accurately forecast our Fabric capacity costs based on what we are currently using and what we plan to use in the future as we scale.
If you're navigating AI applications of data, Fabric, or event-driven architectures and want a second opinion, feel free to reach out!