APPROXIMATEDISTINCTCOUNT function in DAX with Examples

The APPROXIMATEDISTINCTCOUNT function in DAX is used to approximate the number of distinct values in a column, especially when dealing with large datasets where exact counts might be computationally expensive. This function is part of Power BI’s efforts to enhance performance by using approximation techniques when calculating distinct values.

Syntax

APPROXIMATEDISTINCTCOUNT(<column>)
  • : The column for which you want to approximate the distinct count.

When to Use APPROXIMATEDISTINCTCOUNT

  • Large datasets: It can be used when you’re working with very large tables (e.g., millions of rows) and performance is a concern.
  • Performance optimization: It helps in improving the speed of queries where exact distinct counts are not required.

Example Using Superstore Dataset

The Sample Superstore dataset typically contains sales data and is commonly used in Power BI examples. Let’s assume the following key columns in the dataset:

  • Order ID (unique order identifier)
  • Product ID (unique product identifier)
  • Customer ID (unique customer identifier)
  • Sales (amount of sales)
  • Region (sales region)

Example 1: Approximate Distinct Count of Customers

You want to calculate the approximate number of distinct customers in the Superstore dataset using the APPROXIMATEDISTINCTCOUNT function.

Step-by-Step

  1. Load Data: First, load the Superstore dataset into Power BI.
  2. Create a Measure: In the Data view or Report view, create a new measure to calculate the approximate number of distinct customers:
Approximate Customer Count = 
APPROXIMATEDISTINCTCOUNT('Sales'[Customer ID])
  1. Use in Visual: Add this measure to a Card visual in Power BI.
  • When displayed, this measure will show an approximation of the distinct customer count. This is much faster for large datasets than using DISTINCTCOUNT if the dataset is extremely large.

Explanation:

  • Column: In this example, the Customer ID column from the Sales table is passed to the APPROXIMATEDISTINCTCOUNT function. It returns the approximate count of distinct customers.

Example 2: Approximate Distinct Count of Products Sold

Suppose you want to know how many different products were sold across the entire dataset.

Step-by-Step

  1. Create a Measure for the approximate count of distinct products:
Approximate Product Count = 
APPROXIMATEDISTINCTCOUNT('Sales'[Product ID])
  1. Use in Visual: Add this measure to a Card visual or Table to see the approximate number of distinct products sold.
  • This will return an approximate count of distinct Product ID values in the Superstore dataset.

Example 3: Approximate Distinct Count in a Specific Region

Now, if you’re interested in calculating the approximate distinct customer count but only for the East region:

  1. Create a Measure for the approximate count of distinct customers in the East region:
Approximate Customer Count (East) = 
CALCULATE(
    APPROXIMATEDISTINCTCOUNT('Sales'[Customer ID]),
    'Sales'[Region] = "East"
)
  1. Use in Visual: Add this measure to a Card visual.
  • This measure will return the approximate count of distinct customers who made purchases in the East region.

When to Use

  • Large Datasets: This function is particularly useful when dealing with large tables that have millions of rows, where exact calculations could be slow and resource-intensive.
  • When Exact Values Aren’t Crucial: In many analytical contexts, an approximation of distinct counts is sufficient and preferred due to performance.

Limitations and Considerations

  • Approximation: As the name implies, this function provides an approximation, which may not always be 100% accurate.
  • Performance: It is optimized for speed and is generally much faster than the exact DISTINCTCOUNT function.
  • Use Case: Best used in scenarios where the slight variance introduced by approximation is acceptable, but speed is critical.

Conclusion

The APPROXIMATEDISTINCTCOUNT function is a useful tool for improving performance when working with large datasets in Power BI. By using it on columns like Customer ID or Product ID in the Superstore dataset, you can quickly get an approximate count of distinct values without compromising much on accuracy, but significantly improving the speed of your reports.

The error message “Approximate Distinct Count is not supported for this storage mode” occurs when the APPROXIMATEDISTINCTCOUNT function is used in a Power BI dataset or model that does not support this function due to the storage mode configuration. Here’s what this means and how you can resolve it:


Understanding the Storage Mode

Power BI supports different storage modes for datasets:

  1. Import Mode:
  • Data is fully imported into Power BI and stored in memory.
  • Most DAX functions, including APPROXIMATEDISTINCTCOUNT, are supported.
  1. DirectQuery Mode:
  • Queries are executed directly against the underlying data source.
  • Some DAX functions, including APPROXIMATEDISTINCTCOUNT, are not supported because they rely on in-memory computation.
  1. Composite Models (Mixed Storage):
  • A combination of Import and DirectQuery.
  • Function support depends on whether the table or column being queried is in Import or DirectQuery mode.

Why the Error Happens

  • The APPROXIMATEDISTINCTCOUNT function requires Import Mode.
  • If you are using DirectQuery Mode for your table (Sales in this case), the function will not work because it cannot execute directly on the data source.

Resolutions

1. Switch to Import Mode

  • If your dataset allows, change the storage mode for the table to Import:
    • Go to Model View in Power BI.
    • Select the table (Sales) and set the Storage Mode to Import.
    • Refresh the data to bring it into memory.

2. Use DISTINCTCOUNT Instead

  • If switching to Import Mode is not an option, use DISTINCTCOUNT instead of APPROXIMATEDISTINCTCOUNT. This function is supported in DirectQuery Mode, but it might have performance implications for large datasets. Example:
   Customer Count = 
   DISTINCTCOUNT('Sales'[Customer ID])

While this provides an exact distinct count, it may be slower for large datasets.

3. Hybrid Solution (Aggregated Table in Import Mode)

  • If you need the approximate distinct count for performance reasons:
    • Create a summarized or aggregated version of your table in Import Mode.
    • Use APPROXIMATEDISTINCTCOUNT on this imported table.

4. Review Storage Mode Compatibility

  • Check if your model uses a Composite Model (Mixed Storage).
  • Ensure that the Sales table is stored in Import Mode if you need to use APPROXIMATEDISTINCTCOUNT.

Checking and Changing Storage Mode

  1. Open Power BI Desktop.
  2. Go to the Model View.
  3. Select the table (e.g., Sales).
  4. In the Properties pane, find the Storage Mode setting.
  5. Change it to Import if it’s currently set to DirectQuery or Dual.

Best Practices

  • Use APPROXIMATEDISTINCTCOUNT only when working in Import Mode.
  • For large datasets in DirectQuery Mode, optimize your data model to minimize the need for frequent distinct count calculations.

By adjusting the storage mode or switching to an alternative function like DISTINCTCOUNT, you can address the issue and ensure your DAX calculations work effectively.