In this post, I’m going to talk about filters within metrics and filters within Explorer, using the very simple dataset shown below:

[Figure: sample dataset]

My dataset has 11 events from two user_ids (my shard key), spread across several different event_names. All events happened between 5:00 AM and 5:15 AM PDT.

I’m going to create a metric called User Purchase Events that counts the number of purchase events for each user_id, as follows:

Here’s how I like to think about what this metric does:

– Look over all of the events in the query time frame

– Find all user_ids who were present in this period of time

– For each of the user_ids, count the number of events where event_name is purchase.

– Associate each user_id’s result with their events over the query time frame, including events for that user which don’t match the metric filters.
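To make these four steps concrete, here is a minimal Python sketch of the metric. It is not how the product computes metrics internally. The dataset is a hypothetical stand-in for the sample data: it has the same counts as the post (11 events, two user_ids, one purchase by user_id 2), but the event names, timestamps, and the function name `user_purchase_events` are made up for illustration.

```python
from datetime import datetime

# Hypothetical stand-in for the sample dataset: (timestamp, user_id, event_name).
# Only the counts match the post; names and times are invented. Times are naive
# and assumed to be in the same local time zone (PDT) as the query window.
EVENTS = [
    (datetime(2016, 11, 15, 5, 0), 1, "login"),
    (datetime(2016, 11, 15, 5, 1), 2, "login"),
    (datetime(2016, 11, 15, 5, 2), 1, "view_item"),
    (datetime(2016, 11, 15, 5, 3), 2, "view_item"),
    (datetime(2016, 11, 15, 5, 4), 1, "view_item"),
    (datetime(2016, 11, 15, 5, 5), 2, "add_to_cart"),
    (datetime(2016, 11, 15, 5, 6), 1, "add_to_cart"),
    (datetime(2016, 11, 15, 5, 8), 2, "purchase"),
    (datetime(2016, 11, 15, 5, 10), 2, "view_item"),
    (datetime(2016, 11, 15, 5, 12), 1, "logout"),
    (datetime(2016, 11, 15, 5, 14), 2, "logout"),
]


def user_purchase_events(events, start, end):
    """Hypothetical sketch of the User Purchase Events metric, one comment per step."""
    # Step 1: look over all events in the query time frame [start, end).
    in_range = [e for e in events if start <= e[0] < end]
    # Step 2: find every user_id present in that window, each starting at 0.
    per_user = {uid: 0 for _, uid, _ in in_range}
    # Step 3: for each user_id, count events matching the metric filter.
    for _, uid, name in in_range:
        if name == "purchase":
            per_user[uid] += 1
    # Step 4: attach each user's value to ALL of that user's events,
    # including events that do not match the metric filter.
    annotated = [(ts, uid, name, per_user[uid]) for ts, uid, name in in_range]
    return per_user, annotated


per_user, annotated = user_purchase_events(
    EVENTS, datetime(2016, 11, 15), datetime(2016, 11, 16)
)
print(per_user)  # {1: 0, 2: 1}
```

The important detail is step 4: the per-user value is copied onto every event the user has in the window, not only onto the purchase events that produced it.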

Further explanation of the last step is below.

Okay, now let’s take a look at the metric in action.

This query says: Look at all the events from 11/15/2016 12AM to 11/16/2016 12AM and count the number of events where the User Purchase Events metric value associated with that event is greater than 0.
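Here is a minimal Python sketch of that query, using a hypothetical stand-in for the sample dataset (same counts as the post, invented event names and timestamps) and a hypothetical `user_purchase_events` helper that attaches each user's metric value to all of their events. It only models the logic and is not the product's implementation.

```python
from datetime import datetime

# Hypothetical stand-in for the sample dataset: (timestamp, user_id, event_name).
EVENTS = [
    (datetime(2016, 11, 15, 5, 0), 1, "login"),
    (datetime(2016, 11, 15, 5, 1), 2, "login"),
    (datetime(2016, 11, 15, 5, 2), 1, "view_item"),
    (datetime(2016, 11, 15, 5, 3), 2, "view_item"),
    (datetime(2016, 11, 15, 5, 4), 1, "view_item"),
    (datetime(2016, 11, 15, 5, 5), 2, "add_to_cart"),
    (datetime(2016, 11, 15, 5, 6), 1, "add_to_cart"),
    (datetime(2016, 11, 15, 5, 8), 2, "purchase"),
    (datetime(2016, 11, 15, 5, 10), 2, "view_item"),
    (datetime(2016, 11, 15, 5, 12), 1, "logout"),
    (datetime(2016, 11, 15, 5, 14), 2, "logout"),
]


def user_purchase_events(events, start, end):
    """Return in-range events as (ts, user_id, event_name, metric_value)."""
    in_range = [e for e in events if start <= e[0] < end]
    per_user = {uid: 0 for _, uid, _ in in_range}
    for _, uid, name in in_range:
        if name == "purchase":
            per_user[uid] += 1
    # Every event of a user carries that user's metric value.
    return [(ts, uid, name, per_user[uid]) for ts, uid, name in in_range]


annotated = user_purchase_events(EVENTS, datetime(2016, 11, 15), datetime(2016, 11, 16))
# The query: count events whose User Purchase Events value is greater than 0.
matching = [e for e in annotated if e[3] > 0]
print(len(matching))  # 6
```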

If you’re thinking “Wait a sec, there was only 1 purchase event! Why are there 6 events showing up?”, let’s go back to the metric calculation steps.

Here are those steps again, this time with what each one produces for our dataset:

– Look over all of the events in the query time frame

       (11 events total)

– Find all user_ids who were present in this period of time

       (2 user_ids, 1 and 2)

– For each of the user_ids, count the number of events where event_name is purchase.

       (0 events for user_id 1, 1 event for user_id 2)

– Associate each user_id’s result with their events over the query time frame, including events for that user which don’t match the metric filters.

I like to visualize this last step in the metric calculation as adding an extra column to the data, like so:

Since user_id 1 has User Purchase Events equal to 0 in this time range, all of user_id 1’s events in this time range will have a User Purchase Events value of 0.

Similarly, since user_id 2 has User Purchase Events equal to 1 in this time range, all of user_id 2’s events in this time range will have a User Purchase Events value of 1.

Now, if we group by user_id and event_name:

We’ll see that the 6 events are exactly user_id 2’s events, and none of user_id 1’s. This makes sense, since user_id 2 is the only user who had a purchase event in this time frame.
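The group-by can be sketched in Python with `collections.Counter`, again using a hypothetical stand-in dataset (same counts as the post, invented event names and timestamps) and a hypothetical `user_purchase_events` helper. This illustrates the logic only; it is not the product's query engine.

```python
from collections import Counter
from datetime import datetime

# Hypothetical stand-in for the sample dataset: (timestamp, user_id, event_name).
EVENTS = [
    (datetime(2016, 11, 15, 5, 0), 1, "login"),
    (datetime(2016, 11, 15, 5, 1), 2, "login"),
    (datetime(2016, 11, 15, 5, 2), 1, "view_item"),
    (datetime(2016, 11, 15, 5, 3), 2, "view_item"),
    (datetime(2016, 11, 15, 5, 4), 1, "view_item"),
    (datetime(2016, 11, 15, 5, 5), 2, "add_to_cart"),
    (datetime(2016, 11, 15, 5, 6), 1, "add_to_cart"),
    (datetime(2016, 11, 15, 5, 8), 2, "purchase"),
    (datetime(2016, 11, 15, 5, 10), 2, "view_item"),
    (datetime(2016, 11, 15, 5, 12), 1, "logout"),
    (datetime(2016, 11, 15, 5, 14), 2, "logout"),
]


def user_purchase_events(events, start, end):
    """Return in-range events as (ts, user_id, event_name, metric_value)."""
    in_range = [e for e in events if start <= e[0] < end]
    per_user = {uid: 0 for _, uid, _ in in_range}
    for _, uid, name in in_range:
        if name == "purchase":
            per_user[uid] += 1
    return [(ts, uid, name, per_user[uid]) for ts, uid, name in in_range]


annotated = user_purchase_events(EVENTS, datetime(2016, 11, 15), datetime(2016, 11, 16))
# Keep events whose metric value is > 0, then count per (user_id, event_name).
groups = Counter((uid, name) for _, uid, name, metric in annotated if metric > 0)
for (uid, name), n in sorted(groups.items()):
    print(uid, name, n)
```

Every group key has user_id 2, and the group counts add up to the same 6 events the ungrouped query returned.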

Now, if we were to add an additional filter that says event_name is one of purchase, the query would count only events that have a User Purchase Events metric value greater than 0 and whose event_name is purchase. In our dataset, that leaves a single event: user_id 2’s purchase.
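Here is a minimal Python sketch of that combined filter, using a hypothetical stand-in dataset (same counts as the post, invented event names and timestamps) and a hypothetical `user_purchase_events` helper. The metric condition and the event_name condition are simply ANDed on each annotated event.

```python
from datetime import datetime

# Hypothetical stand-in for the sample dataset: (timestamp, user_id, event_name).
EVENTS = [
    (datetime(2016, 11, 15, 5, 0), 1, "login"),
    (datetime(2016, 11, 15, 5, 1), 2, "login"),
    (datetime(2016, 11, 15, 5, 2), 1, "view_item"),
    (datetime(2016, 11, 15, 5, 3), 2, "view_item"),
    (datetime(2016, 11, 15, 5, 4), 1, "view_item"),
    (datetime(2016, 11, 15, 5, 5), 2, "add_to_cart"),
    (datetime(2016, 11, 15, 5, 6), 1, "add_to_cart"),
    (datetime(2016, 11, 15, 5, 8), 2, "purchase"),
    (datetime(2016, 11, 15, 5, 10), 2, "view_item"),
    (datetime(2016, 11, 15, 5, 12), 1, "logout"),
    (datetime(2016, 11, 15, 5, 14), 2, "logout"),
]


def user_purchase_events(events, start, end):
    """Return in-range events as (ts, user_id, event_name, metric_value)."""
    in_range = [e for e in events if start <= e[0] < end]
    per_user = {uid: 0 for _, uid, _ in in_range}
    for _, uid, name in in_range:
        if name == "purchase":
            per_user[uid] += 1
    return [(ts, uid, name, per_user[uid]) for ts, uid, name in in_range]


annotated = user_purchase_events(EVENTS, datetime(2016, 11, 15), datetime(2016, 11, 16))
# Metric filter (value > 0) AND event filter (event_name is purchase).
matching = [e for e in annotated if e[3] > 0 and e[2] == "purchase"]
print(len(matching))  # 1
```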

Hope this helps!