Event data tells a detailed and captivating story about how people behave in the world. And that story is written in the language of structured logging. This blog post will walk you through Interana’s approach to event logging and how that language can be elegantly used for storytelling about human beings and how we act.

My Morning Routine, Expressed As Event Data

Event data is a representation of human or system behavior through time. It consists of three main elements: (1) who I am, (2) what I did, and (3) when it happened. For example, I could gather event data on my morning routine for a particular day:

[Figure: my morning routine for one day, shown as a table of events]

My Morning Routine, Expressed in JSON Format

Interana recommends JSON format for event data logging. I’ll explain why in a moment, but first let’s take a look at my morning routine events as they would be captured in Interana’s preferred JSON format:

[Figure: my morning routine events, captured as newline-delimited JSON log lines]
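Since the original screenshot isn’t reproduced here, the log lines might look roughly like this sketch (the timestamps, the user value, and any names not called out in the post, such as hit_snooze, toothbrush, and foods, are invented for illustration):

```json
{"timestamp": "2015-10-05T06:30:00.000Z", "user": "jeff", "action": "wake_up", "alarm_model": "DX-100"}
{"timestamp": "2015-10-05T06:39:00.000Z", "user": "jeff", "action": "hit_snooze"}
{"timestamp": "2015-10-05T06:45:00.000Z", "user": "jeff", "action": "get_out_of_bed"}
{"timestamp": "2015-10-05T06:50:00.000Z", "user": "jeff", "action": "brush_teeth", "toothbrush": "electric"}
{"timestamp": "2015-10-05T06:55:00.000Z", "user": "jeff", "action": "shower", "min_of_singing": 4}
{"timestamp": "2015-10-05T07:15:00.000Z", "user": "jeff", "action": "eat_breakfast", "foods": ["cereal", "coffee"]}
{"timestamp": "2015-10-05T07:30:00.000Z", "user": "jeff", "action": "leave_for_work"}
```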

Newline-delimited JSON  The first thing to notice is that every event is expressed as a JSON object on a line by itself (this format is sometimes called “newline-delimited JSON”; more information is available at jsonlines.org).
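One practical consequence of this format is that a consumer can process events one at a time without ever loading the whole file. A minimal sketch in Python (the file name is hypothetical):

```python
import json

# Newline-delimited JSON: one complete, independently parseable event per line.
with open("morning_routine.json") as f:
    for line in f:
        event = json.loads(line)
        print(event["timestamp"], event["action"])
```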

Required Properties  The next thing to observe is that three key properties (timestamp, user and action) appear on every single log line. These properties are critical to Interana, and really to event data itself, since they form the core of the event: every event must say who, what, and when. In terms of data types, the “user” may be a string, a number, or a hexadecimal identifier. For the “timestamp”, we recommend the ISO-8601 standard, for example 2015-10-05T14:48:00.000Z, which corresponds to the format string %Y-%m-%dT%H:%M:%S.%fZ. We support other time formats as well; the main recommendation is that whatever format you choose should be consistent across all log lines. Finally, we recommend that the “action” be a human-readable string that names the action taken.
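As a quick illustration in Python, that format string parses these timestamps directly; when producing them, note that strftime’s %f emits six fractional digits, so the three-digit form is easier to build with isoformat:

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Parsing: %f accepts the three-digit .000 milliseconds.
ts = datetime.strptime("2015-10-05T14:48:00.000Z", FMT)
print(ts)  # 2015-10-05 14:48:00

# Producing: millisecond precision plus a literal "Z", since we log in UTC.
now = datetime.now(timezone.utc).replace(tzinfo=None)
print(now.isoformat(timespec="milliseconds") + "Z")
```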

Optional Properties  Now let’s focus on the optional properties of each event. The “brush_teeth” event tells us which toothbrush I used. The “shower” event tells us how long I sang in the shower that day. The “eat_breakfast” event tells us which foods I ate that day. These properties only appear for the type of event where they are relevant. Aside from the three key properties in the prior section, Interana is schema-less and can automatically detect these optional properties and their datatypes as they appear.

Why JSON Format?

Auto-Detection Of New Attributes  At Interana, we recognize that all logging changes over time, as products and services evolve and as new questions are asked about user behavior. Because every JSON log line carries the name and datatype of each logged attribute, Interana is able to auto-detect new attributes in the data without manual intervention.
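To make the idea concrete, here is a toy sketch (not Interana’s actual ingest code) of how a consumer of newline-delimited JSON can discover attributes and their datatypes on the fly:

```python
import json

schema = {}  # attribute name -> set of observed JSON types

with open("morning_routine.json") as f:  # hypothetical file name
    for line in f:
        for name, value in json.loads(line).items():
            schema.setdefault(name, set()).add(type(value).__name__)

# New attributes show up here automatically as the logging evolves.
for name, types in sorted(schema.items()):
    print(f"{name}: {', '.join(sorted(types))}")
```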

Self-Describing Datatypes  Since JSON format has different syntax for numbers versus strings, we get a hint that we should treat “alarm_model” as a string (meaning the field supports typeahead and grouping but not mathematical aggregations) and “min_of_singing” as a number (meaning the field supports mathematical aggregations but not typeahead).
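In Python terms, the hint falls straight out of the parser (the alarm model value is made up):

```python
import json

fields = json.loads('{"alarm_model": "DX-100", "min_of_singing": 4}')
print(type(fields["alarm_model"]))     # <class 'str'> -> typeahead, grouping
print(type(fields["min_of_singing"]))  # <class 'int'> -> mathematical aggregations
```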

Optimized For Sparse Data  JSON is a good fit for the sparse nature of typical event data. In some cases there might be more than two thousand potential event attributes (across all event types), but only 15-20 of them appear in any individual event type. For example, in my morning routine I am capturing a list of foods during the “eat_breakfast” event and the details of my toothbrush during the “brush_teeth” event, but each of these attributes exists only for events of the appropriate type. JSON can represent this kind of sparse data flexibly, because the attributes can appear in any order, and be present or missing in any row, without diminishing our ability to read each value into its correct place.
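When reading sparse rows back, a missing attribute is simply absent rather than null-padded. In Python, for instance:

```python
import json

event = json.loads('{"timestamp": "2015-10-05T06:50:00.000Z", '
                   '"user": "jeff", "action": "brush_teeth"}')
# Optional attributes are simply absent when they aren't relevant.
print(event.get("min_of_singing"))  # None, since this event has no shower data
```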

(Relatively) Human Readable  Sometimes it is necessary to triage a logging problem, and JSON is relatively easy for a human being to read: it’s a text format, and the name-value pairs are co-located, so you don’t need to look elsewhere in the file to find the schema.

Interana Data Types

Interana supports a few different core data types, and below is a quick mapping of what each looks like in JSON log format, what we call the datatype in Interana, and how it can be used within Interana queries.

[Table: each core data type as it appears in JSON log format, its Interana datatype name, and how it can be used in queries]

There are a number of scenarios where JSON can encode more complex information (e.g. nested objects or arrays of objects) or special purpose data types (e.g. URLs, user-agents, hex identifiers). Interana provides support for transforming these complex data types into the primitives described above.
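As one illustration of the kind of transform involved (a sketch, not Interana’s actual pipeline; the nested meal attribute is invented), a nested object can be flattened into dotted primitive columns:

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested JSON objects into dotted primitive keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

event = json.loads('{"action": "eat_breakfast", '
                   '"meal": {"main": "cereal", "drink": "coffee"}}')
print(flatten(event))
# {'action': 'eat_breakfast', 'meal.main': 'cereal', 'meal.drink': 'coffee'}
```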

Providing Your Logs To Interana

At Interana, we typically use a batch file import architecture: our customers upload files on a regular cadence to a shared storage repository (Amazon S3, Azure Blob Storage, or a Unix file server) that Interana can access. While uploading logs to a shared storage repository introduces some latency into the system (on the order of a few minutes), it has two benefits: (a) logs can be re-ingested with different data transforms if you discover a problem with the data, and (b) it’s possible to triage logging issues closer to the source. Here are a few details about how we recommend you handle such uploads.

Time-based directory structure  Interana recommends uploading files into a time-based directory structure like mydata/{year}/{month}/{day}/{hour}, which allows for efficient iteration through the repository looking for new files.
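For example, the upload prefix for a batch of events can be derived from the event time (a sketch; the mydata prefix just follows the pattern above):

```python
from datetime import datetime, timezone

def upload_prefix(ts: datetime) -> str:
    """Build a time-based directory prefix like mydata/2015/10/05/14/."""
    return ts.strftime("mydata/%Y/%m/%d/%H/")

print(upload_prefix(datetime(2015, 10, 5, 14, 48, tzinfo=timezone.utc)))
# mydata/2015/10/05/14/
```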

Upload frequency and file size  Interana recommends uploading files roughly every 1-5 minutes, and allowing each file to reach at least 1GB in size before splitting the output into multiple smaller files. In other words, it’s better to give us one 1GB file every minute than a thousand 1MB files every minute: fewer, larger files let us iterate through the directories more efficiently looking for new or modified files.

My Morning Routine, What Would You Like To Know?

Suppose that we collected event data on my morning routine for an entire year. We logged it in newline-delimited JSON format, and imported it into Interana. Let’s take a look at a few different views of the data.

Events From A Single Day

First of all, here’s a Time view of a single day of these events in Interana. Note that we’re looking at a 1-minute granularity, so you can see individual actions as tall “bumps” in the graph:

[Figure: Time view of a single day of events, at 1-minute granularity]

All Events Over A Full Year

Here is a Stacked Area view of all events over the full year. The total number of events varies over time since some parts of my morning routine don’t always happen. (I only sometimes hit the snooze button.)

[Figure: Stacked Area view of all events over the full year]

How Long Did It Take For Me To Get Ready For Work?

It would be interesting to know how long it usually takes from when I get out of bed until I actually leave for work. We can compute this by creating a Session. We include only the “get_out_of_bed” and “leave_for_work” events, and only compute this on weekdays (since I don’t regularly leave for work on weekends). We start a new session after 4 hours of inactivity, since I know there will always be at least 4 hours between when I leave for work and when I get out of bed the next day.

[Figure: the Session definition in Interana]
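If you wanted to approximate the same computation outside of Interana, a few lines of Python capture the logic (a sketch under the assumptions above; the file name is hypothetical, and events are assumed to be sorted by timestamp):

```python
import json
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"
durations = []  # minutes from get_out_of_bed to leave_for_work
start = None

with open("morning_routine.json") as f:
    for line in f:
        event = json.loads(line)
        ts = datetime.strptime(event["timestamp"], FMT)
        if ts.weekday() >= 5:  # weekdays only: skip Saturday and Sunday
            continue
        if event["action"] == "get_out_of_bed":
            start = ts
        elif event["action"] == "leave_for_work" and start is not None:
            durations.append((ts - start).total_seconds() / 60)
            start = None  # the >4h overnight gap starts a fresh session

print(f"average: {sum(durations) / len(durations):.1f} minutes")
```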

Once the Session is created, I can use a Number view to see a simple average of the session duration…

[Figure: Number view showing the average session duration]

…or I can use a Distribution view to get more detailed information.

[Figure: Distribution view of session durations]

Wow. So there were a few days where the time between getting up and leaving for work was less than 10 minutes. That makes me pretty curious about what happened, so let’s drill in from this view into a Samples view of that one bar. This view shows us the raw data rows that produced the graph above, and indeed, we can see that there were four days in the last year where I practically jumped out of bed and went right to work.

[Figure: Samples view of the raw events behind the shortest sessions]

Wrapping Up

All right, you got me. My morning routine example is oversimplified, since real-world logging looks at behavior across a large number of users or systems. And we may not care whether I brush my teeth before or after my shower (just as long as I shower). But hopefully it provides a good mental model for what event data is, how it can be elegantly expressed in JSON log format, why we prefer JSON, and just how powerful the stories are that we can tell with clean and simple event logging plus a purpose-built event data analytics solution like Interana.

P.S. If you’d like to learn more about event data, check out our e-book Understanding The “Event” In Event Data.