The concept of “empty calories” in food can also be applied to observability data sent to Splunk: log data that consumes storage and processing power but contributes no real value to the health of a business. Splunk is a leading provider of observability experiences, but its comprehensive features put tremendous pressure on capacity and budget, so its customers aim to maximize their investment by getting the most valuable and diverse set of data into Splunk for analysis. The presence of “empty calories” in log data, however, often forces trade-offs between cost, flexibility, and visibility.
Reducing the volume of less valuable data can dramatically improve Splunk utilization and cut operational expenses. An observability pipeline like Cribl Stream lets administrators do this in a few simple steps: filtering out duplicate and extraneous events, routing raw, unfiltered data to an observability data lake for later recall, trimming unneeded content and fields from events, and condensing logs into metrics — each of which decreases operational expense.
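Two of the steps above — trimming unneeded fields and condensing logs into metrics — can be sketched in a few lines of Python. This is an illustrative sketch, not Cribl Stream's actual API; the field names and event shape are hypothetical:

```python
from collections import defaultdict

# Hypothetical metadata fields that rarely contribute to analysis.
UNNEEDED_FIELDS = {"punct", "date_zone", "eventtype", "splunk_server"}

def trim_event(event: dict) -> dict:
    """Drop unneeded fields from an event before it is forwarded."""
    return {k: v for k, v in event.items() if k not in UNNEEDED_FIELDS}

def logs_to_metrics(events: list[dict]) -> dict:
    """Condense many similar log events into a handful of metrics:
    here, a count and an average response time per HTTP status code."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for e in events:
        counts[e["status"]] += 1
        totals[e["status"]] += e["response_ms"]
    return {
        status: {"count": counts[status],
                 "avg_response_ms": totals[status] / counts[status]}
        for status in counts
    }
```

The metric summary replaces potentially millions of near-identical access-log lines with a few numeric series, which is where most of the volume savings comes from.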
Filtering out extraneous data that contributes nothing to insights is the easiest way to get more out of a Splunk investment. Administrators can employ a simple filter expression to keep unneeded data from reaching Splunk in the first place. Filters can match on meta-information such as hostname, source, source type, or log level; on content extracted from the events; or on both.
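As an illustrative sketch (not Cribl's actual filter-expression syntax), a predicate that combines metadata and content matching might look like this; the event keys are hypothetical:

```python
def keep(event: dict) -> bool:
    """Return True if the event should be forwarded to Splunk.
    Filters on metadata (source type, log level) and on raw content."""
    # Drop load-balancer health checks from web access logs (content + metadata).
    if event.get("sourcetype") == "access_combined" and "healthcheck" in event.get("_raw", ""):
        return False
    # Drop debug-level noise (metadata only).
    if event.get("level") == "DEBUG":
        return False
    return True

events = [
    {"sourcetype": "access_combined", "_raw": "GET /healthcheck 200", "level": "INFO"},
    {"sourcetype": "syslog", "_raw": "disk failure on /dev/sda", "level": "ERROR"},
]
forwarded = [e for e in events if keep(e)]  # only the syslog error survives
```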
Dropping, sampling, dynamic sampling, and suppression are the main filtering options administrators can use to reduce the number of events sent to Splunk. Dropping discards matching data entirely, or routes 100% of it to a cheaper destination. Sampling forwards only one event out of each defined set of similar events. Dynamic sampling passes matching data through at low volumes, but begins sampling as volume increases. Suppression limits how many copies of a given event are delivered within a specified time period. By cutting the number of events sent to Splunk, filtering out “empty calories” frees capacity for a greater diversity of sources, leading to more accurate analysis and better business value.
Indexed data takes approximately four times more space to store than raw machine data, and with the replication typical of a Splunk indexer cluster (a replication factor of three), data sent to Splunk can consume roughly 12 times the resources of inexpensive object-based storage of raw, unprocessed data, making “empty calories” especially costly. An observability strategy that includes an observability data lake, paired with an observability pipeline able to replay data from that raw object storage, is therefore vital. Using Cribl Stream, administrators can “multi-cast” data from any source to multiple destinations, routing a pre-processed copy into less expensive object storage.
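A minimal sketch of the multicast idea, assuming a simple in-memory event shape: every route whose predicate matches receives its own copy of each event, so the raw stream can land in cheap object storage while a filtered copy heads toward the index. The route predicates and sinks here are hypothetical stand-ins:

```python
from typing import Callable

Event = dict
Route = tuple[Callable[[Event], bool], Callable[[Event], None]]

def multicast(event: Event, routes: list[Route]) -> None:
    """Send a copy of the event down every route whose predicate matches."""
    for matches, send in routes:
        if matches(event):
            send(dict(event))  # copy, so one route cannot mutate another's view

archive, index = [], []      # stand-ins for an object store and a Splunk forwarder
routes: list[Route] = [
    (lambda e: True, archive.append),                     # raw copy -> data lake
    (lambda e: e.get("level") != "DEBUG", index.append),  # filtered copy -> index
]
for e in [{"level": "DEBUG", "_raw": "x"}, {"level": "ERROR", "_raw": "y"}]:
    multicast(e, routes)
```

Because the archive route is unconditional, every event survives in raw form and can later be replayed through the pipeline if an investigation needs the data that was filtered out.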
Cribl Stream helps administrators reduce the volume of “empty calories” in observability data sent to Splunk and maximize the return on their Splunk infrastructure. By filtering out extraneous data that is not contributing to insights, routing raw, unfiltered data to an observability data lake for later recall, trimming unneeded content and fields from events, and condensing logs into metrics, administrators can improve Splunk utilization and reduce operational costs.