AI-powered analytics: Implementing LLMs for enhanced data insights
- Understanding AI-powered analytics with LLMs
- Leveraging AI-powered analytics with LLMs for efficient data preparation
- Advantages of integrating AI-powered analytics
- Augmented analytics integration: Leveraging AI-powered analytics with Sisense and LLMs
- Next steps in AI-powered analytics: Implementing strategies with LLMs
Understanding AI-powered analytics with LLMs
Since the arrival of OpenAI’s ChatGPT, there has been an explosion of Large Language Models (LLMs) that have transformed perceptions of what’s possible with Artificial Intelligence. A wide range of industries and use cases are adopting LLMs for their ability to drive productivity and deliver innovation across businesses. The pace of innovation has been remarkable, with models such as OpenAI’s GPT-4 and GPT-5, Google Gemini, Cohere, Llama, PaLM, Claude, and a vast array of emerging domain-specific models.
The first wave of LLM adoption focused on chat-based interactive conversational experiences, helping to summarize information, synthesize new content, and assist development. Now, there is a new opportunity: using LLMs to supercharge analytics data processes. This can enable data engineering teams to provide richer data and new ways for analytics end-users to slice, dice, and visualize information, all while reducing effort in the data pipeline. In this whitepaper, you’ll learn more about this opportunity, and how to reimagine your analytics approach by infusing LLMs to unlock the next level of value.
LLMs offer support for a vast array of use cases, including automating coding, generating customer service responses, providing language translation, and much more. An article in MIT Sloan Management Review recognizes the complementary nature of machine learning-based advanced analytics and the natural language capabilities of LLMs. It highlights opportunities for generative AI to address challenges in both the development and deployment phases of advanced analytics, encompassing predictive and prescriptive applications.
Automating enrichment and augmentation of analytics with new fields, segmentations, and classifications is a challenge that analytics professionals and operations teams face daily. Often, they find themselves grappling with hours of data wrangling or caught in a last-minute rush to enhance and supplement data when analytics end-users demand it urgently.
LLMs can play a pivotal role here. Acting as an on-demand resource for augmented information, they have the potential to alleviate much of the pain and effort traditionally involved in data preparation. In doing so, they empower teams to segment and dissect data in innovative ways, in real-time.
However, applying an on-demand model is only feasible if the LLM you are using is tightly integrated with your data and analytics tools. Without this integration, the ease of self-service analytics diminishes rapidly. The good news is that it’s now possible to seamlessly integrate LLMs like OpenAI GPT to significantly boost your analytics. It’s easy to do, and we think you’ll find it’s a game-changer.
Leveraging AI-powered analytics with LLMs for efficient data preparation
Today, teams spend far too much time on manual data prep and enrichment tasks. Estimates suggest that business and data analysts may spend as much as 70 to 80% of their time preparing data for analytics, with a significant portion dedicated to enriching and labeling data and generating new attributes to facilitate analysis. Introducing new fields to datasets for analysts to manipulate is often a laborious and challenging process. For example, consider a seemingly simple query:
What’s the breakout of my customers by industry?
It’s the kind of question that’s critical to making decisions regarding entering new markets and launching new products. However, what if the thousands of customers in the dataset weren’t segmented by industry in advance? What if “Industry” isn’t a field in the data or data model? For an analyst, this usually implies that you’re out of luck unless you can invest considerable time and effort.
It means going back to the data, adding external classification information to the data set, and reloading it into your data warehouse, which is incredibly time-consuming. In many cases, it requires manually matching and reconciling third-party data sets, repetitive Google searches to tag customer records by hand, and often painful, convoluted IF() and VLOOKUP() formulas in Excel or custom Python scripts. Not to mention updating your data model.
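To make the manual pain concrete, here is a minimal pandas sketch of the traditional lookup-based approach described above. All data and column names are hypothetical, and the third-party classification list is exactly the kind of artifact a team would have to source and maintain by hand:

```python
import pandas as pd

# Hypothetical customer table with no "industry" column.
customers = pd.DataFrame({
    "account_name": ["Acme Corp", "Globex", "Initech"],
    "annual_revenue": [1_200_000, 850_000, 430_000],
})

# Manually maintained third-party classification list, which must be
# sourced, cleaned, and kept in sync by hand.
industry_lookup = pd.DataFrame({
    "account_name": ["Acme Corp", "Globex"],  # note: Initech is missing
    "industry": ["Manufacturing", "Energy"],
})

# The spreadsheet-style VLOOKUP equivalent: a left join on account name.
enriched = customers.merge(industry_lookup, on="account_name", how="left")

# Any account absent from the lookup still needs manual research.
unlabeled = enriched[enriched["industry"].isna()]["account_name"].tolist()
print(unlabeled)  # ['Initech']
```

Every gap in the lookup table (like `Initech` here) falls back to manual research, which is exactly the effort an LLM can take over.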
Worse, if teams subsequently need to classify and label data in a new way, such as by zip code or county, it’s back to manually prepping the data again.
Advantages of integrating AI-powered analytics
An LLM is exceptionally well-suited for this kind of work, and it can do this kind of search, extraction, and reconciliation automatically and quickly. However, selecting the right moment to weave LLM-sourced information into your analytics experience is critical to making it truly useful. Collect this information too early, during ETL, and both storage and query costs will be high. Collect it too late, and you may struggle to insert it into the data model to enable the end-user’s expected BI functions.
The optimal approach involves collecting the requested data on demand and utilizing the LLM to suggest the ideal insertion point in a model, offering a highly responsive experience for the end-user. With this approach, end-users gain access to their own AI-powered, always-on personal data prep and data analyst bot as part of their analytics experience. They can easily extend and augment their data and data model by simply engaging with ChatGPT.
Using AI to assist in data preparation creates a massive opportunity for teams. While the LLM does the manual work of prepping and augmenting data, they are free to spend time on more value-added and strategic work. But more importantly, beyond freeing up time, there’s transformational value to be had―enabling teams to be much more fluid in their analytical queries, segmenting data in new ways, with the LLM doing the work behind the scenes.
Augmented analytics integration: Leveraging AI-powered analytics with Sisense and LLMs
A reference implementation of this strategy is available for Sisense Fusion, and is a perfect example of how to incorporate an LLM like OpenAI GPT into analytics processing. This implementation utilizes the flexible capabilities of Sisense Fusion to inject dynamic queries to OpenAI GPT as part of the query process. It automatically generates new database tables and relationships within the existing data model while leveraging guidance from GPT.
This reference architecture also demonstrates low-code app experience development, creating an interactive Q&A app via Sisense BloX. End-users can directly instruct GPT on how they want to augment the data set and model within their dashboards. Sisense and GPT then handle the rest.
To demonstrate how it works, consider the earlier question: “What’s the breakout of my customers by industry?”—when the “industry” attribute isn’t present in the original data model or data set. Sisense and GPT can be used to fill this gap.
Below is the example data with a list of accounts. Ultimately, we’re looking for the percentage split of customers by industry even though that attribute doesn’t exist in the data.
To make it easy for the end-user to ask questions of GPT, Sisense BloX is used to build a simple interactive widget on a Sisense dashboard. This new UI assists in capturing what the end-user wants to request from GPT:
In this example, GPT is told the end-user wants the industry segmentation for a list of companies in natural language. GPT is given the dashboard, widget, and column name, which ultimately contains the list of companies that need industry segment labels.
The UI executes a Python script that contains the question and a dynamically populated list of customer accounts formatted as JSON. This script is submitted to GPT, which then retrieves the industry for each customer. Alternatively, if we want to add GPT enrichment to the scheduled periodic data model refresh, simply executing Python rather than using the Q&A UI, we can do that too.
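The shape of such a script can be sketched as follows. This is an illustrative outline, not the Sisense reference implementation: the prompt wording, function names, and the simulated reply are all assumptions, and the actual API call is shown only as a commented-out example:

```python
import json

def build_industry_prompt(accounts):
    """Build a natural-language question plus the dynamically populated
    list of customer accounts, formatted as JSON."""
    payload = json.dumps([{"account_name": a} for a in accounts])
    return (
        "For each company in the JSON list below, add an 'industry' field "
        "containing its primary industry. Respond with JSON only.\n"
        + payload
    )

def parse_industry_response(raw):
    """Parse the JSON document the LLM returns into {account: industry}."""
    rows = json.loads(raw)
    return {row["account_name"]: row["industry"] for row in rows}

prompt = build_industry_prompt(["Acme Corp", "Globex"])

# The prompt would then be submitted to the model, e.g. (untested sketch):
# from openai import OpenAI
# reply = OpenAI().chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": prompt}],
# ).choices[0].message.content

# Simulated reply, for illustration only:
reply = (
    '[{"account_name": "Acme Corp", "industry": "Manufacturing"},'
    ' {"account_name": "Globex", "industry": "Energy"}]'
)
labels = parse_industry_response(reply)
print(labels)  # {'Acme Corp': 'Manufacturing', 'Globex': 'Energy'}
```

The same builder and parser can be invoked either from the interactive Q&A UI or from a scheduled data-model refresh job, since the LLM call itself is identical in both paths.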
The result set is passed back from GPT to Sisense Fusion as a JSON document. The Python script creates new tables in the origin database based on the GPT result set and adds new relationships between tables in the model. Ultimately, it reduces data enrichment efforts from hours or days to mere seconds.
In the example, GPT returns each customer and their associated industry label in JSON. Then, the script appends this as a new dimension to our data model and automatically joins our fact table of wins and losses with a natural join based on the account name.
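The join step can be sketched in a few lines of pandas. The table contents and column names below are illustrative stand-ins for the wins-and-losses fact table and the LLM-built industry dimension described above:

```python
import pandas as pd

# Fact table of wins and losses, keyed by account name (illustrative data).
fact = pd.DataFrame({
    "account_name": ["Acme Corp", "Globex", "Acme Corp"],
    "outcome": ["win", "loss", "win"],
})

# New dimension built from the JSON document the LLM returned.
industry_dim = pd.DataFrame([
    {"account_name": "Acme Corp", "industry": "Manufacturing"},
    {"account_name": "Globex", "industry": "Energy"},
])

# The natural join on the shared account_name column.
joined = fact.merge(industry_dim, on="account_name")

# The original question is now answerable: breakout of customers by industry.
breakout = (
    joined.drop_duplicates("account_name")
          .groupby("industry")["account_name"]
          .count()
)
print(breakout.to_dict())  # {'Energy': 1, 'Manufacturing': 1}
```

In the reference implementation this happens in the database and data model rather than in a DataFrame, but the logic is the same: append the dimension, join on the natural key, then aggregate.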
With the augmented data added to our data model, it’s simple to continue our analysis. For example, we can quickly create two new visualizations using Sisense Fusion that show us the distribution of our customers by industry.
The whole integration allows analyses to be completed in seconds, rather than waiting days or weeks for an operations team to prepare and publish new information. With the interactive BloX widget, powered by GPT, augmenting and enriching any data using natural language is simple and efficient.
A software product can easily add GPT-supercharged analytics by embedding this BloX widget directly into the user interface. This integration provides end-users with greater flexibility by allowing the addition of augmented or ancillary data without requiring any involvement from the product vendor. As a result, this extends the product’s value and enhances its data capabilities.
This approach to on-demand data isn’t necessarily easy to implement in a business intelligence product. However, the Python compute kernel and the extensive API surface in Sisense Fusion allow direct query creation and dynamic updates to data models to get this kind of on-demand end-user experience. That said, just about any ETL engine can at least augment data at ingestion using this technique. Sisense’s reference architecture for GPT gives you a good starting point to try these techniques out for your own use cases.
Next steps in AI-powered analytics: Implementing strategies with LLMs
The potential of LLMs has now expanded to supercharging analytics data processes, enabling data engineering teams to provide richer data and empowering analytics end-users to explore data in new and dynamic ways. Here are the steps to take to infuse AI into your data processes.
- Identify pain points. Evaluate your current data preparation and enrichment processes to identify pain points and areas where LLMs can be leveraged for automation and augmentation.
- Draft a plan. Develop a plan for incorporating LLM-powered augmented analytics into your existing processes and BI tools. Consider potential use cases where LLMs can significantly reduce manual effort and add strategic value to your analytics approach.
- Select an LLM carefully. Research and select LLMs that closely align with your use case and integrate seamlessly with your data and analytics tools. Explore reference implementations, such as the one provided for Sisense Fusion.
- Test for integration. Test the integration of an LLM with your analytics tool, using the Sisense reference architecture for GPT as a starting point. Ensure that the LLM can provide on-demand augmented data and seamlessly integrate with your data model.
At Sisense, we’re working with companies every day that are at the forefront of AI and analytics, combining technologies to deliver powerful value. To learn how you can get started, schedule a conversation and demo with one of our AI and analytics experts today.