Will AI data analytics tools assist or replace data engineers?

Discover how AI data analytics tools assist data engineers in seven key ways, including automating routine tasks and supercharging data prep and ETL processes.

Written By Maria Ciampa August 20, 2024

Artificial Intelligence (AI) and machine learning (ML) are shaping the future of every industry, and this is happening particularly fast in data and analytics. Humans losing jobs to robots has been the preoccupation of economists and sci-fi writers for almost 100 years. Shortly after November 2022, when OpenAI’s ChatGPT launched and grew to one million users in five days, AI became the next perceived threat to human jobs. With a recent McKinsey survey finding that 40% of respondents say their organizations will increase their investment in AI, the question is: which jobs are threatened by this increased investment?

Is AI replacing data engineering jobs?

Although humans losing jobs to robots is a familiar story, in reality, it’s far from the truth for data engineers. For example, AI can’t, on its own, source logic from numerous open-source packages or paid API services, connect disparate datasets, or fully maintain a data pipeline.

However, AI data analytics tools will help data engineers add more value to the business by quickly handling routine tasks like eliminating redundant data and filling in gaps in datasets, and by pinging human engineers when anomalies arise. The demand for analytics is increasing at a frenetic pace, and for data engineers, AI-augmented data prep and ETL are the superchargers they need to meet that demand.

So it turns out that AI can act as a copilot that data engineers can leverage to deliver more value to the business, faster. And with the hunger for data growing across the business, it has come at just the right time.

Today’s powerful AI data analytics tools assist users in a broad range of ways. For this article, we’ll stay focused on AI for data prep and ETL.

There are several areas of the data preparation process where AI helps, including:

  1. Recommending a data model structure
  2. Applying transformation rules to data
  3. Helping format data
  4. Improving data quality
  5. Monitoring your ETL and data prep process
  6. Identifying outlier data
  7. Suggesting how to improve your ETL process

In this article, we’ll discuss these data preparation steps in the context of augmented analytics. You’ll also learn where AI is set to disrupt analytics next: conversational analytics, in which generative AI (GenAI) gives analytics end users their own AI-powered copilot for getting answers.

AI and the data pipeline

A well-structured data pipeline is a thing of beauty, seamlessly connecting multiple datasets to a business intelligence tool to allow clients, internal teams, and other stakeholders to perform complex analyses and get the most out of their data.

Data engineers thrive on thought-provoking challenges: bringing terabytes of data from wherever it lives to where it can be analyzed, transforming it using various libraries and services, and keeping the pipeline stable. However, the data preparation phase of the whole process can be extremely time-consuming and onerous. Data engineers often spend countless hours writing scripts and even performing manual data manipulation tasks. This creates a roadblock to meeting the business’s increasing data demands.

Today, the way to solve this challenge is with AI/ML-powered augmented analytics, which uses AI/ML to automate data preparation, insight discovery, and sharing. It also automates data science and ML model development, management, and deployment.

Fueling cloud data warehouses using AI-powered ETL and data prep

The rise of the cloud and of cloud data warehouses has changed the way companies treat their data. In the past, well-organized databases were needed to keep records in order. Today, data comes from a wider array of sources and in more variety than ever before. The rise of the cloud has meant app sprawl across marketing, sales, finance, and service, with every app housing data that could provide analytical insights, if only it were integrated into a data warehouse. Social sources have increasingly become essential for marketing analytics. The variety of data has changed too, from user-generated to machine-generated through to unstructured sensor data. It can all be used for analytics, but only if it is formatted, cleansed, and enriched the right way.

And now, companies are increasingly using third-party data to enrich their business logic, from using it for benchmarking to analyzing the impact of currency fluctuations to answering questions like how the weather forecast might impact sales.

Because cloud data warehouses can be stood up faster than their older counterparts, the clock is often ticking to fuel them with this broad range of data as quickly as possible so that business teams can get started with analytics.

Where your ETL and data prep processes will benefit from AI

The saying “data is the new oil” gets tossed around enough to have already become a cliche, but for purposes of our discussion, it’s an especially apt metaphor. Most companies are sitting on huge stores of data, but in its unprocessed form, it’s not very useful. Even worse, analyzing non-normalized data can produce harmful and misleading results. To continue with the oil metaphor, you need a stable and reliable pipeline to take your data from where it’s stored to where it’ll be processed so that its true value can be harnessed.

While that data is on the move, data engineers can process it so that it’s closer to being in a usable state by the time it hits the BI system. BI platforms are already using AI to help with the ETL process in a variety of ways.

A strong AI analytics system can act as a second set of eyes for a busy data engineering team, freeing them to focus on the challenges that drive more value faster to the analytics team, and ultimately the business.

Let’s walk through seven key areas where AI can assist your ETL processes to drive analytics.

Data model structure

AI assistance can recommend a data model structure, including which columns to join and which to combine into compound keys, and may even create dimension tables to facilitate the fact table joins.
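To make the idea of recommending join columns concrete, here is a minimal sketch that proposes candidate joins by measuring how much the values in two columns overlap. It is a simple heuristic standing in for whatever an AI tool actually does; the function name `suggest_join_columns`, the `min_overlap` threshold, and the sample tables are all illustrative assumptions.

```python
def suggest_join_columns(left, right, min_overlap=0.8):
    """Suggest column pairs whose values overlap enough to be join candidates.

    `left` and `right` map column names to lists of values (hypothetical format).
    """
    suggestions = []
    for left_col, left_vals in left.items():
        left_set = set(left_vals)
        for right_col, right_vals in right.items():
            right_set = set(right_vals)
            if not left_set or not right_set:
                continue  # an empty column cannot be a join key
            # Share of the smaller column's distinct values found in the other column.
            overlap = len(left_set & right_set) / min(len(left_set), len(right_set))
            if overlap >= min_overlap:
                suggestions.append((left_col, right_col, round(overlap, 2)))
    # Strongest candidates first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


orders = {"order_id": [1, 2, 3, 4], "customer_id": [10, 11, 10, 12]}
customers = {"id": [10, 11, 12, 13], "name": ["Ann", "Bo", "Cy", "Di"]}

join_candidates = suggest_join_columns(orders, customers)
```

In this sample, only `customer_id` and `id` share enough values to be suggested. A real recommendation engine would also weigh column names, data types, and cardinality.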

Apply transformation rules to data

AI can apply simple rulesets to help standardize the data by doing things like making all text lowercase and removing blank spaces before and after values.
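The two rules mentioned above (lowercasing text and trimming surrounding blanks) can be sketched in a few lines. This is only an illustration of the kind of ruleset such a tool might apply; the function names and sample rows are assumptions, not part of any product.

```python
def standardize_text(value):
    """Lowercase a text value and trim blank spaces before and after it."""
    if isinstance(value, str):
        return value.strip().lower()
    return value  # leave numbers, None, and other types untouched


def apply_rules(rows):
    """Apply the standardization rule to every field of every row."""
    return [{col: standardize_text(val) for col, val in row.items()} for row in rows]


rows = [
    {"name": "  Acme Corp ", "region": "EMEA", "seats": 40},
    {"name": "Globex", "region": " apac", "seats": None},
]
cleaned = apply_rules(rows)
```

After this step, `"  Acme Corp "` and `"acme corp"` compare as equal, which makes later joins and deduplication more reliable.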

Train AI to format your data

If you already have a perfectly formatted dataset to use as a learning dataset, AI can be trained, using this formatted dataset, to recognize how the larger dataset should look. This allows it to take a holistic approach to cleaning and frees you from repeatedly prompting it to do specific tasks.

Improving your data quality

As AI learns how you want your data to look, it can scan all the columns and recommend what to fix, use active learning to refine those recommendations from your feedback, or go ahead and fix errors on its own, such as removing redundant records (duplicates caused by misspellings, for example) or using context clues to fill in missing values.
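As a rough sketch of the deduplication and gap-filling described above, the example below treats records with nearly identical names as duplicates and uses the duplicate to fill missing fields in the record that is kept. Plain string similarity from Python’s standard `difflib` module stands in for a learned model here, and the `0.85` threshold and the sample records are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def is_near_duplicate(a, b, threshold=0.85):
    """Treat two strings as the same entity if they are similar enough."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def deduplicate(records, key="name", threshold=0.85):
    """Drop near-duplicate records, using each duplicate to fill gaps in the kept one."""
    kept = []
    for record in records:
        match = next(
            (k for k in kept if is_near_duplicate(k[key], record[key], threshold)),
            None,
        )
        if match is None:
            kept.append(dict(record))  # first time we see this entity
            continue
        for col, val in record.items():
            if match.get(col) is None:
                match[col] = val  # fill a missing value from the duplicate
    return kept


records = [
    {"name": "Acme Corporation", "city": None},
    {"name": "Acme Corporaton", "city": "Boston"},  # misspelled duplicate
    {"name": "Globex", "city": "Springfield"},
]
deduped = deduplicate(records)
```

The misspelled record is removed, but its city is carried over to the surviving record. In practice, a threshold like this needs tuning and human review, since overly aggressive matching can merge distinct entities.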

Use AI to monitor your ETL process

While you’re moving your data into your BI system, the big chance for an AI assist is in monitoring the process. If a load fails, or takes longer than the normal or forecasted time, the AI can detect that and ping the engineer to let them know there’s a problem. A sudden change in the volume of data being loaded could also be worth a mention so that the engineer can look into it and see if there’s a larger problem.
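The monitoring checks described above can be approximated with basic statistics over past runs. The sketch below is one possible approach, not how any particular platform works: the run format (`status`, `seconds`, `rows`), the three-standard-deviation time limit, and the 50% volume tolerance are all assumptions.

```python
from statistics import mean, stdev


def check_load(history, latest, z_limit=3.0, volume_tolerance=0.5):
    """Compare the latest load with past successful runs and return alert messages.

    `history` needs at least two runs so a standard deviation can be computed.
    """
    alerts = []
    if latest["status"] != "success":
        alerts.append("load failed")

    # Time threshold learned from history: mean plus z_limit standard deviations.
    durations = [run["seconds"] for run in history]
    time_limit = mean(durations) + z_limit * stdev(durations)
    if latest["seconds"] > time_limit:
        alerts.append("load exceeded the normal time threshold")

    # Flag a sudden change in volume relative to the typical row count.
    typical_rows = mean([run["rows"] for run in history])
    if abs(latest["rows"] - typical_rows) / typical_rows > volume_tolerance:
        alerts.append("sudden change in data volume")
    return alerts


history = [
    {"status": "success", "seconds": 120, "rows": 10000},
    {"status": "success", "seconds": 118, "rows": 10200},
    {"status": "success", "seconds": 125, "rows": 9900},
    {"status": "success", "seconds": 122, "rows": 10100},
    {"status": "success", "seconds": 119, "rows": 9800},
]
slow_run_alerts = check_load(history, {"status": "success", "seconds": 300, "rows": 3000})
normal_run_alerts = check_load(history, {"status": "success", "seconds": 121, "rows": 10050})
```

The slow, low-volume run raises two alerts while the normal run raises none. Each alert string would then be routed to whatever notification channel the team uses.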

Using AI to identify outlier data

Outlier detection is yet another task that an AI system can be designed to handle. For data engineers dealing with large volumes of not-quite-perfect data, this is a task that AI can take over, freeing them up for work only a human can do.

The AI can monitor tables as they are created and new data gets loaded, and check the outputs. As the AI scans the values within a column, it can test for things like uniqueness, referential integrity (to values that are keys in other tables), skewed distribution, null values, and accepted values. To summarize: the AI can check the whole table and, based on a series of rules applied to it, ask the question, “Does this column look correct?” If the AI determines that one of the rules applies and that the column values do not meet the rule’s conditions, then it would send an alert to the engineers.
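The rule-based question “Does this column look correct?” can be sketched as a small checker covering uniqueness, null values, accepted values, and referential integrity. The skewed-distribution test is left out for brevity, and the function name, rule flags, and sample data are illustrative assumptions.

```python
def check_column(values, *, unique=False, not_null=False, accepted=None, references=None):
    """Return the list of rules that a column's values fail."""
    problems = []
    present = [v for v in values if v is not None]

    if not_null and len(present) < len(values):
        problems.append("null values")
    if unique and len(set(present)) < len(present):
        problems.append("duplicate values")
    if accepted is not None and any(v not in accepted for v in present):
        problems.append("unaccepted values")
    # Referential integrity: every value must exist as a key in another table.
    if references is not None and any(v not in references for v in present):
        problems.append("broken references")
    return problems


customer_keys = {10, 11, 12}
order_customer_ids = [10, 11, 99, None]
order_statuses = ["open", "closed", "open", "pending"]

customer_id_problems = check_column(order_customer_ids, not_null=True, references=customer_keys)
status_problems = check_column(order_statuses, accepted={"open", "closed"})
```

Any non-empty result is what would trigger an alert to the engineers. An empty list means the column passed every rule that was applied.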

Use AI to suggest how to improve your ETL process

Some other tasks an AI can assist with include showing you which joins are occurring most frequently across your model and suggesting pre-aggregation. This is useful for speedier queries down the road.
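Finding the most frequent joins is essentially a counting exercise. The sketch below assumes a hypothetical query log in which each query has already been parsed into a list of joined table pairs; extracting those pairs from real SQL is outside its scope.

```python
from collections import Counter


def most_frequent_joins(query_joins, top_n=3):
    """Count how often each pair of tables is joined across logged queries."""
    counts = Counter()
    for joins in query_joins:
        for left, right in joins:
            # Sort the pair so (a, b) and (b, a) count as the same join.
            counts[tuple(sorted((left, right)))] += 1
    return counts.most_common(top_n)


# Hypothetical log: one list of joined table pairs per query.
query_log = [
    [("orders", "accounts")],
    [("accounts", "orders"), ("orders", "products")],
    [("orders", "accounts")],
    [("orders", "products")],
    [("accounts", "users")],
]
top_joins = most_frequent_joins(query_log, top_n=2)
```

Here the `accounts` and `orders` join appears most often, which makes it the first candidate for pre-aggregation.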

AI can also be used to scan columns and test for uniqueness. For example, if every value needs to be unique, like an ID column for all your Salesforce accounts, and there are two different users with the same account ID, then the AI can call that out. For purely numerical data, AI can identify outliers that might indicate improperly entered data. Either way, the AI is once again an extra set of eyes, recommending actions, and surfacing the results to human data engineers only when necessary.
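For purely numerical columns, one common way to surface possibly mis-entered values is the interquartile range (IQR) rule, sketched below with Python’s standard `statistics` module. This is one simple technique among many, and the multiplier `k=1.5` and the sample figures are assumptions.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the middle 50% of the data."""
    q1, _, q3 = quantiles(values, n=4)  # first quartile, median, third quartile
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


deal_sizes = [102, 98, 101, 99, 100, 103, 97, 1000]
suspect = find_outliers(deal_sizes)
```

Only the value `1000` falls outside the computed bounds, so it would be the one surfaced to a human engineer for review.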

Net net: AI enables your data engineering tasks to flow faster

The process of extracting, transforming, and loading (ETL) combines data from multiple sources into a data warehouse. Across that process, AI can be a powerful time saver for data engineers, freeing them from rote tasks so they can deliver data faster and focus on driving even more value to the business. Tasks like removing duplicate records, filling blank values, formatting data, and fixing data quality are all perfect ways to apply AI to streamline data pipelines. For data engineers, it means more time to focus on enhancing data further, such as applying clustering and segmentation to the data or preparing the data to train AI models.

No matter how your data is stored, the right AI data analytics tool can help get it into better shape when you create your single source of truth; it can also help as you load your data into your cloud data warehouse, BI platform, or data science tool. The Sisense cloud provides a secure, high-performance environment that enables you to focus on your business, without having to manage the technical aspects.

Sample some other ways to use AI for data prep: Download the guide

Beyond ETL: Conversational analytics with GenAI

Thinking beyond data preparation, using large language models (LLMs) for conversational experiences around analytics provides a new way to engage end users in their everyday analytics. Increasingly, everyone expects a “copilot” in their application that they can engage with. If you’re providing analytics in your app (or planning to), you may want to consider a conversational analytics copilot for your users. That way, they can ask questions about their data and get answers and explanations.

And if you’re a data engineer, promoting easier self-service for end users is a win-win. First, it means happier end users, who can more easily answer their own queries on demand. Second, it relieves analytics experts and data engineers of many one-off analytics requests. And that means everyone can work better, faster, and smarter.

At Sisense, we’ve made it easy for developers to add conversational analytics based on LLMs to their apps using a composable GenAI Analytics Chatbot (beta). Using Sisense Compose SDK, engineers and developers can customize GenAI experiences and mix and match GenAI analytical building blocks using flexible React Components and APIs.

For more on AI for analytics, read the whitepaper Unlocking end-to-end AI for analytics: From ML to GenAI.

To learn more, schedule a demo.
