10 techniques to boost your data modeling
- Blog
- BI Best Practices
With new possibilities for enterprises to easily access and analyze their data to improve performance, data modeling is morphing too. More than arbitrarily organizing data structures and relationships, data modeling must connect with end-user requirements and questions, as well as…
- 1. Understand the business requirements and results needed
- 2. Visualize the data to be modeled
- 3. Start with simple data modeling and extend afterwards
- 4. Break business enquiries down into facts, dimensions, filters, and order
- 5. Use just the data you need, rather than all the data available
- 6. Make calculations in advance to prevent end user disagreements
- 7. Verify each stage of your data modeling before continuing
- 8. Look for causation, not just correlation
- 9. Use smart tools to do the heavy lifting
- 10. Make your data models evolve
- Better data modeling leads to greater business benefit
With new possibilities for enterprises to easily access and analyze their data to improve performance, data modeling is morphing too. More than arbitrarily organizing data structures and relationships, data modeling must connect with end-user requirements and questions, as well as offer guidance to help ensure the right data is being used in the right way for the right results. The ten techniques described below will help you enhance your data modeling and its value to your business.
1. Understand the business requirements and results needed
The goal of data modeling is to help an organization function better. As a data modeler, collecting, organizing, and storing data for analysis, you can only achieve this goal by knowing what your enterprise needs. Correctly capturing those business requirements to know which data to prioritize, collect, store, transform, and make available to users is often the biggest data modeling challenge. So, we can’t say it enough: get a clear understanding of the requirements by asking people about the results they need from the data. Then start organizing your data with those ends in mind.
2. Visualize the data to be modeled
Staring at countless rows and columns of alphanumeric entries is unlikely to bring enlightenment. Most people are far more comfortable looking at graphical representations of data that make it quick to see any anomalies or using intuitive drag-and-drop screen interfaces to rapidly inspect and join data tables. Data visualization approaches like these help you clean your data to make it complete, consistent, and free from error and redundancy. They also help you spot different data record types that correspond to the same real-life entity (“Customer ID” and “Client Ref.” for example), to then transform them to use common fields and formats, making it easier to combine different data sources.
3. Start with simple data modeling and extend afterwards
Data can become complex rapidly, due to factors like size, type, structure, growth rate, and query language. Keeping data models small and simple at the start makes it easier to correct any problems or wrong turns. When you are sure your initial models are accurate and meaningful you can bring in more datasets, eliminating any inconsistencies as you go. You should look for a tool that makes it easy to begin, yet can support very large data models afterward, also letting you quickly “mash-up” multiple data sources from different physical locations.
4. Break business enquiries down into facts, dimensions, filters, and order
Understanding how business questions can be defined by these four elements will help you organize data in ways that make it easier to provide answers. For example, suppose your enterprise is a retail company with stores in different locations, and you want to know which stores have sold the most of a specific product over the last year. In this case, the facts would be the overall historical sales data (all sales of all products from all stores for each day over the past “N” years), the dimensions being considered are “product” and “store location”, the filter is “previous 12 months”, and order might be “top five stores in decreasing order of sales of the given product”. By organizing your data using individual tables for facts and for dimensions, you facilitate the analysis for finding the top sales performers per sales period, and for answering other business intelligence questions as well.
5. Use just the data you need, rather than all the data available
Computers working with huge datasets can soon run into problems of computer memory and input-output speed. However, in many cases, only small portions of the data are needed to answer business questions. Ideally, you should be able to simply check boxes on-screen to indicate which parts of datasets are to be used, letting you avoid data modeling waste and performance issues.
6. Make calculations in advance to prevent end user disagreements
A key goal of data modeling is to establish one version of the truth, against which users can ask their business questions. While people may have different opinions on how an answer should be used, there should be no disagreement on the underlying data or the calculation used to get to the answer. For example, a calculation might be required to aggregate daily sales data to derive monthly figures, which can then be compared to show best or worst months. Instead of leaving everyone to reach for their calculators or their spreadsheet applications (both common causes of user error), you can avoid problems by setting up this calculation in advance as part of your data modeling and making it available in the dashboard for end users.
7. Verify each stage of your data modeling before continuing
Each action should be checked before moving to the next step, starting with the data modeling priorities from the business requirements. For example, an attribute called the primary key must be chosen for a dataset, so that each record in the dataset can be identified uniquely by the value of primary key in that record. Suppose you chose “ProductID” as a primary key for the historical sales dataset above. You can verify that this is satisfactory by comparing a total row count for “ProductID” in the dataset with a total distinct (no duplicates) row count. If the two counts match, “ProductID” can be used to uniquely identify each record; if not, look for another primary key. The same technique can be applied to a join of two datasets to check that the relationship between them is either one-to-one or one-to-many and to avoid many-to-many relationships that lead to overly complex or unmanageable data models.
8. Look for causation, not just correlation
Data modeling includes guidance in the way the modeled data is used. While empowering end users to access business intelligence for themselves is a big step forwards, it is also important that they avoid jumping to wrong conclusions. For example, perhaps they see that sales of two different products appear to rise and fall together. Are sales of one product driving sales of the other one (a cause and effect relationship), or do they just happen to rise and fall together (simple correlation) because of another factor such as the economy or the weather? Confusing causation and correlation here could lead to targeting wrong or non-existent opportunities, and thus wasting business resources.
9. Use smart tools to do the heavy lifting
More complex data modeling may require coding or other actions to process data before analysis begins. However, if such “heavy lifting” can be done for you by a software application, this frees you from the need to learn about different programming languages and lets you spend time on other activities of value to your enterprise. A suitable software product can facilitate or automate all the different stages of data ETL (extracting, transforming, and loading). Data can be accessed visually without any coding required, different data sources can be brought together using a simple drag-and-drop interface, and data modeling can even be done automatically based on the query type.
10. Make your data models evolve
Data models in business are never carved in stone because data sources and business priorities change continually. Therefore, you must plan on updating or changing them over time. For this, store your data models in a repository that makes them easy to access for expansion and modification, and use a data dictionary or “ready reference” with clear, up-to-date information about the purpose and format of each type of data.
Better data modeling leads to greater business benefit
Business performance in terms of profitability, productivity, efficiency, customer satisfaction, and more can benefit from data modeling that helps users quickly and easily get answers to their business questions. Key success factors for this include linking to organizational needs and objectives, using tools to speed up the steps in readying data for answers to all queries, and making priorities of simplicity and common sense. Once these conditions are met, you and your business, whether small, medium, or big, can expect your data modeling to bring you significant business value.