I’m a big fan of data and exploratory data analysis (EDA). I find it helpful to understand the data that is being responsibly collected and analyzed before making decisions that will impact operations or cost a lot of time and money. While I imagine a lot of you are nodding your heads at this, not all companies are collecting or using the right data to inform their decisions.
I use the word responsibly a lot in this post. It is important. I am talking about the practice of data governance. If you have solid practice across:
- ensuring high data quality
- delivering effective role-based access
- making it as secure as possible
AND you employ people who are responsible for its creation, maintenance, and retirement, then you practice data governance. You are responsibly acquiring, using and disposing of your data.
Where you are vs. Where you’re going
Data is harder to collect and use than most people think. People are always convinced they either have more data than they really do or that the quality of their data is better than it is. Part of the difficulty is in ensuring you are collecting and managing data responsibly and sustainably. Much of the rest comes from design and architecture decisions that are made when teams are just getting started.
If you’re just getting started, no pressure, right? If you’re dealing with other people’s choices, technical debt is a real cost of business, and it will demand a percentage of your annual budget. Everyone know where you’re at? Cool, let’s go.
Explore and learn
I have spent a lot of years in the data analytics market, and here is what I have learned: if you invest in the basics of good data management systems and processes and hire some great data scientists/analysts, you are going to waste less money and fewer resources chasing the wrong things. If you are lucky enough to already have a data team that is actively and responsibly collecting, managing and analyzing your data, build a great relationship with them.
What do you know about the data you have?
Performing some exploratory data analysis on the data you have will show you what it can and cannot tell you, including where the gaps lie. You can spot trends, patterns, and anomalies. You can assess the quality of your data. You can test some of your ideas to see if they’re worth the spend. Sit down with your data pros and tell them what you want to know. Ask them to model it. If you have access to the data you need, go ahead and perform your own EDA and figure out what’s missing or low quality.
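As a minimal sketch of what a first pass at EDA can look like, the snippet below checks how much of each column is missing and flags possible anomalies in one numeric column. Everything here is an assumption made for illustration: the sample data, the column names (`order_id`, `region`, `order_value`), and the helper functions are hypothetical, and the interquartile range rule is only one common way to flag unusual values.

```python
import csv
import io
import statistics

# Hypothetical sample data; in practice you would read your own export.
SAMPLE_CSV = """order_id,region,order_value
1,EU,120.0
2,EU,95.5
3,US,
4,,110.0
5,US,101.0
6,US,98.0
7,EU,5000.0
8,EU,105.0
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def missing_share(rows):
    """Return the fraction of empty values for each column."""
    columns = rows[0].keys()
    return {
        col: sum(1 for r in rows if not (r[col] or "").strip()) / len(rows)
        for col in columns
    }

def numeric_values(rows, column):
    """Collect the non-empty values of a column as floats."""
    return [float(r[column]) for r in rows if (r[column] or "").strip()]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

rows = load_rows(SAMPLE_CSV)
gaps = missing_share(rows)
values = numeric_values(rows, "order_value")
print("share missing per column:", gaps)
print("median order value:", statistics.median(values))
print("possible anomalies:", iqr_outliers(values))
```

A flagged value is a prompt for a conversation with your data team, not proof of an error. It may be a typo, or it may be your biggest customer.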
What’s missing and do you really need it?
You may find that what you have isn’t enough or the right type of data to give you the answers you are looking for. Now what?
- Start with the questions you can’t answer. Determine if you already have data from which you can infer the answers or if you need new data. Assess the importance of these questions to your strategy, the amount of data needed for reliable analysis, and whether your infrastructure can securely handle this data.
- Not all data are created equal. Some of it is expensive to collect and/or manage. For example, say you want your customers to tell you what they think on a regular basis. To make their direct responses usable, you will need sustained outreach and reward campaigns that respect the privacy laws of your customers’ home countries, plus the infrastructure to secure the data and enable analysis. All of that requires a significant ongoing investment.
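To put a rough number on "the amount of data needed for reliable analysis" for something like the customer feedback example, here is a back-of-the-envelope sketch. It uses the standard sample size formula for estimating a proportion, n = z² · p(1 − p) / e², and assumes a simple random sample from a large customer base. The function names and the 25% response rate are hypothetical, chosen only for illustration. Your data team may well prefer a different method.

```python
import math
from statistics import NormalDist

def required_sample_size(margin_of_error, confidence=0.95, expected_share=0.5):
    """Responses needed to estimate a share within +/- margin_of_error.

    expected_share=0.5 is the most conservative assumption (largest n).
    """
    # z-score for a two-sided interval at the given confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * expected_share * (1 - expected_share) / margin_of_error ** 2
    return math.ceil(n)

def invitations_needed(responses_needed, response_rate):
    """Customers you would have to contact, given an assumed response rate."""
    return math.ceil(responses_needed / response_rate)

responses = required_sample_size(margin_of_error=0.05)
print("responses needed:", responses)
print("invitations at an assumed 25% response rate:",
      invitations_needed(responses, 0.25))
```

Tightening the margin of error raises the required sample quickly, which is one reason regular customer feedback turns into an ongoing cost rather than a one-time project.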
Work with your data team or data analytics service provider to build the data model you need to answer your most important questions. Don’t forget to understand and account for the impact and costs this will foist upon your existing teams and infrastructure as well as any new costs you may incur.
Data are investments. Invest wisely.
No matter what, data costs money. All those articles that call data a precious commodity that people can make a lot of money with rarely discuss the real costs to collect, manage, and use it responsibly and effectively. This is especially true in the age of GDPR and data sovereignty (where data is stored and how it is accessed and processed). Ensuring your systems and processes are compliant will save you a lot of headaches and fines. Demonstrating to your customers and partners that you are a dependable steward of their data, keeping it secure and not abusing their privacy, is simply good business.
A clear path to the Promised Land
It is usually at this point that someone yells, “Gen AI can do all of it. I don’t need to mess about with this.” Gen AI (or what I call packaged LLMs) is a fun tool to do searches on general topics, sort your email, or create chatbots. It is not a reproducible resource to guide your strategic business decisions. Look at how much stolen data and human input the current models required for training and testing, and they are still unreliable narrators. There are some good decision bots that can automate processes based upon defined and tested thresholds, but you’re not going to make strategic decisions based solely upon the output of a bunch of bots highly tuned for a specific task either.
Starting with EDA, building good models, and securing the right data will lead you to a system that provides reproducible insights, and those insights will help you make better decisions. There is no shortcut. It will cost money. Best to do it right and reap the rewards.