How to Run an Effective Data Science POC in 7 Steps
Introduction
Data science projects are complex and always carry the risk of turning out to be infeasible. Unlike large-scale IT projects, however, they allow ideas to be tested quickly in a sandbox environment. This enables a fail-fast approach and lets companies allocate their resources sustainably toward the projects that will reliably create value, whether by optimizing processes, enabling new services, or increasing customer loyalty.
A proof of concept (POC) is a popular way for businesses to evaluate the viability of a system, product, or service and to ensure it meets specific needs or a set of predefined requirements. A POC should prove the larger value of a system, showing that it is aligned with advancing the company’s long-term strategic objectives.
What does running a POC mean in practice, specifically for data science? When evaluating data science solutions, a POC should prove not just that a solution solves one specific problem, but that the system will provide widespread value to the company: that it can bring a data-driven perspective to a range of the business’s strategic objectives.
From my experience as an intern and consultant, and from my own research, the number of POCs companies run keeps increasing as they race to implement data science for competitive advantage. While the industry may report a very high “success” rate, how many POCs have actually made it into production is far less clear. I recently read a very detailed white paper from Dataiku that lays out 7 essential elements for keeping a POC efficient, effective, and, above all, successful. I want to share these elements in this post to raise awareness of this issue among new data scientists entering the field.
1 — Concrete Use Case
The first, and possibly most important, step to running a successful POC is choosing a use case. Without one, a POC simply can’t exist. To home in on a use case, start with a list of critical business issues to choose from, possibly soliciting ideas and feedback from teams across the company. Then evaluate each candidate by asking:
What is the current process?
Would the use of data in general or data science/machine learning techniques specifically help with this business issue, and if so, how?
Do we have the data to use this for a POC?
Where is the data stored and how can it be accessed?
Are we willing to work on this use case with an external partner?
Will this use case help us make money, save money, or do something we can’t do today?
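One lightweight way to apply these questions consistently across many candidate use cases is to record the answers in a small structure and filter on them. The sketch below is purely illustrative: the `UseCaseScreen` class, its field names, and the example candidates are hypothetical, and it assumes a candidate qualifies only if every question is answered “yes”, which is a stricter rule than the checklist itself states.

```python
from dataclasses import dataclass, fields


@dataclass
class UseCaseScreen:
    """Answers to the use-case screening questions (hypothetical structure)."""
    name: str
    current_process_documented: bool  # What is the current process?
    data_science_helps: bool          # Would data / ML help, and how?
    data_available: bool              # Do we have the data for a POC?
    data_accessible: bool             # Do we know where it is stored and how to access it?
    partner_ok: bool                  # Are we willing to work with an external partner?
    creates_value: bool               # Make money, save money, or do something new?

    def passes(self) -> bool:
        # Assumption: a candidate qualifies only if every answer is "yes".
        return all(getattr(self, f.name) for f in fields(self) if f.name != "name")


def shortlist(candidates: list[UseCaseScreen]) -> list[str]:
    """Return the names of candidates that pass every screening question."""
    return [c.name for c in candidates if c.passes()]


# Hypothetical candidates gathered from teams across the company.
candidates = [
    UseCaseScreen("churn prediction", True, True, True, True, True, True),
    UseCaseScreen("demand forecasting", True, True, False, False, True, True),
]
print(shortlist(candidates))  # ['churn prediction']
```

In practice, some answers (for example, the willingness to work with a partner) might be weighed rather than treated as hard requirements; the all-or-nothing rule just keeps the sketch short.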
2 — Reasonable Deadline
In general, a maximum of 60 days is sufficient for a POC because it allows for proper evaluation without taking too much time away from staff who are balancing other ongoing work and projects.
For small and medium-sized companies, it’s usually possible to choose a use case that can be fully fleshed out and completed (including deployment into production) within this two- to three-month time frame. This might mean setting modest goals, forgoing the most complex or largest problems in favor of a straightforward problem with potentially large impact.
For larger companies, which may have more overhead and processes, this might not be possible. But instead of extending the POC, which again can tie up valuable resources for longer than desired, it’s a good idea to split the project and run smaller, more contained tests in parallel, one with each team involved. In other words, work on small parts of a larger problem at the same time rather than tackling the whole problem in a longer POC.
3 — Clear Deliverables
Of course, one of the most important factors in keeping a POC to a reasonable time frame is having clear deliverables. Without them, the process can drag on, because no one is really sure what counts as done or as a success (or when).
Ideally, the final deliverable is a data project based on the selected use case running in production. Setting up intermediate deliverables for the individual teams evaluating their subset of the project also provides helpful checkpoints that keep the POC moving along.
4 — Right Individuals
To run a successful, efficient POC, you need people from all parts of the organization. The data scientists and/or analysts will, of course, be the most closely involved. But the IT team also needs to test whether the solution can be put into production, any business teams involved with or affected by the results should take part (sometimes even in the same room!), and so should the solution’s end-users. For example, in a churn prediction use case the end “customer” is the marketing team, so they would need to be part of the POC alongside the data scientist(s) and analyst(s).
One mistake teams often make when running a POC is leaving out relevant stakeholders in an effort to minimize the impact on people’s work across the company. While well-intentioned, this is a mistake for two reasons. First, it defeats one of the primary purposes of the POC: taking the first step toward becoming a data-driven organization at its core, instead of isolating data analysis and data projects within one small group of people. Second, the results of the POC can’t be evaluated accurately if the teams most affected by the project don’t have a say.
Another mistake is, of course, swinging too far in the other direction and involving too many people, which slows down progress. There is no need to run the POC with every single person who will ultimately use or be affected by the project; a few representatives from each team or group are generally sufficient.
5 — Thinking Production
Data science and data projects shouldn’t happen in a vacuum, and neither should a POC. It’s critical to actually integrate the POC into the company’s operations. If your company isn’t doing this now for other data and analytics projects (or doesn’t do it well), a POC is a great place to start.
It all goes back to the what and the why: the goal of a POC isn’t just to complete one simple project. Rather, it is to open the floodgates of data value so that the platform can keep delivering business insights, project after project, even after the POC is over. To deliver that value, projects (including the POC’s use case) need to actually go into production and not get stuck in a prototyping or sandbox phase.
Going into production means implementing a solution in a real-life environment. For a recommendation engine in e-commerce, that means getting the recommendation engine up and running in the online store. For a churn prediction model, it means having an automated solution for getting churn information to the team that needs it (likely the marketing team) at a regular cadence. Going one step further, it could also mean further automating the churn prevention processes (e.g., firing off emails with discount codes or other special promotions to those predicted to be churners).
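To make the churn example more concrete, the sketch below shows the rough shape of a batch job that scores customers and hands an at-risk list to the marketing team. Everything in it is an illustrative assumption: `predict_churn_probability` is a hypothetical toy rule standing in for the model a real POC would produce, the field names and the 0.5 threshold are made up, and a scheduler such as cron would be what runs the job at the agreed cadence.

```python
import csv
import io
from datetime import date


def predict_churn_probability(customer: dict) -> float:
    """Hypothetical stand-in for a trained churn model.

    Toy rule: the longer since the last purchase, the higher the churn risk.
    """
    return min(customer["days_since_last_purchase"] / 180, 1.0)


def score_customers(customers: list[dict], threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (customer_id, probability) for customers at or above the threshold, riskiest first."""
    scored = [(c["customer_id"], predict_churn_probability(c)) for c in customers]
    at_risk = [(cid, p) for cid, p in scored if p >= threshold]
    return sorted(at_risk, key=lambda item: item[1], reverse=True)


def export_for_marketing(at_risk: list[tuple[str, float]], out, run_date: date) -> None:
    """Write the at-risk list as CSV so the marketing team can act on it."""
    writer = csv.writer(out)
    writer.writerow(["run_date", "customer_id", "churn_probability"])
    for cid, p in at_risk:
        writer.writerow([run_date.isoformat(), cid, f"{p:.2f}"])


# Hypothetical customer records; a real job would read these from the data platform.
customers = [
    {"customer_id": "C1", "days_since_last_purchase": 150},
    {"customer_id": "C2", "days_since_last_purchase": 20},
    {"customer_id": "C3", "days_since_last_purchase": 95},
]
at_risk = score_customers(customers)
buffer = io.StringIO()  # stands in for a file or shared location marketing reads from
export_for_marketing(at_risk, buffer, date(2024, 1, 1))
print(buffer.getvalue())
```

Going one step further, as described above, the same at-risk list could feed an automated campaign (such as the discount-code emails) instead of a file someone has to open.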
6 — Ensuring Autonomy
Often, a POC gives companies the opportunity to work with experts who have lots of experience getting data projects off the ground and into production. This is a huge advantage, and you should definitely take the opportunity if it’s available. No matter how simple a product seems, working with experts (likely the product’s sales and/or technical teams) lets you learn what has and hasn’t worked at other companies. Remember, the experts do this all the time, so they can help you avoid basic mistakes and guide you toward the best possible outcome.
But working with experts also comes with the risk of lacking autonomy once the POC is over. If your staff isn’t comfortable with every element of the POC and the data project, they won’t be able to deliver the same value on their own after it ends. So while it’s important to take advantage of outside expertise and guidance, it’s equally important to ensure that staff from all teams involved in the POC are fully trained and autonomous on each of its elements.
7 — Agility and Focus
A POC begins with a specific use case, but aside from that, there is no clear or pre-defined solution to the business problem at hand (that’s what you’ll be looking for during the POC process). Delving into your data can turn up interesting results, especially with the guidance of outside experts who may bring a fresh perspective.
In the POC process, the best results come from teams that are both agile, ready to pivot in directions they didn’t foresee, and focused, not straying too far from the original problem when interesting insights inevitably come up.
Ultimately, even though you’ll have experts guiding you and are likely testing a product you haven’t yet purchased, drive the POC effort as if you owned it. Even if you don’t end up using the solution, your teams should still get something out of the experience, so keep your mindset on gaining value regardless of the outcome.
Conclusion
By neglecting any of these 7 elements, you risk frustrating the stakeholders involved and creating a negative perception of the project that can bias decision-making. You might also end up evaluating the results of an incomplete effort that was doomed to fail from the start, wasting time and money. Adhering to these 7 steps helps organizations move from POC to implementation quickly, saving both time and money.
I hope this article has helped you better understand how to make a data science POC successful. I’d highly recommend checking out other materials from Dataiku as well; they have a lot of good content on the advanced analytics ecosystem.