Data Scientists and Agile Teams
Article Jun 20, 2020
Rob Keefer
Data science is often more similar to a research project than to software engineering. In fact, many data scientists spend weeks on data engineering tasks before any data science can begin. Different techniques and approaches will lead to both dead ends and surprising insights. This winding road is very different from the planned set of tasks typical of software engineering projects, and the difference can lead to confusion and frustration when data science tasks end up as stories in an Agile team’s backlog.
Many agile software engineering teams try to scope the size of their user stories to be completed in 3 days, or at most 5 days. When the training of a neural network can take longer than 3 days, the tasks of data science do not easily fit into the expectations of an engineering team.
There are three principles that can guide data science and software engineering teams working together:
Things Go Better when Done Together
An important principle to keep in mind is that both teams are working toward the same goal. A simple commitment to working together, grounded in the belief that the end result will be better for it, will lead to practices that work for your teams. Where there is a lack of respect or there are unreasonable demands, creativity and synergistic problem solving are squelched. However, if both sides are excited to learn how to work together, a different energy will emerge, and over time the teams will discover what works best for them.
Always Know How Things are Going
An advantage of good Agile teams is that by using the progress board and daily standup meetings, the entire team can know how things are going on a project at any point in time. While the daily standup meeting can be used by a data science team, the progress board may look a little different.
For example, from an engineering perspective a user story may be “As a Nurse Practitioner I want to see a list of the top 10 high risk patients in order to prioritize my morning.” This could be a relatively straightforward story involving a database query and presenting the results of the query to the user. Daily status reports on this task will give the team confidence that the developer working on the task is making progress.
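Seen purely as an engineering task, the story might reduce to something like the sketch below. It is only an illustration: the `patients` table, its `name` and `risk_score` columns, and the helper functions are all hypothetical, and the sketch assumes a risk score has already been computed for every patient.

```python
import sqlite3


def top_high_risk_patients(conn, limit=10):
    """Return (name, risk_score) rows for the highest-risk patients.

    Assumes a hypothetical `patients` table that already holds a
    precomputed `risk_score` for each patient.
    """
    query = """
        SELECT name, risk_score
        FROM patients
        ORDER BY risk_score DESC, name ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def build_demo_db():
    """Create an in-memory database with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (name TEXT, risk_score REAL)")
    # Fifteen fictional patients with distinct, increasing scores.
    rows = [(f"patient_{i:02d}", i / 20) for i in range(1, 16)]
    conn.executemany("INSERT INTO patients VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for name, score in top_high_risk_patients(conn):
        print(f"{name}: {score:.2f}")
```

The query itself is a few lines. Everything difficult is hidden inside the assumed `risk_score` column.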
However, from a data science perspective this story is not 3 to 5 days of work. It is likely to take 3 to 5 weeks, or more. A risk model may need to account for many different factors, including the criticality of the health issues of the current patient population, the effectiveness of the medications and treatments associated with each patient, and the time the patient has been under supervision.
Gathering the information to develop this model will require more than a simple understanding of the available patient data. It will also need to account for best practices, medication adherence, and hospital policy. While this type of research project is difficult to force into a list of user stories, it is vital for the team to know how the project is going. The team needs to help clear obstacles and celebrate wins, but not worry so much about day-to-day tasks.
Once the data scientist is relatively confident in the model, the process for making the result of the model accessible to the engineering team is similar to other engineering tasks, and can be tracked within a traditional agile process.
Things Should Work as Expected
In Extreme Programming, Kent Beck coined the term spike to describe a task aimed at answering a question or gathering information, rather than at producing shippable product. A data science project can seem like a collection of spikes rather than the completion of user stories. However, even in this research-oriented environment, a data scientist should be able to clearly articulate the next spike and set an expectation for what will be learned from it.
Thus, while the data science team may not be comfortable giving long-term estimates on tasks, they should be able to clearly define the next set of tasks with reasonable timelines. The conversation should entail a plan and justification of the prioritization of the tasks chosen to pursue next. If an Agile team expects the data science team to produce a shippable deliverable or accurate estimates, they will be disappointed and frustrated.
While data science projects are different from software engineering projects, teams need to trust that the end result will be better when done together. From this belief, the teams can learn to communicate with enough fidelity and frequency that everyone is comfortable knowing how things are going and that a proper set of expectations is being met.