DataOps: Better Data Analytics, Obtained Faster

By Greg Henson, CEO of Henson Group

Continuous Integration and Continuous Delivery (CI/CD) is the most important practice DevOps shares with the much newer discipline of DataOps.

DevOps joined software developers with IT operators to radically increase the velocity at which software products could be brought to market. It did so by gathering feedback directly from users and bringing it straight back to developers, who iteratively coded new upgrades. The result: many companies now deploy dozens of upgrade iterations per day!

The Need to Know Faster with Better Analytics

Many digital transformation initiatives begin with the clear perception that the more data can be leveraged to improve business outcomes, the better. Data has been variously referred to as “the new bacon” or “the new oil” in recognition of the huge monetization potential available from the data a company already owns.

This has created new urgency to extract value from these huge data assets and to perform advanced analytics on them to obtain useful answers.

NewVantage Partners CEO Randy Bean tells us, “DataOps firms recognize that the process and tools they offer need to help build and reinforce a data-driven culture.”

Referring to the similarities and differences between DataOps and the involved and iterative qualities of DevOps, Bean adds, “DataOps aims to do the same when it comes to delivering analytic velocity for end users in the enterprise using agile data management practices.”

What Does DataOps Replace?

Chief data officers (CDOs) and data-analytics professionals respond to these challenges in one of three ways:

Heroism – Data-analytics teams work long hours to compensate for the gap between performance and expectations. When a deliverable is met, the team members are hailed as heroes. However, yesterday’s heroes are quickly forgotten when there is a new deliverable to meet. This strategy is also difficult to sustain over a long period of time, and it ultimately just resets expectations at a higher level without providing additional resources. Heroism is likewise hard to scale up as an organization grows.

Hope – When a deadline must be met, it is tempting to just quickly produce a solution with minimal testing, push it out to the users and hope it does not break. This approach has inherent risks. Eventually, a deliverable will contain data errors, upsetting the users and harming the hard-won credibility of the data-analytics team.

Caution – The team decides to give each data-analytics project a longer development and test schedule. Effectively, this is a decision to deliver higher quality but fewer features to users. One difficulty with this approach is that users often don’t know what they want until they see it, so a detailed specification might change considerably by the end of a project. The slow, methodical approach might also make users unhappy because the analytics arrive more slowly than their stated delivery requirements, and as requests pile up, the data-analytics team risks being viewed as bureaucratic and inefficient.

None of these approaches adequately serves the needs of both users and data-analytics professionals, but there is a way out of this bind. The challenges above are not unique to analytics and, in fact, are shared by other organizations.

Why DataOps

Randy Bean also credits the development of DataOps to Andy Palmer, CEO and co-founder of Tamr Inc., a next-generation data curation company he founded in 2013 along with Turing Award winner Dr. Mike Stonebraker.

“People have been managing data for a long time,” explains Palmer in his seminal 2015 article, From DevOps to DataOps: Why It’s Time to Embrace ‘DataOps’ as a New Discipline, “but we’re at a point now where the quantity, velocity and variety of data available to a modern enterprise can no longer be managed without a significant change in the fundamental infrastructure. The design point must focus on the thousands of sources that are not controlled centrally and frequently change their schema without notification — much in the way that websites change frequently without notifying search engines.”

DataOps Defined

Palmer suggests, “I believe that it’s time for data engineers and data scientists to embrace a similar new discipline — let’s call it ‘DataOps’ — that at its core addresses the needs of data professionals on the modern internet and inside the modern enterprise,” citing the “democratization of analytics” and the implementation of “built-for-purpose” database engines as two trends driving the need.

He then clarifies the distinct differences between DevOps and DataOps, defining DevOps as the combination of software engineering, quality assurance, and technology operations before going into more detail about DataOps.

“DataOps acknowledges the interconnected nature of data engineering, data integration, data quality and data security/privacy and aims to help an organization rapidly deliver data that accelerates analytics and enables previously impossible analytics,” explains Andy Palmer. “The ‘ops’ in DataOps is very intentional. The operation of infrastructure required to support the quantity, velocity and variety of data available in the enterprise today is radically different than what traditional data management approaches have assumed. The nature of DataOps embraces the need to manage MANY data sources and MANY data pipelines with a wide variety of transformations.”
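
As a rough, hypothetical sketch of the pattern Palmer describes, the Python snippet below runs two small pipelines over sources the team does not control (a CRM export and a web-event feed with inconsistent field names), applying a chain of transformations to each before merging them for analytics. The source names, fields, and helpers (normalize_keys, tag_source, run_pipeline) are illustrative assumptions, not part of any particular DataOps tool.

```python
# Hypothetical sketch: several uncontrolled sources, each flowing through its
# own small pipeline of transformations before the results are merged.
from typing import Callable, Iterable

Record = dict                              # one row/event from a source
Transform = Callable[[Record], Record]     # one step in a pipeline

def normalize_keys(record: Record) -> Record:
    """Defensive step: tolerate schema drift in field names (case, spacing)."""
    return {k.strip().lower().replace(" ", "_"): v for k, v in record.items()}

def tag_source(name: str) -> Transform:
    """Tag each record with its origin so lineage survives the merge."""
    def _tag(record: Record) -> Record:
        return {**record, "_source": name}
    return _tag

def run_pipeline(source: Iterable[Record], steps: list[Transform]) -> list[Record]:
    """Apply one pipeline's transformations to every record from one source."""
    out = []
    for record in source:
        for step in steps:
            record = step(record)
        out.append(record)
    return out

if __name__ == "__main__":
    # Inline samples stand in for a CRM export and a web-event feed; note the
    # inconsistent field names a centrally controlled schema would never allow.
    crm_export = [{"Customer ID": "C-001", "Region ": "EMEA"}]
    web_events = [{"customer id": "C-002", "page": "/pricing"}]

    crm = run_pipeline(crm_export, [normalize_keys, tag_source("crm")])
    web = run_pipeline(web_events, [normalize_keys, tag_source("web")])

    combined = crm + web   # downstream analytics now sees one consistent shape
    print(combined)
```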

On their website, the DataOps.dev community offers even more clarity, explaining:

“Similar to how DevOps changed the way we develop software, DataOps is changing the way we create data products. By leveraging DevOps methodologies, teams have achieved speed, quality, and flexibility by employing a Delivery Pipeline and Feedback Loop to create and maintain software products.

DataOps employs a similar workflow to achieve the same goals for teams building data products. While both are based on agile frameworks, they differ greatly in their implementation of build, test, and release.

DataOps requires the coordination of ever-changing data and everyone who works with data across an entire business, whereas DevOps requires coordination among software developers and IT.”
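
To make the difference in “build, test, and release” concrete, here is a minimal sketch of what a test stage can look like for a data product: a few data-quality checks that must pass before the dataset is published, much as unit tests must pass before software ships. The column names (customer_id, revenue), the rules, and the run_quality_gate helper are hypothetical examples, not the API of any specific DataOps platform.

```python
# Hypothetical "test" stage in a data delivery pipeline: simple data-quality
# checks that gate a release, analogous to unit tests gating a software build.
import sys

def check_not_empty(rows: list[dict]) -> bool:
    """The data product must contain at least one row."""
    return len(rows) > 0

def check_no_null_keys(rows: list[dict], key: str) -> bool:
    """Every row must have a non-null business key."""
    return all(row.get(key) not in (None, "") for row in rows)

def check_revenue_non_negative(rows: list[dict]) -> bool:
    """A basic sanity rule on a metric downstream users rely on."""
    return all(float(row.get("revenue", 0)) >= 0 for row in rows)

def run_quality_gate(rows: list[dict]) -> bool:
    """Run every check; the pipeline only publishes if all of them pass."""
    checks = [
        ("not empty", check_not_empty(rows)),
        ("no null customer_id", check_no_null_keys(rows, "customer_id")),
        ("revenue non-negative", check_revenue_non_negative(rows)),
    ]
    for name, passed in checks:
        print(f"{'PASS' if passed else 'FAIL'}: {name}")
    return all(passed for _, passed in checks)

if __name__ == "__main__":
    # In a real pipeline these rows would come from the freshly built dataset.
    sample = [
        {"customer_id": "C-001", "revenue": "125.40"},
        {"customer_id": "C-002", "revenue": "89.99"},
    ]
    # Exit non-zero on failure so the CI/CD stage blocks the release.
    sys.exit(0 if run_quality_gate(sample) else 1)
```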

Agile Roots

Another thing DataOps shares with DevOps is its roots in Agile Methodology. In homage, a DataOps Manifesto was introduced, which reads:

Whether referred to as data science, data engineering, data management, big data, business intelligence, or the like, through our work we have come to value in analytics:

Individuals and interactions over processes and tools

Working analytics over comprehensive documentation

Customer collaboration over contract negotiation

Experimentation, iteration, and feedback over extensive upfront design

Cross-functional ownership of operations over siloed responsibilities

Think “Strategic Market Advantage”

The goal is to monetize the data your company already owns: process it faster to generate deeper, more valuable analytics so you can get to market sooner with more accurately targeted offerings for a far-better-qualified audience. If that’s of interest to your company, our team of DataOps engineers will help you at no initial cost. Contact us today: DataOps | Henson Group Azure Expert MSP | Microsoft’s #1 Azure CSP Reseller