The Engineer-iZATION of Data Practices

Alok
3 min read · Jan 7, 2023

The desire for maintainability, extensibility, reusability, and scalability (other “-ities” could be added as well) is driving demand for the “ENGINEER-iZATION” of different facets of data management.

1. Data Reliability Engineering (DRE): Inspired by Google SRE, this refers to the work of creating the standards, processes, platforms, etc. that keep data applications and the data itself reliable across the data lifecycle. The focus here is building a platform approach to data observability.

2. Data Governance Engineering: Responsible for building the platforms behind the pillars of data governance (discoverability, quality, data sharing, and security/privacy). As a trend, “quality” is moving out of governance and into DRE.

3. Data Engineering: The standardization and tooling of data transformation fall under Data Engineering. Observability within data engineering is mostly ad hoc at the moment but will eventually move to DRE. Most big data platform features are centered on data engineering tasks.

4. Analytics Engineering: Responsible for preparing and managing business-friendly datasets (think of logical models). This is the data layer for BI and analytics. The intent is to deliver well-defined, transformed, tested, documented, and code-reviewed datasets for BI and analytics consumption.

5. ML Engineering: Responsible for the work of data scientists, but carried out with the rigor of software engineering principles. Unsurprisingly, this evolution is driven by the rising expectation that ML models be put into production. The highlight here is the growing prominence of, and need for, streamlined ML workflows.

6. Privacy Engineering: Responsible for building privacy in rather than bolting it on, also known as privacy by design. The expected outcome is trustworthy products and systems.

The current state

The map shows the different emerging engineering practices along with their generated value and maturity. Note that “Low” or “Medium” does NOT indicate the IMPORTANCE of a practice. All of these practices are essential for successful data value extraction!

X-axis: maturity in terms of tools/frameworks.

Novel: Newly discovered. Being explored.

Emerging: Clarity but no standard. Mostly custom solutions.

Good: Best practices have emerged but are not fully matured.

Best: Almost a commodity, with standardized tools/frameworks/platforms.

Y-axis: proven value for the business.

Low: Not much known/experimental.

Medium: A few compelling case studies.

High: Well proven and widely adopted.

But WHY Engineer-ization?

1. Unsurprisingly, data is being used in ever-higher-impact applications.

2. Humans are less in the loop for most business operations.

3. A single data engineer can’t be trained in so many capabilities.

How to use the map

1. If you are a services firm, you may want to consider organizing your engineering capabilities around these VALUE-oriented offerings.

2. If you are a startup looking for a new space for product ideas, the “Novel” and “Emerging” spaces are good areas to explore.

3. If you are looking for proven ideas/opportunities that have not yet scaled, the “Emerging” and “Good” spaces are good areas to explore.

4. If you are looking to reduce/rationalize, the “Best” space is a good area to explore.
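For readers who prefer to see the guidance as a lookup, here is a minimal Python sketch of points 2 to 4 above. The goal labels and the function name are hypothetical, chosen only for illustration; the stage pairings are the ones stated in the list. Point 1 (services firms) is left out because it does not map to a specific maturity stage.

```python
from enum import Enum


class Maturity(Enum):
    """X-axis of the map: maturity of tools/frameworks."""
    NOVEL = "Novel"
    EMERGING = "Emerging"
    GOOD = "Good"
    BEST = "Best"


# Hypothetical goal labels; the stage pairings follow points 2-4 of the list above.
GOAL_TO_STAGES = {
    "new_product_ideas": [Maturity.NOVEL, Maturity.EMERGING],
    "proven_but_not_scaled": [Maturity.EMERGING, Maturity.GOOD],
    "reduce_or_rationalize": [Maturity.BEST],
}


def stages_to_explore(goal: str) -> list[Maturity]:
    """Return the maturity stages worth exploring for a given goal."""
    try:
        return GOAL_TO_STAGES[goal]
    except KeyError:
        raise ValueError(f"unknown goal: {goal!r}") from None


if __name__ == "__main__":
    # Print each goal with the stages the article suggests exploring.
    for goal in GOAL_TO_STAGES:
        names = ", ".join(stage.value for stage in stages_to_explore(goal))
        print(f"{goal}: {names}")
```

Where each practice actually sits on the map is not encoded here; that placement comes from the map itself.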

Key takeaway

Engineering practices are not just marketing or sales tools; they give clarity to the purpose, growth, and roadmap of individuals and organizations.

Alok

Certified Solution Architect | Experienced technologist | Speaker | Developer. Views and opinions are personal.