97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts


With this in-depth book, data engineers will learn powerful, real-world best practices for managing data—both big and small. Contributors from companies including Google, Microsoft, IBM, Facebook, Databricks, and GitHub share their experiences and lessons learned on cleaning, prepping, wrangling, and storing data.

I contributed six chapters on topics ranging from community building and data documentation to field naming and validation.

O’Reilly Media

I contributed six chapters to the book:

  • Develop communities - not just code: On developing communities alongside code bases, and on empowering rather than patronizing your data product’s customers
  • Give data products a front-end with latent documentation: On low-effort practices for improving data documentation and usability
  • There’s no such thing as data quality: On the value of data “fit for purpose”
  • The many meanings of missingness: On causes and consequences of null field encoding
  • Column names as contracts: On embedding metadata and performance “contracts” in column names
  • Data validation needs more than summary statistics: On the importance of context-informed data validation
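
To give a flavor of the "column names as contracts" idea, here is a minimal sketch in Python. The prefixes (`ID`, `IND`, `N`), the rules attached to them, and the `check_contracts` helper are all hypothetical, chosen only for illustration; they are not taken from the chapter itself.

```python
# Hypothetical naming convention (an assumption for illustration only):
# the prefix of a column name promises something about the column's values.
CONTRACTS = {
    "ID": lambda v: v is not None,                  # identifiers are never null
    "IND": lambda v: v in (0, 1),                   # indicators are binary
    "N": lambda v: isinstance(v, int) and v >= 0,   # counts are non-negative integers
}


def check_contracts(rows):
    """Return (column, row_index, value) for every value that breaks its prefix's rule."""
    violations = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            # The prefix is everything before the first underscore.
            prefix = column.split("_", 1)[0].upper()
            rule = CONTRACTS.get(prefix)
            # Columns with an unknown prefix carry no contract and are skipped.
            if rule is not None and not rule(value):
                violations.append((column, i, value))
    return violations


rows = [
    {"id_user": "a1", "ind_active": 1, "n_logins": 3},
    {"id_user": None, "ind_active": 2, "n_logins": -1},
]
violations = check_contracts(rows)
print(violations)
```

Because the rules are keyed on the prefix alone, a reader of the column name and an automated check both rely on the same promise, which is one way to read the "contract" framing.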