The Data Engineering Podcast: Column Names as Contracts

Discussing how column names can serve as a lightweight alternative to data catalogs and contracts, and how to implement this approach with dbtplyr.
Published January 12, 2022

Communication and shared context are the hardest parts of any data system. In recent years the focus has been on data catalogs as the means of documenting data assets, but catalogs introduce a secondary system of record that people must consult to find the information they need. In this episode Emily Riederer shares her work creating a controlled vocabulary for the semantic elements of the data her team manages, and encoding that vocabulary directly in the schema definitions of her data warehouse. She also explains how she created the dbtplyr package to simplify the work of creating and enforcing your own controlled vocabularies.
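The core idea is that a column's name carries its meaning, so a column name can be checked against an agreed vocabulary. The following is a minimal Python sketch of that check, not the approach used in the episode or in dbtplyr (which is a dbt package). It assumes a hypothetical vocabulary in which the first underscore-delimited token of each column name is a semantic stub. The stubs (`ID`, `IND`, `N`, `AMT`, `DT`, `CAT`) and the helper names are illustrative only.

```python
# Hypothetical controlled vocabulary: the first underscore-delimited token of a
# column name must be one of these stubs. The stubs are illustrative assumptions.
VOCABULARY = {
    "ID": "unique identifier",
    "IND": "binary indicator (0/1)",
    "N": "count",
    "AMT": "monetary amount",
    "DT": "date",
    "CAT": "categorical value",
}


def prefix_of(column: str) -> str:
    """Return the first underscore-delimited token of a column name, upper-cased."""
    return column.split("_", 1)[0].upper()


def invalid_columns(columns: list[str]) -> list[str]:
    """Return the columns whose prefix is not in the controlled vocabulary."""
    return [c for c in columns if prefix_of(c) not in VOCABULARY]


def columns_with_prefix(columns: list[str], prefix: str) -> list[str]:
    """Select columns by semantic prefix, e.g. every indicator column."""
    return [c for c in columns if prefix_of(c) == prefix.upper()]


if __name__ == "__main__":
    cols = ["ID_ACCOUNT", "IND_ACTIVE", "N_LOGINS", "AMT_SPEND", "DT_OPEN", "spend_total"]
    print(invalid_columns(cols))             # ['spend_total']
    print(columns_with_prefix(cols, "ind"))  # ['IND_ACTIVE']
```

Selecting columns by prefix follows the same idea as dplyr's `starts_with()` select helper, and dbtplyr aims to bring that style of column selection to dbt. Check the package's documentation for the macros it actually provides.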