What methods can I incorporate sophisticated logic into my dataform process?

Summary

I am looking to enhance the functionality of my Dataform pipeline by introducing a layer of modularity (via JavaScript functions) that can identify when there is a disruptive change in the schema of my raw data source. This system would then automatically adjust all the .SQLX scripts within my project, eliminating the need for manual intervention.

--

Context

To provide some context, I have a data collection script that streams JSON-encoded raw data into a Pub/Sub topic at a high rate. A PubSub-to-BigQuery subscriber is then used to load this raw data into an hourly partitioned table in BigQuery. Subsequently, this table acts as the foundation for over 20 user-facing data tables. Essentially, Dataform reads from this raw table periodically and carries out the necessary transformations to update the various tables, each defined by its respective .SQLX script containing business logic.

--

Issue

Occasionally, changes are made to the schema of the raw table (such as modifying field names or types), which requires extensive manual adjustments on the Dataform side to ensure that the query logic aligns with the new schema and prevents any crashes. My goal is to streamline this process and reduce the amount of manual work needed whenever such modifications occur.

--

Potential Resolutions

Currently, Terraform rebuilds the raw table automatically when the schema is altered, but assigns it a different name with a version number (e.g., table_1-0-0 --> table_2-0-0). I propose incorporating JavaScript functions that instruct Dataform on how to handle the data based on the version/name of the raw table. This should be implemented in such a way that if new fields are added or existing ones are modified in the raw table, I do not have to manually update all 20 .SQLX files and reconfigure how these fields are referenced in the queries, etc.

It may sound repetitive, but is this type of automation achievable? Thank you in advance

Answer №1

Although it's a bit belated, I wanted to share an update on the solution I implemented to achieve my desired outcome.

I divided my code into two separate modules:

  • constants.js: This file stored global variables containing the full paths to various raw tables in BigQuery, each corresponding to a different schema version.
  • modules.js: Here, I created a function that performs a UNION ALL operation on all past and present schema versions. I then called this function within the pre_operations block of my main user-facing table during each incremental run.

The contents of modules.js:

// Function for selecting data from current and previous schemas
function set_latest_schema() {
    return `
      WITH UNIONED AS (
        SELECT * FROM ${constants.raw_4} WHERE DATE(ingestion_time) >= "2023-01-03"
        UNION ALL
        SELECT * FROM ${constants.raw_3} WHERE DATE(server_timestamp) >= "2023-01-01"
        UNION ALL
        SELECT * FROM ${constants.raw_2} WHERE DATE(server_timestamp) >= "2023-01-01"
        UNION ALL
        SELECT * FROM ${constants.raw_1_1} WHERE DATE(server_timestamp) >= "2023-01-01"
        UNION ALL
        SELECT * FROM ${constants.raw_1} WHERE DATE(server_timestamp) >= "2023-01-01"
      ),
    `
}

The contents of constants.js:

// List of source raw tables (newest to oldest)
const raw_4 = '`project_id.dataset_id.table_id_5`';
const raw_3 = '`project_id.dataset_id.table_id_4`';
const raw_2 = '`project_id.dataset_id.table_id_3`';
const raw_1_1 = '`project_id.dataset_id.table_id_2`';
const raw_1 = '`project_id.dataset_id.table_id_1`';

In case of a schema change, a new raw table is created with a corresponding constant pointing to it, a line is added to the UNION ALL statement, and a Pub/Sub message triggers a cloud function to delete existing user-facing tables. My orchestration layer in Cloud Workflows then initiates frequent executions to rebuild the updated user-facing tables. While not perfect, manual changes in query logic are still necessary to account for the new schema.

Additionally, I've realized that simplifying schema changes due to new fields is possible by using a repeated record structure where custom key-value pairs representing new fields can be added to the raw table without needing to rebuild it. This streamlines the process when facing breaking changes.

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

Concealing a Vuejs dropdown when clicking outside of the component

I am currently working on a Vuejs project where I am creating a menu component. This menu consists of 2 dropdowns, and I have already implemented some methods and used Vue directives to ensure that when one dropdown is clicked, the other hides and vice ver ...

The module 'react/lib/React' could not be located within the file 'ReactTestUtils.js'

Setting up unit-tests for a React Native project using ReactTestUtils is causing an issue with npm test. The error message I receive is: Cannot find module 'react/lib/React' from 'ReactTestUtils.js' I suspect that my dependencies are ...

What is the best way to change the size of a QR code

My current HTML code: <input id="text" type="text"/> <div id="qrcode"></div> Previous version of JAVASCRIPT code: var qrcode = new QRCode("qrcode"); $("#text").on("keyup", function () { qrcode.makeCode($(this).val()); }).keyup().focus ...

The selected value from a dropdown list may occasionally come back as text

I am facing an issue with a dropdown list on my form that has Integer Values set to display text. The problem arises when I run the code to show the value and associated text, as the text is being displayed as the value itself. Is there any workaround avai ...

A significant number of middleware packages, such as compress, are no longer provided as part of the

I recently added [email protected], [email protected], [email protected] and [email protected] (just to be safe). However, I am encountering this error message when trying to execute sails lift /Users/myuser/myproject/backend/node_modu ...

Ways to manage absent embedded expressions in template literals

I'm curious about the most effective way to handle expressions in a template literal. The following code functions correctly var val1 = "Hello" var val2 = "world" var template = `${val1} ${val2}!` console.log(template) However, suppose for some rea ...

What could be causing the issue with my Angular integration with Jira Issue Collector to not function properly?

Looking to link the Jira issue collector with an Angular application I attempted something along the lines of Component.ts import { Component, OnInit } from '@angular/core'; import * as jQuery from 'jquery'; declare global { inter ...

The Mantine date picker is causing an error stating that objects are not valid as a React child

I'm currently experimenting with utilizing Mantine Date in NextJS. The goal is to have the selected date displayed in the HTML text heading when a user clicks on it. For instance, if a user selects January 1st, 2023, the text should show like this: Da ...

grab the destination URL from the embedded frame

Thank you for your attention. I am working with three iframes spread across different HTML documents. Here is how they are organized: In the file iframemain.html, there is: <iframe src="iframeparent.html" width="100%" height="600"> </iframe> ...

What is the best way to create a dynamic information page using both V-for and V-if in Vue.js?

Hey everyone, I recently attempted to set up this layout: https://i.sstatic.net/WbcBX.png This company offers a variety of services grouped into different categories, each containing sub-options and specific details. This is how my JSON data is structur ...

Modify the onerror function of the image tag within the onerror function

Here is a way to display images using the img tag: If 1.jpg exists, show 1.jpg. If not, check for 2.jpg and display it if it exists. If neither 1.jpg nor 2.jpg exist, display 3.jpg. <img src="1.jpg" onerror="this.src='2.jpg'; this.oner ...

Please ensure that the table is empty before reloading data into it

I am facing an issue while binding data from the database. The data is being bound every 5 seconds, however, it does not clear the previous data and keeps accumulating. Instead of displaying just 3 rows when there are 3 in the database, it adds 3 rows ev ...

Conflicting Joomla Modules

I'm currently working on a project at the following link: www.eltotaldesign.com/planeteco Within this project, I have integrated the modules DJ Image Slider and Altra Switcher. Despite attempting to install Easy JQuery and other methods, I have been ...

The function is not recognized in C# programming language

Whenever I try to trigger functions by clicking buttons, nothing seems to happen and an error message appears in the console. Uncaught ReferenceError: AddressInputSet is not defined at HTMLButtonElement.onclick I'm new to this and could use ...

Encountering a JavaScript problem in Google Chrome?

Something strange is happening when I try to place an image in the canvas... "Uncaught TypeError: Failed to execute 'drawImage' on 'CanvasRenderingContext2D': The provided value is not of type '(HTMLImageElement or HTMLVideo ...

Retrieving data from a dynamic array using jQuery

Within my code, I am working with an array that contains IDs of div elements (specifically, the IDs of all child div elements within a parent div with the ID of #area): jQuery.fn.getIdArray = function () { var ret = []; $('[id]', this).each(fu ...

Issue encountered while serializing the `.product` object retrieved from the `getStaticProps` function in NextJS, in conjunction with

I ran into an error message saying "Error: Error serializing .product returned from getStaticProps in "/products/[id]". Reason: undefined cannot be serialized as JSON. Please use null or omit this value." This issue occurred while I was attempting to crea ...

"We are unable to set a value for a global array unless we utilize the .push() method

It's puzzling that I can't populate a global array without using the .push() method. (function() { var globalEmail = []; var testUpdate = function() { var arr = [1, 2, 3]; if (globalEmail.length > 1) { gl ...

Learn how to properly convert a string into a valid URL using the Next JS router when pushing pages

I'm dealing with a string that looks like this: /dashboard/products/:id. My goal is to utilize Next Js's router to navigate to that URL and replace the placeholder :id with an actual id. Here's the code I've written: {products.map(prod ...

Activate the Bootstrap Jquery/Ajax inline editing feature by simply clicking on the Edit button

Seeking recommendations for a way to implement inline editing. When the edit button is clicked, I want the label's content to be replaced with an input text field that can be updated in my MySQL database. This is what my code looks like: <label s ...