Configuring Streamdal with Terraform

We love our gorgeous UI and we know you will too. However, we understand that manually configuring all your pipelines can be a daunting task at scale, so we’ve created a Terraform provider to help. This post shows you how to quickly get a pipeline up and running with it.

Why Terraform?

As the current industry leader, Terraform was an obvious first choice of Infrastructure as Code (IaC) tooling to support configuring the Streamdal server. We do plan to support additional IaC tools in the future.

The Terraform provider and documentation

Getting Started

Skip this section if you are already familiar with Terraform and simply copy/paste the provider block below.

Install the Terraform command line tool.
Create a directory/folder to hold the config file and Terraform data. Let’s call ours streamdal-config.
Now create an empty text file called main.tf. This is where we’ll put the Streamdal pipeline definitions, written in HashiCorp Configuration Language (HCL).
In the main.tf file, we’ll configure a Terraform provider, which takes our HCL definitions and makes the necessary API calls to the Streamdal server:

 terraform {
   required_providers {
     streamdal = {
       version = "0.1.2"
       source  = "streamdal/streamdal"
     }
   }
 }

 provider "streamdal" {
   token              = "streamdal-server-token-here"
   address            = "streamdal-server-address-here:8082"
   connection_timeout = 10
 }
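
Hard-coding the token in main.tf makes it easy to commit a secret by accident. Here is a minimal sketch of one alternative, using standard Terraform input variables with the token marked as sensitive. It would replace the provider block above. The variable names streamdal_token and streamdal_address are our own choices for illustration, not something the provider requires:

```hcl
# Hypothetical variable names; pick whatever fits your conventions.
variable "streamdal_token" {
  type        = string
  sensitive   = true # redacts the value in plan/apply output
  description = "Auth token for the Streamdal server"
}

variable "streamdal_address" {
  type        = string
  description = "host:port of the Streamdal server"
}

provider "streamdal" {
  token              = var.streamdal_token
  address            = var.streamdal_address
  connection_timeout = 10
}
```

Terraform reads variable values from environment variables prefixed with TF_VAR_, so exporting TF_VAR_streamdal_token before running Terraform keeps the secret out of the file entirely.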

Next, run terraform init in your terminal to download the required provider.

Now we’re ready to begin configuring our first pipeline and audience.

Setting up our first Pipeline

If you skipped the previous section, copy/paste the provider block into your .tf file.

We’ll start off with a basic pipeline definition that detects some PII (an email address) in a JSON payload and masks it.

We’ll use the pipeline resource to configure this:

resource "streamdal_pipeline" "mask_email" {
  name = "Mask Email"

  step {
    name = "Detect Email Field"

    # We specify abort conditions here since we don't want
    # to continue with the second step if there is nothing
    # to transform.
    on_false {
      abort = "abort_current" # No email found
    }
    on_error {
      abort = "abort_current" # An error occurred
    }
    dynamic = false
    detective {
      type   = "pii_email"
      args   = [] # no args for this type
      negate = false
      path   = "" # No path, we will scan the entire payload
    }
  }

  step {
    name    = "Mask Email Step"
    dynamic = true
    transform {
      mask_value {
        # No path needed since dynamic=true
        # We will use the results from the first detective step
        path = ""

        # Mask the email field(s) we find with asterisks
        mask = "*"
      }
    }
  }
}

Using the streamdal_pipeline resource, we’ll configure a pipeline named “Mask Email”. This pipeline has two step{} blocks, one for each step of the pipeline.

The first step uses the “Detective” module (detective{}) to look for email addresses (type="pii_email") anywhere in the JSON payload it receives.

This type of matcher does not require any arguments (args=[]), and because we leave the path to a specific field empty (path=""), the entire payload will be scanned for any fields whose value is an email address.

Now on to the second step…

We define another step{} block to create a second step. The important bit here is dynamic=true which indicates that we will use the results from the first step.

We will mask the email in the JSON payload using the “Transform” module (transform{}). Inside the transform{} block, we’ll configure a mask_value transformation with an empty path="".

Note: We’re not specifying a path here because the previous step will pass any paths it detects as arguments to the second step.
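
For comparison, here is a sketch of what a non-dynamic mask step might look like when you already know which field holds the email. It reuses only the attributes shown above. The field name customer.email is made up, and the dot-separated path syntax is an assumption on our part, so check the provider documentation before relying on it:

```hcl
resource "streamdal_pipeline" "mask_known_email" {
  name = "Mask Known Email Field"

  step {
    name    = "Mask customer.email"
    dynamic = false # do not use results from an earlier step

    transform {
      mask_value {
        # Hypothetical field name; the path syntax is assumed, not confirmed
        path = "customer.email"
        mask = "*"
      }
    }
  }
}
```

The trade-off is that a fixed path only covers the field you name, while the two-step dynamic pipeline catches email addresses wherever they appear in the payload.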

Great! We’ve defined our first pipeline, and now we need to set up the “audience” to use this pipeline on.

Defining and Assigning an Audience

An audience is a definition that the Streamdal SDK uses to determine which pipelines to apply to a given call to .Process(). It consists of four pieces of data:

service_name — A name indicating the service you’re running the SDK in.
component_name — The source or destination of the data that you will be processing. Typically something like kafka, postgres, rabbitmq, etc.
operation_name — A short descriptor indicating the operation that is processing the data.
operation_type — Either consumer or producer.

For demo purposes, let’s say we’re running the Streamdal SDK in our billing service, which consumes data from a Kafka topic in order to generate a sales report.

Using the streamdal_audience Terraform resource, we would define the audience as follows:

resource "streamdal_audience" "billing_sales_report" {
  service_name   = "billing-svc"
  component_name = "kafka"
  operation_name = "gen-sales-report"
  operation_type = "consumer"
  pipeline_ids   = [resource.streamdal_pipeline.mask_email.id]
}

For folks unfamiliar with Terraform, the value of the pipeline_ids field is dynamically populated by the result of the streamdal_pipeline resource we previously defined.
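
Since pipeline_ids is a list, the same reference can be reused elsewhere. The sketch below defines a second, hypothetical audience for the same billing service, this time as a producer writing to Postgres. The operation name store-invoice is invented for illustration, and we are assuming the server allows one pipeline to be attached to more than one audience:

```hcl
# Hypothetical second audience; names are illustrative only.
resource "streamdal_audience" "billing_store_invoice" {
  service_name   = "billing-svc"
  component_name = "postgres"
  operation_name = "store-invoice"
  operation_type = "producer"

  # Reuses the "Mask Email" pipeline defined earlier
  pipeline_ids = [streamdal_pipeline.mask_email.id]
}
```

Because the audience references the pipeline's id attribute, Terraform infers the dependency and creates the pipeline before any audience that uses it.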

Now save your main.tf file and run terraform apply to create the pipeline and audience in the Streamdal server.

Let’s open up our Streamdal console to see the results:

[Screenshot: Streamdal console with our new pipeline and audience]

As you can see, our audience is now defined and the “Mask Email” pipeline has been created and assigned to it.

We’re now ready to process data through our pipeline and prevent PII leakage!

Full Example

We have created a repository located at https://github.com/streamdal/blog-terraform-demo with the Terraform code from this article and also an example Golang app for you to run.

We’ve got a community!

Want to nerd out with me and other misfits about your experiences with monorepos, deep-tech, or anything engineering-related?

Join our Discord. We’d love to have you!
