Fri Apr 26 2024
Streamdal Reduces Recovery Time by More Than 90% for Shopify Issues
What happens when external data doesn't match the expected format, or a schema changes without notice? Alerts start going off, Slack channels light up, and emails start coming in.
If these issues aren't fixed promptly, the result is unhappy customers, missed SLAs, and a damaged reputation. What is the most effective way to handle them? Below, we explore how our customers reduced their recovery time by more than 90% by using Streamdal to handle periodic upstream schema changes from Shopify.
The Problem: Shopify Schema Changes
It's normal to expect schemas to evolve over time. Updates may accommodate new features, regulatory compliance, data engineering, data quality, business logic, or new integrations; the list goes on. When these changes arrive unannounced from Shopify, downstream consumers begin filling up dead-letter queues while on-call engineers race to catch up.
Here is what a fix might look like with legacy tools:
APM Data Analysis
Get into Splunk or Datadog (or another APM) and parse through the alerts and logs. You've done this before, so the alerts you set up make it somewhat easier to find out what happened and how.
After some time, you find out why the issue occurred. It was a Shopify schema change, so you plan the fixes:
- Write (or update) a script that fixes the event data in the dead-letter queue.
- Replay the events back into your services.
- Update the consumer schema to prevent future incidents.
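The first step above, a script that repairs dead-lettered events, might look something like the minimal sketch below. It assumes a hypothetical schema change in which a flat `customer_email` field moved under a nested `customer` object; the field names and the `repair_event`/`repair_batch` helpers are illustrative, not Shopify's actual payload or any Streamdal API.

```python
import json
from typing import Any

# Hypothetical schema change: the upstream payload used to send a flat
# "customer_email" field, but now nests it as {"customer": {"email": ...}}.
# The consumer still expects the flat field, so those events were dead-lettered.


def repair_event(raw: str) -> dict[str, Any]:
    """Rewrite one dead-lettered event into the shape the consumer expects."""
    event = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    customer = event.get("customer")
    if "customer_email" not in event and isinstance(customer, dict):
        # Copy the nested value back to the flat field the consumer reads.
        event["customer_email"] = customer.get("email")
    return event


def repair_batch(dead_letters: list[str]) -> tuple[list[dict[str, Any]], list[str]]:
    """Split dead-lettered events into repaired events and ones that still fail."""
    repaired: list[dict[str, Any]] = []
    still_broken: list[str] = []
    for raw in dead_letters:
        try:
            event = repair_event(raw)
        except ValueError:
            still_broken.append(raw)  # leave for manual inspection
            continue
        if event.get("customer_email"):
            repaired.append(event)
        else:
            still_broken.append(raw)
    return repaired, still_broken


# Usage: two recoverable events and one that is not valid JSON.
dlq = [
    '{"id": 1, "customer": {"email": "a@example.com"}}',
    '{"id": 2, "customer_email": "b@example.com"}',
    "not json",
]
fixed, broken = repair_batch(dlq)
print([e["id"] for e in fixed], broken)  # [1, 2] ['not json']
```

Keeping the unrepairable events separate, rather than dropping them, means the replay step only ever sees events the consumer can actually process.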
Finding out why streaming-data issues occur can be very time-consuming, and writing or modifying scripts to fix the data takes even longer: the whole process can run 2-3 hours or more. So how do you bring time to recovery down to about 15 minutes? Here is how you can do it with Streamdal and Streaming Data Performance Monitoring (SDPM):
Streaming Data Collection
Streamdal hooks into your messaging systems as a polite consumer and relays all event data into an indexed, logical grouping called a Collection. We collect from messaging systems such as Kafka, RabbitMQ, GCP Pub/Sub, NATS, CDC systems, Kinesis, and many more. We also support any encoding, whether Protobuf, Avro, Thrift, or JSON; data is decoded in real time into human-readable JSON that you can observe and operate on.
From a Collection, data can be searched and queried with Lucene-like syntax across any point in time. If you set up a Collection for your Shopify topics or queues, you will have a single source of truth to work from.
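To illustrate what a Lucene-like `field:value` query does over decoded JSON events, here is a toy matcher. It is not Streamdal's query engine and supports only a small assumed subset: dotted field paths, exact matches, a trailing `*` for prefix matches, and terms joined with ` AND `. The event fields are hypothetical.

```python
import json
from typing import Any


def get_path(event: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'customer.email' into nested objects."""
    current: Any = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def term_matches(event: dict[str, Any], term: str) -> bool:
    """Match a single field:value term; a trailing * means prefix match."""
    field, _, expected = term.partition(":")
    actual = get_path(event, field.strip())
    if actual is None:
        return False  # missing (or null) fields never match
    # Render non-strings the way they appear in JSON (true, 42, ...).
    text = actual if isinstance(actual, str) else json.dumps(actual)
    expected = expected.strip()
    if expected.endswith("*"):
        return text.startswith(expected[:-1])
    return text == expected


def search(events: list[dict[str, Any]], query: str) -> list[dict[str, Any]]:
    """Return events matching every term in a query joined with ' AND '."""
    terms = query.split(" AND ")
    return [e for e in events if all(term_matches(e, t) for t in terms)]


events = [
    {"topic": "shopify.orders", "customer": {"email": "a@example.com"}},
    {"topic": "shopify.orders", "customer": {"email": "b@shop.test"}},
    {"topic": "shopify.products", "test": True},
]
print(len(search(events, "topic:shopify.orders AND customer.email:a@*")))  # 1
print(len(search(events, "test:true")))  # 1
```

Real Lucene syntax also covers OR, grouping, ranges, and quoted phrases; the point here is only that decoding everything to JSON first lets one query language work across every source encoding.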
Smart Dead-Letter Queue & Functions
For Shopify schema changes, you won't have to spend hours finding which events failed and where. A curated set will be waiting for you in the Dead-Letter. With recurring Functions, you can apply the fix to affected events first, while you update the consumer.
Data Monitors & Alerts
Like most APM products, SDPM comes with monitoring and alerting, including familiar integrations tailored to streaming data and event-driven systems. From the Collection or Dead-Letter, you can semantically monitor for:
- Messages entering Dead-Letter
- Message rates
- Schema evolution
- Fields (like PII, string contains, time, true/false, and many more)
For Shopify, you might want to monitor for schema evolution or messages entering the Dead-Letter. With Streamdal Functions, you can write custom monitors for more narrowly scoped or bespoke needs. We currently integrate with PagerDuty, Slack, CircleCI, Okta, email, Prometheus, GitHub, and Datadog.
Streamdal lets you replay messages to any destination: back into a broker like Kafka, into a Postgres server, or even into an S3 bucket. For our Shopify example, once you have fixed the data, you can replay those messages into your systems for reprocessing. Like Functions, replays can be ad hoc or recurring.
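As a rough idea of what a schema-evolution monitor checks, the sketch below flattens each decoded JSON event into a map of field paths to JSON types and compares it against a baseline. The function names are hypothetical, and `alert` is a plain callback standing in for a real notifier such as Slack or PagerDuty; this is not how Streamdal implements its monitors.

```python
from typing import Any, Callable


def json_type(value: Any) -> str:
    """Name the JSON type of a decoded value (bool is checked before int)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def schema_of(event: Any, prefix: str = "") -> dict[str, str]:
    """Map each leaf field path (e.g. 'customer.email') to its JSON type."""
    if isinstance(event, dict):
        fields: dict[str, str] = {}
        for key, value in event.items():
            path = f"{prefix}.{key}" if prefix else key
            fields.update(schema_of(value, path))
        return fields
    return {prefix: json_type(event)}


def schema_diff(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report fields that were added, removed, or changed type."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "type_changed": sorted(p for p in shared if baseline[p] != current[p]),
    }


def check_event(baseline: dict[str, str], event: dict[str, Any],
                alert: Callable[[dict[str, list[str]]], None]) -> bool:
    """Call `alert` with the diff if the event's schema drifted from the baseline."""
    diff = schema_diff(baseline, schema_of(event))
    if any(diff.values()):
        alert(diff)
        return True
    return False


baseline = schema_of({"id": 1, "total": "10.00", "customer": {"email": "a@example.com"}})
alerts: list[dict[str, list[str]]] = []
drifted = check_event(
    baseline,
    {"id": 2, "total": 10.0, "customer": {"contact": {"email": "b@example.com"}}},
    alerts.append,  # stand-in for a Slack or PagerDuty notifier
)
```

Here the drifted event reports `customer.contact.email` as added, `customer.email` as removed, and `total` as changing from string to number. A production monitor would also need to tolerate optional or nullable fields, which this naive comparison would flag as drift.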
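The fix-then-replay flow can be sketched as applying a repair function to each message and handing it to a destination. In this minimal sketch, `Sink`, `ListSink`, `replay`, and `total_to_number` are hypothetical names; a real sink would wrap a Kafka producer, a Postgres writer, or an S3 client rather than an in-memory list.

```python
from typing import Any, Callable, Iterable, Protocol

Message = dict[str, Any]


class Sink(Protocol):
    """Any destination that accepts a replayed message (broker, database, object store)."""

    def send(self, message: Message) -> None: ...


class ListSink:
    """In-memory stand-in for a real destination, used here for illustration."""

    def __init__(self) -> None:
        self.sent: list[Message] = []

    def send(self, message: Message) -> None:
        self.sent.append(message)


def replay(messages: Iterable[Message], sink: Sink,
           fix: Callable[[Message], Message] = lambda m: m) -> int:
    """Apply `fix` to each message, send it to `sink`, and return how many were sent."""
    sent = 0
    for message in messages:
        sink.send(fix(message))
        sent += 1
    return sent


def total_to_number(message: Message) -> Message:
    """Hypothetical fix: coerce a string "total" back to a number."""
    return {**message, "total": float(message["total"])}


sink = ListSink()
count = replay([{"id": 1, "total": "10.00"}, {"id": 2, "total": "4.50"}], sink, total_to_number)
```

Typing the destination as a `Protocol` keeps the replay loop independent of where messages go. This sketch deliberately ignores retries, ordering guarantees, and idempotency, all of which matter when replaying into a live system.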
If you have experienced the pain of lengthy recovery times because your APM lacks support for streaming-data issues, consider adding SDPM to your monitoring stack. When dealing with Shopify or other external data changes, you too can reduce your recovery time by more than 90% with Streamdal.
Jon is the Solutions Engineer at Streamdal. He brings more than 10 years of experience in business operations, retail, and logistics, where he helped maintain and refine customer service and experience. A lover of fungi and mycoremediation, he is also a fanatical asynchronous, event-driven enthusiast. At Streamdal, he is the main point of contact for onboarding and integrations.