Data Protection

Wed Jun 28 2023

PII Compliance for Event-Driven Architectures


Jon Wright

Event-driven architectures allow businesses to build scalable, resilient, and responsive apps. However, the nature of event-driven apps poses data privacy and compliance challenges. Event-driven systems stream large volumes of data between multiple services, making it difficult to identify and remove personally identifiable information (PII) before it has propagated to storage services, databases, and, potentially, third-party services.

For businesses lacking an effective PII compliance solution, data can be a substantial liability. In this article, we discuss what PII is, common sources of PII compliance failure, and strategies to enforce PII compliance in event-driven architectures.

What is Personally Identifiable Information?

Personally identifiable information (PII) is any data that can be used to directly or indirectly identify an individual. It includes data like names, addresses, phone numbers, email addresses, Social Security numbers, driver’s license numbers, and passport numbers. PII also encompasses sensitive information, such as medical records, financial data, or anything else that, when combined with other data, can be used to establish an individual’s identity.

In the US and Europe, a wide variety of state, national, and international regulations govern PII collection, storage, and processing, including GDPR, HIPAA, FERPA, SOX, CCPA, and others.

PII Compliance and Event-Driven Applications

Event-driven architectures can inadvertently expose PII, creating compliance risks. One way this can happen is if PII is injected into the event stream. Consider a customer support service that emits an event whenever a customer creates a support ticket. The event data includes the customer ID and information about the issue. But when the customer created the ticket, they also pasted in their home address or perhaps their credit card number.

That PII is now in the event stream, which means it may be consumed by a range of related services, stored by them in databases, and perhaps passed on to a data lake or analytics service. There are many routes by which PII could get into the event stream, and once it does, it is likely to propagate to other app components and to third-party services.
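
To make the support-ticket scenario concrete, here is a minimal Python sketch of scanning a free-text field before an event is published. The event shape, the field names, and the `find_pii` helper are assumptions for illustration, and the two patterns shown are nowhere near exhaustive.

```python
import re

# Illustrative patterns only; real detection needs far broader coverage.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0


def find_pii(text: str) -> list[str]:
    """Return labels for the kinds of PII found in a free-text field."""
    found = []
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        found.append("credit_card")
    if EMAIL_RE.search(text):
        found.append("email")
    return found


# Hypothetical event emitted by the support service.
ticket_event = {
    "type": "ticket.created",
    "customer_id": "c-1029",
    "body": "I was charged twice on card 4111 1111 1111 1111, please refund.",
}

labels = find_pii(ticket_event["body"])
```

The Luhn check is there to cut down on false positives from order numbers and other long digit strings. A producer could run a check like this before publishing and decide whether to mask, reject, or flag the event.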

If a business doesn’t properly identify PII, it may not even know it is storing and processing sensitive data. This is particularly problematic when PII is unknowingly managed alongside innocuous data in systems without the high degree of protection PII compliance requires. Furthermore, a business that can’t identify PII can’t delete it when requested (a GDPR and CCPA requirement), nor can it amend the data on request (a HIPAA requirement).

The Challenges of Identifying PII in Event-Driven Applications

As we’ve already mentioned, in event-driven architectures, large volumes of data flow between the components of a decentralized system. PII may be quickly disseminated across the system, making it harder to monitor and track.

Additionally, most event-driven apps depend on append-only, immutable event streams in solutions like RabbitMQ Streams or Kafka. This immutability makes removing PII extremely difficult. Once you have discovered an issue, you might:

  • Update producers and consumers with new PII-handling logic, or fix schema issues
  • Audit the affected data as it flowed downstream, then fix it or remove it entirely
  • Pray you have a source of truth other than what is stored in your topic partitions or logs, and restore your systems to a working state
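
One way to picture the audit-and-fix step is to replay the original log into a new, cleaned log and record which offsets were affected. The sketch below is an assumption about how that could look, not a description of any particular broker's tooling: a Python tuple stands in for the immutable topic, and the `note` field, `scrub`, and `replay_scrubbed` are hypothetical names.

```python
import re
from typing import Iterable, Iterator

# Illustrative pattern: US Social Security numbers written as 123-45-6789.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scrub(record: dict) -> dict:
    """Return a copy of the record with SSNs in the 'note' field masked."""
    cleaned = dict(record)
    if isinstance(cleaned.get("note"), str):
        cleaned["note"] = SSN_RE.sub("[REDACTED]", cleaned["note"])
    return cleaned


def replay_scrubbed(log: Iterable[dict]) -> Iterator[tuple[int, dict, bool]]:
    """Yield (offset, cleaned_record, was_changed) for each record in order."""
    for offset, record in enumerate(log):
        cleaned = scrub(record)
        yield offset, cleaned, cleaned != record


# Stand-in for an immutable topic: a tuple cannot be edited in place.
original_log = (
    {"id": 1, "note": "reset password"},
    {"id": 2, "note": "my SSN is 123-45-6789"},
    {"id": 3, "note": "update shipping address"},
)

clean_log = []         # what a replacement topic would contain
affected_offsets = []  # audit trail of records that held PII
for offset, cleaned, changed in replay_scrubbed(original_log):
    clean_log.append(cleaned)
    if changed:
        affected_offsets.append(offset)
```

The original log is never modified. Consumers would have to be pointed at the cleaned copy, and the list of affected offsets tells you which downstream records need the same treatment.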

PII identification is further complicated because event-driven systems often deal with complex, nested data structures. The data being transmitted can vary greatly in structure and content, both of which are likely to evolve over time, making it difficult to consistently identify and filter PII.
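
Nested payloads are usually handled by walking the structure recursively rather than checking a fixed set of fields. The following is a minimal sketch under our own assumptions: the key names in `SENSITIVE_KEYS`, the `pii_paths` helper, and the sample event are all hypothetical.

```python
import re
from typing import Any, Iterator

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumed key names; a real deployment would maintain its own list.
SENSITIVE_KEYS = {"ssn", "email", "phone", "address"}


def pii_paths(value: Any, path: str = "$") -> Iterator[str]:
    """Yield the path of every field that looks like PII, at any depth."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if str(key).lower() in SENSITIVE_KEYS:
                yield child_path  # flagged by key name alone
            else:
                yield from pii_paths(child, child_path)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from pii_paths(child, f"{path}[{i}]")
    elif isinstance(value, str) and EMAIL_RE.search(value):
        yield path  # flagged by content


event = {
    "order": {
        "id": 77,
        "customer": {"name_hash": "ab12", "email": "ana@example.com"},
        "notes": ["gift wrap", "reach me at ana@example.com"],
    }
}

paths = list(pii_paths(event))
# paths == ["$.order.customer.email", "$.order.notes[1]"]
```

Because the walk does not depend on a schema, it keeps working when a producer adds or moves fields, which is the situation the paragraph above describes.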

On top of this, event-driven applications often handle a large volume of events at high speed. The most effective solution is to identify and react to PII in data streams in real time, but that can be enormously challenging given streaming data’s velocity and complexity.
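
As a rough illustration of reacting in real time, the sketch below checks each event inline as it passes through, masks what it finds, and optionally raises an alert. It is a generic Python generator with hypothetical names (`guard_stream`, `on_detect`, the `message` field) and does not represent any specific product's API.

```python
import re
from typing import Callable, Iterable, Iterator, Optional

# Illustrative pattern: phone numbers written as 555-867-5309 or 555.867.5309.
PHONE_RE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")


def guard_stream(
    events: Iterable[dict],
    on_detect: Optional[Callable[[dict], None]] = None,
) -> Iterator[dict]:
    """Mask phone numbers in each event's 'message' before passing it on."""
    for event in events:
        message = event.get("message")
        if isinstance(message, str):
            masked, hits = PHONE_RE.subn("[PHONE]", message)
            if hits:
                if on_detect is not None:
                    on_detect(event)  # optional alert hook
                # Emit a masked copy; the incoming event is left untouched.
                event = {**event, "message": masked}
        yield event


alerts = []
incoming = iter([
    {"id": 1, "message": "app crashes on login"},
    {"id": 2, "message": "call me at 555-867-5309"},
])
outgoing = list(guard_stream(incoming, on_detect=lambda e: alerts.append(e["id"])))
```

Events are processed one at a time as they arrive, so nothing downstream sees the unmasked value. The cost is that every pattern you add runs on every event, which is where the velocity problem shows up.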

PII Detection with Streamdal

The most critical PII compliance measure for event-driven apps is data classification and PII discovery. The ability to quickly and accurately identify personal information within your system is the foundation of any effective PII compliance strategy. If you can’t reliably discover PII, other compliance measures are bound to be ineffective.

Streamdal’s streaming data monitoring solution simplifies PII compliance for apps designed around an event-driven architecture. Our anomaly detection engine can detect PII in your data streams and lets you run custom functions (to edit, transform, obfuscate, remove, etc.), with optional alerting, so you can quickly respond to potential compliance risks.

Get started with Streamdal today, or book a demo to learn more.

Jon Wright

Solutions Engineer

Jon is the Solutions Engineer at Streamdal. He has over 10 years of experience in business operations, retail, and logistics, where he helped maintain and refine customer service and experience. He is a lover of fungi and mycoremediation, and a fanatical enthusiast of asynchronous, event-driven systems. At Streamdal, he is the main point of contact for onboarding and integrations.
