June 30th
San Giobbe Economic Campus, Cannaregio 873, Venice, Room 9C
09:00-09:10 | Welcome, introductions and opening remarks |
09:10-11:00 | Session 1. Chair: Frank Pallas |
- Engineering Privacy Engineering. Invited talk by Jasper Enderman, former Privacy Engineering Manager at Vinted and Anonymisation SME at AstraZeneca. I believe that most privacy engineers, whether in industry or academia, join our field because of a strong belief in privacy as a human right. To better protect these rights, collaboration within and between industry and academia is essential; yet one of the key issues in privacy engineering is the lack of information flow from industry into academia and vice versa. These silos threaten the field by fostering parallel vocabularies and methodologies, presenting a significant hurdle to knowledge transfer. In addition, privacy engineering's recency in industry leaves it with a weaker identity compared to older engineering fields. IWPE has historically been a venue for exactly these exchanges, and the workshop has a history of reaching out to industry experts for lessons learned and of facilitating closer collaboration. In this talk, I will draw on my years of experience with privacy engineering across different industry sectors, discussing the issues faced and what we need from leadership in the privacy engineering field. I will also cover what companies can do to strengthen privacy engineering and build better teams, and offer suggestions for academia, with the aim of strengthening academic-industry collaboration. I hope to leave participants with a stronger sense of the identity of privacy engineering as a field.
- Identification of Compositional Risks in Data Protection Impact Assessments and Beyond, by Henrik Graßhoff, Meiko Jensen, Malte Hansen, and Nils Gruschka. When personal data is processed in a distributed service composition, privacy risks may emerge solely from the choice of data processors included in the composition. For instance, different sub-processors may unknowingly rely on the same cloud provider, allowing for unintended linkability of personal data between those sub-processors at that cloud provider. As such compositional risks to privacy are beyond the scope of each individual risk assessment, they are likely to be overlooked when performing a data protection impact assessment. In this paper, we propose a novel protocol to detect and manage such compositional risks. Following an initial problem definition and requirements elicitation, we elaborate on how our protocol identifies candidates for compositional risks and how this information can improve the results of a data protection impact assessment over service compositions involving multiple data processors (a minimal sketch of the underlying overlap check follows this session block).
- From Cluttered to Clustered: Addressing Privacy Threat Explosion through Automated Clustering, by Jonah Bellemans, Mario Raciti, Dimitri Van Landuyt, Laurens Sion, Giampaolo Bella, Lieven Desmet and Wouter Joosen. Threat modeling methods and tools that involve the systematic and structured elicitation of privacy threats often produce extensive threat lists, especially when applied to complex systems composed of many elements, a phenomenon described as 'threat explosion'. Grouping and merging these lists of fine-grained threats is costly and time-consuming, yet often desired in support of threat management (e.g., follow-up, prioritization, and mitigation). In this paper, we explore different strategies for automatically clustering threats on the basis of similarity or other relations. In particular, we propose three overall strategies for automated clustering: (1) location-based, where threats are clustered by their location in the given system; (2) threat-type-based, where threats are grouped if they are of the same type; and (3) description-based, where threats are clustered by similarity of their name or textual description (see the sketch after this session block). In addition, we define a number of strategies that combine the location- and threat-type-based approaches. We then perform an exploratory experiment using DP-3T, a COVID-19 contact tracing protocol deployed extensively during the pandemic, as a case study. In this experiment, we apply the automated clustering strategies to a list of 358 generated LINDDUN threats, of which 125 were manually established as relevant and documented; manual clustering and grouping of these threats yielded 43 clusters. We compare the automated strategies by how well they approximate this expert clustering. Our results indicate that the clustering strategies, and especially description-based techniques that apply NLP, produce clusters similar to the manually established expert baseline. This study underscores the promise of automated techniques in reducing the manual effort required in the overall management of threats after the initial elicitation phase.
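To make the compositional-risk idea from Graßhoff et al. concrete, here is a minimal sketch assuming a toy data model in which each sub-processor declares its infrastructure providers. The protocol in the paper is more involved; the data model and all names below are hypothetical.

```python
# A minimal sketch of the compositional-risk idea: flag pairs of
# sub-processors in a composition that rely on the same infrastructure
# provider, since a shared provider enables unintended linkability.
# The composition and provider names are illustrative, not from the paper.
from itertools import combinations

# Hypothetical composition: each sub-processor and its declared providers.
composition = {
    "analytics-svc": {"CloudCo", "CDNCorp"},
    "billing-svc": {"CloudCo"},
    "mailing-svc": {"MailHost"},
}

def compositional_risk_candidates(composition):
    """Yield (processor_a, processor_b, shared_providers) triples."""
    for (a, providers_a), (b, providers_b) in combinations(composition.items(), 2):
        shared = providers_a & providers_b
        if shared:
            yield a, b, shared

for a, b, shared in compositional_risk_candidates(composition):
    print(f"DPIA follow-up: {a} and {b} both rely on {', '.join(shared)}")
```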
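For the description-based strategy of Bellemans et al., a minimal sketch of clustering threat descriptions by textual similarity, assuming TF-IDF embeddings and agglomerative clustering from scikit-learn (>= 1.2). The paper's actual pipeline and thresholds may differ, and the threat texts below are invented.

```python
# Hypothetical illustration of description-based threat clustering:
# embed each threat description and merge descriptions that are close
# in cosine distance. Not the authors' implementation.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

threats = [  # invented LINDDUN-style threat descriptions
    "Linkability of user identifiers across uploads",
    "Linkability of pseudonymous identifiers between servers",
    "Disclosure of infection status to eavesdroppers",
    "Detectability of app usage from network traffic",
]

# Embed each threat description as a TF-IDF vector.
vectors = TfidfVectorizer().fit_transform(threats).toarray()

# Merge descriptions whose cosine distance falls below a chosen threshold;
# n_clusters=None lets the threshold determine the number of clusters.
labels = AgglomerativeClustering(
    n_clusters=None, metric="cosine", linkage="average", distance_threshold=0.8
).fit_predict(vectors)

for label, text in sorted(zip(labels, threats)):
    print(label, text)
```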
11:00-11:30 | Coffee Break |
11:30-13:00 | Session 2. Chair: Meiko Jensen |
- Right Here, Right Now: User Perceptions of In-Place Contextual Privacy Options, by Florian Dehling, Jan Tolsdorf and Luigi Lo Iacono. In modern online services and apps, legally required privacy information and controls are often placed on separate pages or menus, forcing users to leave their primary tasks to review privacy statements or adjust settings. This disrupts usability by causing unnecessary friction, context switching, and information overload. We propose In-Place Contextual Privacy Options (IPCPOs), a Transparency-Enhancing Technology (TET) that integrates relevant privacy controls directly into the user's workflow. IPCPOs tailor privacy information and settings to the immediate context, reducing the controls and information shown to what that context requires. In a study with 442 participants in an e-commerce setting, we found that IPCPOs should prioritize information on personal data types, processing purposes, and data recipients, alongside offering privacy controls. While IPCPOs score high on perceived transparency, only perceived control and privacy concerns significantly drive adoption intention. This work demonstrates how IPCPOs help comply with data protection obligations while reducing usability burdens.
- Explaining and Visualizing Synthetic Data Quality Using Statistical Distances, by Juko Yamamoto, Takayuki Miura, Rina Okada, Masanobu Kii and Atsunori Ichikawa. The quality of synthetic data is crucial for ensuring its usability and reliability in various applications, yet evaluating its utility remains a challenge. Statistical distances quantify deviations between synthetic and real data, but their effectiveness varies. We analyze eight statistical distance measures for categorical attributes through empirical evaluation. To better interpret these differences, we introduce the concept of a quality explanation distribution, which provides a structured probabilistic view of how synthetic data deviates under a given statistical distance constraint. A gradient-based optimization approach is implemented to explore these distributions, revealing trends specific to each statistical distance. Our experiments, conducted on the Adult dataset, show that Kullback–Leibler divergence (KLD) emphasizes low-frequency attributes more than Total Variation Distance (TVD) and L2 distance, making it preferable for analyzing rare categories, whereas TVD and L2 better capture overall distributional trends. Additionally, we observe that in high-dimensional distributions, the L∞ distance is less effective at capturing distributional characteristics. These findings emphasize the importance of selecting appropriate statistical distances and show that quality explanation distributions improve interpretability (a worked comparison of these distances follows this session block).
- Unlocking the DMA's Potential: User-Centric AI Assistants Built on Privacy Engineering Principles. Industry talk by Tilman Herbrich (CIPP/E), lawyer and partner in the Data & Technology department at Spirit Legal. This session explores how individuals can harness their right to data portability under the Digital Markets Act (DMA) to build fully privacy-compliant personal AI assistants. In a cooperation between the law firm Spirit Legal and the startup privma.eu ("Privacy made in Europe"), we demonstrate how user data retrieved via real-time APIs from gatekeepers such as Google, Meta, Amazon, Apple, and TikTok can be lawfully aggregated and deployed to create hyper-personalized assistants that deliver results superior to those of existing models. By applying the principles of privacy by design and by default, the AI assistant ensures full alignment with the GDPR, EDPB guidance, and the positions of national supervisory authorities. This approach empowers users to reclaim control over their data and to rectify inaccurate inferences, responding to ethical concerns raised by researchers such as Prof. Dr. Michal Kosinski since 2013. Legally sound and technically innovative, our framework demonstrates that the DMA, GDPR, and DSA are not merely regulatory hurdles, but catalysts for competitive, trustworthy AI services rooted in transparency, accountability, and user empowerment.
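To illustrate the distance comparison in Yamamoto et al., a worked sketch on two hypothetical categorical distributions: the same absolute perturbation is applied once to a rare category and once to a common one. This is not the paper's code; the distributions are invented.

```python
# Compare KLD, TVD, L2, and L-infinity on two equal-sized perturbations
# of a categorical distribution: one hitting a rare category, one hitting
# common categories. The distributions are hypothetical.
import numpy as np

real = np.array([0.70, 0.20, 0.09, 0.01])
shift_rare = np.array([0.70, 0.20, 0.095, 0.005])    # rare category halved
shift_common = np.array([0.695, 0.205, 0.09, 0.01])  # common categories nudged

def kld(p, q):
    """Kullback-Leibler divergence D(p || q), for strictly positive q."""
    return float(np.sum(p * np.log(p / q)))

def tvd(p, q):
    """Total variation distance: half the L1 distance."""
    return 0.5 * float(np.abs(p - q).sum())

for name, synth in [("rare shift", shift_rare), ("common shift", shift_common)]:
    print(f"{name}: KLD={kld(real, synth):.5f}  TVD={tvd(real, synth):.5f}  "
          f"L2={np.linalg.norm(real - synth):.5f}  "
          f"Linf={np.abs(real - synth).max():.5f}")
```

Running this prints identical TVD, L2, and L∞ values for both cases, while the KLD of the rare-category shift is more than an order of magnitude larger, consistent with the paper's observation that KLD emphasizes low-frequency attributes.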
13:00-14:00 | Lunch |
14:00-15:30 | Session 3. Chair: David Rodriguez Torrado |
- Formguard: Continuous Privacy Testing for Websites Using Automated Interaction Replay, by Tim Vlummens and Gunes Acar. Websites commonly use third-party scripts for purposes such as advertising, analytics and payment processing. In recent years, several popular third-party scripts fell victim to supply-chain attacks in which users' login credentials and credit card details were stolen. These devastating attacks sometimes remain hidden for several weeks before they are discovered. In this paper, we present Formguard, a continuous testing tool that detects web-based supply-chain attacks in an automated manner. Formguard allows website owners to record complex interactions, such as logging in, signing up or checking out a product, on their websites. These recordings can then be replayed periodically while monitoring HTTP requests and WebSocket messages, accesses to input fields, and information on the embedded scripts (see the sketch after this session block). The periodic, automated testing allows faster detection of supply-chain attacks and potential compliance issues that are impossible to detect with non-interactive security scanners. While Formguard specializes in detecting digital skimming attacks, it can also perform various privacy tests against different aspects of a website, including embedded scripts, HTTP headers and cookies. We evaluate Formguard through two case studies. First, a long-term robustness test on 75 websites shows that even complex recordings remain replayable for several months, suggesting a minimal maintenance workload for website owners. Second, we use Formguard's crawl mode to study access to, and exfiltration from, login and registration forms on 100,000 websites, revealing third-party script access to password fields on over 10K sites. Finally, we discuss the challenges of automated testing for modern web forms, providing insights that may benefit researchers and practitioners.
- PILLAR: LINDDUN Privacy Threat Modeling using LLMs, by Majid Mollaeefar, Andrea Bissoli, Dimitri Van Landuyt and Silvio Ranise. The rapid evolution of Large Language Models (LLMs) has unlocked new possibilities for applying artificial intelligence across a wide range of fields, including privacy engineering. As modern applications increasingly handle sensitive user data, safeguarding privacy has become more critical than ever. To ensure robust data protection, potential threats must be identified and addressed early in the development process. Privacy threat modeling frameworks like LINDDUN offer structured approaches for uncovering these risks, yet they often require significant manual effort, expert knowledge, and detailed system information, making the process time-intensive and reliant on thorough analysis. To address these challenges, we introduce PILLAR (Privacy risk Identification with LINDDUN and LLM Analysis Report), a new tool that implements and automates the LINDDUN framework through LLM integration to streamline and enhance privacy threat modeling. PILLAR automates key parts of the LINDDUN process, such as generating DFDs from unstructured textual inputs (e.g., system descriptions), eliciting privacy threats, and prioritizing threats by risk. By leveraging the capabilities of LLMs, PILLAR can take natural language descriptions of systems and transform them into comprehensive threat models with limited input from users. Furthermore, PILLAR can simulate multi-agent collaboration, allowing different LLM instances to play different contributor roles in a virtual threat modeling workshop. Rather than merely reducing the workload on analysts, PILLAR shifts their involvement from repetitive, tedious tasks to more meaningful and impactful interventions, such as refining the scope of analysis or completing critical components like the DFD. This allows experts to focus on the aspects that truly matter for a robust threat modeling process while enhancing both efficiency and accuracy.
- Automating Data Subject Request email ingestion with LLMs. Industry talk by Stefano Bennati, Responsible AI Engineering team lead at HERE Technologies. GDPR Article 12 grants data subjects rights (DSRs) over data that belongs to them, for example the right to access or delete data stored by a service provider. The standard means of sending DSR requests is email, which is easy for a data subject to write but difficult for a service provider to process. This talk presents preliminary results on using LLMs to extract relevant information from DSR emails as input to an automated DSR-processing tool (a hypothetical extraction sketch follows this session block).
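To give a flavor of the interaction-replay monitoring that Formguard automates, a minimal sketch using Playwright: replay a scripted login while logging every outgoing request and flagging hosts outside a first-party allowlist. This is not Formguard's implementation; the URL, selectors, credentials, and allowlist are hypothetical.

```python
# Replay a login flow in a headless browser and log all network requests,
# then flag requests to hosts outside a first-party allowlist. A generic
# sketch of the monitoring idea, not Formguard itself.
from urllib.parse import urlparse
from playwright.sync_api import sync_playwright

FIRST_PARTY = {"example.com", "www.example.com"}  # hypothetical allowlist

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    seen = []
    page.on("request", lambda req: seen.append(req.url))  # record every request

    # Hypothetical recorded interaction: fill and submit a login form.
    page.goto("https://example.com/login")
    page.fill("#username", "test-user")
    page.fill("#password", "test-password")
    page.click("button[type=submit]")
    page.wait_for_load_state("networkidle")
    browser.close()

for url in seen:
    host = urlparse(url).hostname or ""
    if host not in FIRST_PARTY:
        print("third-party request during login:", url)
```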
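For the DSR-ingestion talk, a hypothetical sketch of the extraction step: prompt an LLM to return structured JSON from a free-form DSR email. The schema, prompt, and model choice are assumptions, not HERE Technologies' pipeline.

```python
# Extract structured DSR fields from a free-form email via an LLM.
# Prompt, JSON schema, and model name are illustrative assumptions.
import json
from openai import OpenAI

PROMPT = """Extract from the email below, as JSON with keys
"requester_email", "request_type" (one of: access, deletion, rectification),
and "services_mentioned" (list of strings). Email:
---
{email}
---"""

def parse_dsr_email(email_text: str) -> dict:
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": PROMPT.format(email=email_text)}],
        response_format={"type": "json_object"},  # constrain output to JSON
    )
    return json.loads(resp.choices[0].message.content)

print(parse_dsr_email(
    "Hello, please delete all location history tied to jane@example.org. Thanks!"
))
```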
15:30-15:45 | Wrap-Up and Award Ceremony |