Synthetic Data Generation Using Combinatorial Testing and Variational Autoencoder
Published
Author(s)
Krishna Khadka, Jaganmohan Chandrasekaran, Yu Lei, Raghu N. Kacker, D. Richard Kuhn
Abstract
Data is a crucial component in machine learning. However, many datasets contain sensitive information such as personally identifiable health and financial data. Access to these datasets must be restricted to avoid potential security concerns. Synthetic data generation addresses this problem by generating artificial data that are similar to, and thus could be used in place of, the original real-world data. This research introduces a synthetic data generation approach called CT-VAE that uses Combinatorial Testing (CT) and Variational Autoencoder (VAE). We first use VAE to learn the distribution of the real-world data and encode it in a latent, lower-dimensional space. Next, we use CT to sample the latent space by generating a t-way set of latent vectors, each of which represents a data point in the latent space. A synthetic dataset is generated from the t-way set by decoding each latent vector in the set. Our experimental evaluation suggests that machine learning models trained with synthetic datasets generated using our approach could achieve performance very similar to that of models trained with real-world datasets. Furthermore, our approach performs better than several state-of-the-art synthetic data generation approaches.
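The sampling step described in the abstract can be illustrated with a minimal sketch for t = 2. This is not the authors' implementation. It assumes each latent dimension is discretized into three levels, uses a simple greedy search to build the pairwise set, and substitutes a fixed linear map for the trained VAE decoder. The names `pairwise_covering_set` and `decode`, the level values, and the weights are all hypothetical.

```python
import itertools
import random


def pairwise_covering_set(levels_per_dim, seed=0, candidates=50):
    """Greedily build a 2-way covering set over discretized latent dimensions.

    Every pair of levels, for every pair of dimensions, appears in at
    least one returned vector.
    """
    rng = random.Random(seed)
    dim_pairs = list(itertools.combinations(range(len(levels_per_dim)), 2))
    # Each uncovered item is (dim_i, level_index_i, dim_j, level_index_j).
    uncovered = {
        (i, a, j, b)
        for i, j in dim_pairs
        for a in range(len(levels_per_dim[i]))
        for b in range(len(levels_per_dim[j]))
    }
    rows = []
    while uncovered:
        pool = sorted(uncovered)  # sorted so sampling is reproducible
        best_row, best_gain = None, -1
        for _ in range(candidates):
            # Force one uncovered pair into the candidate so each row makes progress.
            i, a, j, b = rng.choice(pool)
            row = [rng.randrange(len(levels)) for levels in levels_per_dim]
            row[i], row[j] = a, b
            gain = sum((p, row[p], q, row[q]) in uncovered for p, q in dim_pairs)
            if gain > best_gain:
                best_row, best_gain = row, gain
        rows.append(best_row)
        uncovered -= {(p, best_row[p], q, best_row[q]) for p, q in dim_pairs}
    # Map level indices back to latent coordinate values.
    return [[levels_per_dim[d][idx] for d, idx in enumerate(row)] for row in rows]


def decode(z):
    """Hypothetical stand-in for a trained VAE decoder (fixed linear map)."""
    weights = [[0.5, -0.2, 0.1, 0.0], [0.3, 0.3, -0.4, 0.2]]
    return [sum(w * x for w, x in zip(row, z)) for row in weights]


if __name__ == "__main__":
    # Assumed discretization: three levels per dimension of a 4-D latent space.
    levels = [[-1.5, 0.0, 1.5]] * 4
    latent_vectors = pairwise_covering_set(levels)
    synthetic_rows = [decode(z) for z in latent_vectors]
    print(len(latent_vectors), "latent vectors instead of", 3 ** 4, "for the full grid")
```

The point of the t-way set is that it covers every combination of levels for any t dimensions with far fewer vectors than the full grid. In practice a dedicated covering-array generator and the real decoder would take the place of the greedy search and the linear map used here.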
Proceedings Title
IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW
Conference Dates
April 16-20, 2023
Conference Location
Dublin, IE
Conference Title
12th International Workshop on Combinatorial Testing
Khadka, K., Chandrasekaran, J., Lei, Y., Kacker, R. and Kuhn, D. (2023), Synthetic Data Generation Using Combinatorial Testing and Variational Autoencoder, IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW, Dublin, IE, [online], https://doi.org/10.1109/ICSTW58534.2023.00048, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936332 (Accessed October 14, 2025)