Synthetic Data Sets empower data scientists to train financial AI models faster on more robust, highly realistic, and PII-free data.

IBM
2023 - 2025

Won “Best AI Solution — Data Insights & Knowledge Management” at the FinTech Futures Banking Tech Awards USA

IBM Synthetic Data Sets are a family of virtually generated, privacy-compliant data sets that catalyze enterprise adoption of Large Language Models, Generative AI, and Agentic AI by enabling faster access to the rich data that businesses need for predictive financial solutions. Our team transformed what started as a research idea into a fully realized infrastructure product, from 0 to 1.

Business impact

↑ Proofs of Concept

Increased the number of AI software and hardware (Spyre) POCs across IBM Z, Partner Ecosystem, and Global Sales accounts by providing data that sped up the overall AI model lifecycle, demonstrating product value to customers faster.

User impact

Significantly less time spent on data pre-processing

Data scientists went from spending 6+ months to spending 2 weeks on data pre-processing (cleaning, balancing, labeling), and some started AI model training immediately after downloading the data sets.

Contribution

Design Lead

I led the design strategy, research, and visual design for end-to-end user experiences spanning the AI model lifecycle, with a focus on data pre-processing. I also developed skills in partner ecosystem sales plays, pricing, versioning, Data-aaS, and agent-based modeling techniques.

Digital twin isometric illustration

IBM Blue Studio