Amazon Fraud Detector automates sampling for imbalanced model training datasets

Amazon Fraud Detector is a fully managed service that makes it easy to identify potentially fraudulent online activities, such as the creation of fake accounts or online payment fraud, using customized machine learning (ML) models. To train an ML model, customers provide a dataset that contains examples of legitimate and fraudulent events related to the business activity they want to evaluate for fraud risk. These fraud datasets are often highly imbalanced. For example, a dataset containing one million past transactions may include only 5,000 fraudulent ones, corresponding to a fraud rate of 0.5%. This imbalance in the training data can lower model performance, which results in the customer capturing less fraud. There are a number of common techniques for treating imbalanced datasets, but applying them requires ML expertise, and the best technique often depends on the characteristics of the particular dataset.
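To make the imbalance concrete, the following is a minimal sketch of one common treatment, random undersampling of the majority (legitimate) class. It is an illustration only and does not describe how Amazon Fraud Detector samples data internally. The function names, the `is_fraud` field, and the 10% target rate are all hypothetical choices made for the example.

```python
import random


def fraud_rate(records, label_key="is_fraud"):
    """Return the fraction of records labeled as fraud."""
    return sum(1 for r in records if r[label_key]) / len(records)


def undersample_legitimate(records, target_rate=0.1, label_key="is_fraud", seed=0):
    """Randomly drop legitimate records until fraud makes up about target_rate.

    Hypothetical helper: all fraud records are kept, and only the
    majority (legitimate) class is sampled down.
    """
    rng = random.Random(seed)
    fraud = [r for r in records if r[label_key]]
    legit = [r for r in records if not r[label_key]]

    # Solve fraud / (fraud + keep) = target_rate for keep.
    keep = round(len(fraud) * (1 - target_rate) / target_rate)
    keep = min(keep, len(legit))  # cannot keep more than we have

    sampled = fraud + rng.sample(legit, keep)
    rng.shuffle(sampled)  # avoid all fraud rows sitting at the front
    return sampled


# Scaled-down version of the example above: 10,000 events, 50 fraudulent (0.5%).
events = [{"id": i, "is_fraud": i < 50} for i in range(10_000)]
balanced = undersample_legitimate(events, target_rate=0.1)
```

Undersampling discards legitimate events, so it trades data volume for balance. Whether that trade helps depends on the dataset, which is the difficulty the paragraph above describes.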

Source: Amazon AWS