The product catalog is a strategic asset for Amazon. It powers unrivaled product discovery, informs customer buying decisions, offers a huge selection across a large number of categories, and positions Amazon as the first stop for shopping online. As an applied scientist in ASCS, you will help us make the world’s best product catalog even better and improve the experience for millions of customers.
You will analyze how information within Amazon’s catalog affects our customers and help devise short-term and long-term strategy for expanding and enhancing the catalog. You will have the opportunity to design new data analytical workflows at a scale rarely available elsewhere, utilizing state-of-the-art data science and machine learning tools such as Spark, Python, and Theano and Amazon’s cloud computing technologies such as Elastic MapReduce (EMR), Kinesis, and Redshift.
You will apply your knowledge of data science by creating algorithmic solutions that combine techniques like clustering, pattern mining, predictive modeling, deep learning, statistical testing, information retrieval, and natural language processing, and by applying them to huge amounts of data describing the products in the catalog and the customer interactions. You will evaluate your solutions with scientific rigor and provide input to business strategy and technical direction. You will collaborate with software engineering teams to integrate your algorithmic solutions into large-scale, highly complex Amazon production systems.
You will encounter many challenges, including
- scale (build models for billions of products in the catalog utilizing trillions of customer interactions),
- accuracy (extreme requirements for precision or recall due to the impact of getting it wrong, e.g., extremely high precision for merging identical products or extremely high recall for identifying hazardous materials),
- speed (generate predictions for millions of new or changed products with low latency),
- diversity (products need to be classified into hundreds of thousands of categories across 16 languages),
- high dimensionality (feature engineering and selection using thousands of structured product features with millions of values, unstructured product data, product images, customer searches, clicks, reviews, etc.), and
- noise (build models robust to varying quality of data provided by millions of sellers and labels derived implicitly or collected from humans).
You will need to be creative and go far beyond textbook solutions to deal with these challenges. Meeting the business requirements will involve combining several different machine learning algorithms with domain knowledge into complex data analytical workflows that automate what can be automated and efficiently utilize experts when needed to mitigate risk.
You will help us to
- Identify which product information matters most to our customers.
- Extract product information from unstructured data to augment the catalog.
- Consolidate spelling variations of product attributes like brand, size, or color.
- Automatically classify new products with low latency into our highly detailed product categories.
- Improve the categorization of our huge catalog with minimal manual effort.
- Identify identical products and similar products, such as different sizes and colors of the same shoe.
- Identify restricted products such as hazardous materials.
- Estimate the financial impact of catalog improvements.
We use the resulting models to automatically improve the catalog when possible, and design efficient machine-assisted workflows to allow expert review for high-impact decisions. Your solutions will directly impact the customer experience by making products discoverable and presenting them in the right place, with complete and accurate product information to enable informed purchase decisions.