Math 231 Intro to Data Science

<snip> see: Gradient-Boosted Decision Trees

When building out their deep learning model, they considered combining their decision tree with a deep neural network (DNN). The process of combining two models into a super-model is called “ensembling”. After testing, however, they didn’t see meaningful improvements over the raw decision tree, so they dropped the combined approach.
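
As I understand it, the simplest form of ensembling is just blending the two models’ scores. Here’s a toy sketch of that idea (the models, the `.predict()` interface, and the 50/50 weight are all stand-ins I made up, not what they actually did):

```python
def ensemble_score(item_features, gbdt_model, dnn_model, weight=0.5):
    """Blend two models' relevance scores into one 'super-model' score.

    `gbdt_model` and `dnn_model` are stand-ins for anything with a
    .predict() method; the 50/50 weight is arbitrary and would normally
    be tuned on validation data.
    """
    gbdt_score = gbdt_model.predict(item_features)
    dnn_score = dnn_model.predict(item_features)
    return weight * gbdt_score + (1 - weight) * dnn_score
```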

Dropping the ensemble turned out to be a win anyway, because DNNs are much faster to train than gradient-boosted decision trees (GBDT): training time dropped to roughly 1/8th of what it had been, thanks to the use of GPUs. On top of that, runtime latency is lower with a DNN-only model.

To start the migration, they brought over many of their engineered features to the new model (listing price, click rate, etc.), then they had to normalize them. Normalizing means rescaling features so they all sit on a comparable scale. I believe this lets the model learn faster because it isn’t jumping all over the number line, but I’m not certain.
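
Here’s a minimal sketch of what I think normalization looks like in practice, using z-score scaling (the feature values below are made up for illustration):

```python
import numpy as np

# Hypothetical raw features on very different scales:
# listing price in dollars, click rate as a fraction.
features = np.array([
    [120.0, 0.031],
    [450.0, 0.104],
    [89.0,  0.007],
    [300.0, 0.056],
])

# Z-score normalization: subtract each column's mean and divide by its
# standard deviation, so every feature ends up centered near 0 with
# roughly unit spread.
mean = features.mean(axis=0)
std = features.std(axis=0)
normalized = (features - mean) / std

print(normalized)
```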

Not all of their features were numbers. They also had things like text-based features they needed to incorporate. To solve this, they created “custom embeddings”, which appear to be a way to turn words into vectors of numbers the model can use. They noted that they had to build these themselves because most off-the-shelf embeddings were trained on prose, not item data for an ecommerce website.
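
My rough mental model of an embedding is a learned lookup table from token (or item) IDs to vectors. A toy sketch of that idea (the vocabulary, dimensions, and averaging step are all made up for illustration, not their actual approach):

```python
import numpy as np

# Toy vocabulary of item-related tokens. In practice this would be built
# from the site's own item data, which is why prose-trained embeddings
# didn't fit their case.
vocab = {"vintage": 0, "lamp": 1, "handmade": 2, "ceramic": 3}
embedding_dim = 4

# The embedding table starts random and would be learned during training.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

def embed(tokens):
    """Turn a list of tokens into one vector by averaging their embeddings."""
    ids = [vocab[t] for t in tokens if t in vocab]
    return embedding_table[ids].mean(axis=0)

print(embed(["vintage", "ceramic", "lamp"]))
```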

As they began to roll the new model out, they ran into a bunch of common software engineering issues, like migrating a live system. Ultimately, the DNN approach still feels a bit like a black box to me, but I understand the GBDT case better now and have some interesting areas to look into going forward.

Some remaining open questions:

  1. Why are search results ranked in multiple passes? Is there a point where adding more passes stops improving things? Is it slower to use more passes, and by how much?
  2. How is search-ranking quality measured? Normalized discounted cumulative gain (NDCG) seems to be a common way of doing it, but I don’t know anything about it yet (I sketched my rough understanding after this list).
  3. How does custom embedding work? That seems like magic.
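
From what I can tell so far, NDCG compares the discounted gain of the ranking you actually produced against the gain of the ideal ordering. Here’s a minimal sketch of my understanding (the relevance labels at the end are made up):

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain: relevant items count more near the top."""
    relevances = np.asarray(relevances, dtype=float)
    positions = np.arange(1, len(relevances) + 1)
    return np.sum((2 ** relevances - 1) / np.log2(positions + 1))

def ndcg(relevances):
    """Normalize by the DCG of the ideal (perfectly sorted) ordering."""
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# Made-up relevance labels for the top 5 results, in the order they were shown:
print(ndcg([3, 2, 3, 0, 1]))
```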