Partial dependence plot

Partial dependence plots are one way to find out how a single feature influences predictions. After the model is trained, we select one feature and, going row by row, change that feature in every sample to the 10th, 20th, ... percentile of the feature's distribution. For a single sample, we get a line plot showing how the feature influences the prediction for that sample. For multiple samples, we average the lines and get the partial dependence plot.
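A minimal sketch of this procedure (toy data and the scikit-learn model are my own assumptions; scikit-learn also ships this as sklearn.inspection.partial_dependence and PartialDependenceDisplay):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy data and model; any fitted model with .predict() works the same way.
X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

feature = 0  # the feature whose influence we want to inspect
grid = np.percentile(X[:, feature], np.arange(10, 100, 10))  # 10th..90th

# For every grid value, overwrite the feature in all rows and predict.
# Each row traces one line (an ICE curve); their average is the PDP.
ice = np.empty((len(X), len(grid)))
for j, value in enumerate(grid):
    X_mod = X.copy()
    X_mod[:, feature] = value
    ice[:, j] = model.predict(X_mod)

pdp = ice.mean(axis=0)  # partial dependence: average over samples
print(np.column_stack([grid, pdp]).round(2))
```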

What is ML used for

  • Autonomous driving - Google and Tesla use different sensors; lidar is blind in snow and rain; sensors can get muddy.
  • Detecting the age and gender of customers in a shop - problems arise when a camera stops working and is replaced by a new one
  • Detecting heart dysfunctions - Fourier features and spectrograms allegedly do not work well, which is surprising
  • Reading documents
  • Finding the edges of citizen IDs and reading them
  • Mutation maker - finding genetic mutations for bacteria to produce desired proteins
  • Reaction pathfinder
  • Anomaly detection for metrics - instead of writing the rules manually
  • Searching for colleagues with the expertise you need

Disease trajectories

If we record the sequence in which a person gets diseases, we get their disease trajectory. By collecting data from many people, some disease trajectories turn out to be statistically significant. We can then study how drugs affect these significant disease trajectories - e.g. trying to cure the initial diseases before they develop into worse ones - and maybe some disease trajectories are caused by certain genetic dispositions, so we could develop cures specifically for people with those genes.
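A toy sketch of the counting step, assuming we already have time-ordered diagnosis sequences per person (the significance testing itself is only hinted at in a comment):

```python
from collections import Counter

# Hypothetical input: one time-ordered diagnosis sequence per person.
trajectories = [
    ["diabetes", "hypertension", "kidney disease"],
    ["diabetes", "hypertension", "stroke"],
    ["asthma", "hypertension"],
]

# Count how often each consecutive disease pair occurs across people.
pair_counts = Counter(
    (a, b) for seq in trajectories for a, b in zip(seq, seq[1:])
)
print(pair_counts.most_common(3))
# A real study would test frequent pairs/paths for statistical significance,
# e.g. against a baseline from how common each disease is on its own.
```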

Heart dysfunctions

There is a conference on ML in cardiology, and an ML challenge connected with it.

Invariant risk minimization

An example with colored MNIST: e.g. a zero is always red in the training dataset but always green in the test dataset. The distribution is not the same for the training and test datasets, and models tend to fail, because the easiest thing for them to learn is that a zero is always red instead of learning its shape.
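A minimal numeric stand-in for this failure mode, with a noisy but invariant "shape" feature and a spurious "color" feature that is perfectly predictive in training and flipped in test (all numbers are my own toy assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_env(n, color_matches_label):
    """One environment: 'shape' is noisy but stable, 'color' is a shortcut."""
    y = rng.integers(0, 2, n)
    shape = y + rng.normal(0, 1.0, n)            # weak but invariant signal
    color = y if color_matches_label else 1 - y  # spurious signal
    return np.column_stack([shape, color]), y

X_tr, y_tr = make_env(2000, color_matches_label=True)   # "zeros are red"
X_te, y_te = make_env(2000, color_matches_label=False)  # "zeros are green"

model = LogisticRegression().fit(X_tr, y_tr)
print("train acc:", model.score(X_tr, y_tr))  # ~1.0: the model latched onto color
print("test acc: ", model.score(X_te, y_te))  # far below 0.5: the shortcut flips
```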

Other keywords to note down for this problem: risk-averse reinforcement learning, invariant risk minimization, distributionally robust optimization.

Searching for experts

McKinsey is a big company, and they had a problem: since nobody knows everyone, it was hard to find the right expert or consultant for a given problem. So they let people fill in their profiles and created an ML-based recommender that parsed the profiles and recommended relevant experts. They also mentioned that at first the recommender kept surfacing the "gods of McKinsey" - people who have worked at the company for a long time and participated in a lot of projects. To avoid recommending only these people over and over again, they introduced exponential decay, so that their old projects barely count. There is also a distinction between junior and senior employees; each is compared only within the relevant group. The tool is not meant for staffing, but for recommending experts.
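A sketch of how such exponential decay could look (the scoring formula, the half-life, and all numbers are my assumptions, not McKinsey's):

```python
import math

# Hypothetical scoring: each expert has past projects with a relevance to the
# query (0..1) and an age in years; exponential decay downweights old projects
# so that long-tenured people do not dominate the recommendations forever.
HALF_LIFE_YEARS = 2.0  # assumed; the talk did not give the actual constant
decay_rate = math.log(2) / HALF_LIFE_YEARS

def expert_score(projects):
    """projects: list of (relevance, age_in_years) tuples."""
    return sum(rel * math.exp(-decay_rate * age) for rel, age in projects)

veteran = [(0.9, 8.0), (0.8, 10.0), (0.9, 12.0)]  # many, but old projects
junior = [(0.8, 0.5), (0.7, 1.0)]                 # few, but recent
print(expert_score(veteran), expert_score(junior))  # the junior now wins
```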

Private federated learning

  • Federated learning - keep a model on the server, send it to devices, let the devices train on local data, collect their updates, periodically update the server model, and resend it to the devices
  • Differential privacy - every device adds random noise to its update before sending it to the server. Because the noise is sampled from a distribution centered around zero, it cancels out when the server aggregates updates from many devices, so the central model effectively learns without noise. If somebody intercepted and decrypted the traffic, they would learn nothing, because the data is sent with the noise added (a toy simulation follows this list).
  • Private federated learning - federated learning using differential privacy
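A toy simulation of the noise-cancelling argument (the numbers and the Gaussian noise scale are my own assumptions; real deployments also clip updates and track a privacy budget, which this sketch ignores):

```python
import numpy as np

rng = np.random.default_rng(0)
n_devices, dim, sigma = 1000, 5, 1.0

# Hypothetical per-device model updates (e.g. gradients) computed locally.
true_updates = rng.normal(0.1, 0.05, size=(n_devices, dim))

# Each device adds zero-mean Gaussian noise before sending its update.
noisy_updates = true_updates + rng.normal(0, sigma, size=(n_devices, dim))

# A single intercepted update is dominated by noise...
print("one noisy update:", noisy_updates[0].round(2))
# ...but the server's average is close to the true average, because the
# zero-mean noise cancels out across devices (error ~ sigma / sqrt(n)).
print("true mean:  ", true_updates.mean(axis=0).round(3))
print("server mean:", noisy_updates.mean(axis=0).round(3))
```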

Sidenote - I actually did not realize that face recognition runs on the device; usually ML models run on the server, not on the client side. Moving the model to the device grants more privacy to customers. On the other hand, the model (parameters and intellectual property) is better protected on the server.

Additional notes

  • It is possible to restrict interactions between groups of features in LightGBM (see the sketch after this list).
  • Open-sourced tool developed at MSD for protein engineering
  • Open-sourced tool developed at MSD for biosynthetic gene cluster detection and classification
  • Book recommendation on investing and trading: Options, Futures, and Other Derivatives by John Hull
  • Banzhaf values - similar to Shapley values
  • Kolmogorov complexity - the length of the shortest program that generates the pattern
  • Elementary cellular automata - divided into Wolfram's classes
  • Hava Siegelmann's work on out-of-domain generalization
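Regarding the LightGBM note above: a sketch using the interaction_constraints parameter, which takes groups of feature indices that are allowed to interact with each other (the parameter exists in recent LightGBM versions; the data and grouping here are my own toy assumptions):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.random((500, 4))
y = X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(0, 0.1, 500)

# Features 0 and 1 may interact with each other, and 2 with 3, but no tree
# path is allowed to mix features across the two groups.
model = lgb.LGBMRegressor(interaction_constraints=[[0, 1], [2, 3]])
model.fit(X, y)
```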