Training TensorFlow.js models in Data Studio
To see this in action, I have shared this report. It uses the Boston Housing data set, where the goal is to predict the median house value. It has only been tested in the desktop version of Chrome and should be viewed as a proof of concept at this stage.
I have to say I really like the potential of using TensorFlow.js in Data Studio:
- Tight security model. As I said earlier, no data leaves the community visualization, which takes care of many privacy issues if you want to use sensitive data. There is no need to export your data as a CSV file and send it to an external service.
- Having your (training) data in a Google Sheet somehow makes it more approachable to me than a CSV file. I can update and clean data much more easily in Sheets than in a CSV. In Data Studio I can then play around with the features to see whether they change model performance.
- I could see this working well for consultants who want to build a model for a client using test data, without needing access to the client's real data. The client would just connect their own Google Sheet and train on the private data.
- Data Studio would be great for teaching machine learning concepts. Anyone can create beautiful reports and test different parameters, and you get to see how training works live in the browser. Honestly, I feel I understand machine learning much better now.
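To make the "test different parameters and watch training" point a bit more concrete, here is a minimal sketch of what a training loop does on each epoch. It is deliberately not TensorFlow.js: it is a hand-written batch gradient descent for a linear model, and the function name `trainLinear`, the toy data, and the two learning rates are all assumptions made up for illustration.

```typescript
// Fit y ≈ w·x + b with batch gradient descent.
// Returns the mean squared error measured at the start of each epoch.
function trainLinear(
  xs: number[][],
  ys: number[],
  learningRate: number,
  epochs: number
): number[] {
  const n = xs.length;
  const k = xs[0].length;
  const w: number[] = xs[0].map(() => 0); // one weight per feature, starting at 0
  let b = 0;
  const losses: number[] = [];

  for (let epoch = 0; epoch < epochs; epoch++) {
    const gradW: number[] = w.map(() => 0);
    let gradB = 0;
    let loss = 0;

    for (let i = 0; i < n; i++) {
      let pred = b;
      for (let j = 0; j < k; j++) pred += w[j] * xs[i][j];
      const err = pred - ys[i];
      loss += err * err;
      gradB += (2 * err) / n;
      for (let j = 0; j < k; j++) gradW[j] += (2 * err * xs[i][j]) / n;
    }

    losses.push(loss / n);

    // Step against the gradient; the learning rate controls the step size.
    b -= learningRate * gradB;
    for (let j = 0; j < k; j++) w[j] -= learningRate * gradW[j];
  }
  return losses;
}

// Made-up toy data following y = 2x + 1 (not the Boston Housing data).
const toyX: number[][] = [[0], [1], [2], [3]];
const toyY: number[] = [1, 3, 5, 7];

const slowRun = trainLinear(toyX, toyY, 0.01, 200);
const fastRun = trainLinear(toyX, toyY, 0.05, 200);

console.log("final loss, learning rate 0.01:", slowRun[slowRun.length - 1]);
console.log("final loss, learning rate 0.05:", fastRun[fastRun.length - 1]);
```

Plotting the two loss arrays side by side is essentially what a live training chart in a report shows: the same data, a different parameter, and a visibly different curve.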
Having said all this, the tight security policy of community visualizations does mean that you cannot import and load a machine learning model that was trained somewhere else (perhaps in Python or Node.js). Particularly for text and image classification, where model training could take a very long time, it would be important to be able to use a pre-trained model for prediction only rather than training it in the browser. Although I haven't tried this out yet, I think it might actually be possible to use community connectors to overcome this.
Furthermore, to train machine learning models you typically want non-aggregated data, so each row in your Sheet should be one observation. To make each row unique, I added an ID column to the data set; otherwise Data Studio will gladly summarize your data.
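The effect of the ID column can be sketched roughly as follows. This is a simplified stand-in for dimension-based aggregation, not Data Studio's actual engine; the helpers `addIdColumn` and `aggregate`, the column names `rooms` and `medv`, and the three sample rows are all hypothetical.

```typescript
// Hypothetical row shape: numeric values keyed by column name.
type Row = Record<string, number>;

// Add a unique, 1-based "id" value to every row.
function addIdColumn(rows: Row[]): Row[] {
  return rows.map((row, i) => ({ ...row, id: i + 1 }));
}

// Simplified aggregation: rows that share the same dimension values
// are merged into one row and their metric values are summed.
function aggregate(rows: Row[], dimensions: string[], metric: string): Row[] {
  const groups: Record<string, Row> = {};
  for (const row of rows) {
    const key = JSON.stringify(dimensions.map((d) => row[d]));
    if (key in groups) {
      groups[key][metric] += row[metric];
    } else {
      const group: Row = {};
      for (const d of dimensions) group[d] = row[d];
      group[metric] = row[metric];
      groups[key] = group;
    }
  }
  return Object.keys(groups).map((k) => groups[k]);
}

// Made-up observations: two houses happen to have the same number of rooms.
const observations: Row[] = [
  { rooms: 6, medv: 24 },
  { rooms: 6, medv: 22 },
  { rooms: 7, medv: 35 },
];

// Without an ID, the two 6-room houses collapse into a single summed row.
const withoutId = aggregate(observations, ["rooms"], "medv");

// With the ID as an extra dimension, every observation survives unchanged.
const withId = aggregate(addIdColumn(observations), ["id", "rooms"], "medv");

console.log(withoutId.length, withId.length);
```

Once two observations have been merged, the model would be training on a house that never existed, which is why the ID column matters.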
I'm glad this proof of concept works, but there is more work to be done, including validating this approach by using Data Studio and TensorFlow.js in a real-world application.