Training TensorFlow.js models in Data Studio

Community visualizations in Data Studio are built with JavaScript. After dabbling with a D3 visualization, I thought I'd try my hand at integrating TensorFlow.js into Data Studio. My initial motivation was to see whether it could be done at all, and to learn some machine learning concepts along the way. The TensorFlow.js team built some great demos that I used as a starting point.

For privacy reasons, outside communication is completely blocked in community visualizations. All computation has to be done by client-side JavaScript. You can't load external resources, such as a CDN-hosted visualization package, and you can't call remote APIs. Even the UI has to be built in JS, not in HTML. I thought TensorFlow.js might be a non-starter for those reasons, but it turns out that you can train and save your model locally, and then use it for predictions.

To see this in action I shared this report. It uses the Boston Housing data set, where the goal is to predict the median house value. It has only been tested in the desktop version of Chrome and should be viewed as a proof of concept at this stage.


I have to say I really like the potential of using TensorFlow.js in Data Studio.

Having said all this, the tight security policy of community visualizations does mean that you cannot import and load a machine learning model that was trained somewhere else (perhaps in Python or Node.js). Particularly for text and image classification, where training could take a very long time, it would be important to be able to load a pre-trained model and use it only for prediction. Although I haven't tried this out yet, I think it might be possible to use community connectors to overcome this.

Furthermore, to train machine learning models you typically want non-aggregated data, so each row in your Sheet should be one observation. To make each row unique I added an ID column to the data set. Otherwise Data Studio will gladly aggregate rows that share the same dimension values.
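To illustrate why the ID column matters, here is a minimal sketch in plain JavaScript. The `aggregate` function is only a simplified stand-in for dimension-based grouping, not Data Studio's actual implementation, and the column names (`rooms`, `crimeRate`, `medianValue`) and sample values are made up for the example.

```javascript
// Hypothetical sample rows; names and values are illustrative only.
const rows = [
  { rooms: 6, crimeRate: 0.1, medianValue: 24 },
  { rooms: 6, crimeRate: 0.1, medianValue: 24 }, // identical observation
  { rooms: 5, crimeRate: 0.3, medianValue: 18 },
];

// Simplified stand-in for aggregation: rows that share the same dimension
// values collapse into one row, and the metric is summed.
function aggregate(rows, dimensions, metric) {
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify(dimensions.map((d) => row[d]));
    const existing = groups.get(key);
    if (existing) {
      existing[metric] += row[metric];
    } else {
      groups.set(key, { ...row }); // copy so the input rows stay untouched
    }
  }
  return [...groups.values()];
}

// Give every observation a unique ID so that it forms its own group.
function addRowIds(rows) {
  return rows.map((row, index) => ({ id: index + 1, ...row }));
}

const withoutId = aggregate(rows, ['rooms', 'crimeRate'], 'medianValue');
const withId = aggregate(addRowIds(rows), ['id', 'rooms', 'crimeRate'], 'medianValue');

console.log(withoutId.length); // 2: the two identical rows were merged
console.log(withId.length);    // 3: every observation survives
```

Without the ID, the two identical observations are merged and their metric is summed, which would distort the training data. With the ID included as a dimension, all three rows reach the model unchanged.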

I'm glad this proof of concept works, but there's more work to be done. The next step is to validate the approach by using Data Studio and TensorFlow.js in a real-world application.