Training TensorFlow.js models in Data Studio

Training models and predicting data using TensorFlow.js in Data Studio

Community visualizations in Data Studio are built with JavaScript, and after dabbling in building a visualization with D3, I thought I’d try my hand at integrating TensorFlow.js into Data Studio. My initial motivation was to see whether it could be done at all, and to learn some machine learning concepts along the way. The TensorFlow.js team built some great demos that I used as a starting point.

For privacy reasons, outside communications are completely blocked off in community visualizations. All work and computation has to be done by client-side JavaScript. You can’t load external resources such as a CDN-hosted visualization package, and you can’t call remote APIs. Even the UI has to be built in JS, not in HTML. I thought TensorFlow.js might be a non-starter for those reasons, but it turns out that you can train and save your model locally, and then use it for predictions.
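The train, save, predict flow described above might look roughly like the sketch below. It is not the code behind the report. It assumes `tf` is the TensorFlow.js namespace bundled into the visualization's own script (since it can't come from a CDN), that a single dense unit (plain linear regression) is enough for illustration, and that the visualization's iframe is allowed to use browser storage through the `localstorage://` scheme. The storage key `housing-model`, the learning rate and the epoch count are made-up values.

```javascript
// Sketch only. `tf` is passed in and is assumed to be the TensorFlow.js
// namespace bundled with the visualization. The storage key is hypothetical,
// and whether the viz sandbox permits localStorage is an untested assumption.
const MODEL_URL = 'localstorage://housing-model';

// features: array of numeric rows (assumed already scaled to similar ranges).
// labels: one target value per row.
async function trainAndSave(tf, features, labels, epochs = 50) {
  const model = tf.sequential();
  // One dense unit with no activation is ordinary linear regression.
  model.add(tf.layers.dense({ inputShape: [features[0].length], units: 1 }));
  model.compile({ optimizer: tf.train.sgd(0.01), loss: 'meanSquaredError' });

  const xs = tf.tensor2d(features);
  const ys = tf.tensor2d(labels.map((y) => [y]));
  const history = await model.fit(xs, ys, { epochs, batchSize: 32 });
  xs.dispose();
  ys.dispose();

  // Persist the trained model in browser storage; no network request is made.
  await model.save(MODEL_URL);
  return history;
}

// Reload the stored model and predict a single value for one feature row.
async function loadAndPredict(tf, featureRow) {
  const model = await tf.loadLayersModel(MODEL_URL);
  const input = tf.tensor2d([featureRow]);
  const output = model.predict(input);
  const [value] = await output.data();
  input.dispose();
  output.dispose();
  return value;
}
```

Passing `tf` in as an argument keeps the sketch free of any import statement, which matches the constraint that everything has to live in the visualization's own bundle.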

To see this in action, I shared this report. It uses the Boston Housing data set, where the goal is to predict the median house value. It has only been tested in the desktop version of Chrome and should be viewed as a proof of concept at this stage.

I have to say I really like the potential of using TensorFlow.js in Data Studio.

  • Tight security model. As I said earlier, no data leaves the community visualization, which takes care of many privacy issues if you want to use sensitive data. No need to export your data as a CSV file and send it to an external service.
  • Having your (training) data in a Google Sheet somehow makes it more approachable to me than a CSV file. I can update and clean data much more easily in Sheets than in a CSV. In Data Studio I can then play around with the features to see if it changes model performance.
  • I could see this working well for consultants who want to build a model for a client using test data, without needing access to the client’s real data. The client would just connect their own Google Sheet and train on the private data.
  • Data Studio would be great for teaching machine learning concepts. Anyone can create beautiful reports and test different parameters, and you get to see how training works live in the browser. Honestly, I feel I understand machine learning much better now.

Having said all this, the tight security policy of community visualizations does mean that you cannot import and load a machine learning model that was trained somewhere else (perhaps in Python or Node.js). Particularly for text and image classification, where model training could take a very long time, it would be important to be able to use a pre-trained model for prediction only, without training it in the browser. Although I haven’t tried this out yet, I think it might actually be possible to use community connectors to overcome this.

Furthermore, to train machine learning models you typically want non-aggregated data, so each row in your Sheet should be one observation. To keep each row unique, I added an ID column to the data set; otherwise Data Studio will gladly aggregate rows into summaries.
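As a rough illustration of what that means on the JavaScript side, the sketch below turns one-observation-per-row data into plain arrays ready for training, dropping the ID column once it has done its job of keeping rows apart. It assumes the rows arrive in the object-per-row shape that the community visualization helper library produces with its object transform, where every field is an array of values. The config IDs `rowId`, `features` and `label` are hypothetical names, not the ones used in the report.

```javascript
// Sketch only. Each row is assumed to look like:
//   { rowId: ['17'], features: [6.575, 4.98], label: [24.0] }
// The keys `rowId`, `features` and `label` are hypothetical config IDs.
function toTrainingData(rows) {
  const features = [];
  const labels = [];
  for (const row of rows) {
    // row.rowId only keeps Data Studio from aggregating rows; it is not a
    // model input, so it is ignored here.
    const x = row.features.map(Number);
    const y = Number(row.label[0]);
    if (x.some(Number.isNaN) || Number.isNaN(y)) {
      throw new Error(`Non-numeric value in row ${row.rowId[0]}`);
    }
    features.push(x);
    labels.push(y);
  }
  return { features, labels };
}

// Two example observations; the second arrives as strings.
const example = toTrainingData([
  { rowId: ['1'], features: [6.575, 4.98], label: [24.0] },
  { rowId: ['2'], features: ['6.421', '9.14'], label: ['21.6'] },
]);
console.log(example.features.length, example.labels.length);
```

Failing loudly on non-numeric values is a deliberate choice here, since a stray text cell in the Sheet would otherwise end up as `NaN` in the training data.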

I’m glad this proof of concept works, but there’s more work to be done, starting with validating the approach by using Data Studio and TensorFlow.js in a real-world application.
