Google Developers Blog: TensorFlow

Coral summer updates: Post-training quant support, TF Lite delegate, and new models!

Tuesday, August 6, 2019

Posted by Vikram Tank (Product Manager), Coral Team

Coral’s had a busy summer working with customers, expanding distribution, and building new features — and of course taking some time for R&R. We’re excited to share updates, early work, and new models for our platform for local AI with you. The compiler has been updated to version 2.0, adding support for models built using post-training quantization—only when using full integer quantization (previously, we required quantization-aware training)—and fixing a few bugs. As the Tensorflow team mentions in their Medium post “post-training integer quantization enables users to take an already-trained floating-point model and fully quantize it to only use 8-bit signed integers (i.e. `int8`).” In addition to reducing the model size, models that are quantized with this method can now be accelerated by the Edge TPU found in Coral products. We've also updated the Edge TPU Python library to version 2.11.1 to include new APIs for transfer learning on Coral products. The new on-device back propagation API allows you to perform transfer learning on the last layer of an image classification model. The last layer of a model is removed before compilation and implemented on-device to run on the CPU. It allows for near-real time transfer learning and doesn’t require you to recompile the model. Our previously released imprinting API, has been updated to allow you to quickly retrain existing classes or add new ones while leaving other classes alone. You can now even keep the classes from the pre-trained base model. Learn more about both options for on-device transfer learning. Until now, accelerating your model with the Edge TPU required that you write code using either our Edge TPU Python API or in C++. But now you can accelerate your model on the Edge TPU when using the TensorFlow Lite interpreter API, because we've released a TensorFlow Lite delegate for the Edge TPU. The TensorFlow Lite Delegate API is an experimental feature in TensorFlow Lite that allows for the TensorFlow Lite interpreter to delegate part or all of graph execution to another executor—in this case, the other executor is the Edge TPU. Learn more about the TensorFlow Lite delegate for Edge TPU. Coral has also been working with Edge TPU and AutoML teams to release EfficientNet-EdgeTPU: a family of image classification models customized to run efficiently on the Edge TPU. The models are based upon the EfficientNet architecture to achieve the image classification accuracy of a server-side model in a compact size that's optimized for low latency on the Edge TPU. You can read more about the models’ development and performance on the Google AI Blog, and download trained and compiled versions on the Coral Models page. And, as summer comes to an end we also want to share that Arrow offers a student teacher discount for those looking to experiment with the boards in class or the lab this year. We're excited to keep evolving the Coral platform, please keep sending us feedback at coral-support@google.com.

Coral updates: Project tutorials, a downloadable compiler, and a new distributor

Friday, May 31, 2019

Posted by Vikram Tank (Product Manager), Coral Team

We’re committed to evolving Coral to make it even easier to build systems with on-device AI. Our team is constantly working on new product features, and content that helps ML practitioners, engineers, and prototypers create the next generation of hardware. To improve our toolchain, we're making the Edge TPU Compiler available to users as a downloadable binary. The binary works on Debian-based Linux systems, allowing for better integration into custom workflows. Instructions on downloading and using the binary are on the Coral site. We’re also adding a new section to the Coral site that showcases example projects you can build with your Coral board. For instance, Teachable Machine is a project that guides you through building a machine that can quickly learn to recognize new objects by re-training a vision classification model directly on your device. Minigo shows you how to create an implementation of AlphaGo Zero and run it on the Coral Dev Board or USB Accelerator. Our distributor network is growing as well: Arrow will soon sell Coral products.

Updates from Coral: A new compiler and much more

Thursday, April 11, 2019

Posted by Vikram Tank (Product Manager), Coral Team

Coral has been public for about a month now, and we’ve heard some great feedback about our products. As we evolve the Coral platform, we’re making our products easier to use and exposing more powerful tools for building devices with on-device AI. Today, we're updating the Edge TPU model compiler to remove the restrictions around specific architectures, allowing you to submit any model architecture that you want. This greatly increases the variety of models that you can run on the Coral platform. Just be sure to review the TensorFlow ops supported on Edge TPU and model design requirements to take full advantage of the Edge TPU at runtime. We're also releasing a new version of Mendel OS (3.0 Chef) for the Dev Board with a new board management tool called Mendel Development Tool (MDT). To help with the developer workflow, our new C++ API works with the TensorFlow Lite C++ API so you can execute inferences on an Edge TPU. In addition, both the Python and C++ APIs now allow you to run multiple models in parallel, using multiple Edge TPU devices. In addition to these updates, we’re adding new capabilities to Coral with the release of the Environmental Sensor Board. It’s an accessory board for the Coral Dev Platform (and Raspberry Pi) that brings sensor input to your models. It has integrated light, temperature, humidity, and barometric sensors, and the ability to add more sensors via it's four Grove connectors. The secure element on-board also allows for easy communication with the Google Cloud IOT Core. The team has also been working with partners to help them evaluate whether Coral is the right fit for their products. We’re excited that Oivi has chosen us to be the base platform of their new handheld AI-camera. This product will help prevent blindness among diabetes patients by providing early, automated detection of diabetic retinopathy. Anders Eikenes, CEO of Oivi, says “Oivi is dedicated towards providing patient-centric eye care for everyone - including emerging markets. We were honoured to be selected by Google to participate in their Coral alpha program, and are looking forward to our continued cooperation. The Coral platform gives us the ability to run our screening ML models inside a handheld device; greatly expanding the access and ease of diabetic retinopathy screening.” Finally, we’re expanding our distributor network to make it easier to get Coral boards into your hands around the world. This month, Seeed and NXP will begin to sell Coral products, in addition to Mouser. We're excited to keep evolving the Coral platform, please keep sending us feedback at coral-support@google.com. You can see the full release notes on Coral site.

Introducing Coral: Our platform for development with local AI

Wednesday, March 6, 2019

Posted by Billy Rutledge (Director) and Vikram Tank (Product Mgr), Coral Team AI can be beneficial for everyone, especially when we all explore, learn, and build together. To that end, Google's been developing tools like TensorFlow and AutoML to ensure that everyone has access to build with AI. Today, we're expanding the ways that people can build out their ideas and products by introducing Coral into public beta. Coral is a platform for building intelligent devices with local AI. Coral offers a complete local AI toolkit that makes it easy to grow your ideas from prototype to production. It includes hardware components, software tools, and content that help you create, train and run neural networks (NNs) locally, on your device. Because we focus on accelerating NN's locally, our products offer speedy neural network performance and increased privacy — all in power-efficient packages. To help you bring your ideas to market, Coral components are designed for fast prototyping and easy scaling to production lines. Our first hardware components feature the new Edge TPU, a small ASIC designed by Google that provides high-performance ML inferencing for low-power devices. For example, it can execute state-of-the-art mobile vision models such as MobileNet V2 at 100+ fps, in a power efficient manner.

Coral Camera Module, Dev Board and USB Accelerator For new product development, the Coral Dev Board is a fully integrated system designed as a system on module (SoM) attached to a carrier board. The SoM brings the powerful NXP iMX8M SoC together with our Edge TPU coprocessor (as well as Wi-Fi, Bluetooth, RAM, and eMMC memory). To make prototyping computer vision applications easier, we also offer a Camera that connects to the Dev Board over a MIPI interface. To add the Edge TPU to an existing design, the Coral USB Accelerator allows for easy integration into any Linux system (including Raspberry Pi boards) over USB 2.0 and 3.0. PCIe versions are coming soon, and will snap into M.2 or mini-PCIe expansion slots. When you're ready to scale to production we offer the SOM from the Dev Board and PCIe versions of the Accelerator for volume purchase. To further support your integrations, we'll be releasing the baseboard schematics for those who want to build custom carrier boards. Our software tools are based around TensorFlow and TensorFlow Lite. TF Lite models must be quantized and then compiled with our toolchain to run directly on the Edge TPU. To help get you started, we're sharing over a dozen pre-trained, pre-compiled models that work with Coral boards out of the box, as well as software tools to let you re-train them. For those building connected devices with Coral, our products can be used with Google Cloud IoT. Google Cloud IoT combines cloud services with an on-device software stack to allow for managed edge computing with machine learning capabilities. Coral products are available today, along with product documentation, datasheets and sample code at g.co/coral. We hope you try our products during this public beta, and look forward to sharing more with you at our official launch.

Introduction to Fairness in Machine Learning

Tuesday, November 27, 2018

Posted by Andrew Zaldivar, Developer Advocate, Google AI A few months ago, we announced our AI Principles, a set of commitments we are upholding to guide our work in artificial intelligence (AI) going forward. Along with our AI Principles, we shared a set of recommended practices to help the larger community design and build responsible AI systems. In particular, one of our AI Principles speaks to the importance of recognizing that AI algorithms and datasets are the product of the environment—and, as such, we need to be conscious of any potential unfair outcomes generated by an AI system and the risk it poses across cultures and societies. A recommended practice here for practitioners is to understand the limitations of their algorithm and datasets—but this is a problem that is far from solved. To help practitioners take on the challenge of building fairer and more inclusive AI systems, we developed a short, self-study training module on fairness in machine learning. This new module is part of our Machine Learning Crash Course, which we highly recommend taking first—unless you know machine learning really well, in which case you can jump right into the Fairness module. The Fairness module features a hands-on technical exercise. This exercise demonstrates how you can use tools and techniques that may already exist in your development stack (such as Facets Dive, Seaborn, pandas, scikit-learn and TensorFlow Estimators to name a few) to explore and discover ways to make your machine learning system fairer and more inclusive. We created our exercise in a Colaboratory notebook, which you are more than welcome to use, modify and distribute for your own purposes. From exploring datasets to analyzing model performance, it's really easy to forget to make time for responsible reflection when building an AI system. So rather than having you run every code cell in sequential order without pause, we added what we call FairAware tasks throughout the exercise. FairAware tasks help you zoom in and out of the problem space. That way, you can remind yourself of the big picture: finding the undesirable biases that could disproportionately affect model performance across groups. We hope a process like FairAware will become part of your workflow, helping you find opportunities for inclusion.

FairAware task guiding practitioner to compare performances across gender. The Fairness module was created to provide you with enough of an understanding to get started in addressing fairness and inclusion in AI. Keep an eye on this space for future work as this is only the beginning. If you wish to learn more from our other examples, check out the Fairness section of our Responsible AI Practices guide. There, you will find a full set of Google recommendations and resources. From our latest research proposal on reporting model performance with fairness and inclusion considerations, to our recently launched diagnostic tool that lets anyone investigate trained models for fairness, our resource guide highlights many areas of research and development in fairness. Let us know what your thoughts are on our Fairness module. If you have any specific comments on the notebook exercise itself, then feel free to leave a comment on our GitHub repo. On behalf of many contributors and supporters,Andrew Zaldivar – Developer Advocate, Google AI

New AIY Edge TPU Boards

Wednesday, July 25, 2018

Posted by Billy Rutledge, Director of AIY Projects Over the past year and a half, we've seen more than 200K people build, modify, and create with our Voice Kit and Vision Kit products. Today at Cloud Next we announced two new devices to help professional engineers build new products with on-device machine learning(ML) at their core: the AIY Edge TPU Dev Board and the AIY Edge TPU Accelerator. Both are powered by Google's Edge TPU and represent our first steps towards expanding AIY into a platform for experimentation with on-device ML.

The Edge TPU is Google's purpose-built ASIC chip designed to run TensorFlow Lite ML models on your device. We've learned that performance-per-watt and performance-per-dollar are critical benchmarks when processing neural networks within a small footprint. The Edge TPU delivers both in a package that's smaller than the head of a penny. It can accelerate ML inferencing on device, or can pair with Google Cloud to create a full cloud-to-edge ML stack. In either configuration, by processing data directly on-device, a local ML accelerator increases privacy, removes the need for persistent connections, reduces latency, and allows for high performance using less power.

The AIY Edge TPU Dev Board is an all-in-one development board that allows you to prototype embedded systems that demand fast ML inferencing. The baseboard provides all the peripheral connections you need to effectively prototype your device — including a 40-pin GPIO header to integrate with various electrical components. The board also features a removable System-on-module (SOM) daughter board can be directly integrated into your own hardware once you're ready to scale.

The AIY Edge TPU Accelerator is a neural network coprocessor for your existing system. This small USB-C stick can connect to any Linux-based system to perform accelerated ML inferencing. The casing includes mounting holes for attachment to host boards such as a Raspberry Pi Zero or your custom device. On-device ML is still in its early days, and we're excited to see how these two products can be applied to solve real world problems — such as increasing manufacturing equipment reliability, detecting quality control issues in products, tracking retail foot-traffic, building adaptive automotive sensing systems, and more applications that haven't been imagined yet. Both devices will be available online this fall in the US with other countries to follow shortly. For more product information visit g.co/aiy and sign up to be notified as products become available.

Showcase your innovations at the 2018 China-US Young Makers Competition

Tuesday, April 17, 2018

Posted by Bill Luan, Senior Program Manager & Greater China Regional Lead, Developer Relations The 2018 China-U.S. Young Maker Competition launched this week by the event co-organizer Hackster.IO. Project submissions are now open to all makers, developers, and students ages 18-40 in both China and the United States. Google is the corporate sponsor for this year's competition. Since 2014, this competition has been running annually in supporting the U.S.-China High-Level Consultation on People-to-People Exchange program. The competition encourages makers in both countries to create innovative products focusing on community development, education, environmental protection, health & fitness, energy, transportation and sustainable development. Participants have the freedom to choose appropriate technologies to enable their innovations, and we encourage makers to consider open source technologies, such as TensorFlow and AIY Projects for artificial intelligence use cases, Android Studio for mobile applications, as well as Android Things for IoT solutions. The top 10 projects in the U.S. will win an all-expenses-paid trip to Beijing, to compete against Chinese makers on August 13-17 for the chance at $30,000 in prizes. Further, there are 35 additional chances to win Google prizes! So join the competition, and let your innovation shine on the global stage! For more details, please see the event announcement on Hackster.IO.

Highlights from TensorFlow Dev Summit 2018

Friday, April 6, 2018

Originally posted by Sandeep Gupta, Product Manager for TensorFlow, on behalf of the TensorFlow team on the TensorFlow Blog. On March 30th, we held the second TensorFlow Developer Summit at the Computer History Museum in Mountain View, CA! The event brought together over 500 TensorFlow users in-person and thousands tuning into the livestream at TensorFlow events around the world. The day was filled with new product announcements along with technical talks from the TensorFlow team and guest speakers. Here are the highlights from the event: Machine learning is solving challenging problems that impact everyone around the world. Problems that we thought were impossible or too complex to solve are now possible with this technology. Using TensorFlow, we've already seen great advancements in many different fields. For example:

Astrophysicists are using TensorFlow to analyze large amounts of data from the Kepler mission to discover new planets.

Medical researchers are using ML techniques with TensorFlow to assess a person's cardiovascular risk of a heart attack and stroke.

Air Traffic Controllers are using TensorFlow to predict flight routes through crowded airspace for safe and efficient landings.

Engineers are using TensorFlow to analyze auditory data in the rainforest to detect logging trucks and other illegal activities.

Scientists in Africa are using TensorFlow to detect diseases in Cassava plants to improving yield for farmers.

We're excited to see these amazing uses of TensorFlow and are committed to making it accessible to more developers. This is why we're pleased to announce new updates to TensorFlow that will help improve the developer experience! We're making TensorFlow easier to use Researchers and developers want a simpler way of using TensorFlow. We're integrating a more intuitive programming model for Python developers called eager execution that removes the distinction between the construction and execution of computational graphs. You can develop with eager execution and then use the same code to generate the equivalent graph for training at scale using the Estimator high-level API. We're also announcing a new method for running Estimator models on multiple GPUs on a single machine. This allows developers to quickly scale their models with minimal code changes. As machine learning models become more abundant and complex, we want to make it easier for developers to share, reuse, and debug them. To help developers share and reuse models, we're announcing TensorFlow Hub, a library built to foster the publication and discovery of modules (self-contained pieces of TensorFlow graph) that can be reused across similar tasks. Modules contain weights that have been pre-trained on large datasets, and may be retrained and used in your own applications. By reusing a module, a developer can train a model using a smaller dataset, improve generalization, or simply speed up training. To make debugging models easier, we're also releasing a new interactive graphical debugger plug-in as part of the TensorBoard visualization tool that helps you inspect and step through internal nodes of a computation graph in real-time.

Model training is only one part of the machine learning process and developers need a solution that works end-to-end to build real-world ML systems. Towards this end, we're announcing the roadmap for TensorFlow Extended (TFX) along with the launch of TensorFlow Model Analysis, an open-source library that combines the power of TensorFlow and Apache Beam to compute and visualize evaluation metrics. The components of TFX that have been released thus far (including TensorFlow Model Analysis, TensorFlow Transform, Estimators, and TensorFlow Serving) are well integrated and let developers prepare data, train, validate, and deploy TensorFlow models in production.

TensorFlow is available in more languages and platforms Along with making TensorFlow easier to use, we're announcing that developers can use TensorFlow in new languages. TensorFlow.js is a new ML framework for JavaScript developers. Machine learning in the browser using TensorFlow.js opens exciting new possibilities, including interactive ML and support for scenarios where all data remains client-side. It can be used to build and train modules entirely in the browser, as well as import TensorFlow and Keras models trained offline for inference using WebGL acceleration. The Emoji Scavenger Hunt game is a fun example of an application built using TensorFlow.js.

We also have some exciting news for Swift programmers: TensorFlow for Swift will be open sourced this April. TensorFlow for Swift is not your typical language binding for TensorFlow. It integrates first-class compiler and language support, providing the full power of graphs with the usability of eager execution. The project is still in development, with more updates coming soon! We're also sharing the latest updates to TensorFlow Lite, TensorFlow's lightweight, cross-platform solution for deploying trained ML models on mobile and other edge devices. In addition to existing support for Android and iOS, we're announcing support for Raspberry Pi, increased support for ops/models (including custom ops), and describing how developers can easily use TensorFlow Lite in their own apps. The TensorFlow Lite core interpreter is now only 75KB in size (vs 1.1 MB for TensorFlow) and we're seeing speedups of up to 3x when running quantized image classification models on TensorFlow Lite vs. TensorFlow. For hardware support, TensorFlow now has integration with NVIDIA's TensorRT. TensorRT is a library that optimizes deep learning models for inference and creates a runtime for deployment on GPUs in production environments. It brings a number of optimizations to TensorFlow and automatically selects platform specific kernels to maximize throughput and minimizes latency during inference on GPUs. For users who run TensorFlow on CPUs, our partnership with Intel has delivered integration with a highly optimized Intel MKL-DNN open source library for deep learning. When using Intel MKL-DNN, we observed up to 3x inference speedup on various Intel CPU platforms. The list of platforms that run TensorFlow has grown to include Cloud TPUs, which were released in beta last month. The Google Cloud TPU team has already delivered a strong 1.6X performance increase in ResNet-50 performance since launch. These improvements will be available to TensorFlow users with the 1.8 release soon. Enabling new applications and domains using TensorFlow Many data analysis problems are solved using statistical and probabilistic methods. Beyond deep learning and neural network models, TensorFlow now provides state-of-the-art methods for Bayesian analysis via the TensorFlow Probability API. This library contains building blocks like probability distributions, sampling methods, and new metrics and losses. Many other classical ML methods also have increased support. As an example, boosted decision trees can be easily trained and deployed using pre-made high-level classes. Machine learning and TensorFlow have already helped solve challenging problems in many different fields. Another area where we see TensorFlow having a big impact is in genomics, which is why we're releasing Nucleus, a library for reading, writing, and filtering common genomics file formats for use in TensorFlow. This, along with DeepVariant, an open-source TensorFlow based tool for genome variant discovery, will help spur new research and advances in genomics. Expanding community resources and engagement These updates to TensorFlow aim to benefit and grow the community of users and contributors - the thousands of people who play a part in making TensorFlow one of the most popular ML frameworks in the world. To continue to engage with the community and stay up-to-date with TensorFlow, we've launched the new official TensorFlow blog and the TensorFlow YouTube channel. We're also making it easier for our community to collaborate by launching new mailing lists and Special Interest Groups designed to support open-source work on specific projects. To see how you can be a part of the community, visit the TensorFlow Community page and as always, you can follow TensorFlow on Twitter for the latest news. We're incredibly thankful to everyone who has helped make TensorFlow a successful ML framework in the past two years. Thanks for attending, thanks for watching, and remember to use #MadeWithTensorFlow to share how you are solving impactful and challenging problems with machine learning and TensorFlow! #imgHalf { width: 75%; display: block; margin: 10px auto 10px auto; }

Announcing TensorRT integration with TensorFlow 1.7

Tuesday, March 27, 2018

Posted by Laurence Moroney (Google) and Siddarth Sharma (NVIDIA) Today we are announcing integration of NVIDIA® TensorRT^TM and TensorFlow. TensorRT is a library that optimizes deep learning models for inference and creates a runtime for deployment on GPUs in production environments. It brings a number of FP16 and INT8 optimizations to TensorFlow and automatically selects platform specific kernels to maximize throughput and minimizes latency during inference on GPUs. We are excited about the new integrated workflow as it simplifies the path to use TensorRT from within TensorFlow with world-class performance. In our tests, we found that ResNet-50 performed 8x faster under 7 ms latency with the TensorFlow-TensorRT integration using NVIDIA Volta Tensor Cores as compared with running TensorFlow only.

Sub-Graph Optimizations within TensorFlow Now in TensorFlow 1.7, TensorRT optimizes compatible sub-graphs and let's TensorFlow execute the rest. This approach makes it possible to rapidly develop models with the extensive TensorFlow feature set while getting powerful optimizations with TensorRT when performing inference. If you were already using TensorRT with TensorFlow models, you know that certain unsupported TensorFlow layers had to be imported manually, which in some cases could be time consuming. From a workflow perspective, you need to ask TensorRT to optimize TensorFlow's sub-graphs and replace each subgraph with a TensorRT optimized node. The output of this step is a frozen graph that can then be used in TensorFlow as before.

During inference, TensorFlow executes the graph for all supported areas, and calls TensorRT to execute TensorRT optimized nodes. As an example, if your graph has 3 segments, A, B and C. Segment B is optimized by TensorRT and replaced by a single node. During inference, TensorFlow executes A, then calls TensorRT to execute B, and then TensorFlow executes C. The newly added TensorFlow API to optimize TensorRT takes the frozen TensorFlow graph, applies optimizations to sub-graphs and sends back to TensorFlow a TensorRT inference graph with optimizations applied. See the code below as an example. # Reserve memory for TensorRT inference engine gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction = number_between_0_and_1) ... trt_graph = trt.create_inference_graph( input_graph_def = frozen_graph_def, outputs = output_node_name, max_batch_size=batch_size, max_workspace_size_bytes=workspace_size, precision_mode=precision) # Get optimized graph The per_process_gpu_memory_fraction parameter defines the fraction of GPU memory that TensorFlow is allowed to use, the remaining being available for TensorRT. This parameter should be set the first time the TensorFlow-TensorRT process is started. As an example, a value of 0.67 would allocate 67% of GPU memory for TensorFlow and the remaining 33 % for TensorRT engines. The create_inference_graph function takes a frozen TensorFlow graph and returns an optimized graph with TensorRT nodes. Let's look at the function's parameters:

input_graph_def: frozen TensorFlow graph

outputs: list of strings with names of output nodes e.g. ["resnet_v1_50/predictions/Reshape_1"]

max_batch_size: integer, size of input batch e.g. 16

max_workspace_size_bytes: integer, maximum GPU memory size available for TensorRT

precision_mode: string, allowed values "FP32", "FP16" or "INT8"

As an example, if the GPU has 12GB memory, in order to allocate ~4GB for TensorRT engines, set the per_process_gpu_memory_fraction parameter to ( 12 - 4 ) / 12 = 0.67 and the max_workspace_size_bytes parameter to 4000000000. Lets apply the new API to ResNet-50 and see what the optimized model looks like in TensorBoard. The complete code to run the example is available here. The image on the left is ResNet-50 without TensorRT optimizations and the right image is after. In this case, most of the graph gets optimized by TensorRT and replaced by a single node (highlighted).

Optimized INT8 Inference performance TensorRT provides capabilities to take models trained in single (FP32) and half (FP16) precision and convert them for deployment with INT8 quantizations at reduced precision with minimal accuracy loss. INT8 models compute faster and place lower requirements on bandwidth but present a challenge in representing weights and activations of neural networks because of the reduced dynamic range available.

Dynamic Range

Minimum Positive Value

FP32

-3.4×1038 ~ +3.4×1038

1.4 × 10−45

FP16

65504 ~ +65504

5.96 x 10-8

INT8

-128 ~ +127

To address this, TensorRT uses a calibration process that minimizes the information loss when approximating the FP32 network with a limited 8-bit integer representation. With the new integration, after optimizing the TensorFlow graph with TensorRT, you can pass the graph to TensorRT for calibration as below. trt_graph=trt.calib_graph_to_infer_graph(calibGraph) The rest of the inference workflow remains unchanged from above. The output of this step is a frozen graph that is executed by TensorFlow as described earlier. Automatically use Tensor Cores on NVIDIA Volta GPUs TensorRT runs half precision TensorFlow models on Tensor Cores in VOLTA GPUs for inference. Tensor Cores, provide 8x more throughput than single precision math pipelines. Half precision (also known as FP16) data compared to higher precision FP32 vs FP64 reduces memory usage of the neural network. This allows training and deployment of larger networks, and FP16 data transfers take less time than FP32 or FP64 transfers. Each Tensor Core performs D = A x B + C, where A, B, C and D are matrices. A and B are half-precision 4x4 matrices, whereas D and C can be either half or single precision 4x4 matrices. The peak performance of Tensor Cores on the V100 is about an order of magnitude (10x) faster than double precision (FP64) and about 4 times faster than single precision (FP32). Availability We are excited about this release and will continue to work closely with NVIDIA to enhance this integration. We expect the new solutions ensure the highest performance possible while maintaining the ease and flexibility of TensorFlow. And as TensorRT supports more networks, you will automatically benefit from the updates without any changes to your code. To get the new solution, you can use the standard pip install process once TensorFlow 1.7 is released: pip install tensorflow-gpu r1.7 Till then, find detailed installation instructions here: https://github.com/tensorflow/tensorflow/tree/r1.7/tensorflow/contrib/tensorrt Try it out and let us know what you think! h2 { font-size: 130%; } #imgFull { display: block; margin: 10px auto; padding: 0; width: 90%; } .flexParent { display: flex; justify-content: space-around; align-items: center; width: 100%; } .flexChild { width: 50%; } .flexChild img { width: 100%; margin: 0; } table, tr, td { border: 1px solid gray; } tr { width: 100%; } td { width: 33%; padding: 1%; text-align: center; } #green { background: #528e25; color: white; font-weight: bold; }

Machine Learning Crash Course

Friday, March 2, 2018

Posted by Barry Rosenberg, Google Engineering Education Team Today, we're happy to share our Machine Learning Crash Course (MLCC) with the world. MLCC is one of the most popular courses created for Google engineers. Our engineering education team has delivered this course to more than 18,000 Googlers, and now you can take it too! The course develops intuition around fundamental machine learning concepts. What does the course cover? MLCC covers many machine learning fundamentals, starting with loss and gradient descent, then building through classification models and neural nets. The programming exercises introduce TensorFlow. You'll watch brief videos from Google machine learning experts, read short text lessons, and play with educational gadgets devised by instructional designers and engineers. How much does it cost? MLCC is free. I don't get it. Why are you offering MLCC to everyone? We believe that the potential of machine learning is so vast that every technical person should learn machine learning fundamentals. We're offering the course in English, Spanish, Korean, Mandarin, and French. Does the real world make an appearance in the course? Yes, MLCC ends with short lessons on designing real-world machine learning systems. MLCC also contains sections enabling you to learn from the mistakes that our experts have made. Do I have enough mathematical background to understand MLCC? Understanding a little algebra and a little elementary statistics (mean and standard deviation) is helpful. If you understand calculus, you'll get a bit more out of the course, but calculus is not a requirement. MLCC contains a helpful section to refresh your memory on the background math. Is this a programming course? MLCC contains some Python programming exercises. However, those exercises comprise only a small percentage of the course, which non-programmers may safely skip. I'm new to Python. Will the programming exercises be too hard for me? Many of the Google engineers who took MLCC didn't know any Python but still completed the exercises. That's because you'll write only a few lines of code during the programming exercises. Instead of writing code from scratch, you'll primarily manipulate the values of existing variables. That said, the code will be easier to understand if you can program in Python. But how will I learn machine learning concepts without programming? MLCC relies on a variety of media and hands-on interactive tools to build intuition in fundamental machine learning concepts. You need a technical mind, but you don't need programming skills. How can I show off my machine learning skills? As your knowledge about Machine Learning grows, you can test your skill by helping others. We're also kicking off a Kaggle competition to help DonorsChoose.org. DonorsChoose.org is an organization that empowers public school teachers from across the country to request materials and experiences they need to help their students grow. Teachers submit hundreds of thousands of project proposals each year; 500,000 proposals are expected in 2018. Currently, DonorsChoose.org relies on a large number of volunteers to screen the proposals. The Kaggle competition hopes to help DonorsChoose.org use ML to accelerate the screening process, which will enable volunteers to make better use of their time. In addition, this work should help increase the consistency of decisions about projects. Is MLCC Google's only machine learning educational project? MLCC is merely one of many ways to learn about machine learning. To explore the universe of machine learning educational opportunities from Google, see our new Learn with Google AI program at g.co/learnwithgoogleai. To start on MLCC, see g.co/machinelearningcrashcourse.

Announcing TensorFlow 1.5

Friday, January 26, 2018

Posted by Laurence Moroney, Developer Advocate We're delighted to announce that TensorFlow 1.5 is now public! Install it now to get a bunch of new features that we hope you'll enjoy! Eager Execution for TensorFlow First off, Eager Execution for TensorFlow is now available as a preview. We've heard lots of feedback about the programming style of TensorFlow, and how developers really want an imperative, define-by-run programming style. With Eager Execution for TensorFlow enabled, you can execute TensorFlow operations immediately as they are called from Python. This makes it easier to get started with TensorFlow, and can make research and development more intuitive. For example, think of a simple computation like a matrix multiplication. Today, in TensorFlow it would look something like this: x = tf.placeholder(tf.float32, shape=[1, 1]) m = tf.matmul(x, x) with tf.Session() as sess: print(sess.run(m, feed_dict={x: [[2.]]})) If you enable Eager Execution for TensorFlow, it will look more like this: x = [[2.]] m = tf.matmul(x, x) print(m) You can learn more about Eager Execution for TensorFlow here (check out the user guide linked at the bottom of the page, and also this presentation) and the API docs here. TensorFlow Lite

The Developer preview of TensorFlow Lite is built into version 1.5. TensorFlow Lite, TensorFlow's lightweight solution for mobile and embedded devices, lets you take a trained TensorFlow model and convert it into a .tflite file which can then be executed on a mobile device with low-latency. Thus the training doesn't have to be done on the device, nor does the device need to upload data to the cloud to have it worked upon. So, for example, if you want to classify an image, a trained model could be deployed to the device and classification of the image is done on-device directly. TensorFlow Lite includes a sample app to get you started. This app uses the MobileNet model of 1001 unique image categories. It recognizes an image and matches it to a number of categories, listing the top 3 that it recognizes. The app is available on both Android and iOS. You can learn more about TensorFlow Lite, and how to convert your models to be available on mobile here. GPU Acceleration Updates If you are using GPU Acceleration on Windows or Linux, TensorFlow 1.5 now has CUDA 9 and cuDNN 7 support built-in. To learn more about NVidia's Compute Unified Device Architecture (CUDA) 9, check out NVidia's site here. This is enhanced by the CUDA Deep Neural Network Library (cuDNN), the latest release of which is version 7. Support for this is now included in TensorFlow 1.5. Here are some Medium articles on GPU support on Windows and Linux, and how to install them on your workstation (if it supports the requisite hardware) Documentation Site Updates In line with this release we've also overhauled the documentation site, including an improved Getting Started flow that will get you from no knowledge to building a neural network to classify different types of iris in a very short time. Check it out!

Other Enhancements Beyond these features, there's lots of other enhancements to Accelerated Linear Algebra (XLA), updates to RunConfig and much more. Check the release notes here. Installing TensorFlow 1.5 To get TensorFlow 1.5, you can use the standard pip installation (or pip3 if you use python3) $ pip install --ignore-installed --upgrade tensorflow pre.prettyprint { padding: 10px; }

Creating Custom Estimators in TensorFlow

Tuesday, December 19, 2017

Posted by the TensorFlow Team Welcome to Part 3 of a blog series that introduces TensorFlow Datasets and Estimators. Part 1 focused on pre-made Estimators, while Part 2 discussed feature columns. Here in Part 3, you'll learn how to create your own custom Estimators. In particular, we're going to demonstrate how to create a custom Estimator that mimics DNNClassifier's behavior when solving the Iris problem. If you are feeling impatient, feel free to compare and contrast the following full programs:

Source code for Iris implemented with the pre-made DNNClassifier Estimator here.

Source code for Iris implemented with the custom Estimator here.

Pre-made vs. custom As Figure 1 shows, pre-made Estimators are subclasses of the tf.estimator.Estimator base class, while custom Estimators are an instantiation of tf.estimator.Estimator:

Figure 1. Pre-made and custom Estimators are all Estimators. Pre-made Estimators are fully-baked. Sometimes though, you need more control over an Estimator's behavior. That's where custom Estimators come in. You can create a custom Estimator to do just about anything. If you want hidden layers connected in some unusual fashion, write a custom Estimator. If you want to calculate a unique metric for your model, write a custom Estimator. Basically, if you want an Estimator optimized for your specific problem, write a custom Estimator. A model function (model_fn) implements your model. The only difference between working with pre-made Estimators and custom Estimators is:

With pre-made Estimators, someone already wrote the model function for you.

With custom Estimators, you must write the model function.

Your model function could implement a wide range of algorithms, defining all sorts of hidden layers and metrics. Like input functions, all model functions must accept a standard group of input parameters and return a standard group of output values. Just as input functions can leverage the Dataset API, model functions can leverage the Layers API and the Metrics API. Iris as a pre-made Estimator: A quick refresher Before demonstrating how to implement Iris as a custom Estimator, we wanted to remind you how we implemented Iris as a pre-made Estimator in Part 1 of this series. In that Part, we created a fully connected, deep neural network for the Iris dataset simply by instantiating a pre-made Estimator as follows: # Instantiate a deep neural network classifier. classifier = tf.estimator.DNNClassifier( feature_columns=feature_columns, # The input features to our model. hidden_units=[10, 10], # Two layers, each with 10 neurons. n_classes=3, # The number of output classes (three Iris species). model_dir=PATH) # Pathname of directory where checkpoints, etc. are stored. The preceding code creates a deep neural network with the following characteristics:

A list of feature columns. (The definitions of the feature columns are not shown in the preceding snippet.) For Iris, the feature columns are numeric representations of four input features.

Two fully connected layers, each having 10 neurons. A fully connected layer (also called a dense layer) is connected to every neuron in the subsequent layer.

An output layer consisting of a three-element list. The elements of that list are all floating-point values; the sum of those values must be 1.0 (this is a probability distribution).

A directory (PATH) in which the trained model and various checkpoints will be stored.

Figure 2 illustrates the input layer, hidden layers, and output layer of the Iris model. For reasons pertaining to clarity, we've only drawn 4 of the nodes in each hidden layer.

Figure 2. Our implementation of Iris contains four features, two hidden layers, and a logits output layer. Let's see how to solve the same Iris problem with a custom Estimator. Input function One of the biggest advantages of the Estimator framework is that you can experiment with different algorithms without changing your data pipeline. We will therefore reuse much of the input function from Part 1: def my_input_fn(file_path, repeat_count=1, shuffle_count=1): def decode_csv(line): parsed_line = tf.decode_csv(line, [[0.], [0.], [0.], [0.], [0]]) label = parsed_line[-1] # Last element is the label del parsed_line[-1] # Delete last element features = parsed_line # Everything but last elements are the features d = dict(zip(feature_names, features)), label return d dataset = (tf.data.TextLineDataset(file_path) # Read text file .skip(1) # Skip header row .map(decode_csv, num_parallel_calls=4) # Decode each line .cache() # Warning: Caches entire dataset, can cause out of memory .shuffle(shuffle_count) # Randomize elems (1 == no operation) .repeat(repeat_count) # Repeats dataset this # times .batch(32) .prefetch(1) # Make sure you always have 1 batch ready to serve ) iterator = dataset.make_one_shot_iterator() batch_features, batch_labels = iterator.get_next() return batch_features, batch_labels Notice that the input function returns the following two values:

batch_features, which is a dictionary. The dictionary's keys are the names of the features, and the dictionary's values are the feature's values.

batch_labels, which is a list of the label's values for a batch.

Refer to Part 1 for full details on input functions. Create feature columns As detailed in Part 2 of our series, you must define your model's feature columns to specify the representation of each feature. Whether working with pre-made Estimators or custom Estimators, you define feature columns in the same fashion. For example, the following code creates feature columns representing the four features (all numerical) in the Iris dataset: feature_columns = [ tf.feature_column.numeric_column(feature_names[0]), tf.feature_column.numeric_column(feature_names[1]), tf.feature_column.numeric_column(feature_names[2]), tf.feature_column.numeric_column(feature_names[3]) ]Write a model function We are now ready to write the model_fn for our custom Estimator. Let's start with the function declaration: def my_model_fn( features, # This is batch_features from input_fn labels, # This is batch_labels from input_fn mode): # Instance of tf.estimator.ModeKeys, see below The first two arguments are the features and labels returned from the input function; that is, features and labels are the handles to the data your model will use. The mode argument indicates whether the caller is requesting training, predicting, or evaluating. To implement a typical model function, you must do the following:

Define the model's layers.

Specify the model's behavior in three the different modes.

Define the model's layers If your custom Estimator generates a deep neural network, you must define the following three layers:

an input layer

one or more hidden layers

an output layer

Use the Layers API (tf.layers) to define hidden and output layers. If your custom Estimator generates a linear model, then you only have to generate a single layer, which we'll describe in the next section. Define the input layer Call tf.feature_column.input_layer to define the input layer for a deep neural network. For example: # Create the layer of input input_layer = tf.feature_column.input_layer(features, feature_columns) The preceding line creates our input layer, reading our features through the input function and filtering them through the feature_columns defined earlier. See Part 2 for details on various ways to represent data through feature columns. To create the input layer for a linear model, call tf.feature_column.linear_model instead of tf.feature_column.input_layer. Since a linear model has no hidden layers, the returned value from tf.feature_column.linear_model serves as both the input layer and output layer. In other words, the returned value from this function is the prediction. Establish Hidden Layers If you are creating a deep neural network, you must define one or more hidden layers. The Layers API provides a rich set of functions to define all types of hidden layers, including convolutional, pooling, and dropout layers. For Iris, we're simply going to call tf.layers.Dense twice to create two dense hidden layers, each with 10 neurons. By "dense," we mean that each neuron in the first hidden layer is connected to each neuron in the second hidden layer. Here's the relevant code: # Definition of hidden layer: h1 # (Dense returns a Callable so we can provide input_layer as argument to it) h1 = tf.layers.Dense(10, activation=tf.nn.relu)(input_layer) # Definition of hidden layer: h2 # (Dense returns a Callable so we can provide h1 as argument to it) h2 = tf.layers.Dense(10, activation=tf.nn.relu)(h1) The inputs parameter to tf.layers.Dense identifies the preceding layer. The layer preceding h1 is the input layer.

Figure 3. The input layer feeds into hidden layer 1. The preceding layer to h2 is h1. So, the string of layers now looks like this:

Figure 4. Hidden layer 1 feeds into hidden layer 2. The first argument to tf.layers.Dense defines the number of its output neurons—10 in this case. The activation parameter defines the activation function—Relu in this case. Note that tf.layers.Dense provides many additional capabilities, including the ability to set a multitude of regularization parameters. For the sake of simplicity, though, we're going to simply accept the default values of the other parameters. Also, when looking at tf.layers you may encounter lower-case versions (e.g. tf.layers.dense). As a general rule, you should use the class versions which start with a capital letter (tf.layers.Dense). Output Layer We'll define the output layer by calling tf.layers.Dense yet again: # Output 'logits' layer is three numbers = probability distribution # (Dense returns a Callable so we can provide h2 as argument to it) logits = tf.layers.Dense(3)(h2) Notice that the output layer receives its input from h2. Therefore, the full set of layers is now connected as follows:

Figure 5. Hidden layer 2 feeds into the output layer. When defining an output layer, the units parameter specifies the number of possible output values. So, by setting units to 3, the tf.layers.Dense function establishes a three-element logits vector. Each cell of the logits vector contains the probability of the Iris being Setosa, Versicolor, or Virginica, respectively. Since the output layer is a final layer, the call to tf.layers.Dense omits the optional activation parameter. Implement training, evaluation, and prediction The final step in creating a model function is to write branching code that implements prediction, evaluation, and training. The model function gets invoked whenever someone calls the Estimator's train, evaluate, or predict methods. Recall that the signature for the model function looks like this: def my_model_fn( features, # This is batch_features from input_fn labels, # This is batch_labels from input_fn mode): # Instance of tf.estimator.ModeKeys, see below Focus on that third argument, mode. As the following table shows, when someone calls train, evaluate, or predict, the Estimator framework invokes your model function with the

mode
parameter

set as follows: Table 2. Values of mode.

Caller invokes custom Estimator method...

Estimator framework calls your model function with the mode parameter set to...

train()

ModeKeys.TRAIN

evaluate()

ModeKeys.EVAL

predict()

ModeKeys.PREDICT

For example, suppose you instantiate a custom Estimator to generate an object named classifier. Then, you might make the following call (never mind the parameters to my_input_fn at this time): classifier.train( input_fn=lambda: my_input_fn(FILE_TRAIN, repeat_count=500, shuffle_count=256)) The Estimator framework then calls your model function with mode set to ModeKeys.TRAIN. Your model function must provide code to handle all three of the mode values. For each mode value, your code must return an instance of tf.estimator.EstimatorSpec, which contains the information the caller requires. Let's examine each mode. PREDICT When model_fn is called with mode == ModeKeys.PREDICT, the model function must return a tf.estimator.EstimatorSpec containing the following information:

the mode, which is tf.estimator.ModeKeys.PREDICT

the prediction

The model must have been trained prior to making a prediction. The trained model is stored on disk in the directory established when you instantiated the Estimator. For our case, the code to generate the prediction looks as follows: # class_ids will be the model prediction for the class (Iris flower type) # The output node with the highest value is our prediction predictions = { 'class_ids': tf.argmax(input=logits, axis=1) } # Return our prediction if mode == tf.estimator.ModeKeys.PREDICT: return tf.estimator.EstimatorSpec(mode, predictions=predictions) The block is surprisingly brief--the lines of code are simply the bucket at the end of a long hose that catches the falling predictions. After all, the Estimator has already done all the heavy lifting to make a prediction:

The input function provides the model function with data (feature values) to infer from.

The model function transforms those feature values into feature columns.

The model function runs those feature columns through the previously-trained model.

The output layer is a logits vector that contains the value of each of the three Iris species being the input flower. The tf.argmax method selects the Iris species in that logits vector with the highest value. Notice that the highest value is assigned to a dictionary key named class_ids. We return that dictionary through the predictions parameter of tf.estimator.EstimatorSpec. The caller can then retrieve the prediction by examining the dictionary passed back to the Estimator's predict method. EVAL When model_fn is called with mode == ModeKeys.EVAL, the model function must evaluate the model, returning loss and possibly one or more metrics. We can calculate loss by calling tf.losses.sparse_softmax_cross_entropy. Here's the complete code: # To calculate the loss, we need to convert our labels # Our input labels have shape: [batch_size, 1] labels = tf.squeeze(labels, 1) # Convert to shape [batch_size] loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits) Now let's turn our attention to metrics. Although returning metrics is optional, most custom Estimators return at least one metric. TensorFlow provides a Metrics API (tf.metrics) to calculate different kinds of metrics. For brevity's sake, we'll only return accuracy. The tf.metrics.accuracy compares our predictions against the "true labels", that is, against the labels provided by the input function. The tf.metrics.accuracy function requires the labels and predictions to have the same shape (which we did earlier). Here's the call to tf.metrics.accuracy: # Calculate the accuracy between the true labels, and our predictions accuracy = tf.metrics.accuracy(labels, predictions['class_ids']) When the model is called with mode == ModeKeys.EVAL, the model function returns a tf.estimator.EstimatorSpec containing the following information:

the mode, which is tf.estimator.ModeKeys.EVAL

the model's loss

typically, one or more metrics encased in a dictionary.

So, we'll create a dictionary containing our sole metric (my_accuracy). If we had calculated other metrics, we would have added them as additional key/value pairs to that same dictionary. Then, we'll pass that dictionary in the eval_metric_ops argument of tf.estimator.EstimatorSpec. Here's the block: # Return our loss (which is used to evaluate our model) # Set the TensorBoard scalar my_accurace to the accuracy # Obs: This function only sets value during mode == ModeKeys.EVAL # To set values during training, see tf.summary.scalar if mode == tf.estimator.ModeKeys.EVAL: return tf.estimator.EstimatorSpec( mode, loss=loss, eval_metric_ops={'my_accuracy': accuracy})TRAIN When model_fn is called with mode == ModeKeys.TRAIN, the model function must train the model. We must first instantiate an optimizer object. We picked Adagrad (tf.train.AdagradOptimizer) in the following code block only because we're mimicking the DNNClassifier, which also uses Adagrad. The tf.train package provides many other optimizers—feel free to experiment with them. Next, we train the model by establishing an objective on the optimizer, which is simply to minimize its loss. To establish that objective, we call the minimize method. In the code below, the optional global_step argument specifies the variable that TensorFlow uses to count the number of batches that have been processed. Setting global_step to tf.train.get_global_step will work beautifully. Also, we are calling tf.summary.scalar to report my_accuracy to TensorBoard during training. For both of these notes, please see the section on TensorBoard below for further explanation. optimizer = tf.train.AdagradOptimizer(0.05) train_op = optimizer.minimize( loss, global_step=tf.train.get_global_step()) # Set the TensorBoard scalar my_accuracy to the accuracy tf.summary.scalar('my_accuracy', accuracy[1]) When the model is called with mode == ModeKeys.TRAIN, the model function must return a tf.estimator.EstimatorSpec containing the following information:

the mode, which is tf.estimator.ModeKeys.TRAIN

the loss

the result of the training op

Here's the code: # Return training operations: loss and train_op return tf.estimator.EstimatorSpec( mode, loss=loss, train_op=train_op) Our model function is now complete! The custom Estimator After creating your new custom Estimator, you'll want to take it for a ride. Start by instantiating the custom Estimator through the Estimator base class as follows: classifier = tf.estimator.Estimator( model_fn=my_model_fn, model_dir=PATH) # Path to where checkpoints etc are stored The rest of the code to train, evaluate, and predict using our estimator is the same as for the pre-made DNNClassifier described in Part 1. For example, the following line triggers training the model: classifier.train( input_fn=lambda: my_input_fn(FILE_TRAIN, repeat_count=500, shuffle_count=256))TensorBoard As in Part 1, we can view some training results in TensorBoard. To see this reporting, start TensorBoard from your command-line as follows: # Replace PATH with the actual path passed as model_dir tensorboard --logdir=PATH Then browse to the following URL: localhost:6006 All the pre-made Estimators automatically log a lot of information to TensorBoard. With custom Estimators, however, TensorBoard only provides one default log (a graph of loss) plus the information we explicitly tell TensorBoard to log. Therefore, TensorBoard generates the following from our custom Estimator:

Figure 6. TensorBoard displays three graphs. In brief, here's what the three graphs tell you:

global_step/sec: A performance indicator, showing how many batches (gradient updates) we processed per second (y-axis) at a particular batch (x-axis). In order to see this report, you need to provide a global_step (as we did with tf.train.get_global_step()). You also need to run training for a sufficiently long time, which we do by asking the Estimator train for 500 epochs when we call its train method:

loss: The loss reported. The actual loss value (y-axis) doesn't mean much. The shape of the graph is what's important.

my_accuracy: The accuracy recorded when we invoked both of the following:

eval_metric_ops={'my_accuracy': accuracy}), during EVAL (when returning our EstimatorSpec)

tf.summary.scalar('my_accuracy', accuracy[1]), during TRAIN

Note the following in the my_accuracy and loss graphs:

The orange line represents TRAIN.

The blue dot represents EVAL.

During TRAIN, orange values are recorded continuously as batches are processed, which is why it becomes a graph spanning x-axis range. By contrast, EVAL produces only a single value from processing all the evaluation steps. As suggested in Figure 7, you may see and also selectively disable/enable the reporting for training and evaluation the left side. (Figure 7 shows that we kept reporting on for both:)

Figure 7. Enable or disable reporting. In order to see the orange graph, you must specify a global step. This, in combination with getting global_steps/sec reported, makes it a best practice to always register a global step by passing tf.train.get_global_step() as an argument to the optimizer.minimize call. Summary Although pre-made Estimators can be an effective way to quickly create new models, you will often need the additional flexibility that custom Estimators provide. Fortunately, pre-made and custom Estimators follow the same programming model. The only practical difference is that you must write a model function for custom Estimators. Everything else is the same! For more details, be sure to check out:

The complete source code for this blog post.

The official TensorFlow implementation of MNIST, which uses a custom estimator. This model is also an example where we take in raw pixels as numeric values without using feature columns (and input_layer).

The TensorFlow official models repository, which may contain more curated examples using custom estimators.

The TensorBoard video from the TensorFlow Dev Summit, which is a fun and educational introduction to TensorBoard.

Until next time - Happy TensorFlow coding! .blgimg1 { width: 100%; padding: 0 0 0 0; margin: 0; border: 0; } .blgimg2 { width: 100%; padding: 0 0 0 0; margin: 0; border: 0; } .blgimg3 { width: 100%; padding: 0 0 0 0; margin: 0; border: 0; } .blgimg4 { width: 100%; padding: 0 0 0 0; margin: 0; border: 0; } .blgimg5 { width: 100%; padding: 0 0 0 0; margin: 0; border: 0; } .blgimg6 { width: 100%; padding: 0 0 0 0; margin: 0; border: 0; } .blgimg7 { width: 100%; padding: 0 0 0 0; margin: 0; border: 0; } table, th, td { border: 1px solid black; border-collapse: collapse; } code { font-size: 100%; }

Announcing Core ML support in TensorFlow Lite

Tuesday, December 5, 2017

Posted by The TensorFlow Team

On November 14th, we announced the developer preview of TensorFlow Lite, TensorFlow's lightweight solution for mobile and embedded devices. Today, in collaboration with Apple, we are happy to announce support for Core ML! With this announcement, iOS developers can leverage the strengths of Core ML for deploying TensorFlow models. In addition, TensorFlow Lite will continue to support cross-platform deployment, including iOS, through the TensorFlow Lite format (.tflite) as described in the original announcement. Support for Core ML is provided through a tool that takes a TensorFlow model and converts it to the Core ML Model Format (.mlmodel). For more information, check out the TensorFlow Lite documentation pages, and the Core ML converter. The pypi pip installable package is available here: https://pypi.python.org/pypi/tfcoreml/0.1.0. Stay tuned for more updates. Happy TensorFlow Lite coding! .blgimg img { width: 75%; border: 0; display: block; margin: auto; padding: 10px 0 10px 0; }

Introducing TensorFlow Feature Columns

Monday, November 20, 2017

Posted by the TensorFlow Team Welcome to Part 2 of a blog series that introduces TensorFlow Datasets and Estimators. We're devoting this article to feature columns—a data structure describing the features that an Estimator requires for training and inference. As you'll see, feature columns are very rich, enabling you to represent a diverse range of data. In Part 1, we used the pre-made Estimator DNNClassifier to train a model to predict different types of Iris flowers from four input features. That example created only numerical feature columns (of type tf.feature_column.numeric_column). Although those feature columns were sufficient to model the lengths of petals and sepals, real world data sets contain all kinds of non-numerical features. For example:

Figure 1. Non-numerical features. How can we represent non-numerical feature types? That's exactly what this blogpost is all about. Input to a Deep Neural Network Let's start by asking what kind of data can we actually feed into a deep neural network? The answer is, of course, numbers (for example, tf.float32). After all, every neuron in a neural network performs multiplication and addition operations on weights and input data. Real-life input data, however, often contains non-numerical (categorical) data. For example, consider a product_class feature that can contain the following three non-numerical values:

kitchenware

electronics

sports

ML models generally represent categorical values as simple vectors in which a 1 represents the presence of a value and a 0 represents the absence of a value. For example, when product_class is set to sports, an ML model would usually represent product_class as [0, 0, 1], meaning:

0: kitchenware is absent

0: electronics is absent

1: sports: is present

So, although raw data can be numerical or categorical, an ML model represents all features as either a number or a vector of numbers. Introducing Feature Columns As Figure 2 suggests, you specify the input to a model through the feature_columns argument of an Estimator (DNNClassifier for Iris). Feature Columns bridge input data (as returned by input_fn) with your model.

Figure 2. Feature columns bridge raw data with the data your model needs. To represent features as a feature column, call functions of the tf.feature_column package. This blogpost explains nine of the functions in this package. As Figure 3 shows, all nine functions return either a Categorical-Column or a Dense-Column object, except bucketized_column which inherits from both classes:

Figure 3. Feature column methods fall into two main categories and one hybrid category. Let's look at these functions in more detail. Numeric Column The Iris classifier called tf.numeric_column() for all input features: SepalLength, SepalWidth, PetalLength, PetalWidth. Although tf.numeric_column() provides optional arguments, calling the function without any arguments is a perfectly easy way to specify a numerical value with the default data type (tf.float32) as input to your model. For example: # Defaults to a tf.float32 scalar. numeric_feature_column = tf.feature_column.numeric_column(key="SepalLength") Use the dtype argument to specify a non-default numerical data type. For example: # Represent a tf.float64 scalar. numeric_feature_column = tf.feature_column.numeric_column(key="SepalLength", dtype=tf.float64) By default, a numeric column creates a single value (scalar). Use the shape argument to specify another shape. For example: # Represent a 10-element vector in which each cell contains a tf.float32. vector_feature_column = tf.feature_column.numeric_column(key="Bowling", shape=10) # Represent a 10x5 matrix in which each cell contains a tf.float32. matrix_feature_column = tf.feature_column.numeric_column(key="MyMatrix", shape=[10,5]) Bucketized Column Often, you don't want to feed a number directly into the model, but instead split its value into different categories based on numerical ranges. To do so, create a bucketized column. For example, consider raw data that represents the year a house was built. Instead of representing that year as a scalar numeric column, we could split year into the following four buckets:

Figure 4. Dividing year data into four buckets. The model will represent the buckets as follows:

Date Range

Represented as...

< 1960

[1, 0, 0, 0]

>= 1960 but < 1980

[0, 1, 0, 0]

>= 1980 but < 2000

[0, 0, 1, 0]

> 2000

[0, 0, 0, 1]

Why would you want to split a number—a perfectly valid input to our model—into a categorical value like this? Well, notice that the categorization splits a single input number into a four-element vector. Therefore, the model now can learn four individual weights rather than just one. Four weights creates a richer model than one. More importantly, bucketizing enables the model to clearly distinguish between different year categories since only one of the elements is set (1) and the other three elements are cleared (0). When we just use a single number (a year) as input, the model can't distinguish categories. So, bucketing provides the model with additional important information that it can use to learn. The following code demonstrates how to create a bucketized feature: # A numeric column for the raw input. numeric_feature_column = tf.feature_column.numeric_column("Year") # Bucketize the numeric column on the years 1960, 1980, and 2000 bucketized_feature_column = tf.feature_column.bucketized_column( source_column = numeric_feature_column, boundaries = [1960, 1980, 2000]) Note the following:

Before creating the bucketized column, we first created a numeric column to represent the raw year.

We passed the numeric column as the first argument to tf.feature_column.bucketized_column().

Specifying a three-element boundaries vector creates a four-element bucketized vector.

Categorical identity column Categorical identity columns are a special case of bucketized columns. In traditional bucketized columns, each bucket represents a range of values (for example, from 1960 to 1979). In a categorical identity column, each bucket represents a single, unique integer. For example, let's say you want to represent the integer range [0, 4). (That is, you want to represent the integers 0, 1, 2, or 3.) In this case, the categorical identity mapping looks like this:

Figure 5. A categorical identity column mapping. Note that this is a one-hot encoding, not a binary numerical encoding. So, why would you want to represent values as categorical identity columns? As with bucketized columns, a model can learn a separate weight for each class in a categorical identity column. For example, instead of using a string to represent the product_class, let's represent each class with a unique integer value. That is:

0="kitchenware"

1="electronics"

2="sport"

Call tf.feature_column.categorical_column_with_identity() to implement a categorical identity column. For example: # Create a categorical output for input "feature_name_from_input_fn", # which must be of integer type. Value is expected to be >= 0 and < num_buckets identity_feature_column = tf.feature_column.categorical_column_with_identity( key='feature_name_from_input_fn', num_buckets=4) # Values [0, 4) # The 'feature_name_from_input_fn' above needs to match an integer key that is # returned from input_fn (see below). So for this case, 'Integer_1' or # 'Integer_2' would be valid strings instead of 'feature_name_from_input_fn'. # For more information, please check out Part 1 of this blog series. def input_fn(): ...<code>... return ({ 'Integer_1':[values], ..<etc>.., 'Integer_2':[values] }, [Label_values])Categorical vocabulary column We cannot input strings directly to a model. Instead, we must first map strings to numeric or categorical values. Categorical vocabulary columns provide a good way to represent strings as a one-hot vector. For example:

Figure 6. Mapping string values to vocabulary columns. As you can see, categorical vocabulary columns are kind of an enum version of categorical identity columns. TensorFlow provides two different functions to create categorical vocabulary columns:

tf.feature_column.categorical_column_with_vocabulary_list()

tf.feature_column.categorical_column_with_vocabulary_file()

The tf.feature_column.categorical_column_with_vocabulary_list() function maps each string to an integer based on an explicit vocabulary list. For example: # Given input "feature_name_from_input_fn" which is a string, # create a categorical feature to our model by mapping the input to one of # the elements in the vocabulary list. vocabulary_feature_column = tf.feature_column.categorical_column_with_vocabulary_list( key="feature_name_from_input_fn", vocabulary_list=["kitchenware", "electronics", "sports"]) The preceding function has a significant drawback; namely, there's way too much typing when the vocabulary list is long. For these cases, call tf.feature_column.categorical_column_with_vocabulary_file() instead, which lets you place the vocabulary words in a separate file. For example: # Given input "feature_name_from_input_fn" which is a string, # create a categorical feature to our model by mapping the input to one of # the elements in the vocabulary file vocabulary_feature_column = tf.feature_column.categorical_column_with_vocabulary_file( key="feature_name_from_input_fn", vocabulary_file="product_class.txt", vocabulary_size=3) # product_class.txt should have one line for vocabulary element, in our case: kitchenware electronics sportsUsing hash buckets to limit categories So far, we've worked with a naively small number of categories. For example, our product_class example has only 3 categories. Often though, the number of categories can be so big that it's not possible to have individual categories for each vocabulary word or integer because that would consume too much memory. For these cases, we can instead turn the question around and ask, "How many categories am I willing to have for my input?" In fact, the tf.feature_column.categorical_column_with_hash_buckets() function enables you to specify the number of categories. For example, the following code shows how this function calculates a hash value of the input, then puts it into one of the hash_bucket_size categories using the modulo operator: # Create categorical output for input "feature_name_from_input_fn". # Category becomes: hash_value("feature_name_from_input_fn") % hash_bucket_size hashed_feature_column = tf.feature_column.categorical_column_with_hash_bucket( key = "feature_name_from_input_fn", hash_buckets_size = 100) # The number of categories At this point, you might rightfully think: "This is crazy!" After all, we are forcing the different input values to a smaller set of categories. This means that two, probably completely unrelated inputs, will be mapped to the same category, and consequently mean the same thing to the neural network. Figure 7 illustrates this dilemma, showing that kitchenware and sports both get assigned to category (hash bucket) 12:

Figure 7. Representing data in hash buckets. As with many counterintuitive phenomena in machine learning, it turns out that hashing often works well in practice. That's because hash categories provide the model with some separation. The model can use additional features to further separate kitchenware from sports. Feature crosses The last categorical column we'll cover allows us to combine multiple input features into a single one. Combining features, better known as feature crosses, enables the model to learn separate weights specifically for whatever that feature combination means. More concretely, suppose we want our model to calculate real estate prices in Atlanta, GA. Real-estate prices within this city vary greatly depending on location. Representing latitude and longitude as separate features isn't very useful in identifying real-estate location dependencies; however, crossing latitude and longitude into a single feature can pinpoint locations. Suppose we represent Atlanta as a grid of 100x100 rectangular sections, identifying each of the 10,000 sections by a cross of its latitude and longitude. This cross enables the model to pick up on pricing conditions related to each individual section, which is a much stronger signal than latitude and longitude alone. Figure 8 shows our plan, with the latitude & longitude values for the corners of the city:

Figure 8. Map of Atlanta. Imagine this map divided into 10,000 sections of equal size. For the solution, we used a combination of some feature columns we've looked at before, as well as the tf.feature_columns.crossed_column() function. # In our input_fn, we convert input longitude and latitude to integer values # in the range [0, 100) def input_fn(): # Using Datasets, read the input values for longitude and latitude latitude = ... # A tf.float32 value longitude = ... # A tf.float32 value # In our example we just return our lat_int, long_int features. # The dictionary of a complete program would probably have more keys. return { "latitude": latitude, "longitude": longitude, ...}, labels # As can be see from the map, we want to split the latitude range # [33.641336, 33.887157] into 100 buckets. To do this we use np.linspace # to get a list of 99 numbers between min and max of this range. # Using this list we can bucketize latitude into 100 buckets. latitude_buckets = list(np.linspace(33.641336, 33.887157, 99)) latitude_fc = tf.feature_column.bucketized_column( tf.feature_column.numeric_column('latitude'), latitude_buckets) # Do the same bucketization for longitude as done for latitude. longitude_buckets = list(np.linspace(-84.558798, -84.287259, 99)) longitude_fc = tf.feature_column.bucketized_column( tf.feature_column.numeric_column('longitude'), longitude_buckets) # Create a feature cross of fc_longitude x fc_latitude. fc_san_francisco_boxed = tf.feature_column.crossed_column( keys=[latitude_fc, longitude_fc], hash_bucket_size=1000) # No precise rule, maybe 1000 buckets will be good? You may create a feature cross from either of the following:

Feature names; that is, names from the dict returned from input_fn.

Any Categorical Column (see Figure 3), except categorical_column_with_hash_bucket.

When feature columns latitude_fc and longitude_fc are crossed, TensorFlow will create 10,000 combinations of (latitude_fc, longitude_fc) organized as follows: (0,0),(0,1)... (0,99) (1,0),(1,1)... (1,99) …, …, ... (99,0),(99,1)...(99, 99) The function tf.feature_column.crossed_column performs a hash calculation on these combinations and then slots the result into a category by performing a modulo operation with hash_bucket_size. As discussed before, performing the hash and modulo function will probably result in category collisions; that is, multiple (latitude, longitude) feature crosses will end up in the same hash bucket. In practice though, performing feature crosses still provides significant value to the learning capability of your models. Somewhat counterintuitively, when creating feature crosses, you typically still should include the original (uncrossed) features in your model. For example, provide not only the (latitude, longitude) feature cross but also latitude and longitude as separate features. The separate latitude and longitude features help the model separate the contents of hash buckets containing different feature crosses. See this link for a full code example for this. Also, the reference section at the end of this post for lots more examples of feature crossing. Indicator and embedding columns Indicator columns and embedding columns never work on features directly, but instead take categorical columns as input. When using an indicator column, we're telling TensorFlow to do exactly what we've seen in our categorical product_class example. That is, an indicator column treats each category as an element in a one-hot vector, where the matching category has value 1 and the rest have 0s:

Figure 9. Representing data in indicator columns. Here's how you create an indicator column: categorical_column = ... # Create any type of categorical column, see Figure 3 # Represent the categorical column as an indicator column. # This means creating a one-hot vector with one element for each category. indicator_column = tf.feature_column.indicator_column(categorical_column) Now, suppose instead of having just three possible classes, we have a million. Or maybe a billion. For a number of reasons (too technical to cover here), as the number of categories grow large, it becomes infeasible to train a neural network using indicator columns. We can use an embedding column to overcome this limitation. Instead of representing the data as a one-hot vector of many dimensions, an embedding column represents that data as a lower-dimensional, ordinary vector in which each cell can contain any number, not just 0 or 1. By permitting a richer palette of numbers for every cell, an embedding column contains far fewer cells than an indicator column. Let's look at an example comparing indicator and embedding columns. Suppose our input examples consists of different words from a limited palette of only 81 words. Further suppose that the data set provides the following input words in 4 separate examples:

"dog"

"spoon"

"scissors"

"guitar"

In that case, Figure 10 illustrates the processing path for embedding columns or Indicator columns.

Figure 10. An embedding column stores categorical data in a lower-dimensional vector than an indicator column. (We just placed random numbers into the embedding vectors; training determines the actual numbers.) When an example is processed, one of the categorical_column_with... functions maps the example string to a numerical categorical value. For example, a function maps "spoon" to [32]. (The 32 comes from our imagination—the actual values depend on the mapping function.) You may then represent these numerical categorical values in either of the following two ways:

As an indicator column. A function converts each numeric categorical value into an 81-element vector (because our palette consists of 81 words), placing a 1 in the index of the categorical value (0, 32, 79, 80) and a 0 in all the other positions.

As an embedding column. A function uses the numerical categorical values (0, 32, 79, 80) as indices to a lookup table. Each slot in that lookup table contains a 3-element vector.

How do the values in the embeddings vectors magically get assigned? Actually, the assignments happen during training. That is, the model learns the best way to map your input numeric categorical values to the embeddings vector value in order to solve your problem. Embedding columns increase your model's capabilities, since an embeddings vector learns new relationships between categories from the training data. Why is the embedding vector size 3 in our example? Well, the following "formula" provides a general rule of thumb about the number of embedding dimensions: embedding_dimensions = number_of_categories**0.25 That is, the embedding vector dimension should be the 4th root of the number of categories. Since our vocabulary size in this example is 81, the recommended number of dimensions is 3: 3 = 81**0.25 Note that this is just a general guideline; you can set the number of embedding dimensions as you please. Call tf.feature_column.embedding_column to create an embedding_column. The dimension of the embedding vector depends on the problem at hand as described above, but common values go as low as 3 all the way to 300 or even beyond: categorical_column = ... # Create any categorical column shown in Figure 3. # Represent the categorical column as an embedding column. # This means creating a one-hot vector with one element for each category. embedding_column = tf.feature_column.embedding_column( categorical_column=categorical_column, dimension=dimension_of_embedding_vector) Embeddings is a big topic within machine learning. This information was just to get you started using them as feature columns. Please see the end of this post for more information. Passing feature columns to Estimators Still there? I hope so, because we only have a tiny bit left before you've graduated from the basics of feature columns. As we saw in Figure 1, feature columns map your input data (described by the feature dictionary returned from input_fn) to values fed to your model. You specify feature columns as a list to a feature_columns argument of an estimator. Note that the feature_columns argument(s) vary depending on the Estimator:

LinearClassifier and LinearRegressor:

Accept all types of feature column.

DNNClassifier and DNNRegressor:

Only accept dense columns, see Figure 3. Other column types must be wrapped in either an indicator_column or embedding_column as described earlier.

DNNLinearCombinedClassifier and DNNLinearCombinedRegressor:

The linear_feature_columns argument can accept any column type, like the LinearClassifier and LinearRegressor above.
The dnn_feature_columns argument however is limited to dense columns, like DNNClassifier and DNNRegressor above.

The reason for the above rules are beyond the scope of this introductory post, but we will make sure to cover it in a future blogpost. Summary Use feature columns to map your input data to the representations you feed your model. We only used numeric_column in Part 1 of this series , but working with the other functions described in this post, you can easily create other feature columns. For more details on feature columns, be sure to check out:

Josh Gordon's video on Feature Engineering

And from the same author, a Jupyter notebook

The TensorFlow - Wide & Deep Tutorial

Examples of DNNs and linear models that use feature columns

If you want to learn more about embeddings:

Deep Learning, NLP, and representations (Colah's blog)

And checkout the TensorFlow Embedding Projector

.blgimg1 img { width: 100%; border: 0; margin: 0; padding: 10px 0 10px 0 ; } .blgimg2 img { width: 100%; border: 0; margin: 0; padding: 10px 0 10px 0 ; } .blgimg3 img { width: 100%; border: 0; margin: 0; padding: 10px 0 10px 0 ; } .blgimg4 img { width: 100%; border: 0; margin: 0; padding: 10px 0 10px 0 ; } .blgimg5 img { width:50%; border: 0; margin: 0; padding: 10px 0 10px 0 ; } .blgimg6 img { width: 50%; border: 0; margin: 0; padding: 10px 0 10px 0 ; } .blgimg7 img { width: 100%; border: 0; margin: 0; padding: 10px 0 10px 0 ; } .blgimg8 img { width: 100%; border: 0; margin: 0; padding: 10px 0 10px 0 ; } .blgimg9 img { width: 50%; border: 0; margin: 0; padding: 10px 0 10px 0 ; } .blgimg10 img { width: 100%; border: 0; margin: 0; padding: 10px 0 10px 0 ; } code { font-size: 100%; } table, th { border: 1px solid black; border-collapse: collapse; } td { border: 1px solid black; border-collapse: collapse; width: 50% !important; }