<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Google Cloud</title><link>https://cloud.google.com/blog/products/gcp/</link><description>Google Cloud</description><atom:link href="https://cloudblog.withgoogle.com/products/gcp/rss/" rel="self"></atom:link><language>en</language><lastBuildDate>Thu, 05 Jan 2023 17:00:00 -0000</lastBuildDate><image><url>https://gweb-cloudblog-publish.appspot.com/products/gcp/static/blog/images/google.a51985becaa6.png</url><title>Google Cloud</title><link>https://cloud.google.com/blog/products/gcp/</link></image><item><title>New year, new skills - How to reach your cloud career destination</title><link>https://cloud.google.com/blog/topics/training-certifications/start-your-cloud-career-in-2023-steps-and-skills-training/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Cloud is a great place to grow your career in 2023. Opportunity abounds, with cloud roles offering strong salaries and, in a constantly evolving field, plenty of scope for growth.&lt;sup&gt;1&lt;/sup&gt; Some positions do not require a technical background, like project managers, product owners and business analysts. For others, like solutions architects, developers and administrators, coding and technical expertise are a must. &lt;/p&gt;&lt;p&gt;Either way, cloud knowledge and experience are required to land that dream job. But where do you start? And how do you keep up with the fast pace of ever-changing cloud technology? Check out the tips below, along with suggested training opportunities to support your growth, including no-cost options!&lt;/p&gt;&lt;h3&gt;Start by looking at your experience&lt;/h3&gt;&lt;p&gt;Your experience can be a great way to get into cloud, even if it seems non-traditional. Think creatively about transferable skills and opportunities. Here are a few scenarios where you might find yourself today:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;You already work in IT, but in legacy systems or the data center. Forrest Brazeal, Head of Content Marketing at Google Cloud, talks about that in detail in &lt;a href="https://www.youtube.com/watch?v=vviS_fHnJu4&amp;amp;list=PLIivdWyY5sqKBEZkq4X5tojtTY3OhZfda&amp;amp;index=3&amp;amp;t=186s" target="_blank"&gt;this video&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Use your sales experience to become a sales engineer, or your communications experience to become a developer advocate. Stephanie Wong, Developer Advocate at Google Cloud, discusses that &lt;a href="https://www.youtube.com/watch?v=5ETgd44DkzM&amp;amp;list=PLIivdWyY5sqKBEZkq4X5tojtTY3OhZfda&amp;amp;index=2" target="_blank"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;You don’t have the college degree listed in the job requirements. I’ve talked about that in a recent video &lt;a href="https://www.youtube.com/watch?v=L4hiEVS9TLk&amp;amp;list=PLIivdWyY5sqKBEZkq4X5tojtTY3OhZfda&amp;amp;index=1" target="_blank"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Your company has a cloud segment, but your focus is in another area. Go talk to people! Seek out colleagues who do what you want to do. 
Get their advice for skilling up.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Define where you need to fill in gaps&lt;/h3&gt;&lt;p&gt;If you are looking at a technical position, you will need to show cloud-applicable experience, so learn about the cloud and build a portfolio of work. Here are a few key skills we recommend everyone have to start&lt;sup&gt;1&lt;/sup&gt;:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Code is non-negotiable&lt;/b&gt;. People who come from software development backgrounds typically find it easier to get into and maneuver through the cloud environment because of their coding experience. Automation, basic data manipulation and scaling are daily requirements. If you don’t have a language you already know, learning Python is a great place to begin.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Understand Linux&lt;/b&gt;. You’ll need to know the Linux filesystem, basic Linux commands and the fundamentals of containerization (see the short sketch after this list).&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Learn core networking concepts&lt;/b&gt; like the IP protocol and the protocols that layer on top of it, DNS, and subnets.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Make sure you understand the cloud itself&lt;/b&gt;, and in particular the specifics of Google Cloud for a role at Google.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Get familiar with open source tooling&lt;/b&gt;. Terraform for automation and Kubernetes for containers are portable between clouds and are worth taking the time to learn.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;
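&lt;p&gt;As a quick taste of those Linux, networking and container fundamentals, here are a few everyday commands worth knowing cold (a minimal sketch; the hosts and images are illustrative):&lt;/p&gt;&lt;pre&gt;# Explore the filesystem and running processes
ls -la /var/log
ps aux | head

# Core networking: DNS lookups and interface/subnet details
dig example.com
ip addr show

# Containerization fundamentals: run a shell in a minimal image
docker run --rm -it alpine sh&lt;/pre&gt;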
&lt;h3&gt;Boost your targeted hands-on skills&lt;/h3&gt;&lt;p&gt;Check out &lt;a href="https://www.cloudskillsboost.google/?utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023" target="_blank"&gt;Google Cloud Skills Boost&lt;/a&gt; for a comprehensive collection of training to help you upskill into a cloud role, including hands-on labs that give you real-world experience in Google Cloud. New users can start off with a 30-day no-cost trial&lt;sup&gt;2&lt;/sup&gt;. Take a look at these recommendations:&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;i&gt;No-cost labs and courses&lt;/i&gt;&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://www.cloudskillsboost.google/focuses/2794?catalog_rank=%7B%22rank%22%3A1%2C%22num_filters%22%3A0%2C%22has_search%22%3Atrue%7D&amp;amp;parent=catalog&amp;amp;search_id=20908111&amp;amp;utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023" target="_blank"&gt;A Tour of Google Cloud Hands-on Labs&lt;/a&gt; - 45 minutes&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://www.cloudskillsboost.google/focuses/32138?catalog_rank=%7B%22rank%22%3A1%2C%22num_filters%22%3A0%2C%22has_search%22%3Atrue%7D&amp;amp;parent=catalog&amp;amp;search_id=20908074&amp;amp;utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023" target="_blank"&gt;A Tour of Google Cloud Sustainability&lt;/a&gt; - 60 minutes&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://www.cloudskillsboost.google/focuses/2802?catalog_rank=%7B%22rank%22%3A1%2C%22num_filters%22%3A0%2C%22has_search%22%3Atrue%7D&amp;amp;parent=catalog&amp;amp;search_id=20908150&amp;amp;utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023" target="_blank"&gt;Introduction to SQL for BigQuery and Cloud SQL&lt;/a&gt; - 60 minutes&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://www.cloudskillsboost.google/course_templates/265?catalog_rank=%7B%22rank%22%3A2%2C%22num_filters%22%3A0%2C%22has_search%22%3Atrue%7D&amp;amp;search_id=20651945&amp;amp;utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023" target="_blank"&gt;Infrastructure and Application Modernization with Google Cloud&lt;/a&gt; - Introductory course with three modules&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://www.cloudskillsboost.google/catalog?keywords=Preparing+for+your++Journey&amp;amp;locale=&amp;amp;solution%5B%5D=any&amp;amp;role%5B%5D=any&amp;amp;skill-badge%5B%5D=any&amp;amp;format%5B%5D=any&amp;amp;level%5B%5D=any&amp;amp;duration%5B%5D=any&amp;amp;language%5B%5D=any&amp;amp;utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023" target="_blank"&gt;Preparing for Google Cloud certification&lt;/a&gt; - Courses to help you prepare for Google Cloud certification exams&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Build hands-on projects&lt;/h3&gt;&lt;p&gt;This part is critical for the interview stage. Take the cloud skills you have learned and create something tangible that you can use as a story during an interview. Consider building a project on GitHub so others can see it working live, and document it well. Be sure to include your decision-making process. Here is an example (with a minimal deployment sketch after the list):&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Build an API or a web application&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Develop the code for the application&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Pick the infrastructure to deploy the application in the cloud, choose a storage option, and select a database for it to interact with&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;
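&lt;p&gt;For instance, once your code is ready, one low-friction path is deploying it to Cloud Run straight from source (a minimal sketch; the service name and region are illustrative):&lt;/p&gt;&lt;pre&gt;# Build and deploy a containerized API from the current directory
gcloud run deploy my-portfolio-api --source . \
    --region us-central1 --allow-unauthenticated

# Smoke-test the deployed endpoint
curl "$(gcloud run services describe my-portfolio-api \
    --region us-central1 --format 'value(status.url)')"&lt;/pre&gt;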
&lt;h3&gt;Get valuable cloud knowledge for non-technical roles&lt;/h3&gt;&lt;p&gt;For tech-adjacent roles, like those in business, sales or administration, having a solid knowledge of cloud principles is critical. We recommend completing the Cloud Digital Leader training courses, at no cost. Or go the extra mile and consider taking the Google Cloud Digital Leader Certification exam once you complete the training:&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;i&gt;No-cost course&lt;/i&gt;&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://www.cloudskillsboost.google/paths/9?utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023" target="_blank"&gt;Cloud Digital Leader Learning Path&lt;/a&gt; - understand cloud capabilities, products and services and how they benefit organizations&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;b&gt;&lt;i&gt;$99 registration fee&lt;/i&gt;&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/certification/cloud-digital-leader?utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023"&gt;Google Cloud Digital Leader Certification&lt;/a&gt; - validate your cloud expertise by earning a certification&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Commit to learning in the New Year&lt;/h3&gt;&lt;p&gt;Another resource is the &lt;a href="https://cloud.google.com/innovators?utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023#innovatorsplusbenefits"&gt;Google Cloud Innovators Program&lt;/a&gt;, which will help you grow on Google Cloud and connect you with other community members. There is no cost to join, and it gives you access to resources to build your skills and help shape the future of cloud! &lt;a href="https://cloud.google.com/innovators?utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023#innovatorsplusbenefits"&gt;Join today&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Start your new year strong, whether you are exploring Google Cloud Data, DevOps or Networking certifications, by completing &lt;a href="https://go.qwiklabs.com/arcade?utm_source=googlecloud&amp;amp;utm_medium=blog&amp;amp;utm_campaign=start2023" target="_blank"&gt;Arcade games&lt;/a&gt; each week. This January, play to win in &lt;a href="https://go.qwiklabs.com/arcade?utm_source=googlecloud&amp;amp;utm_medium=blog&amp;amp;utm_campaign=start2023" target="_blank"&gt;The Arcade&lt;/a&gt; while you learn new skills and earn prizes on Google Cloud Skills Boost. Each week we will feature a new game to help you show and grow your cloud skills, while sampling certification-based learning paths.&lt;/p&gt;&lt;p&gt;Make 2023 the year to build your cloud career and commit to learning all year with our $299 &lt;a href="https://www.cloudskillsboost.google/subscriptions?utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=NewYearNewSkillsJan2023" target="_blank"&gt;annual subscription&lt;/a&gt; to Google Cloud Skills Boost. The subscription includes $500 of Google Cloud credits (and a bonus $500 of credits after you successfully certify), a $200 certification voucher, access to the entire training catalog, live-learning events and quarterly technical briefings with executives.&lt;/p&gt;&lt;hr/&gt;&lt;p&gt;&lt;i&gt;&lt;sup&gt;1. &lt;a href="https://www.youtube.com/watch?v=vviS_fHnJu4&amp;amp;list=PLIivdWyY5sqKBEZkq4X5tojtTY3OhZfda&amp;amp;index=3&amp;amp;t=14s" target="_blank"&gt;Starting your career in cloud from IT&lt;/a&gt; - Forrest Brazeal, Head of Content Marketing, Google Cloud&lt;br/&gt;2. 
Credit card required to activate a 30-day no-cost trial for new users.&lt;/sup&gt;&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Thu, 05 Jan 2023 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/topics/training-certifications/start-your-cloud-career-in-2023-steps-and-skills-training/</guid><category>Google Cloud</category><category>Developers &amp; Practitioners</category><category>Training and Certifications</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/training_2023.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>New year, new skills - How to reach your cloud career destination</title><description>Find out how to jump start your dream cloud career, no matter what your background. Advice for technical and non-technical roles!</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/training_2023.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/training-certifications/start-your-cloud-career-in-2023-steps-and-skills-training/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Priyanka Vergadia</name><title>Lead Developer Advocate, Google</title><department></department><company></company></author></item><item><title>Optimize and scale your startup - A look into the Build Series</title><link>https://cloud.google.com/blog/topics/startups/google-cloud-technical-guides-for-startups-build-series-wrap-up/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;At Google Cloud, we want to provide you with access to all the tools you need to grow your business. Through the Google Cloud Technical Guides for Startups, leverage industry-leading solutions with how-to &lt;a href="https://www.youtube.com/playlist?list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC" target="_blank"&gt;video guides&lt;/a&gt; and &lt;a href="https://cloudonair.withgoogle.com/events/technical-guide-for-startups-series/resources#" target="_blank"&gt;resources&lt;/a&gt; curated for startups. &lt;/p&gt;&lt;p&gt;This multi-part series contains three chapters: Start, Build and Grow, which match your startup’s journey:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;The Start Series: Begin by building, deploying and managing new applications on Google Cloud from start to finish.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;The Build Series: Optimize and scale existing deployments to reach your target audiences.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;The Grow Series: Grow and attain scale with deployments on Google Cloud.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Additionally, at Google we have the &lt;a href="https://cloud.google.com/startup"&gt;Google for Startups Cloud Program&lt;/a&gt;, which is designed to help your business get off the ground and enable a sustainable growth plan for the future. 
The start of the &lt;a href="https://www.youtube.com/watch?v=xQshNK7V2jQ&amp;amp;list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC&amp;amp;index=13" target="_blank"&gt;Build Series&lt;/a&gt; outlines the benefits of the program, the application process, and more to help your business get started on Google Cloud.&lt;/p&gt;&lt;h3&gt;A quick recap of the Build Series&lt;/h3&gt;&lt;p&gt;Once you have applied for the &lt;a href="https://cloud.google.com/startup"&gt;Google for Startups Cloud Program&lt;/a&gt;, there’s so much to explore and try out on Google Cloud. &lt;/p&gt;&lt;p&gt;Figuring out a rapid but solid application development process can be key for many businesses in reducing time to market. Furthermore, deciding which database to use to handle application data can be tricky. Deep dive into our &lt;a href="https://www.youtube.com/watch?v=8DYWeI4Yc8Q&amp;amp;list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC&amp;amp;index=15" target="_blank"&gt;Firestore&lt;/a&gt; video, which walks through how Firestore can help you unlock application innovation with simplicity and speed.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-video"&gt;&lt;div class="article-module article-video "&gt;&lt;figure&gt;&lt;a class="h-c-video h-c-video--marquee" data-glue-modal-disabled-on-mobile="true" data-glue-modal-trigger="uni-modal-8DYWeI4Yc8Q-" href="https://youtube.com/watch?v=8DYWeI4Yc8Q"&gt;&lt;img alt="Here to bring you the latest news in the startup program by Google Cloud is Mirabel Tukiman!" src="//img.youtube.com/vi/8DYWeI4Yc8Q/maxresdefault.jpg"/&gt;&lt;svg class="h-c-video__play h-c-icon h-c-icon--color-white" role="img"&gt;&lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/a&gt;&lt;/figure&gt;&lt;/div&gt;&lt;div class="h-c-modal--video" data-glue-modal="uni-modal-8DYWeI4Yc8Q-" data-glue-modal-close-label="Close Dialog"&gt;&lt;a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="8DYWeI4Yc8Q" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=8DYWeI4Yc8Q" ng-cloak=""&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;We then move on to a deep dive into &lt;a href="https://www.youtube.com/watch?v=BH_7_zVk5oM&amp;amp;list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC&amp;amp;index=16" target="_blank"&gt;BigQuery&lt;/a&gt; and how it can help businesses. BigQuery is designed to support analysis over petabytes of data, whether structured or unstructured. This video is the go-to resource for getting started on BigQuery!&lt;/p&gt;&lt;p&gt;If you are looking to run your Spark and Hadoop jobs faster and on the cloud, look to &lt;a href="https://www.youtube.com/watch?v=shzKmZ6Yqtk&amp;amp;list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC&amp;amp;index=17" target="_blank"&gt;Dataproc&lt;/a&gt;. To learn more about Dataproc and how it has helped other customers with their Hadoop clusters, click the video below to learn all things Dataproc-related.&lt;/p&gt;
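&lt;p&gt;If you want to get hands-on alongside the videos, here is a minimal command-line sketch of both tools (the dataset, cluster name and region are illustrative):&lt;/p&gt;&lt;pre&gt;# BigQuery: run a standard-SQL query against a public dataset
bq query --use_legacy_sql=false \
    'SELECT name, SUM(number) AS total
     FROM `bigquery-public-data.usa_names.usa_1910_2013`
     GROUP BY name ORDER BY total DESC LIMIT 5'

# Dataproc: create a small cluster and submit the SparkPi example job
gcloud dataproc clusters create demo-cluster --region=us-central1
gcloud dataproc jobs submit spark --cluster=demo-cluster --region=us-central1 \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000&lt;/pre&gt;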
src="//img.youtube.com/vi/shzKmZ6Yqtk/maxresdefault.jpg"/&gt;&lt;svg class="h-c-video__play h-c-icon h-c-icon--color-white" role="img"&gt;&lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/a&gt;&lt;/figure&gt;&lt;/div&gt;&lt;div class="h-c-modal--video" data-glue-modal="uni-modal-shzKmZ6Yqtk-" data-glue-modal-close-label="Close Dialog"&gt;&lt;a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="shzKmZ6Yqtk" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=shzKmZ6Yqtk" ng-cloak=""&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Next, we find out what &lt;a href="https://www.youtube.com/watch?v=dXhF3JJg3mE&amp;amp;list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC&amp;amp;index=18&amp;amp;t=88s" target="_blank"&gt;Dataflow&lt;/a&gt; can bring to your business; some advantages, sample architectures, demos on the console, and how other customers are using Dataflow. &lt;/p&gt;&lt;p&gt;We also talked about Machine Learning, starting from selecting the right ML solution to &lt;a href="https://www.youtube.com/watch?v=pM1M4Y4QZ6k&amp;amp;list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC&amp;amp;index=20&amp;amp;t=12s" target="_blank"&gt;Machine Learning APIs&lt;/a&gt; on cloud to exploring &lt;a href="https://www.youtube.com/watch?v=pZKpAai7stE&amp;amp;list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC&amp;amp;index=21&amp;amp;t=2s" target="_blank"&gt;Vertex AI&lt;/a&gt;. Following that we look into &lt;a href="https://www.youtube.com/watch?v=mwcAjZXZrjg&amp;amp;list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC&amp;amp;index=22" target="_blank"&gt;API management in Google Cloud&lt;/a&gt; and how Apigee helps operate your APIs with enhanced scale, security, and automation.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-video"&gt;&lt;div class="article-module article-video "&gt;&lt;figure&gt;&lt;a class="h-c-video h-c-video--marquee" data-glue-modal-disabled-on-mobile="true" data-glue-modal-trigger="uni-modal-pZKpAai7stE-" href="https://youtube.com/watch?v=pZKpAai7stE"&gt;&lt;img alt="Here to bring you the latest news in the startup program by Google Cloud is Jeevana Hegde and Hussein Giva!" 
src="//img.youtube.com/vi/pZKpAai7stE/maxresdefault.jpg"/&gt;&lt;svg class="h-c-video__play h-c-icon h-c-icon--color-white" role="img"&gt;&lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/a&gt;&lt;/figure&gt;&lt;/div&gt;&lt;div class="h-c-modal--video" data-glue-modal="uni-modal-pZKpAai7stE-" data-glue-modal-close-label="Close Dialog"&gt;&lt;a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="pZKpAai7stE" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=pZKpAai7stE" ng-cloak=""&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;We ended the series with the last two episodes focusing around &lt;a href="https://www.youtube.com/watch?v=0qyUm6UsMkE&amp;amp;list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC&amp;amp;index=23" target="_blank"&gt;security deep-dive&lt;/a&gt; and using &lt;a href="https://www.youtube.com/watch?v=P9MCC9KmM_8&amp;amp;list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC&amp;amp;index=24" target="_blank"&gt;Cloud Tasks and Cloud Scheduler&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-video"&gt;&lt;div class="article-module article-video "&gt;&lt;figure&gt;&lt;a class="h-c-video h-c-video--marquee" data-glue-modal-disabled-on-mobile="true" data-glue-modal-trigger="uni-modal-0qyUm6UsMkE-" href="https://youtube.com/watch?v=0qyUm6UsMkE"&gt;&lt;img alt="Here to bring you the latest news in the startup program by Google Cloud is Eunsun Cho! Welcome to the second season of the Google Cloud Technical Guides for Startups - the Build Series. Build Series - Episode 11: Getting started with Security on Google Cloud Tune into our new series for a new episode each time and let us know what you think in the comments below!" src="//img.youtube.com/vi/0qyUm6UsMkE/maxresdefault.jpg"/&gt;&lt;svg class="h-c-video__play h-c-icon h-c-icon--color-white" role="img"&gt;&lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/a&gt;&lt;/figure&gt;&lt;/div&gt;&lt;div class="h-c-modal--video" data-glue-modal="uni-modal-0qyUm6UsMkE-" data-glue-modal-close-label="Close Dialog"&gt;&lt;a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="0qyUm6UsMkE" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=0qyUm6UsMkE" ng-cloak=""&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Coming up next - The Grow Series&lt;/h3&gt;&lt;p&gt;Dive into the next chapter of this multi-series, with our upcoming Grow Series, where we will be focusing on growing and attaining scale with deployments on Google Cloud.&lt;/p&gt;&lt;p&gt;&lt;a href="https://cloudonair.withgoogle.com/events/technical-guide-for-startups-series" target="_blank"&gt;Check out our website&lt;/a&gt; and &lt;a href="https://www.youtube.com/playlist?list=PLIivdWyY5sqJOQJCXW_aYEqwfyi6bu1gC" target="_blank"&gt;join us&lt;/a&gt; by checking out the video series on the &lt;a href="https://www.youtube.com/user/googlecloudplatform" target="_blank"&gt;Google Cloud Tech channel&lt;/a&gt;, and subscribe to stay up to date. 
&lt;/p&gt;&lt;p&gt;See you in the cloud!&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Thu, 22 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/topics/startups/google-cloud-technical-guides-for-startups-build-series-wrap-up/</guid><category>Google Cloud</category><category>Startups</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/build_series_122222.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Optimize and scale your startup - A look into the Build Series</title><description>Announcing the recap of the second series (Build Series) of the Google Cloud Technical Guides for Startups, a video series for technical enablement aimed at helping startups to start, build and grow their businesses.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/build_series_122222.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/startups/google-cloud-technical-guides-for-startups-build-series-wrap-up/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vibha Kurpad</name><title>Associate Customer Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Aditi Jain</name><title>Customer Engineer</title><department></department><company></company></author></item><item><title>Document AI adds three new capabilities to its OCR engine</title><link>https://cloud.google.com/blog/products/ai-machine-learning/top-reasons-to-use-gcp-document-ai-ocr/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Documents are an indispensable part of our professional and personal lives. They give us crucial insights that help us become more efficient, organize and optimize information, and even stay competitive. But as documents grow more complex, and as the variety of document types continues to expand, it has become increasingly challenging for people and businesses to sift through the ocean of bits and bytes to extract actionable insights. &lt;/p&gt;&lt;p&gt;This is where Google Cloud’s Document AI comes in. It is a unified, AI-powered suite for understanding and organizing documents. &lt;a href="https://cloud.google.com/document-ai"&gt;Document AI&lt;/a&gt; consists of Document AI Workbench (a state-of-the-art custom ML platform), Document AI Warehouse (a managed service with document storage and analytics capabilities), and a rich set of pre-trained document processors. 
Underpinning these services is the ability to extract text accurately from various types of documents with a world-class &lt;a href="https://cloud.google.com/document-ai/docs/processors-list#processor_doc-ocr"&gt;Document Optical Character Recognition (OCR)&lt;/a&gt; engine.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="1 OCR engine 122122.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_OCR_engine_122122.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Google Cloud’s Document AI OCR takes an unstructured document as input and extracts its text and layout (paragraphs, lines, and so on). Covering over 200 languages, Document AI OCR is powered by state-of-the-art machine learning models developed by the Google Cloud and Google Research teams. &lt;/p&gt;&lt;p&gt;Today, we are pleased to announce three new OCR features in Public Preview that can further enhance your document processing workflows. &lt;/p&gt;&lt;h3&gt;1. Assess page-level quality of documents with Intelligent Document Quality (IDQ)&lt;/h3&gt;&lt;p&gt;With Document AI OCR, Google Cloud customers and partners can programmatically extract key document characteristics – word frequency distributions, relative positioning of line items, dominant language of the input document, etc. – as critical inputs to their downstream business logic. Today, we are adding another important document assessment signal to this toolbox: Intelligent Document Quality (IDQ) scores. &lt;/p&gt;&lt;p&gt;IDQ provides page-level quality metrics along the following eight dimensions:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Blurriness&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Level of optical noise&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Darkness&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Faintness&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Presence of smaller-than-usual fonts&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Cut-off documents&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Cut-off text spans&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Glare due to lighting conditions&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Being able to discern the optical quality of documents helps you decide which documents should be processed differently, making the overall document processing pipeline more efficient. For example, Gary Lewis, Managing Director of lending and deposit solutions at Jack Henry, noted, “Google’s Document AI technology, enriched with Intelligent Document Quality (IDQ) signals, will help businesses to automate the data capture of invoices and payments when sending to our factoring customers for purchasing. This creates internal efficiencies, reduces risk for the factor/lender, and gets financing into the hands of cash-constrained businesses quickly.”&lt;/p&gt;&lt;p&gt;Overall, document quality metrics pave the way for more intelligent routing of documents for downstream analytics. 
The reference workflow below uses document quality scores to split and classify documents before sending them to either the pre-built &lt;a href="https://cloud.google.com/document-ai/docs/processors-list#processor_form-parser"&gt;Form Parser&lt;/a&gt; (in the case of high document quality) or a &lt;a href="https://cloud.google.com/document-ai/docs/workbench/build-custom-processor"&gt;Custom Document Extractor&lt;/a&gt; trained specifically on lower-quality datasets.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="2 OCR engine 122122.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_OCR_engine_122122.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;2. Process digital PDF documents with confidence with built-in digital PDF support&lt;/h3&gt;&lt;p&gt;The PDF format is popular in various business applications such as procurement (invoices, purchase orders), lending (W-2 forms, paystubs), and contracts (leasing or mortgage agreements). PDF documents can be image-based (e.g., a scanned driver’s license) or digital, where you can hover over, highlight, and copy/paste embedded text the same way you interact with a text document such as a Google Doc or Microsoft Word file. &lt;/p&gt;&lt;p&gt;We are happy to announce digital PDF support in Document AI OCR. The digital PDF feature extracts text and symbols exactly as they appear in the source documents, making our OCR engine highly performant in complex visual scenarios such as rotated text, extreme font sizes and styles, or partially hidden text.&lt;/p&gt;&lt;p&gt;Discussing the importance and prevalence of PDF documents in banking and finance (e.g., bank statements, mortgage agreements), Ritesh Biswas, Director, Google Cloud Practice at PwC, said, “The Document AI OCR solution from Google Cloud, especially its support for digital PDF input formats, has enabled PwC to bring digital transformation to the global financial services industry.”&lt;/p&gt;&lt;h3&gt;3. “Freeze” model characteristics with OCR versioning&lt;/h3&gt;&lt;p&gt;As a fully managed cloud-based service, Document AI OCR regularly upgrades the underlying AI/ML models to maintain its world-class accuracy across over 200 languages and scripts. These model upgrades, while providing new features and enhancements, may occasionally lead to changes in OCR behavior compared to an earlier version. &lt;/p&gt;&lt;p&gt;Today, we are launching OCR versioning, which enables users to pin to a historical OCR model behavior. The “frozen” model versions, in turn, give our customers and partners peace of mind by ensuring consistent OCR behavior. For industries with rigorous compliance requirements, this update also helps maintain the same model version, minimizing the need and effort to recertify stacks between releases. According to Jagadheeswaran Kathirvel, Senior Principal Architect at Mr. Cooper, “Having consistent OCR behavior is mission-critical to our business workflows. We value Google Cloud’s OCR versioning capability that enables our products to pin to a specific OCR version for an extended period of time.”&lt;/p&gt;&lt;p&gt;With OCR versioning, you have the full flexibility to select the versioning option that best fits your business needs.&lt;/p&gt;
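&lt;p&gt;To make this concrete, here is a minimal sketch of calling the OCR processor over REST, pinning the request to a specific processor version in the request path (the project, processor and version IDs are placeholders; see the documentation for the exact versioning options available):&lt;/p&gt;&lt;pre&gt;# Process a local PDF with a pinned processor version
# (base64 -w0 is the GNU flag; use base64 -i on macOS)
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://documentai.googleapis.com/v1/projects/PROJECT_ID/locations/us/processors/PROCESSOR_ID/processorVersions/PROCESSOR_VERSION_ID:process" \
    -d '{"rawDocument": {"content": "'"$(base64 -w0 sample.pdf)"'", "mimeType": "application/pdf"}}'&lt;/pre&gt;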
&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="3 OCR engine 122122.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_OCR_engine_122122.1000064720000470.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Getting started on Document AI OCR&lt;/h3&gt;&lt;p&gt;Learn more about the new OCR features and find tutorials in the &lt;a href="https://cloud.google.com/document-ai/docs/processors-list#processor_doc-ocr"&gt;Document AI documentation&lt;/a&gt;, or &lt;a href="https://cloud.google.com/document-ai/docs/drag-and-drop"&gt;try it&lt;/a&gt; directly in your browser (no coding required). For more details on what’s new with Document AI, don’t forget to check out our &lt;a href="https://youtu.be/DxFeQok9pus" target="_blank"&gt;breakout session&lt;/a&gt; from Google Cloud Next 2022.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Wed, 21 Dec 2022 19:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/top-reasons-to-use-gcp-document-ai-ocr/</guid><category>Google Cloud</category><category>AI &amp; Machine Learning</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/aiml2022_PO1vxqJ.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Document AI adds three new capabilities to its OCR engine</title><description>Announcing three new features for Document AI OCR, including intelligent document quality metrics, digital PDF support, and OCR model versioning.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/aiml2022_PO1vxqJ.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/top-reasons-to-use-gcp-document-ai-ocr/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Steve Z.</name><title>Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Devaki Kulkarni</name><title>Product Manager</title><department></department><company></company></author></item><item><title>New control plane connectivity and isolation options for your GKE clusters</title><link>https://cloud.google.com/blog/products/containers-kubernetes/understanding-gkes-new-control-plane-connectivity/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Once upon a time, all Google Kubernetes Engine (GKE) clusters used public IP addressing for communication between nodes and the control plane. Subsequently, we heard your security concerns and introduced private clusters enabled by VPC peering. 
&lt;/p&gt;&lt;p&gt;To consolidate these connectivity types, starting in March 2022 we began using Google Cloud’s &lt;a href="https://cloud.google.com/vpc/docs/private-service-connect"&gt;Private Service Connect (PSC)&lt;/a&gt; for communication between the GKE cluster control plane and nodes in new public clusters. This has profound implications for how you can configure your GKE environment. Today, we’re presenting a new, consistent PSC-based framework for GKE control plane connectivity from cluster nodes. Additionally, we’re excited to announce a new feature set which includes cluster isolation at the control plane and node pool levels to enable more scalable, secure — and cheaper! — GKE clusters. &lt;/p&gt;&lt;h3&gt;New architecture&lt;/h3&gt;&lt;p&gt;Starting with GKE version 1.23, all new public clusters created on or after March 15, 2022 use Google Cloud’s PSC infrastructure to communicate between the GKE cluster control plane and nodes. PSC provides a consistent framework that helps connect different networks through a service networking approach, and allows service producers and consumers to communicate using private IP addresses internal to a VPC. &lt;/p&gt;&lt;p&gt;The biggest benefit of this change is to set the stage for using PSC-enabled features for GKE clusters.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="1 control plane connectivity 122122.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_control_plane_connectiv.1000065120000983.max-1000x1000.jpg"/&gt;&lt;figcaption class="article-image__caption "&gt;&lt;div class="rich-text"&gt;&lt;i&gt;Figure 1: Simplified diagram of PSC-based architecture for GKE clusters&lt;/i&gt;&lt;/div&gt;&lt;/figcaption&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The new set of cluster isolation capabilities we’re presenting here is part of the evolution to a more scalable and secure GKE cluster posture. Previously, private GKE clusters were enabled with VPC peering, which imposed specific network architectures. With this feature set, you now have the ability to:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Update the GKE cluster control plane to only allow access to a private endpoint&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Create or update a GKE cluster node pool with public or private nodes&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Enable or disable GKE cluster control plane access from Google-owned IPs&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;In addition, the new PSC infrastructure can provide cost savings. Traditionally, control plane communication is treated as normal egress for public clusters and billed as a normal public IP charge. This is also true if you’re running &lt;code&gt;kubectl&lt;/code&gt; for provisioning or other operational reasons. 
With PSC infrastructure, we have eliminated the cost of communication between the control plane and your cluster nodes, resulting in one less network egress charge to worry about.&lt;/p&gt;&lt;p&gt;Now, let’s take a look at how this feature set enables these new capabilities.&lt;/p&gt;&lt;h3&gt;Allow access to the control plane only via a private endpoint&lt;/h3&gt;&lt;p&gt;Private cluster users have long had the ability to create the control plane with both public and private endpoints. We now extend the same flexibility to public GKE clusters based on PSC. With this, if you want private-only access to your GKE control plane but want all your node pools to be public, you can do so. &lt;/p&gt;&lt;p&gt;This model provides a tighter security posture for the control plane, while leaving you free to choose the kind of cluster nodes you need, based on your deployment. &lt;/p&gt;&lt;p&gt;To enable access only to a private endpoint on the control plane, use the following gcloud command:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;gcloud container clusters update CLUSTER_NAME \
    --enable-private-endpoint&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="2 control plane connectivity 122122.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_control_plane_connectivity_122122.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Allow toggling and mixed-mode clusters with public and private node pools&lt;/h3&gt;&lt;p&gt;All cloud providers with managed Kubernetes offerings support both public and private clusters. Whether a cluster is public or private is enforced at the cluster level, and cannot be changed once the cluster is created. Now you have the ability to toggle a node pool to private or public IP addressing. &lt;/p&gt;&lt;p&gt;You may also want a mix of private and public node pools. For example, you may be running a mix of workloads in your cluster in which some require internet access and some don’t. Instead of setting up NAT rules, you can deploy a workload on a node pool with public IP addressing to ensure that only those node pool deployments are publicly accessible. 
&lt;/p&gt;&lt;p&gt;To enable private-only IP addressing on existing node pools, use the following gcloud command:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;gcloud container node-pools update POOL_NAME \
    --cluster CLUSTER_NAME \
    --enable-private-nodes&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;To enable private-only IP addressing at node pool creation time, use the following gcloud command:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;gcloud container node-pools create POOL_NAME \
    --cluster CLUSTER_NAME \
    --enable-private-nodes&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Configure access from Google Cloud&lt;/h3&gt;&lt;p&gt;In some scenarios, users have found that workloads outside of their GKE cluster (for example, applications running on Cloud Run, or GCP VMs with Google Cloud public IPs) were allowed to reach the cluster control plane. To mitigate potential security concerns, we have introduced a feature that allows you to toggle access to your cluster control plane from such sources. &lt;/p&gt;&lt;p&gt;To remove access to the control plane from Google Cloud public IPs, use the following gcloud command:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;gcloud container clusters update CLUSTER_NAME \
    --no-enable-google-cloud-access&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Similarly, you can use this flag at cluster creation time.&lt;/p&gt;&lt;h3&gt;Choose your private endpoint address&lt;/h3&gt;&lt;p&gt;Many customers like to map IPs to a stack for easier troubleshooting and to track usage: for example, IP block x for infrastructure, IP block y for services, IP block z for the GKE control plane, and so on. By default, the private IP address for the control plane in PSC-based GKE clusters comes from the node subnet. However, some customers treat node subnets as infrastructure and apply security policies against them. To differentiate between infrastructure and the GKE control plane, you can now create a new custom subnet and assign it to your cluster control plane:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;gcloud container clusters create CLUSTER_NAME \
    --private-endpoint-subnetwork=SUBNET_NAME&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;What can you do with this new GKE architecture?&lt;/h3&gt;&lt;p&gt;With this new set of features, you can basically remove all public IP communication for your GKE clusters! This, in essence, means you can make your GKE clusters completely private. 
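&lt;/p&gt;&lt;p&gt;Putting the pieces together, here is a minimal lockdown sketch using the flags described above (the cluster and node pool names are illustrative, and depending on your setup you may also need authorized networks configured; see the docs):&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;# Control plane: private endpoint only, no access from Google Cloud public IPs
gcloud container clusters update my-cluster \
    --enable-private-endpoint \
    --no-enable-google-cloud-access

# Nodes: flip an existing pool to private IP addressing
gcloud container node-pools update default-pool \
    --cluster my-cluster \
    --enable-private-nodes&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Note that the private endpoint and Google Cloud access settings apply at the cluster level, while private addressing is toggled per node pool.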
&lt;/p&gt;&lt;p&gt;You currently need to create the cluster as public to ensure that it uses PSC, but you can then update your cluster using gcloud with the &lt;code&gt;--enable-private-endpoint&lt;/code&gt; flag, or the UI, to configure access via only a private endpoint on the control plane, or create new private node pools. &lt;/p&gt;&lt;p&gt;Alternatively, you can control access at cluster creation time with the &lt;code&gt;--master-authorized-networks&lt;/code&gt; and &lt;code&gt;--no-enable-google-cloud-access&lt;/code&gt; flags to prevent access to the control plane from public addresses.&lt;/p&gt;&lt;p&gt;Furthermore, you can use the REST API or Terraform providers to build a new PSC-based GKE cluster whose default (thus first) node pool has private nodes. This can be done by setting the &lt;a href="https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1/projects.locations.clusters.nodePools#nodenetworkconfig"&gt;&lt;code&gt;enablePrivateNodes&lt;/code&gt;&lt;/a&gt; field to true (instead of leveraging the public GKE cluster defaults and then updating afterwards, as currently required with gcloud and UI operations). &lt;/p&gt;
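&lt;p&gt;For example, here is a minimal REST sketch of creating a cluster whose default node pool starts out private (the project, location and names are placeholders; the request shape follows the clusters.create and NodeNetworkConfig references linked below):&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;# Create a PSC-based cluster whose default node pool uses private nodes
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://container.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/clusters" \
    -d '{
      "cluster": {
        "name": "psc-private-cluster",
        "nodePools": [{
          "name": "default-pool",
          "initialNodeCount": 1,
          "networkConfig": {"enablePrivateNodes": true}
        }]
      }
    }'&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;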
href="https://cloud.google.com/sdk/gcloud/reference/container/clusters/create#--private-endpoint-subnetwork"&gt;gcloud reference to create a cluster with a custom private subnet&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/hashicorp/terraform-provider-google/releases/tag/v4.45.0" target="_blank"&gt;Terraform Providers Google: release v4.45.0 page&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/vpc/docs/private-service-connect"&gt;Google Cloud Private Services Connect page&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Here are the more specific features in the latest Terraform Provider, handy to integrate into your automation pipeline:&lt;/p&gt;&lt;p&gt;&lt;a href="https://github.com/hashicorp/terraform-provider-google/releases/tag/v4.45.0" target="_blank"&gt;Terraform Providers Google: release v4.45.0&lt;/a&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#gcp_public_cidrs_access_enabled" target="_blank"&gt;gcp_public_cidrs_access_enabled&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#enable_private_endpoint" target="_blank"&gt;enable_private_endpoint&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#private_endpoint_subnetwork" target="_blank"&gt;private_endpoint_subnetwork&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_node_pool#enable_private_nodes" target="_blank"&gt;enable_private_nodes&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Wed, 21 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/understanding-gkes-new-control-plane-connectivity/</guid><category>Networking</category><category>Application Modernization</category><category>Google Cloud</category><category>Containers &amp; Kubernetes</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/containers_2022.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>New control plane connectivity and isolation options for your GKE clusters</title><description>New GKE networking options enable cluster isolation for the control plane and node pools, for more scalable, secure, and cost-effective GKE clusters.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/containers_2022.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/understanding-gkes-new-control-plane-connectivity/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Cynthia Thomas</name><title>Product Manager, GKE</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Dmitry Berkovich</name><title>Staff Software Engineer, GKE</title><department></department><company></company></author></item><item><title>Google Cloud wrapped: Top 22 news stories of 2022, according to 
you</title><link>https://cloud.google.com/blog/products/gcp/top-google-cloud-stories-of-2022/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;What a year! Over here at Google Cloud, we’re winding things down, but not before taking some time to reflect on everything that happened over the past twelve months. &lt;/p&gt;&lt;p&gt;Inspired by the custom Spotify Wrapped playlist playing in our earbuds, we pulled the data about the best-read Google Cloud news posts of the year, to better understand which stories resonated most with you. &lt;/p&gt;&lt;p&gt;Many of your favorite stories came as no surprise, as they tracked with major news, product launches, and events. But there were some sleeper hits in there too — stories whose viral success and staying power took us a bit by surprise. We also uncovered some fascinating data about the older posts that you keep coming back to, month after month, year after year (stay tuned for more on that in 2023). So, without further ado, here are the top 22 Google Cloud news stories of 2022, according to you, our readers.&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke"&gt;Here's what to know about changes to kubectl authentication coming in GKE v1.26&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/how-google-cloud-blocked-largest-layer-7-ddos-attack-at-46-million-rps"&gt;How Google Cloud blocked the largest Layer 7 DDoS attack at 46 million rps&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/virtual-machine-threat-detection-in-security-command-center"&gt;Protecting customers against cryptomining threats with VM Threat Detection in Security Command Center&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/looker-next-evolution-business-intelligence-data-studio"&gt;Introducing the next evolution of Looker, your unified business intelligence platform&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="http://cloud.google.com/blog/products/compute/calculating-100-trillion-digits-of-pi-on-google-cloud"&gt;Even more pi in the sky: Calculating 100 trillion digits of pi on Google Cloud&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/databases/introducing-alloydb-for-postgresql"&gt;Introducing AlloyDB for PostgreSQL: Free yourself from expensive, legacy databases&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/infrastructure-modernization/introducing-blockchain-node-engine"&gt;Introducing Blockchain Node Engine: fully managed node-hosting for Web3 development&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/public-sector/announcing-google-public-sector"&gt;Introducing Google Public Sector&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/google-completes-acquisition-of-mandiant"&gt;Google + Mandiant: Transforming Security Operations and Incident Response&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/raising-the-bar-in-security-operations"&gt;Raising the bar in Security Operations: Google 
Acquires Siemplify&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/serverless/loreal-combines-google-cloud-serverless-and-data-offerings"&gt;The L’Oréal Beauty Tech Data Platform - A data story of terabytes and serverless&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/build-a-data-mesh-on-google-cloud-with-dataplex-now-generally-available"&gt;Build a data mesh on Google Cloud with Dataplex, now generally available&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/financial-services/google-cloud-launches-dedicated-digital-asset-team"&gt;Google Cloud launches new dedicated Digital Assets Team&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/google-announces-new-cloud-contact-center-ai-platform"&gt;Contact Center AI reimagines the customer experience through full end-to-end platform&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/partners/google-cloud-announces-2021-partner-of-the-year-awards"&gt;Unveiling the 2021 Google Cloud Partner of the Year Award Winners&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="http://cloud.google.com/blog/products/identity-security/automate-public-certificate-lifecycle-management-via--acme-client-api"&gt;Automate Public Certificates Lifecycle Management via RFC 8555 (ACME)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-intelligent-scalable-storage"&gt;AlloyDB for PostgreSQL under the hood: Intelligent, database-aware storage&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/looker-and-data-studio-integrate-for-best-of-both-worlds"&gt;Bringing together the best of both sides of BI with Looker and Data Studio&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/serverless/introducing-the-next-generation-of-cloud-functions"&gt;Supercharge your event-driven architecture with new Cloud Functions (2nd gen)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/devops-sre/dora-2022-accelerate-state-of-devops-report-now-out"&gt;Announcing the 2022 Accelerate State of DevOps Report: A deep dive into security&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/making-cobalt-strike-harder-for-threat-actors-to-abuse"&gt;Making Cobalt Strike harder for threat actors to abuse&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/why-google-now-uses-post-quantum-cryptography-for-internal-comms"&gt;Securing tomorrow today: Why Google now protects its internal communications from quantum threats&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Recognize any of your favorites? We thought you might. See anything you missed? 
Now’s your chance to catch up.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-aside"&gt;&lt;p&gt;&lt;b&gt;A transformative top 10&lt;/b&gt;: &lt;a href="https://cloud.google.com/blog/transform/top-10-digital-transformation-cloud-stories-trends-2022"&gt;Read the top 10&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Let’s take a deeper look at these top posts as they landed throughout the year. &lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;January&lt;/h3&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/raising-the-bar-in-security-operations"&gt;&lt;b&gt;Raising the bar in Security Operations: Google Acquires Siemplify&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#10)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;We set off some new year’s fireworks by acquiring security operations specialist Siemplify, combining their proven security orchestration, automation and response technology with our &lt;a href="https://chronicle.security/" target="_blank"&gt;Chronicle security analytics&lt;/a&gt; to build a next-generation security operations workflow.&lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/financial-services/google-cloud-launches-dedicated-digital-asset-team"&gt;&lt;b&gt;Google Cloud launches new dedicated Digital Assets Team&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#13)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;News flash: blockchain technology has huge potential. So it was no big surprise that readers responded with gusto to the news of Google Cloud’s new Digital Assets Team, whose charter is to support customers’ needs in building, transacting, storing value, and deploying new products on blockchain-based platforms.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;February&lt;/h3&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/virtual-machine-threat-detection-in-security-command-center"&gt;&lt;b&gt;Protecting customers against cryptomining threats with VM Threat Detection in Security Command Center&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#3)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Who wants their VMs to be hijacked by hackers mining crypto? No one. To help, we added a new layer of threat detection to our &lt;a href="https://cloud.google.com/security-command-center"&gt;Security Command Center&lt;/a&gt; that can help detect threats such as cryptomining malware inside virtual machines running on Google Cloud. &lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke"&gt;&lt;b&gt;Here's what to know about changes to kubectl authentication coming in GKE v1.26&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#1)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;The open-source Kubernetes community made a big move when it decided to require that all provider-specific code that currently exists in the OSS code base be removed (starting with v1.26). We responded with a blockbuster post (the #1 post of the year, in terms of readership) that outlines how this move impacts the client side. 
&lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/serverless/introducing-the-next-generation-of-cloud-functions"&gt;&lt;b&gt;Supercharge your event-driven architecture with new Cloud Functions (2nd gen)&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#19)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Developers eyeing serverless platforms responded with enthusiasm to news of our next-generation Functions-as-a-Service product, which offers more powerful infrastructure, advanced control over performance and scalability, more control around the functions runtime, and support for triggers from over 90 event sources. &lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/build-a-data-mesh-on-google-cloud-with-dataplex-now-generally-available"&gt;&lt;b&gt;Build a data mesh on Google Cloud with Dataplex, now generally available&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#12)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Building a data mesh is hard to do. But doing so lets data teams centrally manage, monitor, and govern their data across all manner of data lakes, data warehouses, and data marts, so they can make the data available to various analytics and data science tools. With Dataplex, data teams got a new way to do just that.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;March&lt;/h3&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/serverless/loreal-combines-google-cloud-serverless-and-data-offerings"&gt;&lt;b&gt;The L’Oréal Beauty Tech Data Platform - A data story of terabytes and serverless&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#11)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Serverless, event-driven architecture, cross-cloud analytics… This customer story from L’Oréal about how it built its Beauty Tech Data Platform had it all. &lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/google-announces-new-cloud-contact-center-ai-platform"&gt;&lt;b&gt;Contact Center AI reimagines the customer experience through full end-to-end platform&lt;/b&gt;&lt;/a&gt;&lt;b&gt;(#14)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Customers rely on contact centers for help when they encounter urgent problems with a product or service, but contact centers often struggle to provide timely help. To bridge this gap with the power of AI, Google Cloud built Contact Center AI (CCAI) to streamline and shorten this time to value. CCAI Platform, the addition announced here, expanded this effort by introducing end-to-end call center capabilities.&lt;/p&gt;&lt;p&gt;&lt;a href="http://cloud.google.com/blog/products/identity-security/automate-public-certificate-lifecycle-management-via--acme-client-api"&gt;&lt;b&gt;Automate Public Certificates Lifecycle Management via RFC 8555 (ACME)&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#16)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;With this announcement, Google Cloud customers were able to acquire public certificates for their workloads that terminate TLS directly or for their cross-cloud and on-premises workloads using the Automatic Certificate Management Environment (&lt;a href="https://datatracker.ietf.org/doc/html/rfc8555" target="_blank"&gt;ACME&lt;/a&gt;) protocol. 
This is the same standard used by Certificate Authorities to enable automatic lifecycle management of TLS certificates.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;April&lt;/h3&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/looker-and-data-studio-integrate-for-best-of-both-worlds"&gt;&lt;b&gt;Bringing together the best of both sides of BI with Looker and Data Studio&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#18)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;When Google Cloud acquired Looker in 2020 for its business intelligence and analytics platform, inquiring minds instantly began asking what would become of Data Studio, Google’s existing self-serve BI solution. This blog began to answer that question.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;May &lt;/h3&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/databases/introducing-alloydb-for-postgresql"&gt;&lt;b&gt;Introducing AlloyDB for PostgreSQL: Free yourself from expensive, legacy databases&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#6)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Live from Shoreline at &lt;a href="https://io.google/" target="_blank"&gt;Google I/O&lt;/a&gt;, we made one of our largest product announcements of the year, launching a PostgreSQL database that can handle both transactional and analytical workloads, without sacrificing performance.&lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-intelligent-scalable-storage"&gt;&lt;b&gt;AlloyDB for PostgreSQL under the hood: Intelligent, database-aware storage&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#17)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Readers couldn’t get enough about AlloyDB, piling on to learn about the inner workings of its database-aware storage (not to mention its &lt;a href="https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-columnar-engine"&gt;columnar engine&lt;/a&gt;). &lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;June/July &lt;/h3&gt;&lt;p&gt;&lt;a href="http://cloud.google.com/blog/products/compute/calculating-100-trillion-digits-of-pi-on-google-cloud"&gt;&lt;b&gt;Even more pi in the sky: Calculating 100 trillion digits of pi on Google Cloud&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#5)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;A follow up to a &lt;a href="https://cloud.google.com/blog/products/compute/calculating-31-4-trillion-digits-of-archimedes-constant-on-google-cloud"&gt;reader favorite&lt;/a&gt; from 2019, we broke the record (again) by calculating the most digits of pi, leaning into significant advancements in Google Cloud compute, networking and storage. &lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/partners/google-cloud-announces-2021-partner-of-the-year-awards"&gt;&lt;b&gt;Unveiling the 2021 Google Cloud Partner of the Year Award Winners&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#15)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Who consistently demonstrates a creative spirit, collaborative drive, and a customer-first approach? Google Cloud partners, of course! With this blog, we were proud to recognize you and to call you our partners!&lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/public-sector/announcing-google-public-sector"&gt;&lt;b&gt;Introducing Google Public Sector&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#8)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;The U.S. government had been asking for more choice in cloud vendors who could support its missions, and protect the health, safety, and security of its citizens. With the announcement of Google Public Sector, a subsidiary of Google LLC that will bring Google Cloud and Google Workspace technologies to U.S. 
public sector customers, we delivered.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;August&lt;/h3&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/how-google-cloud-blocked-largest-layer-7-ddos-attack-at-46-million-rps"&gt;&lt;b&gt;How Google Cloud blocked the largest Layer 7 DDoS attack at 46 million rps&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#2)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Distributed denial-of-service (DDoS) attacks have been increasing in frequency and growing in size exponentially. In this post, we described how &lt;a href="https://cloud.google.com/armor"&gt;Cloud Armor&lt;/a&gt; protected one Google Cloud customer from the largest DDoS attack ever recorded — an attack so large that it was like receiving all of the requests that Wikipedia receives in a day in just 10 seconds. &lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;September&lt;/h3&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/google-completes-acquisition-of-mandiant"&gt;&lt;b&gt;Google + Mandiant: Transforming Security Operations and Incident Response&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#9)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Here, we took a moment to reflect on the completion of our acquisition of threat intelligence firm Mandiant. Bringing Mandiant into the Google Cloud fold will allow us to deliver a security operations suite to help enterprises globally stay protected at every stage of the security lifecycle, and focus on eliminating entire classes of threats. &lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/devops-sre/dora-2022-accelerate-state-of-devops-report-now-out"&gt;&lt;b&gt;Announcing the 2022 Accelerate State of DevOps Report: A deep dive into security&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#20)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;For eight years now, DevOps professionals have pored over the results of DORA’s annual Accelerate State of DevOps Report. This year’s installment focused on the relationship between security and DevOps, using the &lt;a href="https://slsa.dev/" target="_blank"&gt;Supply-chain Levels for Software Artifacts (SLSA)&lt;/a&gt; and NIST &lt;a href="https://csrc.nist.gov/publications/detail/sp/800-218/final" target="_blank"&gt;Secure Software Development&lt;/a&gt; frameworks. &lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;October&lt;/h3&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/looker-next-evolution-business-intelligence-data-studio"&gt;&lt;b&gt;Introducing the next evolution of Looker, your unified business intelligence platform&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#4)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In April, we began to lay out our strategy for Looker and Data Studio. At &lt;a href="https://cloud.withgoogle.com/next" target="_blank"&gt;Google Cloud Next ‘22&lt;/a&gt;, we took the next step, consolidating the two under the Looker brand umbrella, and adding important new capabilities. &lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/infrastructure-modernization/introducing-blockchain-node-engine"&gt;&lt;b&gt;Introducing Blockchain Node Engine: fully managed node-hosting for Web3 development&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#7)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Remember how in January we said that blockchain has a lot of potential? About that. News of the fully managed Blockchain Node Engine node-hosting service took readers by storm, catapulting it to the top ten of 2022, with just over two months left in the year. 
&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;November/December&lt;/h3&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/making-cobalt-strike-harder-for-threat-actors-to-abuse"&gt;&lt;b&gt;Making Cobalt Strike harder for threat actors to abuse&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#21)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Legitimate versions of Cobalt Strike are a very popular red team software tool, but older, cracked versions are often used by malicious hackers to spread malware. We made available to the security community a set of open-source YARA Rules that can be deployed to help stop the illicit use of Cobalt Strike. &lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/why-google-now-uses-post-quantum-cryptography-for-internal-comms"&gt;&lt;b&gt;Securing tomorrow today: Why Google now protects its internal communications from quantum threats&lt;/b&gt;&lt;/a&gt; &lt;b&gt;(#22)&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Google and Google Cloud have taken steps to harden our cryptographic algorithms used to protect internal communications against quantum computing threats. We explain here why we did it, and what challenges we face to achieve this type of future-proofing.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;That’s a wrap!&lt;/h3&gt;&lt;p&gt;Barring any last minute surprises, we’re pretty confident that what we have here is the definitive list of your favorite news stories of 2022 — you’ve got great taste. We can’t wait to see what stories inspire you in the new year. Happy holidays, and thanks for reading!&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Tue, 20 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/gcp/top-google-cloud-stories-of-2022/</guid><category>Google Cloud</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Cloud_wrapped_122022.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Google Cloud wrapped: Top 22 news stories of 2022, according to you</title><description>We ran the numbers to find this year’s top Google Cloud news stories, by readership.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Cloud_wrapped_122022.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/gcp/top-google-cloud-stories-of-2022/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Google Cloud Content &amp; Editorial </name><title></title><department></department><company></company></author></item><item><title>What’s new with Google Cloud</title><link>https://cloud.google.com/blog/topics/inside-google-cloud/whats-new-google-cloud/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Want to know the latest from Google Cloud? Find it here in one handy location. Check back regularly for our newest updates, announcements, resources, events, learning opportunities, and more. &lt;br/&gt;&lt;/p&gt;&lt;hr/&gt;&lt;p&gt;&lt;b&gt;Tip&lt;/b&gt;: Not sure where to find what you’re looking for on the Google Cloud blog? 
Start here: &lt;a href="https://cloud.google.com/blog/topics/inside-google-cloud/complete-list-google-cloud-blog-links-2021"&gt;Google Cloud blog 101: Full list of topics, links, and resources&lt;/a&gt;.&lt;/p&gt;&lt;hr/&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Week of Dec 19 - Dec 23, 2022&lt;/h3&gt;&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="https://cloud.google.com/eventarc/docs"&gt;&lt;b&gt;Eventarc&lt;/b&gt;&lt;/a&gt; adds support for &lt;a href="https://cloud.google.com/eventarc/docs/reference/supported-events#directly-from-a-google-cloud-source"&gt;85+ new direct events&lt;/a&gt;  from the following services: API Gateway, Apigee Registry, BeyondCorp, Certificate Manager, Cloud Data Fusion, Cloud Functions, Cloud Memorystore for Memcached, Database Migration, Datastream, Eventarc, and Workflows. Direct events provide strongly typed events with lower latency. This launch brings the total event sources supported by Eventarc to &lt;a href="https://cloud.google.com/eventarc/docs/reference/supported-events"&gt;&lt;b&gt;150+ Google and third-party services with 7000+ direct and Cloud audit log based events.&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Week of Dec 12 - Dec 16, 2022&lt;/h3&gt;&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Storage Transfer Service now offers &lt;a href="https://cloud.google.com/products/#product-launch-stages"&gt;Preview support&lt;/a&gt; for event-driven transfers - serverless, real-time replication from AWS S3 to Cloud Storage, and between Cloud Storage buckets. With this new capability, you can accelerate your event-driven analytics pipeline, enable automatic replication across Cloud Storage buckets, create a backup copy of data in a different region or project, or perform live migration. Read more &lt;a href="https://cloud.google.com/storage-transfer/docs/event-driven-transfers"&gt;here&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Learn about Memorystore for Redis best practices to achieve the optimal performance and availability with your implementation. Prescriptive guidance around monitoring your Memorystore instance is also provided. Read more about these topics &lt;a href="https://cloud.google.com/blog/products/databases/best-pactices-for-cloud-memorystore-for-redis"&gt;here&lt;/a&gt;.&lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Week of Dec 5 - Dec 9, 2022&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;A Google Cloud first-party supported open-source &lt;a href="https://github.com/googleapis/java-pubsub-group-kafka-connector" target="_blank"&gt;Kafka Connector for Pub/Sub and Pub/Sub Lite&lt;/a&gt; is now generally available. See how it enables an easy drop-in solution for moving data between Kafka clusters and Pub/Sub and Pub/Sub Lite &lt;a href="https://cloud.google.com/blog/products/data-analytics/pubsub-group-kafka-connector-is-now-ga"&gt;here&lt;/a&gt;.&lt;br/&gt;&lt;/li&gt;&lt;li&gt;Eventarc &lt;a href="https://cloud.google.com/eventarc/docs/use-cmek"&gt;support for&lt;/a&gt; &lt;a href="https://cloud.google.com/eventarc/docs/use-cmek"&gt;customer-managed encryption keys (CMEK)&lt;/a&gt; is generally available (GA).&lt;br/&gt;&lt;/li&gt;&lt;li&gt;Pub/Sub Lite now offers export subscriptions to Pub/Sub. 
This new subscription type writes Lite messages directly to Pub/Sub - no code development or Dataflow jobs needed. It’s great for connecting disparate data pipelines and for migrating from Lite to Pub/Sub. &lt;a href="https://cloud.google.com/blog/products/data-analytics/easier-and-cheaper-with-pubsub-lite-reservations"&gt;Learn more&lt;/a&gt;.&lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Week of Nov 28 - Dec 2, 2022&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;Zeotap partnered with Google Cloud to build a next-generation customer data platform with a focus on privacy, security, and compliance. This blog post describes their journey using Google Data Cloud products, including BigQuery, BI Engine, and Vertex AI, to build customized audience segments at scale. Read more &lt;a href="https://cloud.google.com/blog/products/data-analytics/built-bigquery-zeotap-uses-google-bigquery-build-highly-customized-audiences-scale"&gt;here&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Week of Nov 14 - Nov 18, 2022&lt;/h3&gt;&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Apigee has been named a &lt;a href="https://cloud.google.com/blog/products/api-management/apigee-is-a-leader-in-the-gartner-mq-for-api-management"&gt;leader in the 2022 Gartner Magic Quadrant for API Management&lt;/a&gt;, marking the &lt;b&gt;&lt;i&gt;seventh time in a row&lt;/i&gt;&lt;/b&gt; we’ve earned this recognition. We remain the top API Management vendor in our Ability to Execute, with a strong product offering, customer experience, and sales execution. Please help us share the good news via &lt;a href="https://twitter.com/googlecloud/status/1593671703904804867" target="_blank"&gt;Twitter&lt;/a&gt;, &lt;a href="https://www.facebook.com/495863664148547/posts/1728087374259497/" target="_blank"&gt;Facebook&lt;/a&gt;, and &lt;a href="https://www.linkedin.com/feed/update/urn:li:activity:6999437466961072128/" target="_blank"&gt;LinkedIn&lt;/a&gt;.&lt;br/&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="https://www.connected-stories.com/?utm_source=Google+BwBQ+Blog&amp;amp;utm_medium=Blog+Post&amp;amp;utm_campaign=Google+BwBQ+2022" target="_blank"&gt;Connected-Stories&lt;/a&gt; has built an end-to-end creative management platform on Google Cloud, including BigQuery and Vertex AI, to develop, serve, and optimize interactive video and display ads that scale across any channel. Read more &lt;a href="https://cloud.google.com/blog/products/data-analytics/how-connected-stories-is-using-google-data-cloud"&gt;here&lt;/a&gt;.&lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Week of Nov 7 - Nov 11, 2022&lt;/h3&gt;&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Private Marketplace functionality is now available in preview for Google Cloud Marketplace to help organizations scale compliant product discovery. Learn more &lt;a href="https://cloud.google.com/blog/products/application-modernization/google-cloud-private-marketplace-preview"&gt;here&lt;/a&gt;.&lt;br/&gt;&lt;/li&gt;&lt;li&gt;No-cost access to some of our popular training is available on Coursera until December 31, 2022. Get hands-on experience to enhance your technical skills in the cloud environment for the most in-demand job roles. Training is available for both technical and non-technical professionals and spans foundational to advanced content. You’ll also earn a shareable certificate. 
Learn more about this training offer &lt;a href="https://cloud.google.com/blog/topics/training-certifications/get-cloud-skills--training-needed-for-in-demand-job-roles"&gt;today&lt;/a&gt;. &lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Week of Oct 31 - Nov 4, 2022&lt;/h3&gt;&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="https://cloud.google.com/iam/docs/deny-overview"&gt;IAM Deny&lt;/a&gt;, a security guardrail to help Google Cloud customers harden their security posture at scale, is now Generally Available (GA). IAM Deny policies manage access to Google Cloud resources based on the principal, the resource type, and the permissions being used. They enable administrators to harden their cloud security posture easily and at scale.&lt;br/&gt;&lt;/li&gt;&lt;li&gt;True Fit, a data-driven personalization platform built on Google Data Cloud, describes its data journey to unlock partner growth. True Fit publishes a number of BigQuery datasets for its retail partners using Analytics Hub. Data sharing on Google Cloud has elevated True Fit’s business with real-world data in real time. They achieved this in conjunction with the &lt;a href="https://cloud.google.com/solutions/data-cloud-isvs"&gt;Built with BigQuery&lt;/a&gt; program from Cloud Partner Engineering. &lt;a href="https://cloud.google.com/blog/products/data-analytics/how-google-cloud-bigquery-helps-true-fit-unlock-partner-growth"&gt;Read more&lt;/a&gt;.&lt;br/&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="https://cloud.google.com/blog/products/application-development/introducing-cloud-workstations"&gt;Google Cloud Workstations&lt;/a&gt; is now in public preview.&lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Week of Oct 24 - Oct 28, 2022&lt;/h3&gt;&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Google Cloud&lt;/b&gt; and &lt;b&gt;Sibros Technology&lt;/b&gt;, with their award-winning Deep Connected Platform, are enabling vehicle manufacturers and suppliers to reach the next level in their use of data, gaining valuable insights that can mitigate risks, reduce costs, add innovative products, drive sustainability, and introduce value-added services in the automotive industry. &lt;a href="https://cloud.google.com/blog/products/data-analytics/powering-connected-vehicles-on-google-cloud-with-sibros-ota-platform"&gt;Read more&lt;/a&gt;.&lt;br/&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Data Exploration Workbench in Dataplex is now Generally Available&lt;/b&gt; - it offers a Spark-powered serverless data exploration experience with one-click access to Spark SQL scripts and Jupyter notebooks. 
With the workbench, data consumers can spend more time generating insights rather than integrating different tools and platforms. &lt;a href="https://cloud.google.com/blog/products/data-analytics/dataplex-provides-spark-powered-data-exploration-experience"&gt;Learn more&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Mon, 19 Dec 2022 19:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/topics/inside-google-cloud/whats-new-google-cloud/</guid><category>Google Cloud</category><category>Inside Google Cloud</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/whats_new_cloud.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new with Google Cloud</title><description>Find our newest updates, announcements, resources, events, learning opportunities, and more in one handy location.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/whats_new_cloud.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/inside-google-cloud/whats-new-google-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Google Cloud Content &amp; Editorial </name><title></title><department></department><company></company></author></item><item><title>The Squire’s guide to automated deployments with Cloud Build</title><link>https://cloud.google.com/blog/products/serverless/the-squires-guide-to-automated-deployments-with-cloud-build/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;hr/&gt;&lt;p&gt;&lt;b&gt;Audience (intermediate level)&lt;/b&gt;: This guide targets readers who have not yet worked with Google Cloud but who have experience with continuous integration, package management, and beginner-level containers and messaging. It assumes you have a pre-existing frontend application and a supporting API server in place locally.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Technologies&lt;/b&gt;: &lt;/p&gt;&lt;ul&gt;&lt;li&gt;Cloud Build&lt;/li&gt;&lt;li&gt;Cloud Build Triggers&lt;/li&gt;&lt;li&gt;Artifact Registry&lt;/li&gt;&lt;li&gt;Cloud Run&lt;/li&gt;&lt;li&gt;Pub/Sub&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;b&gt;Requirements before getting started&lt;/b&gt;:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Functional client-side repository&lt;/li&gt;&lt;li&gt;Functional API server repository&lt;/li&gt;&lt;li&gt;Pre-existing GCP project with billing enabled&lt;/li&gt;&lt;li&gt;Unix machine&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;A Hero’s Journey - The Quest Begins&lt;/h3&gt;&lt;p&gt;In the initial stages of development, it’s easy to underestimate the grunt work needed to containerize and deploy your application, especially if you are new to the cloud. Could Google Cloud help you complete your project without adding too much bloat to the work? Let’s find out! 
This blog will take you on a quest to get to the heart of quick automated deployments by leveraging awesome features from the following products:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/build"&gt;&lt;b&gt;Cloud Build&lt;/b&gt;: DevOps automation platform&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/artifact-registry"&gt;&lt;b&gt;Artifact Registry&lt;/b&gt;: Universal package manager&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/run/docs/"&gt;&lt;b&gt;Cloud Run&lt;/b&gt;: Serverless for containerized applications&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/pubsub/docs/overview"&gt;&lt;b&gt;Pub/Sub&lt;/b&gt;: Global real time messaging&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;To help on this learning journey, we’d like to arm you with a realistic example of this flow as you are fashioning your own CI/CD pipeline. This blog will be referencing an open source Github project that models a best practices architecture using Google Cloud serverless patterns, &lt;a href="https://github.com/GoogleCloudPlatform/emblem" target="_blank"&gt;Emblem&lt;/a&gt;.  (Note: References will be tagged with Emblem).&lt;br/&gt;&lt;p&gt;&lt;/p&gt;&lt;hr/&gt;&lt;p&gt;&lt;b&gt;Note&lt;/b&gt;: This blog will showcase the benefits of using Pub/Sub with multiple triggers, as it does in Emblem. If you are looking for a more direct path to building and deploying your containers with one trigger, check out the following quickstarts: &lt;a href="https://cloud.google.com/build/docs/deploying-builds/deploy-cloud-run"&gt;”Deploying to Cloud Run using Cloud Build”&lt;/a&gt; and &lt;a href="https://cloud.google.com/deploy/docs/deploy-app-run"&gt;“Deploying to Cloud Run using Cloud Deploy”&lt;/a&gt;. 
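&lt;/p&gt;&lt;p&gt;For comparison, the most direct path of all builds and deploys in a single command. A minimal sketch, assuming your Dockerfile lives in &lt;code&gt;server-side/&lt;/code&gt; and that the service name here is just an example:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# build from source and deploy in one step; Cloud Build runs behind the scenes
gcloud run deploy server-side --source server-side/ --region $REGION&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;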
&lt;br/&gt;&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;Quest goals&lt;/h3&gt;&lt;p&gt;&lt;i&gt;The following goals will lead you to create a lean automated deployment flow for your API service that will be triggered by any change to the main branch of its source GitHub repository.&lt;/i&gt;&lt;/p&gt;&lt;p&gt;&lt;i&gt;&lt;b&gt;Manual deployment with Cloud Build and Cloud Run&lt;/b&gt;&lt;br/&gt;&lt;/i&gt;Before you run off and attempt to automate anything, you will need a solid understanding of what commands you will be adding to your future &lt;code&gt;cloudbuild.yaml&lt;/code&gt; files.&lt;/p&gt;&lt;p&gt;&lt;i&gt;&lt;b&gt;Build an image with a Cloud Build trigger&lt;/b&gt;&lt;br/&gt;&lt;/i&gt;Create the first trigger and &lt;code&gt;cloudbuild.yaml&lt;/code&gt; file in Cloud Build that will react to any new changes to the main branch of your GitHub project.&lt;/p&gt;&lt;p&gt;&lt;i&gt;&lt;b&gt;React to Cloud Build events with Pub/Sub&lt;/b&gt;&lt;br/&gt;&lt;/i&gt;Using a handy built-in feature of Artifact Registry repositories, create a Pub/Sub topic.&lt;/p&gt;&lt;p&gt;&lt;i&gt;&lt;b&gt;Deploy with a Cloud Build trigger&lt;/b&gt;&lt;br/&gt;&lt;/i&gt;Create a new Cloud Build trigger that listens to the above Pub/Sub topic, plus a new &lt;code&gt;cloudbuild.yaml&lt;/code&gt; file that will initiate deployment of newly created container images from Artifact Registry.&lt;/p&gt;&lt;h3&gt;Before getting started&lt;/h3&gt;&lt;p&gt;For the purposes of this blog, the following is required:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/sdk/gcloud"&gt;&lt;code&gt;gcloud cli&lt;/code&gt;&lt;/a&gt; installed on a Unix machine&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;An existing REST API server with associated Dockerfile&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Google Cloud project with billing enabled (&lt;a href="https://cloud.google.com/pricing"&gt;pricing&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;You will create a new GitHub project repository &lt;code&gt;epic-quest-project&lt;/code&gt; and add your existing REST API server code directory (e.g., Emblem: &lt;a href="https://github.com/GoogleCloudPlatform/emblem/tree/main/content-api" target="_blank"&gt;content-api&lt;/a&gt;) to create the following project file structure:&lt;br/&gt;&lt;p&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;epic-quest-project/
├── ops/          # where build triggers will live
└── server-side/  # where your API server code lives
    ├── main.py
    ├── requirements.txt
    ├── Dockerfile
    └── …&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Now onto the quest!&lt;/p&gt;&lt;h3&gt;Goal #1: Manual deployment with Cloud Build and Cloud Run&lt;/h3&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-1.png" 
src="https://storage.googleapis.com/gweb-cloudblog-publish/images/life-of-commit-1.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;You will be building and deploying your containers using Google Cloud products, &lt;a href="https://cloud.google.com/build"&gt;Cloud Build&lt;/a&gt; and &lt;a href="https://cloud.google.com/run/docs/"&gt;Cloud Run&lt;/a&gt;, via the Google Cloud CLI, also known as &lt;code&gt;gcloud&lt;/code&gt;. &lt;/p&gt;&lt;p&gt;Within an open terminal, you will be setting up the following environment variables that declare the Google Cloud project ID,  and which region you will be basing your project from. You will also need to enable the following product APIs (Cloud Run, Cloud Build &amp;amp; Artifact Registry APIs) within the Google Cloud project.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;dl&gt;&lt;dt&gt;code_block&lt;/dt&gt;&lt;dd&gt;[StructValue([(u'code', u'# setting environment variables\r\nexport PROJECT_ID=&amp;lt;add your project ID&amp;gt;\r\nexport REGION=&amp;lt;add your region/location&amp;gt;\r\n\r\n# enable relevant apis\r\ngcloud services enable run.googleapis.com \\\r\nartifactregistry.googleapis.com compute.googleapis.com cloudbuild.googleapis.com \r\n\r\n# update gcloud with project id and region\r\ngcloud config set project $PROJECT_ID\r\ngcloud config set compute/region $REGION'), (u'language', u''), (u'caption', &amp;lt;wagtail.wagtailcore.rich_text.RichText object at 0x3ed124cf5bd0&amp;gt;)])]&lt;/dd&gt;&lt;/dl&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The container images you create from the &lt;code&gt;server-side/&lt;/code&gt; directory will be stored in an image repository named “epic-quest”, managed by Artifact Registry.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;dl&gt;&lt;dt&gt;code_block&lt;/dt&gt;&lt;dd&gt;[StructValue([(u'code', u'gcloud artifacts repositories create epic-quest --repository-format="DOCKER" --location=$REGION'), (u'language', u''), (u'caption', &amp;lt;wagtail.wagtailcore.rich_text.RichText object at 0x3ed124cf5b90&amp;gt;)])]&lt;/dd&gt;&lt;/dl&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Now that the “epic quest” Artifact Registry repository has been created you can begin pushing container images to it! Use &lt;code&gt;gcloud builds submit&lt;/code&gt; to build and tag an image from the &lt;code&gt;server-side/&lt;/code&gt; directory with Artifact Registry repository specific format: &lt;code&gt;&amp;lt;region&amp;gt;-docker.pkg.dev/&amp;lt;project-id&amp;gt;/&amp;lt;repository&amp;gt;&lt;/code&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;dl&gt;&lt;dt&gt;code_block&lt;/dt&gt;&lt;dd&gt;[StructValue([(u'code', u'# root: epic-quest-project\r\ncd server-side/\r\n\r\n# root: create "server-side" image\r\ngcloud builds submit . 
--tag $REGION-docker.pkg.dev/$PROJECT_ID/epic-quest/server-side'), (u'language', u''), (u'caption', &amp;lt;wagtail.wagtailcore.rich_text.RichText object at 0x3ed124cf5190&amp;gt;)])]&lt;/dd&gt;&lt;/dl&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;After pushing your &lt;code&gt;server-side&lt;/code&gt; container image to the Artifact Registry repository, you’re all set to deploy it with Cloud Run!&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;dl&gt;&lt;dt&gt;code_block&lt;/dt&gt;&lt;dd&gt;[StructValue([(u'code', u'gcloud run deploy --image=$REGION-docker.pkg.dev/$PROJECT_ID/epic-quest/server-side'), (u'language', u''), (u'caption', &amp;lt;wagtail.wagtailcore.rich_text.RichText object at 0x3ed124cf5050&amp;gt;)])]&lt;/dd&gt;&lt;/dl&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;dl&gt;&lt;dt&gt;code_block&lt;/dt&gt;&lt;dd&gt;[StructValue([(u'code', u'Service name: server-side\r\nAllow unauthenticated invocations to [server-side] (y/N)? y'), (u'language', u''), (u'caption', &amp;lt;wagtail.wagtailcore.rich_text.RichText object at 0x3ed124cf5290&amp;gt;)])]&lt;/dd&gt;&lt;/dl&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Excellent, you’ve created a basic manual CI/CD pipeline! Now, you can explore what it looks like to have this pipeline automated.&lt;/p&gt;&lt;h3&gt;Goal #2: Build an image with a Cloud Build trigger&lt;/h3&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-2.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image4_MkMoDcK.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;To start automating your small pipeline you will need to create a cloudbuild.yaml file that will configure your first &lt;a href="https://cloud.google.com/build/docs/automating-builds/create-manage-triggers"&gt;Cloud Build trigger&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;In the ops directory of &lt;code&gt;epic-quest-project&lt;/code&gt;, create a new file named &lt;code&gt;api-build.cloudbuild.yaml&lt;/code&gt;. 
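&lt;/p&gt;&lt;p&gt;Before you automate, one quick checkpoint: confirm the Goal #1 deployment is actually serving traffic. A minimal sketch, assuming you kept the default service name &lt;code&gt;server-side&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# look up the service URL and send a test request
SERVICE_URL=$(gcloud run services describe server-side --region=$REGION --format='value(status.url)')
curl "$SERVICE_URL"&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;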
This new yaml file will describe the steps Cloud Build uses to build your container image and push it to Artifact Registry.&lt;/p&gt;&lt;p&gt;(Emblem: &lt;a href="https://github.com/GoogleCloudPlatform/emblem/blob/main/ops/api-build.cloudbuild.yaml" target="_blank"&gt;&lt;code&gt;ops/api-build.cloudbuild.yaml&lt;/code&gt;&lt;/a&gt;)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;touch ops/api-build.cloudbuild.yaml

# api-build.cloudbuild.yaml contents

steps:
  # Docker Build
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - '${_REGION}-docker.pkg.dev/${PROJECT_ID}/epic-quest/server-side:${_IMAGE_TAG}'
      - 'server-side/.'

# Store in Artifact Registry
images:
  - '${_REGION}-docker.pkg.dev/${PROJECT_ID}/epic-quest/server-side:${_IMAGE_TAG}'&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;To configure Cloud Build to automatically execute the steps in the above yaml, use the &lt;a href="https://console.cloud.google.com/cloud-build/triggers"&gt;Cloud Console&lt;/a&gt; to create a new Cloud Build trigger:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-3.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/life-of-commit-3.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Remember to select &lt;code&gt;Push to a branch&lt;/code&gt; as the event that will activate the build trigger and, under &lt;code&gt;Source&lt;/code&gt;, connect your &lt;code&gt;epic-quest-project&lt;/code&gt; GitHub repository.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-4.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image13_wubg25e.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;You may need to authenticate with your GitHub account credentials to connect a repository to your Google Cloud project.
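&lt;/p&gt;&lt;p&gt;If you prefer the terminal to the console, roughly the same trigger can be created with &lt;code&gt;gcloud&lt;/code&gt;. A sketch only; treat the flag names and values as assumptions to adapt to your own repository:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;gcloud builds triggers create github \
  --name=api-new-build \
  --repo-owner=YOUR_GITHUB_USER \
  --repo-name=epic-quest-project \
  --branch-pattern='^main$' \
  --included-files='server-side/**' \
  --build-config=ops/api-build.cloudbuild.yaml \
  --substitutions=_REGION=$REGION,_IMAGE_TAG=latest&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;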
Once you have a repository connected, specify the location of the cloud build configuration in that repository:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-5.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image5_Lj9XErS.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/life-of-commit.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Submitting this configuration will create a new trigger named &lt;code&gt;api-new-build&lt;/code&gt; that will be invoked whenever a change is committed and merged into the main branch of the repository with changes to the &lt;code&gt;server-side/&lt;/code&gt; folder.&lt;/p&gt;&lt;p&gt;After committing your changes to server-side/ files locally, you can verify this trigger works by merging a new commit into the main branch of your repository. Once merged, you will be able to observe the build trigger at work in the &lt;a href="https://console.cloud.google.com/cloud-build/builds"&gt;Build History&lt;/a&gt; page of the Cloud Console.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-7.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/life-of-commit-7.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Excellent, the container build is now automated! How will Cloud Run know when a new build is ready to deploy? Enter Pub/Sub.&lt;/p&gt;&lt;h3&gt;Goal #3: React to Cloud Build events with Pub/Sub&lt;/h3&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-8.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image6_kXsitZQ.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;By default, Artifact Registry will &lt;a href="https://cloud.google.com/artifact-registry/docs/configure-notifications"&gt;publish&lt;/a&gt; messages about changes in its repositories to a Pub/Sub topic named &lt;code&gt;gcr&lt;/code&gt; if it exists. Let’s take advantage of that feature for your next Cloud Build trigger. 
First, create a Pub/Sub topic named &lt;code&gt;gcr&lt;/code&gt;:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;gcloud pubsub topics create gcr&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Now, every time a new build is pushed to any Artifact Registry repository, a message is published to the &lt;code&gt;gcr&lt;/code&gt; topic with a build digest that identifies that build. Next, it’s time to configure your second trigger to complete the automated deployment pipeline. &lt;/p&gt;&lt;h3&gt;Goal #4: Deploy with a Cloud Build trigger&lt;/h3&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-9.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image12_JvGQkSU.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Now you’re arriving at the final step: creating the deployment trigger! This Cloud Build trigger is the last link to complete your automated deployment story.&lt;/p&gt;&lt;hr/&gt;&lt;p&gt;&lt;b&gt;Note&lt;/b&gt;: Read more about our opinionated way to perform this step using &lt;a href="https://cloud.google.com/run/docs/continuous-deployment-with-cloud-build"&gt;Cloud Run with checkbox CD&lt;/a&gt;, and check out the new &lt;a href="https://cloud.google.com/deploy/docs/deploy-app-run"&gt;support for Cloud Run in Cloud Deploy&lt;/a&gt;.&lt;/p&gt;&lt;hr/&gt;&lt;p&gt;In the &lt;code&gt;ops&lt;/code&gt; directory of the &lt;code&gt;epic-quest-project&lt;/code&gt;, create a new file named &lt;code&gt;api-deploy.cloudbuild.yaml&lt;/code&gt;. In short, this will perform the deployment of the new container image on your behalf. 
(Emblem: &lt;a href="https://github.com/GoogleCloudPlatform/emblem/blob/main/ops/deploy.cloudbuild.yaml" target="_blank"&gt;&lt;code&gt;ops/deploy.cloudbuild.yaml&lt;/code&gt;&lt;/a&gt;.)&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;touch ops/api-deploy.cloudbuild.yaml

# api-deploy.cloudbuild.yaml contents

steps:
  # Print the full Pub/Sub message for debugging
  - id: "Echo Pub/Sub message"
    name: gcr.io/cloud-builders/gcloud
    entrypoint: /bin/bash
    args:
      - '-c'
      - |
        echo ${_BODY}

  # Cloud Run Deploy
  - id: "Deploy to Cloud Run"
    name: gcr.io/cloud-builders/gcloud
    args:
      - run
      - deploy
      - ${_SERVICE}
      - --image=${_IMAGE_NAME}
      - --region=${_REGION}
      - --revision-suffix=${_REVISION}
      - --project=${_PROJECT_ID}
      - --allow-unauthenticated
      - --tag=${_IMAGE_TAG}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The first step in this Cloud Build configuration prints the body of the message published by Artifact Registry to the build job log, and the second step deploys to Cloud Run.&lt;/p&gt;&lt;p&gt;Open the &lt;a href="https://console.cloud.google.com/cloud-build/triggers"&gt;console&lt;/a&gt; and create another new Cloud Build trigger with the following configuration:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-10.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/life-of-commit-10.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Instead of choosing a repository event like in the api-build trigger, select Pub/Sub message to create a subscription to the desired Pub/Sub topic along with the trigger:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-11.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/life-of-commit-11.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Once again, provide the location of the corresponding Cloud Build configuration file in the repository. Additionally, include values for the substitution variables that exist in the configuration file. Those variables are identifiable by the underscore prefix (_).
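&lt;/p&gt;&lt;p&gt;To see what those messages actually contain, you can attach a temporary pull subscription to the &lt;code&gt;gcr&lt;/code&gt; topic and inspect one by hand. A debugging sketch (the subscription name is arbitrary):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# create a scratch subscription on the gcr topic
gcloud pubsub subscriptions create gcr-debug --topic=gcr

# push a new build, then pull one message to inspect its body
gcloud pubsub subscriptions pull gcr-debug --auto-ack --limit=1&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;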
Note that the &lt;code&gt;_BODY, _IMAGE_NAME&lt;/code&gt; and &lt;code&gt;_REVISION&lt;/code&gt; variables reference data included in the body of the Pub/Sub message:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="life-of-commit-12.png" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/life-of-commit-12.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The Cloud Build service account by default will initiate the deployment to Cloud Run, so it will need to have the &lt;a href="https://cloud.google.com/run/docs/deploying#permissions_required_to_deploy"&gt;Cloud Run Developer and Service Account User IAM roles&lt;/a&gt; granted to it in the project where the Cloud Run services reside.&lt;/p&gt;&lt;p&gt;After granting those roles, check that the pipeline is working by creating a commit to the &lt;code&gt;server-side/&lt;/code&gt; directory in your &lt;code&gt;epic-quest-project&lt;/code&gt; GitHub repository. It should result in the automatic invocation of the &lt;code&gt;api-new-build&lt;/code&gt; trigger, followed closely by the &lt;code&gt;api-deploy&lt;/code&gt; trigger, and finally a new revision in the corresponding Cloud Run service.&lt;/p&gt;&lt;p&gt;Your final project setup should look similar to the following:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;epic-quest-project/
├── ops/
│   ├── api-build.cloudbuild.yaml
│   └── api-deploy.cloudbuild.yaml
└── server-side/
    ├── main.py
    ├── requirements.txt
    ├── Dockerfile
    └── …&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Quest complete!&lt;/h3&gt;&lt;p&gt;Excellent, you now have a shiny automated pipeline and leveled up your deployment game!
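&lt;/p&gt;&lt;p&gt;One parting tip: if the deploy trigger fails with a permissions error, the role grants described above can also be made from the CLI. A sketch, assuming the default Cloud Build service account and the default compute runtime service account:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format='value(projectNumber)')
CB_SA="${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com"

# allow Cloud Build to deploy Cloud Run services
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:${CB_SA}" \
  --role="roles/run.developer"

# allow Cloud Build to act as the Cloud Run runtime service account
gcloud iam service-accounts add-iam-policy-binding \
  "${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
  --member="serviceAccount:${CB_SA}" \
  --role="roles/iam.serviceAccountUser"&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;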
&lt;/p&gt;&lt;p&gt;&lt;i&gt;After reading today’s post, we hope you have a better understanding of how to manually build and deploy a container using just Cloud Build and Cloud Run, use Cloud Build triggers to react to GitHub repository actions, write &lt;code&gt;cloudbuild.yaml&lt;/code&gt; files to add additional configuration to your build triggers, and take advantage of the magical benefits of Artifact Registry repositories.&lt;/i&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;If you want to learn even more, check out the open source serverless project &lt;a href="https://github.com/GoogleCloudPlatform/emblem" target="_blank"&gt;Emblem&lt;/a&gt; on GitHub.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Mon, 19 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/the-squires-guide-to-automated-deployments-with-cloud-build/</guid><category>Google Cloud</category><category>Startups</category><category>Serverless</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>The Squire’s guide to automated deployments with Cloud Build</title><description>Getting started with your first automated deployment pipeline using open source project Emblem featuring Google Cloud Serverless products like Cloud Run, Cloud Build, Artifact Registry, and Pub/Sub.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/the-squires-guide-to-automated-deployments-with-cloud-build/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Patricia Shin</name><title>Cloud Developer Relations Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Roger Martinez</name><title>Cloud Developer Relations Engineer</title><department></department><company></company></author></item><item><title>Automate data governance, extend your data fabric with Dataplex-BigLake integration</title><link>https://cloud.google.com/blog/products/data-analytics/automate-data-governance-with-google-cloud-dataplex-and-biglake/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Unlocking the full potential of data requires breaking down the silo between open-source data formats and data warehouses. At the same time, it is critical to enable the &lt;a href="https://cloud.google.com/learn/what-is-data-governance"&gt;data governance&lt;/a&gt; team to apply policies regardless of where the data lives, whether in file or columnar storage. &lt;/p&gt;&lt;p&gt;Today, data governance teams have to become subject matter experts on each storage system where corporate data happens to reside. Since February 2022, Dataplex has offered a unified place to apply policies, which are propagated across both lake storage and data warehouses in GCP. Rather than specifying policies in multiple places and bearing the cognitive load of translating them from “what you want the storage system to do” to “how your data should behave,” Dataplex offers a single point for unambiguous policy management. Now, we are making it easier for you to use &lt;a href="https://cloud.google.com/blog/products/data-analytics/unify-data-lakes-and-warehouses-with-biglake-now-generally-available"&gt;BigLake&lt;/a&gt;.
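&lt;/p&gt;&lt;p&gt;For a sense of the manual steps this integration removes, here is a rough sketch of creating a single BigLake table by hand with the &lt;code&gt;bq&lt;/code&gt; CLI. The project, dataset, bucket, and connection names are placeholders, and the exact flags are assumptions to verify against the BigLake docs:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# create a Cloud Resource connection for BigLake
bq mk --connection --location=US --connection_type=CLOUD_RESOURCE my-connection

# define an external table over GCS data using that connection
bq mkdef --source_format=PARQUET --connection_id=my-project.US.my-connection \
  "gs://my-bucket/sales/*.parquet" &gt; table_def.json

# create the BigLake table from the definition
bq mk --external_table_definition=table_def.json my_dataset.sales&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;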
&lt;/p&gt;&lt;p&gt;Earlier this year, we launched BigLake into general availability. BigLake unifies the data fabric between data lakes and data warehouses by extending &lt;a href="https://cloud.google.com/bigquery"&gt;BigQuery&lt;/a&gt; storage to open file formats. Today, we announce BigLake integration with &lt;a href="https://cloud.google.com/dataplex"&gt;Dataplex&lt;/a&gt; (available in preview). This integration eliminates configuration steps for admins, who can take advantage of BigLake and manage policies across GCS and BigQuery from a unified console. &lt;/p&gt;&lt;p&gt;Previously, you could point Dataplex at a &lt;a href="https://cloud.google.com/storage"&gt;Google Cloud Storage (GCS)&lt;/a&gt; bucket, and Dataplex would &lt;a href="https://cloud.google.com/dataplex/docs/discover-data"&gt;discover&lt;/a&gt; and extract all metadata from the data lake and register it in BigQuery (and Dataproc Metastore and Data Catalog) for analysis and search. With the BigLake integration, we are building on this capability by allowing an “upgrade” of a bucket asset: instead of just creating external tables in BigQuery for analysis, Dataplex will create policy-capable BigLake tables! &lt;/p&gt;&lt;p&gt;The immediate implication is that admins can now assign column, row, and table policies to the BigLake tables auto-created by Dataplex, since with BigLake the infrastructure layer (GCS) is separate from the analysis layer (BigQuery). Dataplex will handle the creation of a BigQuery connection and a BigQuery publishing dataset, and will ensure the BigQuery service account has the correct permissions on the bucket.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Dataplex-BigLake 121622.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Dataplex-BigLake_121622.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;But wait - there’s more.&lt;/p&gt;&lt;p&gt;With this release of Dataplex, we are also introducing advanced logging called governance logs. Governance logs track the exact state of policy propagation to tables and columns, adding a level of detail that goes beyond the high-level “status” for the bucket and into fine-grained status and logs for individual tables and columns. &lt;/p&gt;&lt;h3&gt;What’s next? 
&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;We have updated our documentation for &lt;a href="https://cloud.google.com/dataplex/docs/manage-assets"&gt;managing buckets&lt;/a&gt; and have additional detail regarding policy propagation and the upgrade process.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Stay tuned for an exciting  roadmap ahead, with more automation around policy management.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;For more information, please visit:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/dataplex"&gt;Google Cloud Dataplex&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Fri, 16 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/automate-data-governance-with-google-cloud-dataplex-and-biglake/</guid><category>Google Cloud</category><category>Infrastructure Modernization</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Automate data governance, extend your data fabric with Dataplex-BigLake integration</title><description>Learn how to automate data governance and your data fabric with Dataplex &amp; BigLake integration. Allow centralizing policies in data lakes &amp; warehouses.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/automate-data-governance-with-google-cloud-dataplex-and-biglake/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Uri Gilad</name><title>Group Product Manager - Data Governance, Google Cloud</title><department></department><company></company></author></item><item><title>How HSBC is upskilling at scale with Google Cloud</title><link>https://cloud.google.com/blog/topics/training-certifications/hsbc-upskilled-at-scale-with-google-cloud/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;i&gt;&lt;b&gt;Editor’s note&lt;/b&gt;: Founded in 1865, HSBC is one of the world’s largest banking and financial services organizations. In today’s post, Adrian Phelan, Global Head of Google Cloud, HSBC, explains how the organization is working with Google Cloud to drive cloud adoption at scale. &lt;/i&gt;&lt;/p&gt;&lt;hr/&gt;&lt;p&gt;Close to &lt;a href="https://www.mckinsey.com/capabilities/people-and-organizational-performance/our-insights/five-fifty-the-skillful-corporation" target="_blank"&gt;90% of corporations&lt;/a&gt; say they’re affected by digital skills gaps, or expect to be within the next five years. Technologies and business models are evolving rapidly, and companies are deploying a multi-pronged approach to ensure they have the right skills in the right places.&lt;/p&gt;&lt;p&gt;Here at HSBC, one of the bank’s strategic priorities is digitizing at scale. As people operate in a more digital world, we want to supply them with services quickly and in ways they want to use them. We initially worked with &lt;a href="https://cloud.google.com/"&gt;Google Cloud&lt;/a&gt; to implement more than 1,700 data analytics, customer experience, cybersecurity and emission reduction projects. A big part of rolling these out has been getting our teams skilled in the right way.&lt;/p&gt;&lt;h3&gt;Empowering employees with a culture of learning&lt;/h3&gt;&lt;p&gt;This approach has evolved over time, but central to it has been proactively instilling a culture of learning. 
We started out in 2018 with a few small-scale training projects, and it quickly became clear that the teams who had participated in them delivered better and faster than those who hadn’t. They were also more independent and less dependent on central expertise.&lt;/p&gt;&lt;p&gt;This inspired us to scale up our learning programs across the organization, which was a challenge because of the sheer size of our technical staff: tens of thousands of employees.  &lt;/p&gt;&lt;p&gt;After some really positive feedback for our early training programs with Google Cloud, we set up our Google Accelerated Certification Program (GACP). It’s a 10-week blended learning model including self-learning, case studies, and hands-on practice followed by an examination preparation boot camp.&lt;/p&gt;&lt;p&gt;This combination of theory and practice in a safe environment helped build employees’ confidence. Two thousand people have gone through this training so far, and it’s really helped accelerate their journey towards achieving Google Cloud certification. The learning programs also offer other digital credentials, such as completion badges and skill badges, which provide encouragement and help participants measure their progress.  &lt;/p&gt;&lt;h3&gt;Company-wide knowledge building&lt;/h3&gt;&lt;p&gt;When we started our learning journey, we focused on IT roles for obvious reasons, but we are increasingly moving towards training people in business functions.&lt;/p&gt;&lt;p&gt;One of our aims is to educate our less technical employees about the broad capabilities that exist within the cloud. IT teams are often the ones to say, "Hey, we could do this in a better, more efficient, different way by using the cloud", and to make that happen we need to work in close collaboration with our business colleagues, so it’s equally important that they understand the technology.&lt;/p&gt;&lt;p&gt;To enable this kind of innovation, you have to educate the whole organization in the ‘art of the possible’. One of the ways we did that was by organizing a month-long Cloud Festival that reached 10,000 employees, which included three Google Cloud sessions. This really helped us build a foundational level of knowledge with business and technology colleagues across the organization.&lt;/p&gt;&lt;p&gt;As we continue along our training path, interest in the cloud within the organization continues to increase. Our channel for communicating any changes related to cloud technology, processes or ways of working now has an audience of close to 8,000 employees. &lt;/p&gt;&lt;h3&gt;Looking to the future with targeted training &lt;/h3&gt;&lt;p&gt;The Google Cloud team has provided a lot of support in helping us get our training off the ground. It has always been a true process of co-creation, of listening, testing things, and seeing what works best. We meet weekly in order to keep our learning journey moving forward, listen to the demands of the business, understand what the pipeline of work is, and what the up and coming Google Cloud product launches are, so that we can stay one step ahead.&lt;/p&gt;&lt;p&gt;One example of this is the bespoke training we introduced for business leaders. So far, 250 senior business leaders have completed it with great feedback. They have told us that the program improved their understanding of how the cloud can help to more quickly meet customer expectations, increase speed to market, reduce overheads and grow revenue through new product streams and continuous innovation. 
It also covered potential business activities suitable for migration to the cloud.&lt;/p&gt;&lt;p&gt;When it comes to learning and training, you can either let it happen organically, or you can drive it. Our choice was to drive it and invest in it, and I’d highly recommend that anybody trying to adopt cloud at scale do the same: they will see the return on that investment many times over. &lt;/p&gt;&lt;p&gt;Learn more about &lt;a href="https://cloud.google.com/training?hl=en"&gt;Google Cloud training and certification&lt;/a&gt; and the impact it can have on your team.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Fri, 16 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/topics/training-certifications/hsbc-upskilled-at-scale-with-google-cloud/</guid><category>Google Cloud</category><category>Training and Certifications</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/180502-hsbc-logo-london-3-1600x900_1.max-600x600.JPG" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How HSBC is upskilling at scale with Google Cloud</title><description>HSBC upskilled with Google Cloud: a culture of digitizing, targeted training and gaining certifications.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/180502-hsbc-logo-london-3-1600x900_1.max-600x600.JPG</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/training-certifications/hsbc-upskilled-at-scale-with-google-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Adrian Phelan</name><title>Global Head of Google Cloud, HSBC</title><department></department><company></company></author></item><item><title>BigQuery Omni: solving cross-cloud challenges by bringing analytics to your data</title><link>https://cloud.google.com/blog/products/data-analytics/cross-cloud-analytics-with-bigquery-omni-and-biglake/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;a href="https://www.crn.com/slide-shows/cloud/4-key-cloud-trends-that-will-influence-2021/2#:~:text=About%2090%20percent%20of%20enterprises,a%20single%20private%20cloud%20strategy." target="_blank"&gt;Research&lt;/a&gt; shows that over 90% of large organizations already deploy multicloud architectures, and their data is distributed across several public cloud providers. Additionally, data is also increasingly split across various storage systems such as warehouses, operational and relational databases, object stores, etc. With the proliferation of new applications, data is serving many more use cases such as data science, business intelligence, analytics, streaming, and more. With these data trends, customers are increasingly gravitating towards an open multicloud data lake. However, multicloud data lakes present several challenges such as data silos, data duplication, fragmented governance, complexity of tools, and increased costs.&lt;/p&gt;&lt;p&gt;With Google's data cloud technologies, customers can leverage the unique combination of distributed cloud services. They can create an agile cross-cloud semantic business layer with Looker and manage data lakes and data warehouses across cloud environments at scale with BigQuery and capabilities like BigLake and BigQuery Omni.
&lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/biglake"&gt;BigLake&lt;/a&gt; is a storage engine that unifies data warehouses and lakehouses by standardizing across different storage formats, including &lt;a href="https://cloud.google.com/bigquery"&gt;BigQuery&lt;/a&gt; managed tables and open file formats such as Parquet and Apache Iceberg on object storage. &lt;a href="https://cloud.google.com/bigquery/docs/omni-introduction"&gt;BigQuery Omni&lt;/a&gt; provides the compute engine that runs local to the storage on AWS or Azure, which customers can use to seamlessly query data in those clouds. This provides several key benefits:&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;A single pane of glass to query your multicloud data lakes (across Google Cloud Platform, Amazon Web Services, and Microsoft Azure)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Cross-cloud analytics by combining data across different platforms with little to no egress costs&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Unified governance and secure management of your data wherever it resides&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="1 BigQuery Omni 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_BigQuery_Omni_121522.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;In this blog, we will share cross-cloud analytics use cases customers are solving with Google’s Data Cloud and the benefits they are realizing.&lt;br/&gt;&lt;/p&gt;&lt;h3&gt;Unified marketing analytics for 360-degree insights&lt;/h3&gt;&lt;p&gt;Organizations want to perform marketing analytics - ads optimization, inventory management, churn prediction, buyer propensity trends, and more. Before BigQuery Omni, doing this meant pulling data from several different sources such as Google Analytics, public datasets, and other proprietary information stored across cloud environments. This required moving large amounts of data, managing duplicate copies, and incurring incremental costs to perform any cross-cloud analytics and derive actionable insights. With BigQuery Omni, organizations can greatly simplify this workflow. Using the familiar BigQuery interface, users can access data residing in AWS or Azure, then discover and select just the relevant data that needs to be combined for further analysis. This subset of data can be moved to Google Cloud using Omni’s new &lt;a href="https://cloud.google.com/blog/products/data-analytics/bq-omnis-cross-cloud-transfer-now-generally-available/"&gt;Cross-Cloud Transfer capabilities&lt;/a&gt;. Customers can combine this data with other Google Cloud datasets, and these consolidated tables can be made available to key business stakeholders through advanced analytics tools such as Looker and Looker Studio. Customers can also now tie this data into world-class AI models via &lt;a href="https://cloud.google.com/vertex-ai"&gt;Vertex AI&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;As an illustrative example, consider a retailer who has sales &amp;amp; inventory, user, and search data spread across multiple data silos. Using BigQuery Omni, they can seamlessly bring these datasets together and power marketing analytics scenarios like customer segmentation, campaign management, and demand forecasting.&lt;/p&gt;
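&lt;p&gt;As a minimal sketch of what that looks like in practice (the dataset and table names are hypothetical, and the Omni region string follows the AWS region noted later in this post), the familiar BigQuery Python client simply points the query job at the region where the data lives:&lt;/p&gt;&lt;pre class="lang-py"&gt;from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# Aggregate clickstream data that physically stays in AWS S3, exposed
# through a (hypothetical) BigLake table in an Omni-enabled dataset.
query = """
SELECT user_id, COUNT(*) AS sessions
FROM `my-project.aws_dataset.clickstream`
GROUP BY user_id
"""
rows = client.query(query, location="aws-us-east-1").result()
for row in rows:
    print(row.user_id, row.sessions)&lt;/pre&gt;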
&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="2 BigQuery Omni 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_BigQuery_Omni_121522.1000067520000438.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;i&gt;"Interested in performing cross-cloud analytics, we tested BigQuery Omni and really liked the SQL support to easily get data from AWS S3. We have seen great potential and value in BigQuery Omni for adopting a multi-cloud data strategy." — &lt;b&gt;Florian Valeye, Staff Data Engineer,&lt;/b&gt; &lt;a href="https://www.backmarket.com/en-us" target="_blank"&gt;&lt;b&gt;Back Market&lt;/b&gt;&lt;/a&gt;, a leading online marketplace for renewed technology based out of France&lt;/i&gt;&lt;br/&gt;&lt;/p&gt;&lt;h3&gt;Data platform with consistent and unified cross-cloud governance&lt;/h3&gt;&lt;p&gt;Another pattern is customers looking to analyze operational, transactional and business data across data silos in different clouds through a unified data platform. These data silos are a result of various factors such as mergers and acquisitions, standardization of analytical tools, leveraging best-of-breed solutions in different clouds, and diversification of the data footprint across clouds. In addition to a single pane of glass for data access across silos, customers deeply desire consistent and uniform governance of their data across clouds. &lt;/p&gt;&lt;p&gt;&lt;i&gt;“Achieve is looking to deliver a consistent analytics experience to all our customers and stakeholders. With our financial and credit report data distributed across clouds, accessing and getting insights holistically is difficult. Through our exploration with Omni, we are able to access datasets in different clouds using a single familiar BigQuery interface; we see its promise as one of the primary tools in our multi-cloud platform." — &lt;b&gt;James Simonson, Senior Data Engineer,&lt;/b&gt; &lt;a href="https://www.achieve.com/" target="_blank"&gt;&lt;b&gt;Achieve&lt;/b&gt;&lt;/a&gt;&lt;/i&gt;&lt;br/&gt;&lt;/p&gt;&lt;p&gt;With BigLake and BigQuery Omni abstracting the storage and compute layers respectively, organizations can access and query their data in Google Cloud irrespective of where it resides. They can also set fine-grained row-level and column-level access policies in BigQuery and consistently govern the data across clouds. These building blocks enable data engineering teams to build a unified and governed data platform for their data users without having to deal with the complexity of building and managing complex data pipelines. Furthermore, with BigQuery Omni’s integration with Dataplex and Data Catalog, you can discover and search your data across clouds, and enrich it by adding relevant business context with business glossaries and rich text.&lt;/p&gt;&lt;p&gt;&lt;i&gt;"Several SADA customers use GCP to build and manage their data analytics platform. During many explorations and proofs of concept, our customers have seen the great potential and value in BigQuery Omni.
Enabling seamless cross-cloud data analytics has allowed them to realize the value of their data more quickly while lowering the barrier to entry for BigQuery adoption in a low-risk fashion." — &lt;b&gt;Brian Suk, Associate Chief Technology Officer,&lt;/b&gt; &lt;a href="https://sada.com/" target="_blank"&gt;&lt;b&gt;SADA&lt;/b&gt;&lt;/a&gt;&lt;b&gt;, one of the strategic partners of Google Cloud.&lt;/b&gt;&lt;/i&gt;&lt;br/&gt;&lt;/p&gt;&lt;h3&gt;Simplified data sharing between data providers and their customers&lt;/h3&gt;&lt;p&gt;A third emerging pattern in cross-cloud analytics is data sharing. Several services need to share information, such as inventory or subscriber data, with their customers or users, who in turn analyze or aggregate it with their own proprietary data and oftentimes share the results back with the service provider. In several cases, the two parties are on different cloud environments, requiring them to move data back and forth. &lt;/p&gt;&lt;p&gt;Consider a company operating in the &lt;a href="https://www.actioniq.com/what-is-cdp/" target="_blank"&gt;customer data platform&lt;/a&gt; (CDP) space. CDPs were designed to help activate customer data, and a critical first step of that was unifying and managing that customer data. To enable this, many CDP vendors built their solution choosing one of the available cloud infrastructure technologies and copied data from the client’s systems. &lt;i&gt;“Copying data from client applications and infrastructure has always been a requirement to deploy a CDP, but it doesn’t have to be anymore" — &lt;b&gt;Justin DeBrabant, Senior Vice President of Product, &lt;a href="https://www.actioniq.com/" target="_blank"&gt;ActionIQ&lt;/a&gt;.&lt;/b&gt;&lt;/i&gt;&lt;/p&gt;&lt;p&gt;While a small percentage of customers are fine with moving data across cloud environments, the majority are hesitant to onboard new services and would prefer to provide governed access to their datasets. &lt;/p&gt;&lt;p&gt;&lt;i&gt;“A new architectural pattern is emerging, allowing organizations to keep their data at one location and make it accessible, with the proper guardrails, to applications used by the rest of the organization’s stack”&lt;/i&gt; adds &lt;i&gt;&lt;b&gt;Justin at &lt;a href="https://www.actioniq.com/" target="_blank"&gt;ActionIQ&lt;/a&gt;.&lt;/b&gt;&lt;/i&gt;&lt;/p&gt;&lt;p&gt;With BigQuery Omni, services in Google Cloud Platform can more easily access and share data with their customers and users in other cloud environments with limited data movement. One of the UK's largest statistics providers has explored Omni for their data sharing needs.&lt;/p&gt;&lt;p&gt;&lt;i&gt;"We tested BigQuery Omni and really like the ability to get data from AWS directly into BQ.
We're excited about managing data sharing with different organizations without onboarding new clouds" – &lt;b&gt;Simon Sandford-Taylor, Chief Information and Digital Officer, &lt;a href="https://www.ons.gov.uk/" target="_blank"&gt;UK's Office for National Statistics&lt;/a&gt;&lt;/b&gt;&lt;/i&gt;&lt;/p&gt;&lt;p&gt;With BigQuery Omni, customers are able to:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Access and query data across clouds through a single user interface&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Reduce the need for data engineering before analyzing data&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Lower operational overhead and risks by deploying an application that runs across multiple clouds while leveraging the same consistent security controls&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Accelerate access to insights by significantly reducing the time for data processing and analysis &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Create consistent and predictable budgeting across multiple cloud footprints &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Enable long-term agility and maximize the benefits of every cloud investment&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Over the last year, we’ve seen great momentum in customer adoption and added significant innovations to BigQuery Omni, including improved performance and scalability for querying your data in AWS S3 or Azure Blob Storage, &lt;a href="https://cloud.google.com/blog/products/data-analytics/announcing-apache-iceberg-support-for-biglake"&gt;Iceberg support for Omni&lt;/a&gt;, &lt;a href="https://cloud.google.com/bigquery/docs/omni-introduction"&gt;larger query result set sizes up to 20GB&lt;/a&gt;, and &lt;a href="https://cloud.google.com/blog/products/data-analytics/bq-omnis-cross-cloud-transfer-now-generally-available/"&gt;Cross-cloud transfer&lt;/a&gt;, which helps customers easily, securely, and cost-effectively move just enough data across cloud environments for advanced analytics. &lt;/p&gt;&lt;p&gt;BigQuery Omni has launched several features to support unified governance of your data across multiple clouds - you can apply fine-grained access controls to your multi-cloud data with &lt;a href="https://cloud.google.com/bigquery/docs/row-level-security-intro"&gt;row-level&lt;/a&gt; and &lt;a href="https://cloud.google.com/bigquery/docs/column-level-security-intro"&gt;column-level security&lt;/a&gt;. Building on this, we are excited to announce that BigQuery Omni now supports &lt;a href="https://cloud.google.com/bigquery/docs/column-data-masking-intro"&gt;data masking&lt;/a&gt;. We’ve also made it easy for customers to try and see the benefits of BigQuery Omni through the &lt;a href="https://cloud.google.com/bigquery/pricing#bqomni"&gt;limited-time free trial&lt;/a&gt; available until March 30, 2023. &lt;/p&gt;&lt;p&gt;BigQuery Omni running on other public clouds outside of Google Cloud is available in the AWS US East1 (N.Virginia) and Azure US East2 (US East) regions.
We are also excited to share that we will be bringing BigQuery Omni to more regions in the future, starting with Asia Pacific (AWS Korea) coming soon.&lt;/p&gt;&lt;h3&gt;Getting Started&lt;/h3&gt;&lt;p&gt;Get started with a &lt;a href="https://console.cloud.google.com/freetrial?facet_utm_source=%28direct%29&amp;amp;facet_utm_campaign=%28direct%29&amp;amp;facet_utm_medium=%28none%29&amp;amp;facet_url=https%3A%2F%2Fcloud.google.com%2Fbigquery&amp;amp;facet_id_list=%5B39300012%2C+39300020%2C+39300118%2C+39300196%2C+39300241%2C+39300319%2C+39300322%2C+39300324%2C+39300333%2C+39300345%2C+39300354%2C+39300364%2C+39300373%2C+39300412%2C+39300422%2C+39300436%5D&amp;amp;_ga=2.103918912.1581262585.1669960592-594615328.1669960592"&gt;free trial&lt;/a&gt; to learn about Omni. Check out the &lt;a href="https://cloud.google.com/bigquery/docs/omni-introduction?utm_source=forbes&amp;amp;utm_medium=display&amp;amp;utm_campaign=2022-forbes-brand-voice&amp;amp;utm_content=cross_cloud_analytics_documentation&amp;amp;utm_term=-"&gt;documentation&lt;/a&gt; to learn more about BigQuery Omni. You can also leverage the &lt;a href="https://www.cloudskillsboost.google/focuses/49746?parent=catalog" target="_blank"&gt;self-paced labs&lt;/a&gt; to learn how to set up BigQuery Omni easily.&lt;br/&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Thu, 15 Dec 2022 18:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/cross-cloud-analytics-with-bigquery-omni-and-biglake/</guid><category>Google Cloud</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>BigQuery Omni: solving cross-cloud challenges by bringing analytics to your data</title><description>Customers can solve marketing analytics, data governance and data sharing challenges with cross-cloud analytics.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/cross-cloud-analytics-with-bigquery-omni-and-biglake/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vidya Shanmugam</name><title>Product Manager, BigQuery</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Manoj Gunti</name><title>Product Marketing Manager, BigQuery</title><department></department><company></company></author></item><item><title>Efficient PyTorch training with Vertex AI</title><link>https://cloud.google.com/blog/products/ai-machine-learning/efficient-pytorch-training-with-vertex-ai/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/vertex-ai"&gt;Vertex AI&lt;/a&gt; provides flexible and scalable hardware and secured infrastructure to train PyTorch-based deep learning models with pre-built containers and custom containers. For model training with large amounts of data, using the distributed training paradigm and reading data from &lt;a href="https://cloud.google.com/storage"&gt;Cloud Storage&lt;/a&gt; is the best practice. However, training with data in the cloud, such as remote storage on Cloud Storage, introduces a new set of challenges. For example, when a dataset consists of many small individual files, randomly accessing them can introduce network overhead.
Another challenge is data throughput: the speed at which data is fed to the hardware accelerators (GPUs) to keep them fully utilized.&lt;/p&gt;&lt;p&gt;In this post, we walk through methods to improve training performance step by step, starting without distributed training and then moving to distributed training paradigms using data in the cloud. Ultimately, we make training with data on Cloud Storage 6x faster, approaching the same speed as data on a local disk. We will show how the &lt;a href="https://cloud.google.com/vertex-ai/docs/training/custom-training"&gt;Vertex AI Training&lt;/a&gt; service, together with &lt;a href="https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments"&gt;Vertex AI Experiments&lt;/a&gt; and &lt;a href="https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview"&gt;Vertex AI TensorBoard&lt;/a&gt;, can be used to keep track of experiments and results.&lt;/p&gt;&lt;p&gt;You can find the accompanying code for this blog post on the &lt;a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/main/community-content/pytorch_efficient_training" target="_blank"&gt;GitHub Repo&lt;/a&gt;.&lt;/p&gt;&lt;h2&gt;PyTorch distributed training&lt;/h2&gt;&lt;p&gt;PyTorch natively supports &lt;a href="https://pytorch.org/tutorials/beginner/dist_overview.html" target="_blank"&gt;distributed training strategies&lt;/a&gt;. &lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;DataParallel (DP)&lt;/b&gt; is a simple strategy often used for single-machine multi-GPU training, but the single process it relies on can become a performance bottleneck. This approach loads an entire mini-batch on the main thread and then scatters the sub mini-batches across the GPUs. The model parameters are only updated on the main GPU and then broadcast to the other GPUs at the beginning of the next iteration.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;DistributedDataParallel (DDP)&lt;/b&gt; fits multi-node multi-GPU scenarios: the model is replicated on each device, and each device is controlled by an individual process (see the launch sketch after this list). Each process loads its own mini-batch and passes it to its GPU. Each process also has its own optimizer, and the absence of parameter broadcasts reduces communication overhead. Finally, unlike DP, an all-reduce operation is performed across the GPUs. This multi-process design benefits training performance.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;FullyShardedDataParallel (FSDP)&lt;/b&gt; is another data parallel paradigm similar to DDP that enables fitting more data and larger models by sharding the optimizer states, gradients, and parameters into multiple FSDP units, unlike DDP where model parameters are replicated on each GPU.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Different distributed training strategies fit different training scenarios, and it is not always easy to pick the best one for a specific environment configuration. For example, the effectiveness of the data loading pipeline to the GPUs, the batch size, and the network bandwidth in a multi-node setup can all affect the performance of a distributed training strategy.&lt;/p&gt;
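&lt;p&gt;For reference, here is a minimal sketch (not the exact script from the accompanying repo) of how the multi-process strategies above are typically launched, with one worker process spawned per GPU:&lt;/p&gt;&lt;pre class="lang-py"&gt;import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank, world_size):
    # One process per GPU: join the process group, then pin this
    # process to its own device before building the model and data.
    dist.init_process_group(
        backend="nccl", init_method="env://",
        world_size=world_size, rank=rank)
    torch.cuda.set_device(rank)
    # ... build the model, wrap it in DistributedDataParallel or FSDP,
    # and run the training loop here ...
    dist.destroy_process_group()


if __name__ == "__main__":
    # The env:// rendezvous reads these variables in every process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    world_size = torch.cuda.device_count()  # e.g. 4 on our test machine
    mp.spawn(worker, args=(world_size,), nprocs=world_size)&lt;/pre&gt;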
&lt;p&gt;In this post, we will use PyTorch &lt;a href="https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html" target="_blank"&gt;ResNet-50&lt;/a&gt; as the example model and train it on &lt;a href="https://www.image-net.org/" target="_blank"&gt;ImageNet validation data&lt;/a&gt; (50K images) to measure the training performance of the different training strategies.&lt;/p&gt;&lt;h2&gt;Demonstration&lt;/h2&gt;&lt;h3&gt;Environment configurations&lt;/h3&gt;&lt;p&gt;For the test environment, we create custom jobs on Vertex AI Training with the following setup:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="1 PyTorch training 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_PyTorch_training_121522.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Here are the training hyperparameters used for all of the following experiments:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="2 PyTorch training 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_PyTorch_training_121522.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;For each of the following experiments, we train the model for 10 epochs and use the averaged epoch time as the training performance. Please note that we focused on improving the training time and not on the model performance itself.&lt;/p&gt;&lt;h3&gt;Read data from Cloud Storage with &lt;code&gt;gcsfuse&lt;/code&gt; and WebDataset&lt;/h3&gt;&lt;p&gt;We use &lt;a href="https://github.com/GoogleCloudPlatform/gcsfuse" target="_blank"&gt;gcsfuse&lt;/a&gt; to access data on &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/cloud-storage-file-system-ai-training"&gt;Cloud Storage from Vertex AI Training&lt;/a&gt; jobs. Vertex AI training jobs already have Cloud Storage buckets mounted via gcsfuse, so no additional work is required to use it. With &lt;a href="https://cloud.google.com/vertex-ai/docs/training/code-requirements#fuse"&gt;gcsfuse&lt;/a&gt;, training jobs on Vertex AI can access data on Cloud Storage as simply as files in the local file system. This also provides high throughput for large sequential file reads.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre class="lang-py"&gt;open('/gcs/test-bucket/path/to/object', 'r')&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The data loading pipeline can become a bottleneck in distributed training when it reads individual data files from the cloud.
&lt;a href="https://github.com/webdataset/webdataset" target="_blank"&gt;WebDataset&lt;/a&gt; is a PyTorch dataset implementation designed to improve streaming data access, especially in remote storage settings. The idea behind WebDataset is similar to &lt;a href="https://www.tensorflow.org/tutorials/load_data/tfrecord" target="_blank"&gt;TFRecord&lt;/a&gt;: it collects multiple raw data files and compiles them into one &lt;a href="https://ftp.gnu.org/old-gnu/Manuals/tar-1.12/html_node/tar_117.html" target="_blank"&gt;POSIX tar&lt;/a&gt; file. But unlike TFRecord, it doesn’t do any format conversion or assign object semantics to the data; the data format is the same in the tar file as it is on disk. Refer to &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/scaling-deep-learning-workloads-pytorch-xla-and-cloud-tpu-vm"&gt;this blog post&lt;/a&gt; for key pipeline performance enhancements we can achieve with WebDataset.&lt;/p&gt;&lt;p&gt;WebDataset shards a large number of individual images into a small number of tar files. During training, each single network request can fetch multiple images and cache them locally for the next couple of batches, so sequential I/O greatly lowers the overhead of network communication. In the demonstration below, we will see the difference between training on Cloud Storage data (via gcsfuse) with and without WebDataset.&lt;/p&gt;&lt;p&gt;&lt;b&gt;NOTE&lt;/b&gt;: WebDataset has been incorporated into the official &lt;a href="https://github.com/pytorch/data" target="_blank"&gt;TorchData&lt;/a&gt; library as &lt;a href="https://pytorch.org/data/beta/generated/torchdata.datapipes.iter.WebDataset.html#torchdata.datapipes.iter.WebDataset" target="_blank"&gt;torchdata.datapipes.iter.WebDataset&lt;/a&gt;. But the TorchData library is currently in the &lt;b&gt;Beta&lt;/b&gt; stage and doesn’t have a stable release, so we stick with the original WebDataset as the dependency.&lt;/p&gt;&lt;h3&gt;Without distributed training&lt;/h3&gt;&lt;p&gt;We first train ResNet-50 on a single GPU to get a baseline performance:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="3 PyTorch training 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_PyTorch_training_121522.1000064120000310.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;From the result we can see that, when training on a single GPU, using data on Cloud Storage takes about 2x the time of using a local disk. Keep this baseline in mind; we will apply multiple methods to improve performance step by step.&lt;/p&gt;&lt;h3&gt;DataParallel (DP)&lt;/h3&gt;&lt;p&gt;The &lt;a href="https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html" target="_blank"&gt;DataParallel&lt;/a&gt; strategy is the simplest method PyTorch offers for single-machine multi-GPU training, requiring the smallest code change.
In fact, it is as small as a one-line code change:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre class="lang-py"&gt;model = torch.nn.DataParallel(model)&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;We train the ResNet-50 on a single node with 4 GPUs using the DP strategy:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="4 PyTorch training 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_PyTorch_training_121522.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;After applying DP on 4 GPUs, we can see that:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Training with data on the local disk gets 3x faster (from 489s to 157s).&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Training with data on Cloud Storage gets a little faster (from 804s to 738s).&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;It’s apparent that distributed training with data on Cloud Storage becomes input bound, waiting for data to be read because of the network bottleneck.&lt;/p&gt;&lt;h3&gt;DistributedDataParallel (DDP)&lt;/h3&gt;&lt;p&gt;&lt;a href="https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html" target="_blank"&gt;DistributedDataParallel&lt;/a&gt; is more sophisticated and powerful than DataParallel. It’s recommended to use DDP over DP, despite the added complexity, because DP is single-process multi-threaded and suffers from Python GIL contention, while DDP fits more scenarios such as multi-node and model-parallel training.
Here we experimented with DDP on a single node with 4 GPUs, where each GPU is handled by an individual process.&lt;/p&gt;&lt;p&gt;We use the &lt;code&gt;nccl&lt;/code&gt; backend to initialize the process group for DDP and construct the model:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre class="lang-py"&gt;import torch
import torch.distributed as dist

dist.init_process_group(
    backend='nccl', init_method='env://',
    world_size=4, rank=rank)

model = torch.nn.parallel.DistributedDataParallel(model)&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;We train the ResNet-50 on 4 GPUs using the DDP strategy and WebDataset:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="5 PyTorch training 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_PyTorch_training_121522.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;After enabling DDP on 4 GPUs, we can see that:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Training with data on the local disk gets even faster than with DP (from 157s to 134s).&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Training with data on Cloud Storage improves considerably (from 738s to 432s), but is still about 3x slower than using a local disk.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Training with data on Cloud Storage gets dramatically faster (from 432s to 133s) when the source files are in WebDataset format, which is nearly as fast as training with data on the local disk (see the loader sketch below).&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The input-bound problem is largely relieved with DDP, which is expected because there is no longer Python GIL contention when reading data. And despite the added data preparation work, sharding data with WebDataset benefits performance by removing network communication overhead. Finally, DDP and WebDataset together improve training performance by 6x (from 804s to 133s) compared to non-distributed training on individual smaller files.&lt;/p&gt;
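&lt;p&gt;For reference, here is a minimal sketch of the kind of WebDataset input pipeline used above (the shard pattern and transforms are illustrative, not the exact code from the accompanying repo):&lt;/p&gt;&lt;pre class="lang-py"&gt;import torch
import webdataset as wds
from torchvision import transforms

# Hypothetical shard layout; gcsfuse exposes the bucket under /gcs and
# the {..} brace range expands to the individual tar shards.
shards = "/gcs/test-bucket/shards/imagenet-val-{000000..000049}.tar"

preprocess = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])

# Each worker streams whole tar shards sequentially, so reads become a
# few large sequential requests instead of many small random ones.
dataset = (
    wds.WebDataset(shards)
    .shuffle(1000)              # shuffle within an in-memory buffer
    .decode("pil")              # decode image bytes to PIL images
    .to_tuple("jpg", "cls")     # (image, label) from the tar members
    .map_tuple(preprocess, lambda label: label)
)

loader = torch.utils.data.DataLoader(dataset, batch_size=64, num_workers=4)&lt;/pre&gt;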
&lt;h3&gt;FullyShardedDataParallel (FSDP)&lt;/h3&gt;&lt;p&gt;&lt;a href="https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/" target="_blank"&gt;FullyShardedDataParallel&lt;/a&gt; wraps model layers into FSDP units. It gathers full parameters before the forward and backward operations and runs reduce-scatter to synchronize gradients. It achieves lower peak memory usage than DDP in some configurations.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre class="lang-py"&gt;import functools

from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

# policy to recursively wrap layers with FSDP
fsdp_auto_wrap_policy = functools.partial(
    size_based_auto_wrap_policy,
    min_num_params=100)

# construct the model to shard model parameters
# across data parallel workers
model = torch.distributed.fsdp.FullyShardedDataParallel(
    model,
    auto_wrap_policy=fsdp_auto_wrap_policy)&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;We train the ResNet-50 on 4 GPUs using the FSDP strategy and WebDataset:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="6 PyTorch training 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/6_PyTorch_training_121522.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;We can see that using FSDP achieves similar training performance to DDP in this configuration on a single node with 4 GPUs.&lt;/p&gt;&lt;p&gt;Comparing performance across these different training strategies, with and without the WebDataset format, we see an overall 6x performance improvement with data on Cloud Storage when using WebDataset together with the DistributedDataParallel or FullyShardedDataParallel strategies. The training performance with data on Cloud Storage is then similar to training with data on a local disk.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="7 PyTorch training 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/7_PyTorch_training_121522.0427027308410533.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h2&gt;Tracking with Vertex AI TensorBoard and Experiments&lt;/h2&gt;&lt;p&gt;As you have seen so far, we carried out performance improvement trials step by step, which required running experiments with several configurations and tracking their settings and outcomes. &lt;a href="https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments"&gt;Vertex AI Experiments&lt;/a&gt; enables seamless experimentation along with tracking. You can track parameters, and visualize and compare the performance metrics of your model and pipeline experiments.&lt;/p&gt;&lt;p&gt;You use the &lt;a href="https://cloud.google.com/vertex-ai/docs/start/client-libraries#python"&gt;Vertex AI Python SDK&lt;/a&gt; to create an experiment and log the parameters, metrics, and artifacts associated with experiment runs. The SDK provides a handy initialization method to create a TensorBoard instance using &lt;a href="https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview"&gt;Vertex AI TensorBoard&lt;/a&gt; for logging model time series metrics.
For example, we tracked training loss, validation accuracy and training run times for each epoch.&lt;/p&gt;&lt;p&gt;Below is the snippet to start an experiment, log model parameters, run the training job and track metrics at the end of the training session:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-code"&gt;&lt;pre class="lang-py"&gt;import pandas as pd
from google.cloud import aiplatform

# Create TensorBoard instance and initialize Vertex AI client
TENSORBOARD_RESOURCE_NAME = aiplatform.Tensorboard.create()
aiplatform.init(project=PROJECT_ID,
                location=REGION,
                experiment=EXPERIMENT_NAME,
                experiment_tensorboard=TENSORBOARD_RESOURCE_NAME,
                staging_bucket=BUCKET_URI)

# start the experiment run
aiplatform.start_run(EXPERIMENT_RUN_NAME)

# log parameters to the experiment
aiplatform.log_params(exp_params)

# create the custom training job
job = aiplatform.CustomJob(
    display_name=DISPLAY_NAME,
    worker_pool_specs=WORKER_SPEC,
    staging_bucket=BUCKET_URI,
    base_output_dir=BASE_OUTPUT_DIR
)

# run the job, streaming time series metrics to the TensorBoard instance
job.run(
    service_account=SERVICE_ACCOUNT,
    tensorboard=TENSORBOARD_RESOURCE_NAME
)

# log the final metrics to the experiment
metrics_df = pd.read_json(metrics_path, typ='series')
aiplatform.log_metrics(metrics_df[metrics_cols].to_dict())

# stop the run
aiplatform.end_run()&lt;/pre&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The SDK supports a handy &lt;a href="https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_get_experiment_df"&gt;&lt;code&gt;get_experiment_df&lt;/code&gt;&lt;/a&gt; method to return experiment run information as a Pandas dataframe. Using this dataframe, we can now effectively compare performance between different experiment configurations:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="8 PyTorch training 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/8_PyTorch_training_121522.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Since the experiment is backed by Vertex AI TensorBoard, you can open TensorBoard from the console for deeper analysis. For the experiment, we modified the training code to add TensorBoard scalars for the metrics we were interested in.&lt;/p&gt;
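&lt;p&gt;A minimal sketch of that instrumentation (the AIP_TENSORBOARD_LOG_DIR environment variable is our assumption for where Vertex AI expects TensorBoard logs when a TensorBoard instance is attached; the metric values here are placeholders):&lt;/p&gt;&lt;pre class="lang-py"&gt;import os

from torch.utils.tensorboard import SummaryWriter

# Fall back to a local directory when running outside Vertex AI.
log_dir = os.environ.get("AIP_TENSORBOARD_LOG_DIR", "./logs")
writer = SummaryWriter(log_dir)

for epoch in range(10):
    # Placeholder values standing in for real per-epoch statistics.
    train_loss = 1.0 / (epoch + 1)
    val_accuracy = 0.5 + 0.04 * epoch
    epoch_seconds = 133.0
    writer.add_scalar("loss/train", train_loss, epoch)
    writer.add_scalar("accuracy/val", val_accuracy, epoch)
    writer.add_scalar("time/epoch_seconds", epoch_seconds, epoch)

writer.close()&lt;/pre&gt;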
&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="9 PyTorch training 121522.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/9_PyTorch_training_121522.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h2&gt;Conclusion&lt;/h2&gt;&lt;p&gt;In this post, we demonstrated how PyTorch training can become input bound when data is read from Google Cloud Storage, and showed approaches to improve performance by comparing distributed training strategies and introducing the WebDataset format.&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Use WebDataset to shard individual files, which improves sequential I/O performance by reducing network bottlenecks. &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;When training on multiple GPUs, choose the &lt;code&gt;DistributedDataParallel&lt;/code&gt; or &lt;code&gt;FullyShardedDataParallel&lt;/code&gt; distributed training strategies for better performance. &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;For large-scale datasets that you cannot download to the local disk, use &lt;code&gt;gcsfuse&lt;/code&gt; to simplify data access to Cloud Storage from Vertex AI, and use WebDataset to shard individual files and reduce network overhead. &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Vertex AI improves productivity when carrying out experiments while offering flexibility, security and control. Vertex AI Training custom jobs make it easy to run experiments with several training configurations, GPU shapes and machine specs.
Combined with Vertex AI Experiments and Vertex AI TensorBoard, you can track parameters, visualize and compare the performance metrics of your model and pipeline experiments.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;You can find the accompanying code for this blog post on this &lt;a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/main/community-content/pytorch_efficient_training" target="_blank"&gt;GitHub Repo&lt;/a&gt;.&lt;br/&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Thu, 15 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/efficient-pytorch-training-with-vertex-ai/</guid><category>Google Cloud</category><category>AI &amp; Machine Learning</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Efficient PyTorch training with Vertex AI</title><description>Introducing methods to improve the performance of PyTorch training with cloud data and integrates to these methods Vertex AI.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/efficient-pytorch-training-with-vertex-ai/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Xiang Xu</name><title>Software Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Rajesh Thallam</name><title>Machine Learning Solutions Architect</title><department></department><company></company></author></item><item><title>Using Vertex AI to build an industry leading Peer Group Benchmarking solution</title><link>https://cloud.google.com/blog/products/ai-machine-learning/using-vertex-ai-for-peer-group-benchmarking-in-capital-markets/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The modern world of financial markets is fraught with volatility and uncertainty. Market participants and members are rethinking the way they approach problems and rapidly changing the way they do business. Access to models, usage patterns, and data has become key to keeping up with ever evolving markets. &lt;/p&gt;&lt;p&gt;One of the biggest challenges firms face in futures and options trading is determining how they benchmark against their competitors. Market participants are continually looking for ways to improve performance, identifying what happened, why it happened, and any associated risks. Leveraging the latest technologies in automation and artificial intelligence, many organizations are using Vertex AI to build a solution around &lt;a href="https://www.investopedia.com/terms/p/peer-group.asp" target="_blank"&gt;peer group&lt;/a&gt; benchmarking and explainability. &lt;/p&gt;&lt;h2&gt;Introduction&lt;/h2&gt;&lt;p&gt;Using the speed and efficiency of Vertex AI, we have developed a solution that will allow market participants to identify similar trading group patterns and assess performance relative to their competition. Machine learning (ML) models for dimensionality reduction, clustering, and explainability are trained to detect patterns and transform data into valuable insights. 
This blog post goes over these models in detail, as well as the ML operations (MLOps) pipeline used to train and deploy these models at scale.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="1 - Introduction Image.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_Introduction_Image.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;A series of successive models are used that feed predictive results as training data into the next model (e.g. dimensionality reduction -&amp;gt; clustering -&amp;gt; explainability). This requires a robust automated system for training and maintaining models and data, and provides an ideal use case for the MLOps capabilities of Vertex AI. &lt;/p&gt;&lt;h2&gt;The Solution&lt;/h2&gt;&lt;h3&gt;Data&lt;/h3&gt;&lt;p&gt;A market analytics dataset was used which contains market participant trading metrics aggregated and averaged across a 3-month period. This dataset contains a high number of dimensions. Specific features include buying and selling counts, trade and order quantities, types, first and last fill times, aggressive vs. passive trading indicators, and a number of other features related to trading behavior.&lt;/p&gt;&lt;h3&gt;Modeling&lt;/h3&gt;&lt;p&gt;&lt;b&gt;Dimensionality Reduction&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Clustering in high dimensional space presents a challenge, particularly for distance-based clustering algorithms. As the number of dimensions grows, the distances between points in the dataset converge and become more similar. This distance concentration problem makes it difficult to perform typical cluster analysis on highly dimensional data. &lt;/p&gt;&lt;p&gt;For the task of dimensionality reduction, an Artificial Neural Network (ANN) Autoencoder was used to learn a supervised similarity metric for each market participant in the dataset. This autoencoder takes in each market participant and their associated features. It pushes the information through a hidden layer that is constrained in size, forcing the network to learn how to condense information down into a small encoded representation.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 "&gt;&lt;img alt="2 - Dimensionality Reduction.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_-_Dimensionality_Reduction.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The constrained layer is a vector (z) in latent space, where each element in the vector is a learned reduction of the original market participant features (X), thus allowing dimensionality reduction by simply applying X * z. This results in a new distribution of customer data q(X’ | X) where the distribution is constrained in size to the shape of z. By minimizing the reconstruction error between the initial input X and the autoencoder’s reconstructed output X’, we can balance the overall size of the similarity space (the number of latent dimensions) and the amount of information lost.&lt;/p&gt;
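&lt;p&gt;A minimal sketch of the described bottleneck architecture (the layer sizes and feature count are illustrative; the actual model is not published in this post):&lt;/p&gt;&lt;pre class="lang-py"&gt;import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, n_features, n_latent=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, n_latent))    # the constrained layer z
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 32), nn.ReLU(),
            nn.Linear(32, n_features))  # the reconstruction X'

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder(n_features=40)
loss_fn = nn.MSELoss()  # reconstruction error between X and X'&lt;/pre&gt;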
&lt;p&gt;The resulting output of the autoencoder is a 2-dimensional learned representation of the highly dimensional data.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Clustering&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Experiments were conducted to determine the optimal clustering algorithm, number of clusters, and hyperparameters. A number of models were compared, including density-based spatial clustering of applications with noise (DBSCAN), agglomerative clustering, Gaussian mixture model (GMM), and k-means. Using &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html" target="_blank"&gt;silhouette score&lt;/a&gt; as an evaluation criterion, it was ultimately determined that k-means performed best for clustering on the dimensionally reduced data. &lt;/p&gt;&lt;p&gt;The k-means algorithm is an iterative refinement technique that aims to separate data points into n groups of equal variance. Each of these groups is defined by a cluster centroid, which is the mean of the data points in the cluster. Cluster centroids are initially randomly generated, and iteratively reassigned until the within-cluster sum-of-squares is minimized. The within-cluster sum-of-squares criterion is shown below.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="3 - Clustering.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_-_Clustering.1000064520000368.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;b&gt;Explainability&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/explainable-ai"&gt;Explainable AI&lt;/a&gt; (XAI) aims to provide insights into why a model predicts in a certain way. For this use case, XAI models are used to explain why a market participant was placed into a particular peer group. This is achieved through feature importance: e.g., for each market participant, the top contributing factors toward a peer group cluster assignment. &lt;/p&gt;&lt;p&gt;Deriving explainability from clustering models is somewhat difficult. Clustering is an unsupervised learning problem, which means there are no labels or “ground truth” for the model to analyze. Distance-based clustering algorithms instead rely on creating labels for the data points based on their relative positioning to each other. These labels are assigned as part of the prediction by the k-means algorithm - each point in the dataset is given a peer group assignment that associates it with a particular cluster. &lt;/p&gt;&lt;p&gt;XAI models can be trained on top of k-means by fitting a classifier to these peer group cluster assignments. Using the cluster assignments as labels turns the problem into supervised learning, whereby the end goal is to determine feature importance for the classifier.
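&lt;/p&gt;&lt;p&gt;A minimal sketch of this classifier-on-clusters pattern with the scikit-learn and SHAP libraries (synthetic stand-in data; the post does not name the classifier, so a random forest is assumed here):&lt;/p&gt;&lt;pre class="lang-py"&gt;import numpy as np
import shap
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the dimensionally reduced participant features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

# Unsupervised step: k-means peer-group assignments become the labels.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Supervised step: fit a classifier to the cluster assignments, then
# explain it with Shapley values (marginal per-feature contributions).
clf = RandomForestClassifier(random_state=0).fit(X, labels)
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X)&lt;/pre&gt;&lt;p&gt;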
Shapley values, which capture the marginal contribution of each feature to the final classification prediction, are used for feature importance.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="4 - Explainability.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_-_Explainability.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Shapley values are ranked to provide market participants with a powerful tool to analyze what features are contributing the most to their peer group assignments.&lt;/p&gt;&lt;h3&gt;MLOps&lt;/h3&gt;&lt;p&gt;&lt;a href="https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf" target="_blank"&gt;MLOps&lt;/a&gt; is an ML engineering culture and practice that aims to unify ML system development (Dev) and ML system operation (Ops). Using Vertex AI, a fully functioning MLOps pipeline has been constructed that trains and explains peer group benchmarking models. This pipeline is complete with automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment, and infrastructure management. It also includes a comprehensive approach for continuous integration / continuous delivery (CI/CD). Vertex AI’s end-to-end platform was used to meet these MLOps needs, including:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Distributed training jobs to construct ML models at scale using Vertex AI Pipelines&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Hyperparameter tuning jobs to quickly tune complex models using Vertex AI Vizier&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Model versioning using Vertex AI Model Registry&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Batch prediction jobs using Vertex AI Prediction&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Tracking metadata related to training jobs using Vertex ML Metadata&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Tracking model experimentation using Vertex AI Experiments&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Storing and versioning training data from prediction jobs using Vertex AI Feature Store&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Data validation and monitoring using TensorFlow Data Validation (TFDV)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The MLOps pipeline is broken down into five core areas:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;CI/CD &amp;amp; Orchestration&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Data Ingestion &amp;amp; Preprocessing&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Dimensionality Reduction&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Clustering&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Explainability&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="5 - MLOps.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_-_MLOps.1980162738603005.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The CI/CD and orchestration layer was implemented using Vertex AI Pipelines, Cloud Source Repositories (CSR), Artifact Registry, and Cloud Build. 
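&lt;/p&gt;&lt;p&gt;To give a feel for how the successive stages can be chained, here is a minimal sketch using the Kubeflow Pipelines (KFP) v2 SDK that Vertex AI Pipelines runs; the component names, base images, and elided bodies are placeholders, not the production pipeline:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
# Sketch: successive model stages chained as a Vertex AI (KFP v2)
# pipeline. Component bodies are elided; names are placeholders.
from kfp import compiler, dsl

@dsl.component(base_image="python:3.10")
def reduce_dimensions(source_table: str, embeddings: dsl.Output[dsl.Dataset]):
    ...  # train the autoencoder, write 2-D embeddings to embeddings.path

@dsl.component(base_image="python:3.10")
def cluster_participants(embeddings: dsl.Input[dsl.Dataset],
                         assignments: dsl.Output[dsl.Dataset]):
    ...  # fit k-means on the embeddings, write peer-group assignments

@dsl.component(base_image="python:3.10")
def explain_assignments(assignments: dsl.Input[dsl.Dataset]):
    ...  # fit the classifier and compute Shapley values

@dsl.pipeline(name="peer-group-benchmarking")
def peer_group_pipeline(source_table: str):
    reduced = reduce_dimensions(source_table=source_table)
    clustered = cluster_participants(embeddings=reduced.outputs["embeddings"])
    explain_assignments(assignments=clustered.outputs["assignments"])

compiler.Compiler().compile(peer_group_pipeline, "pipeline.json")
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In the real pipeline the intermediate outputs land in Vertex AI Feature Store, as described below; the sketch only shows the data dependencies that make the stages run in order.&lt;/p&gt;&lt;p&gt;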
When changes are made to the code base, Cloud Build triggers automatically run unit tests, build containers, push them to Artifact Registry, and compile and run the Vertex AI pipeline. &lt;/p&gt;&lt;br/&gt;&lt;p&gt;The pipeline is a sequence of connected components that run successive training and prediction jobs; the outputs from one model are stored in Vertex AI Feature Store and used as inputs into the next model. The end result of this pipeline is a series of trained models for dimensionality reduction, clustering, and explainability, all stored in Vertex AI Model Registry. Peer groups and explainable results are written to Feature Store and BigQuery, respectively.&lt;/p&gt;&lt;h2&gt;Working with AI Services in Google Cloud’s Professional Services Organization (PSO)&lt;/h2&gt;&lt;p&gt;AI Services leads the transformation of enterprise customers and industries with cloud solutions. We are seeing widespread application of AI across Financial Services and Capital Markets. Vertex AI provides a unified platform for training and deploying models and helps enterprises more effectively make data-driven decisions. You can learn more about our work at: &lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/"&gt;Google Cloud&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/vertex-ai"&gt;Vertex AI&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/consulting"&gt;Google Cloud consulting services&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href="https://services.google.com/fh/files/misc/artificial_intelligence_sheet.pdf" target="_blank"&gt;Custom AI-as-a-Service&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;hr/&gt;&lt;p&gt;&lt;i&gt;&lt;sup&gt;This post was edited with help from Mike Bernico, Eugenia Inzaugarat, Ashwin Mishra, and the rest of the delivery team. I would also like to thank core team members Rochak Lamba, Anna Labedz, and Ravinder Lota.&lt;/sup&gt;&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-related_article_tout"&gt;&lt;div class="uni-related-article-tout h-c-page"&gt;&lt;section class="h-c-grid"&gt;&lt;a class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker" data-analytics='{
                       "event": "page interaction",
                       "category": "article lead",
                       "action": "related article - inline",
                       "label": "article: {slug}"
                      }' href="https://gweb-cloudblog-publish.appspot.com/products/ai-machine-learning/google-cloud-vertex-ai-accelerates-machine-learning/"&gt;&lt;div class="uni-related-article-tout__inner-wrapper"&gt;&lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;&lt;div class="uni-related-article-tout__content-wrapper"&gt;&lt;div class="uni-related-article-tout__image-wrapper"&gt;&lt;div class="uni-related-article-tout__image" style="background-image: url('https://storage.googleapis.com/gweb-cloudblog-publish/images/applied_ml_summit.max-500x500.jpg')"&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="uni-related-article-tout__content"&gt;&lt;h4 class="uni-related-article-tout__header h-has-bottom-margin"&gt;Accelerate the deployment of ML in production with Vertex AI&lt;/h4&gt;&lt;p class="uni-related-article-tout__body"&gt;Google Cloud expands Vertex AI to help customers accelerate deployment of ML models into production.&lt;/p&gt;&lt;div class="cta module-cta h-c-copy uni-related-article-tout__cta muted"&gt;&lt;span class="nowrap"&gt;Read Article&lt;svg class="icon h-c-icon" role="presentation"&gt;&lt;use xlink:href="#mi-arrow-forward" xmlns:xlink="http://www.w3.org/1999/xlink"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/a&gt;&lt;/section&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Thu, 15 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/using-vertex-ai-for-peer-group-benchmarking-in-capital-markets/</guid><category>Financial Services</category><category>Google Cloud</category><category>AI &amp; Machine Learning</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/aiml2022_PO1vxqJ.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Using Vertex AI to build an industry leading Peer Group Benchmarking solution</title><description>Leveraging the latest technologies in artificial intelligence, Vertex AI is being used to build an industry leading Peer Group Benchmarking solution.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/aiml2022_PO1vxqJ.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/using-vertex-ai-for-peer-group-benchmarking-in-capital-markets/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sean Rastatter</name><title>AI Engineer</title><department></department><company></company></author></item><item><title>How Vodafone Hungary migrated their data platform to Google Cloud</title><link>https://cloud.google.com/blog/products/data-analytics/vodafone-hungary-data-platform-migration/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Vodafone is currently the second-largest telecommunications company in Hungary, and recently acquired UPC Hungary to extend its previous mobile services with a fixed-line portfolio. Following the acquisition, Vodafone Hungary serves approximately 3.8 million residential and business subscribers. This story is about how Vodafone Hungary benefited from moving its data and analytics platform to Google Cloud. &lt;/p&gt;&lt;p&gt;To support this acquisition, Vodafone Hungary went through a large business transformation that required changes in many IT systems to create a future-ready IT architecture. 
The goal of the transformation was to provide future-proof services for customers in all segments of the Hungarian mobile market. During this transformation, Vodafone’s core IT systems changed, which created the challenge of building a new data and analytics environment in a fast and effective way. During the project, data had to be moved from the previous on-premises analytics service to the cloud. This was achieved by migrating existing data and merging it with data coming from the new systems in a very short timeframe of around six months. During the project there were several changes in the source system data structure that needed to be adapted quickly on the analytics side to reach the Go Live date.&lt;/p&gt;&lt;h3&gt;Data and analytics in Google Cloud&lt;/h3&gt;&lt;p&gt;To answer this challenge, Vodafone Hungary decided to partner with Google Cloud. The partnership was based on implementing a full metadata-driven analytics environment in a multi-vendor project using cutting-edge Google Cloud solutions such as &lt;a href="https://cloud.google.com/data-fusion"&gt;Data Fusion&lt;/a&gt; and &lt;a href="https://cloud.google.com/bigquery?utm_source=google&amp;amp;utm_medium=cpc&amp;amp;utm_campaign=na-US-all-en-dr-bkws-all-all-trial-p-dr-1011347&amp;amp;utm_content=text-ad-none-any-DEV_c-CRE_622025513236-ADGP_Desk%20%7C%20BKWS%20-%20PHR%20%7C%20Txt%20~%20Data%20Analytics%20~%20BigQuery_Big%20Query-KWID_43700073023088462-kwd-333270004738&amp;amp;utm_term=KW_bigquery%20google-ST_bigquery%20google&amp;amp;gclid=CjwKCAjw2OiaBhBSEiwAh2ZSP3RnJNh0CSfGk_RxZUOYbjDSVfpf2VpJm7BkRqX3qsu4HD_yYtQ0qxoC8isQAvD_BwE&amp;amp;gclsrc=aw.ds"&gt;BigQuery&lt;/a&gt;. The Vodafone Hungary Data Engineering team gained significant knowledge of the new Google Cloud solutions, which meant the team was able to support the company’s long-term initiatives.&lt;/p&gt;&lt;p&gt;Based on data loaded by this metadata-driven framework, Vodafone Hungary built up a sophisticated data and analytics service on Google Cloud that helped it become a data-driven company.&lt;/p&gt;&lt;p&gt;By analyzing data from throughout the company with the help of Google Cloud, Vodafone was able to gain insights that provided a clearer picture of the business. They now have a holistic view of customers across all segments. &lt;/p&gt;&lt;p&gt;Along with these core KPIs, the advanced analytics and Big Data models built on top of this data and analytics service ensure that customers get more personalized offers than was previously possible. It used to be the case that a business requestor needed to define a project to send new data to the data warehouse. The new metadata-driven framework allows the internal data engineering team to onboard new systems and new data in a very short time (within days), thus speeding up the BI development and decision-making process.&lt;/p&gt;&lt;h3&gt;Technical solution&lt;/h3&gt;&lt;p&gt;The solution uses several technical innovations to meet the requirements of the business. The local data extraction solution is built on top of CDAP and Hadoop technologies, written as CDAP pipelines, PySpark jobs, and Unix shell scripts. In this layer, the system gets data from several sources in several formats, including database extracts and different file types. The system needs to manage around 1,900 loads on a daily basis, with most data arriving in a five-hour time frame. 
Therefore, the framework needs to be a highly scalable system that can handle the high loading peaks without generating unexpected cost during the low peaks.&lt;/p&gt;&lt;p&gt;Once collected, the data from the extraction layer goes to the cloud in an encrypted and anonymized format. In the cloud, the extracted data lands in a &lt;a href="https://cloud.google.com/storage"&gt;Google Cloud Storage&lt;/a&gt; bucket. When a file arrives, it triggers the Data Fusion pipelines in an event-based way by using the Log Sink, &lt;a href="https://cloud.google.com/pubsub"&gt;Pub/Sub&lt;/a&gt;, &lt;a href="https://cloud.google.com/functions"&gt;Cloud Function&lt;/a&gt;, and REST API. After triggering the data load, &lt;a href="https://cloud.google.com/composer"&gt;Cloud Composer&lt;/a&gt; controls the execution of the metadata-driven, template-based, auto-generated DAGs. Data Fusion ephemeral clusters were chosen as they adapt to the size of each data pipeline while also controlling costs during low peaks. &lt;/p&gt;&lt;p&gt;The principle of limited responsibility is important. Each component has a relatively limited range of responsibilities, which means that Cloud Function, DAGs, and Pipelines contain the minimum responsibilities and logic that is necessary to finish their own tasks.&lt;/p&gt;&lt;p&gt;After loading this data into a raw layer, several tasks are triggered in Data Fusion to build up a historical aggregated layer. The Vodafone Hungary data team can use this to create their own reports in a Qlik environment (which also runs on the Google Cloud environment) and build up Big Data and advanced analytical models using the Vodafone standard Big Data framework. &lt;/p&gt;&lt;p&gt;The most critical point of the architecture is the custom triggering function, which handles scheduling and execution of processes. The process triggers more than 1,900 DAGs per day, while also moving and processing around 1 TB of anonymized data per day.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Vodafone Hungary 121422.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Vodafone_Hungary_121422.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;The way forward&lt;/h3&gt;&lt;p&gt;After stabilization, optimization of the processes began, taking into account cost and efficiency levels. The architecture was upgraded to use Airflow 2 and Composer 2 as these systems became available. Moving the architecture to these versions increased performance and manageability. Going forward, Vodafone Hungary will continue searching for even more ways to improve processes with the help of the Google Support team. &lt;/p&gt;&lt;p&gt;To support fast and effective processing, Vodafone Hungary recently decided to move the control tables to Google &lt;a href="https://cloud.google.com/spanner"&gt;Cloud Spanner&lt;/a&gt; and keep only the business data in BigQuery. 
This delivered a great improvement in processing.&lt;/p&gt;&lt;p&gt;In the analytics area, Vodafone Hungary plans to move to more advanced and cutting-edge technologies, which will allow the Big Data team to improve their performance by using Google Cloud native machine learning tools such as &lt;a href="https://cloud.google.com/automl"&gt;AutoML&lt;/a&gt; and Vertex AI. These will further improve the effectiveness of the targeted campaigns and offer the benefit of advanced data analysis.&lt;/p&gt;&lt;p&gt;To get started, we recommend you check out &lt;a href="https://console.cloud.google.com/bigquery?_ga=2.979249.1030392842.1669588001-42349575.1669587954"&gt;BigQuery's free trial&lt;/a&gt; and &lt;a href="https://cloud.google.com/bigquery/docs/migration-assessment"&gt;BigQuery's Migration Assessment&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Wed, 14 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/vodafone-hungary-data-platform-migration/</guid><category>Google Cloud</category><category>Data Analytics</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/teleco_2022_B5LTQfV.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How Vodafone Hungary migrated their data platform to Google Cloud</title><description>How Vodafone Hungary migrated their data platform to Google Cloud.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/teleco_2022_B5LTQfV.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/vodafone-hungary-data-platform-migration/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Gergely Szalai</name><title>Senior Manager, Data Engineering, Vodafone</title><department></department><company></company></author></item><item><title>Carbon Health transforms operating outcomes with Connected Sheets for Looker</title><link>https://cloud.google.com/blog/products/data-analytics/connected-sheets-for-looker-powers-carbon-health-data-analytics/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Everyone wants affordable, quality healthcare but not everyone has it. A 2021 report by the Commonwealth Fund ranked the U.S. in last place among 11 high-income countries in healthcare access.&lt;sup&gt;1&lt;/sup&gt; Carbon Health is working to change that. We are doing so by combining the best of virtual care, in-person visits, and technology to support patients with their everyday physical and mental health needs.&lt;br/&gt;&lt;/p&gt;&lt;h3&gt;Rethinking how data and analytics are accessed at Carbon Health &lt;/h3&gt;&lt;p&gt;Delivering premium healthcare for the masses that's accessible and affordable is an ambitious undertaking. It requires a commitment to operating the business in an efficient and disciplined way. To meet our goals, our teams across the company require detailed, daily insights into operating results.&lt;/p&gt;&lt;p&gt;In the last year, we realized our existing BI platform was inaccessible to most of our employees outside of R&amp;amp;D. Creating the analytics, dashboards, and reports needed by our clinic leaders and executives required direct help from our data scientists. &lt;/p&gt;&lt;p&gt;However, this has all changed since deploying Looker as our new BI platform. 
We initially used Looker to build tables, charts, and graphs that improved how people could access and analyze data about our operating efficiency. As we continued to evaluate how our data and analytics should be experienced by our in-clinic staff, we learned about Connected Sheets for Looker, which has unlocked an entirely new way of sharing insights across the company.&lt;br/&gt;&lt;/p&gt;&lt;h3&gt;A new way to deliver performance reporting and drive results&lt;/h3&gt;&lt;p&gt;Connected Sheets for Looker gives Carbon Health employees who work in Google Sheets—practically everyone—a familiar tool for working with Looker data. For instance, one of our first outputs using the Connected Sheets integration has been a daily and weekly performance push-report for the clinic’s operating leaders, including providers. &lt;/p&gt;&lt;p&gt;Essentially a scorecard, the report tracks the most important KPIs for measuring clinics' successes, including appointment volume, patient satisfaction such as net promoter score (NPS), reviews, phone call answer rates, and even metrics about billing and collections. To provide easy access, we built a workflow through Google Apps Script that takes our daily performance report and automatically emails a PDF to key clinic leaders each morning. &lt;/p&gt;&lt;p&gt;Within the first 30 days of the report's creation, clinic leaders were able to drive noticeable improvements in operating results. For instance, actively tracking clinic volume has enabled us to manage our schedules more effectively, which in turn drives more visits and enables us to better communicate expectations with our patients. Other clinics have dramatically improved their call answer rates by tracking inbound call volume, which has also led to better patient satisfaction. &lt;/p&gt;&lt;h3&gt;Greater accountability, greater collaboration&lt;/h3&gt;&lt;p&gt;As you can imagine, a report that holds people accountable for outcomes in such a visible way can create some anxiety. We've eased those concerns by using the information constructively, with the goal of using reporting as a positive feedback mechanism to bolster open collaboration and identify operational processes that need improvement. For example, data about our call answer rates initiated an investigation that led to an operational redesign of how phones are deployed and managed at more than 120 clinics across the U.S.&lt;/p&gt;&lt;h3&gt;Looker as a scalable solution with endless applications&lt;/h3&gt;&lt;p&gt;We're now rolling out Connected Sheets for Looker to deliver performance push-reporting across all teams at Carbon Health. Additionally, we continue to find new ways to leverage Connected Sheets for Looker to meet other needs of the business. &lt;/p&gt;&lt;p&gt;For instance, we've recently been able to better understand our software costs by analyzing vendor spend from our accounting systems directly in Google Sheets. Going forward, this will allow us to build a basic workflow to monitor subscription spend and employee application usage, which will lead to us saving money on unnecessary licenses and underutilized software. &lt;/p&gt;&lt;p&gt;We've come a long way in the last year. Between Looker and its integration with Google Sheets, we can meet the data needs of all our stakeholders at Carbon Health. Connected Sheets for Looker has been an impactful solution that's going to help us drive measurable results in how we deliver premium healthcare to the masses.&lt;/p&gt;&lt;hr/&gt;&lt;p&gt;&lt;i&gt;&lt;sup&gt;1. 
&lt;a href="https://www.commonwealthfund.org/publications/fund-reports/2021/aug/mirror-mirror-2021-reflecting-poorly" target="_blank"&gt;Mirror, Mirror 2021: Reflecting Poorly&lt;/a&gt;&lt;br/&gt;2. &lt;a href="https://www.forbes.com/sites/katiejennings/2021/07/21/meet-the-immigrant-entrepreneurs-who-raised-350-million-to-rethink-us-primary-care/?sh=444c72572b2c" target="_blank"&gt;Meet The Immigrant Entrepreneurs Who Raised $350 Million To Rethink U.S. Primary Care&lt;/a&gt;&lt;/sup&gt;&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-related_article_tout"&gt;&lt;div class="uni-related-article-tout h-c-page"&gt;&lt;section class="h-c-grid"&gt;&lt;a class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker" data-analytics='{
                       "event": "page interaction",
                       "category": "article lead",
                       "action": "related article - inline",
                       "label": "article: {slug}"
                     }' href="https://gweb-cloudblog-publish.appspot.com/products/infrastructure-modernization/analyze-your-looker-data-through-google-sheets/"&gt;&lt;div class="uni-related-article-tout__inner-wrapper"&gt;&lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;&lt;div class="uni-related-article-tout__content-wrapper"&gt;&lt;div class="uni-related-article-tout__image-wrapper"&gt;&lt;div class="uni-related-article-tout__image" style="background-image: url('https://storage.googleapis.com/gweb-cloudblog-publish/images/Blog-Banner_2880x1200_v12x-1.max-500x500.jpg')"&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="uni-related-article-tout__content"&gt;&lt;h4 class="uni-related-article-tout__header h-has-bottom-margin"&gt;Analyze Looker-modeled data through Google Sheets&lt;/h4&gt;&lt;p class="uni-related-article-tout__body"&gt;Connected Sheets for Looker brings modeled, trusted data into Google Sheets, enabling users to work in a way that is comfortable and conv...&lt;/p&gt;&lt;div class="cta module-cta h-c-copy uni-related-article-tout__cta muted"&gt;&lt;span class="nowrap"&gt;Read Article&lt;svg class="icon h-c-icon" role="presentation"&gt;&lt;use xlink:href="#mi-arrow-forward" xmlns:xlink="http://www.w3.org/1999/xlink"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/a&gt;&lt;/section&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Wed, 14 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/connected-sheets-for-looker-powers-carbon-health-data-analytics/</guid><category>Google Cloud</category><category>Application Modernization</category><category>Data Analytics</category><media:content url="https://storage.googleapis.com/gweb-cloudblog-publish/images/healthcare_2022_CWzvzU5.max-600x600.jpg" width="540" height="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Carbon Health transforms operating outcomes with Connected Sheets for Looker</title><description>Carbon Health, a hybrid healthcare provider, transforms operating outcomes with performance management reporting through Connected Sheets for Looker.</description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/healthcare_2022_CWzvzU5.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/connected-sheets-for-looker-powers-carbon-health-data-analytics/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Christoffer Prompovitch</name><title>Product Lead, Carbon Health</title><department></department><company></company></author></item><item><title>Minimal Downtime Migrations to Cloud Spanner with HarbourBridge 2.0</title><link>https://cloud.google.com/blog/topics/developers-practitioners/minimal-downtime-migrations-cloud-spanner-harbourbridge-20/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Spanner is a fully managed, strongly consistent and highly available database providing up to 99.999% availability. It is also very easy to create your Spanner instance and point your application to it. But what if you want to migrate your schema and data from another database to Cloud Spanner? 
The common challenges with database migrations are ensuring high throughput of data transfer and high availability of your application with minimal downtime, all enabled by a user-friendly migration solution. &lt;/p&gt;&lt;p&gt;Today, we are excited to announce the launch of &lt;a href="https://github.com/cloudspannerecosystem/harbourbridge" target="_blank"&gt;HarbourBridge 2.0&lt;/a&gt; (Preview), an easy-to-use open source migration tool, now with enhanced capabilities for schema and data migrations with minimal downtime.&lt;/p&gt;&lt;p&gt;This blog demonstrates the migration of schema and data for an application from MySQL to Spanner using HarbourBridge.&lt;/p&gt;&lt;h3&gt;About HarbourBridge&lt;/h3&gt;&lt;p&gt;&lt;a href="https://github.com/cloudspannerecosystem/harbourbridge" target="_blank"&gt;HarbourBridge&lt;/a&gt; is an easy-to-use open source tool, which gives you highly detailed schema assessments and recommendations and allows you to perform migrations with minimal downtime. It just lets you point, click, and trigger your schema and data migrations. It provides a unified interface for the migration, giving users the flexibility to modify the generated Spanner schema and run an end-to-end migration from a single interface. It provides the ability to edit table details like columns, primary keys, foreign keys, and indexes, and it provides insights into schema conversion performance while highlighting important issues and suggestions.&lt;/p&gt;&lt;h3&gt;What's new in HarbourBridge 2.0?&lt;/h3&gt;&lt;p&gt;With this recent launch, you can now do the following:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Perform end-to-end, minimal-downtime, terabyte-scale data migrations &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Get improved schema assessment and recommendations&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Experience ease of access with gcloud integration &lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;We’ll experience the power of some of these cool new add-ons as we walk through the various application migration scenarios in this blog.&lt;/p&gt;&lt;h3&gt;Types of Migration&lt;/h3&gt;&lt;p&gt;Data migration with HarbourBridge is of two types:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Minimal Downtime &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Migration with downtime&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Minimal downtime migration is for real-time transactions and incremental updates in business-critical applications, ensuring business continuity with minimal interruption. Migration with downtime is recommended only for POCs/test environment setups or applications that can tolerate a few hours of downtime.&lt;/p&gt;&lt;h3&gt;Connecting HarbourBridge to source&lt;/h3&gt;&lt;p&gt;There are three ways to connect HarbourBridge to your source database:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Direct connection to Database - for minimal downtime and continuous data migration for a certain time period&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Data dump - for a one-time migration of the source database dump into Spanner &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Session file - to load from a previous HarbourBridge session&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;h3&gt;Migration components of HarbourBridge&lt;/h3&gt;&lt;p&gt;With HarbourBridge you can choose to migrate:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Schema-only &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Data-only &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Both Schema and Data &lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;The image below 
shows, at a high level, the various components involved behind the scenes in a data migration:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Components of HarbourBridge" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_LXZ5vAo.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;To manage a low-downtime migration, HarbourBridge orchestrates the following processes for you. You only have to set up connection profiles from the HarbourBridge UI on the migration page; everything else is handled by HarbourBridge under the hood:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;HarbourBridge sets up a &lt;a href="https://cloud.google.com/storage"&gt;Cloud Storage&lt;/a&gt; bucket to store incoming change events on the source database while the snapshot migration progresses&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;HarbourBridge sets up a Datastream job to bulk load a snapshot of the data and stream incremental writes. &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;HarbourBridge sets up the &lt;a href="https://cloud.google.com/dataflow"&gt;Dataflow&lt;/a&gt; job to migrate the change events into Spanner, which empties the Cloud Storage bucket over time&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Validate that most of the data has been copied over to Spanner, and then stop writing to the source database so that the remaining change events can be applied. This results in a short downtime while Spanner catches up to the source database. Afterward, the application can be cut over to use Spanner as the main database.&lt;/p&gt;&lt;h3&gt;The application&lt;/h3&gt;&lt;p&gt;The use case we created to demonstrate this migration is an application that streams in live (near real-time) T20 cricket match data ball-by-ball and calculates the &lt;a href="https://en.wikipedia.org/wiki/Duckworth%E2%80%93Lewis%E2%80%93Stern_method" target="_blank"&gt;Duckworth Lewis&lt;/a&gt; Target Score (also known as the Par Score) for Team 2, second innings, in case the match is disrupted mid-innings due to rain or other circumstances. This is calculated using the famous Duckworth Lewis Stern (DLS) algorithm and gets updated for every ball in the second innings; that way we will always know what the winning target is, in case the match gets interrupted and is not continued thereafter. There are several scenarios in cricket that use the DLS algorithm for determining the target or winning score. &lt;/p&gt;&lt;p&gt;&lt;b&gt;MySQL Database&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this use case, we are using Cloud SQL for MySQL to house the ball-by-ball data being streamed in. The DLS Target client application streams data into MySQL database tables, which will be migrated to Spanner. &lt;/p&gt;&lt;p&gt;&lt;b&gt;Application Migration Architecture&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In this migration, our source data is being sent in bulk and in streaming modes to the MySQL table, which is the source of the migration. A Java Cloud Function simulates the ball-by-ball streaming, calculates the Duckworth Lewis Target Score, and updates it in the baseline table. HarbourBridge reads from MySQL and writes (schema and data) into Cloud Spanner. 
&lt;/p&gt;&lt;p&gt;The diagram below represents a high-level architectural overview of the migration process:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Architecture" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Untitled_design_16.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;b&gt;Note:&lt;/b&gt; In our case, the streaming process is simulated with data coming from a CSV into a landing table in MySQL, which then streams match data by pushing it row by row to the baseline MySQL table. This is the table used for further updates and DLS Target calculations.&lt;/p&gt;&lt;h3&gt;Migrating MySQL to Spanner with HarbourBridge&lt;/h3&gt;&lt;p&gt;&lt;b&gt;Set up HarbourBridge &lt;/b&gt;&lt;/p&gt;&lt;p&gt;Run the following two gcloud commands in Cloud Shell:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Install the HarbourBridge component of gcloud by running:&lt;br/&gt;&lt;code&gt;gcloud components install HarbourBridge&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Start the HarbourBridge UI by running:&lt;br/&gt;&lt;code&gt;gcloud alpha spanner migration web&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Your HarbourBridge application should be up and running:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of HarbourBridge page to set up source connection" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image14_xleaN19.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;b&gt;Note&lt;/b&gt;: &lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Before proceeding with the migration, remember to enable the &lt;a href="https://cloud.google.com/datastream/docs/use-the-datastream-api#enable_the_api"&gt;Datastream&lt;/a&gt; and &lt;a href="https://cloud.google.com/dataflow"&gt;Dataflow&lt;/a&gt; &lt;a href="https://cloud.google.com/endpoints/docs/openapi/enable-api#:~:text=Click%20the%20API%20you%20want,about%20the%20API%2C%20click%20Enable."&gt;APIs&lt;/a&gt; from the Google Cloud Console&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Ensure you have &lt;a href="https://cloud.google.com/sql/docs/mysql/create-manage-databases"&gt;Cloud SQL for MySQL&lt;/a&gt; or your own MySQL server created for the source and a Spanner &lt;a href="https://cloud.google.com/spanner/docs/create-manage-instances"&gt;instance&lt;/a&gt; created for the target&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Ensure all source database instance objects are created. 
For access to the DB DDLs, DMLs, and the data CSV file, refer to this git repo &lt;a href="https://github.com/AbiramiSukumaran/harbourbridge-dls-spanner/tree/main/MySQL" target="_blank"&gt;folder&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;For data validation (a post-migration step), for SELECT queries for both source and Spanner, refer to this git repo &lt;a href="https://github.com/AbiramiSukumaran/harbourbridge-dls-spanner/tree/main/Data%20Validation" target="_blank"&gt;folder&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Ensure the Cloud Function is created and deployed (for streaming simulation and DLS Target score calculation). For the source code, refer to the git repo &lt;a href="https://github.com/AbiramiSukumaran/harbourbridge-dls-spanner/tree/main/Cloud%20Functions%20Project" target="_blank"&gt;folder&lt;/a&gt;. You can learn how to deploy a Java function to Cloud Functions &lt;a href="https://cloud.google.com/functions/docs/create-deploy-gcloud#deploying_the_function"&gt;here&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Also ensure that your proxy is set up and running when trying to connect to the source from HarbourBridge. If you are using Cloud SQL for MySQL, you can ensure that the proxy is running by executing the following command in Cloud Shell:&lt;br/&gt;&lt;code&gt;./cloud_sql_proxy -instances=&amp;lt;&amp;lt;Project-id:Region:instance-name&amp;gt;&amp;gt;=tcp:&amp;lt;&amp;lt;3306&amp;gt;&amp;gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br/&gt;&lt;p&gt;&lt;b&gt;Connect to the source&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Of the three modes of connecting to the source, we will use the “Connect to database” method to establish the connection:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot selecting “Connect to database” option" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image6_KM8MQbt.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;i&gt;Provide the connection credentials and hit connect:&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot with connection details entered" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image12_w5fmNSB.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;You are now connected to the source and HarbourBridge will land you on the next step of migration.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Schema Assessment and Configuration&lt;/b&gt;&lt;/p&gt;&lt;p&gt;At this point, you get to see both the source (MySQL) version of the schema and the target draft version on the “Configure Schema” page. 
The Target draft version is the workspace for all edits you can perform on the schema on your destination database, that is, Cloud Spanner.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of HarbourBridge “Configure Schema” page" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image13_dgUrUpK.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;HarbourBridge provides you with comprehensive assessment results and recommendations for improving the schema structure and performance. &lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;As you can see in the image above, the icons to the left of each table represent the complexity of table conversion changes as part of the schema migration&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;In this case, the STD_DLS_RESOURCE table requires high-complexity conversion changes, whereas the others require minimal-complexity changes&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;The recommendation on the right provides information about the storage requirements of specific columns, and there are other warnings indicated in the columns list as well&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;You have the ability to make changes to the column types at this point &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Changes and suggestions related to primary keys, foreign keys, interleaved tables, indexes, and other dependencies are also available&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Once changes are made to the schema, HarbourBridge gives you the ability to review the DDL and confirm changes&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Once you confirm, the schema changes take effect before the migration is triggered&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of HarbourBridge “Review the DDL changes” popup" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image11_NFuD1CY.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Schema changes are saved successfully.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Prepare Migration&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Click the “Prepare Migration” button on the top right corner of the HarbourBridge page.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of HarbourBridge “Prepare Migration” page" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image9_YbKpyCI.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;1. Select Migration Mode as “Schema and Data”&lt;br/&gt;2. Select Migration Type as “Minimal Downtime Migration”&lt;br/&gt;3. 
Set up Target Cloud Spanner Instance&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of HarbourBridge, Prepare Migration, “Target Details” page" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image4_uvQZkzS.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;b&gt;NOTE&lt;/b&gt;: HarbourBridge UI supports only Google SQL dialect as a Spanner destination today. Support for the PostgreSQL dialect will be added soon.&lt;/p&gt;&lt;p&gt;4. Set up Source Connection profile&lt;/p&gt;&lt;p&gt;This is your connection to the MySQL data source. Ensure the IP addresses displayed on the screen are allow-listed by your source.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of Prepare Migration “Source Connection Profile” popup" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image2_nkIqyj5.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;5. Set up Target Connection profile&lt;/p&gt;&lt;p&gt;This is the connection to your Datastream job destination, which is Cloud Storage. Please select the instance and make sure you have allow-listed the necessary access.&lt;/p&gt;&lt;p&gt;Once done, hit Migrate at the bottom of the page and wait for the migration to start. HarbourBridge takes care of everything else, including setting up the Datastream and Dataflow jobs and executing them under the hood. You have the option to set these up on your own, but with the latest launch of HarbourBridge that is no longer necessary.&lt;br/&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of HarbourBridge “Schema migration completed successfully” message" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image17_WhGr5SS.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Wait until you see the message “Schema migration completed successfully” on the same page. Once you see that, head over to your Spanner database to validate the newly created (migrated) schema.&lt;/p&gt;&lt;br/&gt;&lt;p&gt;&lt;b&gt;Validate Schema and Initial Data&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Connect to the Spanner instance, and head over to the database “cricket_db”. 
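&lt;/p&gt;&lt;p&gt;If you prefer to validate from a script rather than the console, here is a quick sketch using the google-cloud-spanner client to list the migrated tables; the instance ID is an illustrative assumption:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
# Sketch: list the migrated tables via INFORMATION_SCHEMA.
# The instance ID is illustrative; the database follows the example above.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("spanner-demo").database("cricket_db")
with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_catalog = '' AND table_schema = ''")
    for (table_name,) in rows:
        print(table_name)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;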
You should see the tables and the rest of the schema migrated over to the Spanner database:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of Cloud Spanner “Overview” page to validate the schema migration" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image10_aEDRUH4.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;b&gt;Set up Streaming Data&lt;/b&gt;&lt;/p&gt;&lt;p&gt;As part of the setup, after the initial data is migrated, trigger the Cloud Function to kickstart data streaming into MySQL.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Validate Streaming Data&lt;/b&gt;&lt;br/&gt;&lt;/p&gt;&lt;p&gt;Verify that the streaming data migrates into Spanner as the streaming happens.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of Cloud Functions Trigger page with HTTPS URL for the function" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image5_OkwUvPX.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The Cloud Function (Java) can be triggered by hitting the HTTPS URL in the Trigger section of the function’s detail page. Once the streaming starts, you should see data flowing into MySQL and the Target DLS score for Innings 2 getting updated in the DLS table.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of MySQL query result to see source data and DLS target score" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image15_nXvbDIh.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;In the above image, you can see the record count go from 1705 to 1805 with the streaming. Also, the DLS Target field has a calculated value of 112 for the most recent ball.&lt;/p&gt;&lt;p&gt;Now let’s check if the Spanner database table got the updates during the migration. Go to the Spanner table and query:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of Cloud Spanner “Query” page to validate data migration" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image3_NOTt2N3.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;As you can see, Spanner has records increasing as part of migration as well. 
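&lt;/p&gt;&lt;p&gt;A lightweight way to script the same spot check is to compare row counts on both sides; in this sketch the connection details and table name are illustrative assumptions (PyMySQL and the google-cloud-spanner client):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
# Sketch: compare row counts between the MySQL source and the migrated
# Spanner table. Credentials, instance ID, and table name are illustrative.
import pymysql
from google.cloud import spanner

TABLE = "dls_match_data"  # hypothetical table name

source = pymysql.connect(host="127.0.0.1", user="root",
                         password="secret", database="cricket_db")
with source.cursor() as cursor:
    cursor.execute(f"SELECT COUNT(*) FROM {TABLE}")
    mysql_count = cursor.fetchone()[0]
source.close()

client = spanner.Client()
database = client.instance("spanner-demo").database("cricket_db")
with database.snapshot() as snapshot:
    spanner_count = list(snapshot.execute_sql(f"SELECT COUNT(*) FROM {TABLE}"))[0][0]

print(f"MySQL rows: {mysql_count}, Spanner rows: {spanner_count}")
&lt;/code&gt;&lt;/pre&gt;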
&lt;p&gt;Also note the change in the Target Score field value ball after ball:&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of Cloud Spanner “Query” page to validate data migration for Target Score" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image16_TfWFscA.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Wait until you see all the changes migrated over.&lt;/p&gt;&lt;p&gt;For data validation, you can use &lt;a href="https://github.com/GoogleCloudPlatform/professional-services-data-validator" target="_blank"&gt;DVT&lt;/a&gt; (Data Validation Tool), which is a standardized data validation method built by Google, and can be incorporated into existing GCP tools and technologies. In our use case, I validated the migration of the initial set of records from the MySQL source to the Spanner table using Cloud Spanner queries, much like the count check sketched above. &lt;/p&gt;&lt;p&gt;&lt;b&gt;End the Migration&lt;/b&gt;&lt;/p&gt;&lt;p&gt;When you complete all these validation steps, click End Migration. Follow the steps below to update your application to point to the Spanner database:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Stop writes to the source database - &lt;b&gt;This will initiate a period of downtime&lt;/b&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Wait for any other incremental writes to Spanner to catch up with the source&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Once you are sure the source and Spanner are in sync, update the application to point to Spanner&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Start your application with Spanner as the database&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Perform smoke tests to ensure all scenarios are working&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Cut over the traffic to your application with Spanner as the database&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;This marks the end of the downtime period&lt;/b&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Screenshot of HarbourBridge “End Migration” pop up with “Clean Up” button" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image7_OoXoTc1.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;b&gt;Clean Up &lt;/b&gt;&lt;/p&gt;&lt;p&gt;Finally, hit the “Clean Up” button on the End Migration popup screen. 
This will remove the migration jobs and dependencies that were created in the process.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Watch the migration in action&lt;/b&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-video"&gt;&lt;div class="article-module article-video "&gt;&lt;figure&gt;&lt;a class="h-c-video h-c-video--marquee" data-glue-modal-disabled-on-mobile="true" data-glue-modal-trigger="uni-modal-vBTlF2I2NwM-" href="https://youtube.com/watch?v=vBTlF2I2NwM"&gt;&lt;img alt="Minimal Downtime Migrations to Spanner with HarbourBridge 2.0" src="//img.youtube.com/vi/vBTlF2I2NwM/maxresdefault.jpg"/&gt;&lt;svg class="h-c-video__play h-c-icon h-c-icon--color-white" role="img"&gt;&lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/a&gt;&lt;figcaption class="article-video__caption h-c-page"&gt;&lt;h4 class="h-c-headline h-c-headline--four h-u-font-weight-medium h-u-mt-std"&gt;Minimal Downtime Migrations to Spanner with HarbourBridge 2.0&lt;/h4&gt;&lt;/figcaption&gt;&lt;/figure&gt;&lt;/div&gt;&lt;div class="h-c-modal--video" data-glue-modal="uni-modal-vBTlF2I2NwM-" data-glue-modal-close-label="Close Dialog"&gt;&lt;a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="vBTlF2I2NwM" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=vBTlF2I2NwM" ng-cloak=""&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Next Steps&lt;/h3&gt;&lt;p&gt;As you walked through this migration with us, you will have noticed how easy it is to point to your database, assess and modify your schema based on recommendations, and migrate your schema, your data, or both to Spanner with minimal downtime.&lt;/p&gt;&lt;p&gt;You can learn more about HarbourBridge on the &lt;a href="https://github.com/cloudspannerecosystem/harbourbridge/blob/master/README.md" target="_blank"&gt;README&lt;/a&gt;, and learn how to install gcloud &lt;a href="https://cloud.google.com/spanner/docs/getting-started/set-up"&gt;here&lt;/a&gt;. &lt;/p&gt;&lt;h3&gt;Get started today&lt;/h3&gt;&lt;p&gt;Spanner’s unique architecture allows it to scale horizontally without compromising on the consistency guarantees that developers rely on in modern relational databases. 
Try out Spanner today for &lt;a href="https://youtu.be/mTzmAa9L7Oc" target="_blank"&gt;free for 90 days&lt;/a&gt; or for as low as &lt;a href="https://youtu.be/m3mbOgjqQ7k" target="_blank"&gt;$65 USD per month&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Wed, 14 Dec 2022 13:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/topics/developers-practitioners/minimal-downtime-migrations-cloud-spanner-harbourbridge-20/</guid><category>Cloud Migration</category><category>Google Cloud</category><category>Developers &amp; Practitioners</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Minimal Downtime Migrations to Cloud Spanner with HarbourBridge 2.0</title><description>We're demonstrating migration of schema and data for an application from MySQL to Cloud Spanner using HarbourBridge.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/developers-practitioners/minimal-downtime-migrations-cloud-spanner-harbourbridge-20/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Abirami Sukumaran</name><title>Developer Advocate, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Mohit Gulati</name><title>Product Manager, Google</title><department></department><company></company></author></item><item><title>Using budgets to automate cost controls</title><link>https://cloud.google.com/blog/topics/developers-practitioners/using-budgets-automate-cost-controls/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;b&gt;TL;DR - Budgets can do more than just track costs! You can set up automated cost controls using programmatic budget notifications, and we have an interactive walkthrough with sample architecture to help get you started.&lt;/b&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="budget controls" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_m3O8DmY.max-1000x1000.png"/&gt;&lt;figcaption class="article-image__caption "&gt;&lt;div class="rich-text"&gt;Budgets can help you answer cost questions, and so much more!&lt;/div&gt;&lt;/figcaption&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;There are a few blog posts on &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/protect-your-google-cloud-spending-budgets"&gt;what Google Cloud Budgets are&lt;/a&gt; and how to use them for more than just sending emails by &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/costs-meet-code-programmatic-budget-notifications"&gt;using programmatic budget notifications&lt;/a&gt;. These are important steps to take when using Google Cloud, so you can ask questions about your costs and get meaningful answers in the systems you already use. As your cloud usage grows and matures, you may also need to be more proactive in dealing with your costs.&lt;/p&gt;&lt;h3&gt;More than just a budget&lt;/h3&gt;&lt;p&gt;To recap: budgets let you create a dynamic way of being alerted about your costs, such as getting emails when you've spent or are forecasted to spend a certain amount. 
When creating a budget, you can provide a fixed amount or have the amount based on the previous period, so you could set up a budget that alerts you if your spending changes significantly month over month. In addition, you can have budgets send data to Pub/Sub on a regular basis (programmatic budget notifications) that can be used however you'd like, such as &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/have-budget-notifications-come-your-favorite-comms-channels"&gt;sending messages to Slack&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Budgets that send out notifications are flexible enough to do just about anything, but that's also where things can become a bit tricky to set up. If you're monitoring the costs for a large company with a lot of cloud usage, that could involve multiple environments with lots of products being used in different ways. Being informed about the costs is a good starting point, but you'll likely want to set up automated cost controls to protect yourself and your cloud spending.&lt;/p&gt;&lt;p&gt;In essence, setting up automated cost controls is the same as using programmatic budget notifications: the budget occasionally sends out a Pub/Sub message, and you create a Cloud Function (or similar) subscriber that receives that message and runs some code. Of course, the specifics of that code will depend heavily on your business needs, ranging from sending a text message all the way to shutting down cloud resources. While the specifics are up to you, we've made a few things to make getting started easier!&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Show me the way&lt;/h3&gt;&lt;p&gt;&lt;a href="https://console.cloud.google.com/?walkthrough_id=billing--budget--cost_enforcement"&gt;We've created an interactive walkthrough&lt;/a&gt; to help you with all of the steps needed to get programmatic budget notifications up and running.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 "&gt;&lt;img alt="Pub/Sub" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image3_50qWYHA.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Following the walkthrough, you'll set up a budget, Pub/Sub topic, and Cloud Function that work together to respond to programmatic notifications. Not only will you get a sense of all the pieces involved, but you can also easily modify the code from the function for your specific purposes, so it serves as a great starting point. That also leads to a question I've heard often: "This is great, but what code am I supposed to use?"
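&lt;/p&gt;&lt;p&gt;As a taste, here's a minimal sketch of that subscriber: a Pub/Sub-triggered Python Cloud Function that decodes the budget notification and compares spend against the budget amount. The function name and the threshold logic are our own; see the budget notification format documentation for the full payload.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;import base64
import json

def handle_budget_alert(event, context):
    """Entry point: 'event' carries the base64-encoded budget notification."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    cost = payload["costAmount"]      # spend so far in the budget period
    budget = payload["budgetAmount"]  # the configured budget amount
    if cost &gt;= budget:
        # Swap this print for your business logic: notify a channel,
        # disable billing, stop labeled resources, and so on.
        print(f"Over budget: {cost} spent of {budget}")
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;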
And that is why we've expanded our walkthrough to include a full, one-click architecture deployment!&lt;/p&gt;&lt;h3&gt;It's like a sentry, but for your cloud costs&lt;/h3&gt;&lt;p&gt;&lt;a href="https://github.com/googlecloudplatform/deploystack-cost-sentry" target="_blank"&gt;Cost Sentry&lt;/a&gt;, powered by DeployStack, takes the next step in programmatic budget notifications and sets up all the pieces needed to create basic automated cost enforcement, as well as some example architecture to test it on! In fact, the overall architecture isn't much more than the programmatic budget notification setup alone, but it gives a good example of how that could work in a full environment. &lt;/p&gt;&lt;p&gt;This architecture will get deployed for you, along with the working code to handle a programmatic budget notification and interact with Compute Engine and Cloud Run.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 "&gt;&lt;img alt="Pub/Sub architecture" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image4_Rdyrl3X.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Both the walkthrough and deploying the Cost Sentry stack can be used as the starting point for a full automated cost-enforcement solution. With these samples, you'll want to take a look at the Cloud Function code that receives data from your budget, and how it interacts with the Google Cloud APIs to shut down resources. In this example, any Compute Engine instances or Cloud Run deployments that have been labeled with 'costsentry' will be shut down or disabled when your budget exceeds the configured amount.&lt;/p&gt;&lt;p&gt;While this is a great way to get an automated cost-enforcement solution started, the hard part is probably in the next questions you'll need to answer for your use case. Questions like "What do I actually want to have happen when I hit my budget?" and "Will stopping all of these instances automatically have ramifications?" (spoiler alert: probably) are important ones to figure out when looking at the full scope of a cost-enforcement solution.&lt;/p&gt;&lt;p&gt;Setting up a full automated cost-enforcement solution gives you the flexibility to customize your response to budget updates, such as sending higher-priority messaging as you get closer to your budget total, and taking action by shutting down services when you greatly exceed your budget. Any way that you want to build a solution, this is a great starting point!&lt;/p&gt;&lt;h3&gt;Go forth, and do&lt;/h3&gt;&lt;p&gt;This may seem like a lot, so I'm a big fan of the "crawl, walk, run" philosophy. If you're new to Google Cloud, get started by just setting up a budget for all of your costs. From there, you can work with programmatic budget notifications to start expanding how you use budgets.
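&lt;/p&gt;&lt;p&gt;When you're ready to run, the enforcement code itself can stay small. As a rough sketch (our own simplification, not the actual Cost Sentry source), stopping every running Compute Engine instance that carries a 'costsentry' label might look like this with the google-cloud-compute client library:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;from google.cloud import compute_v1

def stop_labeled_instances(project_id: str, zone: str, label_key: str = "costsentry"):
    """Stop running instances in one zone that carry the given label key."""
    client = compute_v1.InstancesClient()
    for instance in client.list(project=project_id, zone=zone):
        # instance.labels is a mapping of label keys to values
        if label_key in instance.labels and instance.status == "RUNNING":
            client.stop(project=project_id, zone=zone, instance=instance.name)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;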
As you get more familiar with Google Cloud, you'll likely need to customize your cost controls, and Cost Sentry gives you a starting point for setting up your automated cost-enforcement solution.&lt;/p&gt;&lt;p&gt;Check out the &lt;a href="https://console.cloud.google.com/?walkthrough_id=billing--budget--cost_enforcement"&gt;interactive walkthrough&lt;/a&gt; and &lt;a href="https://github.com/googlecloudplatform/deploystack-cost-sentry" target="_blank"&gt;Cost Sentry architecture&lt;/a&gt; to get started!&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
                       "event": "page interaction",
                       "category": "article lead",
                       "action": "related article - inline",
                       "label": "article: {slug}"
                     }' href="https://gweb-cloudblog-publish.appspot.com/topics/developers-practitioners/costs-meet-code-programmatic-budget-notifications/"&gt;&lt;div class="uni-related-article-tout__inner-wrapper"&gt;&lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;&lt;div class="uni-related-article-tout__content-wrapper"&gt;&lt;div class="uni-related-article-tout__image-wrapper"&gt;&lt;div class="uni-related-article-tout__image" style="background-image: url('')"&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="uni-related-article-tout__content"&gt;&lt;h4 class="uni-related-article-tout__header h-has-bottom-margin"&gt;Costs meet code with programmatic budget notifications&lt;/h4&gt;&lt;p class="uni-related-article-tout__body"&gt;TL;DR - More than just alerts, budgets can also send notifications to Pub/Sub. Once they're in Pub/Sub, you can hook up all kinds of serv...&lt;/p&gt;&lt;div class="cta module-cta h-c-copy uni-related-article-tout__cta muted"&gt;&lt;span class="nowrap"&gt;Read Article&lt;svg class="icon h-c-icon" role="presentation"&gt;&lt;use xlink:href="#mi-arrow-forward" xmlns:xlink="http://www.w3.org/1999/xlink"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/a&gt;&lt;/section&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-related_article_tout"&gt;&lt;div class="uni-related-article-tout h-c-page"&gt;&lt;section class="h-c-grid"&gt;&lt;a class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker" data-analytics='{
                       "event": "page interaction",
                       "category": "article lead",
                       "action": "related article - inline",
                       "label": "article: {slug}"
                     }' href="https://gweb-cloudblog-publish.appspot.com/topics/developers-practitioners/protect-your-google-cloud-spending-budgets/"&gt;&lt;div class="uni-related-article-tout__inner-wrapper"&gt;&lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;&lt;div class="uni-related-article-tout__content-wrapper"&gt;&lt;div class="uni-related-article-tout__image-wrapper"&gt;&lt;div class="uni-related-article-tout__image" style="background-image: url('https://storage.googleapis.com/gweb-cloudblog-publish/images/cost_optimization.max-500x500.jpg')"&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="uni-related-article-tout__content"&gt;&lt;h4 class="uni-related-article-tout__header h-has-bottom-margin"&gt;Protect your Google Cloud spending with budgets&lt;/h4&gt;&lt;p class="uni-related-article-tout__body"&gt;Budgets are the first and simplest way to get a handle on your cloud spend. In this post, we break down a budget and help you set up aler...&lt;/p&gt;&lt;div class="cta module-cta h-c-copy uni-related-article-tout__cta muted"&gt;&lt;span class="nowrap"&gt;Read Article&lt;svg class="icon h-c-icon" role="presentation"&gt;&lt;use xlink:href="#mi-arrow-forward" xmlns:xlink="http://www.w3.org/1999/xlink"&gt;&lt;/use&gt;&lt;/svg&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/a&gt;&lt;/section&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Tue, 13 Dec 2022 15:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/topics/developers-practitioners/using-budgets-automate-cost-controls/</guid><category>Google Cloud</category><category>Developers &amp; Practitioners</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Using budgets to automate cost controls</title><description>Do even more with Google Cloud budgets by setting up automated cost controls</description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/developers-practitioners/using-budgets-automate-cost-controls/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Mark Mirchandani</name><title>Google Cloud Developer Advocate</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Terrence Ryan</name><title>Google Cloud Developer Advocate</title><department></department><company></company></author></item><item><title>Building out your support insights pipeline</title><link>https://cloud.google.com/blog/topics/developers-practitioners/building-out-your-support-insights-pipeline/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Getting into the details&lt;/h3&gt;&lt;p&gt;We wrote &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/how-spam-detection-taught-us-better-tech-support"&gt;previously&lt;/a&gt; about how we used clustering to connect requests for support (in text form) to the best tech support articles so we could answer questions faster and more efficiently. 
In a constantly changing environment (and in a very oddball couple of years) we wanted to make sure we stayed focused on preserving our people's productivity by isolating, understanding and responding to new support trends as fast as we can.&lt;/p&gt;&lt;p&gt;Now we'd like to get into a bit more detail about how we did all that and what went on behind the scenes of our process: &lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Support pipeline" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image2_ZZ0gk0v.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Extraction&lt;/h3&gt;&lt;p&gt;Google’s historical support ticket data and metadata are stored in BigQuery, as are the analysis results we generate from that data. We read and write that content using the &lt;a href="https://cloud.google.com/bigquery/docs/reference/rest"&gt;BigQuery API&lt;/a&gt;. However, many of these tickets contain information that is not useful to the ML pipeline and should not be included in the preprocessing and text modeling phases. For example, boilerplate generated from our case management tools must be stripped out using regex and other technologies in order to isolate the IT interaction between the technician and the user. &lt;/p&gt;&lt;p&gt;Furthermore, once all boilerplate has been removed, we use part-of-speech tagging to isolate only the nouns within the interaction, since nouns themselves proved to be the best features for modeling an interaction and differentiating a topic. Any one interaction could have 100+ nouns depending on the complexity. Using these nouns, we take one more step and use stemming and lemmatization to remove any suffix that may be placed on the noun (e.g., “computers” becomes “computer”). This allows any modification of the root word to be modeled as the same feature and reduces noise in our clustering results.&lt;/p&gt;&lt;p&gt;Once each interaction is transformed into a set of nouns (and a unique identifier), we can then move on to more advanced preprocessing techniques.&lt;/p&gt;&lt;h3&gt;Text Modeling&lt;/h3&gt;&lt;p&gt;To cluster the ticket set, it must first be converted into a robust feature space. The core technology underlying our featurization process is &lt;a href="https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf" target="_blank"&gt;TensorFlow transformers&lt;/a&gt;, which can be invoked using the &lt;a href="https://www.tensorflow.org/tfx/api_overview" target="_blank"&gt;TFX API&lt;/a&gt;. TensorFlow parses and annotates the tickets’ natural-language contents, and these annotations, once normalized and filtered, form a sparse feature space. The &lt;a href="https://cloud.google.com/dlp"&gt;Cloud Data Loss Prevention (DLP)&lt;/a&gt; API redacts several categories of sensitive information — e.g., person names — from the tickets’ contents, which both mitigates privacy leakage and prunes low-relevance tokens from the feature space.&lt;/p&gt;&lt;p&gt;Although clustering can be performed against a sparse space, it is typically more effective if the space is densified to prune excessive dimensionality. We accomplish this using the &lt;a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf" target="_blank"&gt;term frequency-inverse document frequency (TF-IDF)&lt;/a&gt; statistical technique with a predefined maximum feature count – we also investigated more heavy-duty densification strategies using trained embedding models, but found that the quality improvements over TF-IDF were marginal for our use case, at the cost of a substantial reduction in human interpretability.&lt;/p&gt;
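&lt;p&gt;To make the featurization step concrete, here's a toy sketch of the same TF-IDF idea using scikit-learn in place of the TensorFlow Transform tfidf we use in production (the ticket texts are invented, and we assume each ticket has already been reduced to its lemmatized nouns):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;from sklearn.feature_extraction.text import TfidfVectorizer

# Each "document" is the lemmatized-noun bag for one support interaction.
tickets = [
    "laptop charger battery",
    "vpn login password token",
    "laptop docking station monitor",
]

# Capping the vocabulary prunes excessive dimensionality, as described above.
vectorizer = TfidfVectorizer(max_features=5000)
features = vectorizer.fit_transform(tickets)  # sparse matrix: tickets x terms
print(features.shape)
&lt;/code&gt;&lt;/pre&gt;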
&lt;h3&gt;Clustering&lt;/h3&gt;&lt;p&gt;The generated ticket feature set is partitioned into clusters using ClustOn. As this is an unsupervised learning problem, we arrived at the clustering process’s hyperparameter values via experimentation and human expert analysis. The trained parameters produced by the algorithm are persisted between subsequent runs of the pipeline in order to maintain consistent cluster IDs; this allows later operational systems to directly track and evaluate a cluster’s evolution in real time. &lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--small h-c-grid__col h-c-grid__col--2 h-c-grid__col--offset-5 "&gt;&lt;img alt="clustering" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image3_upI3dDx.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;The resulting cluster set is sanity-checked by some basic heuristic measures, such as a &lt;a href="https://en.wikipedia.org/wiki/Silhouette_(clustering)" target="_blank"&gt;silhouette score&lt;/a&gt;, and then rejoined with the initial ticket data for analysis. Moreover, for privacy purposes, each cluster whose ticket cohort size falls below a predefined threshold is omitted from the data set; this ensures that cluster metadata in the output, such as feature data used to characterize the cluster, cannot be traced with high confidence back to individual tickets.&lt;/p&gt;&lt;h3&gt;Scoring &amp;amp; Anomaly Detection&lt;/h3&gt;&lt;p&gt;Once a cluster has been identified, we need a way to automatically estimate how likely it is that the cluster has recently undergone a state change which might indicate an incipient event, as opposed to remaining in a steady state. “Anomalous” clusters — i.e. those which exhibit a sufficiently high likelihood of an event — can be flagged for later operational investigation, while the rest can be disregarded.&lt;/p&gt;&lt;p&gt;Modeling a cluster’s behavior over time is done by distributing its tickets into a histogram according to their time of creation — using 24-hour buckets, reflecting the daily business cycle — and fitting a zero-inflated Poisson regression to the bucket counts using &lt;a href="https://www.statsmodels.org/stable/generated/statsmodels.discrete.count_model.ZeroInflatedPoisson.html" target="_blank"&gt;statsmodels&lt;/a&gt;&lt;sup&gt;1&lt;/sup&gt;. However, our goal is not just to characterize a cluster’s state, but to detect a discrete change in that state. This is accomplished by developing two models of the same cluster: one of its long-term behavior, and the other of its short-term behavior. The distinction between “long-term” and “short-term” can be as simple as partitioning the histogram’s buckets at some age threshold.
But we chose a slightly more nuanced approach: both models are fitted to the entire histogram, but under two different weighting schemata; both sets of weights decay exponentially with age, but at different rates, so that recent buckets are weighted relatively more heavily in the short-term model than in the long-term one.&lt;/p&gt;&lt;p&gt;Both models are “optimized,” in that each achieves the maximum log-likelihood in its respective context. But if the long-term model is evaluated in the short-term context instead, its log-likelihood will show some amount of loss relative to the maximum achieved by the short-term model in the same context. This loss reflects the degree to which the long-term model fails to accurately predict the cluster’s short-term behavior — in other words, the degree to which the cluster’s short-term behavior deviates from the expectation established by its long-term behavior — and thus we refer to it as the &lt;b&gt;deviation score&lt;/b&gt;. This score serves as our key measure of anomaly; if it surpasses a defined threshold, the cluster is deemed anomalous.&lt;/p&gt;
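&lt;p&gt;Here is a stripped-down sketch of that deviation score. To keep the mechanics visible, it swaps the zero-inflated Poisson regression (and its exogenous global-volume term) for a plain weighted Poisson rate; the decay rates and example numbers are purely illustrative:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;import numpy as np

def deviation_score(counts, ages_days, decay_long=0.01, decay_short=0.1):
    """Log-likelihood loss of the long-term model in the short-term context."""
    counts = np.asarray(counts, dtype=float)
    ages = np.asarray(ages_days, dtype=float)
    w_long = np.exp(-decay_long * ages)    # slow decay: long-term view
    w_short = np.exp(-decay_short * ages)  # fast decay: short-term view

    # Weighted Poisson MLE of the rate: lambda = sum(w * x) / sum(w)
    lam_long = (w_long * counts).sum() / w_long.sum()
    lam_short = (w_short * counts).sum() / w_short.sum()

    def loglik(lam, w):
        # Weighted Poisson log-likelihood, dropping the constant log(x!) term
        return (w * (counts * np.log(lam) - lam)).sum()

    # How much worse the long-term rate explains recent behavior
    return loglik(lam_short, w_short) - loglik(lam_long, w_short)

# A cluster that is quiet for days and suddenly spikes scores high:
print(deviation_score([0, 1, 0, 2, 1, 0, 25], [6, 5, 4, 3, 2, 1, 0]))
&lt;/code&gt;&lt;/pre&gt;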
&lt;h3&gt;Operationalize&lt;/h3&gt;&lt;p&gt;Using the &lt;a href="https://developers.google.com/issue-tracker" target="_blank"&gt;IssueTracker API&lt;/a&gt;, bugs are auto-generated each time an anomalous cluster is detected. These bugs contain a summary of the tokens found within the cluster itself as well as a parameterized link to the &lt;a href="https://datastudio.google.com/u/0/" target="_blank"&gt;DataStudio dashboard&lt;/a&gt;. These dashboards show the size of the cluster over time, the deviation score and the underlying tickets. &lt;/p&gt;&lt;p&gt;These bugs are picked up by Techstop operations engineers and investigated to determine the root causes, getting boots on the ground more quickly for any outages that may be occurring and enabling a more harmonious flow of data between support operations and change and incident management teams.&lt;/p&gt;&lt;p&gt;Staying within the IssueTracker product, operations engineers create Problem Records in a separate queue detailing the problem, stakeholders and any solution content. These problem records are shared widely with frontline operations to help address any ongoing issues or outages.&lt;/p&gt;&lt;p&gt;However, the secret sauce does not stop there. Techstop then uses Google's Cloud AutoML engine to train a supervised model to classify any incoming support requests against known Problem Records (IssueTracker bugs). This model acts as a service for two critical functions:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;The model is called by our Chrome extension (see &lt;a href="https://developer.chrome.com/docs/extensions/" target="_blank"&gt;this handy guide&lt;/a&gt;) to recommend Problem Records to frontline techs based on the current ongoing chat. For a company like Google that has a global IT team, this recommendation engine allows for coverage and visibility of issues in near real time.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;The model answers the “how big” question: many stakeholders want to know how big the problem was, how many end users it affected, and so on. By training an AutoML model we can now give good estimates of impact and, more importantly, measure the impact of project work that addresses these problems.&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;&lt;/p&gt;&lt;h3&gt;Resampling &amp;amp; User Journey Mapping&lt;/h3&gt;&lt;p&gt;Going beyond incident response, we then semi-automatically extract user journeys from these trends by sampling each cluster to discover the proportion of user intents. These intents are then used to map user pitfalls and generate a sense of topic for each emerging cluster.&lt;/p&gt;&lt;p&gt;Since operations are constrained by tech evaluation time, we derived a way to limit the number of chats each agent needs to inspect while still maintaining the accuracy of the analysis. &lt;/p&gt;&lt;p&gt;User intents are the “goals” an employee may have when engaging with IT support: for example, “I want my cell phone to boot” or “I lost access to an internal tool.” We apply a two-step procedure to each cluster.&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;First, we sample chats until the probability that we discover a new intent is small (say &amp;lt;5%, or whatever threshold we want). We can evaluate this probability at each step through the Good-Turing method.&lt;br/&gt;A simple Good-Turing estimate of this probability can be found as E(1) / N, where N is the number of sampled chats so far and E(1) is approximately the number of intents that have only been seen once so far. This number should be lightly smoothed for better accuracy; it’s easy to implement this smoothing on our own&lt;sup&gt;2&lt;/sup&gt; or call a library.&lt;br/&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Once we have finished, we take the intents that we consider representative (say there are k of them) and create one additional category for “other intents.” Then, we estimate the sample size for multinomial estimation (with k+1 categories) that we still need to reach a given composition accuracy (say, that each intent fraction is within 0.1 or 0.2 of the actual fraction). To do so, we use Thompson’s procedure&lt;sup&gt;3&lt;/sup&gt;, taking advantage of the data collected so far as a plug-in estimate for the possible values of the parameters; to be sufficiently conservative, we also consider a grid of parameter values within a confidence interval of the current plug-in estimate. The procedure is described on page 43 of this &lt;a href="https://www.jstor.org/stable/2684318" target="_blank"&gt;article&lt;/a&gt;, steps (1) and (2). The procedure is easy to implement and, under our current setup, &lt;a href="https://colab.corp.google.com/drive/1rqB9M3Y7LlD-5AqE__zOC0nUzcnD_xk1?authuser=1#scrollTo=LwcWcuWS8oHY" target="_blank"&gt;it will be a few lines of code&lt;/a&gt;. &lt;br/&gt;&lt;br/&gt;The procedure gives us the target sample size. If we have already reached this sample size in step 1, we are done. Otherwise, we sample a few more chats to reach this sample size.&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;This work, along with the AutoML model, allows Google to understand not only the problem impact size, but also key information about user experiences and where users are struggling the most in their critical user journeys (CUJs). In many cases a problem record will contain multiple CUJs (user intents) with separate personas and root causes.
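&lt;/p&gt;&lt;p&gt;The step-1 stopping rule is tiny in code. Here's an unsmoothed sketch (as noted above, a lightly smoothed estimate is more accurate; the function and variable names are ours):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;from collections import Counter

def prob_new_intent(sampled_intents):
    """Good-Turing estimate E(1) / N of discovering a brand-new intent next."""
    counts = Counter(sampled_intents)
    singletons = sum(1 for c in counts.values() if c == 1)  # E(1)
    return singletons / len(sampled_intents)                # N chats so far

# Sample chats until the discovery probability drops below 5%:
#   while not intents or prob_new_intent(intents) &gt;= 0.05:
#       intents.append(label_next_chat())  # hypothetical labeling step
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;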
&lt;/p&gt;&lt;h3&gt;Helping the business&lt;/h3&gt;&lt;p&gt;Once we can make good estimates for different user goals, we can work with domain experts to map clear user journeys, i.e., we can now use the data that this pipeline has generated to construct a user journey in a bottom-up approach. Doing this same work by hand, sifting through data, aggregating similar cases and estimating proportions of user goals, would take an entire team of engineers and case scrubbers. With this ML solution we can now get the same (if not better) results with much lower operational costs.&lt;/p&gt;&lt;p&gt;These user journeys can then be fed to internal dashboards for key decision makers to understand the health of their products and service areas. This allows for automated incident management and acts as a safeguard against unplanned changes or user-affecting changes that did not go through the proper change management processes. &lt;br/&gt;&lt;/p&gt;&lt;p&gt;Furthermore, it is critical for problem management and other core functions within our IT service. By having a small team of operational engineers review the output of this ML pipeline, we can create healthy problem records and keep track of our team's top user issues.&lt;br/&gt;&lt;/p&gt;&lt;h3&gt;How do I do this too?&lt;/h3&gt;&lt;p&gt;Want to make your own system for insights into your support pipeline? Here's a recipe to follow that will help you build all the parts you need:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Load your data into BigQuery - &lt;a href="https://cloud.google.com/bigquery/?utm_source=google&amp;amp;utm_medium=cpc&amp;amp;utm_campaign=na-US-all-en-dr-skws-all-all-trial-e-dr-1009892&amp;amp;utm_content=text-ad-none-any-DEV_c-CRE_526598862412-ADGP_Desk%20%7C%20SKWS%20-%20EXA%20%7C%20Txt%20~%20Data%20Analytics%20~%20BigQuery_Big%20Query-KWID_43700060008413254-aud-388092988201%3Akwd-47616965283&amp;amp;utm_term=KW_bigquery-ST_bigquery&amp;amp;gclid=CjwKCAjwt8uGBhBAEiwAayu_9bnwUdRT1MTXqXcoEBfiLSLYjGg_2XCo9GJ6RIptVVha598jFjFi2RoCzA8QAvD_BwE&amp;amp;gclsrc=aw.ds"&gt;Cloud BigQuery&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Vectorize it with TF-IDF - &lt;a href="https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf" target="_blank"&gt;TensorFlow Vectorizer&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Perform clustering - &lt;a href="https://www.tensorflow.org/api_docs/python/tf/compat/v1/estimator/experimental/KMeans" target="_blank"&gt;TensorFlow Clustering&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Score Clusters - &lt;a href="https://www.statsmodels.org/stable/generated/statsmodels.discrete.count_model.ZeroInflatedPoisson.html" target="_blank"&gt;Statsmodels Poisson Regression&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Automate with Dataflow - &lt;a href="https://cloud.google.com/dataflow"&gt;Cloud Dataflow&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Operationalize - &lt;a href="https://developers.google.com/issue-tracker" target="_blank"&gt;IssueTracker API&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;hr/&gt;&lt;p&gt;&lt;sup&gt;1.&lt;/sup&gt; &lt;sup&gt;When modeling a cluster, that cluster’s histogram serves as the regression’s endogenous variable. Additionally, the analogous histogram of the entire ticket set, across all clusters, serves as an exogenous variable. The latter histogram captures the overall ebb and flow in ticket generation rates due to cluster-agnostic business cycles (e.g. 
rates tend to be higher on weekdays than weekends), and its inclusion mitigates the impact of such cycles on each cluster’s individual model.&lt;/sup&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;sup&gt;2. Gale, William A., and Geoffrey Sampson. &lt;a href="https://drive.google.com/open?id=1QBtItLGTBrdSUM37kTlQxTEipkt8y6GI" target="_blank"&gt;"Good-Turing frequency estimation without tears."&lt;/a&gt; Journal of Quantitative Linguistics 2.3 (1995): 217-237.&lt;/sup&gt;&lt;/p&gt;&lt;p&gt;&lt;sup&gt;3. Thompson, Steven K. &lt;a href="https://drive.google.com/open?id=1tXdI2sCyH0S_7qWD_dVPA4tcu_5Z8tDW" target="_blank"&gt;"Sample size for estimating multinomial proportions."&lt;/a&gt; The American Statistician 41.1 (1987): 42-46.&lt;/sup&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Mon, 12 Dec 2022 14:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/topics/developers-practitioners/building-out-your-support-insights-pipeline/</guid><category>AI &amp; Machine Learning</category><category>Google Cloud</category><category>Developers &amp; Practitioners</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Building out your support insights pipeline</title><description>Here's how we used clustering to connect requests for support (in text form) to the best tech support articles so we could answer questions faster and more efficiently.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/developers-practitioners/building-out-your-support-insights-pipeline/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nicholaus Jackson</name><title>Business Analyst</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Max Saltonstall</name><title>Developer Relations Engineer</title><department></department><company></company></author></item><item><title>How StreamNative facilitates integrated use of Apache Pulsar through Google Cloud</title><link>https://cloud.google.com/blog/products/data-analytics/streamnative-and-google-cloud-on-the-use-of-apache-pulsar/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;a href="https://streamnative.io/about/" target="_blank"&gt;StreamNative&lt;/a&gt;, a company founded by the original developers of &lt;a href="https://pulsar.apache.org/" target="_blank"&gt;Apache Pulsar&lt;/a&gt; and &lt;a href="https://bookkeeper.apache.org/" target="_blank"&gt;Apache BookKeeper&lt;/a&gt;, is partnering with Google Cloud to build a streaming platform on open source technologies. We are dedicated to helping businesses generate maximum value from their enterprise data by offering effortless ways to realize real-time data streaming. Following the release of &lt;a href="https://streamnative.io/streamnativecloud/" target="_blank"&gt;StreamNative Cloud&lt;/a&gt; in August 2020, which provides scalable and reliable Pulsar-Cluster-as-a-Service, we introduced &lt;a href="https://streamnative.io/cloudforkafka/" target="_blank"&gt;StreamNative Cloud for Kafka&lt;/a&gt; to enable a seamless switch between the Kafka API and Pulsar.
We then launched &lt;a href="https://streamnative.io/platform/" target="_blank"&gt;StreamNative Platform&lt;/a&gt; to support global event streaming data platforms in multi-cloud and hybrid-cloud environments.&lt;/p&gt;&lt;p&gt;By leveraging our fully-managed Pulsar infrastructure services, our enterprise customers can easily build their event-driven applications with Apache Pulsar and get real-time value from their data. There are solid reasons why Apache Pulsar has become one of the most popular messaging platforms in modern cloud environments, and we strongly believe in its ability to simplify building complex event-driven applications. The most prominent benefits of using Apache Pulsar to manage real-time events include:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Single API&lt;/b&gt;: Building a complex event-driven application traditionally requires linking multiple systems to support queuing, streaming and table semantics. Apache Pulsar frees developers from the headache of managing multiple APIs by offering a single API that supports all messaging-related workloads.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Multi-tenancy&lt;/b&gt;: With the built-in multi-tenancy feature, Apache Pulsar enables secure data sharing across different departments with one global cluster. This architecture not only helps reduce infrastructure costs, but also avoids data silos.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Simplified application architecture&lt;/b&gt;: Pulsar clusters can scale to millions of topics while delivering consistent performance, which means that developers don’t have to restructure their applications when the number of topic-partitions surpasses hundreds. The application architecture can therefore be simplified.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Geo-replication&lt;/b&gt;: Apache Pulsar supports both synchronous and asynchronous geo-replication out-of-the-box, which makes building event-driven applications in multi-cloud and hybrid-cloud environments very easy.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Facilitating integration between Apache Pulsar and Google Cloud&lt;/h3&gt;&lt;p&gt;To allow our customers to fully enjoy the benefits of Apache Pulsar, we’ve been working on expanding the Apache Pulsar ecosystem by improving the integration between Apache Pulsar and powerful cloud platforms like Google Cloud. In mid-2022, we added two connectors to the Apache Pulsar ecosystem: &lt;a href="https://streamnative.io/blog/release/2022-6-24-announcing-the-google-cloud-pub-sub-connector-for-apache-pulsar/" target="_blank"&gt;Google Cloud Pub/Sub Connector for Apache Pulsar&lt;/a&gt;, which enables seamless data replication between &lt;a href="https://cloud.google.com/pubsub"&gt;Pub/Sub&lt;/a&gt; and Apache Pulsar, and &lt;a href="https://streamnative.io/blog/release/2022-8-3-announcing-the-google-cloud-bigquery-sink-connector-for-apache-pulsar/" target="_blank"&gt;Google Cloud BigQuery Sink Connector for Apache Pulsar&lt;/a&gt;, which synchronizes Pulsar data to &lt;a href="https://cloud.google.com/bigquery"&gt;BigQuery&lt;/a&gt; in real time.&lt;/p&gt;&lt;p&gt;Google Cloud Pub/Sub Connector for Apache Pulsar uses Pulsar IO components to realize fully-featured messaging and streaming between Pub/Sub and Apache Pulsar, each of which has its own distinctive features. Using Pub/Sub and Apache Pulsar at the same time enables developers to realize comprehensive data streaming features in their applications.
However, it requires significant development effort to establish seamless integration between the two tools, because data synchronization between different messaging systems depends on the functioning of applications: when an application stops working, the message data cannot be passed on to the other system.&lt;/p&gt;&lt;p&gt;Our connector solves this problem by fully integrating with Pulsar’s system. There are two ways to import and export data between Pub/Sub and Pulsar. The first is the Google Cloud Pub/Sub source, which feeds data from Pub/Sub topics and writes it to Pulsar topics. Alternatively, the Google Cloud Pub/Sub sink can pull data from Pulsar topics and persist it to Pub/Sub topics. Using Google Cloud Pub/Sub Connector for Apache Pulsar brings three key advantages:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Code-free integration&lt;/b&gt;: No code-writing is needed to move data between Apache Pulsar and Pub/Sub.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;High scalability&lt;/b&gt;: The connector can be run on both standalone and distributed nodes, which allows developers to build reactive data pipelines in real time to meet operational needs.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Fewer DevOps resources required&lt;/b&gt;: The DevOps workloads of setting up data synchronization are greatly reduced, which translates into more resources to invest in unleashing the value of data.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;By using the BigQuery Sink Connector for Apache Pulsar, organizations can write data from Pulsar directly to BigQuery. Previously, developers could only use the &lt;a href="https://github.com/streamnative/pulsar-io-cloud-storage" target="_blank"&gt;Cloud Storage Sink Connector for Pulsar&lt;/a&gt; to move data to &lt;a href="https://cloud.google.com/storage"&gt;Cloud Storage&lt;/a&gt;, and then query the imported data with external tables in BigQuery, an approach with many limitations, including low query performance and no support for clustered tables.&lt;/p&gt;&lt;p&gt;Pulling data from Pulsar topics and persisting it to BigQuery tables, our BigQuery sink connector supports real-time data synchronization between Apache Pulsar and BigQuery. Just like our Pub/Sub connector, Google Cloud BigQuery Sink Connector for Apache Pulsar is a low-code solution that supports high scalability and greatly reduces DevOps workloads. Furthermore, our BigQuery connector possesses the Auto Schema feature, which automatically creates and updates BigQuery table structures based on the Pulsar topic schemas to ensure smooth and continuous data synchronization.&lt;/p&gt;&lt;h3&gt;Simplifying Pulsar resource management on Kubernetes&lt;/h3&gt;&lt;p&gt;All the products of StreamNative are built on Kubernetes, and we’ve been developing tools that can simplify resource management on Kubernetes platforms like &lt;a href="https://cloud.google.com/kubernetes-engine"&gt;Google Kubernetes Engine&lt;/a&gt; (GKE).
In August 2022, we introduced &lt;a href="https://streamnative.io/blog/release/2022-08-15-introducing-pulsar-resources-operator-for-kubernetes/" target="_blank"&gt;Pulsar Resources Operator for Kubernetes&lt;/a&gt;, an independent controller that provides automatic full lifecycle management for Pulsar resources on Kubernetes.&lt;/p&gt;&lt;p&gt;Pulsar Resources Operator uses manifest files to manage Pulsar resources, which allows developers to get and edit resource policies through the Topic &lt;a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/" target="_blank"&gt;Custom Resources&lt;/a&gt; that render the full field information of Pulsar policies. It enables easier Pulsar resource management compared with using command line interface (CLI) tools, because developers no longer need to remember numerous commands and flags to retrieve policy information. Key advantages of using Pulsar Resources Operator for Kubernetes include:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Easy creation of Pulsar resources&lt;/b&gt;: By applying manifest files, developers can swiftly initialize basic Pulsar resources in their continuous integration (CI) workflows when creating a new Pulsar cluster.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Full integration with Helm&lt;/b&gt;: &lt;a href="https://helm.sh/" target="_blank"&gt;Helm&lt;/a&gt; is widely used as a package management tool in cloud-native environments. Pulsar Resources Operator can seamlessly integrate with Helm, which allows developers to manage their Pulsar resources through Helm templates.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="StreamNative 120922.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/StreamNative_120922.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;How you can contribute&lt;/h3&gt;&lt;p&gt;With the release of Google Cloud Pub/Sub Connector for Apache Pulsar, Google Cloud BigQuery Sink Connector for Apache Pulsar, and Pulsar Resources Operator for Kubernetes, we have unlocked the application potential of open tools like Apache Pulsar by making them simpler to build and easier to manage, and by extending their capabilities. Now, developers can build and run Pulsar clusters more efficiently and maximize the value of their enterprise data. &lt;/p&gt;&lt;p&gt;These three tools are community-driven services and have their source code hosted in the StreamNative GitHub repository. Our team welcomes all types of contributions for the evolution of our tools.
We’re always keen to receive feature requests, bug reports and documentation inquiries through &lt;a href="https://github.com/streamnative/pulsar-io-google-pubsub/issues/new/choose" target="_blank"&gt;GitHub&lt;/a&gt;, &lt;a href="https://lists.apache.org/list.html?dev@pulsar.apache.org" target="_blank"&gt;email&lt;/a&gt; or &lt;a href="https://twitter.com/streamnativeio" target="_blank"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Fri, 09 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/streamnative-and-google-cloud-on-the-use-of-apache-pulsar/</guid><category>Google Cloud</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How StreamNative facilitates integrated use of Apache Pulsar through Google Cloud</title><description>StreamNative, a company founded by the original developers of Apache Pulsar and Apache BookKeeper, is partnering with Google Cloud to build a streaming platform on open source technologies.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/streamnative-and-google-cloud-on-the-use-of-apache-pulsar/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sijie Guo</name><title>Apache Pulsar PMC Member, Co-Founder and CEO of StreamNative</title><department></department><company></company></author></item><item><title>How to build comprehensive customer financial profiles with Elastic Cloud and Google Cloud</title><link>https://cloud.google.com/blog/products/data-analytics/build-comprehensive-customer-financial-profiles-with-elastic-cloud-and-google-cloud/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Financial institutions have vast amounts of data about their customers. However, many of them struggle to leverage data to their advantage. Data may be sitting in silos or trapped on costly mainframes. Customers may only have access to a limited quantity of data, or service providers may need to search through multiple systems of record to handle a simple customer inquiry. This creates a hazard for providers and a headache for customers. &lt;/p&gt;&lt;p&gt;Elastic and Google Cloud enable institutions to manage this information. Powerful search tools allow data to be surfaced faster than ever, whether it's card payments, ACH (Automated Clearing House), wires, bank transfers, real-time payments, or another payment method. This information can be correlated to customer profiles, cash balances, merchant info, purchase history, and other relevant information to support the customer or business objective. &lt;/p&gt;&lt;p&gt;This reference architecture enables these use cases:&lt;/p&gt;&lt;p&gt;&lt;b&gt;1. Offering a great customer experience&lt;/b&gt;: Customers expect immediate access to their entire payment history, with the ability to recognize anomalies, not just through digital channels but through omnichannel experiences (e.g. customer service interactions).&lt;/p&gt;&lt;p&gt;&lt;b&gt;2. 
Customer 360&lt;/b&gt;: Real-time dashboards that correlate transaction information across multiple variables, offering the business a better view into their customer base, and driving efforts for sales, marketing, and product innovation.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="1 comprehensive customer financial profiles 120922.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_comprehensive_customer_financial_profile.max-1000x1000.jpg"/&gt;&lt;figcaption class="article-image__caption "&gt;&lt;div class="rich-text"&gt;&lt;i&gt;&lt;b&gt;Customer 360&lt;/b&gt;: The dashboard above looks at 1.2 billion bank transactions and gives a breakdown of what they are, who executes them, where they go, when and more. At a glance we can see who our wealthiest customers are, which merchants our customers send the most money to, how many unusual transactions there are (based on transaction frequency and transaction amount), when folks spend money and what kind of spending and income they have.&lt;/i&gt;&lt;/div&gt;&lt;/figcaption&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;b&gt;3. Partnership management&lt;/b&gt;: Merchant acceptance is key for payment providers. Having better access to present and historical merchant transactions can enhance relationships or provide leverage in negotiations. With that, banks can create and monetize new services.&lt;/p&gt;&lt;p&gt;&lt;b&gt;4. Cost optimization&lt;/b&gt;: Mainframes are not designed for internet-scale access. Alongside the technological limitations, cost becomes a prohibitive factor. While mainframes will not be replaced any time soon, this architecture helps avoid costly access to data to serve new applications.&lt;/p&gt;&lt;p&gt;&lt;b&gt;5. Risk reduction&lt;/b&gt;: By standardizing on the Elastic Stack, banks are no longer limited in the number of data sources they can ingest. With this, banks can better respond to call center delays and potential customer-facing impacts like natural disasters. By deploying machine learning and alerting features, banks can detect and stamp out financial fraud before it impacts member accounts.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="2 comprehensive customer financial profiles 120922.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_comprehensive_customer_financial_profile.max-1000x1000.jpg"/&gt;&lt;figcaption class="article-image__caption "&gt;&lt;div class="rich-text"&gt;&lt;i&gt;&lt;b&gt;Fraud detection&lt;/b&gt;: The &lt;a href="https://www.elastic.co/what-is/elasticsearch-graph"&gt;Graph&lt;/a&gt; feature of Elastic helped a financial services company identify additional cards that were linked via phone numbers and amalgamations of the original billing address on file with those two cards. 
The team realized that several credit unions, not just the original one where the alert originated, were being scammed by the same fraud ring.&lt;/i&gt;&lt;/div&gt;&lt;/figcaption&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h2&gt;Architecture&lt;/h2&gt;&lt;p&gt;The following diagram shows the steps to move data from the mainframe to Google Cloud, process and enrich the data in BigQuery, then provide comprehensive search capabilities through Elastic Cloud.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="3 comprehensive customer financial profiles 120922.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_comprehensive_customer_financial_profile.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;This architecture includes the following components:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Move Data from Mainframe to Google Cloud&lt;/b&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Moving data from IBM z/OS to Google Cloud is straightforward with the &lt;a href="https://github.com/GoogleCloudPlatform/professional-services/tree/main/tools/bigquery-zos-mainframe-connector" target="_blank"&gt;Mainframe Connector&lt;/a&gt;: you follow a few simple steps and define the configuration. The connector runs in z/OS batch job steps and includes a shell interpreter and JVM-based implementations of the gsutil, bq and gcloud command-line utilities. This makes it possible to create and run a complete ELT pipeline from JCL, both for the initial batch data migration and ongoing delta updates.&lt;/p&gt;&lt;p&gt;A typical flow of the connector includes:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Reading the mainframe dataset&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Transcoding the dataset to ORC&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Uploading the ORC file to Cloud Storage&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Registering the ORC file as an external table or loading it as a native table&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Submitting a query job containing a MERGE DML statement to upsert incremental data into a target table, or a SELECT statement to append to or replace an existing table&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Here are the steps to install the BQ Mainframe Connector:&lt;/p&gt;&lt;ol&gt;&lt;li&gt;&lt;p&gt;Copy the Mainframe Connector JAR to the Unix filesystem on z/OS&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Copy the BQSH JCL procedure to a PDS on z/OS&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Edit the BQSH JCL to set site-specific environment variables&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Please refer to the &lt;a href="https://cloud.google.com/blog/products/data-analytics/a-simple-way-to-migrate-mainframe-data-to-the-cloud"&gt;BQ Mainframe Connector blog&lt;/a&gt; for example configuration and commands.&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Process and Enrich Data in BigQuery&lt;/b&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
href="https://cloud.google.com/bigquery?utm_source=google&amp;amp;utm_medium=cpc&amp;amp;utm_campaign=na-US-all-en-dr-bkws-all-all-trial-e-dr-1011347&amp;amp;utm_content=text-ad-none-any-DEV_c-CRE_621957121377-ADGP_Desk%20%7C%20BKWS%20-%20EXA%20%7C%20Txt%20~%20Data%20Analytics%20~%20BigQuery_Big%20Query-KWID_43700073023085501-kwd-327307220781&amp;amp;utm_term=KW_gcp%20bigquery-ST_gcp%20bigquery&amp;amp;gclid=Cj0KCQiA37KbBhDgARIsAIzce14zp0ElbazcFfTROEdaXRU4GjF-xAEl_frGnil2TIYq4bXEUExBz68aAlnCEALw_wcB&amp;amp;gclsrc=aw.ds"&gt;BigQuery&lt;/a&gt; is a completely serverless and cost-effective enterprise data warehouse. Its serverless architecture lets you use SQL language to query and enrich Enterprise scale data. And its scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. An integrated BQML and BI Engine enables you to analyze the data and gain business insights. &lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Ingest Data from BQ to Elastic Cloud&lt;/b&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/dataflow"&gt;Dataflow&lt;/a&gt; is used here to ingest data from BQ to Elastic Cloud. It’s a serverless, fast, and cost-effective stream and batch data processing service. Dataflow provides an &lt;a href="https://cloud.google.com/dataflow/docs/guides/templates/provided-batch#bigquery-to-elasticsearch"&gt;Elasticsearch Flex Template&lt;/a&gt; which can be easily configured to create the streaming pipeline. This &lt;a href="https://www.elastic.co/blog/ingest-data-directly-from-google-bigquery-into-elastic-using-google-dataflow" target="_blank"&gt;blog from Elastic&lt;/a&gt; shows an example on how to configure the template.&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Cloud Orchestration from Mainframe&lt;/b&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;It's possible to load both BigQuery and Elastic Cloud entirely from a mainframe job, with no need for an external job scheduler.&lt;/p&gt;&lt;p&gt;To launch the Dataflow flex template directly, you can invoke the &lt;code&gt;gcloud dataflow flex-template run&lt;/code&gt; command in a z/OS batch job step.&lt;/p&gt;&lt;p&gt;If you require additional actions beyond simply launching the template, you can instead invoke the &lt;code&gt;gcloud pubsub topics publish&lt;/code&gt; command in a batch job step after your BigQuery ELT steps are completed, using the &lt;code&gt;--attribute&lt;/code&gt; option to include your BigQuery table name and any other template parameters. The pubsub message can be used to trigger any additional actions within your cloud environment.&lt;/p&gt;&lt;p&gt;To take action in response to the pubsub message sent from your mainframe job, create a &lt;a href="https://cloud.google.com/build/docs/automate-builds-pubsub-events"&gt;Cloud Build Pipeline with a pubsub trigger&lt;/a&gt; and include a Cloud Build Pipeline step that uses the &lt;a href="https://cloud.google.com/build/docs/cloud-builders#supported_builder_images_provided_by"&gt;gcloud builder&lt;/a&gt; to invoke &lt;code&gt;gcloud dataflow flex-template run&lt;/code&gt; and launch the template using the parameters copied from the pubsub message. 
If you need to use a custom Dataflow template rather than the public template, you can use the &lt;a href="https://cloud.google.com/build/docs/cloud-builders#supported_builder_images_provided_by"&gt;git builder&lt;/a&gt; to check out your code, followed by the &lt;a href="https://cloud.google.com/build/docs/building/build-java#using_the_maven_image"&gt;maven builder to compile and launch a custom dataflow pipeline&lt;/a&gt;. Additional pipeline steps can be added for any other actions you require.&lt;/p&gt;&lt;p&gt;The pubsub messages sent from your batch job can also be used to trigger a &lt;a href="https://cloud.google.com/run/docs/tutorials/pubsub"&gt;Cloud Run service&lt;/a&gt; or a &lt;a href="https://cloud.google.com/eventarc/docs/gke/quickstart-pubsub"&gt;GKE service via Eventarc&lt;/a&gt; and may also be consumed directly by a Dataflow pipeline or any other application.&lt;/p&gt;&lt;h2&gt;Mainframe Capacity Planning&lt;/h2&gt;&lt;p&gt;CPU consumption is a major factor in mainframe workload cost. In the basic architecture design above, the Mainframe Connector runs on the JVM, which executes on zIIP processors. Relative to simply uploading data to Cloud Storage, ORC encoding consumes much more CPU time. When processing large amounts of data, it's possible to exhaust zIIP capacity and spill workloads onto GP processors. You may apply the following advanced architecture to reduce CPU consumption and avoid increased z/OS processing costs.&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Remote Dataset Transcoding on Compute Engine VM&lt;/b&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="4 comprehensive customer financial profiles 120922.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_comprehensive_customer_financial_profile.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;To reduce mainframe CPU consumption, ORC file transcoding can be delegated to a GCE instance. A gRPC service is included with the Mainframe Connector specifically for this purpose, and instructions for setup can be found in the Mainframe Connector documentation. Using remote ORC transcoding will significantly reduce CPU usage of the Mainframe Connector batch jobs and is recommended for all production-level BigQuery workloads. Multiple instances of the gRPC service can be deployed behind a load balancer and shared by all Mainframe Connector batch jobs.&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Transfer Data via FICON and Interconnect&lt;/b&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="5 comprehensive customer financial profiles 120922.jpg" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_comprehensive_customer_financial_profile.max-1000x1000.jpg"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Google Cloud technology partners offer products to enable transfer of mainframe datasets via FICON and 10G Ethernet to Cloud Storage.
Obtaining a hardware FICON appliance and Interconnect is a practical requirement for workloads that transfer in excess of 500 GB daily. This architecture is ideal for integrating z/OS and Google Cloud because it largely eliminates data-transfer-related CPU utilization concerns.&lt;/p&gt;&lt;hr/&gt;&lt;i&gt;&lt;sup&gt;We really appreciate Jason Mar from Google Cloud, who provided rich context and technical guidance regarding the Mainframe Connector; Eric Lowry from Elastic, for his suggestions and recommendations; and the Google Cloud and Elastic team members who contributed to this collaboration.&lt;/sup&gt;&lt;/i&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Fri, 09 Dec 2022 17:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/build-comprehensive-customer-financial-profiles-with-elastic-cloud-and-google-cloud/</guid><category>BigQuery</category><category>Google Cloud</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How to build comprehensive customer financial profiles with Elastic Cloud and Google Cloud</title><description>Google Cloud and Elastic Cloud reference architecture for financial transaction search.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/build-comprehensive-customer-financial-profiles-with-elastic-cloud-and-google-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Yang Li</name><title>Staff Cloud Solutions Architect, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Dimitri Marx</name><title>Partner Solutions Architecture Lead, Elastic</title><department></department><company></company></author></item><item><title>Google’s Virtual Desktop of the Future</title><link>https://cloud.google.com/blog/topics/developers-practitioners/googles-virtual-desktop-future/</link><description>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;Did you know that most Google employees rely on virtual desktops to get their work done? This represents a paradigm shift in client computing at Google, and was especially critical during the pandemic and the remote work revolution. We’re excited to continue enabling our employees to be productive, &lt;i&gt;anywhere&lt;/i&gt;! This post covers the history of virtual desktops and details the numerous benefits Google has seen from their implementation. &lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Virtual Desktop- Inline image" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image3_6PhPZT5.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Background&lt;/h3&gt;&lt;p&gt;In 2018, Google began the development of virtual desktops in the cloud. A &lt;a href="https://research.google/pubs/pub47055/" target="_blank"&gt;whitepaper&lt;/a&gt; was published detailing how virtual desktops were created with Google Cloud, running on &lt;a href="https://cloud.google.com/compute"&gt;Google Compute Engine&lt;/a&gt;, as an alternative to physical workstations.
Further research showed that it was feasible to move our physical workstation fleet to these virtual desktops in the cloud. The research began with user experience analysis, looking into how employee satisfaction with cloud workstations compared with physical desktops. Researchers found that user satisfaction with cloud desktops was higher than with their physical counterparts! This was a monumental moment for cloud-based client computing at Google, and this discovery led to additional analyses of Compute Engine to understand whether it could become our preferred (virtual) workstation platform of the future.&lt;/p&gt;&lt;p&gt;Today, Google’s internal use of virtual desktops has increased dramatically. Employees all over the globe use a mix of virtual Linux and Windows desktops on Compute Engine to complete their work. Whether an employee is writing code, accessing production systems, troubleshooting issues, or driving productivity initiatives, virtual desktops provide them with the compute they need to get their work done. Access to virtual desktops is simple: some employees access their virtual desktop instances via Secure Shell (SSH), while others use Chrome Remote Desktop, a graphical access tool. &lt;/p&gt;&lt;p&gt;In addition to simplicity and accessibility, Google has realized a number of benefits from virtual desktops. We’ve seen an enhanced security posture, a boost to our sustainability initiatives, and a reduction in maintenance effort associated with our IT infrastructure. All these improvements were achieved while improving the user experience compared to our physical workstation fleet.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Google Cloud TPU" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_0EHHfvd.max-1000x1000.jpg"/&gt;&lt;figcaption class="article-image__caption "&gt;&lt;div class="rich-text"&gt;Example of Google Data Center&lt;/div&gt;&lt;/figcaption&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;h3&gt;Analyzing Cloud vs Physical Desktops&lt;/h3&gt;&lt;p&gt;Let’s look deeper into the analysis Google performed to compare cloud virtual desktops and physical desktops. Researchers compared cloud and physical desktops on five core pillars: user experience, performance, sustainability, security, and efficiency.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-image_full_width"&gt;&lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 "&gt;&lt;img alt="Google Cloud core pillars" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image4_6gvUvXe.max-1000x1000.png"/&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="block-paragraph"&gt;&lt;div class="rich-text"&gt;&lt;p&gt;&lt;b&gt;User Experience&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Before the transition to virtual desktops got underway, user experience researchers wanted to know more about how they would affect employee happiness. They discovered that employees embraced the benefits that virtual desktops offered.
These included freeing up valuable desk space, an always-on, always-available compute experience accessible from anywhere in the world, and reduced maintenance overhead compared to physical desktops. &lt;/p&gt;&lt;p&gt;&lt;b&gt;Performance&lt;/b&gt;&lt;/p&gt;&lt;p&gt;From a performance perspective, cloud desktops are simply better than physical desktops. For example, running on &lt;a href="https://cloud.google.com/compute"&gt;Compute Engine&lt;/a&gt; makes it easy to spin up on-demand virtual instances with predictable compute and performance, a task that is significantly more difficult with a physical workstation vendor. Virtual desktops rely on a mix of Virtual Machine (VM) families that Google developed based on the performance needs of our users. These range from Compute Engine &lt;a href="https://cloud.google.com/compute/docs/machine-resource"&gt;E2 high-efficiency instances&lt;/a&gt;, which employees might use for day-to-day tasks, to higher-performance &lt;a href="https://cloud.google.com/compute/docs/machine-resource"&gt;N2/N2D instances&lt;/a&gt;, which employees might use for more demanding machine learning jobs. Compute Engine offers a VM shape for practically any computing workflow. Additionally, employees no longer have to worry about machine upgrades (to increase performance, for example) because our entire fleet of virtual desktops can be upgraded to new shapes (with more CPU and RAM) with a single config change and a simple reboot, all within a matter of minutes. Plus, Compute Engine continues to add features and new machine types, which means our capabilities only continue to grow in this space.&lt;/p&gt;
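&lt;p&gt;For a sense of what such an upgrade involves, here is the public Compute Engine equivalent of that config change, as a minimal sketch; the instance name, zone, and machine type are illustrative, not Google’s internal tooling.&lt;/p&gt;&lt;pre&gt;# Resize a virtual desktop to a larger shape; requires a brief
# stop/start cycle, typically completing within minutes.
gcloud compute instances stop vdesk-example --zone=us-central1-a
gcloud compute instances set-machine-type vdesk-example \
  --zone=us-central1-a --machine-type=n2-standard-8
gcloud compute instances start vdesk-example --zone=us-central1-a&lt;/pre&gt;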
&lt;p&gt;&lt;b&gt;Sustainability&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Google cares deeply about sustainability and has been &lt;a href="https://sustainability.google/" target="_blank"&gt;carbon neutral since 2007&lt;/a&gt;. Moving from physical desktops to virtual desktops on Compute Engine brings us closer to Google’s sustainability goal of a net-neutral desktop computing fleet. Our internal facilities team has praised virtual desktops as a win for future workspace planning, because a reduction in physical workstations could also mean a reduction in first-time construction costs of new buildings, significant (up to 30%) campus energy reductions, and even further reductions in costs associated with HVAC and circuit size needs at our campuses. Lastly, a reduction in physical workstations also contributes to a reduction in physical e-waste and in the carbon associated with transporting workstations from their factory of origin to office locations. At Google’s scale, these changes lead to an immense win from a sustainability standpoint. &lt;/p&gt;&lt;p&gt;&lt;b&gt;Security&lt;/b&gt;&lt;/p&gt;&lt;p&gt;By their very nature, virtual desktops limit a bad actor’s ability to exfiltrate data or otherwise compromise physical desktop hardware, since there is no desktop hardware to compromise in the first place. This means attacks such as USB attacks, evil maid attacks, and similar techniques for subverting security that require direct hardware access become worries of the past. Additionally, the transition to cloud-based virtual desktops also brings with it an enhanced security posture through the use of Google Cloud’s myriad security features, including &lt;a href="https://cloud.google.com/confidential-computing"&gt;Confidential Computing&lt;/a&gt;, &lt;a href="https://cloud.google.com/blog/products/identity-security/virtual-trusted-platform-module-for-shielded-vms-security-in-plaintext"&gt;vTPMs&lt;/a&gt;, and more. &lt;/p&gt;&lt;br/&gt;&lt;p&gt;&lt;b&gt;Efficiency&lt;/b&gt;&lt;/p&gt;&lt;p&gt;In the past, it was not uncommon for employees to spend days waiting for IT to deliver new machines or fix physical workstations. Today, cloud-based desktops can be created and resized on demand. They are always accessible, and virtually immune to maintenance-related issues. IT no longer has to deal with concerns like warranty claims, break-fix issues, or recycling. These time savings enable IT to focus on higher-priority initiatives while reducing their workload. With an enterprise the size of Google, these efficiency wins added up quickly. &lt;/p&gt;&lt;h3&gt;Considerations to Keep in Mind&lt;/h3&gt;&lt;p&gt;Although Google has seen significant benefits with virtual desktops, there are some considerations to keep in mind before deciding whether they are right for your enterprise. First, it’s important to recognize that migrating to a virtual fleet requires a consistently reliable and performant client internet connection. For remote/global employees, it’s important that they’re located geographically near a Google Cloud region (to minimize latency). Additionally, there are cases where physical workstations are still considered vital. These include users who need USB and other direct I/O access for testing/debugging hardware, and users who have ultra-low-latency graphics/video editing or CAD simulation needs. Finally, to ensure interoperability between these virtual desktops and the rest of our computing fleet, we did have to perform some additional engineering tasks to integrate our asset management and other IT systems with the virtual desktops. Whether your enterprise needs such features and integration should be carefully analyzed before considering a solution such as this. However, should you ultimately conclude that cloud-based desktops are the solution for your enterprise, we’re confident you’ll realize many of the benefits we have!&lt;/p&gt;&lt;h3&gt;Tying It All Together&lt;/h3&gt;&lt;p&gt;Although moving Google employees to virtual desktops in the cloud was a significant engineering undertaking, the benefits have been just as significant. Making this switch has boosted employee productivity and satisfaction, enhanced security, increased efficiency, and provided noticeable improvements in performance and user experience. In short, cloud-based desktops are helping us transform how Googlers get their work done. During the pandemic, we saw the benefits of virtual desktops at a critical time. Employees had access to their virtual desktops from anywhere in the world, which kept our workforce safer and reduced transmission vectors for COVID-19.
We’re excited for a future where more and more of our employees are computing in the cloud, as we continue to embrace the work-from-anywhere model and to add new features and enhanced capabilities to Compute Engine!&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</description><pubDate>Thu, 08 Dec 2022 13:00:00 -0000</pubDate><guid>https://cloud.google.com/blog/topics/developers-practitioners/googles-virtual-desktop-future/</guid><category>Google Cloud</category><category>Developers &amp; Practitioners</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Google’s Virtual Desktop of the Future</title><description>Dive into the history of virtual desktops and the numerous benefits Google has seen from implementing virtual desktops.</description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/developers-practitioners/googles-virtual-desktop-future/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nick Yeager</name><title>Manager, Google Computing</title><department></department><company></company></author></item></channel></rss>