Google Cloud launches BigLake and unifies its data platforms.

At its Cloud Data Summit, Google Cloud announced the preview of BigLake, a new data lake engine built on the experience gained with its BigQuery cloud data warehouse.

Google Cloud has built a solid reputation in recent years as a Data Cloud, thanks to its BigQuery data warehouse, its serverless Cloud Run technology and its “Vertex AI” ML tools.

Driven by the same desire to unify data lakes and data warehouses that led Microsoft to launch Azure Synapse a little over a year ago, Google Cloud is announcing BigLake to break down the silos that separate the two. Those silos limit analysis, increase risk and drive up costs, since data often has to be duplicated before it can be cross-analyzed.

“BigLake enables enterprises to unify their data lakes and warehouses to analyze data without worrying about the format or underlying storage system, eliminating the need to duplicate or move data and reducing costs and efficiency losses,” the vendor explains.

Google Cloud BigLake: System Architecture

The solution is designed to unify the different data lakes, drawing on the experience that Google Cloud and its customers have gained with BigQuery. BigLake can therefore work not only with the BigQuery data warehouse and the data stored there, but also with lakes stored on AWS S3 and on Azure Data Lake. All of this is meant to combine strong security with efficiency, so that teams can run advanced analyses and explore the full potential of ML and AI.

“With BigLake, customers get granular access controls, with an API interface spanning Google Cloud and open formats like Parquet, as well as open-source processing engines like Apache Spark. These capabilities extend BigQuery’s innovations over the past decade to data lakes on Google Cloud Storage to create an open, flexible, and cost-effective architecture,” the vendor adds.

Alongside this flagship announcement, Google Cloud also announced two new features that are sure to resonate with its data service users.

A new feature, “Change Streams”, provides real-time change data capture (CDC) on Spanner, Google Cloud’s distributed SQL database. It tracks in real time any changes (inserts, updates, deletes) made to the data, which can then be used to trigger Pub/Sub events, refresh analyses, and so on.
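To make the idea of change data capture more concrete, here is a minimal Python sketch of how a downstream system might replay a stream of inserts, updates and deletes to keep its own copy of the data in sync. It does not use the Spanner API: the `ChangeRecord` type and the `apply_changes` function are hypothetical names invented for this illustration, and the record layout is an assumption, not the actual Change Streams format.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical change record (not the real Spanner change stream schema)."""
    op: str                                     # "INSERT", "UPDATE" or "DELETE"
    key: str                                    # primary key of the affected row
    new_values: Optional[Dict[str, Any]] = None  # column values, unused for DELETE


def apply_changes(replica: Dict[str, Dict[str, Any]],
                  changes: Iterable[ChangeRecord]) -> Dict[str, Dict[str, Any]]:
    """Replay change records, in order, onto an in-memory copy of a table."""
    for change in changes:
        if change.op == "INSERT":
            # A new row appears downstream with the inserted values.
            replica[change.key] = dict(change.new_values or {})
        elif change.op == "UPDATE":
            # Only the modified columns are overwritten.
            replica.setdefault(change.key, {}).update(change.new_values or {})
        elif change.op == "DELETE":
            # The row disappears downstream as well.
            replica.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return replica


if __name__ == "__main__":
    stream = [
        ChangeRecord("INSERT", "42", {"name": "Ada", "city": "Paris"}),
        ChangeRecord("UPDATE", "42", {"city": "London"}),
        ChangeRecord("INSERT", "43", {"name": "Bob"}),
        ChangeRecord("DELETE", "43"),
    ]
    print(apply_changes({}, stream))
```

In a real deployment the records would come from the database's change stream instead of a hard-coded list, and the same loop could just as well publish a message or refresh a dashboard instead of updating a dictionary.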

In another announcement, Google Cloud is making “Vertex AI Workbench” generally available. It is a new user-friendly, interactive tool for managing the entire life cycle of data science projects. “Vertex AI Workbench brings data and ML systems together in a single interface so teams have a common set of tools for data analysis, data science, and machine learning. Teams can build, train, and deploy an ML model five times faster than before,” the vendor explains.

