Federated queries are used to query the data on external sources like Cloud Storage, BigTable, CloudSql, Spreadsheet in google Drive.
Federated queries or have queries that access data that isn’t directly store in bigquery, but insted its store in places like
- Cloud storage (files in Paruet, ORC format files)
- BigTable and CloudSQL
- Spreadsheets in Google Drive
Federated queries let you send a query statement to AlloyDB, Spanner, or Cloud SQL databases and get the result back as a temporary table. Federated queries use the BigQuery Connection API to establish a connection with AlloyDB, Spanner, or Cloud SQL. In your query, you use the EXTERNAL_QUERY
function to send a query statement to the external database, using that database’s SQL dialect. The results are converted to GoogleSQL data types.
How to use federated query – To query data present in Google Storage bucket ?
Here, we are going to see how to use federated queries to pull data from a Google Storage bucket. In this example, CSV files are stored in the Google Storage bucket.
Here’s how you can set up and use federated queries with data in Google Cloud Storage:
- Set Up BigQuery: Ensure you have a Google Cloud project with BigQuery enabled.
- Enable the Required APIs: Make sure that the BigQuery and Google Cloud Storage APIs are enabled in your Google Cloud project.
- Grant Permissions: Ensure that the service account associated with your BigQuery project has the necessary permissions to access the data in Google Cloud Storage.
- Create an External Table: Define an external table in BigQuery that references the data in Google Cloud Storage. You can do this through the BigQuery web UI, command line (bq), or API.Here is an example of how to create an external table using SQL:
CREATE OR REPLACE EXTERNAL TABLE `your_project.your_dataset.external_table`
OPTIONS (
format = 'CSV',
uris = ['gs://your-bucket/your-data-file.csv'] );
Adjust the format and URI options based on your data format and location.
5. Query the External Table: Once the external table is defined, you can query it as if it were a regular BigQuery table. For example:
SELECT *
FROM `your_project.your_dataset.external_table`
LIMIT 10;
reference: https://cloud.google.com/bigquery/docs/federated-queries-intro