Methods to load JSON data to BigQuery
JavaScript Object Notation (JSON) is a widely used data interchange format. It is easily readable, lightweight, and language-independent, and most apps and services let you export data in JSON. There are three main ways to export data from a JSON API into BigQuery:
1. JSON connector by Coupler.io
This is the no-code way to import data from JSON to BigQuery dynamically using the connector by Coupler.io, a reporting automation platform that turns raw data into simple reports. The connector creates a data pipeline that automates data updates at regular intervals. You can also transform the data before importing it to BigQuery.
2. Manual Import (The Most Basic Method)
Google BigQuery supports JSON as one of the data sources for creating tables. You can manually import JSON as a data source to create a new table. This has several drawbacks and may not be suitable for loading dynamic datasets.
3. Programmatic Import (Code & Build Your Connector)
Build a custom data connector in programming languages like Python. This method requires a good understanding of Google APIs and coding expertise to load JSON data to BigQuery.
We’ll explore each of the above methods to load JSON to BigQuery.
How to dynamically import JSON data to BigQuery with Coupler.io
You can connect an API to BigQuery or load data from published JSON files online in three simple steps:
Step 1: Collect data
Click Proceed in the form below to create a new importer with JSON as the source and BigQuery as the destination. You can get started with Coupler.io for free, no credit card required.
Coupler.io also allows you to
- import JSON to Power BI
- connect JSON to Google Sheets
- connect JSON to Tableau, and other data warehouses & BI tools.
Configure the following parameters:
- Enter the JSON URL (often called the endpoint URL) from which the importer can retrieve the data. This address varies according to the data you want to fetch.
- Select the appropriate HTTP Method. GET is set by default and it will be your HTTP method in most cases. Refer to the official API docs of the respective app/service for more details.
- You can also specify the Request headers and URL query parameters if the API requires them. The sketch after this list shows what an equivalent request looks like.
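For reference, what the importer does is essentially an ordinary HTTP GET request. Here is a minimal Python sketch of an equivalent request; the endpoint URL, header, and query parameters are hypothetical placeholders, so replace them with the values from your API's documentation:

import requests

# Hypothetical endpoint URL, header, and query parameters -- replace them
# with the values documented by your app or service.
url = "https://api.example.com/v1/orders"             # JSON URL (endpoint URL)
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # Request headers
params = {"status": "paid", "limit": 100}             # URL query parameters

# A Coupler.io importer performs a comparable GET request on each data refresh.
response = requests.get(url, headers=headers, params=params)
response.raise_for_status()
print(response.json())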
Step 2: Transform data
At this step, Coupler.io will preview up to 500 rows from the data source. You can transform the data before importing it to BigQuery:
- Manage columns – Choose the columns you want to import from JSON to BigQuery. You can hide the unnecessary columns from here.
- Filter and sort data – Set conditions to extract specific data from JSON into BigQuery. You can specify multiple conditions with AND and OR operators. You can also sort the data in ascending or descending order by a specific column.
- Add new columns – Create a calculable column with data calculated from existing columns. Learn more about calculable columns in Coupler.
- Merge data – Choose the way to combine data if you’ve connected multiple data sources.
Step 3: Load data and schedule refresh
Connect your BigQuery account. For this, you’ll need to upload the Google Cloud JSON key file. Enter the Dataset name and Table name where you want to import the data. A new dataset/table will be created if the provided destination is not found.
The importer will auto-detect the data schema. However, you can define the table schema manually by switching the option off.
Optionally, you can configure the importer to add a timestamp of the last update and replace or append new data in the BigQuery table.
The best part of using the Coupler.io connector is that it allows you to automatically load JSON data to BigQuery at regular intervals. Switch on the Automatic data refresh option and configure the desired schedule.
Save and Run your importer, and wait a few seconds while your JSON data is loaded to BigQuery.
Now, we can sit back and relax! Coupler.io will run automated load jobs at regular intervals as configured and keep the BigQuery data updated. It requires no further intervention. And that’s JSON to BigQuery connection automated!
Manually importing JSON data into a BigQuery table
You can manually upload the file in JSONL format from your computer or cloud storage and create a table in BigQuery.
But what does this format mean?
In Newline Delimited JSON (a.k.a. NDJSON or JSONL), each JSON object sits on its own line.
JSON Syntax Example:
[
  {"name": "Alice", "age": 25, "city": "New York"},
  {"name": "Bob", "age": 32, "city": "San Francisco"},
  {"name": "Eve", "age": 28, "city": "Los Angeles"},
  {"name": "Michael", "age": 40, "city": "Chicago"}
]
JSONL Syntax Example:
{"name": "Alice", "age": 25, "city": "New York"}
{"name": "Bob", "age": 32, "city": "San Francisco"}
{"name": "Eve", "age": 28, "city": "Los Angeles"}
{"name": "Michael", "age": 40, "city": "Chicago"}
To convert standard JSON into this format, you can use the free JSON to NDJSON Online Converter app.
Once you’re ready with the JSON data in newline delimited format, go to Google BigQuery Explorer and select the dataset to import the JSON.
Click the Create Table button in the top-right corner.
Now, the Create table screen will be displayed where you need to configure the following source settings:
- Create table from – Select Upload to import JSON from your computer. You can also select other relevant options, for example, a Google Cloud Storage bucket.
- Select file – Upload the file here.
- File format – Select JSONL (Newline delimited JSON).
Fill in the Table name in the Destination section.
Tick the Auto detect option to let BigQuery detect the data type of each column and parse data accordingly. Alternatively, you can declare the data schema manually by keeping this option unchecked.
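If you do declare the schema manually, BigQuery accepts the field definitions as JSON text. Here is a minimal sketch that mirrors the sample records shown earlier; the field names and types are assumptions based on that example, so adjust them to your own data:

[
  {"name": "name", "type": "STRING", "mode": "NULLABLE"},
  {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "city", "type": "STRING", "mode": "NULLABLE"}
]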
Click Create Table to import JSON to BigQuery.
Once the import is completed, you can find the table with the JSON data in BigQuery.
The manual JSON import method is very basic. It requires correct formatting of data in JSONL format. Furthermore, it is not an efficient way to import dynamic JSON data sources into BigQuery.
How to solve a common JSON parsing error in BigQuery
Many users report a JSON parsing error while loading data. It is mainly because of the incompatibility of normal JSON data with BigQuery. The error reads:
“Error while reading data, error message: Failed to parse JSON: Unexpected end of string; Unexpected end of string; Expected key”
You can easily solve this error by converting the data from standard JSON to Newline Delimited JSON format. There are plenty of free online tools to help you with that. Check out JSON to NDJSON Online Converter.
JSON to BigQuery integration using API (programmatic method)
Google offers APIs for its Cloud Platform services. An Application Programming Interface (API) is a standard way for applications to exchange information; it acts as a mediator in the exchange.
You can use the API to programmatically interact with BigQuery and perform actions such as importing JSON.
But first, you’ll need to enable the BigQuery API in your Google Cloud Console and obtain a service account JSON key.
Log into your Google Cloud Console and go to Left Menu > APIs and Services > Library.
Search for BigQuery API in the library and enable it.
Once the BigQuery API is enabled, create a service account in the Google Cloud Console and download its JSON key file.
Now, you can use this downloaded key file to authenticate requests to BigQuery from a program. Let’s understand this further through an example.
Python script to import JSON to BigQuery
The BigQuery API can be used with programs written in Python, Java, Go, and other languages. Here’s an example of how to do this with Python.
To import JSON to BigQuery via API, you’ll need to install the BigQuery Python library.
Run the following command:
pip install google-cloud-bigquery
Once the Google Cloud BigQuery library is installed, you can use its functions to create tables in BigQuery from the JSON. For example:
from google.cloud import bigquery
from google.oauth2 import service_account

# Replace with your own JSON file path and BigQuery dataset and table details
json_file_path = "path/to/data/source.json"
project_id = "bigquery-project-id"
dataset_id = "bigquery-dataset-id"
table_id = "bigquery-table-id"

# Set up credentials (replace 'path/to/your/credentials.json' with your service account key file)
credentials = service_account.Credentials.from_service_account_file(
    "path/to/your/credentials.json",
)

# Create a BigQuery client
client = bigquery.Client(project=project_id, credentials=credentials)

# Specify the dataset and table to which you want to upload the data
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)

# Load the JSON file into BigQuery
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True  # This allows BigQuery to automatically detect the schema

with open(json_file_path, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_ref, job_config=job_config)

job.result()  # Wait for the job to complete

print(f"Loaded {job.output_rows} rows into {table_id}")
In the above script, replace the required elements and save it as a Python file. Run the program, and it will load your JSON data into a BigQuery table.
This program expects Newline Delimited JSON, so you’ll need to convert the source file before loading it. Alternatively, you can add a small function to convert JSON to JSONL, as sketched below.
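Here is a minimal sketch of such a conversion function; the file paths are placeholders, and it assumes the source file contains a top-level JSON array of objects:

import json

def json_to_jsonl(input_path, output_path):
    # Read a standard JSON file containing an array of objects
    with open(input_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    # Write one JSON object per line (newline-delimited JSON)
    with open(output_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

# Placeholder paths -- replace with your own files
json_to_jsonl("path/to/data/source.json", "path/to/data/source.jsonl")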
This script is a basic example of loading JSON from a local computer using Python. You can modify it to automate the export process or even access JSON from other servers via API. But it demands high-level programming skills.
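As an illustration, here is a rough sketch of such a modification. It fetches JSON from a hypothetical API endpoint and loads the records with load_table_from_json, which serializes them to newline-delimited JSON for you; the URL, project, dataset, and table names are placeholders:

import requests
from google.cloud import bigquery
from google.oauth2 import service_account

# Placeholder values -- replace with your own endpoint and BigQuery details
api_url = "https://api.example.com/v1/orders"
credentials = service_account.Credentials.from_service_account_file("path/to/your/credentials.json")
client = bigquery.Client(project="bigquery-project-id", credentials=credentials)

# Fetch the JSON records from the remote API (expects a JSON array of objects)
rows = requests.get(api_url).json()

# Load the records into BigQuery; the client handles the NDJSON serialization
job_config = bigquery.LoadJobConfig()
job_config.autodetect = True
job = client.load_table_from_json(rows, "bigquery-dataset-id.bigquery-table-id", job_config=job_config)
job.result()
print(f"Loaded {job.output_rows} rows")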
You’ll also need a server or cloud environment to execute the program. This can be a costly method in the long run.
What is the most efficient way to load JSON data into BigQuery?
Loading JSON data into BigQuery is an excellent way to extract meaningful insights from large datasets. BigQuery can store nested data in arrays and structs, so the structure of your JSON is preserved. You can choose from several options to transfer data from JSON to BigQuery.
If you’re working with static data sources that do not require regular updates, the manual option can be the right choice. The programmatic method to load JSON to BigQuery suits data scientists and developers.
On balance, the Coupler.io connector is the most efficient if you’re working with dynamic data sources. Furthermore, Coupler.io offers advanced features, such as data transformation and stitching, which take data analytics to the next level.
Automate data export to BigQuery with Coupler.io
Get started for free