AWS

How to connect to Snowflake from AWS EMR using PySpark

As ETL developers, we often need to move data between different platforms and services, which involves establishing connections between them. Below is one such use case: connecting to Snowflake from AWS. Here are the steps to securely connect to Snowflake using PySpark. Log in to the AWS EMR service and start PySpark with the Snowflake connector packages: pyspark --packages net.snowflake:snowflake-jdbc:3.11.1,net.snowflake:spark-snowflake_2.11:2.5.7-spark_2.4 This article assumes that a secret holding the Snowflake credentials has already been created in AWS Secrets Manager. In this example, the secret name is ‘test/snowflake/cluster’. Using the boto3 library, connect to AWS Secrets Manager and extract the Snowflake credentials into a JSON object. Sample code snippet below: def ge...
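Since the excerpt's snippet is cut off, here is a minimal sketch of what reading the secret with boto3 might look like. It is not the article's original code. The function names, the default region, and the key names inside the secret (URL, USER_NAME, PASSWORD, DATABASE, SCHEMA, WAREHOUSE) are assumptions for illustration; only the secret name ‘test/snowflake/cluster’ comes from the text.

```python
import json


def get_secret(secret_name, region_name="us-east-1", client=None):
    """Fetch a secret from AWS Secrets Manager and parse it as JSON.

    region_name is an assumed default; pass the region of your own secret.
    """
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3
        client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])


def build_snowflake_options(secret):
    """Map secret fields onto Spark Snowflake connector options.

    The secret key names used here are hypothetical; adjust them to
    match how your secret was actually stored.
    """
    return {
        "sfURL": secret["URL"],
        "sfUser": secret["USER_NAME"],
        "sfPassword": secret["PASSWORD"],
        "sfDatabase": secret["DATABASE"],
        "sfSchema": secret["SCHEMA"],
        "sfWarehouse": secret["WAREHOUSE"],
    }


# Possible usage inside the pyspark shell started above
# ("MY_TABLE" is a placeholder table name):
#
#   options = build_snowflake_options(get_secret("test/snowflake/cluster"))
#   df = (spark.read.format("net.snowflake.spark.snowflake")
#         .options(**options)
#         .option("dbtable", "MY_TABLE")
#         .load())
```

Keeping the credentials in Secrets Manager means that nothing sensitive has to be typed into the shell or committed alongside the job code.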

All about AWS – GLUE

What is GLUE? It is a fully managed ETL service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. It is serverless. It automatically handles schema discovery and the creation of table definitions. Its main use is to serve as a central metadata repository for your data lake: it discovers schemas in unstructured data sitting in S3 or elsewhere and publishes table definitions for use with analysis tools such as Athena, Redshift, or EMR. The purpose of GLUE itself is to extract structure from your unstructured data. If you have data sitting in a data lake, it can provide a schema for that data so that you can query it using SQL or SQL-like tools, including Redshift and Athena and Amazon EMR and ...
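To make the "central metadata repository" idea concrete, the following sketch reads table definitions back from the Glue Data Catalog with boto3. It assumes a catalog database has already been populated (for example by a crawler). The function name, the default region, and the database name in the usage comment are hypothetical.

```python
def list_table_schemas(database_name, region_name="us-east-1", client=None):
    """Return {table_name: [(column_name, column_type), ...]} for a Glue database.

    region_name is an assumed default; use the region of your own catalog.
    """
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3
        client = boto3.client("glue", region_name=region_name)

    schemas = {}
    kwargs = {"DatabaseName": database_name}
    while True:
        page = client.get_tables(**kwargs)
        for table in page["TableList"]:
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            schemas[table["Name"]] = [
                (col["Name"], col.get("Type", "")) for col in columns
            ]
        # get_tables is paginated; keep going while a NextToken is returned.
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return schemas


# Possible usage ("my_data_lake_db" is a placeholder database name):
#
#   for name, columns in list_table_schemas("my_data_lake_db").items():
#       print(name, columns)
```

The same table definitions are what Athena, Redshift, or EMR would see when they query the data in place.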


24 Tutorials