Exam Data-Engineer-Associate Topic 1 Question 122 Discussion

Actual exam question for Amazon's Data-Engineer-Associate exam
Question #: 122
Topic #: 1
A company uploads .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas.
An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.
If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.
Which solution will meet these requirements?

Suggested Answer: A

To avoid duplicate records in Amazon Redshift, the most effective solution is to have the ETL job first load the data into a staging table and then use SQL commands such as MERGE (or an UPDATE followed by an INSERT) to update existing records and insert new ones without introducing duplicates.
* Using Staging Tables in Redshift:
* The AWS Glue job can write data to a staging table in Redshift. Once the data is loaded, SQL commands compare the staging data with the target table and update or insert records as appropriate. Because rows are matched on a key rather than blindly appended, rerunning the Glue job does not introduce duplicates.
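The staging-table flow described above can be sketched as a small Python helper that builds the SQL a Glue job would run before and after the load. This is a minimal sketch under stated assumptions, not the exam's reference solution: the function name, the table names (orders, orders_staging) and the column names are hypothetical, the target table is assumed to exist already, and the staging data is assumed to hold at most one row per key.

```python
def build_upsert_sql(target, staging, key_columns, value_columns):
    """Return (preactions, postactions) SQL strings for a staging-table upsert.

    All table and column names are supplied by the caller; the ones used in
    the example below are hypothetical.
    """
    if not key_columns or not value_columns:
        raise ValueError("key_columns and value_columns must both be non-empty")
    all_columns = list(key_columns) + list(value_columns)

    # Before the load: recreate an empty staging table shaped like the target.
    preactions = (
        f"DROP TABLE IF EXISTS {staging}; "
        f"CREATE TABLE {staging} (LIKE {target});"
    )

    # After the load: match rows on the key, update matches, insert the rest.
    on_clause = " AND ".join(f"{target}.{c} = {staging}.{c}" for c in key_columns)
    set_clause = ", ".join(f"{c} = {staging}.{c}" for c in value_columns)
    insert_cols = ", ".join(all_columns)
    insert_vals = ", ".join(f"{staging}.{c}" for c in all_columns)
    postactions = (
        f"MERGE INTO {target} USING {staging} ON {on_clause} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals}); "
        f"DROP TABLE {staging};"
    )
    return preactions, postactions


pre_sql, post_sql = build_upsert_sql(
    "orders", "orders_staging", ["order_id"], ["status", "amount"]
)
print(pre_sql)
print(post_sql)
```

In a Glue job, the two strings would typically be passed as the "preactions" and "postactions" connection options of the Redshift write, with "dbtable" pointing at the staging table, so that the MERGE runs only after the staging load has finished.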
Reference: Amazon Redshift Best Practices
Alternatives Considered:
B (MySQL upsert): This introduces unnecessary complexity by involving another database (MySQL).
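C (Spark dropDuplicates): dropDuplicates removes duplicates only within the DataFrame the job is processing; on its own it does not touch rows that an earlier run already loaded into Redshift. Handling duplicates at the Redshift level with a staging table is more reliable and Redshift-native.
D (AWS Glue ResolveChoice): The ResolveChoice transform in Glue resolves ambiguous column types (for example, a column that appears as both a number and a string); it does not address record-level duplicates.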
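The difference between deduplicating a batch and upserting on a key can be illustrated with a toy, in-memory simulation. This is only a sketch in plain Python, not Spark or Redshift code: the "table" is a list of (key, value) tuples, and both function names are hypothetical.

```python
def append_after_dedup(table, batch):
    """Mimic dropDuplicates on the incoming batch, followed by a plain append."""
    seen, unique = set(), []
    for row in batch:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return table + unique


def upsert_by_key(table, batch):
    """Mimic a staging-table merge: one row per key, newest value wins."""
    merged = {key: value for key, value in table}
    merged.update({key: value for key, value in batch})
    return sorted(merged.items())


batch = [(1, "a"), (2, "b"), (2, "b")]

# Run the same "job" twice against an initially empty table.
appended = append_after_dedup(append_after_dedup([], batch), batch)
upserted = upsert_by_key(upsert_by_key([], batch), batch)

print(len(appended))  # 4: the rerun appended rows 1 and 2 again
print(len(upserted))  # 2: the rerun matched existing keys
```

Deduplicating the batch removes the repeated row inside a single run, but only the keyed merge keeps the table stable when the whole job is run a second time.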
References:
Amazon Redshift MERGE Statements
Staging Tables in Amazon Redshift

by Gordon at Apr 07, 2026, 02:36 AM
