Prepare and Pass Your Data-Engineer-Associate Exam with Confidence. AllExamTopics offers updated exam questions and answers for AWS Certified Data Engineer - Associate (DEA-C01), along with easy-to-follow study material based on real exam questions and scenarios. Practice smarter with high-quality practice questions to improve accuracy, reduce exam stress, and increase your chances of passing on your first attempt.
289 Questions & Answers with Explanation
Update Date : Mar 31, 2026
PDF + Test Engine
$65 (regular price $130)
Test Engine
$55 (regular price $110)
PDF Only
$45 (regular price $90)
Success Gallery
Real results from real candidates who achieved their certification goals.
Get fully prepared for the Data-Engineer-Associate – AWS Certified Data Engineer - Associate (DEA-C01) certification exam with AllExamTopics’ trusted passing material. We provide Data-Engineer-Associate real exam questions answers, updated study material, and powerful online practice material to help you pass your exam on the first attempt.
Our AWS Certified Data Engineer - Associate (DEA-C01) exam study material is designed for both beginners and experienced professionals who want a reliable, exam-focused preparation solution with a 100% passing and money-back guarantee.
Why Choose AllExamTopics for Data-Engineer-Associate Exam Preparation?
At AllExamTopics, we focus on real results, not just theory. Our Data-Engineer-Associate practice material is built using real exam patterns and continuously updated based on the latest exam changes.
100% Passing Guarantee
Money-Back Guarantee
Real Exam Questions Answers
Updated Passing Material
Free Practice Questions Answers
Online Practice Material
Instant Access After Purchase
We help you prepare smarter, not harder.
What’s Included in Our Data-Engineer-Associate Exam Questions PDF?
Our Data-Engineer-Associate practice exam material covers all official exam objectives and provides complete preparation in one place.
1. Data-Engineer-Associate Real Exam Questions Answers
Based on recent and actual exam scenarios
Covers all important and frequently asked questions
Helps you understand real exam patterns
2. Practice Material for Self-Assessment
High-quality practice questions answers
Helps identify weak areas before the real exam
Improves accuracy and speed
3. Online Practice Material
Real exam-like interface
Accessible on desktop, tablet and mobile
Practice anytime, anywhere
4. Free Data-Engineer-Associate Practice Questions Answers
Designed to strengthen both concepts and confidence
Real Data-Engineer-Associate Exam Questions You Can Trust
Study only what matters. Our Data-Engineer-Associate practice exam questions are created by industry experts and verified by recent exam passers, so you focus on real exam patterns, not guesswork. Prepare smarter, reduce stress, and boost your chances of passing on the first attempt.
Take Your AWS Certified Data Engineer - Associate (DEA-C01) to an Expert Level
Thinking about advancing your data engineering career? The Data-Engineer-Associate certification is ideal for beginners, working IT professionals, and experienced experts looking to upgrade their skills. Our study material is designed to support all experience levels with clear, practical preparation.
Everything You Need to Pass, in One Place
Get instant access to complete Data-Engineer-Associate exam preparation. From trusted passing material and clear study material to realistic practice material, online practice material, and real exam questions answers, everything is built to help you pass with confidence.
Try free Amazon AWS Certified Data Engineer - Associate (DEA-C01) practice exam questions before you buy.
Question # 1
A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column.
Which solution will MOST speed up the Athena query performance?
A. Change the data format from .csv to JSON format. Apply Snappy compression.
B. Compress the .csv files by using Snappy compression.
C. Change the data format from .csv to Apache Parquet. Apply Snappy compression.
D. Compress the .csv files by using gzip compression.
Answer: C

Explanation: Amazon Athena is a serverless interactive query service that allows you to analyze data in Amazon S3 using standard SQL. Athena supports various data formats, such as CSV, JSON, ORC, Avro, and Parquet. However, not all data formats are equally efficient for querying. Some formats, such as CSV and JSON, are row-oriented: they store data as a sequence of records, each with the same fields. Row-oriented formats are suitable for loading and exporting data, but they are not optimal for analytical queries that often access only a subset of columns, and they do not support the compression and encoding techniques that reduce data size and improve query performance.

Column-oriented formats, such as ORC and Parquet, store data as a collection of columns, each with a specific data type. They are ideal for analytical queries that filter, aggregate, or join data by columns, and they support compression and encoding techniques that reduce data size and improve query performance. For example, Parquet supports dictionary encoding, which replaces repeated values with numeric codes, and run-length encoding, which replaces consecutive identical values with a single value and a count. Parquet also supports various compression algorithms, such as Snappy, GZIP, and ZSTD, that further reduce data size.

Therefore, changing the data format from CSV to Parquet and applying Snappy compression will most speed up the Athena query performance. Parquet is a column-oriented format that allows Athena to scan only the relevant columns and skip the rest, reducing the amount of data read from S3. Snappy is a fast compression algorithm that reduces data size without compromising query speed, and because Parquet applies compression per column chunk, Snappy-compressed Parquet files remain splittable. This solution also reduces the cost of Athena queries, because Athena charges based on the amount of data scanned from S3.

The other options are less effective. Changing the format from CSV to JSON with Snappy compression will not improve query performance significantly, because JSON is also a row-oriented format that does not support columnar access or encoding techniques. Compressing the CSV files with Snappy reduces data size but does not significantly improve query performance, for the same reason. Compressing the CSV files with gzip reduces data size but degrades query performance, because gzip is not a splittable compression algorithm and requires decompression before reading.

References:
Amazon Athena
Choosing the Right Data Format
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide, Chapter 5: Data Analysis and Visualization, Section 5.1: Amazon Athena
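One common way to perform this conversion is an Athena CTAS (CREATE TABLE AS SELECT) statement. The sketch below builds such a statement; the table names and S3 location are hypothetical, and the `format`/`write_compression` table properties are the ones Athena documents for CTAS.

```python
# Sketch: build an Athena CTAS statement that rewrites a CSV-backed table
# as Snappy-compressed Parquet. Table names and the S3 bucket are hypothetical.

def build_ctas_query(source_table: str, target_table: str, output_location: str) -> str:
    """Return a CREATE TABLE AS SELECT statement that converts the
    source table to Parquet with Snappy compression."""
    return (
        f"CREATE TABLE {target_table} "
        f"WITH (format = 'PARQUET', "
        f"write_compression = 'SNAPPY', "
        f"external_location = '{output_location}') "
        f"AS SELECT * FROM {source_table}"
    )

query = build_ctas_query(
    source_table="logs_csv",
    target_table="logs_parquet",
    output_location="s3://my-bucket/parquet/",  # hypothetical bucket
)
print(query)
```

In practice the resulting string would be submitted through the Athena console or the `start_query_execution` API; subsequent queries then target the Parquet table instead of the CSV one.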
Question # 2
A company stores data in a data lake that is in Amazon S3. Some data that the company stores in the data lake contains personally identifiable information (PII). Multiple user groups need to access the raw data. The company must ensure that user groups can access only the PII that they require.
Which solution will meet these requirements with the LEAST effort?
A. Use Amazon Athena to query the data. Set up AWS Lake Formation and create data filters to establish levels of access for the company's IAM roles. Assign each user to the IAM role that matches the user's PII access requirements.
B. Use Amazon QuickSight to access the data. Use column-level security features in QuickSight to limit the PII that users can retrieve from Amazon S3 by using Amazon Athena. Define QuickSight access levels based on the PII access requirements of the users.
C. Build a custom query builder UI that will run Athena queries in the background to access the data. Create user groups in Amazon Cognito. Assign access levels to the user groups based on the PII access requirements of the users.
D. Create IAM roles that have different levels of granular access. Assign the IAM roles to IAM user groups. Use an identity-based policy to assign access levels to user groups at the column level.
Answer: A

Explanation: Amazon Athena is a serverless, interactive query service that enables you to analyze data in Amazon S3 using standard SQL. AWS Lake Formation is a service that helps you build, secure, and manage data lakes on AWS. You can use Lake Formation to create data filters that define the level of access for different IAM roles based on the columns, rows, or tags of the data. By using Athena to query the data and Lake Formation to create data filters, the company can ensure that user groups access only the PII they require, with the least effort.

The solution is to use Amazon Athena to query the data in the S3 data lake, then set up AWS Lake Formation and create data filters that establish levels of access for the company's IAM roles. For example, a data filter can allow a user group to access only the columns that contain the PII it needs, such as name and email address, and deny access to the columns it does not need, such as phone number and social security number. Finally, assign each user to the IAM role that matches the user's PII access requirements. The user groups can then access the data lake securely and efficiently.

The other options are either not feasible or not optimal. Using Amazon QuickSight to access the data (option B) would require the company to pay for the QuickSight service and to configure column-level security for each user. Building a custom query builder UI that runs Athena queries in the background (option C) would require the company to develop and maintain the UI and integrate it with Amazon Cognito. Creating IAM roles with different levels of granular access (option D) would require the company to manage multiple IAM roles and policies and to keep them aligned with the data schema.

References:
Amazon Athena
AWS Lake Formation
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide, Chapter 4: Data Analysis and Visualization, Section 4.3: Amazon Athena
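A Lake Formation column-level data filter like the one described is created with the `create_data_cells_filter` API. The sketch below only builds the request payload; the catalog ID, database, table, and column names are hypothetical, and in practice the dict would be passed as `TableData` to `boto3.client("lakeformation").create_data_cells_filter(...)`.

```python
# Sketch: shape of a Lake Formation data cells filter that exposes only the
# PII columns a given group needs. All names below are hypothetical examples.

def build_pii_column_filter(catalog_id: str, database: str, table: str,
                            allowed_columns: list[str], filter_name: str) -> dict:
    """Build the TableData payload for create_data_cells_filter."""
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": filter_name,
        "ColumnNames": allowed_columns,          # only these columns are visible
        "RowFilter": {"AllRowsWildcard": {}},    # no row-level restriction
    }

filter_spec = build_pii_column_filter(
    catalog_id="123456789012",      # AWS account ID owning the catalog
    database="datalake",
    table="customers_raw",
    allowed_columns=["name", "email"],  # this group may see only these PII columns
    filter_name="support-team-pii",
)
```

The filter is then granted to the matching IAM role via Lake Formation permissions, so Athena queries under that role return only the allowed columns.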
Question # 3
A company receives call logs as Amazon S3 objects that contain sensitive customer information. The company must protect the S3 objects by using encryption. The company must also use encryption keys that only specific employees can access.
Which solution will meet these requirements with the LEAST effort?
A. Use an AWS CloudHSM cluster to store the encryption keys. Configure the process that writes to Amazon S3 to make calls to CloudHSM to encrypt and decrypt the objects. Deploy an IAM policy that restricts access to the CloudHSM cluster.
B. Use server-side encryption with customer-provided keys (SSE-C) to encrypt the objects that contain customer information. Restrict access to the keys that encrypt the objects.
C. Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the KMS keys that encrypt the objects.
D. Use server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the Amazon S3 managed keys that encrypt the objects.
Answer: C

Explanation: Option C is the best solution because server-side encryption with AWS KMS keys (SSE-KMS) encrypts data at rest in Amazon S3 using keys managed by AWS Key Management Service (AWS KMS). AWS KMS is a fully managed service for creating and managing encryption keys for your AWS services and applications, and it lets you define granular access policies for your keys, such as who can use them to encrypt and decrypt data and under what conditions. With SSE-KMS, you can protect your S3 objects with encryption keys that only specific employees can access, without managing the encryption and decryption process yourself.

Option A is not a good solution. AWS CloudHSM provides hardware security modules (HSMs) in the AWS Cloud, letting you generate and use your own encryption keys on dedicated hardware that is compliant with various standards and regulations. However, CloudHSM is not a fully managed service and requires more effort to set up and maintain than AWS KMS. Moreover, CloudHSM does not integrate directly with Amazon S3, so you would have to configure the process that writes to S3 to call CloudHSM to encrypt and decrypt the objects, which adds complexity and latency.

Option B is not a good solution. Server-side encryption with customer-provided keys (SSE-C) encrypts data at rest in S3 using keys that you provide and manage yourself, and it requires you to send your encryption key with each request to upload or retrieve an object. SSE-C provides no mechanism to restrict access to the keys that encrypt the objects, so you would have to implement your own key management and access control system, which adds effort and risk.

Option D is not a good solution. Server-side encryption with Amazon S3 managed keys (SSE-S3) uses keys that are managed entirely by Amazon S3, which automatically encrypts and decrypts objects as they are uploaded and downloaded. However, SSE-S3 does not let you control who can access the encryption keys or under what conditions, so you cannot restrict key access to specific employees, which does not meet the requirements.

References:
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide
Protecting Data Using Server-Side Encryption with AWS KMS-Managed Encryption Keys (SSE-KMS) - Amazon Simple Storage Service
What is AWS Key Management Service? - AWS Key Management Service
What is AWS CloudHSM? - AWS CloudHSM
Protecting Data Using Server-Side Encryption with Customer-Provided Encryption Keys (SSE-C) - Amazon Simple Storage Service
Protecting Data Using Server-Side Encryption with Amazon S3-Managed Encryption Keys (SSE-S3) - Amazon Simple Storage Service
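With SSE-KMS, the writer only needs to pass two extra parameters on upload; key access is then controlled entirely by the KMS key policy and IAM. The sketch below builds the `put_object` parameters (the bucket, object key, and KMS key ARN are hypothetical); the dict would be passed to `boto3.client("s3").put_object(**params)`.

```python
# Sketch: parameters for uploading a call-log object with SSE-KMS so that only
# principals allowed by the KMS key policy can decrypt it. The bucket name,
# object key, and KMS key ARN below are hypothetical examples.

def build_sse_kms_put(bucket: str, key: str, body: bytes, kms_key_id: str) -> dict:
    """Build put_object parameters requesting SSE-KMS encryption."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",  # SSE-KMS, not SSE-S3 ("AES256")
        "SSEKMSKeyId": kms_key_id,          # customer managed key with a restrictive policy
    }

params = build_sse_kms_put(
    bucket="call-logs-bucket",
    key="2024/01/15/call-0001.json",
    body=b"{}",
    kms_key_id="arn:aws:kms:us-east-1:123456789012:key/example-key-id",
)
```

Decryption on download is automatic for callers with `kms:Decrypt` on that key; everyone else gets an access-denied error even if they can reach the bucket.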
Question # 4
A data engineer needs to maintain a central metadata repository that users access through Amazon EMR and Amazon Athena queries. The repository needs to provide the schema and properties of many tables. Some of the metadata is stored in Apache Hive. The data engineer needs to import the metadata from Hive into the central metadata repository.
Which solution will meet these requirements with the LEAST development effort?
A. Use Amazon EMR and Apache Ranger.
B. Use a Hive metastore on an EMR cluster.
C. Use the AWS Glue Data Catalog.
D. Use a metastore on an Amazon RDS for MySQL DB instance.
Answer: C

Explanation: The AWS Glue Data Catalog is an Apache Hive metastore-compatible catalog that provides a central metadata repository for various data sources and formats. You can use the Glue Data Catalog as an external Hive metastore for Amazon EMR and Amazon Athena queries, and import metadata from existing Hive metastores into the Data Catalog. This solution requires the least development effort: you can use AWS Glue crawlers to automatically discover and catalog the metadata, and use the AWS Glue console, AWS CLI, or Amazon EMR API to configure the Data Catalog as the Hive metastore. The other options are either more complex or require additional steps, such as setting up Apache Ranger for security, managing a Hive metastore on an EMR cluster or an RDS instance, or migrating the metadata manually.

References:
Using the AWS Glue Data Catalog as the metastore for Hive (Section: Specifying AWS Glue Data Catalog as the metastore)
Metadata Management: Hive Metastore vs AWS Glue (Section: AWS Glue Data Catalog)
AWS Glue Data Catalog support for Spark SQL jobs (Section: Importing metadata from an existing Hive metastore)
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide (Chapter 5, page 131)
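Pointing EMR at the Glue Data Catalog is a one-line configuration: a `hive-site` classification that sets the documented Glue metastore factory class. The sketch below shows that configuration entry; the surrounding cluster definition is omitted.

```python
# Sketch: the EMR "Configurations" entry that makes Hive on the cluster use
# the AWS Glue Data Catalog as its metastore. The factory class name is the
# value documented by Amazon EMR; all other cluster settings are omitted.

import json

glue_metastore_config = [
    {
        "Classification": "hive-site",
        "Properties": {
            "hive.metastore.client.factory.class":
                "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        },
    }
]

# This list would be supplied as the Configurations parameter when creating
# the cluster (console, CLI --configurations, or the RunJobFlow API).
print(json.dumps(glue_metastore_config, indent=2))
```

Athena reads from the Glue Data Catalog natively, so once EMR is configured this way, both services share the same table definitions with no extra synchronization.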
Question # 5
A company is planning to use a provisioned Amazon EMR cluster that runs Apache Spark jobs to perform big data analysis. The company requires high reliability. A big data team must follow best practices for running cost-optimized and long-running workloads on Amazon EMR. The team must find a solution that will maintain the company's current level of performance.
Which combination of resources will meet these requirements MOST cost-effectively? (Choose two.)
A. Use Hadoop Distributed File System (HDFS) as a persistent data store.
B. Use Amazon S3 as a persistent data store.
C. Use x86-based instances for core nodes and task nodes.
D. Use Graviton instances for core nodes and task nodes.
E. Use Spot Instances for all primary nodes.
Answer: B, D

Explanation: The best combination of resources to meet the requirements of high reliability, cost-optimization, and performance for running Apache Spark jobs on Amazon EMR is to use Amazon S3 as a persistent data store and Graviton instances for core nodes and task nodes.

Amazon S3 is a highly durable, scalable, and secure object storage service that can store any amount of data for a variety of use cases, including big data analytics. Amazon S3 is a better choice than HDFS as a persistent data store for Amazon EMR because it decouples storage from the compute layer, allowing for more flexibility and cost-efficiency. Amazon S3 also supports data encryption, versioning, lifecycle management, and cross-region replication. Amazon EMR integrates seamlessly with Amazon S3, using the EMR File System (EMRFS) to access data stored in S3 buckets.

Graviton instances are powered by Arm-based AWS Graviton2 processors that deliver up to 40% better price performance over comparable current-generation x86-based instances. They are well suited to workloads that are CPU-bound, memory-bound, or network-bound, such as big data analytics, web servers, and open-source databases. Graviton instances are compatible with Amazon EMR and can be used for both core nodes and task nodes. Core nodes run the data processing frameworks, such as Apache Spark, and store data in HDFS or the local file system. Task nodes are optional nodes that can be added to a cluster to increase processing power and throughput. Using Graviton instances for both node types yields higher performance at lower cost than x86-based instances.

Using Spot Instances for all primary nodes is not a good option, because it can compromise the reliability and availability of the cluster. Spot Instances are spare EC2 capacity available at up to a 90% discount compared with On-Demand prices, but EC2 can interrupt them with a two-minute notice when it needs the capacity back. Primary nodes run the cluster software, such as Hadoop, Spark, Hive, and Hue, and are essential to cluster operation; if a primary node is interrupted, the cluster will fail or become unstable. It is therefore recommended to use On-Demand Instances or Reserved Instances for primary nodes, and to use Spot Instances only for task nodes that can tolerate interruptions.

References:
Amazon S3 - Cloud Object Storage
EMR File System (EMRFS)
AWS Graviton2 Processor-Powered Amazon EC2 Instances
Plan and Configure EC2 Instances
Amazon EC2 Spot Instances
Best Practices for Amazon EMR
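The node-role guidance above can be sketched as an EMR instance-group layout: Graviton types everywhere, On-Demand for the primary and core nodes, Spot only for interruptible task nodes. The instance types and counts below are hypothetical examples; in practice this list would go into the `Instances` parameter of `boto3.client("emr").run_job_flow(...)`.

```python
# Sketch: instance groups for a cost-optimized, reliable EMR cluster.
# Assumes Graviton (m6g) instance types; sizes and counts are hypothetical.

instance_groups = [
    {   # primary node: never Spot, so the cluster stays reliable
        "Name": "Primary",
        "InstanceRole": "MASTER",
        "InstanceType": "m6g.xlarge",    # Graviton
        "InstanceCount": 1,
        "Market": "ON_DEMAND",
    },
    {   # core nodes: run Spark executors and hold HDFS/local storage
        "Name": "Core",
        "InstanceRole": "CORE",
        "InstanceType": "m6g.2xlarge",
        "InstanceCount": 2,
        "Market": "ON_DEMAND",
    },
    {   # task nodes: stateless, so Spot interruptions are tolerable
        "Name": "Task",
        "InstanceRole": "TASK",
        "InstanceType": "m6g.2xlarge",
        "InstanceCount": 4,
        "Market": "SPOT",
    },
]
```

Persistent data lives in S3 (read and written via EMRFS `s3://` paths), so losing a Spot task node costs only in-flight work, never stored data.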