ExamGecko
Question 290 - Professional Data Engineer discussion

You work for a large ecommerce company. You store your customers' order data in Bigtable. You have a garbage collection policy set to delete the data after 30 days, and the number of versions is set to 1. When the data analysts run a query to report total customer spending, they sometimes see customer data that is older than 30 days. You need to ensure that the analysts do not see customer data older than 30 days while minimizing cost and overhead. What should you do?

A. Set the expiring values of the column families to 30 days and set the number of versions to 2.
B. Use a timestamp range filter in the query to fetch the customer's data for a specific range.
C. Set the expiring values of the column families to 29 days and keep the number of versions to 1.
D. Schedule a job daily to scan the data in the table and delete data older than 30 days.

Suggested answer: B

Explanation:

By using a timestamp range filter in the query, you can ensure that the analysts see only customer data within the desired time range, regardless of whether garbage collection has already run [1]. This is the most cost-effective and simplest way to avoid reading data that has expired under the garbage collection policy but has not yet been physically removed, because it requires no change to the existing policy and no additional jobs. You can use the Bigtable client libraries or the cbt CLI to apply a timestamp range filter to your read requests [2].
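A minimal Python sketch of the client-library route, assuming the `google-cloud-bigtable` package and its `row_filters.TimestampRange` and `row_filters.TimestampRangeFilter` classes. The `table` argument is assumed to be a `Table` obtained through `bigtable.Client(project=...).instance(...).table(...)`; `retention_window` and `read_recent_orders` are hypothetical helper names, and the 30-day window simply mirrors the question's GC policy. The client import is deferred so the window calculation can run even where the library is not installed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Mirrors the 30-day garbage collection policy described in the question.
RETENTION = timedelta(days=30)


def retention_window(now: datetime) -> Tuple[datetime, Optional[datetime]]:
    """Return (start, end) for a read filter that keeps only the last 30 days.

    start is inclusive; end=None leaves the range open up to the present.
    """
    return now - RETENTION, None


def read_recent_orders(table, now: Optional[datetime] = None):
    """Read rows whose cells fall inside the retention window.

    Assumes `table` is a google.cloud.bigtable Table and that cell
    timestamps reflect when each order was written.
    """
    # Deferred import: retention_window() works without the client library.
    from google.cloud.bigtable import row_filters

    start, end = retention_window(now or datetime.now(timezone.utc))
    ts_filter = row_filters.TimestampRangeFilter(
        row_filters.TimestampRange(start=start, end=end)
    )
    # The filter is evaluated on the server for every read, so expired
    # cells are skipped even if garbage collection has not removed them yet.
    return table.read_rows(filter_=ts_filter)
```

Because the filter applies to cells, a row whose cells all fall outside the window is not returned at all; with one version per column, expired orders are hidden whether or not garbage collection has caught up.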

Option A is not effective: raising the number of versions to 2 retains more data, which increases storage costs, and it does nothing to hide expired cells from queries. Option C is not reliable: shortening the expiry to 29 days does not change the fact that garbage collection runs asynchronously, so expired data can still appear, and it also removes data the analysts are entitled to see on its 30th day. Option D is not efficient: a daily job that scans and deletes data adds cost, overhead, and complexity. Moreover, none of these options guarantees that data older than 30 days is deleted immediately, because garbage collection is an asynchronous process that can take up to a week to remove data [3].

References:

1: Filters | Cloud Bigtable Documentation | Google Cloud

2: Read data | Cloud Bigtable Documentation | Google Cloud

3: Garbage collection overview | Cloud Bigtable Documentation | Google Cloud
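The explanation's central point, that expired cells remain readable until asynchronous garbage collection removes them, can be illustrated with a small in-memory simulation. This is not the Bigtable API: `Cell`, `apply_timestamp_range`, and the row keys are hypothetical, and the sketch assumes each cell's timestamp reflects when the order was written. The range check follows Bigtable's timestamp range semantics (start inclusive, end exclusive).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Cell:
    row_key: str          # hypothetical key layout: customer#order
    value: float          # e.g. an order total
    timestamp: datetime   # assumed to be the write time of the cell


def apply_timestamp_range(cells: List[Cell], start: datetime,
                          end: Optional[datetime] = None) -> List[Cell]:
    """Keep cells with start <= timestamp < end (end=None means open-ended)."""
    return [c for c in cells
            if c.timestamp >= start and (end is None or c.timestamp < end)]


now = datetime(2024, 9, 18, tzinfo=timezone.utc)
# Cells still physically stored: the 35-day-old cell is past the 30-day
# policy, but garbage collection has not removed it yet.
stored = [
    Cell("cust#1#order#a", 120.0, now - timedelta(days=2)),
    Cell("cust#1#order#b", 80.0, now - timedelta(days=29)),
    Cell("cust#1#order#c", 50.0, now - timedelta(days=35)),
]

# Without a filter the report includes the expired order.
unfiltered_total = sum(c.value for c in stored)
# With a read-time filter the expired order is excluded immediately.
recent = apply_timestamp_range(stored, now - timedelta(days=30))
filtered_total = sum(c.value for c in recent)
print(unfiltered_total, filtered_total)  # 250.0 200.0
```

The filtered total stays correct no matter when garbage collection eventually runs, which is why option B does not depend on the timing of deletion.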

asked 18/09/2024
Chukwuebuka Ogbonna