Redshift performance: SQL queries vs table normalization

By paratrooper

I’m building a Redshift database by listening to events from different sources and pumping that data into a Redshift cluster.

The idea is to use Kinesis Firehose to load the data into Redshift via the COPY command. But I have a dilemma here: I first want to query some information from Redshift using a SELECT query such as the one below:

select A, B, C from redshift__table where D='x' and E = 'y';

After getting the required information from Redshift, I will combine it with my event notification data and issue a request to Kinesis. Kinesis will then do its job and issue the required COPY command.
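The per-event flow described above can be sketched as follows. This is only an illustration: the field names, the `redshift__table` columns, and the merge logic are placeholders, not a real schema, and the Redshift/Firehose calls are shown as comments rather than live code.

```python
import json

def build_firehose_record(event, lookup_row):
    """Merge one event notification with the fields looked up from
    Redshift into a single newline-delimited JSON record, the shape
    Firehose can hand to Redshift's COPY. All keys are placeholders."""
    record = dict(event)        # data from the event notification
    record.update(lookup_row)   # data from the SELECT against Redshift
    return json.dumps(record, sort_keys=True) + "\n"

# The surrounding pipeline would look roughly like this (hypothetical
# connection/stream names; psycopg2 and boto3 assumed):
#   cur.execute("select A, B, C from redshift__table where D=%s and E=%s",
#               ("x", "y"))
#   lookup_row = dict(zip(("A", "B", "C"), cur.fetchone()))
#   firehose.put_record(DeliveryStreamName="my-delivery-stream",
#                       Record={"Data": build_firehose_record(event, lookup_row)})
```

Note that this runs one SELECT per event, which is exactly the once-per-second query pattern the question is asking about.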

Now my question: is it a good idea to query Redshift repeatedly, say every second, since that is the expected interval between event notifications?

Now let me describe an alternative scenario:

If I normalize my table and separate some fields into a second table, then I will have to perform fewer Redshift queries with the normalized design (maybe once every 30 seconds).
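The "query once every 30 seconds" idea amounts to caching the lookup result between Redshift queries. A minimal sketch of such a time-based cache, with an invented `fetch` callback standing in for the actual SELECT against Redshift:

```python
import time

class LookupCache:
    """Cache Redshift lookup results so the cluster is queried at most
    once per TTL window (e.g. every 30 seconds) instead of per event."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, fetch):
        now = time.time()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]           # still fresh: no Redshift round trip
        value = fetch(key)            # e.g. run the SELECT against Redshift
        self._store[key] = (now + self.ttl, value)
        return value
```

With this in place, events arriving within the same 30-second window reuse the cached row instead of hitting the cluster.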

But the downside of this approach is that once the data is in Redshift, I will have to carry out table joins while performing real-time analytics on it.

So I wish to know, at a high level, which approach would be better:

  1. Have a single flat table, but query it before issuing a request to Kinesis on each event notification. There won’t be any table joins while performing analytics.

  2. Have two tables and query Redshift less often, but perform a table join while displaying results using BI/analytical tools.

Which of these two do you think is the better option? Assume that I will use appropriate sort keys/distribution keys in either case.

Source: Stack Overflow

