Week 9 – Data Warehouses

This week we covered data warehouses, with a bit of a focus on the relationship with big data. A few questions posed were:

  1. What changes occur in the presence of big, fast, possibly unstructured data?
  2. Is the Data Warehouse architecture still the same?
  3. If not, what needs to be changed or adapted?

In my view, big data is just another data dimension that can be processed with technologies like Hadoop, and then brought into the data warehouse like any other set of data. Using a mechanism like MapReduce to gain insight from masses of data allows that insight to be overlaid with other data from transactional systems, as well as external systems, to provide better information for making business decisions.
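To make that idea concrete, here is a minimal sketch in plain Python of a MapReduce-style aggregation whose result is then overlaid with transactional data. It is not Hadoop code: the click log, the order totals, and the function names (map_phase, reduce_phase, overlay) are all hypothetical, invented for illustration only.

```python
from collections import defaultdict

# Hypothetical clickstream records, standing in for "big data" held in Hadoop.
CLICK_LOG = [
    {"customer_id": 1, "page": "pricing"},
    {"customer_id": 2, "page": "home"},
    {"customer_id": 1, "page": "checkout"},
    {"customer_id": 1, "page": "pricing"},
    {"customer_id": 3, "page": "home"},
]

# Hypothetical order totals per customer, standing in for a transactional system.
ORDERS = {1: 250.0, 2: 0.0, 3: 40.0}


def map_phase(records):
    """Emit one (customer_id, 1) pair per click record."""
    for record in records:
        yield record["customer_id"], 1


def reduce_phase(pairs):
    """Sum the values for each key, as a reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def overlay(page_views, orders):
    """Join the aggregated insight with transactional data, one row per customer."""
    return [
        {"customer_id": cid, "page_views": views, "order_total": orders.get(cid, 0.0)}
        for cid, views in sorted(page_views.items())
    ]


page_views = reduce_phase(map_phase(CLICK_LOG))
warehouse_rows = overlay(page_views, ORDERS)
```

The rows in warehouse_rows are the kind of combined record that could then be loaded into the data warehouse alongside data from other sources.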

But is the data warehouse architecture still the same? Yes and no. I think that originally the driver behind having a data warehouse was to be able to run queries against your data without affecting your transactional system’s performance. But these days, you could run your transactional system on an in-memory database like SAP HANA, which runs very quickly. So do you still need a data warehouse? http://www.element61.be/e/resourc-detail.asp?ResourceId=767 argues that you do because:

  1. Data warehouses provide a single version of the truth over aggregates of data coming from multiple data sources, not just transactional systems;
  2. Data warehouses can run data quality processes that wouldn’t be running in the transactional system;
  3. Data warehouses can provide a historical view of information, which may no longer be stored in a transactional system.
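Points 2 and 3 can be illustrated with a small sketch using Python's built-in sqlite3 module. This is only an assumption about how such a load might look, not how any particular warehouse product works: the table names (customer, customer_history), the load_snapshot helper, and the sample data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "customer" plays the transactional system: it only holds the current state.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT)")
# "customer_history" plays the warehouse: it keeps one row per load.
conn.execute("CREATE TABLE customer_history (id INTEGER, city TEXT, loaded_on TEXT)")


def load_snapshot(conn, loaded_on):
    """Copy current rows into the warehouse, applying a simple quality rule.

    Rows with a missing or blank city are skipped, and whitespace is trimmed.
    """
    conn.execute(
        "INSERT INTO customer_history (id, city, loaded_on) "
        "SELECT id, TRIM(city), ? FROM customer "
        "WHERE city IS NOT NULL AND TRIM(city) <> ''",
        (loaded_on,),
    )


conn.execute("INSERT INTO customer VALUES (1, 'Brisbane'), (2, '  ')")
load_snapshot(conn, "2014-01-01")

# The transactional system overwrites the old value...
conn.execute("UPDATE customer SET city = 'Sydney' WHERE id = 1")
load_snapshot(conn, "2014-02-01")

# ...but the warehouse still holds both versions.
current = conn.execute("SELECT city FROM customer WHERE id = 1").fetchall()
history = conn.execute(
    "SELECT city, loaded_on FROM customer_history WHERE id = 1 ORDER BY loaded_on"
).fetchall()
rejected = conn.execute(
    "SELECT COUNT(*) FROM customer_history WHERE id = 2"
).fetchone()[0]
```

Here the transactional table only knows the customer's current city, while the history table retains the earlier value, and the blank record never reaches the warehouse.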

Therefore, it’s likely that the architecture of a data warehouse will remain, augmented by in-memory technologies, with big data systems like Hadoop (or HDFS) used as just another data source as an input to the data warehouse. This was reiterated in one of the readings for the week, “Integrating Hadoop into Business Intelligence and Data Warehousing” by Philip Russom, which notes “Users experienced with HDFS consider it a complement to their DW”.

I think one infrastructure trend for data warehouses is building them in the cloud. Infrastructure in the cloud is very cheap, with products like Amazon Redshift providing cloud data warehouses that can store petabytes of information, without having to purchase expensive hardware.

Another reading was to look at Tableau’s perspective on the Top 10 Trends in Business Intelligence for 2014. As an Enterprise Architect I read these sorts of sales pitches/white papers every day, and I find them to be a bit generic. The list is as follows:

  1. The end of data scientists.
  2. Cloud business intelligence goes mainstream.
  3. Big data finally goes to the sky.
  4. Agile business intelligence extends its lead.
  5. Predictive analytics, once the realm of advanced and specialized systems, will move into the mainstream.
  6. Embedded business intelligence begins to emerge in an attempt to put analytics in the path of everyday business activities.
  7. Storytelling becomes a priority.
  8. Mobile business intelligence becomes the primary experience for leading-edge organizations.
  9. Organizations begin to analyze social data in earnest.
  10. NoSQL is the new Hadoop.

This list really shows the relationship of data warehouses and BI to the broader context of IT, such as Cloud Computing, Agile, and Mobility. So while all the trends make sense, there aren’t too many pearls of wisdom. In fact, pointing out that Storytelling is becoming a priority appears pretty self-evident to me: if the point of BI is to “turn data into insight to make business decisions”, and decision makers don’t understand the insight put in front of them, then they’ll fail to use that insight to make their decisions, eroding the business value of BI.

One thought on “Week 9 – Data Warehouses”

  1. Good discussion of DW vs. big data. I also believe that a traditional DW architecture won’t change much, at least not yet. This may change over the next 10 years, but nobody knows.
