This week we covered Big Data and Hadoop, a topic close to my heart, as I try to understand what to do with all the electricity smart meter reads we receive as a company. We used to receive one meter read every two months. Now we receive 48 meter reads a day, or 2880 every two months. That’s quite a volume increase, and we’ll increasingly need to rely on big data techniques to process this data.
Which brings me to my first task for this week, which was to look at other potential or existing use cases for big data. As you can see, the increase in electricity meter reads is already significant, but it’s still not enough. To start analysing how people consume electricity, we’ll need to move towards minute-by-minute reads for each device in the household. One read a minute is 1440 reads a day per device, so a household with, say, five metered devices could produce 7200 meter reads a day, or 432,000 meter reads every two months. As you can imagine, that’s quite a volume increase from one meter read every two months!
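The arithmetic behind those numbers is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: the 60-day billing period and the five devices per household are my assumptions (five is simply what makes 7200 reads a day work out at one read a minute), and the function name is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60      # 1440 one-minute intervals in a day
DAYS_PER_BILLING_PERIOD = 60   # assumption: "two months" is roughly 60 days
DEVICES_PER_HOUSEHOLD = 5      # assumption: five metered devices per household


def reads_per_period(reads_per_day, days=DAYS_PER_BILLING_PERIOD):
    """Total number of meter reads received over one billing period."""
    return reads_per_day * days


# Today: one read every half hour, i.e. 48 reads a day.
half_hourly_total = reads_per_period(48)

# Possible future: one read a minute for each device in the household.
per_minute_daily = MINUTES_PER_DAY * DEVICES_PER_HOUSEHOLD
per_minute_total = reads_per_period(per_minute_daily)

print(half_hourly_total)   # 2880
print(per_minute_daily)    # 7200
print(per_minute_total)    # 432000
```

And that is for a single household, so multiply by the number of customers to get a feel for the real volume.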
The second task for the week was to check out http://www.kdnuggets.com/2015/07/big-data-big-profits-google-nest-lesson.html, which is a Google Nest case study. Google’s Nest is a thermostat for heating and air conditioning systems in the USA. Nest learns the heating and cooling patterns people want, and delivers them more efficiently than existing ‘dumb’ thermostats. It is more efficient because it can figure out that no one’s home and reduce heating, saving power and money. Of course, to do that, it needs to remember and process a lot of data points, which makes it a big data example similar to the smart meter scenario I described earlier.
The third task for the week was to read an IBM White Paper on the Top Five Ways to get started with big data (http://public.dhe.ibm.com/common/ssi/ecm/im/en/imw14710usen/IMW14710USEN.PDF), which are:
- Big Data exploration, which is exploring information from sensors, and extracting trends. The company I work for currently does this, by extracting information from Power Station sensors, and doing trend analysis, using software called OSIsoft PI Historian (http://www.automatedresults.com/PI/pi-historian.aspx).
- Getting a 360 degree view of the customer, which is something very important to the company I work for. The more we know about a customer, the more finely we can tailor our products and pricing to that customer, which in turn should improve service and reduce churn. Of course, a counterpoint is that some people find it creepy when large organisations collect a lot of information about their customers, so we have a responsibility to make sure that we collect it with good intentions, i.e. for the purpose of delivering better products and services. Increasingly, big data needs to be combined with in-memory databases such as SAP HANA (http://hana.sap.com/abouthana.html) so that we can process it in a timely manner.
- Security and intelligence extension, another valuable use case for the company I work for, since the number of cyber attacks against us continues to grow. Being able to sort through the logs of hundreds of servers and thousands of desktops allows us to spot trends, such as malicious attacks running over multiple months. Without big data, we wouldn’t be able to process this volume of logs. Tools like Splunk (http://www.splunk.com/) allow us to analyse them.
- Operations analysis, which is the optimisation of our business using sensor data. I’d argue that for us this is a pretty similar use case to big data exploration, though I understand one is about exploring new trends, and the other is about optimising existing patterns in the data.
- Data warehouse optimisation, which is particularly important considering the massive increase in data processing (see my original point about smart meter data).
The big implication that I already touched on is the creepiness factor of large organisations knowing more and more about you. My view is that the mass personalisation of products and pricing just for you delivers better service, though I also understand why some people would want to opt out of this data-utopia. I do think, though, that it’ll become more and more difficult, if not impossible, to opt out. It’s a bit like not using Facebook: sure, you don’t have to, but eventually you’ll never get invited to events, because they’re all hosted on Facebook, where you’ll never see them. So I don’t think all the implications of big data are positive, but then again, all technology has positive and negative consequences.
Finally, we were asked to think about whether big data is the right phrase. Personally, I think it’s just data, rather than big data. There is an explosion of data everywhere, and it keeps growing. Therefore, there won’t be any processing other than big data processing.
As a side note, we also went through how MapReduce works. My advice is to check out:
which is an excellent video describing how MapReduce splits a task across nodes, then combines the partial results to create a final result.
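To make the idea a bit more concrete, here is a toy sketch of the map, shuffle and reduce phases in plain Python, totalling kWh per meter. It runs in a single process, so it only imitates what Hadoop does across real nodes, and it doesn’t use the Hadoop API at all. The sample reads, the meter IDs and the function names are all made up for illustration.

```python
from collections import defaultdict

# Made-up sample data: (meter_id, kWh) reads, already split into chunks
# as if each chunk were stored on a different node.
chunks = [
    [("meter-A", 0.5), ("meter-B", 1.0)],
    [("meter-A", 0.25), ("meter-C", 2.0)],
    [("meter-B", 0.5), ("meter-A", 0.25)],
]


def map_phase(chunk):
    """Map: turn every read in one chunk into a (key, value) pair."""
    return [(meter_id, kwh) for meter_id, kwh in chunk]


def shuffle(mapped_chunks):
    """Shuffle: group the values from all chunks by key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    # On a real cluster each chunk would be mapped on its own node in parallel;
    # here we simply loop over the chunks one after another.
    mapped = [map_phase(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


totals = map_reduce(chunks)
print(totals)  # {'meter-A': 1.0, 'meter-B': 1.5, 'meter-C': 2.0}
```

The point of the split is that the map step only ever looks at one chunk and the reduce step only ever looks at one key, which is what lets the work be spread across many machines.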