Week 12 – Internet of Things and Ubiquitous Computing

This was the final week of MSYS559, E-Business Technologies, a Masters level paper from the University of Waikato.

This week we looked at upcoming trends around the Internet of Things, Ubiquitous computing, and how this new paradigm of computing hangs together with all the previous topics.

The To Do for this week was to look at The Internet of Things – A Primer. The key takeaway is that soon everything will be connected to the Internet. And not just things, but many parts of a thing. A car can have a connection with an entertainment partner to provide audio, the transport authority to understand safety regulations in the area, other nearby cars to sense their location, the car manufacturer to measure car performance, and the petrol company to suggest the next best place to fill up and what’s available there.

The core components of Internet of Things according to this visualisation are:

  • Technology – nearly every physical object having sensors, communicating their states to other entities;
  • Innovation – we’ve never had the ability to monitor sensor information for everything, all the time, in a hyper-connected world. What are the new things we can do with this?
  • Domains – all this information will be mashed together from different data silos or domains, all interconnected, with systems starting to consider what to do without human input.
  • Application – things will now share information, be monitored, respond to conditions, and will behave like a vast autonomous system.

The scenario I’m most interested in is energy, smart grids, and smart homes. As every device in the home knows how much energy it consumes, and for what task, devices can be optimised to use energy at the correct time, at the best price, for the best purpose. All of this happens automatically, without people needing to think about it.
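As a toy illustration of that optimisation, here’s a minimal sketch (all prices, appliances, and function names are invented) of scheduling flexible appliance tasks into the cheapest hours of a day-ahead price forecast:

```python
# A toy sketch (hypothetical names) of how a smart home might schedule
# flexible appliance tasks into the cheapest hourly price slots.

def schedule_tasks(hourly_prices, tasks):
    """Assign each task (name, duration_hours) to the contiguous
    window of hours with the lowest total price."""
    schedule = {}
    for name, duration in tasks:
        best_start, best_cost = 0, float("inf")
        for start in range(len(hourly_prices) - duration + 1):
            cost = sum(hourly_prices[start:start + duration])
            if cost < best_cost:
                best_start, best_cost = start, cost
        schedule[name] = (best_start, best_cost)
    return schedule

# Overnight prices are cheapest, so the flexible loads land there.
prices = [30, 28, 12, 10, 11, 25, 33, 40]  # cents per kWh over 8 hours
print(schedule_tasks(prices, [("dishwasher", 2), ("ev_charger", 3)]))
# → {'dishwasher': (3, 21), 'ev_charger': (2, 33)}
```

The point is that the household never sees this decision; the devices and the price feed negotiate it among themselves.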

But there are, of course, risks. The Internet is a dangerous place, and we haven’t really considered the consequences of what happens when everything is controllable remotely. For instance, take an electric heater. It’s turned off. Someone accidentally drapes some clothes over it or near it. Hackers turn the heater on. A fire happens, and people die. Who’s liable in this case? If you have a smart kettle and hackers turn it on and it boils nothing, or it’s turned off and on a hundred times a second, what happens to these devices? Who programs the kettle to define a safe standard operating mode?

The second To Do was to read Software Above the Level of a Single Device – The Implications by Tim O’Reilly. This talks about what’s missing from the Internet of Things – people. There’s an obsession with things, what things do, what things know, what things think about. But this doesn’t take into account that things are tools – tools that help people achieve outcomes. And so while it appears that the problems of the Internet of Things are technical problems, most of the time they’re actually people problems, mostly relating to “What is the user trying to do?”.

Users provide inputs, sometimes explicit by interacting with a thing, and sometimes implicit by interacting with the environment. The example provided was the Nest thermostat, which adjusts the temperature depending on what you physically adjust it to, or based on if you’re in the room or not.

These things make decisions based on sensors, but all these sensors are user interfaces to the thing. If Nest sees I’m not in the room using infrared, that’s still a user interface to Nest. And sometimes these user interfaces are poorly designed because they only consider what the thing is trying to achieve, without taking into account the context of the world we’re in, and how the thing fits into that world. The example used was a Tesla car key, which doesn’t have a key ring. It does everything a car key needs to do, but has a poor user interface to the world, since people keep losing the key.

However, the overriding theme of the reading was, don’t just use Internet of Things to solve today’s problems. Think about tomorrow’s problems and solve those. Solve the hard things.

My reflection is, by the time you’ve come up with something to solve the future problem, the future is already here, and hopefully your timing is great.

Week 11 – Mobile Commerce and Location-based Services

This week we looked at technologies for m-commerce and location-based services.

The review questions for this week are:

Discuss the attributes, benefits, and fundamental drivers of m-commerce.

The attributes of m-commerce are:

  • Ubiquity – m-commerce is everywhere, you don’t have to go to a special place to find it, it is where you are.
  • Convenience – because it’s everywhere, it’s very easy to take advantage of, which makes it the first choice of commerce for a lot of scenarios, like ordering a taxi.
  • Interactivity – purchasing is a two-way process between you and the system, letting each party know the other better so the service can improve.
  • Personalisation – as you interact with the service more and more, it understands your tastes, and configures itself to emphasise the things you want, like how amazon.com shows you items based on the things you’ve already looked at.
  • Localisation – because m-commerce is everywhere, it must take advantage of your location in order to tailor services that are relevant to you. If the phone knows where you are, then when you order a taxi, you no longer have to tell it where to pick you up.
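The personalisation attribute above can be sketched as a tiny recommender: suggest items that turn up in the histories of users whose tastes overlap yours. This is a deliberately naive sketch with invented data; real services use far more sophisticated models.

```python
# A minimal, hypothetical sketch of history-based personalisation:
# recommend items that users with overlapping histories also viewed.

from collections import Counter

def recommend(history, all_histories, top_n=2):
    """Score items by how often they co-occur with the user's history
    in other users' histories, excluding items already seen."""
    scores = Counter()
    seen = set(history)
    for other in all_histories:
        if seen & set(other):           # overlapping tastes
            for item in other:
                if item not in seen:
                    scores[item] += 1
    return [item for item, _ in scores.most_common(top_n)]

histories = [["tent", "torch", "stove"], ["tent", "sleeping bag"], ["novel"]]
print(recommend(["tent"], histories))
```

Someone who looked at a tent gets shown torches and stoves, not novels, purely from other shoppers’ histories.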

The benefits of m-commerce are the concrete realisation of the attributes listed above, which are all positive reasons for people to embrace m-commerce. There’s a device in your pocket which can let you transact with the world at any time, compare prices, give you a wealth of information, and help you make decisions fast. Why wouldn’t people take advantage of it?

The fundamental driver of m-commerce is the fact that the world is becoming more mobile, and mobile is the primary computing platform of choice for more and more people. This year tablets will surpass PCs in sales (http://www.extremetech.com/computing/185937-in-2015-tablet-sales-will-finally-surpass-pcs-fulfilling-steve-jobs-post-pc-prophecy). So if people are moving towards mobile platforms, it’s no surprise that m-commerce exists to satisfy that demand.

Discuss m-commerce applications in banking and financial services.

Kiwibank Home Hunter (http://www.sushmobile.com/nz/home-hunter-5/) is an engaging app used to find houses for sale. Potential customers could locate houses for sale on the app, and then at the location, do things like track the sun in the sky to understand how much sun this house is likely to receive. That is only possible in real time by taking advantage of the location of the app. Once a potential customer had decided they liked the house, they could apply for a mortgage pre-approval on the app to understand their borrowing position.

For me, it was amazing to see the engagement the bank could have through one app. No longer did people need to go to one website to find a house, another to arrange a mortgage, and another to learn about the area.

Describe consumer and personal applications of m-commerce including entertainment.

The big m-commerce application these days is mobile gaming. A powerhouse in mobile gaming is King, with their popular game Candy Crush Saga (https://en.wikipedia.org/wiki/Candy_Crush_Saga), whose Wikipedia article notes that in 2014 $1.33 billion USD was spent on in-app purchases in the game.

Understand the technologies and potential applications of location-based m-commerce.

The technologies involved with location-based m-commerce rely on understanding where the device is. The primary technology for this is GPS, these days augmented with GLONASS (https://en.wikipedia.org/wiki/GLONASS). The principle is trilateration: the distance to several satellites is calculated from the time their signals take to reach the mobile device, and those distances pin down the device’s position. Because of the reliance on satellites, GPS+GLONASS doesn’t do very well indoors or in covered areas. There are coarser methods of locating a device, such as the connection to a particular mobile base station or to a Wi-Fi hotspot with a known location, and finer-grained methods such as proximity to indoor beacons.
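To make the satellite idea concrete, here’s a simplified 2-D sketch of solving for a position from known anchor points and measured distances. Real GPS solves in 3-D with receiver clock error as a fourth unknown, so treat this as illustrative only:

```python
# A simplified 2-D sketch of the idea behind satellite positioning:
# given known anchor positions and measured distances (derived from
# signal travel time), solve for the receiver's position.

def trilaterate(p1, d1, p2, d2, p3, d3):
    """Solve the pair of linear equations obtained by subtracting
    the circle equations |p - pi|^2 = di^2 from one another."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 - x1**2 + x2**2 - y1**2 + y2**2
    a2, b2 = 2 * (x3 - x2), 2 * (y3 - y2)
    c2 = d2**2 - d3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# A receiver at (1, 2), measured from three known anchors:
print(trilaterate((0, 0), 5**0.5, (4, 0), 13**0.5, (0, 4), 5**0.5))
```

Indoors the signal times get distorted, which is why the distances (and hence the fix) degrade.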

Anyways, the point of knowing a location is to then customise the service provided to the customer. A good example would be highlighting nearby food trucks. Food trucks are mobile, and therefore aren’t always in the same location. People are mobile, and therefore aren’t always in the same location. If food trucks and people both have location trackers, then each party can determine the best way to get closer together. Food trucks can target locations with a lot of people, and people can find where their favourite food truck is located. A win-win for both parties.
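The food truck matching could be sketched as simply as this (all names and coordinates invented; a real service would use great-circle distance on latitude/longitude):

```python
# A toy matching sketch: find the food truck nearest to a person,
# using flat-plane distance for brevity.

def nearest_truck(person, trucks):
    """Return the (name, position) of the closest truck."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return min(trucks.items(), key=lambda t: dist(person, t[1]))

trucks = {"Taco Tui": (2, 3), "Pie Cart": (8, 1), "Kebab Kea": (1, 9)}
print(nearest_truck((3, 2), trucks))  # → ('Taco Tui', (2, 3))
```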

There were also some To-Dos for this week which were:

Comment on the recent TechCrunch article “The Future Of the Web is All About Context“.

The crux of the article is that today, services are personalised, but only within the silos of information they’re aware of. To provide better personalisation to customers, services need to aggregate more information across different data sources. On top of that, semantic processing is required to understand the context of why someone wants to know information. This is all to get towards the holy grail of being able to answer questions like “what are some movies on near me that are similar to other movies I like?”. To do this, a system would need to get a list of movies I like (say from Netflix), look at movie theaters near me (from Google Maps), then look at movies that are playing nearby (say from the theater company), and then mash all the data together to make a reasonable answer.
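That mash-up step can be sketched with each data source stubbed out as a plain function; all the names and listings below are invented, standing in for real Netflix, Google Maps, and cinema APIs:

```python
# A hedged sketch of the 'mash the data together' step, with every
# data source stubbed out as a plain function.

def liked_genres():                      # stub for a Netflix-style source
    return {"sci-fi", "thriller"}

def theatres_near_me():                  # stub for a maps-style source
    return ["Cinema A", "Cinema B"]

def now_playing(theatre):                # stub for a cinema-chain source
    listings = {"Cinema A": [("Dune", "sci-fi"), ("RomComEx", "romance")],
                "Cinema B": [("Heat Redux", "thriller")]}
    return listings[theatre]

def movies_like_mine():
    """Answer: movies playing near me in genres I already like."""
    return [(title, theatre)
            for theatre in theatres_near_me()
            for title, genre in now_playing(theatre)
            if genre in liked_genres()]

print(movies_like_mine())  # → [('Dune', 'Cinema A'), ('Heat Redux', 'Cinema B')]
```

The hard part the article gestures at is not this join, but getting the silo owners to expose the data and agreeing on what "similar" means semantically.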

I don’t think the article addresses the ‘creepy’ factor, which is, as systems get a better aggregated view of people, are people OK with that? Maybe I don’t mind that Netflix knows the movies I like, but I don’t want movie theater companies to know this.

Identify an additional example of disruptive innovation in connection with mobile commerce.

The example of disruptive innovation relating to mobile commerce I selected was eBay. eBay has created an auction and ‘garage sale’ that is global, but also accessible from mobiles. The idea that at any time people worldwide can list a product for anyone else worldwide, and decide whether to let the market determine the value through an auction, or just sell it at a known price with a known margin, is game changing.

I do note that eBay is not as ubiquitous throughout the world as it would like. For instance, in Australia it dominates the market, with 60% of online shoppers using the site in 2013. But in New Zealand, TradeMe became the more popular auction website, mainly due to first mover advantage as well as localisation for New Zealand.

Week 10 – E-Business Architectures

This week we discussed E-Business Architectures, a topic of personal interest to me as an Enterprise Architect!

The first principle we covered was the divide and conquer approach to IT. Really anything can be broken up from a conceptual idea, into a group of logical units, that are realised at a physical level. Of course, this really describes the high level principles of Enterprise Architecture!

For me, I like the 12+1 view of Enterprise Architecture which is as follows:

  • Contextual – this is the layer that defines business strategies, purpose, outcomes etc. Why an organisation exists.
  • Conceptual – this is the layer that senior managers operate at, and describes the highest level components within an organisation, like business units, organisational data models, and generic technologies like CRMs and ERPs.
  • Logical – a further breakdown into supporting functions. From a business perspective, talking about business processes. From a technology layer, talking about a particular component such as an Application Server.
  • Physical – the realisation of the logical layers. Think of it as the concrete version of the logical layer. So processes are realised as procedures and work files. The logical view of Application Servers gets realised as an installed bit of software on one or more literal servers, deployed to a literal data centre.

The conceptual, logical and physical layers are sliced into business, information, application, and technology towers, thereby creating a 12 box matrix, with the +1 being the contextual layer.

Anyways, we then moved on to Service Oriented Architecture (SOA), whose core characteristic is a group of loosely coupled services that can be combined into a complex process. The advantage of SOA is that any particular service can be replaced by another service without materially changing the rest of the business process. These services can be delivered by both internal and external entities. A good example of a service would be an address lookup service. This could initially be created internally, based on an internal database. Or it could be a web service provided by NZ Post.

So a business process could be a customer joining an energy company. For this to happen, customers need to be quoted a price and product. But for that to happen, their address needs to be provided. So once an address is provided by the customer, this information can be passed to an address verification service, which I view as a technical service. Once this is returned, we can call the quoting service, which in turn can call the product technical service, which itself calls the pricing service. This collection of loosely coupled services is aggregated into a business service, i.e. the customer join service.
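The loose coupling is the key point: the business service depends only on the interface of each technical service, so implementations can be swapped. A minimal sketch, with all service names and data invented:

```python
# A sketch of the loose coupling described above: the customer-join
# business service depends only on the *interface* of the address
# verification service, so the internal lookup could later be swapped
# for, say, an external postal web service without changing the process.

def internal_address_service(address):
    return {"verified": True, "address": address.title()}

def quoting_service(address, product="standard"):
    return {"product": product, "price_per_kwh": 0.25, "address": address}

def customer_join(raw_address, verify=internal_address_service):
    """Business service aggregating the technical services."""
    result = verify(raw_address)
    if not result["verified"]:
        raise ValueError("address could not be verified")
    return quoting_service(result["address"])

print(customer_join("12 queen street, hamilton"))
```

Swapping providers is a one-line change at the call site: `customer_join(addr, verify=nz_post_service)`, with `nz_post_service` being any function honouring the same contract.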

We talked about some of the technical methods around SOA, like WSDL to describe web services, SOAP for the formatting of those messages, and UDDI to discover services, but from my perspective all of that stuff just wasn’t adopted in reality, because of the complexity of those integration methods. These days, REST+JSON seems to be the easiest way to integrate services, especially for modern web applications.
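For contrast with the WSDL/SOAP tooling, here’s what the whole ‘message format’ concern shrinks to with REST+JSON; the endpoint and field names are invented for illustration:

```python
# The same address-lookup request as a plain JSON body. A SOAP version
# would wrap these two fields in an envelope, namespaces, and a body
# element, which is much of what WSDL/SOAP tooling existed to manage.

import json

request_body = json.dumps({"query": "12 Queen Street", "country": "NZ"})
# e.g. POST /addresses/search  with Content-Type: application/json
print(request_body)
```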

We then talked about Enterprise Architecture Frameworks, a topic of interest to me! We looked at the Zachman Framework which is an ontology, or a way of categorising things, like servers, and applications, and business functions. Zachman doesn’t describe how to use the ontology, it just provides one. TOGAF (of which I’m certified!) describes a method of doing Enterprise Architecture, but doesn’t really describe outputs. If you add Zachman and TOGAF together, you’ll end up with something that’s practical, which in my opinion is the Integrated Architecture Framework.

Finally, we looked at ITIL, specifically thinking about the impact on SOA management and governance. ITIL specifically talks about the lifecycle of services, such as service strategy, design, transition, operation, and continual service improvement. Though in practice, I’ve only really seen Service Operation in use in organisations, typically Service Desk, Incident Management, Problem Management, sometimes Capacity Management etc.

However, full ITIL could be used to think about the business value of services, who are the stakeholders of the service, how they define value, and how the service (made up of more technical services) realises the business outcomes of the stakeholders. But in practice, I haven’t seen this in my 13 years in IT in New Zealand.

Week 9 – Data Warehouses

This week we covered data warehouses, with a bit of a focus on the relationship with big data. A few questions posed were:

  1. What changes occur in the presence of big, fast, possibly unstructured data?
  2. Is the Data Warehouse architecture still the same?
  3. If not, what needs to be changed or adapted?

In my view, big data is just another data dimension that can be processed with technologies like Hadoop, and then brought into the data warehouse like any other set of data. Using a mechanism like MapReduce to gain insight from masses of data then allows that insight to be overlaid with other data from transactional systems, as well as external systems, to provide better information to make business decisions.

But is the data warehouse architecture still the same? Yes and no. I think that originally the driver behind having a data warehouse was to be able to run queries against your data without affecting your transaction system’s performance. But these days, you could run your transactional system on an in-memory database like SAP HANA, which runs very quickly. So do you still need a data warehouse? http://www.element61.be/e/resourc-detail.asp?ResourceId=767 argues that you do because:

  1. Data warehouses provide a single version of the truth over aggregates of data coming from multiple data sources, not just transactional systems;
  2. Data warehouses can run data quality processes that wouldn’t be running in the transactional system;
  3. Data warehouses can provide a historical view of information, which may no longer be stored in a transactional system.

Therefore, it’s likely that the architecture of a data warehouse will remain, augmented by in-memory technologies, with big data systems like Hadoop (or HDFS) used as just another data source as an input to the data warehouse. This was reiterated in one of the readings for the week, “Integrating Hadoop into Business Intelligence and Data Warehousing” by Philip Russom, which notes “Users experienced with HDFS consider it a complement to their DW”.

I think an infrastructural trend for data warehouses is the creation of them in the cloud. Infrastructure in the cloud is very cheap, with products like Amazon Redshift providing cloud data warehouses that can store petabytes of information, without having to purchase expensive hardware.

Another reading was to look at Tableau’s perspective on the Top 10 Trends in Business Intelligence for 2014. As an Enterprise Architect I read these sorts of sales pitches/white papers every day, and I find them to be a bit generic. The list is as follows:

  1. The end of data scientists.
  2. Cloud business intelligence goes mainstream.
  3. Big data finally goes to the sky.
  4. Agile business intelligence extends its lead.
  5. Predictive analytics, once the realm of advanced and specialized systems, will move into the mainstream.
  6. Embedded business intelligence begins to emerge in an attempt to put analytics in the path of everyday business activities.
  7. Storytelling becomes a priority.
  8. Mobile business intelligence becomes the primary experience for leading-edge organizations.
  9. Organizations begin to analyze social data in earnest.
  10. NoSQL is the new Hadoop.

This list really shows the relationship of data warehouses and BI to the broader context of IT, such as Cloud Computing, Agile, and Mobility. So while all the trends make sense, there aren’t too many pearls of wisdom. In fact, pointing out that storytelling is becoming a priority appears pretty self-evident to me: if the point of BI is to “turn data into insight to make business decisions”, then if decision makers don’t understand the insight put in front of them, they’ll fail to use that insight to make their decisions, eroding the business value of BI.

Week 8 – Big Data and Hadoop

This week we covered Big Data and Hadoop, a topic of dear interest to me, as I try and understand what to do with all the electricity smart meter data reads we receive as a company. We used to receive one meter read every two months. Now we receive 48 meter reads a day, or 2880 every two months. That’s quite a volume increase, and increasingly we’ll need to rely on big data techniques to process this data.

Which brings me to my first task for this week, which was to look at other potential or existing use cases for big data. As you can see, the increase in electricity meter reads is quite significant. But it’s still not enough. To start to analyse how people consume electricity, we’ll need to move towards minute-by-minute readings for each device in the household. With, say, five devices, that could be 7,200 meter reads in a day, or 432,000 every two months. As you can imagine, that’s quite a volume increase from one meter read every two months!
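Worked through (assuming five monitored devices per household for the minute-by-minute scenario), the volumes above are:

```python
# The meter-read volume growth, worked through. The five-device figure
# is an assumption used for illustration.

reads_per_day_now = 48                 # half-hourly smart meter reads
print(reads_per_day_now * 60)          # per two months → 2880

devices = 5
per_device_per_day = 24 * 60           # minute-by-minute = 1440 reads
reads_per_day_future = devices * per_device_per_day
print(reads_per_day_future)            # → 7200
print(reads_per_day_future * 60)       # per two months → 432000
```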

The second task for the week was to check out http://www.kdnuggets.com/2015/07/big-data-big-profits-google-nest-lesson.html, which is a Google Nest case study. Google’s Nest is a thermostat for heating and air conditioning systems in the USA. Nest learns people’s patterns of behaviour in terms of the cooling and heating they want, and delivers that more efficiently than existing ‘dumb’ thermostats. Nest is more efficient since it can figure out that no one’s home and reduce heating, therefore saving power, and money. Of course, to do that, it needs to remember and process a lot of data points, which is an example of big data similar to the smart meter scenario I pointed out earlier.

The third task for the week was to read an IBM White Paper on the Top Five Ways to get started with big data (http://public.dhe.ibm.com/common/ssi/ecm/im/en/imw14710usen/IMW14710USEN.PDF), which are:

  1. Big Data exploration, which is exploring information from sensors, and extracting trends. The company I work for currently does this, by extracting information from Power Station sensors, and doing trend analysis, using software called OSIsoft PI Historian (http://www.automatedresults.com/PI/pi-historian.aspx).
  2. Getting a 360 degree view of the customer, which is something very important to the company I work for. The more information we know about a customer, the more finely we can tailor our products and pricing to that customer, which in turn is designed to improve service and reduce churn. Of course, a counterpoint is that some people view it as creepy when large organisations collect a large amount of information about customers, and therefore there is a responsibility to make sure that we do that collection with good intentions, i.e. for the purpose of delivering better products and services. More and more, big data needs to be combined with in-memory databases such as SAP HANA (http://hana.sap.com/abouthana.html) to allow us to process data in a timely manner.
  3. Security and intelligence extension, another valuable use case for the company I work for, since the number of cyber attacks against us continues to grow, being able to sort through the logs of hundreds of servers, and thousands of desktops allows us to spot trends, such as malicious attacks running over multiple months. Without big data, we wouldn’t be able to process this amount of logs. Tools like Splunk (http://www.splunk.com/) allow us to analyse this.
  4. Operations analysis, which is the optimisation of our business using sensor data. I’d argue this is a pretty similar use case for us as big data exploration, though I understand one is about exploring new trends, and the other is about optimising existing patterns in the data.
  5. Data warehouse optimisation, which is particularly important considering the massive increase in data processing (see my original point about smart meter data).

The big implication that I already touched on is the creepiness factor of large organisations knowing more and more information about you. My view is that the mass personalisation of products and pricing just for you delivers better service, though I also understand why some people would want to opt out of this data-utopia. I do think, though, that it will become more and more difficult, if not impossible, to opt out. It’s a bit like not using Facebook: sure, you don’t have to, but eventually you’ll never get invited to events, because they’re all hosted on Facebook where you’ll never see them. So I don’t think all the implications of big data are positive, but then again, all technology has positive and negative consequences.

Finally, we were tasked to think about whether big data is the right phrase. Personally, I think it’s just data, rather than big data. There is an explosion of data everywhere, which grows exponentially. Therefore, before long there won’t be any processing other than big data processing.

As a side note, we also went through how MapReduce works. My advice is to check out:

which is an excellent video describing how MapReduce splits tasks across nodes, then combines the results to create an answer.
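The split-then-combine idea can be shown as a single-process word count sketch; in Hadoop the chunks would live on different nodes, but the map, shuffle, and reduce phases are the same shape:

```python
# A single-process sketch of MapReduce: map each input chunk to
# key/value pairs, shuffle (group) by key, then reduce each group.

from collections import defaultdict

def map_phase(chunk):
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    return {key: sum(values) for key, values in grouped.items()}

# In Hadoop, each chunk would sit on a different node; here it's a list.
chunks = ["big data big", "data everywhere"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
print(reduce_phase(shuffle(pairs)))  # → {'big': 2, 'data': 2, 'everywhere': 1}
```

Because each map call touches only its own chunk, the map phase parallelises trivially across nodes, which is the whole trick.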

Week 7 – Cloud Computing

This week’s focus was on Cloud Computing, a topic of dear interest to me. The first thing we were tasked to do was discuss which business models appear appropriate for the cloud. In order to do that, we need to look at the NIST (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf) Definition of Cloud Computing, which notes the following essential characteristics:

  • On-demand self-service
  • Broad network access
  • Resource pooling
  • Rapid elasticity
  • Measured service

In effect, the Cloud is a great technology platform for businesses that have started at zero and would like to scale up without incurring the costs of purchasing hardware, or the significant capital investments of a data centre. Start-ups are a great candidate to use Cloud technology platforms. Another suitable case is traditional businesses that require a platform for proof-of-concepts (POCs): large companies can try new technologies without any consequences for existing infrastructure, and the POCs can be shut down just as easily. Another business type suited to the Cloud is organisations that need to do batch processing of information in a timely manner. Two examples of that are Metservice, who use AWS (http://www.stuff.co.nz/technology/digital-living/8741213/Amazon-ahead-in-the-cloud) to augment their on-shore weather forecasting simulations, and Qantas (http://www.itnews.com.au/news/qantascom-begins-transition-to-aws-402996), who use AWS to do flight and weather forecasting.

Next, we looked at two assigned readings, the first being How CloudFlare promises SSL security – without the key (http://arstechnica.com/information-technology/2014/09/in-depth-how-cloudflares-new-web-service-promises-security-without-the-key/). This article discusses how organisations want to use Cloud computing resources, which allow large organisations, like banks, to absorb denial of service attacks. However, these entities want to use Cloud computing without handing over the keys to the kingdom, so to speak – in this particular case, the SSL private key used to decrypt communications. Therefore, CloudFlare have created a method that allows the private keys to remain stored on customers’ servers, rather than on CloudFlare’s servers. This allows organisations to take advantage of the cloud, while still controlling their own security.

The second reading was on How can we protect our information in the era of cloud computing (http://www.cam.ac.uk/research/news/how-can-we-protect-our-information-in-the-era-of-cloud-computing). The article describes how information can be protected in the cloud by creating multiple copies in a decentralised manner, also known as peer-to-peer. The article goes on to quote Professor Jon Crowcroft saying “We haven’t seen massive take-up of decentralised networks yet, but perhaps that’s just premature”. I’d argue that we do see massive peer-to-peer networks; they’re just being used to distribute movies and other pirated material. As legal authorities moved to shut down torrent trackers, these evolved into magnet links (https://en.wikipedia.org/wiki/Magnet_URI_scheme), which no longer require a torrent tracker, but instead identify content based on a hash value.
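The ‘identify content by hash’ idea can be illustrated in a few lines. Note that real magnet links hash the torrent’s metadata (the ‘infohash’), not the raw file bytes, so this is a simplification:

```python
# Identifying content by *what it is* (a hash) rather than *where it
# lives* (a URL on a tracker). Hashing raw bytes here is a simplification
# of the SHA-1 infohash that magnet links actually carry.

import hashlib

content = b"some shared file bytes"
infohash = hashlib.sha1(content).hexdigest()
magnet = f"magnet:?xt=urn:btih:{infohash}"
print(magnet)

# Any peer holding bytes with the same hash can serve the request,
# so no central tracker is needed to say where the content lives.
```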

The final task was to look at the pros and cons of New Zealand Government’s Cloud First strategy (https://www.ict.govt.nz/guidance-and-resources/information-management/requirements-for-cloud-computing). The pros below are listed from the previous link:

  • Cloud computing solutions are scalable: agencies can purchase as much or as little resource as they need at any particular time. They pay for what they use.
  • Agencies do not have to make large capital outlays on computing hardware, or pay for the upkeep of that hardware.
  • Cloud computing provides economies of scale through all-of-government volume discounts. This is particularly beneficial for smaller ICT users.
  • Agencies can easily access the latest versions of common software, which deliver improved and robust functionality and eliminate significant costs associated with version upgrades.
  • If agencies are able to access the same programmes, and up-to-date versions of those programmes, this will improve resiliency and reduce productivity losses caused when applications are incompatible across agencies.

The con highlighted in the article is that using the Cloud isn’t a free pass to outsource risk; ultimately it’s the agency’s responsibility to decide whether or not to use the Cloud. This includes, for example, ensuring that data classified above RESTRICTED isn’t placed in a public cloud.

Week 6 – Social Media Technologies

This week we looked at social media technologies, covering an overview of social media technology, social media graphs, finding network communities, and similarity of nodes.

First, we looked generically at social media technologies. For me, the interesting point was organisations’ shift away from more formal knowledge management systems (like Moodle) towards more social knowledge management systems like wikis. This reflects, in my opinion, that knowledge in an organisation is held by everyone, and should be updated by everyone where appropriate.

One of the more up-and-coming uses of social media is as a new customer service channel. http://thenextweb.com/socialforbusiness/2014/10/21/social-media-in-unexpected-parts-of-business/ discusses some interesting opportunities for social media in organisations, specifically:

  1. Cross-team relationships
  2. Monitoring customer conversations
  3. Customer behavioural targeting

Of the three, I’ve had recent experience evaluating an SAP product called Hybris (https://www.hybris.com/en/), which is focused on monitoring customer conversations and customer behavioural targeting. I think it’s important to note that we live in a multi-channel world where customers choose the channels through which they wish to interact with an organisation, whether that be in real time on Twitter, a Facebook group, sending an email, or using a feedback form. In this world, a customer posting a comment on Facebook should be treated no differently than one using a form on the website, and expects the same service.

More interesting is the use of social media for sentiment analysis, which filters all the data from sites like Twitter and searches for both positive and negative words. This way organisations can see, as news or campaigns are released, whether there is positive or negative sentiment towards the brand, and can adjust their marketing in real time to amplify positive effects or counter negative ones.
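A bare-bones version of that word-counting sentiment filter might look like this; the lexicons and posts are invented, and real tools use far richer models:

```python
# A minimal lexicon-based sentiment filter: score each post by counting
# words from positive and negative word lists.

POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "slow", "outage"}

def sentiment(post):
    words = post.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

posts = ["Love the new plan, great price", "Another day, another slow response"]
print([sentiment(p) for p in posts])  # → ['positive', 'negative']
```

Run over a live Twitter firehose, even something this crude gives a rough real-time pulse of brand sentiment, which is the signal marketers react to.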

There were three readings this week, the first being Social media technology usage and customer relationship performance: A capabilities-based examination of social CRM. CRM is Customer Relationship Management software, which is used to capture the relationship a customer has with an organisation. Social CRM augments a traditional CRM with social connectivity. I think some organisations would view this as a communication revolution. The key takeaway for me was noted by Trainor, Andzulis, Rapp, & Agnihotri (2014): social CRM as a technology “alone may not be sufficient to gain a competitive advantage. Instead, social media technologies merely facilitate capabilities that allow firms to better meet the needs of a customer”. This wasn’t a surprising outcome for me. I view social media as a natural collaborative extension of email, and really the modern version of bulletin boards and newsgroups. So tooling that takes that into account will really help to gain insight, and at best amplify a conversation with a customer, but in itself won’t do too much.

The second reading was Social media? Get serious! Understanding the functional building blocks of social media. Kietzmann, Hermkens, McCarthy, & Silvestre (2011) defined seven functional blocks of social media:

  1. Presence – The extent to which users know if others are available
  2. Relationships – The extent to which users relate to each other
  3. Reputation – The extent to which users know the social standing of others and content
  4. Groups – The extent to which users are ordered or form communities
  5. Conversations – The extent to which users communicate with each other
  6. Sharing – The extent to which users exchange, distribute, and receive content
  7. Identity – The extent to which users reveal themselves

All social media sites have these aspects, just in varying degrees. My own personal belief is that Identity is the next big thing on the Internet, and unlike Mark Zuckerberg’s view that “Having two identities for yourself is an example of a lack of integrity” (http://www.michaelzimmer.org/2010/05/14/facebooks-zuckerberg-having-two-identities-for-yourself-is-an-example-of-a-lack-of-integrity/), I believe we all have multiple identities: one for work (LinkedIn), one for family (Facebook), one for close friends (Snapchat), one for dating, and so on.

The final reading was Chapter 10 of Mining of Massive Datasets. This, and a few of the videos for this week, covered some of the algorithms used as a foundation for grouping nodes together and understanding their relationships. We looked at the relationships between clusters of nodes, measured through betweenness, and then went into some detail on the Girvan-Newman algorithm, which Rajaraman, Leskovec, & Ullman note “visits each node X once and computes the number of shortest paths from X to each of the other nodes that go through each of the edges”. While this sounds daunting, it really boils down to: how do we infer useful information from a social network? Like-minded people and topics tend to cluster around each other. So how do you know there’s a cluster? How do you know the relationship between that cluster and other clusters, so you can decide which content to show? Algorithms exist to help us make sense of all this related information.
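To demystify the betweenness idea, here’s a simplified sketch of the edge scoring behind Girvan-Newman. For brevity it credits only one shortest path per pair of nodes (the real algorithm credits all shortest paths between each pair, then repeatedly removes the highest-scoring edge); the graph is a made-up example of two tight communities joined by a single bridge.

```python
from collections import deque

def bfs_parents(graph, src):
    """BFS from src, recording one parent per node along a shortest path."""
    parent = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    return parent

def edge_betweenness(graph):
    """Score each edge by how many pairwise shortest paths cross it
    (simplified: one BFS shortest path per ordered pair)."""
    scores = {}
    for src in graph:
        parent = bfs_parents(graph, src)
        for dst in parent:
            node = dst
            # walk the shortest path dst -> src, crediting each edge on it
            while parent[node] is not None:
                edge = frozenset((node, parent[node]))
                scores[edge] = scores.get(edge, 0) + 1
                node = parent[node]
    return scores

# Two triangles (communities) joined by the single bridge edge C-D.
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
scores = edge_betweenness(graph)
bridge = max(scores, key=scores.get)
print(sorted(bridge))
```

Because every shortest path between the two triangles must cross the C–D bridge, that edge scores highest, and removing it splits the network into its two communities.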


Trainor, K., Andzulis, J., Rapp, A., & Agnihotri, R. (2014). Social media technology usage and customer relationship performance: A capabilities-based examination of social CRM. Journal of Business Research, 67(2014), 1201-1208.

Kietzmann, J., Hermkens, K., McCarthy, I., & Silvestre, B. (2011). Social media? Get serious! Understanding the functional building blocks of social media. Business Horizons, 54(2011), 241-251.

Rajaraman, A., Leskovec, J., & Ullman, J. (2010). Mining of Massive Datasets.

Week 5 – Recommender Systems

This week we looked at Recommender Systems. We all see Recommender Systems everywhere on the Internet, the big ones being Amazon.com recommending “Other products you may like”, and Netflix recommending other movies to watch.

Recommender systems recommend things we should concentrate our attention on, and they have been around for ages. When we look at the articles in a newspaper, do we ever stop to wonder why the newspaper recommended those articles for us? The newspaper obviously can’t make a perfect, personalised newspaper for everybody, so it uses editorial, or hand-selected, recommendations.

However, now that we have the Internet, mass-recommendation is possible, and preferable in the age of The Long Tail.

The basic concepts of recommendation engines are:

  • Users want recommendations
  • Sites have a list of items for recommendation (movies, books, people)
  • We take some inputs like ratings, demographics, and content data
  • We output a prediction of what people might like, the top choice being the recommendation

We could recommend what people are likely to hate, but there’s not much point in that, so recommender engines are tuned towards positive recommendations.

There’s two types of recommender systems:

  • Content-based, which looks at content (like movies), and recommends similar content to that user. So if you’ve watched a cowboy movie and rated it highly on Netflix, Netflix will recommend other cowboy movies that are similar (same actors, same time period, same genre etc).
  • Collaborative filtering, which looks at customers or items rather than content: users who purchased this item also purchased these other items, so recommend those. For example, if you purchased a camera, and other people who purchased that camera also bought a case, Amazon will recommend that case to you.
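The Amazon example can be sketched as a simple co-purchase count. The baskets and product names below are invented for illustration; real collaborative filtering also normalises for item popularity and uses similarity measures such as cosine distance.

```python
from collections import Counter
from itertools import combinations

# "Customers who bought this also bought": count co-purchases across
# baskets, then recommend an item's most frequent companions.
baskets = [
    {"camera", "case", "sd-card"},
    {"camera", "case"},
    {"camera", "tripod"},
    {"kettle", "teapot"},
]

co_counts = {}
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts.setdefault(a, Counter())[b] += 1
        co_counts.setdefault(b, Counter())[a] += 1

def also_bought(item, n=2):
    """Top-n items most often bought alongside `item`."""
    return [other for other, _ in co_counts[item].most_common(n)]

print(also_bought("camera"))
```

The case is bought alongside the camera more often than any other item, so it tops the camera’s recommendations.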

Sites that recommend similar to Netflix include:

  • Hulu.com (popular TV shows)
  • Pandora.com (music)
  • Crunchyroll.com (Japanese Anime shows)

Sites that recommend differently to Netflix include:

  • iPredict. iPredict recommends outcomes based on the concept of shares. A share pays out $1 if an outcome becomes true; therefore $1 = 100% likelihood that the outcome is true. So if a share is trading at $0.04, the free market is saying there is a 4% likelihood that the outcome is true. This is a bit like having 100,000 people bet on which movie you would enjoy based on you liking Harry Potter, taking the average of their bets, and saying “that’s what the market thinks you’ll enjoy”.
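The share-price arithmetic is simple enough to write down directly (the helper name is mine, for illustration):

```python
def implied_probability(share_price: float) -> float:
    """A share paying out $1.00 if true and trading at `share_price`
    implies this market-estimated likelihood."""
    return share_price / 1.00

# A share trading at $0.04 implies a 4% likelihood.
print(f"{implied_probability(0.04):.0%}")
```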

We were also asked to look at item profiles for various items, and list the attributes used to describe them:

  • Ant Man Movie – Year of production, age classification, length of movie, genre, release date, critic scores, director, writers, actors, award nominations, tropes contained, etc.
  • A document (like the Wikipedia page on Recommender systems) – links to other Wikipedia pages, categories of the page, links to external pages, number of words in the article, bounce rate of visitors to the page who then left the site, number of people who arrive at Wikipedia at this article first.
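Given item profiles like these, a content-based recommender just needs a similarity measure between profiles. Here’s a sketch using Jaccard similarity over attribute sets; the attribute tags are invented, not real metadata.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two attribute sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b)

# Invented attribute tags for three movies.
ant_man = {"2015", "superhero", "action", "comedy", "marvel"}
avengers = {"2015", "superhero", "action", "marvel", "ensemble"}
western = {"1969", "western", "drama"}

# Ant-Man's profile overlaps heavily with Avengers, not at all with the western.
print(jaccard(ant_man, avengers))
print(jaccard(ant_man, western))
```

A content-based engine would then recommend the items whose profiles score highest against what you’ve already rated well.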

Finally, we were asked to combine all the previous topics we discussed (search engines, advertisers, and recommenders) into a start-up business model that delivered groceries to the home.

For me, the model would be shipping American candies and snacks automatically to people’s home, as a Snack Subscription.

First, explaining the business model: the value proposition is delivering delicious American candies and snacks regularly to people who enjoy and value them. The revenue model is subscription based, where every month people pay $30 to get a box of snacks delivered to them. The target customer is anyone in the world who values American snacks enough to pay $30 a month, every month ($360 a year). The distribution channel is the web for the ordering service, and a physical courier for the delivery service.

The second step would be to create a website and get it crawled by a search engine like Google. It would be important to know the keywords that customers are searching for, using something like Google Trends for Snack Subscription. Google knows that people searching for Snack Box also tend to search for Snack Subscription. We’d also have to get links to our website from other popular websites, to increase our rankings under the PageRank algorithm.

Of course, it’s not always easy to get links on popular websites, so instead we can advertise instead. Advertising is all getting attention to our links, so we can take our above research for Snack Subscriptions, look at our competitors, and start bidding on keywords. To get better value for money, we can increase the information quality of our advertising by having a site optimised for mobile, including video, and describing our site in a format that the Google Knowledge Graph understands. This in turn will decrease the amount of money we have to pay per click, and increase the likelihood of our ad being displayed on the limited number of advertising slots available on a search query.

Finally, once we have customers arriving at our site, we can present a range of snack subscriptions, and give them the choice of ranking which types of snacks they prefer. These explicit rankings will provide a motivated customer with a more accurate selection of products they’d enjoy. But if they didn’t want to rank anything, we could provide options based on what other customers on the site prefer.

Week 4 – Advertising

This week we covered Advertising. We looked at some of the older types of digital advertising, such as banners, and how to determine when to place them. Banner advertising relies on users randomly browsing websites, with the advertiser then determining what ad to display to the user. Only a small amount of information is available to decide which ad to display, such as previous search terms or previous pages visited. This means an ad has relatively low information value to a visitor, leading to a low click rate and a low ROI.

This led to learning about the difference between knowing a full set of inputs to calculate the best output, versus only knowing a subset of inputs and trying to calculate the best-available output. The best-available output can retrospectively be compared to the absolute-best output, and the ratio between the two (the competitive ratio) can be used as a metric to understand the value of a particular algorithm.
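This can be illustrated with the classic AdWords-style example from the course material: advertiser A bids only on query x, advertiser B bids on x and y, and both have a budget of 2 (at $1 per click). A greedy allocator that happens to favour B spends B’s budget on the x queries, leaving nobody for the y queries, and so earns half of the hindsight-optimal revenue. A minimal sketch:

```python
def greedy_revenue(queries, bids, budgets, tie_order):
    """Serve each query to the first advertiser in tie_order that bids
    on it and still has budget; each served query earns $1."""
    budgets = dict(budgets)  # copy, so the caller's dict is untouched
    revenue = 0
    for q in queries:
        for adv in tie_order:
            if q in bids[adv] and budgets[adv] > 0:
                budgets[adv] -= 1
                revenue += 1
                break
    return revenue

bids = {"A": {"x"}, "B": {"x", "y"}}
budgets = {"A": 2, "B": 2}
queries = ["x", "x", "y", "y"]

greedy = greedy_revenue(queries, bids, budgets, tie_order=["B", "A"])
optimal = 4  # hindsight-optimal: A takes both x queries, B takes both y
print(greedy / optimal)
```

The online algorithm only sees queries one at a time, which is exactly why it can be measured against the hindsight-optimal assignment.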

My reflection is that advertising is really the business of attracting people’s attention. There are 300 hours of new YouTube video uploaded per minute. So how can a person gain the attention of others to view their video? There are some choices:

  1. Improve the discoverability of the video (get people to the video);
  2. Improve the informational value of the video (get people to stay on the video).

People can tell others that a video is good, and that is social media. The system can tell others that a video is good, and that is the platform. Advertisers can tell others that a video is good, and that is advertising. Therefore if we think of advertising as the business of attracting attention, it’s also in competition with platforms and social media. This probably explains why Google created a social media platform (Google Plus), owns a platform (YouTube), and why Facebook has invested heavily in video sharing.

Anyway, back to the topic of advertising. As advertising matured, the more recent innovation is advertising auctions, where advertisers bid on keywords, which they then pay for as clicks are received. Google’s platform is known as AdWords.

When it comes to displaying ads on a query, the problem boils down to:

  • How many ad slots are available on a query page
  • What are the bids that advertisers have for a query
  • How effective is the ad for that advertiser (since Google only gets paid on a click, not just an impression)
  • What budget is available (since advertisers don’t have unlimited budgets).

Google addresses the AdWords problem by:

  • Looking at the bid per advertiser;
  • Looking at the quality of the ad (based on previous click rate);
  • Looking at the attractiveness of the ad (described as format impact);

The above values combine to calculate an ad’s Ad Rank. An interesting component is that Google uses a Vickrey auction, where the winning bidder pays the bid of the second-highest bidder. This determines who should win (the highest bidder), and what the market price should be (the second-highest bid).
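The Vickrey mechanics can be sketched in a few lines. Bidder names and amounts are made up, and Google’s real pricing also folds in ad quality and format impact; this shows only the second-price idea.

```python
def second_price_auction(bids: dict):
    """Return (winner, price): the highest bidder wins,
    but pays the second-highest bid."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    return winner, bids[runner_up]

bids = {"alice": 4.00, "bob": 2.50, "carol": 1.00}
print(second_price_auction(bids))  # alice wins, but pays bob's $2.50
```

The appeal of this design is that bidding your true value is the best strategy: overbidding risks paying more than an ad is worth, and underbidding risks losing a slot you valued.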

Of note is the possibility of gaming the system, known as Click Fraud. Because advertisers pay for clicks as a proxy for attention, automated systems can generate clicks on ads which appear legitimate, but because no actual person is paying attention, the advertiser gains no benefit.

That’s why other metrics of engagement, such as Facebook Likes may be better measures of attention than just clicks.

I think that a business model built on clicks as a proxy for attention is vulnerable to a different business model that measures attention more accurately. If we define attention as someone listening to a message, understanding it, and actioning it, then perhaps we could:

  • Ask people if they believed the link was valuable to them (whether that be an ad or a social media interaction);
  • Measure understanding (perhaps using product engagement online);
  • Measure actions (perhaps by measuring further activity around that ad, e.g. searching on Amazon for alternatives).

Week 3 – Search Engines

This week we covered search engines, the obvious example being Google.

Interesting side note – I used to sell search engines to companies around New Zealand, specifically the French EXALEAD CloudView from Dassault.

Google Search these days is founded on the idea of the Knowledge Graph. A good video that shows how people use the Knowledge Graph (YouTube, 2015) is shown below:

The Knowledge Graph itself is explained in a second video (YouTube, 2015), also shown below:

The Knowledge Graph in effect means that Google understands more of the meaning behind your search terms, and their relationships to other things. This in turn allows Google to synthesise answers to your queries more quickly, sometimes without you leaving the site at all.

The first unconventional way to use Google is to define attributes relating to the main thing you’re searching for. https://www.google.com/search?as_st=y&tbm=isch&as_q=cats&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=.edu&safe=images&tbs=sur:f&gws_rd=ssl is a search for “cats” that is restricted to the .edu domain and labelled for noncommercial reuse. Rather than searching for “cats edu creative commons”, this uses Google’s filters, which are more accurate than a straight keyword search.

The second unconventional search is queries about attributes of a search topic. https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=nz475 searches for the flight NZ475:

Google has interpreted that NZ475 is a flight, and has therefore shown me flight-related information in a card. After the card come the ordinary search results, which are pages other than Google’s own.

The third unconventional search is Google’s ability to crawl sensitive information. https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=password+filetype:xls+site:.edu is a link restricted to Excel files “xls” that contain the word “password” and are on an “.edu” domain. Not all of these files are sensitive, however, this goes to show that “security by obscurity” is not an acceptable practice, now that it is easy to use Google to surface sensitive information.

The foundation of the Google Search Engine is PageRank, a page-ranking algorithm. This is described in the paper The PageRank Citation Ranking: Bringing Order to the Web (Page, Brin, Motwani, & Winograd, 1999). To summarise, links to a page are treated as endorsements of that page. Links themselves are weighted by how trusted the linking page is, and trust is in turn defined by the number of links to that website. That way a link from the BBC carries more weight than a link from this blog.
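To summarise the endorsement idea in code, here’s a minimal PageRank via power iteration. The link graph is invented for illustration; the damping factor of 0.85 is the value suggested in the original paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.
    Returns a rank per page; ranks sum to 1 when no page is a dead end."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        # Each page starts with the "random surfer" baseline...
        new = {p: (1 - damping) / n for p in pages}
        # ...then receives an equal share of each linking page's rank.
        for page, outgoing in links.items():
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new[target] += share
        rank = new
    return rank

# Invented web: every other page endorses bbc, directly or indirectly.
links = {
    "bbc": ["blog"],
    "blog": ["bbc"],
    "news": ["bbc", "blog"],
    "forum": ["bbc", "news"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

Because bbc receives links from every other page, it accumulates the highest rank, just as the endorsement argument above predicts.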

Finally, we briefly touched on spam. Google (2015) defines spam as:

irrelevant or unsolicited messages sent over the Internet, typically to large numbers of users, for the purposes of advertising, phishing, spreading malware, etc.

My definition of spam is:

Something tricking you into giving it undesired attention.

Whether that be email spam trying to get you to buy shares, Facebook spam trying to get you to click on pointless videos, link spam trying to escalate the importance of fake websites, or ads on websites trying to get you to visit clickbait articles, spam is trying to steal your attention in a dishonest way.


YouTube. (2015). Explore lists and collections with Google search. Retrieved 29 July 2015, from https://www.youtube.com/watch?v=mg91_trV4hY

YouTube. (2015). Introducing the Knowledge Graph. Retrieved 29 July 2015, from https://www.youtube.com/watch?v=mmQl6VGvX-c

Google.com. (2015). cats site:.edu – Google Search. Retrieved 29 July 2015, from https://www.google.com/search?as_st=y&tbm=isch&as_q=cats&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=.edu&safe=images&tbs=sur:f&gws_rd=ssl

Google.co.nz. (2015). Google. Retrieved 29 July 2015, from https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=nz475

Google.co.nz. (2015). Google. Retrieved 29 July 2015, from https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=password+filetype:xls+site:.edu

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web. Retrieved 29 July 2015, from http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

Google.co.nz. (2015). Google. Retrieved 29 July 2015, from https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=spam%20definition