This week we begin exploring the whole cycle of the data science process. In particular, we will look at the perspectives of those who see potential in big data approaches to disrupt, or at least influence, education for the better.
We will focus on emerging examples of how learning analytics and educational data mining are already being used in policy and practice, and on how these are affecting approaches to research. Our practical work will look at how we store data. If you missed last week's seminar on big data, please watch the video on Blackboard announcements.
When reading this chapter I would like you to think about the following questions:
We are going to discuss the impact of Galton's work, which underlies big data approaches. Sir Francis Galton is considered the inventor of the questionnaire, or survey: a means of collecting and structuring data from human subjects. We will come back to Galton's work from a different perspective next week.
Below are videos on some of Galton's key ideas that underpin big data approaches. They are set in the context of financial markets, but the ideas are explained clearly. As we will look at citizen science projects this week, these videos also illustrate some underpinning principles of the 'crowd'.
Watch: The Wisdom of the Crowd
Watch this video on Galton as the inventor of 'Big Data' and the wisdom of the crowd (this is based on financial training) and consider:
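As a small illustration of the wisdom-of-the-crowd idea, the sketch below (not from the video; the numbers are illustrative, loosely inspired by Galton's famous ox-weighing anecdote) simulates many noisy individual guesses and shows that their average tends to land closer to the true value than most individuals do.

```python
import random

random.seed(42)

TRUE_WEIGHT = 1198  # illustrative 'true' value being guessed

# Each person makes a noisy individual guess around the true value.
guesses = [TRUE_WEIGHT + random.gauss(0, 100) for _ in range(800)]

crowd_estimate = sum(guesses) / len(guesses)
crowd_error = abs(crowd_estimate - TRUE_WEIGHT)

# Count how many individuals were more accurate than the crowd's average.
better_individuals = sum(1 for g in guesses if abs(g - TRUE_WEIGHT) < crowd_error)

print(f"Crowd estimate: {crowd_estimate:.1f} (error {crowd_error:.1f})")
print(f"Individuals more accurate than the crowd: {better_individuals} of {len(guesses)}")
```

Run it a few times with different seeds: the crowd's error stays small even though individual guesses are often wildly off, because the independent errors cancel out in the average.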
Galton is also known for developing the concept of regression to the mean, a predictive approach used in statistics and analytics. The video offers a helpful explanation of the thinking behind that approach.
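To make regression to the mean concrete, here is a minimal Python sketch (my own illustration, with assumed numbers) of a cohort sitting two tests. Each score mixes stable ability with one-off luck, so the top performers in round one tend to score closer to the cohort average in round two.

```python
import random

random.seed(7)

# Each student's score = stable ability + one-off luck on the day.
abilities = [random.gauss(70, 8) for _ in range(1000)]

def sit_test(ability):
    return ability + random.gauss(0, 8)  # luck is independent each sitting

round1 = [sit_test(a) for a in abilities]
round2 = [sit_test(a) for a in abilities]

# Take the top 10% from round 1 and follow them into round 2.
top = sorted(range(1000), key=lambda i: round1[i], reverse=True)[:100]
avg_r1 = sum(round1[i] for i in top) / 100
avg_r2 = sum(round2[i] for i in top) / 100

print(f"Top group, round 1 average: {avg_r1:.1f}")
print(f"Same group, round 2 average: {avg_r2:.1f}")
```

The second average falls back towards the cohort mean not because anyone got worse, but because part of what selected the top group was luck that does not repeat.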
The video below introduces the Galton board and explores regression to the mean.
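A Galton board can also be simulated in a few lines, which may help before watching. In this sketch (an illustration of the principle, not taken from the video) each ball bounces left or right at every row of pegs; the count of rightward bounces decides its final slot, and the familiar bell shape emerges from pure chance.

```python
import random
from collections import Counter

random.seed(1)

ROWS = 12    # rows of pegs each ball bounces off
BALLS = 5000

def drop_ball(rows):
    # At each peg the ball goes right or left with equal chance;
    # its final slot is the number of rightward bounces.
    return sum(random.random() < 0.5 for _ in range(rows))

counts = Counter(drop_ball(ROWS) for _ in range(BALLS))

# A simple text histogram: most balls pile up in the middle slots.
for slot in range(ROWS + 1):
    print(f"{slot:2d} | {'#' * (counts[slot] // 25)}")
```

Mathematically each ball's slot follows a binomial distribution, which for many rows approximates the normal curve that Galton's board demonstrates physically.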
In the next few weeks we will explore how the 'grammar' of design directs, enables, and limits what it is possible for our data capture tools to express: how the design of an application looks. The nature of the devices for which these applications are intended also shapes the design; a design for a mobile interface might be very different from one for a full-size monitor or interactive whiteboard.
But we are also limited by what happens at a lower level, meaning a less abstracted level, closer to the hardware where data is stored. It can help to have some sense of what these lower levels of abstraction are. What is the actual 'stuff' of data? We know it represents practices in the world, but how is it physically stored? Understanding this can help us see how we engineer data for use in analytics. Equally important: how is that storage likely to change in the next decade? Such changes might alter the possibilities for what data we can capture and use for analytics.
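For a quick peek below the abstraction, this short Python sketch (the record contents are a made-up example) shows that both text and numbers ultimately live as fixed patterns of bytes, each byte being eight bits.

```python
# A made-up learner record, viewed as the raw bytes that would be stored.
record = "score=87"

raw = record.encode("utf-8")   # text becomes a sequence of bytes
print(list(raw))               # each byte is an integer from 0 to 255
print(format(raw[0], "08b"))   # and each byte is eight bits: 01110011 for 's'

# Numbers, too, are fixed-width bit patterns: 87 as a 16-bit integer.
value = 87
packed = value.to_bytes(2, byteorder="big")
print(packed.hex())            # -> '0057'
```

Every database, log file, and analytics pipeline we discuss later is, at bottom, arranging and moving patterns like these.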
We must also remember to consider the sustainability of storage. This post by Neil Selwyn introduces some of the sustainability challenges posed by Ed Tech.