Week 2

Welcome to the second week of the unit. Each week, the required tasks are listed at the top of the page. It is important that you attempt these before the seminar and read the material on this week's page.

Read Chapter 1 of Doing Data Science.

Join Teams and introduce yourself using the Blackboard link.

Identify examples of data capture in your interaction with the university.

This week we will start exploring the whole cycle of the data science process. In particular, we will look at the perspectives of those who see potential for big data approaches to disrupt, or at least influence, education for the better.

We will focus on emerging examples of how learning analytics and educational data mining are already being used in policy and practice, and how these are affecting approaches to research. Our practical work will look at how we store data. If you missed last week's seminar on big data, please watch the video in the Blackboard announcements.

Cathy O'Neil and Rachel Schutt's book 'Doing Data Science'

When reading this chapter, I would like you to think about the following questions:

What is shocking or surprising (eyebrow-raising) about Big Data and Data Science? (see pages 1-2)
What is the 'Feedback Loop'? (see page 5)
What is 'Datafication'? (see page 5)
Who studies Data Science? (see page 15)
What is a 'like'?

Please make a note of your answers as we will be discussing them in the seminar on Thursday.

We are going to discuss the impact of Galton's work, which underlies big data approaches. Sir Francis Galton is considered the inventor of the questionnaire or survey: a means of collecting and structuring data from human subjects. We will return to Galton's work from a different perspective next week.

Below are some videos about Galton's key ideas that underpin big data approaches. They are set in the context of financial markets, but the ideas are explained clearly. As we will look at citizen science projects this week, they also show some of the principles underpinning the idea of the 'crowd'.

Watch: The Wisdom of the Crowd

Watch this video on Galton as the inventor of 'Big Data' and the wisdom of the crowd (it was made for financial training) and consider:

What is crowd data?
What does he mean by 'the wisdom of the crowd'?
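To make 'the wisdom of the crowd' concrete, here is a small Python simulation. It is a sketch under one stated assumption: every guess is the true value plus an independent, unbiased random error. The function name `crowd_estimate` and all of the numbers are hypothetical, not taken from the video. Under that assumption, the median of many guesses lands much closer to the truth than a typical individual guess does.

```python
import random
import statistics


def crowd_estimate(true_value: float, n_guessers: int, noise_sd: float, seed: int = 0):
    """Simulate a crowd of independent guesses around a true value.

    Hypothetical model: each guess = true value + independent Gaussian error.
    Returns (crowd median, typical individual error, error of the crowd median).
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    guesses = [rng.gauss(true_value, noise_sd) for _ in range(n_guessers)]
    crowd = statistics.median(guesses)
    # Median absolute error of individuals: how far off a "typical" person is.
    typical_individual_error = statistics.median(abs(g - true_value) for g in guesses)
    return crowd, typical_individual_error, abs(crowd - true_value)


crowd, indiv_err, crowd_err = crowd_estimate(true_value=1000, n_guessers=800, noise_sd=100)
print(f"crowd median guess:       {crowd:.1f}")
print(f"typical individual error: {indiv_err:.1f}")
print(f"error of crowd median:    {crowd_err:.1f}")
```

If the guesses share a common bias, for example because people influence one another, aggregating them does not remove that bias. This is one reason independence matters when we talk about crowd data.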

Galton is also known for describing regression to the mean, an idea that led to the regression techniques used for prediction in statistics and analytics. This is a helpful explanation of the thinking behind that approach.
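Regression to the mean can be seen in a short simulation. This sketch assumes, hypothetically, that each test score is a person's underlying ability plus independent luck; the function `regression_to_mean_demo` and its parameters are illustrative only. People who scored highest on the first test still do better than average on the second, but their average falls back towards the overall mean, because part of their first score was luck that does not repeat.

```python
import random
import statistics


def regression_to_mean_demo(n: int = 10_000, top_fraction: float = 0.1, seed: int = 1):
    """Two noisy measurements of the same underlying ability.

    Hypothetical model: score = ability + independent measurement noise.
    Select the top scorers on test 1 and compare their averages on both tests.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(50, 10) for _ in range(n)]
    test1 = [a + rng.gauss(0, 10) for a in ability]
    test2 = [a + rng.gauss(0, 10) for a in ability]

    # Pick the top scorers using the first test only.
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]

    overall_mean = statistics.mean(test2)
    top_test1 = statistics.mean(test1[i] for i in top)
    top_test2 = statistics.mean(test2[i] for i in top)
    return overall_mean, top_test1, top_test2


overall, before, after = regression_to_mean_demo()
print(f"everyone, test 2:  {overall:.1f}")
print(f"top group, test 1: {before:.1f}")
print(f"top group, test 2: {after:.1f}")
```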

The video below introduces the Galton board and explores regression to the mean.

IFA.com - Francis Galton: Part 2: The Wisdom of the Crowd
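A Galton board can also be simulated. This is a minimal sketch assuming each peg sends a ball left or right with equal probability, independently of every other peg; the function `galton_board` is hypothetical. The number of times a ball bounces right decides its final slot, and the text histogram shows the bell shape building up in the middle slots.

```python
import random
from collections import Counter


def galton_board(n_balls: int = 10_000, n_rows: int = 12, seed: int = 2) -> Counter:
    """Drop balls through rows of pegs and count how many land in each slot.

    Assumes each bounce is an independent 50/50 choice, so a ball's slot
    (its number of right bounces) follows a binomial distribution.
    """
    rng = random.Random(seed)
    slots = Counter()
    for _ in range(n_balls):
        rights = sum(rng.random() < 0.5 for _ in range(n_rows))
        slots[rights] += 1
    return slots


rows = 12
counts = galton_board(n_rows=rows)
for slot in range(rows + 1):
    # One '#' per 50 balls, so the shape fits on screen.
    print(f"{slot:2d} {'#' * (counts[slot] // 50)}")
```

Few balls reach the outer slots because getting there needs almost every bounce to go the same way, while many different paths lead to the middle.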

What data footprint have you left this week?

Consider the different devices, platforms, services and systems you use in your everyday life. This could include anything from using a credit card, searching the internet or posting on social media to ordering a taxi. What data footprint have you left this week? What does this reveal about you?

How much of this data footprint has been generated in the university?

Building on last week's discussion of personal digital footprints (quickly review Week 1 if needed), I would like you to think about how data is generated through your interactions with the university. Please come to the seminar with at least one example of an interaction you have as a student that contributes to a data footprint.

What is the actual 'stuff' of data?

Over the next few weeks, we will explore how the 'grammar' of design (how an application looks and is laid out) directs, enables and limits what our data capture tools can express. The nature of the devices for which these applications are intended also shapes the design. A design for a mobile interface might be very different from one for a full-size monitor or an interactive whiteboard.

But we are also limited by what happens at a lower level: here, 'lower' means less abstracted, closer to the hardware where data is stored. It can help to have some sense of what these lower levels of abstraction are. What is the actual 'stuff' of data? We know it represents practices in the world, but how is it physically stored? Understanding this can help us see how we go about engineering data for use in analytics. Equally important is how that storage is likely to change over the next decade, as this may change what data we can capture and use for analytics.
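As a small illustration of the 'stuff' of data, the Python sketch below shows how a piece of text and a number end up as bytes, and each byte as eight bits (0s and 1s). It assumes UTF-8 text encoding and a 4-byte little-endian integer; `to_bits` and the example record are hypothetical, and real databases and file formats add further layers of structure on top of these bytes.

```python
import struct


def to_bits(data: bytes) -> list[str]:
    """Show each byte as the eight bits (0s and 1s) that storage ultimately holds."""
    return [f"{byte:08b}" for byte in data]


# Hypothetical text a university system might record.
record = "Zoë logged in"
text_bytes = record.encode("utf-8")
print(len(record), "characters ->", len(text_bytes), "bytes")
print(to_bits(text_bytes))

# A number: 2024 packed as a 4-byte, little-endian signed integer.
number_bytes = struct.pack("<i", 2024)
print(to_bits(number_bytes))
```

Notice that 'ë' takes two bytes while each plain letter takes one, and that the integer's bytes are stored lowest-order first. Choices like these, made at a lower level of abstraction, affect how much space data occupies and how it must be read back.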

We must also consider the sustainability of data storage. This post by Neil Selwyn introduces some of the sustainability challenges posed by Ed Tech.

Is learning a science? Should learning be a science? These two short videos introduce themes we will come back to.