
VCE Data Analytics units 3 and 4

KK = Key Knowledge. All KK are examinable (as are all terms in the glossary that are relevant to this course)
If a KK uses the word "including", all of the following items are examinable.
If a KK uses the words "for example" or "such as", the following items are for clarity, and are not directly examinable.
Any bullet points shown below have been inserted by me for clarity, and are not in the original VCAA study design. (Which is available locally here.)
Hover your mouse over a red asterisk to see the official VCAA glossary definition of a term.

vcedata.com slideshow links to a (big) slideshow if the icon has a label beside it

links to a webpage on this site.


Check out my postmortem of VCAA's 2020 sample DA exam, with lots of exam tips.

DA U3O1 KK01

Techniques for efficient and effective data collection, including methods to collect

[Note: I have edited the KK punctuation to clarify its ambiguous punctuation. Let me know if you read it differently.]

vcedata.com slideshow General Data collection techniques

Geographic Information Systems

 

DA U3O1 KK02

Factors influencing the integrity of data, including

  • accuracy,
  • authenticity,
  • correctness,
  • reasonableness,
  • relevance and
  • timeliness

vcedata.com slideshow Data integrity

Timely?

Dictionary: 1. Done or happening at the appropriate or proper time. 2. Before a time limit expires

 

DA U3O1 KK03 Sources of, and methods and techniques for, acquiring authentic data stored in large repositories

vcedata.com slideshow

 

DA U3O1 KK04

Methods for referencing primary and secondary sources, including

  • American Psychological Association (APA) referencing system

vcedata.com slideshow Primary and secondary data

APA referencing system summary

 

DA U3O1 KK05

Characteristics of data types

slideshow Data types

 

DA U3O1 KK06

Methods for documenting a problem, need or opportunity

vcedata.com slideshow Problem statements

 

DA U3O1 KK07 Methods for determining solution requirements, constraints and scope

vcedata.com slideshow Software solution requirements (SRS)

 

DA U3O1 KK08 Naming conventions to support efficient use of databases, spreadsheets and data visualisations

vcedata.com slideshow Naming conventions

vcedata.com slideshow Database structure naming

vcedata.com slideshow File Naming

 

DA U3O1 KK09

A methodology for creating a database structure:

  • identifying entities,
  • defining tables and fields to represent entities;
  • defining relationships by identifying primary key fields and foreign key fields;
  • defining data types and field sizes,
  • normalisation to third normal form

vcedata.com slideshow

 

DA U3O1 KK10

Design tools for representing databases, spreadsheets and data visualisations, including

  • data dictionaries,
  • tables,
  • charts,
  • input forms,
  • queries and
  • reports

vcedata.com slideshow Database design tools

vcedata.com slideshow Entity relationship diagram

vcedata.com slideshow Data dictionary

vcedata.com slideshow Mockups

Design Principles

 

DA U3O1 KK11

Design principles that influence the functionality and appearance of

  • databases,
  • spreadsheets and
  • data visualisations

Design Principles

vcedata.com slideshow Design Factors

vcedata.com slideshow Design Elements

 

DA U3O1 KK12 Functions and techniques to retrieve required information through querying data sets, including searching, sorting and filtering to identify relationships and patterns

vcedata.com slideshow

 

DA U3O1 KK13

Software functions, techniques and procedures to efficiently and effectively

  • validate,
  • manipulate and
  • cleanse data including files, and

applying formats and conventions

Data Validation

vcedata.com slideshow Formats and conventions (mainly web)

 

DA U3O1 KK14 Types and purposes of data visualisations

Data visualisations

 

DA U3O1 KK15 Formats and conventions applied to data visualisations to improve their effectiveness for intended users, including clarity of message

vcedata.com slideshow Formats and conventions (mainly web)

 

DA U3O1 KK16

Methods and techniques for testing

  • databases,
  • spreadsheets and
  • data visualisations

vcedata.com slideshow Testing

vcedata.com slideshow User Acceptance Testing

Data visualisations

 

DA U3O1 KK17 Reasons why organisations acquire data.

vcedata.com slideshow The Importance of data to organisation

 

DA U3O2 KK01 Roles, functions and characteristics of digital system components

vcedata.com slideshow Hardware & software

vcedata.com slideshow Hardware

 

DA U3O2 KK02 Physical and software security controls used by organisations for protecting stored and communicated data

vcedata.com slideshow Data Security

 

DA U3O2 KK03

Primary and secondary data sources and methods of collecting data, including

  • interviews,
  • observation,
  • querying of data stored in large repositories and
  • surveys

vcedata.com slideshow Primary and secondary data

Data Acquisition

 


Querying of data repositories

Refers to searching datasets to extract useful information from the masses of data.
May query primary data (original data collected personally by the researchers) or secondary data (data collected by other people, e.g. government agencies like CSIRO).
Many providers of secondary data provide an API (application programming interface) to allow outsiders controlled access to their datasets.
E.g. the Australian Bureau of Meteorology lets developers create apps that extract weather forecast data from its databases.

Queries may use SQL or QBE

SQL = Structured Query Language. A standardised format for selecting data from databases. E.g.

SELECT * FROM Book WHERE price > 100.00 ORDER BY title;

QBE = Query By Example. The user types what they want to find into the field where they want to find it.
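
To see the SQL query above actually return rows, here is a minimal sketch using Python's built-in sqlite3 module. Only the SELECT statement comes from these notes; the Book table's columns and the four sample rows are made up for the demonstration.

```python
import sqlite3

# In-memory database. The Book table layout and its rows are invented
# purely for this demo; only the SELECT statement is from the notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Book (title, price) VALUES (?, ?)",
    [
        ("Zoology Atlas", 150.00),
        ("Pocket Dictionary", 20.00),
        ("Art History", 120.50),
        ("Cheap Thrills", 100.00),  # exactly 100, so NOT greater than 100
    ],
)

# The query from the notes: every column, books dearer than $100, sorted by title.
rows = conn.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title;"
).fetchall()

for title, price in rows:
    print(title, price)

conn.close()
```

Note how WHERE does the filtering and ORDER BY does the sorting: two of the querying techniques named in the Key Knowledge.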

 

 

DA U3O2 KK04

Techniques for

  • searching,
  • browsing and
  • downloading

data sets

vcedata.com slideshow

 

DA U3O2 KK05 Suitability of quantitative and qualitative data for manipulation

vcedata.com slideshow

Quantitative data - usually numeric. It is fact-based, measures of values or counts - e.g. this printer produces 15 pages per minute

e.g.

From the ABS website, 2022-02-28

Quantitative data can be verified for accuracy.

Qualitative data - descriptive, judgemental, non-statistical, unstructured, opinion-based data expressed in words rather than numbers - e.g. "I think this page layout is attractive." It can be in the form of text, video, photos, audio recordings, art, music, body language, facial expressions... anything that can't be measured and given a numeric value.

Arbitrary categories are also qualitative, e.g. fashionable vs unfashionable clothes, social class, conservative vs liberal politician.

Whenever possible and appropriate, prefer quantitative data. It's hard to argue with.

Use qualitative data if you have to, especially regarding human feelings that cannot be measured. e.g. "That logo is cute. The other one is yucky - urghh"

Example:

SEX = quantitative - must be male, female, intersex - according to Australian law. It can be scientifically established.

vs

GENDER = qualitative -  it depends on an individual's emotions or judgement, and cannot be externally proved or disproved.

The Australian Bureau of Statistics website defines qualitative data as "measures of 'types' and may be represented by a name, symbol, or a number code. Qualitative data are data about categorical variables (e.g. what type)."

(But what would they know?)

Note - neither quantitative nor qualitative data is always better or worse than the other. Their value depends on appropriateness to a particular use. They are often used together to fully describe a situation.

e.g. when researching students, it is found that 32% show elevated stress levels using data about their blood pressure, blood cortisol levels etc. That is valuable quantitative data, but it does not explain the reasons for the stress. That qualitative data - which can vary from person to person - can only be gathered through other means such as observation, interviews, surveys etc.

e.g. When buying a printer, quantitative speed measurements are valuable. When judging the layout of a book cover, qualitative data about people's reactions and emotions is more important.

Quiz: which data types are qualitative and which are quantitative?

  • He had 15 years of formal education
  • He is well-educated.
  • He is smart.
  • He went to a good school.
  • He was a popular student.
  • Of his cohort, he got the top exam result (96%).
  • He is tall.
  • He weighs 80kg.

Answers: highlight the next blank line to see the qualitative attributes

HIGHLIGHT FROM HERE

  • He is well-educated.
  • He is smart.
  • He went to a good school.
  • He was a popular student.
  • He is tall.
TO HERE

Quiz 2 (thanks to https://www.g2.com/articles/qualitative-vs-quantitative-data for the idea)

A bookcase is described as follows. Which descriptors are qualitative and which are quantitative?

  • Costs $1500
  • Is made of oak
  • Has golden knobs
  • Is 100cm long
  • Was built in Italy
  • Has a smooth finish
  • Weighs 50kg
  • Has 3 shelves
  • Is deep brown

HIGHLIGHT FROM HERE to see the qualitative descriptors

  • Is made of oak
  • Has golden knobs
  • Was built in Italy
  • Has a smooth finish
  • Is deep brown

TO HERE

DA U3O2 KK06 Characteristics of data types and data structures relevant to selected software tools

vcedata.com slideshow Data Types

vcedata.com slideshow Data Structures - records & files in databases

 

DA U3O2 KK07 Methods for referencing secondary sources, including the APA referencing system

APA referencing

DA U3O2 KK08

Criteria to check the integrity of data, including

  • accuracy,
  • authenticity,
  • correctness,
  • reasonableness,
  • relevance and
  • timeliness

vcedata.com slideshow Data integrity

Consistency (not examinable, but important)

For data to have integrity, it should be consistent. Inconsistency occurs within a single data source or between data sources when conflicting versions of data appear in different places. Data inconsistency is unwelcome because it means that data have become unreliable. If the ‘true’ value cannot be determined, the entire data source loses integrity and becomes tainted. Any information drawn from it becomes untrustworthy.

Examples:

  • a date of birth in one database table does not match an age calculated in another table
  • in one survey question people say they’ve never used drugs, but later answers say they used marijuana more than zero times
  • data stored in one location (e.g. a local database) does not match corresponding data in another location (e.g. a linked website database).


Above: Excel tries to spot inconsistencies and warn the user
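
The first example in the list above (a date of birth that disagrees with a stored age) can be checked automatically. This is only a sketch: the function names and the idea of passing in a reference date are my assumptions, not part of any particular database product.

```python
from datetime import date


def age_on(birth_date, on_date):
    """Return the age in whole years on a given date."""
    years = on_date.year - birth_date.year
    # Take a year off if the birthday has not happened yet in that year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_is_consistent(birth_date, stored_age, on_date):
    """True if the stored age matches the age calculated from the birth date."""
    return age_on(birth_date, on_date) == stored_age


# A made-up record: born 15 June 2005, stored age 17.
dob = date(2005, 6, 15)
print(age_is_consistent(dob, 17, date(2022, 6, 15)))  # on the birthday
print(age_is_consistent(dob, 17, date(2022, 6, 14)))  # the day before
```

A stored age goes stale every birthday, which is one reason databases usually store the date of birth and calculate the age when needed.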

Common cures:

  • normalise to third normal form (3NF) to ensure proper synchronisation of table data, and prevent duplicate data. Any data stored in more than one place immediately invites the risk of inconsistency. If employee addresses are stored in the STAFF database and also the CONTACTS database, they could easily become inconsistent if one table is updated and the other is not.
  • enforce referential integrity in databases to ensure key fields cannot be deleted without also deleting or modifying related fields. If the ‘Sales’ department is deleted from the ‘Departments’ table, everyone working in ‘Sales’ needs to be either deleted or moved to another department, otherwise their ‘Department’ data becomes meaningless.
  • ask the same question again in different ways to check the consistency of answers in surveys and questionnaires. This is important for issues about which people tend to lie or bend the truth to avoid embarrassment or boost their egos. Question 7 may ask “Do you gamble?” and question 37 may ask “Last year, how often did you bet on a horse, buy a lottery ticket, or use poker machines?” People often forget they fibbed in an earlier question and accidentally tell the truth later!
  • Use electronic checking to detect inconsistent answers. e.g. IF (NumberOfChildren = 0 and AgeOfOldestChild > 0) then Consistency Error!
  • Use a single master-copy data source (‘Single Point of Truth’ or ‘SPOT’) instead of multiple data clones that are sluggishly synchronised. If all data is read from a single master copy rather than a copy that is only updated occasionally, the data is guaranteed to be up to date.
  • use consistent data formatting and validation rules to enforce consistent data entry. If people can enter a birthdate in either of two places in a database, use the same data format in each location. Don’t expect “YYYY-MM-DD” on one form and “D MMM YY” on the other form.
  • use an input mask to force a particular data entry format. (e.g. XXXX-XXXX-XXXX-XXXX for credit card numbers)
  • use fuzzy matching to handle trivial differences in data e.g. accept ‘Robert’ if he’s also known as ‘Bob’; treat “St Kilda” the same as “Saint Kilda.”
  • standardise data so they are in a consistent, predictable and comparable format
    • store family names in all capitals so - when doing comparisons and queries - “Smith” is not overlooked as a match for “smith” or “SMITH”
    • strip spaces from phone number data, and the beginnings and ends of all text data.
    • when dates are stored as text, always use the same format

Thinking Time: “YYYY-MM-DD” is the international ISO standard format for dates (ISO 8601). e.g. currentyear-01-02 means 2 January currentyear. Can you suggest why there is an international standard for date formats?

    • use a standardised time setting, such as Greenwich Mean Time, Australian Eastern Summer Time etc.
    • use a consistent year numbering scheme:
      • Common Era (e.g. ‘the moon landing in 1969’)
      • Islamic (‘the moon landing in 1389’)
      • Jewish (‘the moon landing in 5729’)
    • all currency values are stored in the same units (e.g. $AU, $US, £UK, €)
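
Several of the cures in the list above are easy to automate. The sketch below shows three standardisation helpers (capitalised family names, space-free phone numbers, ISO dates) and the NumberOfChildren consistency rule from the list. All the function names are hypothetical, and the date helper assumes you already know which format the incoming text uses.

```python
from datetime import datetime


def standardise_family_name(name):
    """Store family names in capitals, with no spaces at either end."""
    return name.strip().upper()


def standardise_phone(phone):
    """Strip every space from a phone number."""
    return phone.replace(" ", "")


def to_iso_date(text, known_format):
    """Convert a text date in a known format to YYYY-MM-DD."""
    return datetime.strptime(text, known_format).strftime("%Y-%m-%d")


def children_consistent(number_of_children, age_of_oldest_child):
    """The rule from the list: zero children cannot have an oldest child's age."""
    return not (number_of_children == 0 and age_of_oldest_child > 0)


print(standardise_family_name("  smith "))
print(standardise_phone("0412 345 678"))
print(to_iso_date("02/01/2022", "%d/%m/%Y"))
print(children_consistent(0, 5))
```

Once every copy of a value has been through the same standardising function, a plain equality test is enough to compare them.
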
DA U3O2 KK09 Techniques for coding qualitative data to support manipulation

vcedata.com slideshow

 

DA U3O2 KK10

Features of a research question, including a statement identifying the research question as an information problem

Yes! It's a webpage, not a slideshow.

Also, you need to see this, especially if you're at Hume Grammar.... I've been watching you guys.

vcedata.com slideshow Problem Statements

 

DA U3O2 KK11

Functional and non-functional requirements, including

  • data to support the research question,
  • constraints and
  • scope

vcedata.com slideshow

 

DA U3O2 KK12

Types and purposes of

  • infographics and
  • dynamic data visualisations

Infographics

 

DA U3O2 KK13

Design principles that influence

  • the appearance of infographics and
  • the functionality and appearance of dynamic data visualisations

Design Principles

Infographics

 

DA U3O2 KK14

Design tools for representing the appearance and functionality of infographics and dynamic data visualisations, including data manipulation and validation, where appropriate

vcedata.com slideshow Design-Elements-Web

vcedata.com slideshow design-factors

vcedata.com slideshow design-mockups

vcedata.com slideshow Design-Tools-Website

vcedata.com slideshow Design-Tools

Design Principles

Infographics

 

DA U3O2 KK15 Techniques for generating alternative design ideas

vcedata.com slideshow Design Ideas

 

DA U3O2 KK16 Criteria for evaluating alternative design ideas and the efficiency and effectiveness of infographics or dynamic data visualisations

Infographics

 

DA U3O2 KK17

Features of project management using Gantt charts, including

  • the identification and sequencing of tasks,
  • time allocation,
  • dependencies,
  • milestones and
  • the critical path

vcedata.com slideshow Gantt charts

vcedata.com slideshow Gantt and PERT charts (PERT is not examinable in Data Analytics, but really nice to know)

vcedata.com slideshow Project management

 

DA U3O2 KK18

Key legal requirements for the storage and communication of data and information, including

  • human rights requirements,
  • intellectual property and
  • privacy.

vcedata.com slideshow Human Rights and Spam legislation

vcedata.com slideshow Privacy Act

vcedata.com slideshow Copyright Act

 

DA U4O1 KK01

Procedures and techniques for handling and managing files, including

  • archiving,
  • backing up,
  • disposing of files and
  • security

vcedata.com slideshow File management

vcedata.com slideshow Backups

 

DA U4O1 KK02 The functional capabilities of software to create infographics and dynamic data visualisations

Infographics

 

DA U4O1 KK03

Characteristics of information for educating targeted audiences, including

  • age appropriateness,
  • commonality of language,
  • culture inclusiveness and
  • gender

Inclusiveness

 

DA U4O1 KK04

Characteristics of efficient and effective

  • infographics and
  • dynamic data visualisations

Infographics

 

DA U4O1 KK05 Functions, techniques and procedures for efficiently and effectively manipulating data using software tools

Welp! This KK is a three-year course in itself. It includes nearly everything you consider "IT"

So, let's try to break this down into manageable chunks:

Manipulating data = basically what any software does. If software doesn't manipulate data, it's not software.

Data manipulation = using data to create new data (e.g. create a graph from a table of numbers)

Data modification = changing data to a new value and replacing the original data (e.g. replacing Excel cell E3 with a new value)

How can data be manipulated?

Sorted - e.g. by size, name, age

Categorised - e.g. by media type, musical genre

Summarised - statistically: to find averages, trends, maximum & minimum values

Converted - to different forms, e.g. from numeric data to a visual graph, from digital data to sound or video. Converting a spreadsheet into a database. Converting a slideshow into a webpage. Reformatting data.

Compressed - e.g. with JPG / GIF / TIFF (images), MP3 / FLAC (audio), RAR / ZIP (files)

Encrypted - PGP, WEP / WPA (wifi), HTTPS, SSL / TLS (web pages)

Deleted, merged, inserted
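
A few of the manipulations listed above, sketched in Python with nothing but the standard library. The track list is invented sample data. The compression step uses zlib (DEFLATE, the method ZIP files commonly use) as a stand-in for the formats named above. Notice that every step creates new data and leaves the original list untouched, which is the difference between manipulation and modification.

```python
import statistics
import zlib
from collections import defaultdict

# Invented sample data: three music tracks.
tracks = [
    {"title": "Song C", "genre": "rock", "seconds": 200},
    {"title": "Song A", "genre": "jazz", "seconds": 310},
    {"title": "Song B", "genre": "rock", "seconds": 180},
]

# Sorted: a new list ordered by title.
by_title = sorted(tracks, key=lambda track: track["title"])

# Categorised: titles grouped by musical genre.
by_genre = defaultdict(list)
for track in tracks:
    by_genre[track["genre"]].append(track["title"])

# Summarised: average, maximum and minimum track length.
lengths = [track["seconds"] for track in tracks]
summary = {
    "mean": statistics.mean(lengths),
    "max": max(lengths),
    "min": min(lengths),
}

# Compressed: lossless compression of some repetitive text, then restored.
original = ("data " * 200).encode("utf-8")
packed = zlib.compress(original)
restored = zlib.decompress(packed)

print([track["title"] for track in by_title])
print(dict(by_genre))
print(summary)
print(len(original), "bytes compressed to", len(packed), "bytes")
```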


Common data manipulation software tools

Spreadsheet - a flat-file database (though VLOOKUP gives some valuable relational ability)

Relational database - with two or more related tables

Specialist tools - e.g. video editors, web browsers, file managers, text editors. They are very limited in the data they can accept and the manipulation they can do, but they are vital for specific tasks.
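
To make the spreadsheet vs relational database distinction concrete, here is a rough sketch. The first half joins two related tables with Python's built-in sqlite3 module; the second half imitates a spreadsheet exact-match lookup with a dictionary (an analogy for VLOOKUP, not how Excel actually does it). The table names, field names and staff are all invented.

```python
import sqlite3

# Relational database: two tables related through dept_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE Staff (
        staff_id INTEGER PRIMARY KEY,
        staff_name TEXT,
        dept_id INTEGER REFERENCES Department(dept_id)
    );
    INSERT INTO Department VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO Staff VALUES (10, 'Ali', 1), (11, 'Bea', 2), (12, 'Cy', 1);
""")

# The join follows the foreign key to fetch each person's department name.
joined = conn.execute("""
    SELECT Staff.staff_name, Department.dept_name
    FROM Staff JOIN Department ON Staff.dept_id = Department.dept_id
    ORDER BY Staff.staff_name
""").fetchall()
conn.close()

# Spreadsheet-style: a flat 'sheet' plus a separate lookup table,
# with a dictionary lookup standing in for an exact-match VLOOKUP.
departments = {1: "Sales", 2: "IT"}
staff_sheet = [("Ali", 1), ("Bea", 2), ("Cy", 1)]
looked_up = [(name, departments[dept_id]) for name, dept_id in staff_sheet]

print(joined)
print(looked_up)
```

Both approaches produce the same pairs here. The database version stores each department name exactly once and lets the query do the matching.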


SPREADSHEETS

If you only spent an hour in a spreadsheet class, you would have been told about:

- the structure of spreadsheets: sheets, rows, columns, cells, formulas, values

- basic functions - sum, average, maximum/minimum

 


DATABASES

If you only spent three hours in a database class, you would have been told about:

- the structure of databases: tables, records, fields

- relationships between tables

- queries, reports

- basic functions - sum, average, maximum/minimum

 

More to come, eventually

2022-04-25 @ 1:20 PM

 

DA U4O1 KK06

Techniques for creating

  • infographics and
  • dynamic data visualisations

Infographics

 

DA U4O1 KK07 Techniques for validating and verifying data

Data Validation

 

DA U4O1 KK08 Techniques for testing that solutions perform as intended

vcedata.com slideshow Testing

vcedata.com slideshow Test data

vcedata.com slideshow Trace Tables, Desk Checking

 

DA U4O1 KK09

Techniques for recording the progress of projects, including

  • adjustments to tasks and timeframes,
  • annotations and
  • logs

vcedata.com slideshow

 

DA U4O1 KK10

Strategies for

  • evaluating the effectiveness of infographics and dynamic data visualisation solutions and
  • assessing project plans.

Infographics

 

DA U4O2 KK01 Characteristics of wired, wireless and mobile networks

vcedata.com slideshow Networks - wired, wireless, mobile

vcedata.com slideshow Network hardware

 

DA U4O2 KK02 Types and causes of accidental, deliberate and events-based threats to the integrity and security of data and information used by organisations

vcedata.com slideshow Threats to data and information

 

DA U4O2 KK03

Physical and software security controls for

  • preventing unauthorised access to data and information and for
  • minimising the loss of data accessed by authorised and unauthorised users

vcedata.com slideshow Data security

Penetration Testing, White Hat / Ethical hacking

 

DA U4O2 KK04 The role of hardware, software and technical protocols in managing, controlling and securing data in information systems

vcedata.com slideshow

 

DA U4O2 KK05 The advantages and disadvantages of using network attached storage and cloud computing for storing, communicating and disposing of data and information

vcedata.com slideshow Cloud computing

 

DA U4O2 KK06

Characteristics of data that has integrity, including

  • accuracy,
  • authenticity,
  • correctness,
  • reasonableness,
  • relevance and
  • timeliness

vcedata.com slideshow Data integrity

 

DA U4O2 KK07 The importance of data and information to organisations

vcedata.com slideshow Importance of data and information

 

DA U4O2 KK08 The importance of data and information security strategies to organisations

vcedata.com slideshow Data security

 

DA U4O2 KK09 The impact of diminished data integrity in information systems

vcedata.com slideshow Data integrity

 

DA U4O2 KK10

Key legislation that affects how organisations control the collection, storage, communication and disposal of their data and information:

  • the Health Records Act 2001,
  • the Privacy Act 1988 and
  • the Privacy and Data Protection Act 2014

vcedata.com slideshow Privacy Act

 

DA U4O2 KK11 Ethical issues arising from data and information security practices

vcedata.com slideshow Ethical dilemmas

 

DA U4O2 KK12 Strategies for resolving legal and ethical issues between stakeholders arising from information security practices

vcedata.com slideshow Ethical dilemmas

 

DA U4O2 KK13

Reasons to prepare for disaster and the scope of disaster recovery plans, including

  • backing up,
  • evacuation,
  • restoration and
  • test plans

vcedata.com slideshow Data Disaster Recovery Plans

vcedata.com slideshow Data Backups

vcedata.com slideshow Data Security

vcedata.com slideshow Threats to Data Security

 

DA U4O2 KK14 Possible consequences for organisations that fail or violate security measures

vcedata.com slideshow

A summary from the Security-Threats PPT:

  • Loss of valuable data that can't be replaced at all, or only with huge effort and cost
  • Competitors finding out your secrets
  • Damage to or loss of expensive equipment
  • Financial loss through misuse of credit cards or bank accounts
  • Unwitting participation in illegal actions such as spamming or DDOS attacks
  • Damage to reputation through negligently letting customer information go public
  • Penalties by the tax office for not having proper GST or tax records
  • Prosecution under the Privacy Act if sensitive information is not properly protected.
  • Loss of income when unable to do business due to system failure
  • Total failure of the organisation after catastrophic data loss
  • Organisational death.

 

DA U4O2 KK15

Criteria for evaluating the effectiveness of data and information security strategies.

Point 1 - Effectiveness relates to how well the job is done, regardless of how much time, money or effort is expended. Efficiency refers to speed, cost and labour. Everything else is effectiveness.

Point 2 - Criteria are the relevant factors used to measure the value of something. e.g. the criteria of a cat are: fluffiness, purriness, jumpiness. Not how well they guard the house, or lead the blind.

Point 3 - Data: the raw, unprocessed numbers gained from a sensor, survey, census, or other data-gathering system. Data have potential value, but are unorganised, bulky, and their meaning is unclear.

Point 4 - Information: processed from data to create meaningful and understandable knowledge. e.g. A national census may contain the incomes of 15 million people (which is data) but only when those 15 million incomes are added up or averaged will useful information be derived from them.

e.g.:

  • DATA = crude oil. Unprocessed, may be useful.
  • INFORMATION = petrol. Processed. Useful.
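
The census example in Point 4 can be shown in a few lines of Python. The six incomes are invented stand-ins for the 15 million in a real census; the point is just that the raw list is data, and the total and average produced from it are information.

```python
import statistics

# DATA: raw, unprocessed values (six made-up incomes, in dollars).
incomes = [42000, 58000, 61000, 75000, 39000, 250000]

# INFORMATION: the data processed into something meaningful.
total_income = sum(incomes)
mean_income = statistics.mean(incomes)

print("Total income:", total_income)
print("Average income:", mean_income)
```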

So - back to the Key Knowledge.

How does one evaluate how well different methods protect the security of data and information?

We're not worried about what the different methods are. They might involve armed guards, virus scanners, ceiling-mounted laser cannons. It matters not.

What we're worried about is what rules we use to judge the quality of the different methods.

Now it gets easier.

What criteria would you use to evaluate the effectiveness of a CAR? speed, reliability, running costs, safety, seating capacity, cup holders.

What criteria would you use to evaluate the effectiveness of a DOG? woofiness, friendliness, size. OK. I don't understand dogs.

In case you're getting bored, here's a picture of a dog.

You use very different criteria when judging cats and dogs.

And you use different criteria for evaluating the effectiveness of data and information security strategies.

CRITERIA:

- Speed of response to the threat. Nope! That is an efficiency criterion, not effectiveness as the KK asks. Same applies to the cost, or amount of labour required. Be careful!

- Accuracy. How many false negatives (real attacks that were not detected) or false alarms (harmless events that were wrongly judged as being harmful) were recorded?

- Usefulness. If a strategy gave useful information about the source of the threat, the type or nature of the threat, how to counteract the threat etc, it would be a jolly good strategy. It would give quality information.

- Recoverability - yes, it's an ugly word (and not examinable) but a quality data security strategy would offer an easy way to get back lost or damaged data quickly and accurately. You could also call it resilience - the ability to get back to normal after a setback.

- Reliability - can the strategy be trusted? Is it dodgy?

- Comprehensiveness - does the strategy detect all relevant threats? A security scanner that completely ignores trojan horses or rootkits, for example, may leave a system open to exploitation.

- Basically - ANY means of judgement that does NOT involve speed, cost, or labour.
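
The Accuracy criterion above is the easiest one to put numbers on. Here is a small sketch that counts false negatives and false alarms from a log of events. The log is invented, and the two rates are just one reasonable way to summarise it, not an official VCAA measure.

```python
# Each pair is (was_real_attack, was_flagged_by_the_strategy) for one event.
# The eight events are invented for illustration.
events = [
    (True, True),
    (True, False),   # false negative: a real attack that was missed
    (False, False),
    (False, True),   # false alarm: a harmless event wrongly flagged
    (True, True),
    (False, False),
    (False, False),
    (True, True),
]

false_negatives = sum(1 for real, flagged in events if real and not flagged)
false_alarms = sum(1 for real, flagged in events if not real and flagged)
real_attacks = sum(1 for real, _ in events if real)
harmless_events = len(events) - real_attacks

# Share of real attacks that were caught, and share of harmless events flagged.
detection_rate = (real_attacks - false_negatives) / real_attacks
false_alarm_rate = false_alarms / harmless_events

print("False negatives:", false_negatives)
print("False alarms:", false_alarms)
print("Detection rate:", detection_rate)
print("False alarm rate:", false_alarm_rate)
```

A strategy with a high detection rate and a low false alarm rate scores well on accuracy, however fast or cheap it is.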

 

 

 


All original content copyright © vcedata.com
All rights reserved.

This page was created on 2022-01-17
Last modified on Saturday 29 October, 2022 15:23