DA U3O1 KK01 |
Techniques for efficient and effective data collection, including methods to collect
[Note: I have edited the KK's punctuation to clarify its ambiguity. Let me know if you read it differently.]
General Data collection techniques
Geographic Information Systems
|
DA U3O1 KK02 |
Factors influencing the integrity of data, including
- accuracy,
- authenticity,
- correctness,
- reasonableness,
- relevance and
- timeliness
Data integrity
Timely?
Dictionary: 1. Done or happening at the appropriate or proper time. 2. Before a time limit expires
|
DA U3O1 KK03 |
Sources of, and methods and techniques for, acquiring authentic data stored in large repositories
|
DA U3O1 KK04 |
Methods for referencing primary and secondary sources, including
- American Psychological Association (APA) referencing system
Primary and secondary data
APA referencing system summary
|
DA U3O1 KK05 |
Characteristics of data types
Data types
|
DA U3O1 KK06 |
Methods for documenting a problem, need or opportunity
Problem statements
|
DA U3O1 KK07 |
Methods for determining solution requirements, constraints and scope
Software solution requirements (SRS)
|
DA U3O1 KK08 |
Naming conventions to support efficient use of databases, spreadsheets and data visualisations
Naming conventions
Database structure naming
File Naming |
DA U3O1 KK09 |
A methodology for creating a database structure:
- identifying entities,
- defining tables and fields to represent entities;
- defining relationships by identifying primary key fields and foreign key fields;
- defining data types and field sizes,
- normalisation to third normal form
|
DA U3O1 KK10 |
Design tools for representing databases, spreadsheets and data visualisations, including
- data dictionaries,
- tables,
- charts,
- input forms,
- queries and
- reports
Database design tools
Entity relationship diagram
Data dictionary
Mockups
Design Principles |
DA U3O1 KK11 |
Design principles that influence the functionality and appearance of
- databases,
- spreadsheets and
- data visualisations
Design Principles
Design Factors
Design Elements |
DA U3O1 KK12 |
Functions and techniques to retrieve required information through querying data sets, including searching, sorting and filtering to identify relationships and patterns
|
DA U3O1 KK13 |
Software functions, techniques and procedures to efficiently and effectively
- validate,
- manipulate and
- cleanse data including files, and
applying formats and conventions
Data Validation
Formats and conventions (mainly web) |
DA U3O1 KK14 |
Types and purposes of data visualisations
Data visualisations
|
DA U3O1 KK15 |
Formats and conventions applied to data visualisations to improve their effectiveness for intended users, including clarity of message
Formats and conventions (mainly web) |
DA U3O1 KK16 |
Methods and techniques for testing
- databases,
- spreadsheets and
- data visualisations
Testing
User Acceptance Testing
Data visualisations |
DA U3O1 KK17 |
Reasons why organisations acquire data
The importance of data to organisations
|
DA U3O2 KK01 |
Roles, functions and characteristics of digital system components
Hardware & software
Hardware |
DA U3O2 KK02 |
Physical and software security controls used by organisations for protecting stored and communicated data
Data Security |
DA U3O2 KK03 |
Primary and secondary data sources and methods of collecting data, including
- interviews,
- observation,
- querying of data stored in large repositories and
- surveys
Primary and secondary data
Data Acquisition
Querying of data repositories
Refers to searching datasets to extract useful information from the masses of data.
May query primary data (original data collected personally by the researchers) or secondary data (data collected by other people, e.g. government agencies like the CSIRO)
Many providers of secondary data provide an API (application programming interface) to allow outsiders controlled access to their datasets.
E.g. the Australian Bureau of Meteorology lets developers create apps that extract weather forecast data from its databases.
Queries may use SQL or QBE
SQL = Structured Query Language. A standardised format for selecting data from databases. E.g.
SELECT * FROM Book WHERE price > 100.00 ORDER BY title;
QBE = Query By Example. The user types what they want to find into the field where they expect to find it.
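The SELECT statement above can be tried with Python's built-in sqlite3 module. This is a minimal sketch: the Book table and its sample rows are invented for illustration.

```python
import sqlite3

# In-memory database with a hypothetical Book table for illustration
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Book (title TEXT, price REAL)")
con.executemany(
    "INSERT INTO Book VALUES (?, ?)",
    [("Zen of SQL", 120.00), ("Cheap Thrills", 15.00), ("Atlas of Data", 150.00)],
)

# The query from the example: books over $100, ordered by title
rows = con.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title"
).fetchall()
print(rows)  # [('Atlas of Data', 150.0), ('Zen of SQL', 120.0)]
```

Only the two books priced over $100 are returned, sorted alphabetically by title.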

|
DA U3O2 KK04 |
Techniques for
- searching,
- browsing and
- downloading
data sets
|
DA U3O2 KK05 |
Suitability of quantitative and qualitative data for manipulation
Quantitative data - usually numeric. It is fact-based, measures of values or counts - e.g. this printer produces 15 pages per minute
e.g. [table of quantitative data from the ABS website, 2022-02-28]
Quantitative data can be verified for accuracy.
Qualitative data - descriptive, judgemental, non-statistical, unstructured, opinion-based data expressed in words rather than numbers - e.g. "I think this page layout is attractive." It can be in the form of text, video, photos, audio recordings, art, music, body language, facial expressions... anything that can't be measured and given a numeric value.
Arbitrary categories are also qualitative, e.g. fashionable vs unfashionable clothes, social class, conservative vs liberal politician.
Whenever possible and appropriate, prefer quantitative data. It's hard to argue with.
Use qualitative data if you have to, especially regarding human feelings that cannot be measured. e.g. "That logo is cute. The other one is yucky - urghh"
Example:
SEX = quantitative - must be male, female, intersex - according to Australian law. It can be scientifically established.
vs
GENDER = qualitative - it depends on an individual's emotions or judgement, and cannot be externally proved or disproved.
The Australian Bureau of Statistics website defines qualitative data as "measures of 'types' and may be represented by a name, symbol, or a number code. Qualitative data are data about categorical variables (e.g. what type)."
(But what would they know?)
Note - neither quantitative nor qualitative data is always better or worse than the other. Their value depends on appropriateness to a particular use. They are often used together to fully describe a situation.
e.g. when researching students, data about their blood pressure, blood cortisol levels etc. may show that 32% have elevated stress levels. That is valuable quantitative data, but it does not explain the reasons for the stress. Those reasons - which can vary from person to person - are qualitative data that can only be gathered through other means such as observation, interviews, surveys etc.
e.g. When buying a printer, quantitative speed measurements are valuable. When judging the layout of a book cover, qualitative data about people's reactions and emotions is more important.
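One practical consequence of the distinction: quantitative data can be summarised arithmetically, while qualitative data can only be tallied by category. A minimal sketch in Python (the printer speeds and survey opinions are invented for illustration):

```python
# Quantitative responses (printer pages per minute) can be averaged
speeds = [15, 14, 16, 15]
average_speed = sum(speeds) / len(speeds)
print(average_speed)  # 15.0

# Qualitative responses can only be counted by category
opinions = ["attractive", "cluttered", "attractive", "attractive"]
tally = {}
for word in opinions:
    tally[word] = tally.get(word, 0) + 1
print(tally)  # {'attractive': 3, 'cluttered': 1}
```

You can average page-per-minute figures; you cannot average "attractive" and "cluttered" - only count how often each opinion appears.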
Quiz: which data types are qualitative and which are quantitative?
- He had 15 years of formal education
- He is well-educated.
- He is smart.
- He went to a good school.
- He was a popular student.
- Of his cohort, he got the top exam result (96%).
- He is tall.
- He weighs 80kg.
Answers: highlight the next blank line to see the qualitative attributes
HIGHLIGHT FROM HERE
- He is well-educated.
- He is smart.
- He went to a good school.
- He was a popular student.
- He is tall.
TO HERE
Quiz 2 (thanks to https://www.g2.com/articles/qualitative-vs-quantitative-data for the idea)
A bookcase is described as follows. Which descriptors are qualitative and which are quantitative?
- Costs $1500
- Is made of oak
- Has golden knobs
- Is 100cm long
- Was built in Italy
- Has a smooth finish
- Weighs 50kg
- Has 3 shelves
- Is deep brown
HIGHLIGHT FROM HERE to see the qualitative descriptors
- Is made of oak
- Has golden knobs
- Was built in Italy
- Has a smooth finish
- Is deep brown
TO HERE |
DA U3O2 KK06 |
Characteristics of data types and data structures relevant to selected software tools
Data Types
Data Structures - records & files in databases |
DA U3O2 KK07 |
Methods for referencing secondary sources, including the APA referencing system
APA referencing
|
DA U3O2 KK08 |
Criteria to check the integrity of data, including
- accuracy,
- authenticity,
- correctness,
- reasonableness,
- relevance and
- timeliness
Data integrity
Consistency (not examinable, but important)
For data to have integrity, it should be consistent. Inconsistency occurs within a single data source or between data sources when conflicting versions of data appear in different places. Data inconsistency is unwelcome because it means that data have become unreliable. If the ‘true’ value cannot be determined, the entire data source loses integrity and becomes tainted. Any information drawn from it becomes untrustworthy.
Examples:
- a date of birth in one database table does not match an age calculated in another table
- in one survey question people say they’ve never used drugs, but later answers say they used marijuana more than zero times
- data stored in one location (e.g. a local database) does not match corresponding data in another location (e.g. a linked website database).
[Image: Excel tries to spot inconsistencies and warn the user]
Common cures:
- normalise to third normal form (3NF) to ensure proper synchronisation of table data, and prevent duplicate data. Any data stored in more than one place immediately invites the risk of inconsistency. If employee addresses are stored in the STAFF database and also the CONTACTS database, they could easily become inconsistent if one table is updated and the other is not.
- enforce referential integrity in databases to ensure key fields cannot be deleted without also deleting or modifying related fields. If the ‘Sales’ department is deleted from the ‘Departments’ table, everyone working in ‘Sales’ needs to be either deleted or moved to another department, otherwise their ‘Department’ data becomes meaningless.
- ask the same question again in different ways to check the consistency of answers in surveys and questionnaires. This is important for issues about which people tend to lie or bend the truth to avoid embarrassment or boost their egos. Question 7 may ask “Do you gamble?” and question 37 may ask “Last year, how often did you bet on a horse, buy a lottery ticket, or use poker machines?” People often forget they fibbed in an earlier question and accidentally tell the truth later!
- Use electronic checking to detect inconsistent answers. e.g. IF (NumberOfChildren = 0 and AgeOfOldestChild > 0) then Consistency Error!
- Use a single master-copy data source (‘Single Point of Truth’ or ‘SPOT’) instead of multiple data clones that are sluggishly synchronised. If all data is read from a single master copy rather than from copies that are only updated occasionally, the data is guaranteed to be up to date.
- use consistent data formatting and validation rules to enforce consistent data entry. If people can enter a birthdate in either of two places in a database, use the same data format in each location. Don’t expect “YYYY-MM-DD” on one form and “D MMM YY” on the other form.
- use an input mask to force a particular data entry format. (e.g. XXXX-XXXX-XXXX-XXXX for credit card numbers)
- use fuzzy matching to handle trivial differences in data, e.g. accept ‘Robert’ if he’s also known as ‘Bob’; treat “St Kilda” the same as “Saint Kilda.”
- standardise data so they are in a consistent, predictable and comparable format
- store family names in all capitals so - when doing comparisons and queries - “Smith” is not overlooked as a match for “smith” or “SMITH”
- strip spaces from phone number data, and the beginnings and ends of all text data.
- when dates are stored as text, they always use the same format
“YYYY-MM-DD” is the international ISO standard format for dates, e.g. 2022-01-02 means 2 January 2022. Can you suggest why there is an international standard for date formats?
- use a standardised time setting, such as Greenwich Mean Time, Australian Eastern Summer Time etc.
- use a consistent year numbering scheme:
- Common Era (e.g. ‘the moon landing in 1969’)
- Islamic (‘the moon landing in 1389’)
- Jewish (‘the moon landing in 5729’)
- all currency values are stored in the same units (e.g. $AU, $US, £UK, €)
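Several of the cures above - electronic checking, input masks, and standardisation - can be sketched in a few lines of Python. The field names and the credit-card pattern are illustrative assumptions, not taken from any real system.

```python
import re

# Electronic checking: flag inconsistent answers
def consistency_error(number_of_children: int, age_of_oldest_child: int) -> bool:
    # A childless respondent cannot have a child with an age greater than 0
    return number_of_children == 0 and age_of_oldest_child > 0

# Input mask: force the XXXX-XXXX-XXXX-XXXX format for credit card numbers
CARD_MASK = re.compile(r"\d{4}-\d{4}-\d{4}-\d{4}")

def valid_card_format(text: str) -> bool:
    return CARD_MASK.fullmatch(text) is not None

# Standardisation: strip surrounding spaces, store family names in capitals
def standardise_name(name: str) -> str:
    return name.strip().upper()

print(consistency_error(0, 7))                   # True - inconsistent data
print(valid_card_format("1234-5678-9012-3456"))  # True
print(standardise_name("  smith "))              # SMITH
```

With these checks in place, "smith", "Smith" and " SMITH " all compare equal, and contradictory survey answers are flagged automatically at entry time.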
|
DA U3O2 KK09 |
Techniques for coding qualitative data to support manipulation
|
DA U3O2 KK10 |
Features of a research question, including a statement identifying the research question as an information problem
Problem Statements
|
DA U3O2 KK11 |
Functional and non-functional requirements, including
- data to support the research question,
- constraints and
- scope
|
DA U3O2 KK12 |
Types and purposes of
- infographics and
- dynamic data visualisations
Infographics
|
DA U3O2 KK13 |
Design principles that influence
- the appearance of infographics and
- the functionality and appearance of dynamic data visualisations
Design Principles
Infographics |
DA U3O2 KK14 |
Design tools for representing the appearance and functionality of infographics and dynamic data visualisations, including data manipulation and validation, where appropriate
Design-Elements-Web
design-factors
design-mockups
Design-Tools-Website
Design-Tools
Design Principles
Infographics
|
DA U3O2 KK15 |
Techniques for generating alternative design ideas
Design Ideas
|
DA U3O2 KK16 |
Criteria for evaluating alternative design ideas and the efficiency and effectiveness of infographics or dynamic data visualisations
Infographics
|
DA U3O2 KK17 |
Features of Project management using Gantt charts, including
- the identification and sequencing of tasks,
- time allocation,
- dependencies,
- milestones and
- the critical path
Gantt charts
Gantt and PERT charts (PERT is not examinable in Data Analytics, but really nice to know)
Project management |
DA U3O2 KK18 |
Key legal requirements for the storage and communication of data and information, including
- human rights requirements,
- intellectual property and
- privacy.
Human Rights and Spam legislation
Privacy Act
Copyright Act |
DA U4O1 KK01 |
Procedures and techniques for handling and managing files, including
- archiving,
- backing up,
- disposing of files and
- security
File management
Backups |
DA U4O1 KK02 |
The functional capabilities of software to create infographics and dynamic data visualisations
Infographics |
DA U4O1 KK03 |
Characteristics of information for educating targeted audiences, including
- age appropriateness,
- commonality of language,
- culture inclusiveness and
- gender
Inclusiveness
|
DA U4O1 KK04 |
Characteristics of efficient and effective
- infographics and
- dynamic data visualisations
Infographics
|
DA U4O1 KK05 |
Functions, techniques and procedures for efficiently and effectively manipulating data using software tools
Welp! This KK is a three-year course in itself. It includes nearly everything you consider "IT"
So, let's try to break this down into manageable chunks:
Manipulating data = basically what any software does. If software doesn't manipulate data, it's not software.
Data manipulation = using data to create new data (e.g. create a graph from a table of numbers)
Data modification = changing data to a new value and replacing the original data (e.g. replacing Excel cell E3 with a new value)
How can data be manipulated?
Sorted, e.g. by size, name, age,
Categorised - e.g. by media type, musical genre
Summarised - statistically: to find averages, trends, maximum & minimum values
Converted - to different forms, e.g. from numeric data to a visual graph, from digital data to sound or video. Converting a spreadsheet into a database. Converting a slideshow into a webpage. Reformatting data.
Compressed - e.g. with JPG / GIF / TIFF (images), MP3 / FLAC (audio), RAR / ZIP (files)
Encrypted - PGP, WEP / WPA (wifi), HTTPS, SSL / TLS (web pages)
Deleted, merged, inserted
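The first three manipulations above - sorting, categorising and summarising - can be sketched in plain Python. The song data is invented for illustration.

```python
# Hypothetical data: (title, genre, duration in seconds)
songs = [
    ("Waterfall", "rock", 215),
    ("Moonlight", "classical", 540),
    ("Drizzle", "rock", 180),
]

# Sorted - e.g. by duration
by_length = sorted(songs, key=lambda song: song[2])

# Categorised - e.g. by musical genre
by_genre = {}
for title, genre, secs in songs:
    by_genre.setdefault(genre, []).append(title)

# Summarised - statistically, e.g. the average duration
durations = [song[2] for song in songs]
average = sum(durations) / len(durations)

print(by_length[0][0])    # Drizzle (the shortest song)
print(by_genre["rock"])   # ['Waterfall', 'Drizzle']
print(round(average, 1))  # 311.7
```

The same raw data yields three new results - an ordering, a grouping and a statistic - without the original data being modified, which is the essence of manipulation as opposed to modification.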
Common data manipulation software tools
Spreadsheet - a flat-file database (though VLOOKUP gives some valuable relational ability)
Relational database - with two or more related tables
Specialist tools - e.g. video editors, web browsers, file managers, text editors. They are very limited in the data they can accept and the manipulation they can do, but they are vital for specific tasks.
SPREADSHEETS
If you only spent an hour in a spreadsheet class, you would have been told about:
- the structure of spreadsheets: sheets, rows, columns, cells, formulas, values
- basic functions - sum, average, maximum/minimum
DATABASES
If you only spent three hours in a database class, you would have been told about:
- the structure of databases: tables, records, fields
- relationships between tables
- queries, reports
- basic functions - sum, average, maximum/minimum
More to come, eventually
|
DA U4O1 KK06 |
Techniques for creating
- infographics and
- dynamic data visualisations
Infographics |
DA U4O1 KK07 |
Techniques for validating and verifying data
Data Validation
|
DA U4O1 KK08 |
Techniques for testing that solutions perform as intended
Testing
Test data
Trace Tables, Desk Checking
|
DA U4O1 KK09 |
Techniques for recording the progress of projects, including
- adjustments to tasks and timeframes,
- annotations and
- logs
|
DA U4O1 KK10 |
Strategies for
- evaluating the effectiveness of infographic and dynamic data visualisation solutions and
- assessing project plans.
Infographics |
DA U4O2 KK01 |
Characteristics of wired, wireless and mobile networks
Networks - wired, wireless, mobile
Network hardware |
DA U4O2 KK02 |
Types and causes of accidental, deliberate and events-based threats to the integrity and security of data and information used by organisations
Threats to data and information
|
DA U4O2 KK03 |
Physical and software security controls for
- preventing unauthorised access to data and information and for
- minimising the loss of data accessed by authorised and unauthorised users
Data security
Penetration Testing, White Hat / Ethical hacking |
DA U4O2 KK04 |
The role of hardware, software and technical protocols in managing, controlling and securing data in information systems
|
DA U4O2 KK05 |
The advantages and disadvantages of using network attached storage and cloud computing for storing, communicating and disposing of data and information
Cloud computing |
DA U4O2 KK06 |
Characteristics of data that has integrity, including
- accuracy,
- authenticity,
- correctness,
- reasonableness,
- relevance and
- timeliness
Data integrity
|
DA U4O2 KK07 |
The importance of data and information to organisations
Importance of data and information
|
DA U4O2 KK08 |
The importance of data and information security strategies to organisations
Data security
|
DA U4O2 KK09 |
The impact of diminished data integrity in information systems
Data integrity
|
DA U4O2 KK10 |
Key legislation that affects how organisations control the collection, storage, communication and disposal of their data and information:
- the Health Records Act 2001,
- the Privacy Act 1988 and
- the Privacy and Data Protection Act 2014
Privacy Act
|
DA U4O2 KK11 |
Ethical issues arising from data and information security practices
Ethical dilemmas
|
DA U4O2 KK12 |
Strategies for resolving legal and ethical issues between stakeholders arising from information security practices
Ethical dilemmas |
DA U4O2 KK13 |
Reasons to prepare for disaster and the scope of disaster recovery plans, including
- backing up,
- evacuation,
- restoration and
- test plans
Data Disaster Recovery Plans
Data Backups
Data Security
Threats to Data Security |
DA U4O2 KK14 |
Possible consequences for organisations that fail or violate security measures
A summary from the Security-Threats PPT:
- Loss of valuable data that can't be replaced at all, or only with huge effort and cost
- Competitors finding out your secrets
- Damage to or loss of expensive equipment
- Financial loss through misuse of credit cards or bank accounts
- Unwitting participation in illegal actions such as spamming or DDOS attacks
- Damage to reputation through negligently letting customer information go public
- Penalties by the tax office for not having proper GST or tax records
- Prosecution under the Privacy Act if sensitive information is not properly protected.
- Loss of income when unable to do business due to system failure
- Total failure of the organisation (organisational death) after catastrophic data loss.
|
DA U4O2 KK15 |
Criteria for evaluating the effectiveness
of data and information security strategies.
Point 1 - Effectiveness relates to how well the job is done, regardless of how much time, money or effort is expended. Efficiency refers to speed, cost and labour. Everything else is effectiveness.
Point 2 - Criteria are the relevant factors used to measure the value of something. e.g. the criteria for judging a cat are: fluffiness, purriness, jumpiness - not how well it guards the house or leads the blind.
Point 3 - Data: the raw, unprocessed numbers gained from a sensor, survey, census, or other data-gathering system. Data have potential value, but are unorganised, bulky, and their meaning is unclear.
Point 4 - Information: processed from data to create meaningful and understandable knowledge. e.g. A national census may contain the incomes of 15 million people (which is data), but only when those 15 million incomes are added up or averaged will useful information be derived from them.
e.g.:
- DATA = crude oil. Unprocessed, may be useful.
- INFORMATION = petrol. Processed. Useful.
So - back to the Key Knowledge.
How does one evaluate how well different methods protect the security of data and information?
We're not worried about what the different methods are. They might involve armed guards, virus scanners, ceiling-mounted laser cannons. It matters not.
What we're worried about is what rules we use to judge the quality of the different methods.
Now it gets easier.
What criteria would you use to evaluate the effectiveness of a CAR? Speed, reliability, running costs, safety, seating capacity, cup holders.
What criteria would you use to evaluate the effectiveness of a DOG? Woofiness, friendliness, size. OK. I don't understand dogs.
In case you're getting bored, here's a picture of a dog. [image of a dog]
You use very different criteria when judging cats and dogs.
And you use different criteria for evaluating the effectiveness of data and information security strategies.
CRITERIA:
- Speed of response to the threat. Nope! That is an efficiency criterion, not effectiveness as the KK asks. Same applies to the cost, or amount of labour required. Be careful!
- Accuracy. How many false negatives (real attacks that were not detected) or false alarms (harmless events that were wrongly judged as being harmful) were recorded?
- Usefulness. If a strategy gave useful information about the source of the threat, the type or nature of the threat, how to counteract the threat etc, it would be a jolly good strategy. It would give quality information.
- Recoverability - yes, it's an ugly word (and not examinable) but a quality data security strategy would offer a quick, easy and accurate way to get back lost or damaged data. You could also call it resilience - the ability to get back to normal after a setback.
- Reliability - can the strategy be trusted? Is it dodgy?
- Comprehensiveness - does the strategy detect all relevant threats? A security scanner that completely ignores trojan horses or rootkits, for example, may leave a system open to exploitation.
- Basically - ANY means of judgement that does NOT involve speed, cost, or labour.
|