VCE Data Analytics units 3 and 4 |
KK = Key Knowledge. All KK are examinable (as are all terms in the glossary that are relevant to this course). An icon links to a (big) slideshow; if the icon has a label beside it, it links to a webpage on this site. Check out my (incomplete) postmortem of VCAA's 2020 sample DA exam with lots of exam tips. |
DA U3O1 KK01 |
Techniques for efficient and effective data collection, including methods to collect
[Note: I have edited the KK's ambiguous punctuation to clarify it. Let me know if you read it differently.] General Data collection techniques Geographic Information Systems
|
DA U3O1 KK02 | Factors influencing the integrity of data, including
Timely? Dictionary: 1. Done or happening at the appropriate or proper time. 2. Before a time limit expires
|
DA U3O1 KK03 | Sources of, and methods and techniques for, acquiring authentic data stored in large repositories

"Authentic" data refers to facts that have not been corrupted since their creation. Corruption tends to happen during processing and editing, e.g. when an inappropriate choice of 'average' is made. For example: a country ruled by a dictator has a population of a million people. 99% of them earn nearly no money; 1% are billionaires. The dictator proudly declares that his subjects earn an average (mean) of $100,000 a year. But the mode - the most common value - is nearly zero.

Corruption may be accidental, e.g. valuable data is deleted or mistyped. Corruption may be deliberate, e.g. "cherry-picked" to present only the data that supports the thesis of the publishers - e.g. "The twentieth century was a peaceful time", leaving out any mention of world wars 1 and 2, Korea, Vietnam, Pol Pot, etc.

Authentic data must accurately represent the true state of the real world. And this is NOT easy. Rarely, if ever, will data include 100% of the information from the entire population. Even the compulsory 2021 Australian census was only completed by 96% of the population. Most data collections are samples of the full population, e.g. 10% of the total number of relevant people. Samples may be unrepresentative of the population, e.g. interviewing people walking down the street at 10am on a weekday about health care could artificially select a large number of unemployed or retired people, who could well provide a skewed and unrepresentative result.

Authentic data is more likely to come from a competent, unbiased source that has the resources to gather and process the data fairly. Biased sources (those with a vested interest in the contents of the data) are to be treated with suspicion: e.g. churches, political zealots, idiots, agitators... Basically, that includes everyone.
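The dictator's trick can be checked with a few lines of Python - a sketch using made-up figures, not real census data. The mean and the mode of the same skewed income data tell very different stories:

```python
from statistics import mean, mode

# Hypothetical incomes: 99 citizens earn almost nothing, 1 is a billionaire.
incomes = [100] * 99 + [9_900_000_000]

print(mean(incomes))  # 99000099 - a huge, misleading "average"
print(mode(incomes))  # 100 - the most common income, i.e. what most people actually earn
```

The data hasn't been falsified, but choosing the mean instead of the mode misrepresents the real world - a classic integrity problem.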
Collecting masses of data is very difficult and expensive, and is usually undertaken by organisations that will somehow benefit from presenting the data. Ask yourself: "Here is a lot of data. Who collected it? Why did they collect it? What did they gain?"

But let us ignore the motive and competence of data sources for the moment. Let's assume the data is representative, vast, and honest - for example, the Australian Bureau of Meteorology's (BOM) historical weather statistics. But data is not always free: even the government's BOM charges for large data requests.

Large data requests usually involve an API - application programming interface. It's a way that a data provider can supply data to users safely. It's like buying something from behind the counter at a milk bar. You approach the API interface (the counter), make your request to the shop assistant (the API) in a fashion they can understand, and are provided with the goods you need. The API controls what data (goods) are provided, and the provider protects itself with defences such as a counter, bulletproof glass windows, and logins. Many large data sources, such as Quora, Reddit, Wikipedia and IMDb, offer an API to allow high-volume users to request and receive their data.

The format of the data provided by a large data repository is important to consider. Can you decode it? You might be able to load a Microsoft Excel spreadsheet if you have Excel or a clone (e.g. Libre Office - my favourite). But what if the download is in an unfamiliar or unhelpful format like XML, JSON, RSS, or CSV? You might be able to download it, but can you interpret it? Probably not, if you're a casual user with a spreadsheet on your PC. Large data repositories with APIs are not intended for casual download cowboys. They are for Big Data Grownups who know what JSON is, and can convert from XML to SQL. That's probably not you. It's certainly not me.

And once you have acquired a LARGE data set - what is 'large'?
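To see why format matters, here is a small Python sketch - using made-up weather records, not real BOM data - that parses the same two records from CSV and from JSON with only the standard library. Note that CSV hands you the numbers as text strings, while JSON preserves them as numbers: one reason raw downloads are harder to work with than they look.

```python
import csv
import io
import json

# The same two hypothetical weather records in two common download formats.
as_csv = "station,max_temp\nMelbourne,31.2\nGeelong,29.8\n"
as_json = '[{"station": "Melbourne", "max_temp": 31.2}, {"station": "Geelong", "max_temp": 29.8}]'

rows_from_csv = list(csv.DictReader(io.StringIO(as_csv)))
rows_from_json = json.loads(as_json)

print(rows_from_csv[0]["max_temp"])   # '31.2' - a string: you must convert it yourself
print(rows_from_json[0]["max_temp"])  # 31.2 - already a number
```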
We may be talking not just megabytes, but gigabytes of data. Petabytes, maybe. Your phone can't handle that. Your laptop probably can't. Your 2023 desktop with an Intel Core i7-13700K running at 3.4 GHz with 32 GB of RAM and a 12 TB HDD would struggle to process such quantities of data. Your MS Excel or Libre Office Calc would freeze in shock.

Large data repositories are seriously large. There are data warehouses, gathering data from many sources. There are data lakes. Data marts, with data that is specific to certain users. Data cubes. The list goes on.

So, to summarise... Authentic data is trustworthy and unpolluted by errors or misinformation. It is available from reliable sources that have no reason to mislead their users. It is usually provided through an API that controls the requests for and delivery of stored data. It is often available in huge quantities, in formats that many casual users will not be able to easily process.
|
DA U3O1 KK04 | Methods for referencing primary and secondary sources, including
APA referencing system summary
|
DA U3O1 KK05 | Characteristics of Data types
|
DA U3O1 KK06 | Methods for documenting a problem, need or opportunity
|
DA U3O1 KK07 | Methods for determining Solution requirements, constraints and scope Software solution requirements (SRS)
|
DA U3O1 KK08 | Naming Conventions to support Efficient use of databases, spreadsheets and Data visualisations
|
DA U3O1 KK09 | A methodology for creating a database structure:
Database-normalisation-example Databases-Structure-Datatypes-Naming Examiners love their databases. |
DA U3O1 KK10 | Design tools for representing databases, spreadsheets and Data visualisations, including
|
DA U3O1 KK11 | Design principles that influence the functionality and appearance of
|
DA U3O1 KK12 | Functions and techniques to retrieve required information through querying data sets, including searching, sorting and filtering to identify relationships and patterns
|
DA U3O1 KK13 | Software functions, techniques and procedures to efficiently and effectively
applying Formats and Conventions Formats and conventions (mainly web)
|
DA U3O1 KK14 | Types and purposes of Data visualisations
|
DA U3O1 KK15 | Formats
and Conventions applied to Data visualisations to improve their Effectiveness
for intended users, including clarity of message Formats and conventions (mainly web)
|
DA U3O1 KK16 | Methods and techniques for Testing
|
DA U3O1 KK17 | Reasons why organisations acquire data. The importance of data to organisations
|
DA U3O2 KK01 | Roles, functions and characteristics of digital system components
|
DA U3O2 KK02 | Physical and software security controls used by organisations for protecting stored and communicated data
|
DA U3O2 KK03 | Primary and secondary data sources and methods of collecting data, including
Querying of data repositories - refers to searching datasets to extract useful information from the masses of data. Queries may use SQL or QBE. SQL = Structured Query Language: a standardised format for selecting data from databases, e.g. SELECT * FROM Book WHERE price > 100.00 ORDER BY title; QBE = Query By Example: the user types what they want to find in the field they want to find it in.
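The sample SQL query above can be tried using Python's built-in sqlite3 module. The Book table and its rows here are invented for illustration:

```python
import sqlite3

# An in-memory database with a small invented Book table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?)",
    [("Zebra Atlas", 150.00), ("Ant Guide", 25.00), ("Bird Book", 120.00)],
)

# The query from the KK note: books over $100, sorted by title.
rows = conn.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title"
).fetchall()
print(rows)  # [('Bird Book', 120.0), ('Zebra Atlas', 150.0)]
```

Only the two expensive books are returned, and they come back in title order - the WHERE clause filters, the ORDER BY clause sorts.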
|
DA U3O2 KK04 | Techniques for
data sets
|
DA U3O2 KK05 | Suitability of quantitative and qualitative data for manipulation
Quantitative data - usually numeric. It is fact-based: measures of values or counts, e.g. this printer produces 15 pages per minute; e.g. figures from the ABS website, 2022-02-28. Quantitative data can be verified for accuracy.

Qualitative data - descriptive, judgemental, non-statistical, unstructured, opinion-based data expressed in words rather than numbers, e.g. "I think this page layout is attractive." It can be in the form of text, video, photos, audio recordings, art, music, body language, facial expressions... anything that can't be measured and given a numeric value. Arbitrary categories are also qualitative, e.g. fashionable vs unfashionable clothes, social class, conservative vs liberal politicians.

Whenever possible and appropriate, prefer quantitative data. It's hard to argue with. Use qualitative data if you have to, especially regarding human feelings that cannot be measured, e.g. "That logo is cute. The other one is yucky - urghh."

Example: SEX = quantitative - must be male, female or intersex, according to Australian law. It can be scientifically established. vs GENDER = qualitative - it depends on an individual's emotions or judgement, and cannot be externally proved or disproved.

The Australian Bureau of Statistics website defines qualitative data as "measures of 'types' and may be represented by a name, symbol, or a number code. Qualitative data are data about categorical variables (e.g. what type)." (But what would they know?)

Note - neither quantitative nor qualitative data is always better or worse than the other. Their value depends on appropriateness to a particular use. They are often used together to fully describe a situation, e.g. when researching students, it is found that 32% show elevated stress levels, using data about their blood pressure, blood cortisol levels etc. That is valuable quantitative data, but it does not explain the reasons for the stress. That qualitative data - which can vary from person to person - can only be gathered through other means such as observation, interviews, surveys etc. e.g. When buying a printer, quantitative speed measurements are valuable. When judging the layout of a book cover, qualitative data about people's reactions and emotions is more important.

Quiz: which data types are qualitative and which are quantitative?
Quiz 2 (thanks to https://www.g2.com/articles/qualitative-vs-quantitative-data for the idea) A bookcase is described as follows. Which descriptors are qualitative and which are quantitative?
|
DA U3O2 KK06 | Characteristics of Data types and Data structures relevant to selected software tools
Data Structures - records & files in databases
|
DA U3O2 KK07 | Methods for referencing secondary sources, including the APA referencing system |
DA U3O2 KK08 | Criteria to check the integrity of data, including
Consistency (not examinable, but important). For data to have integrity, it should be consistent. Inconsistency occurs within a single data source, or between data sources, when conflicting versions of data appear in different places. Data inconsistency is unwelcome because it means that the data have become unreliable. If the 'true' value cannot be determined, the entire data source loses integrity and becomes tainted. Any information drawn from it becomes untrustworthy. Examples:
Common cures:
“YYYY-MM-DD” is the international ISO standard format for dates, e.g. a date like 2022-01-02 means 2 January 2022. Can you suggest why there is an international standard for date formats?
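One reason, sketched in Python with made-up dates: ISO YYYY-MM-DD dates sort into correct chronological order even when treated as plain text, because the most significant part (the year) comes first. Day-first strings do not:

```python
# ISO dates sort correctly even as plain text strings.
iso_dates = ["2023-11-05", "2021-03-17", "2023-02-01"]
print(sorted(iso_dates))  # ['2021-03-17', '2023-02-01', '2023-11-05'] - chronological

# DD/MM/YYYY strings sort by day first, which scrambles the chronology:
aus_dates = ["05/11/2023", "17/03/2021", "01/02/2023"]
print(sorted(aus_dates))  # ['01/02/2023', '05/11/2023', '17/03/2021'] - 2021 comes last!
```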
|
DA U3O2 KK09 | Techniques for coding qualitative data to support manipulation
|
DA U3O2 KK10 | Features of a research question, including a statement identifying the research question as an information problem Yes! It's a webpage, not a slideshow. Also, you need to see this, especially if you're at Hume Grammar.... I've been watching you guys.
|
DA U3O2 KK11 | Functional and non-functional requirements, including
|
DA U3O2 KK12 | Types and purposes of
|
DA U3O2 KK13 | Design principles that influence
|
DA U3O2 KK14 | Design tools for representing the appearance and functionality of Infographics and dynamic Data visualisations, including data manipulation and Validation, where appropriate
|
DA U3O2 KK15 | Techniques for generating alternative design ideas
|
DA U3O2 KK16 | Criteria for evaluating alternative design ideas and the Efficiency and Effectiveness
of Infographics or dynamic Data visualisations
|
DA U3O2 KK17 | Features of Project management using Gantt charts, including
Gantt and PERT charts (PERT is not examinable in Data Analytics, but really nice to know)
|
DA U3O2 KK18 | Key Legal requirements for the storage and communication of data and information, including
Human Rights and Spam legislation
|
DA U4O1 KK01 | Procedures and techniques for handling and Managing files, including
|
DA U4O1 KK02 | The functional capabilities of software to create Infographics and dynamic Data visualisations
|
DA U4O1 KK03 | Characteristics of information for educating targeted audiences, including
|
DA U4O1 KK04 | Characteristics of efficient and effective
|
DA U4O1 KK05 | Functions, techniques and procedures for efficiently and effectively manipulating data using software tools
Welp! This KK is a three-year course in itself. It includes nearly everything you consider "IT". So, let's try to break this down into manageable chunks:

Manipulating data = basically what any software does. If software doesn't manipulate data, it's not software.
Data manipulation = using data to create new data (e.g. create a graph from a table of numbers)
Data modification = changing data to a new value and replacing the original data (e.g. replacing Excel cell E3 with a new value)

How can data be manipulated?
- Sorted, e.g. by size, name, age
- Categorised, e.g. by media type, musical genre
- Summarised - statistically: to find averages, trends, maximum & minimum values
- Converted - to different forms, e.g. from numeric data to a visual graph, from digital data to sound or video. Converting a spreadsheet into a database. Converting a slideshow into a webpage. Reformatting data.
- Compressed - e.g. with JPG / GIF / TIFF (images), MP3 / FLAC (audio), RAR / ZIP (files)
- Encrypted - PGP, WEP / WPA (wifi), HTTPS, SSL / TLS (web pages)
- Deleted, merged, inserted

Common data manipulation software tools:
- Spreadsheet - a flat-file database (though VLOOKUP gives some valuable relational ability)
- Relational database - with two or more related tables
- Specialist tools - e.g. video editors, web browsers, file managers, text editors. They are very limited in the data they can accept and the manipulation they can do, but they are vital for specific tasks.

SPREADSHEETS
If you only spent an hour in a spreadsheet class, you would have been told about:
- the structure of spreadsheets: sheets, rows, columns, cells, formulas, values
- basic functions - sum, average, maximum/minimum

DATABASES
If you only spent three hours in a database class, you would have been told about:
- the structure of databases: tables, records, fields
- relationships between tables
- queries, reports
- basic functions - sum, average, maximum/minimum
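The basic manipulations above - sorting, filtering, and summarising with sum, average, maximum and minimum - can be sketched in Python using an invented set of student records (the names and scores are made up):

```python
from statistics import mean

# Hypothetical records, like rows in a spreadsheet or a database table.
students = [
    {"name": "Aiko", "score": 71},
    {"name": "Ben", "score": 88},
    {"name": "Chi", "score": 64},
]

# Sorting: highest score first.
by_score = sorted(students, key=lambda s: s["score"], reverse=True)
print(by_score[0]["name"])  # Ben

# Filtering: keep only students who scored 70 or more.
passed = [s for s in students if s["score"] >= 70]
print(len(passed))  # 2

# Summarising: the "basic functions" of any spreadsheet or database.
scores = [s["score"] for s in students]
print(sum(scores), max(scores), min(scores))  # 223 88 64
print(round(mean(scores), 1))                 # 74.3
```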
More to come, eventually 2022-04-25 @ 1:20 PM
|
DA U4O1 KK06 | Techniques for creating
|
DA U4O1 KK07 | Techniques for Validating and Verifying data
|
DA U4O1 KK08 | Techniques for Testing that Solutions perform as intended
Trace Tables, Desk Checking
|
DA U4O1 KK09 | Techniques for recording the progress of projects, including
|
DA U4O1 KK10 | Strategies for
|
DA U4O2 KK01 | Characteristics of wired, wireless and mobile networks Networks - wired, wireless, mobile
|
DA U4O2 KK02 | Types and causes of accidental, deliberate and events-based threats to the integrity and security of data and information used by organisations Threats to data and information
|
DA U4O2 KK03 | Physical and software security controls for
Penetration Testing, White Hat / Ethical hacking
|
DA U4O2 KK04 | The role of hardware, software and technical protocols in managing, controlling and securing data in Information systems
|
DA U4O2 KK05 | The advantages and disadvantages of using network attached storage and cloud computing for storing, communicating and disposing of data and information
Cloud computing and cloud storage are very different beasts. VCAA seems to be unsure what they mean. |
DA U4O2 KK06 | Characteristics of data that has integrity, including
|
DA U4O2 KK07 | The importance of data and information to organisations Importance of data and information
|
DA U4O2 KK08 | The importance of data and information security strategies to organisations
|
DA U4O2 KK09 | The impact of diminished data integrity in Information systems
|
DA U4O2 KK10 | Key legislation that affects how organisations control the collection, storage, communication and disposal of their data and information:
|
DA U4O2 KK11 | Ethical issues arising from data and information security practices
|
DA U4O2 KK12 | Strategies for resolving legal and ethical issues between stakeholders arising from information security practices
|
DA U4O2 KK13 | Reasons to prepare for disaster and the scope of disaster recovery plans, including
|
DA U4O2 KK14 | Possible consequences for organisations that fail or violate security measures
A summary from the Security-Threats PPT:
|
DA U4O2 KK15 | Criteria for evaluating the effectiveness of data and information security strategies.

Point 1 - Effectiveness relates to how well the job is done, regardless of how much time, money or effort is expended. Efficiency refers to speed, cost and labour. Everything else is effectiveness.
Point 2 - Criteria are the relevant factors used to measure the value of something, e.g. the criteria for a cat are: fluffiness, purriness, jumpiness. Not how well they guard the house, or lead the blind.
Point 3 - Data: the raw, unprocessed numbers gained from a sensor, survey, census, or other data-gathering system. Data have potential value, but are unorganised, bulky, and their meaning is unclear.
Point 4 - Information: is processed from data to create meaningful and understandable knowledge. e.g. A national census may contain the incomes of 15 million people (which is data), but only when those 15 million incomes are added up or averaged will useful information be derived from them. e.g.:
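The census example can be sketched in Python with invented figures: the list of raw incomes is data; the average computed from it is information.

```python
from statistics import mean

# Data: raw, unorganised individual incomes (invented figures, not census data).
incomes = [52_000, 48_500, 61_200, 39_800, 55_000]

# Information: processed into something meaningful and understandable.
average_income = mean(incomes)
print(f"Average income: ${average_income:,.2f}")  # Average income: $51,300.00
```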
So - back to the Key Knowledge. How does one evaluate how well different methods protect the security of data and information? We're not worried about what the different methods are. They might involve armed guards, virus scanners, ceiling-mounted laser cannons. It matters not. What we're worried about is what rules we use to judge the quality of the different methods.

Now it gets easier. What criteria would you use to evaluate the effectiveness of a CAR? Speed, reliability, running costs, safety, seating capacity, cup holders. What criteria would you use to evaluate the effectiveness of a DOG? Woofiness, friendliness, size. OK. I don't understand dogs. You use very different criteria when judging cats and dogs. And you use different criteria again for evaluating the effectiveness of data and information security strategies.

CRITERIA:
- Speed of response to the threat. Nope! That is an efficiency criterion, not effectiveness as the KK asks. The same applies to the cost, or amount of labour required. Be careful!
- Accuracy. How many false negatives (real attacks that were not detected) or false alarms (harmless events that were wrongly judged as being harmful) were recorded?
- Usefulness. If a strategy gave useful information about the source of the threat, the type or nature of the threat, how to counteract the threat etc., it would be a jolly good strategy. It would give quality information.
- Recoverability - yes, it's an ugly word (and not examinable) but a quality data security strategy would offer a quick and easy way to get back lost or damaged data quickly and accurately. You could also call it resilience - the ability to get back to normal after a setback.
- Reliability - can the strategy be trusted? Is it dodgy?
- Comprehensiveness - does the strategy detect all relevant threats? A security scanner that completely ignores trojan horses or rootkits, for example, may leave a system open to exploitation.
- Basically - ANY means of judgement that does NOT involve speed, cost, or labour.
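The Accuracy criterion can be sketched in Python: given an invented log of events, count the false negatives (real attacks the scanner missed) and the false alarms (harmless events it wrongly flagged).

```python
# Hypothetical scanner log: each event is (was_really_an_attack, scanner_flagged_it).
events = [
    (True, True),    # attack, detected - correct
    (True, False),   # attack, missed  -> false negative
    (False, True),   # harmless, flagged -> false alarm (false positive)
    (False, False),  # harmless, ignored - correct
    (True, True),    # attack, detected - correct
]

false_negatives = sum(1 for real, flagged in events if real and not flagged)
false_alarms = sum(1 for real, flagged in events if flagged and not real)
print(false_negatives, false_alarms)  # 1 1
```

The fewer of each, the more effective the strategy - and note that neither count says anything about speed, cost, or labour.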
|
All original content copyright ©
vcedata.com This page was created on 2022-01-17 |