**Data Mining MCQs with Answers**

Here are the 60 most important data mining and warehousing MCQs that can be asked in your BE Computer Engineering online examination. All the 1-mark DMW MCQs that SPPU may ask are listed here.

**In the data mining and warehousing MCQs given below, the bold option is the correct answer.**

**What is the method to interpret the results after rule generation?**

A : Absolute Mean

B : **Lift ratio**

C : Gini Index

D : Apriori
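Lift divides a rule's confidence by the support of its consequent. A value above 1 means the items occur together more often than independence would predict, below 1 means less often, and exactly 1 means they are independent. The following is a minimal Python sketch, assuming transactions are stored as sets of item names; the helper names and the four sample baskets are hypothetical, for illustration only.

```python
from typing import List, Set

# Hypothetical helpers: transactions are modelled as a list of Python sets.


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[Set[str]]) -> float:
    """confidence(A => B) = support(A ∪ B) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def lift(antecedent: Set[str], consequent: Set[str],
         transactions: List[Set[str]]) -> float:
    """lift(A => B) = confidence(A => B) / support(B)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Made-up sample baskets, not data from the quiz.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
]

print(round(lift({"milk"}, {"bread"}, baskets), 3))  # 0.889
```

Here milk appears in 3 of 4 baskets, bread in 3 of 4, and both together in 2 of 4, so lift = (0.5 / 0.75) / 0.75 ≈ 0.889. Because it is below 1, the rule milk ⇒ bread is slightly negatively correlated in this toy data.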

**OLAP database design is**

A : Application-oriented

B : Object-oriented

C : Goal-oriented

D : **Subject-oriented**

**Multilevel association rules can be mined efficiently using**

A : Support

B : Confidence

C : Support count

D : **Concept Hierarchies under support-confidence framework**

**Accuracy is used to measure**

A : classifier’s true abilities

B : classifier’s analytic abilities

C : classifier’s decision abilities

D : **classifier’s predictive abilities**

**Supervised learning and unsupervised clustering both require at least one**

A : hidden attribute

B : output attribute

C : **input attribute**

D : categorical attribute

**The task of building a decision model from labeled training data is called**

A : **Supervised Learning**

B : Unsupervised Learning

C : Reinforcement Learning

D : Structure Learning

**What is the range of the cosine similarity of the two documents?**

A : **Zero to One**

B : Zero to infinity

C : Infinity to infinity

D : Zero to Zero
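Cosine similarity is the dot product of two term-frequency vectors divided by the product of their lengths. Because term frequencies are never negative, the result for such vectors lands between 0 and 1. A minimal sketch in plain Python; `cosine_similarity` and the two ten-term count vectors are hypothetical illustrations.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """cos(x, y) = (x . y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical term counts for two documents over the same 10-word vocabulary.
doc1 = [5, 0, 3, 0, 2, 0, 0, 2, 0, 0]
doc2 = [3, 0, 2, 0, 1, 1, 0, 1, 0, 1]
print(round(cosine_similarity(doc1, doc2), 2))  # 0.94
```

For these vectors the dot product is 25 and the lengths are √42 and √17, giving about 0.94, so the two documents are very similar.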

**Multi-class classification makes the assumption that each sample is assigned to**

A : **one and only one label**

B : many labels

C : one or many labels

D : no label

**Which of these is not a frequent pattern mining algorithm?**

A : **Decision trees**

B : Eclat

C : FP growth

D : Apriori

**What is the first step involved in knowledge discovery?**

A : Data Integration

B : Data Selection

C : Data Transformation

D : **Data Cleaning**

**The distance between two points calculated using Pythagoras theorem is**

A : Supremum distance

B : **Euclidean distance**

C : Linear distance

D : Manhattan Distance
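Euclidean distance is the Pythagorean length of the difference vector; Manhattan (city block) distance sums the absolute differences, and the supremum (Chebyshev) distance keeps only the largest one. All three are Minkowski distances, with h = 2, h = 1, and h → ∞ respectively. A minimal sketch; the function names and the two points are hypothetical.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Square root of the summed squared differences (Minkowski distance, h = 2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute differences (Minkowski distance, h = 1)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def supremum(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest absolute difference over all attributes (Minkowski distance, h -> infinity)."""
    return max(abs(a - b) for a, b in zip(x, y))


p = (1, 2)
q = (3, 5)
print(euclidean(p, q))  # sqrt(2**2 + 3**2) = sqrt(13), about 3.606
print(manhattan(p, q))  # 2 + 3 = 5
print(supremum(p, q))   # max(2, 3) = 3
```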

**What do you mean by dissimilarity measure of two objects?**

A : Is a numerical measure of how alike two data objects are.

B : **Is a numerical measure of how different two data objects are.**

C : Higher when objects are more alike

D : Lower when objects are more different

**An ROC curve for a given model shows the trade-off between**

A : random sampling

B : test data and train data

C : cross validation

D : **the true positive rate (TPR) and the false positive rate (FPR)**
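Each point on an ROC curve is the (FPR, TPR) pair obtained by treating every tuple whose score reaches a threshold as positive, where TPR = TP / P and FPR = FP / N. A minimal sketch that sweeps the distinct scores as thresholds; the function name, the scores, and the labels are hypothetical, and the quadratic loop favours clarity over speed.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, starting at (0, 0), one per distinct threshold.

    labels: 1 for an actual positive tuple, 0 for an actual negative one.
    A tuple is predicted positive when its score >= the threshold.
    """
    p = sum(labels)          # number of actual positives
    n = len(labels) - p      # number of actual negatives
    if p == 0 or n == 0:
        raise ValueError("need at least one positive and one negative tuple")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n, tp / p))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Lowering the threshold can only raise TPR, but it raises FPR as well; the curve plots exactly that trade-off.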

**Each dimension is represented by only one table. Recognize the type of schema.**

A : **Star Schema**

B : Snowflake schema

C : Fact constellation

D : Database schema

**Choose the correct concept hierarchy.**

A : city < street < state < country

B : **street < city < state < country**

C : street > city > state > country

D : street > city > country > state

**Height is an example of which type of attribute?**

A : Nominal

B : Binary

C : Ordinal

D : **Numeric**

**Which angle is used to measure document similarity?**

A : Sin

B : Tan

C : **Cos**

D : Sec

**Which of the following is a data mining tool?**

A : Borland C

B : **Weka**

C : Borland C++

D : Visual C

**A decision tree is also known as**

A : general tree

B : binary tree

C : **prediction tree**

D : None of the options

**Recall is a measure of**

A : **completeness (what percentage of positive tuples are labeled as such)**

B : a measure of exactness for misclassification

C : a measure of exactness of what percentage of tuples are not classified

D : a measure of exactness of what percentage of tuples labeled as negative are actually negative

**What approach does the basic algorithm for decision tree induction follow?**

A : **Greedy**

B : Top Down

C : Procedural

D : Step by Step

**Rules are considered interesting if**

A : **They satisfy both minimum support and minimum confidence threshold**

B : They satisfy both maximum support and maximum confidence threshold

C : They satisfy maximum support and minimum confidence threshold

D : They satisfy minimum support and maximum confidence threshold

**For mining frequent itemsets, the data format used by the Apriori and FP-Growth algorithms is**

A : Apriori uses horizontal and FP-Growth uses vertical data format

B : Apriori uses vertical and FP-Growth uses horizontal data format

C : Apriori and FP-Growth both use vertical data format

D : **Apriori and FP-Growth both use horizontal data format**

**Which of the following sequences is used to calculate proximity measures for ordinal attributes?**

A : Replacement discretization and distance measure

B : Replacement characterization and distance measure

C : Normalization discretization and distance measure

D : **Replacement normalization and distance measure**
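For an ordinal attribute with M ordered states, each value is replaced by its rank r in 1..M, the rank is normalized onto [0, 1] with z = (r − 1) / (M − 1), and the z values are then treated as numeric so any distance measure applies. A minimal sketch; the grade scale, the sample values, and both function names are hypothetical.

```python
from typing import List, Sequence


def ordinal_to_unit_interval(values: Sequence[str], order: Sequence[str]) -> List[float]:
    """Replace each value by its rank r (1..M), then normalize: z = (r - 1) / (M - 1)."""
    m = len(order)
    if m < 2:
        raise ValueError("an ordinal attribute needs at least two states")
    rank = {state: i + 1 for i, state in enumerate(order)}  # step 1: replacement
    return [(rank[v] - 1) / (m - 1) for v in values]        # step 2: normalization


def dissimilarity(z_i: float, z_j: float) -> float:
    """Step 3: distance on the normalized values (for a single attribute, |z_i - z_j|)."""
    return abs(z_i - z_j)


order = ["fair", "good", "excellent"]  # lowest to highest
grades = ["excellent", "fair", "good", "excellent"]
z = ordinal_to_unit_interval(grades, order)
print(z)                          # [1.0, 0.0, 0.5, 1.0]
print(dissimilarity(z[0], z[1]))  # 1.0
```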

**Multilevel association rule mining is**

A : Association rules generated by the candidate-generation method

B : Association rules generated without the candidate-generation method

C : **Association rules generated from mining data at multiple abstraction levels**

D : Association rules generated from frequent itemsets

**Which of the following is not a correct use of cross validation?**

A : Selecting variables to include in a model

B : Comparing predictors

C : Selecting parameters in prediction function

D : **Classification**

**What do you mean by support(A)?**

A : Total number of transactions containing A

B : Total Number of transactions not containing A

C : **Number of transactions containing A / Total number of transactions**

D : Number of transactions not containing A / Total number of transactions


**The fact table contains**

A : The names of the facts

B : Keys to each of the related dimension tables

C : **Facts and keys**

D : Facts or keys

**Every key structure in the data warehouse contains a time element**

A : records

B : Explicitly

C : Implicitly and explicitly

D : **Implicitly or explicitly**

**The accuracy of a classifier on a given test set is the percentage of**

A : **test set tuples that are correctly classified by the classifier**

B : test set tuples that are incorrectly classified by the classifier

C : test set tuples that are incorrectly misclassified by the classifier

D : test set tuples that are not classified by the classifier

**How can you counter overfitting in a decision tree?**

A : By creating new rules

B : **By pruning the longer rules**

C : Both ‘By pruning the longer rules’ and ‘By creating new rules’

D : By creating a new tree

**The confusion matrix is a useful tool for analyzing**

A : Regression

B : **Classification**

C : Sampling

D : Cross validation

**If A and B are two itemsets and A is a subset of B, which of the following statements is always true?**

A : Support(A) is less than or equal to Support(B)

B : **Support(A) is greater than or equal to Support(B)**

C : Support(A) is equal to Support(B)

D : Support(A) is not equal to Support(B)
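Support is anti-monotone: adding items to an itemset can only keep its support the same or lower it, because every transaction that contains the larger set also contains the smaller one. Apriori relies on this to prune candidates. A minimal sketch on five made-up transactions; `support`, `subsets_never_less_frequent`, and the data are hypothetical.

```python
from itertools import combinations
from typing import List, Set

# Made-up transactions for illustration only.
transactions: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def subsets_never_less_frequent(itemset: Set[str], transactions: List[Set[str]]) -> bool:
    """Check that support(A) >= support(B) for every non-empty proper subset A of B."""
    s_b = support(itemset, transactions)
    items = sorted(itemset)
    for k in range(1, len(items)):
        for subset in combinations(items, k):
            if support(set(subset), transactions) < s_b:
                return False
    return True


# Supports shrink (0.8, then 0.6, then 0.4) as the itemset grows.
for itemset in ({"bread"}, {"bread", "milk"}, {"bread", "milk", "diapers"}):
    print(sorted(itemset), support(itemset, transactions))
```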

**What are the limitations of rule generation in the Apriori algorithm?**

A : Need to generate a huge number of candidate sets

B : Need to repeatedly scan the whole database and check a large set of candidates by pattern matching

C : Dropping itemsets with valuable information

D : **Both A and B**

**In an asymmetric attribute**

A : No value is considered important over other values

B : All values are equal

C : **Only non-zero value is important**

D : Range of values is important

**One of the best-known software tools used for classification is**

A : Java

B : **C4.5**

C : Oracle

D : C++

**Identify an example of sequence data**

A : weather forecast

B : data matrix

C : market basket data

D : **genomic data**

**What type of matrix is required to represent binary data for proximity measures?**

A : Normal matrix

B : Sparse matrix

C : Dense matrix

D : **Contingency matrix**
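For two objects described by binary attributes, the contingency table counts q (both 1), r (1 in i, 0 in j), s (0 in i, 1 in j) and t (both 0). Symmetric binary dissimilarity is (r + s) / (q + r + s + t); for asymmetric attributes, where only the 1s matter, the negative matches t are dropped, giving (r + s) / (q + r + s). A minimal sketch; the function names and the two patient test vectors (1 = positive result) are hypothetical.

```python
from typing import Sequence, Tuple


def contingency(i: Sequence[int], j: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (q, r, s, t) for two equal-length 0/1 vectors."""
    if len(i) != len(j):
        raise ValueError("objects must have the same number of attributes")
    q = sum(1 for a, b in zip(i, j) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(i, j) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(i, j) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(i, j) if a == 0 and b == 0)
    return q, r, s, t


def symmetric_binary_dissimilarity(i: Sequence[int], j: Sequence[int]) -> float:
    q, r, s, t = contingency(i, j)
    return (r + s) / (q + r + s + t)


def asymmetric_binary_dissimilarity(i: Sequence[int], j: Sequence[int]) -> float:
    q, r, s, _t = contingency(i, j)  # negative matches (t) are ignored
    if q + r + s == 0:
        raise ValueError("undefined when neither object has any 1s")
    return (r + s) / (q + r + s)


jack = [1, 0, 1, 0, 0, 0]
mary = [1, 0, 1, 0, 1, 0]
print(contingency(jack, mary))                      # (2, 0, 1, 3)
print(asymmetric_binary_dissimilarity(jack, mary))  # 1/3, about 0.333
```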

**A company wants to divide its customers into distinct groups to send them offers. This is an example of**

A : Data Extraction

B : **Data Classification**

C : Data Discrimination

D : Data Selection

**This operation may add a new dimension to the cube**

A : Roll up

B : **Drill down**

C : Slice

D : Dice

**Which of the following statements is FALSE regarding regression?**

A : It relates inputs to outputs.

B : It is used for prediction.

C : It may be used for interpretation.

D : **It discovers causal relationships.**

**The following values represent the age distribution of students in an elementary class. Find the mode of the values: 7, 9, 10, 13, 11, 7, 9, 19, 12, 11, 9, 7, 9, 10, 11.**

A : 7

B : **9**

C : 10

D : 11

**These numbers are taken from the number of people that attended a particular church every Friday for 7 weeks: 62, 18, 39, 13, 16, 37, 25. Find the mean.**

A : 25

B : 210

C : 62

D : **30**
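Both summary statistics from the two previous questions can be checked with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The lists below are the two data sets given in those questions.

```python
import statistics

ages = [7, 9, 10, 13, 11, 7, 9, 19, 12, 11, 9, 7, 9, 10, 11]
attendance = [62, 18, 39, 13, 16, 37, 25]

# Mode: the most frequent value. multimode returns every value tied for first place.
print(statistics.multimode(ages))  # [9]  (9 occurs 4 times; 7 and 11 occur 3 times each)

# Mean: the sum of the values divided by how many there are.
print(sum(attendance), len(attendance))  # 210 7
print(statistics.mean(attendance))       # 210 / 7 = 30
```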

**Effectiveness of the browsing is highest. Recognize the type of schema.**

A : **Star Schema**

B : Snowflake schema

C : Fact constellation

D : Database schema

**The cuboid that holds the lowest level of summarization is called**

A : 0-D cuboid

B : 1-D cuboid

C : **Base cuboid**

D : 2-D cuboid

**The tables are easy to maintain and save storage space. Recognize the type of schema.**

A : Star Schema

B : **Snowflake schema**

C : Fact constellation

D : Database schema

**A database has 4 transactions, and all 4 of them include milk and bread. Of these 4 transactions, 3 also include cheese. Find the support (as a fraction) for the association rule “If milk and bread are purchased, then cheese is also purchased.”**

A : 0.6

B : **0.75**

C : 0.8

D : 0.7
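Working from the counts stated in the question (all 4 transactions contain milk and bread, and 3 of them also contain cheese), the support of a rule is the fraction of all transactions that contain every item in the rule:

$$
\text{support}(\{\text{milk}, \text{bread}\} \Rightarrow \{\text{cheese}\})
= \frac{\text{number of transactions containing milk, bread and cheese}}{\text{total number of transactions}}
= \frac{3}{4} = 0.75
$$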

**What is the range of the angle between two term frequency vectors?**

A : Zero to Thirty

B : **Zero to Ninety**

C : Zero to One Eighty

D : Zero to Forty Five
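The angle cannot exceed 90 degrees because term frequencies are counts: every component of the two (non-zero) vectors $\mathbf{x}$ and $\mathbf{y}$ is non-negative, so their dot product is non-negative too:

$$
\cos\theta = \frac{\mathbf{x}\cdot\mathbf{y}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert}
= \frac{\sum_i x_i y_i}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert} \in [0, 1]
\quad\Longrightarrow\quad
\theta = \arccos(\cos\theta) \in [0^\circ, 90^\circ]
$$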

**What does Pearson’s product-moment correlation coefficient allow you to identify?**

A : **Whether there is a relationship between variables**

B : Whether there is a significant effect and interaction of independent variables

C : Whether there is a significant difference between variables

D : Whether there is a significant effect and interaction of dependent variables
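Pearson's r is the covariance of two numeric attributes divided by the product of their standard deviations; it ranges from −1 to +1, with values near 0 indicating no linear relationship. Correlation analysis uses the same coefficient to spot redundant numeric attributes. A minimal from-scratch sketch; the function name and the sample data are hypothetical (Python 3.10+ also provides `statistics.correlation`).

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n-1) factors in the covariance and standard deviations cancel, so they are omitted.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a sample has zero variance")
    return cov / (ss_x * ss_y)


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]
print(round(pearson_r(hours, score), 2))  # about 0.98, a strong positive relationship
```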

**Consider three itemsets V1 = {tomato, potato, onion}, V2 = {tomato, potato}, V3 = {tomato}. Which of the following statements is correct?**

A : support(V1) is greater than support(V2)

B : **support(V3) is greater than support(V2)**

C : support(V1) is greater than support(V3)

D : support(V2) is greater than support(V3)

**What is another name for the supremum distance?**

A : Weighted Euclidean distance

B : City Block distance

C : **Chebyshev distance**

D : Euclidean distance

**This technique uses mean and standard deviation scores to transform real-valued attributes.**

A : decimal scaling

B : min-max normalization

C : **z-score normalization**

D : logarithmic normalization
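z-score normalization maps each value v of an attribute to (v − mean) / standard deviation, so the transformed attribute has mean 0 and standard deviation 1. A minimal sketch using Python's standard `statistics` module; the function name and the eight sample values are hypothetical.

```python
import statistics
from typing import List, Sequence


def z_score_normalize(values: Sequence[float]) -> List[float]:
    """Map each value v to (v - mean) / standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption, see below)
    if std == 0:
        raise ValueError("all values are equal; z-scores are undefined")
    return [(v - mean) / std for v in values]


values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population standard deviation 2
print(z_score_normalize(values))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Using the sample standard deviation (`statistics.stdev`) instead is also common; it only rescales the z-scores by a constant factor.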

**When do we use Manhattan distance in data mining?**

A : Dimension of the data decreases

B : **Dimension of the data increases**

C : Underfitting

D : Moderate size of the dimensions

**Correlation analysis is used for**

A : handling missing values

B : **identifying redundant attributes**

C : handling different data formats

D : eliminating noise

**If True Positives (TP) = 7, False Positives (FP) = 1, False Negatives (FN) = 4 and True Negatives (TN) = 18, calculate precision and recall.**

A : **Precision = 0.88, Recall = 0.64**

B : Precision = 0.44, Recall=0.78

C : Precision = 0.88, Recall=0.22

D : Precision = 0.77, Recall=0.55

**A sub-database that consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern is called a**

A : Suffix path

B : FP-tree

C : Prefix path

D : **Conditional pattern base**

**The cost-complexity pruning algorithm is used in**

A : **CART**

B : C4.5

C : ID3

D : ALL

**Which is the best-known association rule algorithm and the one used in most commercial products?**

A : **Apriori algorithm**

B : Pincer-search algorithm

C : Distributed algorithm

D : Partition algorithm

**Which operation is required to calculate the Hamming distance between two objects?**

A : AND

B : OR

C : NOT

D : **XOR**
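XOR yields 1 exactly where two bits differ, so the Hamming distance between two binary objects is the number of 1s in their XOR. A minimal sketch for bit strings packed into integers and for 0/1 attribute lists; both function names and the sample values are hypothetical.

```python
from typing import Sequence


def hamming_bits(a: int, b: int) -> int:
    """Count the bit positions where a and b differ: popcount(a XOR b)."""
    return bin(a ^ b).count("1")


def hamming_vectors(x: Sequence[int], y: Sequence[int]) -> int:
    """Same idea for equal-length 0/1 attribute vectors: sum of element-wise XORs."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a ^ b for a, b in zip(x, y))


print(hamming_bits(0b1011101, 0b1001001))                 # 2
print(hamming_vectors([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))  # 2
```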
