
Data Mining MCQs with Answers
Here are the most important data mining and data warehousing MCQs that may be asked in the SPPU BE Computer Engineering online examination for Data Mining and Warehousing (DMW). Each question carries 1 mark.
In the questions below, the option in bold is the correct answer.
What is the method to interpret the results after rule generation?
A : Absolute Mean
B : Lift ratio
C : Gini Index
D : Apriori
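Lift compares how often the consequent occurs together with the antecedent against how often it would occur if the two were independent: lift(A → B) = confidence(A → B) / support(B). A value above 1 suggests positive correlation, below 1 negative correlation, and exactly 1 independence. A minimal Python sketch, assuming a small hypothetical transaction list; the function names are illustrative only:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(antecedent, consequent, transactions):
    """lift(A -> B) = confidence(A -> B) / support(B)."""
    both = support(set(antecedent) | set(consequent), transactions)
    confidence = both / support(antecedent, transactions)
    return confidence / support(consequent, transactions)

# Hypothetical transactions, for illustration only.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
]
print(lift({"milk"}, {"bread"}, transactions))  # ≈ 0.889, i.e. lift < 1
```

Here milk and bread appear together slightly less often than independence would predict, so the rule milk → bread is negatively correlated despite its 67% confidence.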
OLAP database design is
A : Application-oriented
B : Object-oriented
C : Goal-oriented
D : Subject-oriented
Multilevel association rules can be mined efficiently using
A : Support
B : Confidence
C : Support count
D : Concept Hierarchies under support-confidence framework
Accuracy is used to measure
A : classifier’s true abilities
B : classifier’s analytic abilities
C : classifier’s decision abilities
D : classifier’s predictive abilities
Supervised learning and unsupervised clustering both require at least one
A : hidden attribute
B : output attribute
C : input attribute
D : categorical attribute
The task of building a decision model from labeled training data is called
A : Supervised Learning
B : Unsupervised Learning
C : Reinforcement Learning
D : Structure Learning
What is the range of the cosine similarity of the two documents?
A : Zero to One
B : Zero to infinity
C : Infinity to infinity
D : Zero to Zero
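Cosine similarity is cos(x, y) = (x · y) / (‖x‖ ‖y‖). In general it ranges from −1 to 1, but term-frequency vectors have no negative entries, so their dot product is never negative: the similarity lies between 0 and 1 and the angle between the vectors lies between 0 and 90 degrees. A short sketch with two hypothetical term-frequency vectors:

```python
import math

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical term-frequency vectors for two documents.
d1 = [5, 0, 3, 0, 2, 0, 0, 2, 0, 0]
d2 = [3, 0, 2, 0, 1, 1, 0, 1, 0, 1]
sim = cosine_similarity(d1, d2)
angle = math.degrees(math.acos(sim))  # angle between the two vectors, in degrees
print(f"similarity = {sim:.2f}, angle = {angle:.1f} degrees")
```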
Multi-class classification makes the assumption that each sample is assigned to
A : one and only one label
B : many labels
C : one or many labels
D : no label
Which of these is not a frequent pattern mining algorithm?
A : Decision trees
B : Eclat
C : FP growth
D : Apriori
The first step involved in knowledge discovery is
A : Data Integration
B : Data Selection
C : Data Transformation
D : Data Cleaning
The distance between two points calculated using the Pythagorean theorem is
A : Supremum distance
B : Euclidean distance
C : Linear distance
D : Manhattan Distance
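Euclidean distance generalizes the Pythagorean theorem: it is the square root of the summed squared coordinate differences. A minimal sketch (Python 3.8+ also ships `math.dist`, which computes the same value):

```python
import math

def euclidean(p, q):
    """Straight-line distance: sqrt of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# In 2-D this is the hypotenuse of a right triangle with legs 3 and 4.
print(euclidean((1, 2), (4, 6)))  # 5.0
```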
What is meant by a dissimilarity measure of two objects?
A : Is a numerical measure of how alike two data objects are.
B : Is a numerical measure of how different two data objects are.
C : Higher when objects are more alike
D : Lower when objects are more different
An ROC curve for a given model shows the trade-off between
A : random sampling
B : test data and train data
C : cross validation
D : the true positive rate (TPR) and the false positive rate (FPR)
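Each point on an ROC curve is the (FPR, TPR) pair obtained at one decision threshold, where TPR = TP / P and FPR = FP / N. A minimal sketch, assuming hypothetical classifier scores and true labels, that sweeps every distinct score as a threshold:

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, starting at (0, 0), highest threshold first.

    A tuple is predicted positive when its score >= threshold.
    labels: 1 = actual positive, 0 = actual negative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold can only raise both rates, which is exactly the trade-off the curve plots.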
Each dimension is represented by only one table. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Choose the correct concept hierarchy.
A : city < street < state < country
B : street < city < state < country
C : street > city > state > country
D : street > city > country > state
Height is an example of which type of attribute?
A : Nominal
B : Binary
C : Ordinal
D : Numeric
Which angle is used to measure document similarity?
A : Sin
B : Tan
C : Cos
D : Sec
Which of the following is the data mining tool?
A : Borland C
B : Weka
C : Borland C++
D : Visual C
A decision tree is also known as
A : general tree
B : binary tree
C : prediction tree
D : None of the options
Recall is a measure of
A : completeness (what percentage of positive tuples are labeled as positive)
B : a measure of exactness for misclassification
C : a measure of exactness of what percentage of tuples are not classified
D : a measure of exactness of what percentage of tuples labeled as negative are actually negative
What is the approach of the basic algorithm for decision tree induction?
A : Greedy
B : Top Down
C : Procedural
D : Step by Step
A rule is considered interesting if
A : They satisfy both minimum support and minimum confidence threshold
B : They satisfy both maximum support and maximum confidence threshold
C : They satisfy maximum support and minimum confidence threshold
D : They satisfy minimum support and maximum confidence threshold
For mining frequent itemsets, the data format used by the Apriori and FP-Growth algorithms is
A : Apriori uses horizontal and FP-Growth uses vertical data format
B : Apriori uses vertical and FP-Growth uses horizontal data format
C : Apriori and FP-Growth both use vertical data format
D : Apriori and FP-Growth both use horizontal data format
Which of the following sequences is used to calculate proximity measures for ordinal attributes?
A : Replacement, discretization and distance measure
B : Replacement, characterization and distance measure
C : Normalization, discretization and distance measure
D : Replacement, normalization and distance measure
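One common procedure for ordinal attributes replaces each value by its rank r in 1..M, normalizes the rank with z = (r − 1) / (M − 1) so every ordinal attribute lies on [0, 1], and then applies an interval-scaled distance such as Euclidean or Manhattan. A minimal sketch; the level names are hypothetical:

```python
def ordinal_to_unit(values, ordered_levels):
    """Replace each ordinal value by its rank, then normalize the rank onto [0, 1].

    Assumes at least two ordered levels.
    """
    m = len(ordered_levels)
    rank = {level: i + 1 for i, level in enumerate(ordered_levels)}  # step 1: replacement
    return [(rank[v] - 1) / (m - 1) for v in values]                 # step 2: normalization

levels = ["fair", "good", "excellent"]  # hypothetical ordered categories
z = ordinal_to_unit(["excellent", "fair", "good", "excellent"], levels)

# Step 3: any interval-scaled distance; for a single attribute, the absolute difference.
print(abs(z[0] - z[1]))  # 1.0
print(abs(z[0] - z[2]))  # 0.5
```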
Multilevel association rule mining is
A : Association rules generated from candidate-generation method
B : Association rules generated from without candidate-generation method
C : Association rules generated from mining data at multiple abstraction levels
D : Association rules generated from frequent itemsets
Which of the following is not a correct use of cross validation?
A : Selecting variables to include in a model
B : Comparing predictors
C : Selecting parameters in prediction function
D : classification
What is meant by support(A)?
A : Total number of transactions containing A
B : Total Number of transactions not containing A
C : Number of transactions containing A / Total number of transactions
D : Number of transactions not containing A / Total number of transactions
The fact table contains
A : The names of the facts
B : Keys to each of the related dimension tables
C : Facts and keys
D : Facts or keys
Every key structure in the data warehouse contains ______ a time element.
A : records
B : Explicitly
C : Implicitly and explicitly
D : Implicitly or explicitly
The accuracy of a classifier on a given test set is the percentage of
A : test set tuples that are correctly classified by the classifier
B : test set tuples that are incorrectly classified by the classifier
C : test set tuples that are incorrectly misclassified by the classifier
D : test set tuples that are not classified by the classifier
How can overfitting be countered in a decision tree?
A : By creating new rules
B : By pruning the longer rules
C : Both 'By pruning the longer rules' and 'By creating new rules'
D : By creating a new tree
The confusion matrix is a useful tool for analyzing
A : Regression
B : Classification
C : Sampling
D : Cross validation
If A and B are two itemsets and A is a subset of B, which of the following statements is always true?
A : Support(A) is less than or equal to Support(B)
B : Support(A) is greater than or equal to Support(B)
C : Support(A) is equal to Support(B)
D : Support(A) is not equal to Support(B)
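Support is support(X) = (number of transactions containing X) / (total number of transactions). Every transaction that contains a superset B also contains its subset A, so support(A) ≥ support(B); this anti-monotone (Apriori) property is what lets Apriori prune candidates. A minimal sketch on hypothetical transactions:

```python
def support(itemset, transactions):
    """support(X) = (transactions containing X) / (total transactions)."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

# Hypothetical transactions.
transactions = [
    {"tomato", "potato", "onion"},
    {"tomato", "potato"},
    {"tomato"},
    {"potato", "onion"},
]
A = {"tomato"}
B = {"tomato", "potato"}  # A is a subset of B
print(support(A, transactions), support(B, transactions))  # 0.75 0.5
```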
What is the limitation of rule generation in the Apriori algorithm?
A : Need to generate a huge number of candidate sets
B : Need to repeatedly scan the whole database and check a large set of candidates by pattern matching
C : Dropping itemsets with valuable information
D : Both A and B
In an asymmetric attribute,
A : No value is considered important over other values
B : All values are equal
C : Only non-zero value is important
D : Range of values is important
One of the best-known programs used for classification is
A : Java
B : C4.5
C : Oracle
D : C++
Identify an example of sequence data.
A : weather forecast
B : data matrix
C : market basket data
D : genomic data
What type of matrix is required to represent binary data for proximity measures?
A : Normal matrix
B : Sparse matrix
C : Dense matrix
D : Contingency matrix
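For two objects described by binary attributes, proximity is computed from a 2 × 2 contingency table: q counts attributes that are 1 for both objects, r those that are 1 for the first and 0 for the second, s the reverse, and t those that are 0 for both. Symmetric binary dissimilarity is (r + s) / (q + r + s + t); for asymmetric attributes, where only the 1s matter, the negative matches t are dropped, giving (r + s) / (q + r + s). A sketch with hypothetical vectors:

```python
def contingency(x, y):
    """Counts for two binary vectors: q (1,1), r (1,0), s (0,1), t (0,0)."""
    q = sum(a == 1 and b == 1 for a, b in zip(x, y))
    r = sum(a == 1 and b == 0 for a, b in zip(x, y))
    s = sum(a == 0 and b == 1 for a, b in zip(x, y))
    t = sum(a == 0 and b == 0 for a, b in zip(x, y))
    return q, r, s, t

x = [1, 0, 1, 0, 0, 0]  # hypothetical binary objects
y = [1, 0, 1, 0, 1, 0]
q, r, s, t = contingency(x, y)
symmetric_d = (r + s) / (q + r + s + t)  # (0, 0) matches count as agreement
asymmetric_d = (r + s) / (q + r + s)     # (0, 0) matches are ignored
print(q, r, s, t, symmetric_d, asymmetric_d)
```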
A company wants to divide its customers into distinct groups to send them offers. This is an example of
A : Data Extraction
B : Data Classification
C : Data Discrimination
D : Data Selection
This operation may add a new dimension to the cube
A : Roll up
B : Drill down
C : Slice
D : Dice
Which of the following statements is FALSE regarding regression?
A : It relates inputs to outputs.
B : It is used for prediction.
C : It may be used for interpretation.
D : It discovers causal relationships.
The following values represent the age distribution of students in an elementary class. Find the mode of the values: 7, 9, 10, 13, 11, 7, 9, 19, 12, 11, 9, 7, 9, 10, 11.
A : 7
B : 9
C : 10
D : 11
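The mode is the most frequent value. A quick check with the Python standard library, counting each age and then asking for the mode directly:

```python
from collections import Counter
import statistics

ages = [7, 9, 10, 13, 11, 7, 9, 19, 12, 11, 9, 7, 9, 10, 11]
counts = Counter(ages)                    # frequency of each age
print(counts[9], counts[7], counts[11])   # 4 3 3
print(statistics.mode(ages))              # 9
```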
The following numbers are the attendance at a particular church every Friday for 7 weeks: 62, 18, 39, 13, 16, 37, 25. Find the mean.
A : 25
B : 210
C : 62
D : 30
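The mean is the sum of the values divided by their count; note that 210 (option B) is the sum, not the mean:

```python
attendance = [62, 18, 39, 13, 16, 37, 25]
total = sum(attendance)           # 210
mean = total / len(attendance)    # 210 / 7
print(total, mean)                # 210 30.0
```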
Browsing effectiveness is highest. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
The cuboid that holds the lowest level of summarization is called the
A : 0-D cuboid
B : 1-D cuboid
C : Base cuboid
D : 2-D cuboid
The tables are easy to maintain and save storage space. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
A database has 4 transactions. Of these, 4 transactions include milk and bread, and 3 transactions include cheese. Find the support for the following association rule: "If milk and bread are purchased, then cheese is also purchased."
A : 0.6
B : 0.75
C : 0.8
D : 0.7
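The support of A → B is the fraction of all transactions that contain A ∪ B; its confidence is that count divided by the number of transactions containing A. The question gives only counts, so this sketch builds four transactions consistent with them (any other items in the baskets are assumed irrelevant):

```python
# Four transactions consistent with the question: all contain milk and bread,
# and three of them also contain cheese.
transactions = [
    {"milk", "bread", "cheese"},
    {"milk", "bread", "cheese"},
    {"milk", "bread", "cheese"},
    {"milk", "bread"},
]

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)

n = len(transactions)
support = count({"milk", "bread", "cheese"}) / n                                # 3 / 4
confidence = count({"milk", "bread", "cheese"}) / count({"milk", "bread"})     # 3 / 4
print(support, confidence)  # 0.75 0.75
```

Support and confidence coincide here only because every transaction contains the antecedent.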
What is the range of the angle between two term frequency vectors?
A : Zero to Thirty
B : Zero to Ninety
C : Zero to One Eighty
D : Zero to Forty Five
What does Pearson's product-moment correlation coefficient allow you to identify?
A : Whether there is a relationship between variables
B : Whether there is a significant effect and interaction of independent variables
C : Whether there is a significant difference between variables
D : Whether there is a significant effect and interaction of dependent variables
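Pearson's r measures the strength and direction of a linear relationship and ranges from −1 to +1. The same statistic underlies correlation analysis for spotting redundant attributes: a value near ±1 means one attribute is largely predictable from the other. A from-scratch sketch on hypothetical data (Python 3.10+ also provides `statistics.correlation`):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

height_cm = [150, 160, 170, 180, 190]        # hypothetical data
height_in = [h / 2.54 for h in height_cm]    # same quantity in a different unit
shoe_size = [38, 41, 40, 44, 43]
print(round(pearson_r(height_cm, height_in), 6))  # 1.0: a redundant attribute
print(round(pearson_r(height_cm, shoe_size), 2))  # strong but not perfect
```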
Consider three itemsets V1 = {tomato, potato, onion}, V2 = {tomato, potato} and V3 = {tomato}. Which of the following statements is correct?
A : support(V1) is greater than support(V2)
B : support(V3) is greater than support (V2)
C : support(V1) is greater than support(V3)
D : support(V2) is greater than support(V3)
What is another name for the supremum distance?
A : Weighted Euclidean distance
B : City Block distance
C : Chebyshev distance
D : Euclidean distance
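The supremum (Chebyshev, L∞) distance is the limit of the Minkowski distance (Σ|aᵢ − bᵢ|ʰ)^(1/h) as h grows: it is simply the largest coordinate difference. Manhattan (h = 1) and Euclidean (h = 2) are the other common special cases. A minimal sketch comparing the three on two hypothetical points:

```python
def minkowski(p, q, h):
    """Minkowski distance of order h; h = 1 is Manhattan, h = 2 is Euclidean."""
    return sum(abs(a - b) ** h for a, b in zip(p, q)) ** (1 / h)

def supremum(p, q):
    """Chebyshev (L-infinity) distance: the largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))

x1, x2 = (1, 2), (3, 5)
print(minkowski(x1, x2, 1))  # 5.0 (Manhattan)
print(minkowski(x1, x2, 2))  # sqrt(13), about 3.61 (Euclidean)
print(supremum(x1, x2))      # 3
```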
This technique uses the mean and standard deviation to transform real-valued attributes.
A : decimal scaling
B : min-max normalization
C : z-score normalization
D : logarithmic normalization
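Z-score normalization maps each value v to v' = (v − mean) / σ, so the transformed attribute has mean 0 and standard deviation 1. A minimal sketch; it assumes the population standard deviation (`statistics.pstdev`), whereas some texts use the sample standard deviation instead:

```python
import statistics

def z_score(values):
    """v' = (v - mean) / std, using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values: mean 5, std 2
print(z_score(data))             # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```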
When do we use Manhattan distance in data mining?
A : Dimension of the data decreases
B : Dimension of the data increases
C : Underfitting
D : Moderate size of the dimensions
Correlation analysis is used for
A : handling missing values
B : identifying redundant attributes
C : handling different data formats
D : eliminating noise
Given True Positives (TP) = 7, False Positives (FP) = 1, False Negatives (FN) = 4 and True Negatives (TN) = 18, calculate precision and recall.
A : Precision = 0.88, Recall=0.64
B : Precision = 0.44, Recall=0.78
C : Precision = 0.88, Recall=0.22
D : Precision = 0.77, Recall=0.55
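Precision (exactness) is TP / (TP + FP) and recall (completeness) is TP / (TP + FN); accuracy, the share of all test tuples classified correctly, is shown for comparison:

```python
tp, fp, fn, tn = 7, 1, 4, 18

precision = tp / (tp + fp)                   # 7 / 8
recall = tp / (tp + fn)                      # 7 / 11
accuracy = (tp + tn) / (tp + fp + fn + tn)   # 25 / 30
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
# precision=0.88 recall=0.64 accuracy=0.83
```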
A sub-database that consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern is called a
A : Suffix path
B : FP-tree
C : Prefix path
D : Conditional pattern base
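When an FP-tree is built, each transaction's frequent items are inserted in descending support order, so the prefix path above a node for item x is just the items that precede x in that ordered transaction. This simplified sketch uses that fact to compute a conditional pattern base straight from the transactions, without building the tree; it is an illustration, not a full FP-growth implementation. The nine transactions are a small hypothetical database, and breaking support ties alphabetically is an assumption:

```python
from collections import Counter

def conditional_pattern_base(transactions, suffix, min_count):
    """Prefix paths (with counts) that co-occur with `suffix`.

    Frequent items are ordered by descending support (ties broken by name),
    as they would be inserted into an FP-tree; infrequent items are dropped.
    """
    freq = Counter(item for t in transactions for item in t)
    order = sorted((i for i in freq if freq[i] >= min_count), key=lambda i: (-freq[i], i))
    rank = {item: r for r, item in enumerate(order)}
    base = Counter()
    for t in transactions:
        if suffix not in t or suffix not in rank:
            continue
        path = sorted((i for i in t if i in rank), key=rank.get)
        prefix = tuple(path[: path.index(suffix)])
        if prefix:  # an empty prefix contributes nothing
            base[prefix] += 1
    return dict(base)

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
print(conditional_pattern_base(transactions, "I5", min_count=2))
# {('I2', 'I1'): 1, ('I2', 'I1', 'I3'): 1}
```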
The cost-complexity pruning algorithm is used in
A : CART
B : C4.5
C : ID3
D : ALL
Which is the best-known association rule algorithm and is used in most commercial products?
A : Apriori algorithm
B : Pincer-search algorithm
C : Distributed algorithm
D : Partition algorithm
Which operation is required to calculate the Hamming distance between two objects?
A : AND
B : OR
C : NOT
D : XOR
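The Hamming distance counts the positions at which two equal-length bit strings differ. XOR yields 1 exactly at those positions, so the distance is the number of 1 bits in the XOR result. A minimal sketch, assuming both inputs are binary strings of the same length:

```python
def hamming(x, y):
    """Number of positions where two equal-length bit strings differ (XOR, then count 1s)."""
    return bin(int(x, 2) ^ int(y, 2)).count("1")

print(hamming("1011101", "1001001"))  # 2
```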