Big Data Analytics MCQ with Answers


Data Analytics SPPU MCQ

1. The ________ function is used to add a title to each Axes instance in a figure.

A : set_title()
B : get_title()
C : set_label()
D : title()

set_title()
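The difference between the Axes method `set_title()` and the pyplot shortcut `plt.title()` (see also question 11) can be shown with a minimal sketch. It assumes matplotlib is installed and selects the non-interactive `Agg` backend so it runs without a display; the title strings are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

# One figure containing two Axes instances side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2)

# set_title() is a method of each Axes object, so every subplot gets its own title
ax_left.set_title("Marks")
ax_right.set_title("Attendance")

# plt.title() titles only the *current* Axes, which after plt.subplots()
# is the last one created (ax_right); it overwrites the title set above
plt.title("Attendance (via plt.title)")

print(ax_left.get_title())   # Marks
print(ax_right.get_title())  # Attendance (via plt.title)
plt.close(fig)
```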



2. ________ provides a range of supervised and unsupervised learning
algorithms via a consistent interface in Python.

A : Pandas
B : Numpy
C : Scikit-Learn
D : image

Scikit-Learn



3. The ________ attribute specifies the number of dimensions or axes of the array.

A : ndarray.size
B : ndarray.dtype
C : ndarray.ndim
D : ndarray.axes

ndarray.ndim
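To compare the attributes named in the options, here is a small NumPy sketch with an arbitrary 2 x 3 integer array. Option D is a distractor; a NumPy ndarray has no `axes` attribute (pandas objects do).

```python
import numpy as np

# A 2 x 3 array of 64-bit integers (the values are arbitrary)
a = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.int64)

print(a.ndim)   # 2      -> number of dimensions (axes)
print(a.shape)  # (2, 3) -> length along each axis
print(a.size)   # 6      -> total number of elements
print(a.dtype)  # int64  -> element type
```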




4. The ________ algorithm is based on the fact that it uses prior knowledge of
frequent itemset properties to find frequent itemsets.

A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori

Apriori
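The "prior knowledge" is the Apriori property: every subset of a frequent itemset must itself be frequent, so candidates with an infrequent subset can be discarded without counting. The level-wise search is sketched below in plain Python. It is an illustrative sketch, not a reference implementation: it counts support by scanning the transaction list directly instead of using the hash tree mentioned in question 53, and the function name `apriori`, the baskets and the `min_support` threshold are all invented for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item of itemset (plain linear scan)
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (the "prior knowledge"): every (k-1)-subset of a frequent
        # k-itemset must itself be frequent, so other candidates are dropped unchecked
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]
result = apriori(baskets, min_support=0.5)
for itemset, sup in result.items():
    print(sorted(itemset), sup)
```

With these baskets, {bread, milk} and {bread, butter} are frequent (support 0.5) but {milk, butter} is not, so the three-item candidate {bread, milk, butter} is pruned without being counted.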


5. The ________ submodule of SciPy is dedicated to image processing.

A : ndarray
B : spatial
C : ndimage
D : special

ndimage



6. If the number of input features is 3, the optimal hyperplane in a support
vector machine is a ________.

A : Single point
B : Line
C : 2-D Plane
D : Non linear line


2-D Plane

7. ________ is an example of human-generated unstructured data.

A : Text files
B : Satellite data
C : Sensor data
D : Seismic imagery data

Text files

8. ________ must be installed before you use scikit-learn.

A : Matlab
B : Scilab
C : Scipy
D : Numpy



9. The procedure of organizing items of a given collection into groups based on
some similar features is called ________.

A : Regression
B : Clustering
C : Decision Trees
D : Association

Clustering


10. In statistics, a population consists of ________.

A : All People living in a country.
B : All People living in the city.
C : All subjects or objects whose characteristics are being studied.
D : Part of whole dataset


All subjects or objects whose characteristics are being studied.


11. Which function is used to give a title to the axes?

A : plt.title()
B : plt.xlabel()
C : plt.ylabel()
D : plt.xscale()

plt.title()


12. The ________ function is used to plot a histogram using the matplotlib library.

A : hist()
B : bar()
C : pie()

hist()



13. Which of the following measures is used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?

A : Probability
B : Gini Index
C : Regression
D : Association

Gini Index

14. Email data is an example of ________.

A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered

Un-Structured data

15. Which of the following is not a type of clustering algorithm?

A : Density clustering
B : K-Means clustering
C : Centroid clustering
D : Simple clustering


Simple clustering
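K-means, named in option B, alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its points. A minimal NumPy sketch follows, assuming Euclidean distance and random initialization from the data points; the function name `kmeans` and the sample points are hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means clustering: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Two well-separated groups of 2-D points (made-up data)
pts = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(pts, k=2)
print(labels)  # first three points share one label, last three share the other
```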

16. ________ analytics answers questions like "How can we make it happen?"

A : Descriptive
B : Prescriptive
C : Predictive
D : Probability

Prescriptive


17. ________ data does not fit into a data model due to variations in its contents.

A : Structured data
B : Un-Structured data
C : Semi-Structured data


Un-Structured data

18. The ________ function multiplies two matrices in NumPy.

A : prod()
B : mult()
C : dot()
D : *

dot()
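The options behave quite differently on the same pair of 2 x 2 arrays, as this short NumPy sketch shows (the array values are arbitrary).

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(np.dot(A, B))  # matrix product: rows [19 22] and [43 50]
print(A @ B)         # the @ operator gives the same matrix product
print(A * B)         # element-wise product: rows [5 12] and [21 32]
print(np.prod(A))    # product of all elements: 24
```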


19. ________ is a general-purpose array-processing package that provides a
high-performance multidimensional array object and tools for working with
these arrays.

A : NumPy
B : SciPy
C : sklearn
D : None of these

NumPy


20. The ________ library is built on top of NumPy, SciPy and Matplotlib.

A : Sympy
B : Scikit
C : Pandas
D : Numpy

Scikit



21. The last element of an ndarray is indexed by ________.

A : 0
B : -1
C : 1
D : -2

-1



22. ________ is the step performed by a data scientist after acquiring the data.

A : Data Cleansing
B : Data Integration
C : Data Replication
D : Data loading

Data Cleansing

23. The ________ function is used to save an array as an image file.

A : matplotlib.pyplot.image()
B : matplotlib.pyplot.imread()
C : matplotlib.pyplot.imwrite()
D : matplotlib.pyplot.imsave()

matplotlib.pyplot.imsave()
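A minimal round trip with `imsave()` and its reading counterpart `imread()` (option B) is sketched below. It writes to an in-memory buffer rather than a file so it leaves nothing on disk; a filename such as `"gradient.png"` would work the same way. The gradient data is arbitrary.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# A small 4 x 6 grayscale gradient (arbitrary test data)
data = np.linspace(0.0, 1.0, 24).reshape(4, 6)

buffer = io.BytesIO()  # in-memory stand-in for an image file
plt.imsave(buffer, data, cmap="gray", format="png")  # write the array as a PNG image

buffer.seek(0)
restored = plt.imread(buffer, format="png")  # read the image back into an array
print(restored.shape)  # first two axes match the original 4 x 6 array; the last holds colour channels
```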


24. ________ is an unsupervised machine learning technique.

B : Support Vector Machines
C : Decision trees
D : Cluster analysis

Cluster analysis

25. What is the correct syntax to generate integers between 10 and 30?

A : x=numpy.arange(10,30)
B : x=numpy.array(10,30)
C : x=numpy.arange(10,31)
D : x=arange(10,31)
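The options differ in whether 30 itself is produced: `numpy.arange(start, stop)` includes `start` but stops before `stop`. The sketch below also illustrates negative indexing from question 21. Option D would only work after something like `from numpy import arange`.

```python
import numpy as np

# arange(start, stop) includes start but stops *before* stop
a = np.arange(10, 30)
print(a[0], a[-1], len(a))   # 10 29 20

b = np.arange(10, 31)        # to include 30 itself, the stop value must be 31
print(b[0], b[-1], len(b))   # 10 30 21

# Negative indices count from the end: -1 is the last element, -2 the one before it
print(b[-1], b[-2])          # 30 29
```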




26. The ________ function is used to get the element-wise remainder of division of two arrays.

A : numpy.divide(x1,x2)
B : numpy.mod(x1,x2)
C : numpy.true_divide(x1,x2)
D : numpy.reminder(x1,x2)

numpy.mod(x1,x2)
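A quick check of `numpy.mod` on two arbitrary arrays is shown below. `numpy.mod` is an alias of `numpy.remainder`, and the result takes the sign of the divisor. Note that `numpy.reminder`, as spelled in option D, does not exist.

```python
import numpy as np

x1 = np.array([10, 11, 12, -7])
x2 = np.array([3, 3, 5, 3])

print(np.mod(x1, x2))        # element-wise remainder: [1 2 2 2]
print(np.remainder(x1, x2))  # same function under its full name
print(np.divide(x1, x2))     # true division, not the remainder
```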


27. ________ is an indication of how often the rule has been found to be true in
association rule mining.

A : Confidence
B : Support
C : Lift
D : None of These

Confidence



28. A ________ is a supervised machine learning algorithm that relies on the
assumption of feature independence to classify input data.

A : Clustering
B : Regression
C : Naïve Bayes
D : Apriori

Naïve Bayes
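The independence assumption is what lets Naïve Bayes score a document by adding one log-probability per word. The spam-filtering sketch below uses a multinomial model with add-one (Laplace) smoothing, written in plain Python. The helper names `train_naive_bayes` and `predict` and the toy mails are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count classes and per-class word frequencies (multinomial Naive Bayes)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> Counter of words seen in that class
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(model, words):
    """Return the label with the highest log-posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for w in words:
            # Adding per-word log-likelihoods is valid only under the independence
            # assumption; +1 (Laplace smoothing) avoids log(0) for unseen words
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training mails, already split into words (made-up data)
docs = [["win", "money", "now"], ["cheap", "money", "offer"],
        ["meeting", "tomorrow"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)

print(predict(model, ["money", "offer"]))    # spam
print(predict(model, ["meeting", "notes"]))  # ham
```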

29. What is the use of the following function? plt.xlabel("Total Marks")

A : Gives label to X-Axis
B : Gives label to Y-Axis
C : Gives title to figure
D : Add text to figure

Gives label to X-Axis

30. Pandas provides the ________ function as the entry point for all standard
database join operations when merging two DataFrame objects.

A : concat()
B : replace()
C : merge()
D : add()

merge()



31. Data generated on Twitter is an example of ________.

A : Structured data
B : Un-Structured data
C : Semi-Structured data
D : Scattered

Un-Structured data

32. ________ is an excellent 2D and 3D graphics library for generating
scientific figures.

A : Pandas
B : Numpy
C : matplotlib
D : ndarray

matplotlib


33. Support(B) =

A : (Transactions containing (B)) / (Total Transactions)
B : (Transactions containing (B)) / 100
C : (Total Transactions) / (Transactions containing (B))
D : 100 / (Transactions containing (B))

(Transactions containing (B)) / (Total Transactions)
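The support formula above, along with confidence (question 27) and lift, can be computed directly in a few lines of plain Python. The definitions used are the standard ones: confidence(A → B) = support(A ∪ B) / support(A), and lift = confidence / support(B). The helper names and the five transactions are hypothetical.

```python
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"bread"},
]


def support(itemset, transactions):
    """(Transactions containing itemset) / (Total transactions)."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often the rule antecedent -> consequent holds: support(A ∪ B) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is on its own."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


print(support({"bread"}, transactions))               # 4 / 5 = 0.8
print(confidence({"milk"}, {"bread"}, transactions))  # (2/5) / (3/5), about 0.667
print(lift({"milk"}, {"bread"}, transactions))        # 0.667 / 0.8, about 0.833
```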

34. ________ is an example of semi-structured data.

A : NoSQL data
B : YouTube data
C : Text File data
D : Satellite imagery data


NoSQL data

35. ________ is a raster graphics format with lossless compression.

D : PS


36. ________ is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and
leaf nodes represent classes or class distributions.

A : Decision tree
B : Association Rule Mining
C : Clustering
D : Support vector machines

Decision tree

37. ________ is a form of supervised learning algorithm used by mail service
providers such as Gmail and Yahoo to classify a new mail as spam or not spam.

A : Classification
B : Regression
C : Clustering
D : Naïve Bayes


38. In a ________, the x-axis is divided into bins and each bin is treated
as a category.

A : Bar
B : Line
C : Scatter
D : Histogram

Histogram
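Binning can be inspected numerically with `numpy.histogram`. matplotlib's `hist()` (question 12) returns the same counts alongside the drawn bars. The marks and the three-bin range below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

marks = np.array([35, 42, 47, 51, 58, 63, 64, 69, 72, 88])  # made-up exam marks

# Split the range 30..90 into three equal-width bins; each bin acts as one category
counts, edges = np.histogram(marks, bins=3, range=(30, 90))
print(edges)   # bin edges 30, 50, 70, 90
print(counts)  # 3 marks in [30, 50), 5 in [50, 70), 2 in [70, 90]

# plt.hist() draws the same bins and returns (counts, edges, bar patches)
n, bins, patches = plt.hist(marks, bins=3, range=(30, 90))
plt.close()
```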



39. When data are collected in a statistical study for only a portion or subset
of all elements of interest, we are using a ________.

A : Sample
B : Parameter
C : Population
D : Probability

Sample


40. ________ regression finds a relationship between one or more features
(independent variables) and a continuous variable (dependent variable).

A : Non-linear
B : Linear
C : Both of these
D : None of These

Linear
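Fitting a straight line by ordinary least squares with NumPy's `linalg.lstsq` makes the idea concrete. The hours and marks below are hypothetical and were chosen to lie exactly on marks = 5 × hours + 47, so the recovered coefficients are easy to check.

```python
import numpy as np

# Hypothetical data: hours studied (feature) vs. marks (continuous target)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
marks = np.array([52.0, 57.0, 62.0, 67.0, 72.0])

# Design matrix with a column of ones so the model can learn an intercept
X = np.column_stack([hours, np.ones_like(hours)])

# Ordinary least squares: minimise ||X @ [slope, intercept] - marks||^2
(slope, intercept), *_ = np.linalg.lstsq(X, marks, rcond=None)
print(round(slope, 3), round(intercept, 3))  # 5.0 47.0
print(slope * 6 + intercept)                 # predicted marks for 6 hours (about 77)
```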


41. Which of the following is a measure of disorder, impurity, unpredictability or uncertainty?

A : Entropy
B : Support
C : Confidence
D : Lift

Entropy
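Entropy and the Gini index from question 13 are both computed from class proportions at a node. Both are 0 for a pure node and largest when the classes are evenly mixed. A plain-Python sketch follows; the function names and label lists are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: sum of -p * log2(p) over the class proportions."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 1 - sum of p^2 over the class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


pure = ["yes"] * 4
mixed = ["yes", "yes", "no", "no"]

print(entropy(pure), gini(pure))    # 0.0 0.0: a pure node has no disorder
print(entropy(mixed), gini(mixed))  # 1.0 0.5: maximum disorder for two classes
```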


42. Which of the following functions is not used to iterate over the rows of a DataFrame?

A : iteritems()
B : iterrows()
C : itertuples()
D : iterpanel()
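The short pandas sketch below shows what each real iteration method walks over. The DataFrame contents are made up. `DataFrame.items()` is the current name for what older pandas also exposed as `iteritems()`, which was removed in pandas 2.0. There is no `iterpanel()` method on a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [81, 67]})  # toy data

# iterrows(): one (index, Series) pair per row
for idx, row in df.iterrows():
    print(idx, row["name"], row["marks"])

# itertuples(): one namedtuple per row; generally faster than iterrows()
for row in df.itertuples(index=False):
    print(row.name, row.marks)

# items() yields (column label, Series) pairs, i.e. it walks over columns, not rows
for label, column in df.items():
    print(label, column.tolist())
```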


43. ________ is the technique that duplicates the smaller array so that its
dimensionality and size match those of the larger array.

A : Multiplication
B : Broadcasting
C : Addition
D : Flatten

Broadcasting
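Broadcasting appears in a small NumPy example below with arbitrary values. The "duplication" is virtual: NumPy behaves as if the smaller array were repeated, without actually copying its data in memory.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# The smaller array is broadcast across every row of the larger one
result = matrix + row
print(result)        # rows: [11 22 33] and [14 25 36]

# broadcast_to shows the stretched array as a read-only (2, 3) view
stretched = np.broadcast_to(row, matrix.shape)
print(stretched)
print(np.array_equal(result, matrix + stretched))  # True
```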



44. Which of the following tasks is not performed by a data scientist?

A : Define the question
B : Create reproducible code
C : Challenge results
D : Staff Recruitment

Staff Recruitment

45. To save a figure to a file, we can use the ________ method of the Figure class
in matplotlib.pyplot.

A : save()
B : savefig()
C : Figure()
D : save_image()

savefig()
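In matplotlib the Figure method is spelled `savefig()`. The minimal sketch below writes a PNG into an in-memory buffer; a filename such as `"marks.png"` would work the same way. The plotted values are arbitrary.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 1, 9])
ax.set_xlabel("Total Marks")   # x-axis label, as in question 29

# savefig() renders the whole figure and writes it out in the requested format
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)

print(buffer.getvalue()[:8])  # every PNG file starts with these 8 signature bytes
```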


46. ________ is the machine learning technique used in cross-marketing to work with
other businesses that complement your own business, but not with competitors.

A : Decision tree
B : Association Rule Mining
C : Clustering
D : Support vector machine

Association Rule Mining

47. Which function returns an ndarray object that contains numbers that
are evenly spaced on a log scale?

A : numpy.logspace()
B : numpy.log()
C : numpy.fill()
D : numpy.random()

numpy.logspace()
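`numpy.logspace(start, stop, num)` returns `num` values from `base**start` to `base**stop` (base 10 by default), so their logarithms are evenly spaced. A short sketch:

```python
import numpy as np

values = np.logspace(0, 3, num=4)           # 1, 10, 100, 1000
print(values)
print(np.log10(values))                     # evenly spaced exponents: 0, 1, 2, 3

powers_of_two = np.logspace(1, 4, num=4, base=2)  # 2, 4, 8, 16
print(powers_of_two)
```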



48. The ________ argument of the merge function, used when merging two DataFrames,
specifies which keys are to be included in the resulting DataFrame.

A : right
B : on
C : sort
D : how

how
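The effect of `how` in `pandas.merge` (questions 30 and 48) is easiest to see by merging the same two tables in different ways. The student and marks tables below are hypothetical. `on` names the key column, and `how` decides which key values survive.

```python
import pandas as pd

students = pd.DataFrame({"roll_no": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
marks = pd.DataFrame({"roll_no": [2, 3, 4], "marks": [67, 74, 90]})

inner = pd.merge(students, marks, on="roll_no", how="inner")  # keys present in both
left = pd.merge(students, marks, on="roll_no", how="left")    # all keys from students
outer = pd.merge(students, marks, on="roll_no", how="outer")  # union of keys

print(inner["roll_no"].tolist())  # [2, 3]
print(left["roll_no"].tolist())   # [1, 2, 3]; roll_no 1 gets NaN marks
print(outer["roll_no"].tolist())  # [1, 2, 3, 4]
```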


49. Which of the following functions is used to split a figure into an nrows x ncols grid of subplots?

A : plot()
B : draw()
C : bar()
D : subplot()

subplot()


50. The ________ function is used to display an image through an external viewer in

A : display()
B : imread()
C : imshow()
D : show()



51. ________ is an unsupervised algorithm used for frequent itemset mining.

A : Apriori
B : Support Vector Machines
C : Decision trees
D : Cluster analysis

Apriori

52. The ________ is characterized by a bell-shaped curve, and the area under the curve
represents probabilities.

A : Normal Distribution
B : Binomial Distribution
C : Poisson Distribution
D : Probability

Normal Distribution
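"Area under the curve represents probability" can be checked numerically. The normal CDF written with the error function gives the area to the left of a point, so the probability of an interval is a difference of two CDF values. Only the standard library is used; `normal_cdf` is a hypothetical helper name.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution, written with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# Area under the bell curve between two points = probability of landing in that interval
within_1_sd = normal_cdf(1) - normal_cdf(-1)
within_2_sd = normal_cdf(2) - normal_cdf(-2)
print(round(within_1_sd, 4))  # 0.6827
print(round(within_2_sd, 4))  # 0.9545
```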

53. The Apriori algorithm uses breadth-first search and a ________ structure to
count candidate itemsets efficiently.

A : Decision tree
B : Hash tree
C : Red-Black Tree
D : AVL Tree


Hash tree

54. In a data science project, the data acquisition step involves ________.

A : Acquiring data from various sources.
B : Selecting dataset
C : Data preprocessing
D : Data modeling

Acquiring data from various sources.

55. Select the correct statement:

A : Raw data is the original source of data.
B : Preprocessed data is the original source of data.
C : Raw data is the data obtained after processing steps.
D : Analysed data is the original source of data.

Raw data is the original source of data.

56. Which of the following statements will create an Axes at the top-right
corner of the current figure?

A : subplot(2,3,3)
B : subplot(2,3,2)
C : subplot(2,3,4)
D : subplot(2,3,5)

subplot(2,3,3)


57. Catalog design is a complex process in which the items in a business's
catalog are selected to complement each other, so that buying one item leads
to buying another. These items are therefore often complements or closely
related. Which algorithm is suited to this task?

A : Decision tree
B : Association Rule Mining
C : Clustering
D : Support vector machine


Association Rule Mining

58. When plotting with matplotlib.pyplot, a function call equivalent to
subplot(2,3,4) is

A : subplot(234)
B : subplot(243)
C : subplot(324)
D : subplot(4)

subplot(234)
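For questions 56 and 58, the sketch below checks where `subplot()` places an Axes. The index counts from 1, left to right and then top to bottom, and the three-digit form is shorthand for `(nrows, ncols, index)`. It assumes matplotlib 3.2 or later, where `SubplotSpec` exposes the 0-based `rowspan` and `colspan` ranges.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

fig = plt.figure()
top_right = plt.subplot(2, 3, 3)   # row 1, column 3 of a 2 x 3 grid
bottom_left = plt.subplot(234)     # shorthand for subplot(2, 3, 4): row 2, column 1

for ax in (top_right, bottom_left):
    spec = ax.get_subplotspec()
    # 0-based starting row and column of this Axes within the grid
    print(spec.rowspan.start, spec.colspan.start)  # "0 2", then "1 0"
plt.close(fig)
```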


59. The ________ algorithm models a series of logical If-Then-Else decision
statements; there is no underlying assumption of a linear or non-linear
relationship between the input variables and the response variables.

A : Regression
B : Decision Trees
C : Clustering
D : Naïve Bayes

Decision Trees

60. To reach the final point and make a prediction, a decision tree must
be traversed from ________.

A : Top – to – bottom
B : Bottom- to – Top
C : Left- to Right
D : Right – to – Left


Top – to – bottom
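Prediction with a decision tree, as described in questions 36, 59 and 60, starts at the root and applies one If-Then-Else test per level until it reaches a leaf. In the sketch below the tree is built by hand as nested dicts; the attributes, branches and class labels are hypothetical, and no tree-learning step is shown.

```python
# A hand-built decision tree (hypothetical attributes and labels).
# Internal nodes test an attribute; leaves hold the predicted class.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity_high",
                  "branches": {True: {"leaf": "no"}, False: {"leaf": "yes"}}},
        "overcast": {"leaf": "yes"},
        "rainy": {"attribute": "windy",
                  "branches": {True: {"leaf": "no"}, False: {"leaf": "yes"}}},
    },
}


def predict(node, sample):
    """Walk from the root down to a leaf, following the branch that matches each test."""
    while "leaf" not in node:             # each step is one If-Then-Else decision
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node["leaf"]


print(predict(tree, {"outlook": "sunny", "humidity_high": False, "windy": True}))  # yes
print(predict(tree, {"outlook": "rainy", "humidity_high": True, "windy": True}))   # no
```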
