IGTechTeam

Here are the Machine Learning questions most commonly asked in job and internship interviews. Each answer is written in simple language so that it is easy to follow.

1. What is machine learning?

Machine learning (ML) is a subfield of Artificial Intelligence (AI) that enables computers to learn from data without being explicitly programmed for each task. An ML model extracts patterns from raw data and uses them to make predictions or decisions.

2. What are the types of machine learning?

There are mainly three types of machine learning. They are:

Supervised Learning (Linear Regression, Logistic Regression, Decision Tree, Random Forest, Naive Bayes, KNN, SVM)
Unsupervised Learning (K-Means Clustering, Segmentation, Association Mining, Anomaly Detection)
Reinforcement Learning (Decision Process, Reward Process, Recommendation System)

3. What is supervised learning?

Supervised learning trains a model on well-labeled data, i.e. data in which every input comes with its correct output. The labels act as a supervisor (teacher) that guides the learning process.

For example, suppose we have some shapes and our task is to identify whether each one is a circle or a rectangle. As it is supervised learning, we must have labeled data that shows what a circle or a rectangle actually looks like, i.e. what its properties are. Based on this, the model can predict the label of a new shape.

There are different classes of supervised learning. They are:

Classification (Logistic Regression, SVM, KNN, Naive Bayes, Decision Tree)
Regression (Linear Regression)
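
As a minimal sketch of the circle-versus-rectangle example above, the snippet below trains a K-nearest-neighbor classifier on a handful of labeled shapes. The two features (number of corners and width-to-height ratio) and all the sample values are made up for illustration; they are not from a real dataset.

```python
from sklearn.neighbors import KNeighborsClassifier  # pip install scikit-learn

# Hypothetical features for each shape: [number of corners, width / height]
X_train = [[0, 1.0], [0, 1.0], [0, 1.0], [4, 2.0], [4, 1.5], [4, 3.0]]
# Labels provided by the "supervisor"
y_train = ["circle", "circle", "circle", "rectangle", "rectangle", "rectangle"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predict the labels of two shapes the model has not seen before
predictions = model.predict([[0, 1.0], [4, 2.5]])
print(predictions)
```

The model can only name a shape "circle" or "rectangle" because those names were given to it in the training labels.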

4. What is unsupervised learning?

Unsupervised learning uses data without labels. That means there is no supervisor (teacher) to guide the learning process, so the algorithm has to find structure in the data on its own.

For example, suppose we again have a mix of circles and rectangles, but this time without labels, so we have no initial information about which is which. We can still separate the data based on characteristics such as color or shape. Grouping (clustering) the data by shape gives two groups, Group A and Group B: Group A contains the rectangles and Group B contains the circles. The algorithm does not know the names "rectangle" and "circle"; it only discovers that the two groups are different.
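
A minimal sketch of this idea, assuming scikit-learn's KMeans and the same invented features as before (number of corners and width-to-height ratio). Note that no labels are passed to the algorithm.

```python
from sklearn.cluster import KMeans  # pip install scikit-learn

# Hypothetical, unlabeled measurements: [number of corners, width / height]
X = [[0, 1.0], [0, 0.9], [0, 1.1], [4, 2.0], [4, 1.5], [4, 3.0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # a cluster id (0 or 1) for each shape

print(labels)
```

The cluster ids are arbitrary: the algorithm separates the shapes into two groups, but it is up to us to look at each group and decide what it means.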

5. What is Reinforcement learning?

Reinforcement learning is a method in which an agent learns by interacting with an environment. The agent receives positive or negative feedback from the environment: for each good action it gets a positive response (reward), whereas for each bad action it gets a negative response (punishment). Over time, the agent learns to choose the actions that maximize its total reward.
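
The reward-and-punishment loop can be sketched with a very small example. The environment below is hypothetical: it has only two actions, "good" (reward +1) and "bad" (reward -1), and the agent uses a simple epsilon-greedy rule, which is one common choice among many.

```python
import random

def train_agent(episodes=200, epsilon=0.1, learning_rate=0.1, seed=0):
    """Learn the value of each action from rewards (epsilon-greedy sketch)."""
    rng = random.Random(seed)
    rewards = {"good": 1.0, "bad": -1.0}  # hypothetical environment
    values = {"good": 0.0, "bad": 0.0}    # the agent's current estimates

    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(list(values))     # explore: try a random action
        else:
            action = max(values, key=values.get)  # exploit: best action so far
        reward = rewards[action]                  # response from the environment
        # Move the estimate a small step toward the observed reward
        values[action] += learning_rate * (reward - values[action])
    return values

values = train_agent()
print(values)
```

After training, the estimated value of the rewarded action is higher than that of the punished one, so the agent prefers it.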

6. What are Machine Learning algorithms?

A machine learning algorithm is a procedure that extracts useful patterns from data.

Machine Learning algorithms:

Linear Regression:

It predicts a continuous dependent variable based on an independent variable.

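Equation: Y = b0 + b1X, where Y is the dependent variable, X is the independent variable, b0 is the intercept, and b1 is the slope.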
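
To make the equation Y = b0 + b1X concrete, here is a small sketch that computes the intercept b0 and slope b1 with the standard least-squares formulas. The function name fit_line and the four data points are only for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) of the least-squares line Y = b0 + b1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how X and Y vary together, divided by how much X varies
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    b1 = numerator / denominator
    # Intercept: makes the line pass through the point (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Illustrative data that lies exactly on the line Y = 1 + 2X
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```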

Logistic Regression:

It is a classification algorithm that predicts the class of a categorical dependent variable (for example, yes or no) from one or more independent variables.

Decision Tree:

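It is a supervised learning algorithm that splits the data using a tree of if/else decisions. It is mostly used for classification problems, but it can also be used for regression.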

K-nearest neighbor (KNN):

This algorithm addresses both classification and regression problems, but it is mostly used for classification. It predicts the output for a new data point from the K closest points in the training data.

Support Vector Machine (SVM):

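It is also used for classification, regression, and even outlier detection.

For classification, this method looks for a line (in higher dimensions, a hyperplane) that separates the classes.

Among all the separating hyperplanes, it chooses the optimal one: the hyperplane with the maximum margin, i.e. the largest distance to the nearest data points of each class.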
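
A minimal sketch of the maximum-margin idea, assuming scikit-learn's SVC with a linear kernel. The six 2D points and their labels are invented for illustration.

```python
from sklearn.svm import SVC  # pip install scikit-learn

# Two well-separated groups of made-up points
X = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")
clf.fit(X, y)

# The support vectors are the training points closest to the separating
# hyperplane; they are the ones that determine the margin
print(clf.support_vectors_)

new_labels = clf.predict([[2, 2], [5, 4]])
print(new_labels)
```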

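7. Is Logistic Regression a classification or regression problem?

Logistic Regression is used for classification problems. It has "Regression" in its name because, like Linear Regression, it computes a weighted sum of the input variables; it then passes that sum through the logistic (sigmoid) function to get a probability between 0 and 1.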
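
The link to Linear Regression can be sketched in a few lines: the same linear expression b0 + b1X is computed first, and the sigmoid function then turns it into a probability. The coefficients b0 = -4 and b1 = 1 below are hypothetical values chosen for the example, not fitted to any data.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, b0, b1):
    """Probability of class 1: the linear part b0 + b1 * x passed through sigmoid."""
    return sigmoid(b0 + b1 * x)

b0, b1 = -4.0, 1.0  # hypothetical coefficients
for x in (2, 4, 6):
    p = predict_proba(x, b0, b1)
    label = 1 if p >= 0.5 else 0  # classify with a 0.5 threshold
    print(x, round(p, 3), label)
```

The output of the model is a probability, and the class is obtained only after comparing that probability with a threshold.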

8. What is 'Naive' in Naive Bayes?

Naive Bayes is a supervised learning algorithm. It is called "naive" because it assumes that every input variable (feature) is independent of the others, given the class. This assumption rarely holds in the real world, where some input variables do depend on others.

Naive Bayes is based on Bayes' Theorem and follows a probabilistic approach.

We can write it as:

P(A|B) = P(B|A) * P(A) / P(B)

where,

P(A|B) = conditional probability of an event A occurring, given an event B.

P(A) = probability of an event A occurring (the prior).

P(B) = probability of an event B occurring (the evidence).

P(B|A) = conditional probability of an event B occurring, given an event A.
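
As a worked example of Bayes' Theorem, P(A|B) = P(B|A) * P(A) / P(B), take A = "the email is spam" and B = "the email contains the word 'offer'". All the probabilities below are invented numbers for illustration.

```python
def bayes_posterior(prior_a, prob_b_given_a, prob_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return prob_b_given_a * prior_a / prob_b

p_spam = 0.2             # P(A): assumed share of spam emails
p_word_given_spam = 0.5  # P(B|A): assumed share of spam emails with the word
p_word_given_ham = 0.05  # assumed share of non-spam emails with the word

# P(B): total probability of seeing the word in any email
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

posterior = bayes_posterior(p_spam, p_word_given_spam, p_word)
print(round(posterior, 3))
```

With these assumed numbers, seeing the word raises the probability of spam from 0.2 to about 0.71.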

9. What do you mean by bias and variance?

Both are sources of error in an ML model.
Bias is the difference between the model's average prediction and the correct value. High bias means the model is too simple and underfits the data.
A model should have low bias to avoid underfitting.
Variance refers to how much the model changes when it is trained on different portions of the training data set. High variance means the model overfits the data.
In general, reducing bias increases variance, and vice versa.
The balance between the bias error and the variance error is called the bias-variance tradeoff.

10. What do you mean by covariance and correlation?

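Covariance measures how two variables vary together: a positive value means they tend to increase together, and a negative value means one tends to decrease as the other increases. Its magnitude depends on the units of the variables.

Correlation is covariance scaled to the range -1 to 1, so it measures both the direction and the strength of the linear relationship between two random variables.

1: perfect positive linear relationship

-1: perfect negative linear relationship

0: no linear relationship between the two variables (they may still be related in a non-linear way).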
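
Both quantities can be computed with NumPy. The sketch below uses small made-up arrays; the second pair is included to show that a correlation of 0 only rules out a linear relationship, since there Y is fully determined by X.

```python
import numpy as np  # pip install numpy

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])  # y = 2x, a perfect linear relationship

cov_xy = np.cov(x, y)[0, 1]        # sample covariance (depends on the units)
corr_xy = np.corrcoef(x, y)[0, 1]  # correlation, always between -1 and 1
print(cov_xy, corr_xy)

# y depends entirely on x here, yet the linear correlation is 0
x_sym = np.array([-2, -1, 0, 1, 2])
y_sq = x_sym ** 2
corr_sq = np.corrcoef(x_sym, y_sq)[0, 1]
print(corr_sq)
```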

11. What are precision, recall, and F1-score?

Precision is the positive predictive value, i.e. the number of true positives divided by the number of positives the model predicted: Precision = TP / (TP + FP).

Recall is the true positive rate, i.e. the number of true positives divided by the number of actual positives in the data: Recall = TP / (TP + FN).

F1-score is the harmonic mean of Precision and Recall: F1 = 2 * Precision * Recall / (Precision + Recall).
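
A small sketch that computes the three metrics from true and predicted labels (1 = positive, 0 = negative). The function name and the ten example labels are illustrative only.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In this example there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 3 / 4.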

12. What is Type I error and Type II error?

Type I error is a false positive, i.e. claiming that something has occurred when it has not.

Type II error is a false negative, i.e. claiming that nothing has happened when it has.

13. What is ROC?

ROC stands for Receiver Operating Characteristic. The ROC curve is a graph of the true positive rate against the false positive rate at various classification thresholds.
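
The points of a ROC curve can be computed by hand, as in the sketch below. The labels, the predicted scores, and the list of thresholds are made-up values; each threshold gives one (false positive rate, true positive rate) point.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for threshold in thresholds:
        # A sample is predicted positive when its score reaches the threshold
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical model outputs
points = roc_points(y_true, scores, thresholds=[0.9, 0.5, 0.3, 0.0])
print(points)
```

Lowering the threshold moves the point from (0, 0), where nothing is predicted positive, toward (1, 1), where everything is.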

14. What is PCA?

PCA stands for Principal Component Analysis. It is used for dimensionality reduction. PCA is mainly used as a preprocessing step, especially when there are linear relationships between features.
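
A minimal sketch of dimensionality reduction, assuming scikit-learn's PCA. The five samples are invented so that the second feature is roughly twice the first, which means one component can describe almost all of the variation.

```python
import numpy as np                     # pip install numpy
from sklearn.decomposition import PCA  # pip install scikit-learn

# Made-up data: two strongly related features
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]])

pca = PCA(n_components=1)         # keep only the first principal component
X_reduced = pca.fit_transform(X)  # shape goes from (5, 2) to (5, 1)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)  # share of variance kept by the component
```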

15. What is cross-validation?

Cross-validation is a statistical method for estimating how well a machine learning model performs on data it has not seen. Instead of relying on a single train/test split, it splits the data into several parts (folds). In each round, we train the model on a subset of the dataset and evaluate it on the complementary subset, and then we average the results over all rounds.

We use cross-validation to detect overfitting, i.e. a failure to generalize to new data.
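
The splitting step of k-fold cross-validation can be sketched in plain Python. The helper name k_fold_indices is hypothetical; it only produces the index splits, and training and scoring a model on each split is left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra sample
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]                 # held-out part for this round
        train = indices[:start] + indices[stop:]   # everything else
        yield train, test
        start = stop

folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample is used for testing exactly once. In practice, scikit-learn offers KFold and cross_val_score in sklearn.model_selection for the same purpose.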

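16. What distinguishes classification from regression?

Classification predicts a discrete class label (for example, circle or rectangle), whereas regression predicts a continuous numeric value (for example, a salary).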

17. What are overfitting and underfitting?

A model is said to be overfitted when it fits the training data too well. This happens when the model is too complex for the amount of data available, so it begins to learn the noise and wrong information in the training data and then performs poorly on new data. To detect this, we resample the data and estimate the model accuracy using techniques like k-fold cross-validation; to reduce it, we can simplify the model or collect more training data.

A model is said to be underfitted when it is unable to extract the patterns from the data. It happens when the algorithm used is too simple for the problem or when there is too little data to train the model.
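
Both situations can be illustrated by fitting polynomials of different degrees to the same noisy data. This is only a sketch: the data is synthetic (a quadratic curve plus random noise), and the degrees 1, 2, and 8 are arbitrary choices for a too-simple, a suitable, and a too-flexible model.

```python
import numpy as np  # pip install numpy

rng = np.random.default_rng(0)

# Synthetic data: y = x^2 plus a little noise
x_train = np.linspace(-1, 1, 12)
y_train = x_train ** 2 + rng.normal(0, 0.05, x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = x_test ** 2 + rng.normal(0, 0.05, x_test.size)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_errors, test_errors = {}, {}
for degree in (1, 2, 8):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_errors[degree] = mse(coeffs, x_train, y_train)
    test_errors[degree] = mse(coeffs, x_test, y_test)
    print(degree, train_errors[degree], test_errors[degree])
```

The straight line (degree 1) has a large error even on the training data, which is underfitting. The training error keeps shrinking as the degree grows, so a low training error alone does not prove the model is good; comparing it with the test error is what reveals overfitting.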

Machine Learning full course | Types, classes, and algorithms [Conclusion]

Machine learning is a very interesting subject: we know how humans learn, but we may not know how machines do. The topics covered in this course (in the video) are:

What is machine learning?
Machine learning forms/types: Supervised learning, Unsupervised learning, and Reinforcement learning
Classes of supervised learning: Classification and Regression
Machine learning algorithms: Linear Regression, Logistic Regression, Decision Tree, KNN (K-nearest neighbor), and SVM (Support vector machine).
Implementing the theoretical concept of linear regression into the program.

Machine Learning:

ML is a subfield of artificial intelligence that enables computers to learn automatically from data.
ML extracts patterns from raw data to make decisions.

Types of machine learning:

1. Supervised learning

2. Unsupervised learning

3. Reinforcement learning

Classes of Supervised Learning:

Classification
Regression

Machine learning algorithms:

Linear regression: It predicts a continuous dependent variable based on an independent variable.

Logistic Regression: It is a classification algorithm that uses independent variables to predict the class of a categorical dependent variable.

Decision tree: It is a supervised learning algorithm mostly used for classification problems, although it can also be used for regression.

KNN: It is used for both classification and regression problems, but it is mostly used for classification.

SVM: It is also used for both classification and regression problems.

It looks for a line (in higher dimensions, a hyperplane) that separates the classes.

Among all the separating hyperplanes, it chooses the optimal one, i.e. the hyperplane that has the maximum margin.

Linear regression

1: Import Packages and Classes

import numpy as np  #pip install numpy
import matplotlib.pyplot as plt   #pip install matplotlib
from sklearn.linear_model import LinearRegression  #pip install scikit-learn

2: Collect the data

speed = np.array([30,32,36,40,45,48,51,55,62,70])
salary = np.array([1300,1400,1500,1700,2100,2500,2800,2900,3400,3500])

3: Visualize the data

plt.plot(speed, salary, 'o')   # create a scatter plot
b1, b0 = np.polyfit(speed, salary, 1)  # slope (b1) and intercept (b0) of the best-fit line
plt.plot(speed, b0 + b1 * speed)   # add the best-fit line
plt.xlabel("Typing speed")
plt.ylabel("Salary")
plt.show()   # display the figure

4: Reshaping the data

speed = speed.reshape(-1, 1)  # scikit-learn expects a 2D array of shape (n_samples, n_features)
print(speed)

5: Create a model and fit the data

model = LinearRegression()
model.fit(speed, salary)

6: Get the result

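# model.intercept_ is b0; it matches the b0 computed with np.polyfit in step 3
print(model.intercept_)
# model.coef_ holds b1 (the slope) as a one-element array
print(model.coef_)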

7: Prediction

new_speed = np.array([72,75,80]).reshape(-1,1)

# new_salary = b0 + b1*new_speed
new_salary = model.predict(new_speed)

print(new_salary)

Machine Learning most important Interview Questions 2025 | ML Full course