What do we mean by machine learning?
If we split the term into its two words, we get "machine" and "learning". Putting them together, we can say that machine learning is the process of training a machine so that it learns to perform a task automatically.
In more technical terms, machine learning is a subset of artificial intelligence that gives a machine the ability to learn from data and predict outputs, using one of the many available machine learning algorithms.
What are training and testing data sets?
Training data set – This is the data set on which the machine is trained. The first task is to collect the data required to design the model, for example from surveys or online sources. The collected data will rarely be perfect: it may contain junk values, null values and similar problems that can harm the model's predictions. Hence, we have to process this raw data to make it suitable to feed into the machine as the training set. This process of cleaning and preparing the data is called data wrangling.
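As a minimal sketch of the cleaning step, assuming the raw data arrives as a Python list of (number, label) pairs, the snippet below drops rows whose number is missing or not an integer and rows whose label is missing or misspelt. The sample rows and the helper names `is_clean` and `clean` are hypothetical.

```python
# Hypothetical raw data: some rows contain null values (None) or junk entries.
raw_data = [
    (2, "even"), (13, "odd"), (None, "odd"), (25, "odd"),
    (34, None), ("abc", "even"), (37, "odd"), (38, "evn"),
]

VALID_LABELS = {"even", "odd"}


def is_clean(row):
    """Return True if the row has an integer number and a known label."""
    x, y = row
    # bool is a subclass of int in Python, so exclude it explicitly.
    has_valid_number = isinstance(x, int) and not isinstance(x, bool)
    return has_valid_number and y in VALID_LABELS


def clean(rows):
    """Keep only the rows that pass the checks in is_clean."""
    return [row for row in rows if is_clean(row)]


training_data = clean(raw_data)
print(training_data)  # [(2, 'even'), (13, 'odd'), (25, 'odd'), (37, 'odd')]
```

Real data wrangling usually involves more than filtering (fixing formats, filling in missing values, removing duplicates), but the idea is the same: only rows that make sense are fed to the machine.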
Testing data set – Once the model has been trained, we have to test how well it predicts the output before moving it to production. This is done using testing data. The testing data contains samples similar to the training data, but these samples are not fed to the system during the training phase. A common choice is to train the system on 90% of the data and keep the remaining 10% to test the model's performance (80/20 splits are also widely used). This held-out portion, used to measure how well the system performs, is called the testing data set.
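A 90/10 split can be sketched with Python's standard library alone. The function name `split_train_test`, its parameters and the sample data are illustrative assumptions; shuffling first matters so that the test set is not just whatever happened to come last in the collected data.

```python
import random


def split_train_test(samples, test_fraction=0.1, seed=0):
    """Shuffle the samples and hold back test_fraction of them for testing."""
    shuffled = list(samples)
    # A fixed seed makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test_set = shuffled[:n_test]
    train_set = shuffled[n_test:]
    return train_set, test_set


# Hypothetical labelled data: the numbers 1 to 100 with their labels.
data = [(n, "even" if n % 2 == 0 else "odd") for n in range(1, 101)]
train_set, test_set = split_train_test(data)
print(len(train_set), len(test_set))  # 90 10
```

In practice, scikit-learn provides `sklearn.model_selection.train_test_split`, whose `test_size` parameter plays the same role as `test_fraction` here.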
Simple example of machine learning
Let me explain machine learning through a simple example of even and odd numbers. Suppose our task is to identify whether a number is odd or even. From a programming perspective, we can define a condition: if number % 2 is equal to zero, the number is even; otherwise, it is odd. Here we have used explicit programming to do this task.
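The explicit rule can be written directly in Python; the function names are illustrative. Note that the programmer, not the machine, supplies the rule.

```python
def is_even(number):
    """Explicit rule: a number is even if dividing it by 2 leaves no remainder."""
    return number % 2 == 0


def classify(number):
    """Return the label 'even' or 'odd' using the hand-written rule."""
    return "even" if is_even(number) else "odd"


for n in [2, 13, 25, 34]:
    print(n, classify(n))
```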
Now suppose we want to accomplish this task without writing the rule explicitly. We can do this by training the machine with raw data, which we call training data. We provide a set of numbers, each labelled as odd or even, e.g.
(X, Y) => (2, even), (13, odd), (25, odd), (34, even), (37, odd), (38, even), ……
and so on, until we have enough samples. If we feed this data to the machine using a learning algorithm, the machine finds the relationship between X and Y on its own. If we then give it new numbers as input, the machine predicts the output, i.e. whether each number is even or odd.
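To contrast with the hand-written rule, here is a minimal sketch of a machine learning the pattern from labelled examples instead. It rests on one hypothetical design choice: the only feature the learner sees is the last digit of each number. For each digit it counts which label appears most often in the training data and stores that as its learned rule; it is never told about `% 2`. The training pairs beyond the six in the text are illustrative extras, added so that every last digit from 0 to 9 is covered.

```python
from collections import Counter, defaultdict

# Labelled (X, Y) pairs: the first six come from the text above, the rest are
# illustrative extras so that every last digit from 0 to 9 appears at least once.
training_data = [
    (2, "even"), (13, "odd"), (25, "odd"), (34, "even"), (37, "odd"), (38, "even"),
    (10, "even"), (41, "odd"), (56, "even"), (69, "odd"), (70, "even"), (81, "odd"),
    (96, "even"), (99, "odd"),
]


def last_digit(x):
    """Hypothetical feature: the last decimal digit of the number."""
    return int(str(abs(x))[-1])


def train(samples):
    """Learn the most frequent label for each last digit seen in training."""
    counts = defaultdict(Counter)
    for x, y in samples:
        counts[last_digit(x)][y] += 1
    return {digit: labels.most_common(1)[0][0] for digit, labels in counts.items()}


def predict(model, x):
    """Predict a label for x; digits never seen in training give 'unknown'."""
    return model.get(last_digit(x), "unknown")


model = train(training_data)
for number in [44, 1007, 123456]:
    print(number, predict(model, number))  # 44 even, 1007 odd, 123456 even
```

This learner only works because the chosen feature happens to carry the answer; with a poor feature, or with too few samples (for example, no number ending in 6), its predictions would be wrong or "unknown", which is why the quality of the training data matters so much.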
Real-life examples of machine learning
1. Email spam filtering – Email service providers use ML to train a machine to separate spam from legitimate emails, giving users a better experience.
2. Online customer support – Some service providers use ML-based chatbots that engage customers about their queries until the customer gets an answer or a human support executive becomes available.
3. Product recommendations – Online shopping websites observe your purchase history and, based on this data, recommend other products you may be interested in.
4. Types of ads shown – You might have noticed that Google shows you ads based on your searches, and YouTube shows ads based on your search history or the type of video you are watching.
Prediction and accuracy
The ability to predict the output depends heavily on the quality of the data used to train the machine. In real-life scenarios it is difficult to get perfect data, so it is very difficult to predict the output with 100% certainty. However, following data wrangling practices, i.e. removing junk data, keeping only relevant data and handling null values, can help the machine predict the output with high accuracy, in some cases 90% or more.
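Accuracy figures like these are usually measured on the testing data set as the fraction of predictions that match the true labels. A minimal sketch follows; the labels and predictions are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical test set of 10 numbers: the model gets only the last one wrong.
labels = ["even", "odd", "odd", "even", "odd", "even", "odd", "even", "odd", "even"]
predictions = ["even", "odd", "odd", "even", "odd", "even", "odd", "even", "odd", "odd"]
print(f"{accuracy(predictions, labels):.0%}")  # 90%
```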
Conclusion
If the training data is good, machine learning models can predict outputs with very high accuracy, and in some tasks AI systems built on them have even outperformed human experts. As we often see in the news, IT companies are investing huge amounts in their data science projects, which suggests that machine learning will shape the future of many industries.