What is Data Science?
Data is a collection of information, and Data Science is the study of that data. It is a field that uses scientific methods, processes and algorithms to extract insights from data.
Why Data Science?
“Data is the new oil.”
A report by Forbes states that we generate around 2.5 quintillion bytes of data every day, and that around 90 per cent of the world’s data was generated in the last 2 years. With such huge amounts of data available every day, more and more companies are looking to use it to solve business problems, often starting with exploratory analysis to find the right solutions. As a result, the need for people who can do this job is growing day by day. It is estimated that around 11.5 million new data science jobs will be created in the US by 2026.
Applications of Data Science
Data Science is applied in almost every company nowadays, and the range of applications is growing rapidly. Some of them include:
- Healthcare – Medical Image Analysis, Care Recommendations, Virtual Assistance for patients and customer support
- Internet Search – Accurate Google Search Results, Targeted Advertising, Website Recommendations
- Finance – Risk Analysis, Fraud Detection, Customer Identity Verification
These are only a few examples. Data science is also used in route optimization (Google Maps), facial recognition, crime prediction and much more.
Process of Data Science
The data science process involves the following steps:
- Data Extraction
- Data Preprocessing
- Data Analysis
- Data Modelling and Prediction
- Evaluation and Fine Tuning
- Data Visualization
Data Extraction
This is the process of collecting data from various sources. A source can be anything that generates data: IoT sensors, historical records, survey data, internet data and so on. While these are raw sources of data, processed data is also freely available on the internet. Websites like Kaggle and the UCI Machine Learning Repository host huge collections of datasets, and Google has a search engine specifically for datasets.
Data Preprocessing
Raw data is generally unstructured and needs to be processed before it can be analysed to gain insights. Discrepancies such as null values and extreme values can skew our analysis and negatively impact the outcome. All such discrepancies are handled in this step.
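As a rough illustration of this step, here is a minimal Python sketch that drops null values and then filters out extreme values. The sensor readings, the function name `clean` and the choice of the 1.5 × IQR rule for spotting extreme values are all assumptions made for this example. Real projects may instead fill in missing values or use a different outlier rule.

```python
from statistics import quantiles

def clean(values, k=1.5):
    """Drop missing entries, then remove extreme values using the IQR rule."""
    # Step 1: discard null values (represented here as None).
    present = [v for v in values if v is not None]
    # Step 2: compute the first and third quartiles of what is left.
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Step 3: keep only the values inside the accepted range.
    return [v for v in present if low <= v <= high]

# Hypothetical temperature readings with two nulls and one faulty spike.
readings = [21.0, 22.5, None, 23.1, 21.8, 250.0, 22.0, None, 22.7]
cleaned = clean(readings)
print(cleaned)
```

Dropping rows is the simplest option. Whether it is the right one depends on how much data is missing and why.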
Data Analysis
This is a major step in the process. Here we use various algorithms and methods to bring out the hidden patterns in our dataset. Analysis can generally be classified into four types:
- Descriptive Analysis – what happened?
- Diagnostic Analysis – why did it happen?
- Predictive Analysis – what is likely to happen?
- Prescriptive Analysis – what should we do to make something happen?
The type of analysis you should do depends on the business problem you want to solve.
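Descriptive analysis, the first of the four types, can be sketched with nothing more than summary statistics. The regional sales figures and the helper name `describe` below are made up for illustration.

```python
from statistics import mean, median

def describe(values):
    """Summarise what happened: size, centre and range of the data."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical weekly sales per region.
sales = {
    "north": [120, 135, 150, 110],
    "south": [90, 95, 400, 100],
}

summary = {region: describe(values) for region, values in sales.items()}
for region, stats in summary.items():
    print(region, stats)
```

In this made-up data, the mean for the south region sits far above its median. That kind of gap is what a diagnostic analysis would then try to explain.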
Data Modelling and Prediction
Although analysis gives us useful insights into the problem, it is often not enough on its own. It is also important to predict what is likely to happen and to act accordingly. Based on the analysis, the major factors influencing the results are selected, and various algorithms are tried to find the model that predicts the outcome most accurately. These algorithms include Linear Regression, SVM, Naive Bayes, Decision Trees and others. The right algorithm depends on the type of data and the results we want to get.
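To give a feel for what fitting a model means, here is a minimal sketch of simple linear regression written from scratch using ordinary least squares. The data points and the function names `fit_line` and `predict` are assumptions for this example. In practice a library implementation would normally be used.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Use the fitted line to predict the outcome for a new input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

model = fit_line(xs, ys)
forecast = predict(model, 6)
print(model, forecast)
```

Real data is noisy, so the fitted line will not pass through every point. That is why the evaluation step matters.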
Evaluation and Fine Tuning
After modelling, it is equally important to assess how well our model is performing. Several metrics can be used for this, and the right ones depend on the dataset. Accuracy is the most widely used metric, but it cannot be trusted on every dataset. In fraud detection, for example, the data is highly skewed: fraudulent cases are rare, so a model can score a high accuracy while still missing most of them. In such cases, other metrics like False Positive Rate and False Negative Rate are considered for evaluation.
The model generally goes through a number of cycles of evaluation and fine-tuning before we arrive at the final model.
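The problem with accuracy on skewed data can be shown with a small sketch. The labels below are invented for illustration (10 fraud cases among 1,000 transactions), the "model" simply predicts that nothing is fraud, and the function name `evaluate` is our own.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, false positive rate and false negative rate.

    Labels: 1 = fraud (positive), 0 = legitimate (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of legitimate cases wrongly flagged as fraud.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of fraud cases the model failed to catch.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }

# Hypothetical skewed dataset: 10 fraud cases, 990 legitimate ones.
y_true = [1] * 10 + [0] * 990
# A useless model that labels every transaction as legitimate.
y_pred = [0] * 1000

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Here the accuracy is 0.99 even though the false negative rate is 1.0, meaning every fraud case was missed.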
Data Visualization
Visualization is the graphical representation of information and data. It is how we present our results using visual elements like charts, maps and graphs. Adding an image or a graph to our results makes them much easier for the audience to understand. Effective data visualization is an essential part of data analysis and makes the analysis complete.
Although we list it as the last step, visualization is generally used at every stage of the process, depending on the need.