50 machine learning questions & answers for Beginners
# Standard library
import csv
import os
import string
import warnings
from random import randint

# Data handling
import numpy
import numpy as np
import pandas as pd

# Plotting
import matplotlib
import matplotlib.animation as animation
import matplotlib.pylab as pylab
from matplotlib.figure import Figure
import seaborn as sns
import plotly.figure_factory as ff
import plotly.graph_objs as go
import plotly.offline as py
from plotly import tools

# Interactive widgets (Jupyter)
from ipywidgets import interact
1- How to import your data?
What is in your data folder?
print(os.listdir("../input/"))
Import all of your data:
titanic_train=pd.read_csv('../input/train.csv')
titanic_test=pd.read_csv('../input/test.csv')
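Or import just the first rows of your data (nrows=1000 reads at most the first 1,000 rows; it is a row count, not a percentage, so for the 891-row Titanic training set it reads everything):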
titanic_train2=pd.read_csv('../input/train.csv',nrows=1000)
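nrows always takes rows from the top of the file. If you really want a random ~10% of the rows, one option is to pass a callable to skiprows that skips each data row with probability 0.9. The sketch below assumes a small in-memory CSV (a made-up stand-in for ../input/train.csv) so it runs anywhere; csv_text, first_rows and sample_10pct are illustrative names.

```python
import io
import random

import pandas as pd

# Hypothetical in-memory CSV standing in for ../input/train.csv (1,000 data rows)
csv_text = "PassengerId,Age\n" + "\n".join(f"{i},{20 + i % 50}" for i in range(1, 1001))

# Read only the first 100 data rows (the header line is always kept)
first_rows = pd.read_csv(io.StringIO(csv_text), nrows=100)

# Keep roughly 10% of the data rows at random.
# Row index 0 is the header, so it is never skipped.
random.seed(0)
sample_10pct = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i > 0 and random.random() > 0.1,
)

print(first_rows.shape)    # (100, 2)
print(len(sample_10pct))   # roughly 100; the exact number depends on the seed
```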
How to see the size of your data:
print("Train: rows:{} columns:{}".format(titanic_train.shape[0], titanic_train.shape[1]))
2- How to check for missing data?
titanic_train.isna().sum()
Or you can use the code below, which also reports the fraction of missing values in each column:
total = titanic_train.isnull().sum().sort_values(ascending=False)
percent = (titanic_train.isnull().sum()/titanic_train.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_data.head(20)
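The same missing-value table can be checked on a tiny made-up DataFrame, where the expected counts are easy to verify by hand. Here isnull().mean() is used as a shorthand for the sum()/count() ratio in the code above (count() on the boolean mask is just the number of rows). Note that the "Percent" column holds fractions between 0 and 1; multiply by 100 for a true percentage.

```python
import numpy as np
import pandas as pd

# Small made-up frame with some missing values (not the Titanic data)
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.1],
})

# Number of missing values per column, largest first
total = df.isnull().sum().sort_values(ascending=False)
# Fraction (0 to 1) of missing values per column, same as sum()/count() on the mask
percent = df.isnull().mean().sort_values(ascending=False)
missing_data = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
print(missing_data)  # Cabin: 3, 0.75 / Age: 2, 0.5 / Fare: 0, 0.0
```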
3- How to view the statistical characteristics of the data?
titanic_train.describe()
Or just for one column:
titanic_train['Age'].describe()
Or, equivalently, with attribute access:
titanic_train.Age.describe()
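describe() returns different statistics depending on the column type: numeric columns get count, mean, std, min, quartiles and max, while text columns get count, unique, top and freq. A small sketch on made-up data shows both; age_stats and sex_stats are illustrative names.

```python
import pandas as pd

# Made-up data, not the Titanic set
df = pd.DataFrame({
    "Age": [22, 38, 26, 35],
    "Sex": ["male", "female", "female", "male"],
})

age_stats = df["Age"].describe()   # numeric: count, mean, std, min, 25%, 50%, 75%, max
sex_stats = df["Sex"].describe()   # text: count, unique, top, freq
print(age_stats["mean"])           # 30.25
print(sex_stats["unique"])         # 2

# include="all" summarizes numeric and text columns in one table
print(df.describe(include="all"))
```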
4- How to check the column names?
titanic_train.columns
You can also see the column names by viewing the first rows:
titanic_train.head()
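df.columns is a pandas Index object; converting it to a plain list, testing membership, or looking at df.dtypes is often handier. A sketch on a made-up frame with Titanic-style column names:

```python
import pandas as pd

# Made-up frame with a few Titanic-style column names
df = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1], "Name": ["Braund", "Cumings"]})

column_names = df.columns.tolist()   # plain Python list of column labels
print(column_names)                  # ['PassengerId', 'Survived', 'Name']
print("Survived" in df.columns)      # True: membership test on the Index
print(df.dtypes)                     # data type of each column
```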
5- How to view random rows of your dataset?
titanic_train.sample(5)
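sample() returns different rows on every call. If you need the same "random" rows each time (for example, to reproduce a result), pass random_state. A sketch on made-up data; first and second are illustrative names.

```python
import pandas as pd

# Made-up data: 20 rows
df = pd.DataFrame({"PassengerId": range(1, 21), "Age": range(20, 40)})

# Fixing random_state makes the sample reproducible
first = df.sample(5, random_state=42)
second = df.sample(5, random_state=42)
print(first.equals(second))  # True
```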
6- How to select a random fraction of rows in a pandas DataFrame?
titanic_train.sample(frac=0.007)
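frac is a proportion of the rows rather than a count, so frac=0.007 on the 891-row Titanic training set gives about six rows (0.007 × 891 ≈ 6.2). A common related trick is frac=1, which returns every row in shuffled order. The sketch below uses a made-up 1,000-row frame so the sizes are easy to check.

```python
import pandas as pd

# Made-up frame with 1,000 rows
df = pd.DataFrame({"PassengerId": range(1, 1001)})

# 10% of 1,000 rows -> 100 rows
tenth = df.sample(frac=0.1, random_state=0)
print(len(tenth))  # 100

# frac=1 returns all rows in random order; reset_index renumbers them 0..n-1
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
print(len(shuffled))  # 1000
```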
7- How to copy a column and drop it?
PassengerId=titanic_train['PassengerId'].copy()
PassengerId.head()
type(PassengerId)
# axis=1 drops a column (axis=0 would drop a row); pandas 2.0+ requires the keyword
titanic_train = titanic_train.drop('PassengerId', axis=1)
titanic_train.head()
# Reload the original data so PassengerId is available again in the next examples
titanic_train = pd.read_csv('../input/train.csv')
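Two related options: drop(columns=...) names the column directly and returns a new frame, while pop() removes the column from the frame in place and returns it, so copying and dropping happen in one step. A sketch on made-up data; passenger_id, without_id and popped are illustrative names.

```python
import pandas as pd

# Made-up data
df = pd.DataFrame({"PassengerId": [1, 2, 3], "Age": [22, 38, 26]})

# copy() gives an independent Series, so later changes to df do not affect it
passenger_id = df["PassengerId"].copy()

# drop(columns=...) returns a new frame; df itself still has the column
without_id = df.drop(columns="PassengerId")

# pop() removes the column from df in place and returns it as a Series
popped = df.pop("PassengerId")

print(list(without_id.columns))  # ['Age']
print(list(df.columns))          # ['Age']
print(popped.tolist())           # [1, 2, 3]
```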
8- How to view the last 5 rows of the dataset?
Use the tail() method:
titanic_train.tail()
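tail() takes an optional number of rows (default 5), and tail(n) returns the same rows as positional slicing with iloc[-n:]. A quick check on made-up data:

```python
import pandas as pd

# Made-up data: 10 rows
df = pd.DataFrame({"PassengerId": range(1, 11)})

last_three = df.tail(3)
print(last_three["PassengerId"].tolist())  # [8, 9, 10]
print(last_three.equals(df.iloc[-3:]))     # True
```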
9- How to concatenate DataFrames along an axis?
# Label-based slicing keeps the columns from Pclass through Embarked in both sets
all_data = pd.concat((titanic_train.loc[:, 'Pclass':'Embarked'],
                      titanic_test.loc[:, 'Pclass':'Embarked']))
all_data.head()
titanic_train.shape
titanic_test.shape
all_data.shape
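By default pd.concat keeps the original row labels, so the combined frame has repeated index values (train and test both start at 0). Passing ignore_index=True renumbers the rows, and the training part can be recovered by position. The sketch uses tiny made-up stand-ins for the train and test sets; pieces, n_train, train_part and test_part are illustrative names.

```python
import pandas as pd

# Made-up stand-ins for the train and test sets; only train has Survived
train = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1],
                      "Pclass": [3, 1], "Embarked": ["S", "C"]})
test = pd.DataFrame({"PassengerId": [3], "Pclass": [2], "Embarked": ["Q"]})

pieces = (train.loc[:, "Pclass":"Embarked"], test.loc[:, "Pclass":"Embarked"])

all_data = pd.concat(pieces)
print(all_data.index.tolist())        # [0, 1, 0]: row labels repeat

all_data_reset = pd.concat(pieces, ignore_index=True)
print(all_data_reset.index.tolist())  # [0, 1, 2]

# Split back into the training and test parts by position
n_train = len(train)
train_part = all_data_reset.iloc[:n_train]
test_part = all_data_reset.iloc[n_train:]
```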
10- How to see the unique values of a column?
titanic_train['Sex'].unique()
titanic_train['Cabin'].unique()
titanic_train['Pclass'].unique()
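unique() lists the distinct values (including NaN, in order of first appearance). Two related methods are nunique(), which counts them and excludes NaN unless dropna=False, and value_counts(), which shows how often each value occurs. A sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Made-up data, not the Titanic set
df = pd.DataFrame({
    "Sex": ["male", "female", "male", "male"],
    "Cabin": [np.nan, "C85", np.nan, "E46"],
})

sex_values = df["Sex"].unique()                        # in order of first appearance
n_cabins = df["Cabin"].nunique()                       # 2: NaN excluded by default
n_cabins_with_nan = df["Cabin"].nunique(dropna=False)  # 3: NaN counted as a value
sex_counts = df["Sex"].value_counts()                  # occurrences of each value
print(sex_values, n_cabins, n_cabins_with_nan)
print(sex_counts)
```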
11- How to query (filter) your dataset?
titanic_train[titanic_train['Age']>70]
titanic_train[titanic_train['Pclass']==1]
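Conditions can be combined with & (and) and | (or), with each condition in its own parentheses; DataFrame.query() expresses the same filter as a string, and isin() keeps rows whose value is in a list. A sketch on made-up data; old_first_class, same_with_query and upper_classes are illustrative names.

```python
import pandas as pd

# Made-up data, not the Titanic set
df = pd.DataFrame({
    "Age": [22, 71, 80, 45],
    "Pclass": [3, 1, 1, 2],
})

# Passengers older than 70 AND in first class
old_first_class = df[(df["Age"] > 70) & (df["Pclass"] == 1)]

# The same filter written as a query string
same_with_query = df.query("Age > 70 and Pclass == 1")

# Rows whose Pclass is 1 or 2
upper_classes = df[df["Pclass"].isin([1, 2])]
```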