Mark Keith
  • 569 videos
  • 2,980,084 views
Python Data Science: Automating Cleaning: Dropping Rows Versus Columns when Addressing Missing Data
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
www.ishelp.info/data/listings.csv
Views: 595

Videos

Python Data Science: Automating Cleaning: Addressing Skewness with Math Transformations
251 views, 4 months ago
Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction www.ishelp.info/data/listings.csv
Python Data Science: Automating Cleaning: Managing Outliers All-Features-At-Once DBSCAN clustering
435 views, 4 months ago
Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction www.ishelp.info/data/listings.csv
Python Data Science: Automating Cleaning: Managing Outliers One-Feature-At-A-Time
119 views, 4 months ago
Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction www.ishelp.info/data/listings.csv
Python Data Science: Automating Cleaning: Bin Low Count Categorical Groups
88 views, 4 months ago
Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction www.ishelp.info/data/listings.csv
Python Data Science: Automating Cleaning: Parsing Dates
721 views, 4 months ago
Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction www.ishelp.info/data/listings.csv
Python Data Science: Automating Cleaning: Data Wrangling: Empty, Single Value, Primary Key columns
482 views, 4 months ago
Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction www.ishelp.info/data/listings.csv
Python Data Science: Automating EDA: Bivariate Relationships: Part 5: Add your functions to .py
136 views, 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Bivariate Relationships: Part 5: Add your functions to .py file to import as a package Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Python Data Science: Automating EDA: Bivariate Relationships: Part 4: Including group comparisons
182 views, 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Bivariate Relationships: Part 4: Including group comparisons Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Python Data Science: Automating EDA: Bivariate Relationships: Part 3: Choosing the best correlation
162 views, 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Bivariate Relationships: Part 3: Choosing the best correlation for N2N relationships (Pearson, Kendall, Spearman) Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Python Data Science: Automating EDA: Bivariate Relationships: Part 2: Visualizations
220 views, 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Bivariate Relationships: Part 2: Visualizations Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Python Data Science: Automating EDA: Bivariate Relationships: Part 1: Basic statistics summary
501 views, 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Bivariate Relationships: Part 1: Stats Datasets: www.kaggle.com/datasets/teertha/ushealthinsurancedataset www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Python Data Science: Automating EDA: Univariate Statistics and Visualizations
675 views, 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Univariate Statistics and Visualizations Health insurance data: www.kaggle.com/datasets/willianoliveiragibin/healthcare-insurance NBA salary data: www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season Airline satisfaction: www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction 0:00 - Introduction and concept...
API with headers in Python: Polygon.io stock market data
696 views, 10 months ago
Tableau: Forecasting
2.9K views, 1 year ago
Tableau: Building Dashboards
1.5K views, 1 year ago
Tableau: Advanced Filtering Top and Bottom 10
4.1K views, 1 year ago
Tableau: Line and Area Charts (data over time)
3K views, 1 year ago
Tableau: Geography map
1.9K views, 1 year ago
Tableau: Heatmaps, 2 dimensions
1.4K views, 1 year ago
Tableau: Scatterplots, 2 measures
1.7K views, 1 year ago
Tableau: Bar charts, 1 dimension and 1 measure
2.2K views, 1 year ago
Tableau: Workspace Area
2.2K views, 1 year ago
Tableau: Connecting to Excel Data
3.9K views, 1 year ago
Flowcharting loops, loop counters, kill switches (updated)
6K views, 2 years ago
Data Retrieval: Scrape data from auto-scrolling websites
1.1K views, 2 years ago
Python: Introduction to Google Colab
7K views, 2 years ago
Python: MLR/OLS assumptions normality multicollinearity VIF
2.8K views, 2 years ago
Python: MLR, OLS, standardization, normalization
4K views, 2 years ago
Python: MLR with categorical values; dummy codes
8K views, 2 years ago

Comments

  • @sebs178 (3 days ago)

    awesome vid

  • @Farrukhw (4 days ago)

    Mark, you should start using a `for` loop instead of writing each line with `print()`. For example, try this code:
    ---
    for col in df.columns:
        print(f"{col}={df[col].nunique()}")
    ---
    and you will get the same output in just two lines.

    • @MarkKeith (3 days ago)

      Definitely 👍 I use these videos in a book for students who are coding for the first time. At this point, I haven’t gotten to loops yet. But then, a bit later, I use your exact code to make the point that automation saves them a lot of time.
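
A minimal sketch of the contrast discussed in this thread. The small DataFrame and its column names are made up for illustration and are not the video's data; only `DataFrame.columns` and `Series.nunique()` come from the comment itself.

```python
import pandas as pd

# Hypothetical toy data; the columns are placeholders, not the video's dataset.
df = pd.DataFrame({
    "age": [19, 33, 33, 45],
    "sex": ["female", "male", "male", "female"],
    "region": ["southwest", "southeast", "northwest", "southwest"],
})

# One print() per column: fine for a first lesson, tedious for wide data.
print(f"age={df['age'].nunique()}")
print(f"sex={df['sex'].nunique()}")
print(f"region={df['region'].nunique()}")

# The loop from the comment prints the same lines for any number of columns.
for col in df.columns:
    print(f"{col}={df[col].nunique()}")

# Keeping the counts in a dict makes them reusable in later steps.
unique_counts = {col: df[col].nunique() for col in df.columns}
```

The loop version does not change when columns are added or renamed, which is the time-saving point made in the reply.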

  • @muhamadsatriaputra5905 (6 days ago)

    thank you very much

  • @safaa3618 (11 days ago)

    I appreciate your breakdown of even the slightest things in the code, even tho I already know how those things function. It puts a smile on my face. I know someone else would appreciate that, especially beginners in Python but not in EDA.

  • @anupamvashishtha1970 (11 days ago)

    Hello Sir, I have a question: in the case of a high skew value, why do we just take the log for right-skewed data, and vice versa for the other case of skew? Will this method help, rather than just adding the extra column to the data frame?

    • @MarkKeith (11 days ago)

      Probably just for simplicity. Really, we should use whatever transformation gets the skewness score closest to zero. Sometimes that’s a log transformation. Sometimes it’s an exponent.
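
The idea in this reply, picking whichever transformation brings the skewness score closest to zero, could be sketched roughly as below. The sample values, the candidate list, and the hand-written `skewness` helper are assumptions made for the example (the video may compute skewness differently, for instance with pandas). The sketch covers only positive, right-skewed data; a left-skewed feature would need candidates such as squaring or another exponent.

```python
import math

def skewness(values):
    """Population skewness: the third standardized moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed, strictly positive sample.
charges = [1.2, 1.5, 1.9, 2.4, 3.1, 4.8, 7.5, 12.0, 21.0, 48.0]

# Candidate transformations for positive, right-skewed data.
candidates = {
    "none": lambda v: v,
    "sqrt": math.sqrt,
    "cube root": lambda v: v ** (1 / 3),
    "log": math.log,
}

# Score every candidate, then keep the one whose skewness is nearest zero.
scores = {name: skewness([f(v) for v in charges]) for name, f in candidates.items()}
best = min(scores, key=lambda name: abs(scores[name]))

for name, score in scores.items():
    print(f"{name:>9}: skew = {score:+.3f}")
print("closest to zero:", best)
```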

  • @anupamvashishtha1970 (12 days ago)

    Hey Brother i really need that ..... I have one request please sum up these video as the part of a playlist

  • @nikahmadfakri9466 (12 days ago)

    Hi, do you have a guide for creating this database in Access?

  • @sandeep003-oz3hw (21 days ago)

    Please add the notebook so that it will save people who are in need and don't have enough time to complete project

  • @MehdiRoozbahan (24 days ago)

    Thank you Mark for these wonderful videos. Will this playlist continue? If the answer is yes, how long will it take to complete and how many videos will there be?

    • @MarkKeith (24 days ago)

      Thank you, I often start out with big plans but then get busy teaching during the semester and run out of time. I’ll be continuing this series though at some point and finish through modeling and deployment of some kind

  • @razchaurasia630 (26 days ago)

    its nice man keep it up

  • @GabrielTobing (1 month ago)

    tysm <3

  • @sajawalhassan1f12 (2 months ago)

    Why do you use ANOVA and the r coefficient to calculate association of education and bike purchasing? Isn't ANOVA only used for categorical to continuous relationships and the r coefficient used for continuous to continuous relationships? Wouldn't the chi-square test be a much better fit?

    • @MarkKeith (1 month ago)

      Ordinal data are one of those tricky situations when it comes to bivariate relationships. You’re right that ANOVA/r are not ideal. But Chi-square isn’t perfect either since there is clearly an order to education which gets ignored by chi-square. In this case, it’s just an example of how to calculate ANOVA

    • @sajawalhassan1f12 (1 month ago)

      @MarkKeith Ahh! Thanks for the help!
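
To illustrate the point in this thread that chi-square ignores the order of education levels, here is a small sketch with made-up counts (not the video's bike-purchase data). The helper functions are hand-written for the example. Scrambling the order of the education groups leaves the chi-square statistic unchanged, while a correlation computed on numeric education codes changes when the codes no longer follow the real order.

```python
import math

# Hypothetical counts of (bought a bike, did not buy) per education level.
table = {
    "High school": (10, 30),
    "Bachelors": (20, 20),
    "Graduate": (30, 10),
}

def chi_square(rows):
    """Pearson chi-square statistic for a list of (yes, no) count pairs."""
    total = sum(yes + no for yes, no in rows)
    col_totals = (sum(yes for yes, _ in rows), sum(no for _, no in rows))
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat

def pearson_r(codes, rows):
    """Pearson r between a numeric education code and a 0/1 purchase flag."""
    xs, ys = [], []
    for code, (yes, no) in zip(codes, rows):
        xs += [code] * (yes + no)
        ys += [1] * yes + [0] * no
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rows = list(table.values())
shuffled = [rows[1], rows[0], rows[2]]  # same groups, order scrambled

chi_ordered = chi_square(rows)
chi_shuffled = chi_square(shuffled)
r_ordered = pearson_r([0, 1, 2], rows)    # codes follow the education order
r_scrambled = pearson_r([1, 0, 2], rows)  # same groups, codes out of order

print(f"chi-square, ordered rows:  {chi_ordered:.2f}")
print(f"chi-square, shuffled rows: {chi_shuffled:.2f}")
print(f"r, codes in order:         {r_ordered:.3f}")
print(f"r, codes scrambled:        {r_scrambled:.3f}")
```

This does not settle which test is right for ordinal data; it only shows why neither option is a perfect fit.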

  • @lukewoods9950 (2 months ago)

    This video series has been super helpful!!! would recommend

  • @matlholelosaba4977 (2 months ago)

    The Burrito exercise is something.

  • @TheVersionController (3 months ago)

    Wow your YT is a treasure trove I wish I had found sooner. But glad I did now

  • @revellbrice (3 months ago)

    Mark, how can we auto upload data via python?

  • @QuynhMata (3 months ago)

    Is there a way without publishing to Tableau Public? Private data, I want to embed it into a web page that requires login to access.

  • @archikasrivastava5843 (3 months ago)

    I really loved your videos and my understanding of Python has also increased. Thank you ☺

  • @b_flieg7579 (3 months ago)

    Hi Mark, thanks for the video and the playlist. I don't know if you mention it after 18:25, but it is a lot easier to use a for loop instead of copying and pasting the code into each cell. For example, what I did is: for i in df.columns: print(f'{i}: {df[i].dtype}')

  • @user-lk2zb6rw5d (3 months ago)

    Hi, when i'm running ordinal regression as the way u did, it has an error 'Memory exhausted'. Also do you know how to run xgboost on Azure ML? I can run XBG on gg colab but dont know how to apply into Azure. Pls help me i need to finish this to graduate (I learn IB and know nothing about coding sos)

  • @sergiysergiy8875 (3 months ago)

    1970.1.1 - first date in Linux

    • @MarkKeith (3 months ago)

      Thank you! That should be a date I memorize
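
The date mentioned in this thread is the Unix epoch, the moment that timestamp 0 refers to. A short sketch using only Python's standard library shows where it comes from:

```python
from datetime import datetime, timezone

# Timestamp 0 is the Unix epoch: midnight UTC on 1 January 1970.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())

# A later timestamp is the number of seconds elapsed since that moment.
one_day_later = datetime.fromtimestamp(86_400, tz=timezone.utc)
print(one_day_later.date())
```

Seeing 1970-01-01 in parsed data is often a hint that a zero or missing value was interpreted as a timestamp.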

  • @RanjitKumar-od1uj (3 months ago)

    Thx very much, I was not understanding the li tag and tomorrow I had an exam; I learnt it and scored 49/50

  • @thanh-trucle3585 (4 months ago)

    Thank you so much! You made it so easy to calculate VIF.

  • @tomrhee1 (4 months ago)

    What you are doing here is so nice. However, I wonder if you could also show us how to run a rolling regression while holding the window size constant in time series regressions. Obviously, the expected outputs are all regression coefficients (with intercept with and without), t-values, R^2, STYEX, etc. I don't know why. But not many people seem interested in showing this Rolling Regression algorithm. I hope you will take the time and help us. BTW, your presentation is so understandable. Thanks in advance.

  • @The-narrow-gate (4 months ago)

    How do you get suggestive code? If I type df. It doesn't suggest the next thing like yours does.

    • @MarkKeith (4 months ago)

      I’m paying $10 a month for the Colab pro version.

    • @MarkKeith (4 months ago)

      Sorry, I just realized that this is an older video before the Colab AI. Sometimes I get the suggestive code and sometimes I don’t. I think there is a shortcut to make it pop up though.

    • @The-narrow-gate (4 months ago)

      @MarkKeith yeah I figured out it was from colab. I was using jupyter notebooks. Thanks for making the tutorial.

    • @blackmagic9921 (2 months ago)

      @The-narrow-gate you can have suggestions in jupyter too. Just google it

  • @The-narrow-gate (4 months ago)

    36:39 shade is deprecated; use fill=True instead

  • @randomshits2742 (4 months ago)

    Hey I have some blocker how to implement code to save custom view using tableau JS API

  • @blue75blazer (4 months ago)

    time for an update! This does not line up with MyEducator anymore.

  • @ikayikay6643 (4 months ago)

    How do you deal with negative numbers, with the natural log?

    • @MarkKeith (4 months ago)

      Shift all values by subtracting the min value and adding 1 (in other words, add the absolute value of a negative min, plus 1) to get everything equal to 1 and above. Then you can take the natural log. But when you make predictions, you’ll have to reverse the natural log and then subtract that same shift to get the prediction in the original scale
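
One way to read the recipe in this reply is as a shift of `1 - min`, so that the smallest value becomes exactly 1 before the natural log is taken. The sketch below uses made-up values and is an illustration of that reading, not code from the video.

```python
import math

values = [-5.0, -1.0, 0.0, 3.0, 20.0]  # hypothetical feature with negatives

# Shift so the smallest value becomes exactly 1, then take the natural log.
shift = 1 - min(values)  # here: 1 - (-5) = 6
logged = [math.log(v + shift) for v in values]

# To return to the original scale, undo the log and then undo the shift.
restored = [math.exp(y) - shift for y in logged]

print("shift:", shift)
print("logged:", [round(y, 3) for y in logged])
print("restored:", [round(r, 6) for r in restored])
```

The shift has to be stored alongside the model, because predictions made on the log scale need the same constant to be converted back.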

  • @gitasuputra8371 (4 months ago)

    love to learn this video, would you like to share its google colab?

    • @MarkKeith (4 months ago)

      Here you go: colab.research.google.com/drive/1B4duJRln-Gy_dajTvxDKB52J7hU42cOn?usp=sharing

  • @brandonwarfield5611 (4 months ago)

    I bought your course Intro to Data Analytics with python. Worth every penny!

    • @MarkKeith (4 months ago)

      Thank you sincerely 👍

    • @leocasey8629 (4 months ago)

      Where can I buy it

  • @nikhilrajendrashah4076 (4 months ago)

    I was doing a course on Udemy, had a doubt about standardization, and found your video in which you've used the exact same dataset!

  • @armanmosikyan1508 (4 months ago)

    thank this video has been helpful for me

  • @adrianwariero (4 months ago)

    I love your videos man, you're changing my life😊

  • @BoeSuwanan (4 months ago)

    Thanksss!!

  • @johnmlekwa1932 (4 months ago)

    Been waiting for your content for a long time, finally here we go💥

  • @kenrenji1 (4 months ago)

    Awesome content, thank you for this! What's the website that you use?

    • @MarkKeith (4 months ago)

      Thank you! Google Colab; it’s free with your gmail account

  • @majdfahadal-thopiti8369 (4 months ago)

    Thank you bro it is so much helpful but I have question I do not understand the univariant in this data set we have numerical and categorical it is not seems ! thank you again from Saudi Arabia

    • @MarkKeith (4 months ago)

      Hey, thanks for the comment and question! Can you tell me a bit more about what you’re asking? Are you wondering about what to look for between numeric versus categorical features?

  • @phemmi1705 (5 months ago)

    I think I missed the chi-square video - cat/cat feature selection

  • @phemmi1705 (5 months ago)

    Does this work the same way with a numerical feature and a categorical label?

    • @MarkKeith (5 months ago)

      Yep, doesn’t matter if the relationship is C2N or N2C

  • @Nizzy001 (5 months ago)

    i need subtitle in pt br :(

  • @russellpilling8749 (5 months ago)

    Will the natural log always result in the closest to 0 skew? That's super cool!

    • @MarkKeith (5 months ago)

      Good question; no, sometimes it will over-correct and a square root or cube root will be better. Whatever gets you closest to zero is best

    • @russellpilling8749 (5 months ago)

      @MarkKeith Good to know, thank you!

  • @barulli87 (5 months ago)

    great content! can you also talk about how to interpret these results? what can we do with all the concepts you discussed

    • @MarkKeith (5 months ago)

      That’s a good question, but the answer is a bit long for the comments. Basically, those results give you an idea about the cleanliness and distributions of the features. For example, if the skewness is too high, then you know that you’ll either need to transform the feature or choose a modeling algorithm that doesn’t depend on linear assumptions. If categorical features have a large number of unique values, then you’ll need to check to make sure that every value is adequately represented or you’ll need to do some grouping. I talk about a lot of these issues in later videos when I get to the modeling phase. Thanks!
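
As a rough illustration of the categorical check described in this reply, the sketch below counts how often each value occurs and groups rarely seen values together. The data, the 5% cutoff, and the "Other" label are assumptions chosen for the example, not rules from the videos.

```python
from collections import Counter

# Hypothetical categorical feature; the values are placeholders.
region = ["west"] * 40 + ["east"] * 35 + ["north"] * 22 + ["south"] * 2 + ["islands"] * 1

min_share = 0.05  # assumed cutoff: categories under 5% of rows count as sparse

counts = Counter(region)
total = len(region)
sparse = {value for value, n in counts.items() if n / total < min_share}

# Collapse sparse categories into a single "Other" group.
binned = [value if value not in sparse else "Other" for value in region]

print("unique values:", len(counts))
print("sparse values:", sorted(sparse))
print("after binning:", Counter(binned))
```

A suitable cutoff depends on the dataset size and on the model, so the 5% used here is only a starting point.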

  • @its_alveera_naaz (6 months ago)

    Thanks a lot

  • @ApPillon (6 months ago)

    thanks my dude

  • @ReadyF0RHeady (6 months ago)

    Best Tutorials Out there in the whole world wide Web

  • @ReadyF0RHeady (6 months ago)

    perfetto

  • @cthockeymom21 (6 months ago)

    You saved the day!! No developers were around but I found you and learned how to utilize the aggregate count function. I am sure there is more to this but this helped me get my report ready for review. Thanks so much!

  • @user-rg1mi6yl1d (6 months ago)

    wow! awesome er diagram. what's the name of the app you use to draw the er diagram?

    • @MarkKeith (6 months ago)

      Lucidchart.com