- Videos: 569
- Views: 2,980,084
Mark Keith
USA
Joined Nov 29, 2009
This channel contains all of the video tutorials I make for the classes I teach and any other students interested in the topics. Feel free to request particular examples and topics. Currently it contains examples of data analytics, Azure Machine Learning Studio, Tableau, SQL, database diagramming, ERD, VBA, HTML, CSS, Excel, statistics, and more.
You can see my academic research publications on Google Scholar here: scholar.google.com/citations?user=oo9iLzcAAAAJ&hl=en
Python Data Science: Automating Cleaning: Dropping Rows Versus Columns when Addressing Missing Data
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
www.ishelp.info/data/listings.csv
Views: 595
Videos
Python Data Science: Automating Cleaning: Addressing Skewness with Math Transformations
251 views · 4 months ago
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
www.ishelp.info/data/listings.csv
Python Data Science: Automating Cleaning: Managing Outliers All-Features-At-Once DBSCAN clustering
435 views · 4 months ago
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
www.ishelp.info/data/listings.csv
Python Data Science: Automating Cleaning: Managing Outliers One-Feature-At-A-Time
119 views · 4 months ago
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
www.ishelp.info/data/listings.csv
Python Data Science: Automating Cleaning: Bin Low Count Categorical Groups
88 views · 4 months ago
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
www.ishelp.info/data/listings.csv
Python Data Science: Automating Cleaning: Parsing Dates
721 views · 4 months ago
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
www.ishelp.info/data/listings.csv
Python Data Science: Automating Cleaning: Data Wrangling: Empty, Single Value, Primary Key columns
482 views · 4 months ago
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
www.ishelp.info/data/listings.csv
Python Data Science: Automating EDA: Bivariate Relationships: Part 5: Add your functions to .py
136 views · 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Bivariate Relationships: Part 5: Add your functions to .py file to import as a package
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Python Data Science: Automating EDA: Bivariate Relationships: Part 4: Including group comparisons
182 views · 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Bivariate Relationships: Part 4: Including group comparisons
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Python Data Science: Automating EDA: Bivariate Relationships: Part 3: Choosing the best correlation
162 views · 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Bivariate Relationships: Part 3: Choosing the best correlation for N2N relationships (Pearson, Kendall, Spearman)
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Python Data Science: Automating EDA: Bivariate Relationships: Part 2: Visualizations
220 views · 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Bivariate Relationships: Part 2: Visualizations
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Python Data Science: Automating EDA: Bivariate Relationships: Part 1: Basic statistics summary
501 views · 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Bivariate Relationships: Part 1: Stats
Datasets:
www.kaggle.com/datasets/teertha/ushealthinsurancedataset
www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Python Data Science: Automating EDA: Univariate Statistics and Visualizations
675 views · 4 months ago
Python Data Science: Automating Exploratory Data Analysis: Univariate Statistics and Visualizations
Health insurance data: www.kaggle.com/datasets/willianoliveiragibin/healthcare-insurance
NBA salary data: www.kaggle.com/datasets/jamiewelsh2/nba-player-salaries-2022-23-season
Airline satisfaction: www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
0:00 - Introduction and concept...
API with headers in Python: Polygon.io stock market data
696 views · 10 months ago
Tableau: Advanced Filtering Top and Bottom 10
4.1K views · 1 year ago
Tableau: Line and Area Charts (data over time)
3K views · 1 year ago
Tableau: Bar charts, 1 dimension and 1 measure
2.2K views · 1 year ago
Flowcharting loops, loop counters, kill switches (updated)
6K views · 2 years ago
Data Retrieval: Scrape data from auto-scrolling websites
1.1K views · 2 years ago
Python: Introduction to Google Colab
7K views · 2 years ago
Python: MLR/OLS assumptions normality multicollinearity VIF
2.8K views · 2 years ago
Python: MLR, OLS, standardization, normalization
4K views · 2 years ago
Python: MLR with categorical values; dummy codes
8K views · 2 years ago
awesome vid
Mark, you should start using a `for` loop instead of writing each line with `print()`. For example, try this code: `for col in df.columns: print(f"{col}={df[col].nunique()}")` and you will get the same output in just two lines.
Definitely 👍 I use these videos in a book for students who are coding for the first time. At this point, I haven’t gotten to loops yet. But then, a bit later, I use your exact code to make the point that automation saves them a lot of time.
thank you very much
I appreciate your breakdown of even the slightest things in the code, even tho I already know how those things function. It puts a smile on my face. I know someone else would appreciate that, especially beginners in Python but not in EDA.
Hello Sir, I have a question: in the case of a high skew value, why do we just take the log for right-skewed data, and the reverse for the other kind of skew? Will this method help, rather than just adding an extra column to the data frame?
Probably just for simplicity. Really, we should use whatever transformation gets the skewness score closest to zero. Sometimes that’s a log transformation. Sometimes it’s an exponent.
Hey brother, I really needed that. I have one request: please group these videos into a playlist.
Hi, do you have a guide for creating this database in Access?
Please add the notebook so that it will save people who are in need and don't have enough time to complete project
Thank you Mark for these wonderful videos. Will this playlist continue? If the answer is yes, how long will it take to complete and how many videos will there be?
Thank you, I often start out with big plans but then get busy teaching during the semester and run out of time. I’ll be continuing this series though at some point and finish through modeling and deployment of some kind
its nice man keep it up
tysm <3
Why do you use ANOVA and the r coefficient to calculate association of education and bike purchasing? Isn't ANOVA only used for categorical to continuous relationships and the r coefficient used for continuous to continuous relationships? Wouldn't the chi-square test be a much better fit?
Ordinal data are one of those tricky situations when it comes to bivariate relationships. You’re right that ANOVA/r are not ideal. But Chi-square isn’t perfect either since there is clearly an order to education which gets ignored by chi-square. In this case, it’s just an example of how to calculate ANOVA
@@MarkKeith Ahh! Thanks for the help!
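The point in the reply above, that chi-square ignores the order of education levels, can be checked with a small sketch. This is not code from the video: the `chi_square` helper and the counts are hypothetical, made up only to show that shuffling the rows of a contingency table leaves the statistic unchanged.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are education levels in order
# (high school, bachelor's, graduate); columns are (no purchase, purchase).
ordered = [[40, 10], [30, 20], [15, 35]]
# The same rows with the education levels shuffled out of order.
shuffled = [ordered[2], ordered[0], ordered[1]]

# Both calls print the same value: the statistic never sees the ordering.
print(round(chi_square(ordered), 3), round(chi_square(shuffled), 3))
```

Because the statistic only sums over cells, any reordering of the categories gives the same result, which is why it cannot pick up a trend across ordered levels.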
This video series has been super helpful!!! would recommend
The Burrito exercise is something.
Wow your YT is a treasure trove I wish I had found sooner. But glad I did now
Mark, how can we auto upload data via python?
Is there a way without publishing to Tableau Public? Private data, I want to embed it into a web page that requires login to access.
I really loved your videos and my understanding for Python has also increased. Thankyou ☺
Hi Mark, thanks for the video and the playlist. I don't know if you mention it after 18:25, but it is a lot easier to use a for loop instead of copying and pasting the code into each cell. What I did, for example: `for i in df.columns: print(f'{i}: {df[i].dtype}')`
Hi, when I'm running ordinal regression the way you did, I get a 'Memory exhausted' error. Also, do you know how to run XGBoost on Azure ML? I can run XGB on Google Colab but don't know how to apply it in Azure. Please help me, I need to finish this to graduate (I study IB and know nothing about coding, SOS)
1970.1.1 - first date in Linux
Thank you! That should be a date I memorize
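As a quick check of the date mentioned above, here is a minimal sketch using only Python's standard library. It shows that timestamp 0 corresponds to January 1, 1970 (UTC), the Unix epoch that Linux and most date libraries count from.

```python
from datetime import datetime, timezone

# Timestamp 0 is the Unix epoch: midnight UTC on January 1, 1970.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())

# Round trip: converting the epoch back to a timestamp gives 0.0.
print(epoch.timestamp())
```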
Thx very much, I was not understanding the li tag and I had an exam the next day. I learnt it and scored 49/50.
Thank you so much! You made it so easy to calculate VIF.
What you are doing here is so nice. However, I wonder if you could also show us how to run a rolling regression while holding the window size constant in time series regressions. Obviously, the expected outputs are all regression coefficients (with intercept with and without), t-values, R^2, STYEX, etc. I don't know why. But not many people seem interested in showing this Rolling Regression algorithm. I hope you will take the time and help us. BTW, your presentation is so understandable. Thanks in advance.
How do you get suggestive code? If I type df. It doesn't suggest the next thing like yours does.
I’m paying $10 a month for the Colab pro version.
Sorry, I just realized that this is an older video before the Colab AI. Sometimes I get the suggestive code and sometimes I don’t. I think there is a shortcut to make it pop up though.
@@MarkKeith yeah I figured out it was from colab. I was using jupyter notebooks. Thanks for making the tutorial.
@@The-narrow-gate you can have suggestion in jupyter too. Just google it
36:39 `shade` is deprecated. Use `fill=True` instead.
Hey I have some blocker how to implement code to save custom view using tableau JS API
time for an update! This does not line up with MyEducator anymore.
How do you deal with negative numbers, with the natural log?
Add the absolute value of the minimum plus 1 to all values so that everything is 1 or above. Then you can take the natural log. But when you make predictions, you’ll have to reverse the natural log (exponentiate) and then subtract that same amount to get the prediction back in the original scale.
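The shift-then-log idea in the reply above can be sketched as follows. This is an illustration under assumptions rather than the code from the video: the helper names and the sample values are hypothetical, and the shift is computed as `1 - min(values)` so that the smallest value lands exactly on 1 before the log is taken.

```python
import math

def shifted_log(values):
    """Shift values so the minimum becomes 1, then take the natural log."""
    shift = 1 - min(values)  # e.g. a minimum of -5 gives a shift of 6
    return [math.log(v + shift) for v in values], shift

def inverse_shifted_log(logged, shift):
    """Undo the transformation: exponentiate, then remove the shift."""
    return [math.exp(y) - shift for y in logged]

values = [-5.0, -1.0, 0.0, 3.0, 20.0]  # hypothetical feature with negatives
logged, shift = shifted_log(values)
restored = inverse_shifted_log(logged, shift)
print(shift, restored)
```

The shift has to be stored alongside the model, since the same number is needed later to bring predictions back to the original scale.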
love to learn this video, would you like to share its google colab?
Here you go: colab.research.google.com/drive/1B4duJRln-Gy_dajTvxDKB52J7hU42cOn?usp=sharing
I bought your course Intro to Data Analytics with python. Worth every penny!
Thank you sincerely 👍
Where can I buy it
I was doing a course on Udemy, had a doubt about standardization, and found your video in which you've used the exact same dataset!
thank this video has been helpful for me
I love your videos man, your changing my life😊
Thanksss!!
Been waiting for your content for a long time, finally here we go💥
Awesome content, thank you for this! What's the website that you use?
Thank you! Google Colab; it’s free with your gmail account
Thank you bro it is so much helpful but I have question I do not understand the univariant in this data set we have numerical and categorical it is not seems ! thank you again from Saudi Arabia
Hey, thanks for the comment and question! Can you tell me a bit more about what you’re asking? Are you wondering about what to look for between numeric versus categorical features?
I think I missed the chi-square video - cat/cat feature selection
Does this work the same way with a numerical feature and a categorical label?
Yep, doesn’t matter if the relationship is C2N or N2C
i need subtitle in pt br :(
Will the natural log always result in the closest to 0 skew? That's super cool!
Good question; no, sometimes it will over-correct and a square root or cubed root will be better. Whatever gets you closest to zero is best
@@MarkKeith Good to know, thank you!
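A minimal sketch of the "whatever gets you closest to zero" rule from the reply above, not the code from the video. It assumes a strictly positive feature and uses the simple population skewness formula (pandas' `skew()` applies a bias correction, so its numbers differ slightly). The `values` list, the candidate set, and the function names are made up for illustration.

```python
import math

def skewness(values):
    """Population (Fisher-Pearson) skewness: the mean cubed z-score."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n

# Candidate transformations for a right-skewed, strictly positive feature.
transforms = {
    "none": lambda v: v,
    "sqrt": math.sqrt,
    "cube_root": lambda v: v ** (1 / 3),
    "log": math.log,
}

def best_transform(values):
    """Return the candidate name with skewness nearest zero, plus all scores."""
    scores = {name: skewness([f(v) for v in values])
              for name, f in transforms.items()}
    best = min(scores, key=lambda name: abs(scores[name]))
    return best, scores

values = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 40, 120]  # hypothetical data
best, scores = best_transform(values)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("closest to zero:", best)
```

Printing every score, rather than only the winner, makes it easy to see when the log over-corrects and a milder root does better.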
great content! can you also talk about how to interpret these results? what can we do with all the concepts you discussed
That’s a good question, but the answer is a bit long for the comments. Basically, those results give you an idea about the cleanliness and distributions of the features. For example, if the skewness is too high, then you know that you’ll either need to transform the feature or choose a modeling algorithm that doesn’t depend on linear assumptions. If categorical features have a large number of unique values, then you’ll need to check to make sure that every value is adequately represented or you’ll need to do some grouping. I talk about a lot of these issues in later videos when I get to the modeling phase. Thanks!
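The grouping step mentioned in the reply above, for categorical features with many rarely seen values, might look like the sketch below. It is a plain-Python illustration under assumptions: the `bin_rare_categories` name, the `min_count` threshold, the `"Other"` label, and the sample data are all hypothetical.

```python
from collections import Counter

def bin_rare_categories(values, min_count=2, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Hypothetical categorical feature with two rarely seen values.
colors = ["red", "red", "blue", "teal", "red", "blue", "mauve"]
binned = bin_rare_categories(colors, min_count=2)
print(binned)
```

The right threshold depends on the dataset size; the value 2 here is only small enough to make the toy example readable.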
Thanks a lot
thanks my dude
Best Tutorials Out there in the whole world wide Web
perfetto
You saved the day!! No developers were around but I found you and learned how to utilize the aggregate count function. I am sure there is more to this but this helped me get my report ready for review. Thanks so much!
wow! awesome er diagram. what's the name of the app you use to draw the er diagram?
Lucidchart.com