Item-based collaborative filtering is a recommendation method that uses the similarity between items, computed from the ratings users have given them. In this article, I explain the basic concept and practice building an item-based collaborative filter in Python.
The fundamental assumption for this method is that a user gives similar ratings to similar movies. Consider the following table for movie ratings.
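The idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the article's implementation: the ratings, user names, and movie names are made up, and plain cosine similarity over co-rated users is used for simplicity (other similarity measures, such as adjusted cosine, are common alternatives).

```python
from math import sqrt

# Hypothetical ratings (user -> {movie: rating}); a missing movie means "not rated".
ratings = {
    "user_a": {"Movie 1": 5, "Movie 2": 4, "Movie 3": 1},
    "user_b": {"Movie 1": 4, "Movie 2": 5, "Movie 3": 2},
    "user_c": {"Movie 1": 1, "Movie 2": 2, "Movie 3": 5},
    "user_d": {"Movie 1": 5, "Movie 3": 1},  # has not rated Movie 2
}


def item_vector(item):
    """Return {user: rating} for every user who rated the item."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine_similarity(item_x, item_y):
    """Cosine similarity between two items over the users who rated both."""
    vx, vy = item_vector(item_x), item_vector(item_y)
    common = set(vx) & set(vy)
    if not common:
        return 0.0
    dot = sum(vx[u] * vy[u] for u in common)
    norm_x = sqrt(sum(vx[u] ** 2 for u in common))
    norm_y = sqrt(sum(vy[u] ** 2 for u in common))
    return dot / (norm_x * norm_y)


def predict_rating(user, target_item):
    """Similarity-weighted average of the user's ratings on other items."""
    numerator = denominator = 0.0
    for item, rating in ratings[user].items():
        if item == target_item:
            continue
        sim = cosine_similarity(target_item, item)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None


sim_12 = cosine_similarity("Movie 1", "Movie 2")
sim_23 = cosine_similarity("Movie 2", "Movie 3")
predicted = predict_rating("user_d", "Movie 2")
print(f"sim(Movie 1, Movie 2) = {sim_12:.3f}")
print(f"sim(Movie 2, Movie 3) = {sim_23:.3f}")
print(f"predicted rating of Movie 2 for user_d = {predicted:.2f}")
```

Because Movie 2 is rated much like Movie 1 in this toy data, the prediction for user_d leans toward that user's rating of Movie 1 rather than Movie 3.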
A content-based recommender is a system that relies on the similarity of items when recommending them to users. For example, when a user likes a movie, the system finds and recommends the movies whose features are most similar to those of that movie. (Feature 1)
Part 2: Creating a database and tables in MySQL, and Importing files into tables
A Relational Database Management System (RDBMS) is a program that allows us to create, update, and manage a relational database. Structured Query Language (SQL) is a programming language used to communicate with data stored in the RDBMS. SQL skills for working with an RDBMS are required for many data-related positions these days. On forums such as Quora or Reddit, many people look for a public database on which to practice their SQL querying skills. However, although there are many public data sets in…
Part 1: Creating an Entity Relational Diagram (ERD)
A Relational Database Management System (RDBMS) is a program that allows us to create, update, and manage a relational database. Structured Query Language (SQL) is a programming language used to communicate with data stored in the RDBMS. SQL skills for working with an RDBMS are required for many data-related positions these days. On forums such as Quora or Reddit, many people look for a public database on which to practice their SQL querying skills. However, although there are many public data sets in a single spreadsheet, there are not…
Analysis and prediction for the housing market prices using Cross Validation and Grid Search in several regression models
In this article, I analyze the factors related to housing prices in Melbourne and predict those prices using several machine learning techniques: Linear Regression, Ridge Regression, K-Nearest Neighbors (hereafter, KNN), and Decision Tree. Using Cross Validation and Grid Search, I find the optimal hyperparameter values for each model and compare the results to find the best model for predicting housing prices in Melbourne.
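To make the combination of grid search and cross validation concrete, here is a from-scratch sketch in plain Python. It is an illustration under assumptions, not the article's code: the data are synthetic (not the Melbourne data set), the model is a one-feature KNN regressor, and the candidate values in `param_grid` are arbitrary. In practice a library such as scikit-learn provides this workflow through `GridSearchCV`.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_val_mse(data, k, n_folds=5):
    """Mean squared error of k-NN, averaged over n_folds validation folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    fold_errors = []
    for i, valid in enumerate(folds):
        # Train on every fold except the one held out for validation.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in valid]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds


# Synthetic data (assumption): y = 2x + Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 * x + rng.gauss(0, 1.0)))

# Grid search: score every candidate k by cross validation, keep the best.
param_grid = [1, 3, 5, 10, 20]
scores = {k: cross_val_mse(data, k) for k in param_grid}
best_k = min(scores, key=scores.get)

for k, mse in scores.items():
    print(f"k={k:>2}  cross-validated MSE={mse:.3f}")
print("best k:", best_k)
```

The same loop applies to the other models by swapping the predictor and the grid, for example the regularization strength for Ridge Regression or the maximum depth for a Decision Tree.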
Balance between Concentration and Efficiency
It is well known that countries become urbanized as their economies grow. Here, I explore how the world has become urbanized over time, and how urbanization differs across income levels.
import numpy as np
import pandas as pd
import geopandas as gpd
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable

# read the 'Indicators' file (raw string so the backslash is not treated as an escape)
data = pd.read_csv(r'...\Indicators.csv')
The data source for this analysis is the ‘World Development Indicators’ data set on Kaggle. Of the several files it contains, ‘Indicators.csv’ was used.
This is an analysis of what determines the length of time between a movie's trailer release date and its actual release date.
Let’s define the Promotion Period as the period between the official movie trailer release date and the movie release date. The picture above visually shows what the promotion period used in this analysis means. The next picture displays the distribution of this promotion period for the movies released in the US between 2017 and 2019.
Scrapy is a Python tool for scraping web pages. This article covers how to scrape web pages with Scrapy. The website scraped here is Box Office Mojo. In particular, I extract information about revenues, distributors, genres, etc.
The goal of this project is to check all the movies released in the US during certain periods of time and extract useful information about the individual movies. If you look up any movie on Box Office Mojo, you can see all the related information about it in the following format:
Business Analyst at Samsung Electronics America. PhD in Economics from University of California, Davis.