Porn for learning, or How I practiced with BeautifulSoup, Pandas and NLTK.

Intro.

Hi!
My name is Eugene and I’m an analyst at BDCenter Digital.
In this article I will talk about how I practiced with Python libraries for data analysis.
I did this in my spare time and enjoyed it immensely, because I chose a porn site as the place to practice, of course!

Challenge. What will I do and why do I need a porn site for this?

I watch porn, there’s no shame in it. But I only watch free content; I’ve never paid for porn. I wonder whether paid porn differs from free porn, and how much a paid video costs on average. For example, I go to Pornhub and see a video that costs $10. Is that a lot or a little? Another one costs $5. Is that a lot or a little? Here are the questions I want to answer:

  1. Is there a correlation between the video prices and the number of subscribers that models have?
  2. What are the most popular words in video titles?
  3. What are the most popular words in the models’ nicknames?

Part 1. ModelHub Scraping

I chose a site with paid content and a convenient catalog for this work. Ta-daaam: ModelHub!

[Figure: ModelHub categories]
[Figure: number of videos in the Anal category]

  1. Video title;
  2. Model name;
  3. Model rank;
  4. Number of Model subscribers.

  1. Number of video views;
  2. Video duration;
  3. Publish date.

  1. Link to Model profile.
<ul class="videos listing">
<li class="videoBox img fade">
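
Here is a minimal sketch of how BeautifulSoup can walk a listing like this. Only the `<ul class="videos listing">` and `<li class="videoBox img fade">` tags come from the page. The inner markup (`a.title`, `span.price`), the sample HTML and the `parse_listing` helper are my assumptions for illustration, so the real selectors will probably differ.

```python
from bs4 import BeautifulSoup

# Made-up markup: only the <ul>/<li> classes come from the article;
# the inner tags and their class names are assumptions.
SAMPLE_HTML = """
<ul class="videos listing">
  <li class="videoBox img fade">
    <a class="title" href="/video/1">First title</a>
    <span class="price">$10</span>
  </li>
  <li class="videoBox img fade">
    <a class="title" href="/video/2">Second title</a>
    <span class="price">$5</span>
  </li>
</ul>
"""


def parse_listing(html):
    """Return one dict per video card found in the listing (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # "ul.videos.listing" matches a <ul> that carries both classes.
    for card in soup.select("ul.videos.listing li.videoBox"):
        title = card.select_one("a.title")
        price = card.select_one("span.price")
        records.append({
            "title": title.get_text(strip=True) if title else None,
            "link": title.get("href") if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return records


videos = parse_listing(SAMPLE_HTML)
```

Checking for `None` before reading a tag keeps the loop alive when a card is missing a field, which happens a lot on real pages.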

Here is the code for this part in full:

modelhub_dataframe.to_csv('modelhub_dataframe.csv')
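
To show what the save step does, here is a small sketch that builds a dataframe from a list of dicts and round-trips it through CSV. The records and column names are invented, and I write to an in-memory buffer instead of a file so the example leaves nothing on disk. Note the `index=False`: without it, `to_csv` also writes the row index as an extra unnamed first column.

```python
import io

import pandas as pd

# Hypothetical records, invented for illustration.
records = [
    {"title": "First title", "model_name": "model_a", "price": 10.0},
    {"title": "Second title", "model_name": "model_b", "price": 5.0},
]

modelhub_dataframe = pd.DataFrame(records)

# An in-memory buffer stands in for the file here; pass a path such as
# 'modelhub_dataframe.csv' to write a real file instead.
buffer = io.StringIO()
modelhub_dataframe.to_csv(buffer, index=False)

# Read the CSV text back to confirm nothing was lost.
buffer.seek(0)
restored = pd.read_csv(buffer)
```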

Part 2. Data analysis and plotting.

Now we have a dataframe with almost 60 thousand entries.
First, we import the necessary libraries and open the CSV.

data_raw.describe()
data.model_name.value_counts()[:20]
data.publish_date.min()
'2015-06-04'
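
Here is a rough sketch of these calls on a toy dataframe, plus one way to check the first question (price vs. subscribers) with a Pearson correlation. The rows are invented, and the `price` and `subscribers` column names are my assumptions; only `model_name` and `publish_date` appear in the calls above.

```python
import pandas as pd

# Toy data, invented for illustration only.
data = pd.DataFrame({
    "model_name": ["model_a", "model_a", "model_b", "model_c"],
    "price": [10.0, 5.0, 8.0, 12.0],
    "subscribers": [1000, 1000, 400, 2500],
    "publish_date": ["2019-01-15", "2018-07-01", "2019-03-20", "2017-11-05"],
})

# Count, mean, std, min, quartiles and max for the numeric columns.
summary = data.describe()

# Models with the most videos; head(20) keeps the top 20, like [:20].
top_models = data.model_name.value_counts().head(20)

# ISO-formatted date strings sort chronologically, so min() is the earliest.
first_date = data.publish_date.min()

# Pearson correlation between price and subscriber count, in [-1, 1].
price_vs_subs = data["price"].corr(data["subscribers"])
```

A value near zero would suggest no linear relationship. It says nothing about cause either way.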

Part 3. Text analysis and Word clouds.

To find out which words are most popular in video titles and model names, we’ll use word clouds.
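
A word cloud is just word frequencies drawn big and small, so the counting step can be sketched with NLTK alone. The titles, the tiny stop-word set and the `word_frequencies` helper below are invented for illustration. I use `RegexpTokenizer` because it needs no extra NLTK data downloads, and a real run would likely use a fuller stop-word list.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# Invented titles, standing in for the scraped ones.
titles = [
    "Morning in the kitchen",
    "Kitchen fun with my friend",
    "Fun in the shower",
    "Kitchen again",
]

# Tiny hand-made stop-word set (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "and", "in", "with", "my"}

# Keep runs of lowercase letters only; punctuation and digits are dropped.
tokenizer = RegexpTokenizer(r"[a-z]+")


def word_frequencies(texts):
    """Count non-stop-word tokens across all texts (hypothetical helper)."""
    tokens = []
    for text in texts:
        words = tokenizer.tokenize(text.lower())
        tokens.extend(w for w in words if w not in STOP_WORDS)
    return FreqDist(tokens)


freq = word_frequencies(titles)
top_words = freq.most_common(3)
```

The resulting frequencies are what a word-cloud library would take as input to size each word.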

Data Analyst