Visualizing TV Shows Added in 2021

Overview

This is a quick data visualization project that consolidates four streaming-service data sets from Shivam Bansal’s Kaggle repo. The streaming services included are Amazon Prime, Disney Plus, Hulu, and Netflix. All data sets are current as of Dec 12, 2021.

I implement the project using the following tools and steps:

  1. Jupyter Notebook, Python – with the CSV files downloaded, I clean and combine the various data sets
  2. Google Drive (Google Sheets) – upload the combined data set for storage and later retrieval
  3. Tableau (Public) – use the built-in Google Sheets connector and visualize the data using a dashboard

Results

Jupyter Notebook

I use Pandas to load the CSV files into dataframes and combine them. The initial result includes listings for both movies and TV shows, so I then filter out the movies. I also add cohort columns (such as release_decade) to the final output, anticipating the categories needed in the visualization. The file can be downloaded using the link below.
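
As a rough illustration of the steps described above (not the notebook’s actual code), the sketch below stacks per-service dataframes, keeps only TV shows, and derives a release_decade column. The column names type and release_year and the value 'TV Show' are assumed to match the Kaggle CSVs; the service column, the function name combine_tv_shows, and the tiny stand-in frames are hypothetical. With the real files, each frame would come from pd.read_csv.

```python
import pandas as pd


def combine_tv_shows(frames):
    """Stack per-service frames, keep TV shows, and add a release_decade cohort.

    `frames` maps a service name to its dataframe (hypothetical structure).
    """
    # Tag each frame with its service so the source survives the concat.
    tagged = [df.assign(service=name) for name, df in frames.items()]
    combined = pd.concat(tagged, ignore_index=True)

    # Drop movies; the 'TV Show' label is an assumption about the data.
    tv = combined[combined["type"] == "TV Show"].copy()

    # Integer-divide by 10 to bucket years into decades (e.g. 1998 -> 1990).
    tv["release_decade"] = (tv["release_year"] // 10) * 10
    return tv.reset_index(drop=True)


# Stand-in data; real frames would be loaded with pd.read_csv(...).
netflix = pd.DataFrame(
    {"title": ["A", "B"], "type": ["Movie", "TV Show"], "release_year": [2019, 2021]}
)
hulu = pd.DataFrame({"title": ["C"], "type": ["TV Show"], "release_year": [1998]})

tv_shows = combine_tv_shows({"Netflix": netflix, "Hulu": hulu})
print(tv_shows[["title", "service", "release_decade"]])
```

Keeping the service name as a column makes it straightforward to split or color the dashboard by platform later on.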

Tableau

This is my foray into a more ‘fluid’ layout, making strong use of floating objects (vs. tiled), and opting out of the default tabular headers (and creating my own labels using icons and other graphic cues).

(The live dashboard can be found here.)

Completed Course: Statistics for Data Science and Business Analysis

Course Details:

Concepts Covered:

  • Understand the fundamentals of statistics
  • Learn how to work with different types of data
  • Plot different types of data
  • Calculate the measures of central tendency, asymmetry, and variability
  • Calculate correlation and covariance
  • Distinguish and work with different types of distributions
  • Estimate confidence intervals
  • Perform hypothesis testing
  • Make data-driven decisions
  • Understand the mechanics of regression analysis
  • Carry out regression analysis
  • Use and understand dummy variables
  • Understand the concepts needed for data science even with Python and R!

Randomizing Basho’s Verse

To live poetry
is better than to write it.
— Basho

Overview

I’m not necessarily following Basho’s advice, but I figured if I were to stray, I’d make the detour pythonic. So I made a haiku generator called randomBasho, which uses a simple randomizer to derive ‘new’ haikus from over a hundred Basho haikus. My goals are as follows:

  • to put a fresh spin on something centuries old; to generate poems that still retain the same contemplative energy and poetic tone as their source, but unearth new interpretations or meanings behind Basho’s lines
  • from a technical standpoint, to reorient myself with basic Python concepts, such as iterators, functions, and data sets
  • from a creative standpoint, to use this as a starting point to experiment and explore poetic possibilities in code

Initial Code

I wrote my first few attempts at randomizing simply to get reacquainted with Python. Having learned some of these basic concepts a few years ago (in Python 2.x), I wanted to check my comfort level with Python’s building blocks and with the Python 3.x updates.

Step 1. Randomization

To focus on this task, I initially narrowed the technical scope by creating three lists of distinct placeholder items. The lists are line1, line2 and line3, which respectively contain (and correspond to) the haikus’ first lines, second lines, and third lines.

By making the poem number and line number explicit in the item names in all of the lists below, I was able to test whether the randomization actually worked. My expected end result was a Frankenstein haiku, with lines from different poems.

import random

line1 = [
	'poem1_line1',
	'poem2_line1',
	'poem3_line1',
	'poem4_line1',
	'poem5_line1'
]

line2 = [
	'poem1_line2',
	'poem2_line2',
	'poem3_line2',
	'poem4_line2',
	'poem5_line2'
]

line3 = [
	'poem1_line3',
	'poem2_line3',
	'poem3_line3',
	'poem4_line3',
	'poem5_line3'
]

I used a for loop, since the task of printing a haiku line (after grabbing an item) needed to be repeated for each list. Because random.randint(0,4) includes both endpoints, it covers all five items in each list.

def randombasho1(x,y,z):
	r = [x,y,z]
	j = 0
	for i in r:
		print(r[j][random.randint(0,4)])
		j = j + 1

randombasho1(line1,line2,line3)

Next, I dropped the j = 0 and j = j + 1 and looped over range(0, 3) instead, since all my haikus have three lines.

def randombasho2(x,y,z):
	r = [x,y,z]
	for i in range(0,3):
		print(r[i][random.randint(0,4)])

randombasho2(line1,line2,line3)

Finally, I removed the r = [x, y, z] and used *args, so the function accepts any number of lists and can be repurposed for non-haiku use cases.

def randombasho3(*args):
	for i in range(0,len(args)):
		print(args[i][random.randint(0,4)])

randombasho3(line1,line2,line3)
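
The three versions above all call random.randint(0,4), which assumes every list has exactly five items. Here is a minimal sketch of one way around that, assuming non-empty lists of any length: random.choice picks an item directly, so no index range is needed. The function name random_lines and the sample lists are hypothetical, and the function returns the lines instead of printing them so that the result is easy to check.

```python
import random


def random_lines(*line_lists):
    """Return one randomly chosen item from each list, in the order given."""
    # random.choice works on any non-empty sequence, whatever its length.
    return [random.choice(lines) for lines in line_lists]


# Lists of deliberately different lengths (placeholder data).
firsts = ['poem1_line1', 'poem2_line1', 'poem3_line1']
seconds = ['poem1_line2', 'poem2_line2']
thirds = ['poem1_line3', 'poem2_line3', 'poem3_line3', 'poem4_line3']

for line in random_lines(firsts, seconds, thirds):
    print(line)
```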

Additional Code

After I figured out the randomization code, the next milestone for me was to automatically generate lists line1, line2 and line3.

I decoupled the data (in this case, the haikus) from the code itself so that I could reuse the randomization code in another application (for instance, with another haiku poet or another poetic form).

The .py file holding the haiku data has one list with three items: set1, set2, and set3. Each item is a long string that contains multiple full haikus. The excerpt below is from set1.

# Some of the poems in the rbasho02_haikus list
The door of thatched hut
Also changed the owner.
At the Doll\'s Festival.

Spring is passing.
The birds cry, and the fishes fill
With tears on their eyes.

Grasses in summer.
The warriors\' dreams
All that left.

The early summer rain
Leaves behind
Hikari-do.

Ah, tranquility!
Penetrating the very rock,
A cicada\'s voice.

The early summer rain,
Gathering it and fast
Mogami River.

To an old pond
A frog leaps in.
And the sound of the water.

Saying something,
The lip feeling cold.
The Autum wind.

Tieing the Chimaki,
Other hand hold,
Her bangs.

I ended up using .splitlines() to handle the splitting. Then I used index() to determine each line’s position and, from that, which list it belongs in. As an homage, I named the final argument basho, which is a list of line1, line2, and line3.

import random
import rbasho02_haikus as rbh

haikus = rbh.haikus.splitlines()

h_list = []
h_dict = {}
line1 = []
line2 = []
line3 = []
basho = [line1,line2,line3]

for h in haikus:
	if len(h) > 0:
		h = h[0].upper() + h[1:]
		h_list.append(h)

for h in h_list:
	k = h_list.index(h) + 1
	h_dict.update({k: h})

for h in h_dict:
	# Keys run 1, 2, 3, 4, ...; the remainder mod 3 gives the line's place in its haiku
	if h % 3 == 0:
		line3.append(h_dict[h])
	elif h % 3 == 2:
		line2.append(h_dict[h])
	else:
		line1.append(h_dict[h])

def randomhaiku(h):
	for i in h:
		r = random.randint(0,len(i)-1)
		print('[{0:02}]   {1}'.format(r+1,i[r]))

randomhaiku(basho)
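
As a cross-check on the sorting step, here is a hedged sketch of an alternative that needs neither index() nor the dictionary. Once blank lines are dropped, every third line occupies the same position in its haiku, so slicing with a step of three yields the three lists directly. This also sidesteps a quirk of index(), which returns the first match and so gives a repeated line the same position every time it appears. The sketch assumes every haiku has exactly three lines; the name split_by_position and the sample text are illustrative, not from the original project.

```python
def split_by_position(text, lines_per_poem=3):
    """Group non-blank lines by their position within each poem."""
    # Drop the blank separator lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # lines[0::3] are first lines, lines[1::3] second lines, and so on.
    return [lines[i::lines_per_poem] for i in range(lines_per_poem)]


sample = """To an old pond
A frog leaps in.
And the sound of the water.

Ah, tranquility!
Penetrating the very rock,
A cicada's voice.
"""

first_lines, second_lines, third_lines = split_by_position(sample)
print(first_lines)
print(second_lines)
print(third_lines)
```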

Notes

Samples

Here are some sample generated poems:

[07]   To an old pond
[19]   The shallows—
[36]   Look like someone else

[41]   Trickles all night long
[22]   Indeed this is just
[02]   With tears on their eyes.

Other potential projects

Here are some potential projects that can make good use of the existing code:

  • contemporaryHaiku – a haiku generator that uses the classic 5-7-5 form, but references themes of contemporary / modern life, especially technology
  • randomRilke – almost the same internal code, but references Rilke’s Sonnets to Orpheus

Attribution

Translated versions came from the following sources: