Baton Rouge Traffic Incidents, 2021

Overview

This is an exercise in data analysis and visualization using the Baton Rouge Traffic Incidents dataset (retrieved on Jan 10, 2022), with two goals:

  • Explore Google Data Studio’s dynamic content
  • Provide initial analysis of the dataset (using historical trends as reference)

Dictionary

These are some of the key metrics and terms that are abbreviated throughout the report:

  • Incident (INT) refers to a single report of a traffic incident (or collision), identified by an incident number or ID;
  • Incident with Injury (IIJ) refers to an INT with a reported injury;
  • Incident with Fatality (IFT) refers to an INT with a reported fatality;
  • Incidents with 2 or Less Vehicles (I2V-s) refers to an INT involving 2 or fewer vehicles;
  • Incidents with 3 or More Vehicles (I3V+s) refers to an INT involving 3 or more vehicles;
  • Vehicles or Vehicles Involved in an Incident (VIIs) refers to the total number of vehicles involved in an INT and is usually bucketed as either I2V-s or I3V+s;
  • Primary Factors (P1Fs) and Secondary Factors (S2Fs) refer to the factors that resulted in a collision, often a traffic violation; an S2F may be null if no collision factor other than the P1F was identified.
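
As a rough illustration of how these definitions relate to one another, the sketch below counts each metric for a few made-up incident records. The field names (`incident_id`, `injury`, `fatality`, `vehicles`) and the rows themselves are hypothetical and are not taken from the actual dataset.

```python
# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {"incident_id": "A1", "injury": False, "fatality": False, "vehicles": 2},
    {"incident_id": "A2", "injury": True, "fatality": False, "vehicles": 3},
    {"incident_id": "A3", "injury": True, "fatality": True, "vehicles": 1},
]


def summarize(records):
    """Count INTs, IIJs, IFTs, I2V-s and I3V+s, and total the VIIs."""
    return {
        "INT": len(records),
        "IIJ": sum(1 for r in records if r["injury"]),
        "IFT": sum(1 for r in records if r["fatality"]),
        "I2V-": sum(1 for r in records if r["vehicles"] <= 2),
        "I3V+": sum(1 for r in records if r["vehicles"] >= 3),
        "VII": sum(r["vehicles"] for r in records),
    }


summary = summarize(incidents)
print(summary)
```

Every INT falls into exactly one of the two vehicle buckets, so the I2V-s and I3V+s counts always add up to the INT count.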

Dynamic Content

  • Interactive Graphs. Each graph is designed to produce dynamic content in response to user input. Each graph has a set of secondary (optional) metrics, and each page has a set of optional filters.
  • Optional Metrics. These are graph-specific controls that let the user replace the default metric with another through the optional metrics selector on each graph. The available metrics are listed at the bottom of each graph. In the example below, the optional metrics for the bar graph include IFTs, I3V+s, and I2V-s.
  • Filters. These are page-specific controls that update the results in all the graphs within that page based on the selected filter values. In the example below, choosing Violations updates the values of the bar graph to include only incidents that meet the Primary Factor filter criteria.

Report

Below is a snapshot of the first page of the report. Please click the image to go to the dynamic dashboard.

Visualizing TV Shows Added in 2021

Overview

This is a quick data visualization project that consolidates data sets for four streaming services from Shivam Bansal’s Kaggle repo. The streaming services included are Amazon Prime, Disney Plus, Hulu, and Netflix. All datasets are current as of Dec 12, 2021.

I implement the project using the following tools and steps:

  1. Jupyter Notebook, Python – with the CSV files downloaded, I clean and combine the various data sets
  2. Google Drive (Google Sheets) – upload the combined data set for storage and later retrieval
  3. Tableau (Public) – use the built-in Google Sheets connector and visualize the data using a dashboard

Results

Jupyter Notebook

I use Pandas to transform the CSV files into dataframes and combine them. The initial result includes listings for movies and TV shows, so movies are later removed. Some columns for cohorts (such as release_decade) are also included in the final output to anticipate categorizations in the visualization. The file can be downloaded using the link below.
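
The combining step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook itself: the small in-memory frames stand in for the downloaded CSV files (which would normally be loaded with `pd.read_csv`), and the column names `title`, `type`, `release_year`, and `service` are assumptions made for the example. Only `release_decade` is named in the write-up.

```python
import pandas as pd

# Tiny stand-ins for the downloaded CSV files. The rows and column names
# are assumptions for illustration only.
frames = {
    "Netflix": pd.DataFrame({
        "title": ["Show A", "Movie B"],
        "type": ["TV Show", "Movie"],
        "release_year": [2019, 2005],
    }),
    "Hulu": pd.DataFrame({
        "title": ["Show C"],
        "type": ["TV Show"],
        "release_year": [1998],
    }),
}

# Tag each frame with its service, then stack them into one dataframe.
combined = pd.concat(
    [df.assign(service=name) for name, df in frames.items()],
    ignore_index=True,
)

# Keep TV shows only and add a decade cohort column.
tv_shows = combined[combined["type"] == "TV Show"].copy()
tv_shows["release_decade"] = (tv_shows["release_year"] // 10) * 10

print(tv_shows[["service", "title", "release_decade"]])
```

Adding the cohort column before export keeps the grouping logic in one place, so the dashboard only has to read it.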

Tableau

This is my foray into a more ‘fluid’ layout, making strong use of floating objects (vs. tiled), and opting out of the default tabular headers (and creating my own labels using icons and other graphic cues).

(The live dashboard can be found here.)

Developing the Customer Journey for Marketing

Overview

The goal of this project is to use Mailchimp’s Customer Journey to map out and automate the marketing newsletter opt-in and email verification workflows, both of which are embedded into the account creation / end-user sign up flow.

During sign-up, the user is given an option to use their email or phone number. If the user chooses the former, it triggers the following automated workflows:

  • Email Verification. The user needs to verify their email. A verification email is sent to the user's email address as soon as they finish creating their account. An email verification check mark appears on their account as soon as they verify. They receive up to three email verification requests; after that, the requests stop and the account remains unverified.
  • Newsletter Sign Up. The user is given the option to sign up for marketing newsletters by ticking a checkbox. If the checkbox is ticked when they finish creating their account, they are added to the newsletter distribution.
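
The three-request cap in the email verification workflow can be modeled with a few lines of plain Python. This sketch only simulates the rule described above; it does not call Mailchimp, and the function name and its parameters are hypothetical.

```python
MAX_REQUESTS = 3  # from the workflow above: up to three verification requests


def verification_requests(verified_after=None, max_requests=MAX_REQUESTS):
    """Return (requests_sent, is_verified) for one user.

    verified_after is the number of requests after which the user verifies,
    or None if the user never verifies.
    """
    sent = 0
    while sent < max_requests:
        sent += 1  # send one verification request
        if verified_after is not None and sent >= verified_after:
            return sent, True  # user verified, so no further requests go out
    return sent, False  # cap reached; the account remains unverified


print(verification_requests(verified_after=2))
print(verification_requests())
```

A user who verifies on the second request stops receiving emails at that point, while a user who never verifies receives exactly three.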

Tool Kit

  • Whimsical. This is the diagramming tool used to model the newsletter opt-in and email verification sub-flows.
  • Mailchimp. This is the solution for sending out verification requests to unverified user emails and for handling marketing newsletters. There are two audiences: one that targets consumers (the B2C vertical) and another that targets business end-users (B2B).
  • Go. This is the proprietary SaaS solution that integrates with Mailchimp.

Milestones

1: Model User Flows

User flows are modeled in Whimsical to visualize how sub-flows connect and impact each other, and to surface potential gaps in the journey. Below are a couple of diagrams for these sub-flows.



2: Map Customer Journey on Mailchimp

This is a sample of a customer journey (under the Consumer audience). After the customer journeys have been mapped out, technical details (such as journey IDs and step IDs) are documented and sent to Engineering for the in-house development / integration.



Surfacing Initial Sales Performance Trends

Overview

The goal of this project is to surface preliminary sales data, which includes an initial pipeline of early product users. The early adopters are small to medium-sized businesses, mainly within the retail, food, and nightlife verticals. There are three main aspects of the initial data:

  1. Surveys, conducted by market researchers, which feed into the pipeline as leads if the merchant provides consent;
  2. Pipeline (EAPs) data, which focuses on conversion;
  3. Market saturation (or what part of the target market is being captured), as scoped by Yelp listings.

Milestones

The main challenge is creating a foundation of Sales reporting that can scale with the rapid changes in processes, technologies and goals.

1: Set Up Reporting in Google Sheets

Google Sheets is the starting point, since it allows for fast iterations. The reporting starts as a flat file that captures rudimentary survey and pipeline data.

2: Visualize Data and Surface Early Trends in Tableau

Every week, the flat file is exported and updated as a data source in Tableau for an end-of-week review.

3: Switch to Google Data Studio to Improve Visualization

To allow for real-time dashboard updates (while using existing technologies and without incurring additional costs), the visualization is moved to Google Data Studio. The underlying data connection still references Google Sheets.

4: Use Brand Colors

The switch to Google Data Studio coincides with the release of company style sheets. The visualization redesign adopts the recommended color palette and gradient styles.




5: Surface Initial Trends

Using Yelp as a benchmark, specific market verticals are investigated to understand market capture.
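
One simple way to express market capture per vertical is the share of benchmark listings that already appear in the pipeline. The sketch below assumes that definition; the counts are hypothetical placeholders rather than project figures, and the function and variable names are made up for the example.

```python
# Hypothetical counts per vertical; these are not real project figures.
yelp_listings = {"retail": 400, "food": 250, "nightlife": 50}
pipeline_merchants = {"retail": 20, "food": 25, "nightlife": 10}


def market_capture(pipeline, listings):
    """Share of benchmark listings that appear in the pipeline, per vertical."""
    return {
        vertical: pipeline.get(vertical, 0) / total
        for vertical, total in listings.items()
        if total > 0  # skip verticals with no benchmark listings
    }


capture = market_capture(pipeline_merchants, yelp_listings)
for vertical, share in capture.items():
    print(f"{vertical}: {share:.1%}")
```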

Randomizing Basho’s Verse

To live poetry
is better than to write it.
— Basho

Overview

I’m not necessarily following Basho’s advice, but I figured if I were to stray, I’d make the detour pythonic. So I made a haiku generator called randomBasho, which uses a simple randomizer to derive ‘new’ haikus from over a hundred Basho haikus. My goals are as follows:

  • to put a fresh spin on something centuries old; to generate poems that retain the contemplative energy and poetic tone of their source, but unearth new interpretations or meanings behind Basho’s lines
  • from a technical standpoint, to reorient myself with basic Python concepts, such as iterators, functions, and data sets
  • from a creative standpoint, to use this as a starting ground to experiment and explore poetic possibilities in code

Initial Code

I wrote my first few attempts at randomizing simply to get reacquainted with Python. Having learned some of these basic concepts a few years ago (in Python 2.x), I wanted to check my comfort level with Python's building blocks and with the Python 3.x updates.

Step 1. Randomization

To focus on this task, I initially narrowed the technical scope by creating three lists with distinct items. The lists are line1, line2 and line3, which respectively contain (and correspond to) the haikus’ first lines, second lines, and third lines.

By making the poem number and line number explicit in the item names in all of the lists below, I was able to test whether the randomization actually worked. My expected end result was a Frankenstein haiku, with lines from different poems.

import random

line1 = [
	'poem1_line1',
	'poem2_line1',
	'poem3_line1',
	'poem4_line1',
	'poem5_line1'
]

line2 = [
	'poem1_line2',
	'poem2_line2',
	'poem3_line2',
	'poem4_line2',
	'poem5_line2'
]

line3 = [
	'poem1_line3',
	'poem2_line3',
	'poem3_line3',
	'poem4_line3',
	'poem5_line3'
]

I used a for loop, since the task of printing a haiku line (after grabbing an item) needed to be repeated for each list.

def randombasho1(x,y,z):
	r = [x,y,z]
	j = 0
	for i in r:
		# j tracks the current list; randint(0,4) picks one of its five items
		print(r[j][random.randint(0,4)])
		j = j + 1

randombasho1(line1,line2,line3)

I dropped the j = 0 and j = j + 1 and opted for a range(0, 3), since my haikus all have 3 lines.

def randombasho2(x,y,z):
	r = [x,y,z]
	# i is the list index: 0, 1, 2 for the three haiku lines
	for i in range(0,3):
		print(r[i][random.randint(0,4)])

randombasho2(line1,line2,line3)

I removed the r = [x, y, z] and used *args so I could repurpose the code in non-haiku use cases.

def randombasho3(*args):
	# works for any number of lists, each assumed to hold five items
	for i in range(0,len(args)):
		print(args[i][random.randint(0,4)])

randombasho3(line1,line2,line3)
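
randombasho3 accepts any number of lists, but randint(0,4) still assumes each list holds exactly five items. One possible variation, sketched below, uses random.choice so lists of any length work. The function name random_lines and the sample lists are hypothetical and are not part of the original code.

```python
import random


def random_lines(*line_lists):
    """Pick one random item from each list, whatever its length."""
    return [random.choice(lines) for lines in line_lists]


# Sample lists of different lengths, using the same placeholder naming.
first = ['poem1_line1', 'poem2_line1']
second = ['poem1_line2', 'poem2_line2', 'poem3_line2']
third = ['poem1_line3']

for line in random_lines(first, second, third):
    print(line)
```

Returning the lines instead of printing them also makes the result easier to check or reuse elsewhere.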

Additional Code

After I figured out the randomization code, the next milestone for me was to automatically generate lists line1, line2 and line3.

I decoupled the data (in this case, the haikus) from the code itself, so I could reuse the randomization code in another application (for instance, another haiku poet or another poetic form).

The .py file of the haiku data had one list with three items: set1, set2, and set3. Each item is a long string, which contains multiple full haikus. The list below is from set1.

# Some of the poems in the rbasho02_haikus list
The door of thatched hut
Also changed the owner.
At the Doll\'s Festival.

Spring is passing.
The birds cry, and the fishes fill
With tears on their eyes.

Grasses in summer.
The warriors\' dreams
All that left.

The early summer rain
Leaves behind
Hikari-do.

Ah, tranquility!
Penetrating the very rock,
A cicada\'s voice.

The early summer rain,
Gathering it and fast
Mogami River.

To an old pond
A frog leaps in.
And the sound of the water.

Saying something,
The lip feeling cold.
The Autum wind.

Tieing the Chimaki,
Other hand hold,
Her bangs.

I ended up using .splitlines() to handle the splitting. Then I used index() to determine each line’s position within its haiku, and therefore which list it should be placed in. As an homage, I named the final argument basho, which is a list of line1, line2, and line3.

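import random
import rbasho02_haikus as rbh

# split the raw haiku text into individual lines
haikus = rbh.haikus.splitlines()

h_list = []
h_dict = {}
line1 = []
line2 = []
line3 = []
basho = [line1,line2,line3]

# drop the blank separator lines and capitalize each line's first letter
for h in haikus:
	if len(h) > 0:
		h = h[0].upper() + h[1:]
		h_list.append(h)

# number the lines from 1; index() returns the first match,
# so this assumes no two lines are identical
for h in h_list:
	k = h_list.index(h) + 1
	h_dict.update({k: h})

# k % 3 gives the position within a haiku: 1 = first line,
# 2 = second line, 0 = third line
for h in h_dict:
	if h % 3 == 1:
		line1.append(h_dict[h])
	elif h % 3 == 2:
		line2.append(h_dict[h])
	else:
		line3.append(h_dict[h])

def randomhaiku(h):
	# print one random entry from each list, prefixed with its 1-based index
	for i in h:
		r = random.randint(0,len(i)-1)
		print('[{0:02}]   {1}'.format(r+1,i[r]))

randomhaiku(basho)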
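
The bucketing step relies on every haiku having exactly three lines. Under that same assumption, stride slicing is another way to build the three lists without tracking line numbers. This is an alternative sketch, not the code used in the project, and the short h_list below is a stand-in for the full cleaned list.

```python
# A short stand-in for the cleaned list of haiku lines (blank lines removed).
h_list = [
    'To an old pond',
    'A frog leaps in.',
    'And the sound of the water.',
    'Ah, tranquility!',
    'Penetrating the very rock,',
    "A cicada's voice.",
]

# Every third item, starting at offsets 0, 1 and 2, gives the first,
# second and third lines of each haiku.
line1, line2, line3 = (h_list[offset::3] for offset in range(3))

print(line1)
print(line2)
print(line3)
```

Because slicing works on positions rather than values, repeated lines in the source text would not affect which list a line lands in.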

Notes

Samples

Here are some sample generated poems:

[07]   To an old pond
[19]   The shallows—
[36]   Look like someone else

[41]   Trickles all night long
[22]   Indeed this is just
[02]   With tears on their eyes.

Other potential projects

Here are some potential projects that can make good use of the existing code:

  • contemporaryHaiku – a haiku generator that uses the classic 5-7-5 form, but references themes of contemporary / modern life, especially technology
  • randomRilke – almost the same internal code, but references Rilke’s Sonnets to Orpheus

Attribution

Translated versions came from the following sources: