Posts

Data Visualization with Python

For this week’s homework assignment, I had to create a social network graph using Python. Creating the random graph went smoothly, and translating the node and edge data into pandas DataFrames was simple when following the given code. The only problem I ran into was that the plot would not display when I ran the code in PyCharm; the call simply returned a ggplot object. After some troubleshooting, I figured out that I could save the plot as an image to see it. This was helpful, and I can definitely see myself using this in the future for visualizing connections.
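To make the first two steps concrete, here is a minimal sketch of generating a random graph and turning it into node and edge tables. The assignment's actual code is not shown in this post, so everything here is an assumption: the function names, the `person_N` labels, the edge probability, and the use of plain lists of dictionaries where the assignment used pandas DataFrames.

```python
import random


def random_graph(n_nodes, edge_prob, seed=0):
    """Link each pair of nodes independently with probability edge_prob."""
    rng = random.Random(seed)  # seeded so the same graph comes back each run
    # Node table: one row per node (labels are hypothetical placeholders).
    nodes = [{"id": i, "label": f"person_{i}"} for i in range(n_nodes)]
    # Edge table: one row per undirected link, stored with source < target.
    edges = [
        {"source": a, "target": b}
        for a in range(n_nodes)
        for b in range(a + 1, n_nodes)
        if rng.random() < edge_prob
    ]
    return nodes, edges


def degrees(nodes, edges):
    """Count how many connections each node has."""
    deg = {node["id"]: 0 for node in nodes}
    for edge in edges:
        deg[edge["source"]] += 1
        deg[edge["target"]] += 1
    return deg


nodes, edges = random_graph(n_nodes=10, edge_prob=0.3, seed=42)
deg = degrees(nodes, edges)
print(len(nodes), "nodes,", len(edges), "edges")
```

Each list of dictionaries has the row-per-record shape that a DataFrame constructor accepts, so the same tables could be handed to pandas before plotting.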

Generating Dr. Piwek's Dot-Dash Plot in RStudio

For this assignment, I recreated a dot-dash plot presented in Dr. Piwek's post using base R graphics. The plot displays per capita budget expenditures from 1967 to 1977. Following Dr. Piwek's example, I connected the dots with lines to make the trend clearer and to stay closer to his version. I also added horizontal dashed lines to give the viewer a reference point when reading the graph. Overall, all of Dr. Piwek's examples that I examined were visually appealing and readable, and it was a good exercise to attempt to recreate one of them.

Module 10: Time Series and Data Visualization

I recreated the two visuals provided for this assignment and improved upon them. I incorporated a purple theme and improved the labels. Using ggplot2 gave me cleaner and more visually appealing graphs. I used the hot dog data to create bar charts and the economics data to build a time series. Time series are worth visualizing because you can identify patterns and trends over time instead of having to analyze raw data. Graphs let you easily see how the unemployment rate and duration change. Overall, ggplot2 has great functionality for creating clear and effective visualizations. This bar chart shows the number of hot dogs eaten each year. I used purple to highlight the record-breaking performances. In this version, I used ggplot2 to improve the styling and make the visualization clearer and easier to understand. The stacked bar chart displays multiple categories in one single graph, showing how these values combine over time, a...

Module 9 Assignment

For this assignment, I again used the reliable mtcars dataset in RStudio. I have used this dataset before and am familiar with it. It contains information about different car models and their performance, including mpg, weight, horsepower, and number of cylinders. I chose this dataset because it is packed with good information that lends itself to a good visualization. In this visualization I explored the relationship between car weight and mpg, while also adding horsepower and number of cylinders. I put weight on the x-axis and mpg on the y-axis, represented the number of cylinders with different colors, and represented horsepower by the size of each point. This visualization does a great job of helping us see trends and patterns: heavier cars tend to have lower fuel efficiency, and cars with more cylinders have higher horsepower. Multivariate visualization allowed me to show different relationships ...

Visualizing Distribution in RStudio

For this week's assignment, we were tasked with visualizing a distribution using RStudio. I chose the built-in dataset mtcars, which contains many different attributes of 32 different cars. To show the distribution, I created a histogram using ggplot2 of one of the most important things to consider when searching for a new car: miles per gallon (mpg). Looking at the histogram, we can see that most vehicles fall between 15 and 25 mpg, with a few cars achieving above 30 mpg. While designing this visualization, I followed Few's recommendations by keeping the layout simple, using consistent coloring, and labeling the axes clearly. I do agree that distributions can sometimes be overcomplicated, which makes them less clear to read. In my case, I tried to make it as simple as possible while remaining visually engaging and appealing.
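The binning behind a histogram like this one can be sketched in a few lines. The original post used ggplot2 in R, so this Python version is only an illustration of the counting step. The mpg values are copied from R's built-in mtcars dataset; the bin start, the 5 mpg bin width, and the `histogram` helper are my own assumptions and will not match ggplot2's default bins.

```python
# mpg column of R's built-in mtcars dataset (32 cars).
MPG = [
    21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4,
    22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4,
    14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
    19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4,
]


def histogram(values, start, width, n_bins):
    """Count values in n_bins equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # values outside the bin range are skipped
            counts[index] += 1
    return counts


# Assumed bins: 10-15, 15-20, 20-25, 25-30, 30-35 mpg.
counts = histogram(MPG, start=10, width=5, n_bins=5)

# Print a simple text histogram, one row per bin.
for i, count in enumerate(counts):
    low = 10 + 5 * i
    print(f"{low}-{low + 5} mpg | {'#' * count} ({count})")
```

With these bins, 21 of the 32 cars land between 15 and 25 mpg and four exceed 30 mpg, which matches what the histogram shows.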

Basic Visualization in RStudio

For this assignment, we did some basic visualizations in RStudio. Here we can first see a pie chart showing counts and colors. For the pie chart and the boxplot, we used the values 40, 30, 20, and 10. This analysis is quite simple, as its main focus is to summarize and compare instead of making predictions or testing a hypothesis. Few's typical critique of pie charts applies less strongly here, because the chart still communicates a valid part-to-whole idea: the values add up to 100. Even so, people might have a hard time comparing the slice sizes accurately, especially since some values are close to each other. For example, it's hard to judge the difference between 30 and 20. Pie charts are good at showing proportions, but the slices can be hard to compare accurately. On the other hand, I think the boxplot fits well with the ideas of Few and Yau. The boxplot makes the comparison clear because the values all share a common baseline, so it's easier to see the dif...

Using Plotly for Data Visualization

I went with Plotly to analyze the given dataset because its ease of use and customization options grabbed my attention. I have to say, after using it for a bit and playing around with it, Plotly is a very intuitive tool to have in the data science field. I used these two charts to compare part-to-whole charts with ranking charts. The ranking chart helped me see closely how each ID compared on average position. With the IDs in ascending order, we can see which ones are performing better relative to the others. This type of visualization is best put to use when comparing items within the same dataset. On the other hand, the part-to-whole chart shows how each ID contributes to the total average position. The main focus of this visualization is to show proportion and distribution. We can see how the same dataset can be interpreted differently depending on which type of visualization we apply to it.
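The two views differ only in how the same numbers are prepared before plotting, which a short sketch can show. The assignment's dataset is not included in this post, so the IDs and average positions below are made-up illustrative values, and the variable names are hypothetical. No Plotly calls are used here; this only covers the data preparation step.

```python
# Hypothetical data: average position per ID (not the assignment's real values).
avg_position = {"A": 12.0, "B": 30.0, "C": 18.0, "D": 20.0}

# Ranking view: sort the IDs in ascending order of average position.
ranking = sorted(avg_position.items(), key=lambda item: item[1])

# Part-to-whole view: each ID's share of the total average position.
total = sum(avg_position.values())
shares = {key: value / total for key, value in avg_position.items()}

print("Ranking (ascending):")
for rank, (key, value) in enumerate(ranking, start=1):
    print(f"  {rank}. {key}: {value}")

print("Share of total:")
for key, share in shares.items():
    print(f"  {key}: {share:.1%}")
```

The ranking keeps the original units and emphasizes order, while the shares drop the units and emphasize proportion, which is why the same dataset reads differently in each chart.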