Day_03: Dataframes and Plot Themes#
Today I wanted to visualize some data that I use for my computational mechanics course, the New York Stock Exchange data from 2010 to 2016. Its a nice set of data to load in a lot of values, parse it down based upon the NYSE symbol and view the rise and fall of stock prices.
I usually use Pandas as my data processing/storage tool, so I stumbled upon the CSV
and DataFrames
packages in the Julia ecosystem. I installed via
using Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
where CSV
could read the comma-separated value file and DataFrames
created a similar structure to a Pandas dataframe in Julia
using Plots
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80] (cache misses: wrong dep version loaded (2))
Failed to precompile Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80] to "/home/ryan/.julia/compiled/v1.11/Plots/jl_wdSRWZ".
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool; flags::Cmd, cacheflags::Base.CacheFlags, reasons::Dict{String, Int64}, isext::Bool)
@ Base ./loading.jl:3085
[3] (::Base.var"#1082#1083"{Base.PkgId})()
@ Base ./loading.jl:2492
[4] mkpidlock(f::Base.var"#1082#1083"{Base.PkgId}, at::String, pid::Int32; kwopts::@Kwargs{stale_age::Int64, wait::Bool})
@ FileWatching.Pidfile ~/projects/julia/usr/share/julia/stdlib/v1.11/FileWatching/src/pidfile.jl:95
[5] #mkpidlock#6
@ ~/projects/julia/usr/share/julia/stdlib/v1.11/FileWatching/src/pidfile.jl:90 [inlined]
[6] trymkpidlock(::Function, ::Vararg{Any}; kwargs::@Kwargs{stale_age::Int64})
@ FileWatching.Pidfile ~/projects/julia/usr/share/julia/stdlib/v1.11/FileWatching/src/pidfile.jl:116
[7] #invokelatest#2
@ ./essentials.jl:1057 [inlined]
[8] invokelatest
@ ./essentials.jl:1052 [inlined]
[9] maybe_cachefile_lock(f::Base.var"#1082#1083"{Base.PkgId}, pkg::Base.PkgId, srcpath::String; stale_age::Int64)
@ Base ./loading.jl:3609
[10] maybe_cachefile_lock
@ ./loading.jl:3606 [inlined]
[11] _require(pkg::Base.PkgId, env::String)
@ Base ./loading.jl:2488
[12] __require_prelocked(uuidkey::Base.PkgId, env::String)
@ Base ./loading.jl:2315
[13] #invoke_in_world#3
@ ./essentials.jl:1089 [inlined]
[14] invoke_in_world
@ ./essentials.jl:1086 [inlined]
[15] _require_prelocked(uuidkey::Base.PkgId, env::String)
@ Base ./loading.jl:2302
[16] macro expansion
@ ./loading.jl:2241 [inlined]
[17] macro expansion
@ ./lock.jl:273 [inlined]
[18] __require(into::Module, mod::Symbol)
@ Base ./loading.jl:2198
[19] #invoke_in_world#3
@ ./essentials.jl:1089 [inlined]
[20] invoke_in_world
@ ./essentials.jl:1086 [inlined]
[21] require(into::Module, mod::Symbol)
@ Base ./loading.jl:2191
using DelimitedFiles, DataFrames
using Dates
Import the CSV into Julia#
Now, I can ~CSV.read
~ readdlm
the NYSE data as a DataFrame
as follows,
nyse_data = readdlm("./nyse-data.csv", ',')
851265×7 Matrix{Any}:
"date" "symbol" … "low" "high" "volume"
"2016-01-05 00:00:00" "WLTW" 122.31 126.25 2.1636e6
"2016-01-06 00:00:00" "WLTW" 119.94 125.54 2.3864e6
"2016-01-07 00:00:00" "WLTW" 114.93 119.74 2.4895e6
"2016-01-08 00:00:00" "WLTW" 113.5 117.44 2.0063e6
"2016-01-11 00:00:00" "WLTW" … 114.09 117.33 1.4086e6
"2016-01-12 00:00:00" "WLTW" 114.5 116.06 1.098e6
"2016-01-13 00:00:00" "WLTW" 112.59 117.07 949600.0
"2016-01-14 00:00:00" "WLTW" 110.05 115.03 785300.0
"2016-01-15 00:00:00" "WLTW" 111.92 114.88 1.0937e6
"2016-01-19 00:00:00" "WLTW" … 109.87 115.87 1.5235e6
"2016-01-20 00:00:00" "WLTW" 108.32 111.6 1.6539e6
"2016-01-21 00:00:00" "WLTW" 108.32 110.58 944300.0
⋮ ⋱ ⋮
"2016-12-30" "XLNX" 60.02 61.48 2.1117e6
"2016-12-30" "XOM" 90.01 90.7 9.1178e6
"2016-12-30" "XRAY" … 57.54 58.36 949200.0
"2016-12-30" "XRX" 8.7 8.8 1.12504e7
"2016-12-30" "XYL" 49.36 50.0 646200.0
"2016-12-30" "YHOO" 38.43 39.0 6.4316e6
"2016-12-30" "YUM" 63.16 63.94 1.8871e6
"2016-12-30" "ZBH" … 102.85 103.93 973800.0
"2016-12-30" "ZION" 42.69 43.31 1.9381e6
"2016-12-30" "ZTS" 53.27 53.74 1.7012e6
"2016-12-30 00:00:00" "AIV" 44.41 45.59 1.3809e6
"2016-12-30 00:00:00" "FTV" 53.39 54.48 705100.0
nyse_df = DataFrame(nyse_data[2:end, :], nyse_data[1, :])
Row | date | symbol | open | close | low | high | volume |
---|---|---|---|---|---|---|---|
Any | Any | Any | Any | Any | Any | Any | |
1 | 2016-01-05 00:00:00 | WLTW | 123.43 | 125.84 | 122.31 | 126.25 | 2.1636e6 |
2 | 2016-01-06 00:00:00 | WLTW | 125.24 | 119.98 | 119.94 | 125.54 | 2.3864e6 |
3 | 2016-01-07 00:00:00 | WLTW | 116.38 | 114.95 | 114.93 | 119.74 | 2.4895e6 |
4 | 2016-01-08 00:00:00 | WLTW | 115.48 | 116.62 | 113.5 | 117.44 | 2.0063e6 |
5 | 2016-01-11 00:00:00 | WLTW | 117.01 | 114.97 | 114.09 | 117.33 | 1.4086e6 |
6 | 2016-01-12 00:00:00 | WLTW | 115.51 | 115.55 | 114.5 | 116.06 | 1.098e6 |
7 | 2016-01-13 00:00:00 | WLTW | 116.46 | 112.85 | 112.59 | 117.07 | 949600.0 |
8 | 2016-01-14 00:00:00 | WLTW | 113.51 | 114.38 | 110.05 | 115.03 | 785300.0 |
9 | 2016-01-15 00:00:00 | WLTW | 113.33 | 112.53 | 111.92 | 114.88 | 1.0937e6 |
10 | 2016-01-19 00:00:00 | WLTW | 113.66 | 110.38 | 109.87 | 115.87 | 1.5235e6 |
11 | 2016-01-20 00:00:00 | WLTW | 109.06 | 109.3 | 108.32 | 111.6 | 1.6539e6 |
12 | 2016-01-21 00:00:00 | WLTW | 109.73 | 110.0 | 108.32 | 110.58 | 944300.0 |
13 | 2016-01-22 00:00:00 | WLTW | 111.88 | 111.95 | 110.19 | 112.95 | 744900.0 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
851253 | 2016-12-30 | XLNX | 61.09 | 60.37 | 60.02 | 61.48 | 2.1117e6 |
851254 | 2016-12-30 | XOM | 90.03 | 90.26 | 90.01 | 90.7 | 9.1178e6 |
851255 | 2016-12-30 | XRAY | 58.29 | 57.73 | 57.54 | 58.36 | 949200.0 |
851256 | 2016-12-30 | XRX | 8.72 | 8.73 | 8.7 | 8.8 | 1.12504e7 |
851257 | 2016-12-30 | XYL | 49.98 | 49.52 | 49.36 | 50.0 | 646200.0 |
851258 | 2016-12-30 | YHOO | 38.72 | 38.67 | 38.43 | 39.0 | 6.4316e6 |
851259 | 2016-12-30 | YUM | 63.93 | 63.33 | 63.16 | 63.94 | 1.8871e6 |
851260 | 2016-12-30 | ZBH | 103.31 | 103.2 | 102.85 | 103.93 | 973800.0 |
851261 | 2016-12-30 | ZION | 43.07 | 43.04 | 42.69 | 43.31 | 1.9381e6 |
851262 | 2016-12-30 | ZTS | 53.64 | 53.53 | 53.27 | 53.74 | 1.7012e6 |
851263 | 2016-12-30 00:00:00 | AIV | 44.73 | 45.45 | 44.41 | 45.59 | 1.3809e6 |
851264 | 2016-12-30 00:00:00 | FTV | 54.2 | 53.63 | 53.39 | 54.48 | 705100.0 |
Here, I focus on just the Google stock price (GOOGL
). I use a couple of calls to the nyse_df
dataframe:
nyse[!, "symbol"]
: this calls the column of data that has the NYSE symbols.== "GOOGL"
: this compares the left-hand-side to the string “GOOGL” and returnstrue/false
nyse_df[ ... .== ..., :]
: this uses the comparison described in 1 + 2 to grab all of the columns that match the.==
operator
In one line, these calls to nyse_df
create google_df
that only contains the Google open, close, low, high, and volume values from 2010 - 2016.
google_df = nyse_df[nyse_df[!, "symbol"] .== "GOOGL", :];
Dates in Julia#
Dates can be so frustrating in any language. In this case, all of the dates are interpreted as strings. Not the worst, but I do want the actual datetime values. I looped through each date and created datetime values in days
. Above, I imported Dates
so I can convert the string to a datetime.
days = zeros(Date, size(google_df)[1])
for (i, d) in enumerate(google_df[!, "date"])
days[i] = Dates.Date(d)
end
Julia plot themes#
I am a big fan of plot themes. My Matplotlib theme of choice is ‘fivethirtyeight’. It has thick lines and large fonts. In my experience, if a figure’s fontsize in a presentation, paper, website, etc. is less than 16pt, then it is almost invisible to most people you are trying to share with. I am constantly increasing the font size in my figures to share ideas.
In Julia, I am using the PlotThemes
package that has a nice collection of unique color palettes and design choices. I decided to write my own theme that increases all the fonts to 18 and 24 (I read somewhere that fontsizes should roughly follow the 3:4 ratio where each smaller font is 75% of the bigger font).
For reference, I plotted the opening and closing prices of Google’s stock with the :default
theme.
Plots.theme(:default)
plot(days,
google_df[!, "open"],
label = "open price",
title = "Google's opening and closing NYSE price")
plot!(days, google_df[!, "close"], label = "close price")
Then, I made a theme that
increased the title font to 24
increased tick fonts to 18
increased guide font to 18
increased line width to 4px
increased marker size to 10px
tried to increase legend font to 18
placed the legend outside the plot on the top right
removed the gridlines
here it is:
_themes[:cooper] = PlotTheme(linewidth = 4,
markersize = 10,
titlefontsize = 24,
guidefontsize = 18,
tickfontsize = 18,
colorbar_tickfontsize = 18,
legend_font_pointsize = 18,
legend=:outertopright,
grid = false
)
Then, I included the theme in PlotThemes.jl
and created my new Google stock price figure.
Plots.theme(:vibrant)
plot(days,
google_df[!, "open"],
label = "open price",
# legend_font_pointsize = 18,
title = "Google's opening and closing\n NYSE price")
# xticks = [ "2010-01-04", "2011-09-30", "2013-07-03", "2015-04-02", "2016-12-29"])
#xticks = google_df[1:floor(Int64, end/4):end, "date"])
# xtickfont = font(20, "Sans"),
# xticks = 0:500:5000)
plot!(days, google_df[!, "close"], label = "close price", xrotation = 25, size = (700,400))
plot!(xlabel = "date", ylabel = "price (\$\$)")
Using my new theme#
My theme is included in my local testing environment, but I wanted to add it to my general setup in my GitHub actions. I forked the PlotThemes
and included another call in my actions yaml:
julia -e 'using Pkg; Pkg.add(url="https://github.com/cooperrc/PlotThemes.jl");
Now, I can use my own version of PlotThemes
developed in my fork.
Wrapping up#
I’ve still got some work to do on building my :cooper
theme. The title hangs down into the graph right now and the dates have too much information for what I need. The legend is not responding to my calls to legend_font_pointsize
and the PlotThemes
package seems to pass information to Plots
without requiring it as a dependency.
I am proud that I was able to create a jumping-off point for my data visualization needs. I enjoyed the straightforward plot calls to quickly get data into a graph.