Day_03: Dataframes and Plot Themes

Today I wanted to visualize some data that I use for my computational mechanics course: the New York Stock Exchange data from 2010 to 2016. It's a nice data set for loading a lot of values, paring them down by NYSE symbol, and viewing the rise and fall of stock prices.

I usually use Pandas as my data processing/storage tool, so I went looking for an equivalent and stumbled upon the CSV and DataFrames packages in the Julia ecosystem. I installed them via

using Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")

where CSV reads the comma-separated value file and DataFrames creates a structure in Julia similar to a Pandas dataframe.

using Plots
using DelimitedFiles, DataFrames
using Dates

Import the CSV into Julia

Now, I can read the NYSE data with readdlm (rather than CSV.read) and convert it to a DataFrame as follows:

nyse_data = readdlm("./nyse-data.csv", ',')
851265×7 Matrix{Any}:
 "date"                 "symbol"  …     "low"     "high"        "volume"
 "2016-01-05 00:00:00"  "WLTW"       122.31    126.25          2.1636e6
 "2016-01-06 00:00:00"  "WLTW"       119.94    125.54          2.3864e6
 "2016-01-07 00:00:00"  "WLTW"       114.93    119.74          2.4895e6
 "2016-01-08 00:00:00"  "WLTW"       113.5     117.44          2.0063e6
 "2016-01-11 00:00:00"  "WLTW"    …  114.09    117.33          1.4086e6
 "2016-01-12 00:00:00"  "WLTW"       114.5     116.06          1.098e6
 "2016-01-13 00:00:00"  "WLTW"       112.59    117.07     949600.0
 "2016-01-14 00:00:00"  "WLTW"       110.05    115.03     785300.0
 "2016-01-15 00:00:00"  "WLTW"       111.92    114.88          1.0937e6
 "2016-01-19 00:00:00"  "WLTW"    …  109.87    115.87          1.5235e6
 "2016-01-20 00:00:00"  "WLTW"       108.32    111.6           1.6539e6
 "2016-01-21 00:00:00"  "WLTW"       108.32    110.58     944300.0
 ⋮                                ⋱              ⋮        
 "2016-12-30"           "XLNX"        60.02     61.48          2.1117e6
 "2016-12-30"           "XOM"         90.01     90.7           9.1178e6
 "2016-12-30"           "XRAY"    …   57.54     58.36     949200.0
 "2016-12-30"           "XRX"          8.7       8.8           1.12504e7
 "2016-12-30"           "XYL"         49.36     50.0      646200.0
 "2016-12-30"           "YHOO"        38.43     39.0           6.4316e6
 "2016-12-30"           "YUM"         63.16     63.94          1.8871e6
 "2016-12-30"           "ZBH"     …  102.85    103.93     973800.0
 "2016-12-30"           "ZION"        42.69     43.31          1.9381e6
 "2016-12-30"           "ZTS"         53.27     53.74          1.7012e6
 "2016-12-30 00:00:00"  "AIV"         44.41     45.59          1.3809e6
 "2016-12-30 00:00:00"  "FTV"         53.39     54.48     705100.0
nyse_df = DataFrame(nyse_data[2:end, :], nyse_data[1, :])
851264×7 DataFrame
   Row │ date                 symbol  open    close   low     high    volume
       │ Any                  Any     Any     Any     Any     Any     Any
───────┼────────────────────────────────────────────────────────────────────────
     1 │ 2016-01-05 00:00:00  WLTW    123.43  125.84  122.31  126.25  2.1636e6
     2 │ 2016-01-06 00:00:00  WLTW    125.24  119.98  119.94  125.54  2.3864e6
     3 │ 2016-01-07 00:00:00  WLTW    116.38  114.95  114.93  119.74  2.4895e6
     4 │ 2016-01-08 00:00:00  WLTW    115.48  116.62  113.5   117.44  2.0063e6
     5 │ 2016-01-11 00:00:00  WLTW    117.01  114.97  114.09  117.33  1.4086e6
     6 │ 2016-01-12 00:00:00  WLTW    115.51  115.55  114.5   116.06  1.098e6
     7 │ 2016-01-13 00:00:00  WLTW    116.46  112.85  112.59  117.07  949600.0
     8 │ 2016-01-14 00:00:00  WLTW    113.51  114.38  110.05  115.03  785300.0
     9 │ 2016-01-15 00:00:00  WLTW    113.33  112.53  111.92  114.88  1.0937e6
    10 │ 2016-01-19 00:00:00  WLTW    113.66  110.38  109.87  115.87  1.5235e6
    11 │ 2016-01-20 00:00:00  WLTW    109.06  109.3   108.32  111.6   1.6539e6
    12 │ 2016-01-21 00:00:00  WLTW    109.73  110.0   108.32  110.58  944300.0
    13 │ 2016-01-22 00:00:00  WLTW    111.88  111.95  110.19  112.95  744900.0
     ⋮ │          ⋮             ⋮       ⋮       ⋮       ⋮       ⋮        ⋮
851253 │ 2016-12-30           XLNX    61.09   60.37   60.02   61.48   2.1117e6
851254 │ 2016-12-30           XOM     90.03   90.26   90.01   90.7    9.1178e6
851255 │ 2016-12-30           XRAY    58.29   57.73   57.54   58.36   949200.0
851256 │ 2016-12-30           XRX     8.72    8.73    8.7     8.8     1.12504e7
851257 │ 2016-12-30           XYL     49.98   49.52   49.36   50.0    646200.0
851258 │ 2016-12-30           YHOO    38.72   38.67   38.43   39.0    6.4316e6
851259 │ 2016-12-30           YUM     63.93   63.33   63.16   63.94   1.8871e6
851260 │ 2016-12-30           ZBH     103.31  103.2   102.85  103.93  973800.0
851261 │ 2016-12-30           ZION    43.07   43.04   42.69   43.31   1.9381e6
851262 │ 2016-12-30           ZTS     53.64   53.53   53.27   53.74   1.7012e6
851263 │ 2016-12-30 00:00:00  AIV     44.73   45.45   44.41   45.59   1.3809e6
851264 │ 2016-12-30 00:00:00  FTV     54.2    53.63   53.39   54.48   705100.0
                                                          851239 rows omitted
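Every column above has element type Any because readdlm returns a Matrix{Any}. For comparison, here is a minimal sketch of the CSV.read route, assuming CSV.jl loads in your environment. The two data rows are copied from the output above into a hypothetical in-memory sample, and an IOBuffer stands in for the file path.

```julia
using CSV, DataFrames

# Hypothetical in-memory sample with the same column layout as the NYSE file
sample_csv = """
date,symbol,open,close,low,high,volume
2016-12-30,XOM,90.03,90.26,90.01,90.7,9117800.0
2016-12-30,YUM,63.93,63.33,63.16,63.94,1887100.0
"""

# CSV.read takes a source and a sink type; it infers a type for each column
sample_df = CSV.read(IOBuffer(sample_csv), DataFrame)
```

With this route the numeric columns should come back as Float64 rather than Any, so no conversion step is needed before plotting.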

Here, I focus on just the Google stock price (GOOGL). I use a couple of calls to the nyse_df dataframe:

  1. nyse_df[!, "symbol"]: this calls the column of data that has the NYSE symbols

  2. .== "GOOGL": this compares each element of the left-hand side to the string “GOOGL” and returns true/false

  3. nyse_df[ ... .== ..., :]: this uses the comparison described in 1 + 2 to grab all of the rows where the .== comparison is true, keeping every column

In one line, these calls to nyse_df create google_df, which contains only the Google open, close, low, high, and volume values from 2010 to 2016.

google_df = nyse_df[nyse_df[!, "symbol"] .== "GOOGL", :];
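The same three steps can be sketched on a tiny table. The values below are made up for illustration and the `toy_` names are hypothetical. The `filter` call at the end is an alternative spelling that should select the same rows.

```julia
using DataFrames

# Hypothetical toy data standing in for nyse_df
toy_df = DataFrame(symbol = ["GOOGL", "XOM", "GOOGL", "YUM"],
                   close  = [10.0, 20.0, 11.0, 30.0])

# Steps 1 + 2: a Boolean mask with one entry per row
mask = toy_df[!, "symbol"] .== "GOOGL"

# Step 3: keep the rows where the mask is true, and every column
toy_google = toy_df[mask, :]

# The same selection written with filter and a column => predicate pair
toy_google_alt = filter(:symbol => ==("GOOGL"), toy_df)
```

The mask has one true/false entry per row, so indexing with it selects rows, and the `:` keeps all columns.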

Dates in Julia

Dates can be so frustrating in any language. In this case, all of the dates are read in as strings. Not the worst, but I do want actual date values. I looped through each date string and stored the converted Date values in days. Above, I imported Dates so I can convert each string to a Date.

days = zeros(Date, size(google_df)[1])
for (i, d) in enumerate(google_df[!, "date"])
    days[i] = Dates.Date(d)
end
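The raw file mixes two date formats, "2016-12-30" and "2016-12-30 00:00:00", as the readdlm output shows. I have not checked whether the GOOGL rows contain the longer form. If they do, one possible approach is to keep only the first 10 characters before parsing. This is a sketch on a hypothetical two-element sample:

```julia
using Dates

# Hypothetical sample covering both date formats that appear in the file
date_strings = ["2016-01-05 00:00:00", "2016-12-30"]

# Keep the first 10 characters ("yyyy-mm-dd") and parse them as a Date
parsed_days = [Date(first(string(s), 10)) for s in date_strings]
```

The comprehension also replaces the preallocate-and-loop pattern, since it builds the Vector{Date} directly.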

Julia plot themes

I am a big fan of plot themes. My Matplotlib theme of choice is ‘fivethirtyeight’. It has thick lines and large fonts. In my experience, if a figure’s fontsize in a presentation, paper, website, etc. is less than 16pt, then it is almost invisible to most people you are trying to share with. I am constantly increasing the font size in my figures to share ideas.

In Julia, I am using the PlotThemes package that has a nice collection of unique color palettes and design choices. I decided to write my own theme that increases all the fonts to 18 and 24 (I read somewhere that fontsizes should roughly follow the 3:4 ratio where each smaller font is 75% of the bigger font).
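The 3:4 rule is simple arithmetic: starting from a 24 pt title, each step down is 75% of the size above it. A small sketch, with hypothetical variable names:

```julia
# Each step down the scale is 75% of the size above it
title_size = 24
ratio = 3 / 4

# First two entries: title size, then guide/tick size
font_scale = [round(Int, title_size * ratio^k) for k in 0:1]
```

This gives the 24 and 18 used for the title and for the guide/tick fonts.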

For reference, I plotted the opening and closing prices of Google’s stock with the :default theme.

Plots.theme(:default)
plot(days, 
    google_df[!, "open"], 
    label = "open price",
    title = "Google's opening and closing NYSE price")
plot!(days, google_df[!, "close"], label = "close price")

Then, I made a theme that

  • increased the title font to 24

  • increased tick fonts to 18

  • increased guide font to 18

  • increased line width to 4px

  • increased marker size to 10px

  • tried to increase legend font to 18

  • placed the legend outside the plot on the top right

  • removed the gridlines

Here it is:

_themes[:cooper] = PlotTheme(linewidth = 4,
                             markersize = 10,
                             titlefontsize = 24,
                             guidefontsize = 18,
                             tickfontsize = 18,
                             colorbar_tickfontsize = 18,
                             legend_font_pointsize = 18,
                             legend=:outertopright,
                             grid = false
                            )
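As an alternative that does not require editing PlotThemes.jl, the same attribute values can be set as session-wide defaults with the `default` function from Plots. This is a sketch of a different technique, not the theme mechanism itself. Using `legendfontsize` for the legend font is an assumption on my part, and I have not confirmed that it changes the legend in this setup.

```julia
using Plots

# Same attribute values as the :cooper theme, set as session-wide defaults
default(linewidth = 4,
        markersize = 10,
        titlefontsize = 24,
        guidefontsize = 18,
        tickfontsize = 18,
        legendfontsize = 18,     # assumption: attribute name for the legend font
        legend = :outertopright,
        grid = false)
```

Plots created after this call pick up these values unless a plot call overrides them.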

Then, I included the theme in PlotThemes.jl and created my new Google stock price figure.

Plots.theme(:vibrant)
plot(days, 
    google_df[!, "open"], 
    label = "open price",
    # legend_font_pointsize = 18,
    title = "Google's opening and closing\n NYSE price")
    
    # xticks = [ "2010-01-04", "2011-09-30", "2013-07-03", "2015-04-02", "2016-12-29"])
    #xticks = google_df[1:floor(Int64, end/4):end, "date"])
    # xtickfont = font(20, "Sans"),
    # xticks = 0:500:5000)
plot!(days, google_df[!, "close"], label = "close price", xrotation = 25, size = (700,400))
plot!(xlabel = "date", ylabel = "price (\$\$)")

Using my new theme

My theme is included in my local testing environment, but I wanted to add it to my general setup in my GitHub Actions. I forked PlotThemes.jl and included another call in my actions yaml:

julia -e 'using Pkg; Pkg.add(url="https://github.com/cooperrc/PlotThemes.jl");'

Now, I can use my own version of PlotThemes developed in my fork.

Wrapping up

I’ve still got some work to do on building my :cooper theme. The title hangs down into the graph right now, and the date labels carry more information than I need. The legend is not responding to my calls to legend_font_pointsize, and the PlotThemes package seems to pass information to Plots without requiring it as a dependency.
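One possible way to shorten the date labels is to pick a few evenly spaced dates and format them by year and month with Dates.format. The sketch below uses a hypothetical stand-in for the days vector. Passing the result to Plots as a (positions, labels) tuple is an assumption I have not tested with Date axes.

```julia
using Dates

# Hypothetical stand-in for the `days` vector built earlier
sample_days = collect(Date(2010, 1, 4):Month(6):Date(2016, 12, 29))

# Pick five roughly evenly spaced dates and label them by year and month
idx = round.(Int, range(1, length(sample_days), length = 5))
tick_dates = sample_days[idx]
tick_labels = Dates.format.(tick_dates, "yyyy-mm")

# Assumption (untested here): Plots takes ticks as a (positions, labels) tuple,
# e.g. plot!(xticks = (tick_dates, tick_labels))
```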

I am proud that I was able to create a jumping-off point for my data visualization needs. I enjoyed the straightforward plot calls to quickly get data into a graph.