Content modified under Creative Commons Attribution license CC-BY 4.0, code under BSD 3-Clause License © 2020 R.C. Cooper

Homework

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

Problems Part 1

  1. Gordon Moore made an empirical prediction that the number of transistors on a computer chip would double every two years. This prediction became known as Moore’s law. Moore originally expected this empirical relation to hold only from 1965 to 1975 [1,2], but semiconductor manufacturers kept pace with Moore’s law until 2015.

In the folder “…/data” is a comma-separated value (CSV) file, “transistor_data.csv”, taken from Wikipedia in 01/2020.

a. Use the !head ../data/transistor_data.csv command to look at the top of the csv. What are the headings for the columns?

b. Load the csv into a pandas dataframe. How many missing values (NaN) are in the column with the number of transistors? What fraction are missing?
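As a minimal sketch of the counting step in part b, the snippet below builds a tiny in-memory table instead of reading the real file. The rows are made up for illustration only; the column names follow the headings listed later in this homework, and `csv_text`, `n_missing`, and `frac_missing` are hypothetical names.

```python
import io
import pandas as pd

# Hypothetical stand-in for transistor_data.csv (made-up rows, real-looking headings)
csv_text = """Processor,MOS transistor count,Date of Introduction,Designer,MOSprocess,Area
chip A,2250,1971,Designer X,10000,12
chip B,,1972,Designer Y,10000,14
chip C,29000,1978,Designer X,3000,33
chip D,,1979,Designer Z,3000,
"""

data = pd.read_csv(io.StringIO(csv_text))

# Boolean Series: True wherever the transistor count is NaN
missing = data['MOS transistor count'].isna()

n_missing = int(missing.sum())   # number of NaN entries
frac_missing = missing.mean()    # mean of True/False gives the fraction missing

print(n_missing, frac_missing)
```

With the real data, replacing `io.StringIO(csv_text)` with the path to the CSV file should give the same workflow.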

Problems Part 2

  1. Many beers do not report the IBU of the beer because it is very small. You may be accidentally removing whole categories of beer from our dataset by removing rows that do not include the IBU measure.

    a. Use the command beers_filled = beers.fillna(0) to clean the beers dataframe

    b. Recreate the plot “Beer ABV vs. IBU mean values by style” bubble plot with beers_filled. What differences do you notice between the plots?
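To see what the two cleaning strategies do before plotting, here is a small sketch on a made-up table. The column names `style`, `abv`, and `ibu` and the values in `beers_demo` are assumptions for illustration and may not match the actual beers dataframe.

```python
import numpy as np
import pandas as pd

# Made-up data: the 'Sour' style never reports an IBU value
beers_demo = pd.DataFrame({
    'style': ['Lager', 'Lager', 'IPA', 'IPA', 'Sour'],
    'abv': [0.045, 0.050, 0.065, 0.070, 0.040],
    'ibu': [np.nan, 12.0, 60.0, np.nan, np.nan],
})

# Strategy 1: drop every row with a missing value, then average by style
mean_dropped = beers_demo.dropna().groupby('style')[['abv', 'ibu']].mean()

# Strategy 2: replace missing values with 0, then average by style
mean_filled = beers_demo.fillna(0).groupby('style')[['abv', 'ibu']].mean()

print(mean_dropped)
print(mean_filled)
```

In this toy example, dropping rows removes the 'Sour' style entirely, while filling with 0 keeps it but pulls the mean IBU of the other styles toward zero.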

  2. Gordon Moore made an empirical prediction that the number of transistors on a computer chip would double every two years. This prediction became known as Moore’s law. Moore originally expected this empirical relation to hold only from 1965 to 1975 [1,2], but semiconductor manufacturers kept pace with Moore’s law until 2015.

    In the folder “…/data” is a comma-separated value (CSV) file, “transistor_data.csv”, taken from Wikipedia in 01/2020. Load the csv into a pandas dataframe. It has the following headings:

    Processor

    MOS transistor count

    Date of Introduction

    Designer

    MOSprocess

    Area

    a. In the year 2017, what was the average MOS transistor count? Make a boxplot of the transistor count in 2017 and find the first, second, and third quartiles.

    b. Create a semilog y-axis scatter plot (i.e. plt.semilogy) for the “Date of Introduction” vs “MOS transistor count”. Color the data according to the “Designer”.
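The selection and quartile steps in part a could be sketched as follows. The table `demo` is made up for illustration; only the column names come from the headings above.

```python
import numpy as np
import pandas as pd

# Made-up rows using the column names listed above
demo = pd.DataFrame({
    'Date of Introduction': [2016, 2017, 2017, 2017, 2017, 2017],
    'MOS transistor count': [1e9, 2e9, 4e9, 6e9, 8e9, 10e9],
    'Designer': ['X', 'X', 'Y', 'Y', 'X', 'Y'],
})

# Keep only the transistor counts of chips introduced in 2017
in_2017 = demo['Date of Introduction'] == 2017
counts_2017 = demo.loc[in_2017, 'MOS transistor count']

mean_2017 = counts_2017.mean()

# First, second (median), and third quartiles
q1, q2, q3 = np.percentile(counts_2017, [25, 50, 75])

# Group the rows by designer; each group can be plotted with its own color
designer_counts = {name: len(group) for name, group in demo.groupby('Designer')}
```

For part b, the same `demo.groupby('Designer')` loop can call plt.semilogy once per group, so each designer gets its own color and legend label.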

Problems Part 3

  1. There is a csv file in ‘…/data/primary-energy-consumption-by-region.csv’ that has the energy consumption of different regions of the world from 1965 until 2018 (source: Our World in Data). Compare the energy consumption of the United States to all of Europe. Load the data into a pandas dataframe. Note: you can get certain rows of the dataframe by specifying what you’re looking for, e.g. EUR = dataframe[dataframe['Entity']=='Europe'] will give us all the rows for Europe’s energy consumption.

    a. Plot the total energy consumption of the United States and Europe

    b. Use a linear least-squares regression to find a function for the energy consumption as a function of year

    energy consumed = \(f(t) = At+B\)

    c. At what year would you split the data and use two lines, as you did for the land temperature anomaly? Split the data and perform two linear fits.

    d. What is your prediction for US energy use in 2025? How about European energy use in 2025?
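One way to approach parts b through d is np.polyfit with degree 1. The sketch below uses a synthetic, made-up series rather than the real energy data, and `split_year = 1990` is an arbitrary choice for illustration, not a suggested answer.

```python
import numpy as np

years = np.arange(1965, 2019)

# Synthetic series (not real data): slope 30 before the split, slope 5 after
split_year = 1990
energy_demo = np.where(years < split_year,
                       1000 + 30.0 * (years - 1965),
                       1750 + 5.0 * (years - split_year))

# Single least-squares line f(t) = A*t + B over the whole record
A, B = np.polyfit(years, energy_demo, 1)

# Two separate lines, one on each side of the split year
early = years < split_year
A1, B1 = np.polyfit(years[early], energy_demo[early], 1)
A2, B2 = np.polyfit(years[~early], energy_demo[~early], 1)

# Extrapolate the later line to 2025
prediction_2025 = A2 * 2025 + B2
```

Extrapolating with the later line rather than the single overall line assumes the most recent trend continues, which is a modeling choice worth justifying in your answer.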

energy = pd.read_csv('../data/primary-energy-consumption-by-region.csv')
  2. In 02_Seeing_Stats, you plotted Gordon Moore’s empirical prediction that the number of transistors on a computer chip would double every two years. This prediction became known as Moore’s law. Moore originally expected this empirical relation to hold only from 1965 to 1975 [1,2], but semiconductor manufacturers kept pace with Moore’s law until 2015.

Use a linear regression to find your own historical Moore’s Law.

Use code from 02_Seeing_Stats to plot the semilog y-axis scatter plot (i.e. plt.semilogy) for the “Date of Introduction” vs “MOS transistor count”. Color the data according to the “Designer”.

Create a linear regression for the data in the form of

\(log(transistor~count)= f(date) = A\cdot date+B\)

rearranging

\(transistor~count= e^{f(date)} = e^B e^{A\cdot date}\)

You can perform a least-squares linear regression using the following assignments

\(x_i=\) dataframe['Date of Introduction'].values

and

\(y_i=\) np.log(dataframe['MOS transistor count'].values)

a. Plot your function on the semilog y-axis scatter plot

b. What are the values of constants \(A\) and \(B\) for our Moore’s law fit? How does this compare to Gordon Moore’s prediction that MOS transistor count doubles every two years?

data = pd.read_csv('../data/transistor_data.csv')
data = data.dropna()
xi=data['Date of Introduction'].values
TC=data['MOS transistor count'].values
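The fit and the comparison with a two-year doubling time could be sketched as below. The arrays `xi_demo` and `TC_demo` are synthetic stand-ins that double exactly every two years, so the sketch only checks the method; the real data will give different constants.

```python
import numpy as np

# Synthetic stand-in data: the count doubles exactly every two years
xi_demo = np.arange(1971, 2020, 2.0)
TC_demo = 2250 * 2.0 ** ((xi_demo - 1971) / 2)

# Linear least squares on log(count) = A*date + B
A, B = np.polyfit(xi_demo, np.log(TC_demo), 1)

# count = e^B e^(A*date) doubles when A*dt = ln(2)
doubling_time = np.log(2) / A

# Evaluate the fit; exponentiating the sum avoids overflow in e^(A*date) alone
TC_fit = np.exp(A * xi_demo + B)
```

A doubling time near 2 years corresponds to \(A \approx \ln(2)/2 \approx 0.35\) per year, which gives a direct way to compare the fitted \(A\) with Moore's prediction.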

Problems Part 4

1. Buffon’s needle problem is another way to estimate the value of \(\pi\) with random numbers. The goal in this Monte Carlo estimate of \(\pi\) is to create a ratio that is close to 3.1415926… similar to the example with darts points lying inside/outside a unit circle inside a unit square.

Figure: Buffon's needle for parallel lines

In this Monte Carlo estimation, you only need to know two values:

  • the distance from line 0, \(x = [0,~1]\)

  • the orientation of the needle, \(\theta = [0,~2\pi]\)

The y-location does not affect whether the needle crosses line 0.

a. Generate 100 random x and theta values. Remember that \(\theta = [0,~2\pi]\).

b. Calculate the x locations of the 100 needle ends, e.g. \(x_{end} = x \pm \cos\theta\), since the length is unit 1.

c. Use np.logical_and to find the number of needles that have minimum \(x_{end~min}<0\) and maximum \(x_{end~max}>0\). The ratio \(\frac{number~of~needles~with~x_{end~min}<0~and~x_{end~max}>0}{number~of~needles} \approx \frac{2}{\pi}\) for large values of \(number~of~needles\).
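Steps a through c can be sketched with NumPy arrays as follows. The seed and the choice `N = 100_000` are assumptions made here so that the ratio settles near \(2/\pi\); the problem itself asks for 100 needles, which gives a much noisier estimate.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, chosen only for repeatability
N = 100_000                      # many more than 100, to show convergence

# a. random distance from line 0 and random orientation
x = rng.random(N)                    # x in [0, 1)
theta = rng.random(N) * 2 * np.pi    # theta in [0, 2*pi)

# b. x locations of the two needle ends, x_end = x +/- cos(theta)
x_end_a = x + np.cos(theta)
x_end_b = x - np.cos(theta)
x_end_min = np.minimum(x_end_a, x_end_b)
x_end_max = np.maximum(x_end_a, x_end_b)

# c. a needle crosses line 0 when its ends lie on opposite sides of 0
crosses = np.logical_and(x_end_min < 0, x_end_max > 0)
ratio = crosses.sum() / N

# ratio is approximately 2/pi, so pi is approximately 2/ratio
pi_estimate = 2 / ratio
print(ratio, pi_estimate)
```

Setting `N = 100` reproduces the problem as stated; rerunning with different seeds then shows how much the estimate scatters.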

2. Build a random walk data set with steps \(dx\) and \(dy\) each between \(-1/2\) and \(1/2~m\). If 100 particles take 10 steps, calculate the number of particles that move further than 0.5 m.

Bonus: Can you do the work without any for-loops? Change the size of dx and dy to account for multiple particles.
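A loop-free version could look like the sketch below. It assumes the steps are uniformly distributed and that "further than 0.5 m" means the final straight-line distance from the starting point; both are interpretations, not something the problem statement fixes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
n_steps, n_particles = 10, 100

# One row per step, one column per particle; uniform steps in [-0.5, 0.5) m
dx = rng.uniform(-0.5, 0.5, size=(n_steps, n_particles))
dy = rng.uniform(-0.5, 0.5, size=(n_steps, n_particles))

# Summing over the step axis gives each particle's final position
x_final = dx.sum(axis=0)
y_final = dy.sum(axis=0)

# Final distance from the origin and the count of particles beyond 0.5 m
distance = np.sqrt(x_final**2 + y_final**2)
n_far = int(np.sum(distance > 0.5))
print(n_far)
```

Using np.cumsum(dx, axis=0) instead of the sum keeps every intermediate position, which is useful if you want to plot the paths.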

3. 100 steel rods are going to be used to support a 1000 kg structure. The rods will buckle when the load in any rod exceeds the critical buckling load

\(P_{cr}=\frac{\pi^3 Er^4}{16L^2}\)

where E=200e9 Pa, r=0.01 m +/-0.001 m, and L is the length of the rods supporting the structure. Create a Monte Carlo model montecarlo_buckle that predicts the mean and standard deviation of the buckling load for 100 samples with normally distributed dimensions r and L.

mean_buckle_load,std_buckle_load=\
montecarlo_buckle(E,r_mean,r_std,L,N=100)

a. What is the mean_buckle_load and std_buckle_load for L=5 m?

b. What length, L, should the beams be so that only 2.5% will reach the critical buckling load?

def montecarlo_buckle(E,r_mean,r_std,L,N=100):
    '''Generate N rods of length L with radii of r=r_mean+/-r_std
    then calculate the mean and std of the buckling loads for the
    rod population holding a 1000-kg structure
    Arguments
    ---------
    E: Young's modulus [note: keep units consistent]
    r_mean: mean radius of the N rods holding the structure
    r_std: standard deviation of the radius of the N rods holding the structure
    L: length of the rods (or the height of the structure)
    N: number of rods holding the structure, default is N=100 rods
    Returns
    -------
    mean_buckle_load: mean buckling load of N rods under 1000*9.81/N-Newton load
    std_buckle_load: std dev buckling load of N rods under 1000*9.81/N-Newton load
    '''
    
    return mean_buckle_load, std_buckle_load
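The sampling step inside such a function might be sketched as below. The helper name `sample_buckling_loads` and its `seed` argument are hypothetical, and the sketch only randomizes the radius: the length L is held fixed because the problem gives no standard deviation for it. Treat it as one possible starting point, not the intended solution.

```python
import numpy as np

def sample_buckling_loads(E, r_mean, r_std, L, N=100, seed=None):
    '''Hypothetical helper: draw N normally distributed radii and return
    the critical buckling load P_cr = pi^3 E r^4 / (16 L^2) of each rod.
    Assumption: L is held fixed; only r is random.'''
    rng = np.random.default_rng(seed)
    r = rng.normal(r_mean, r_std, size=N)
    return np.pi**3 * E * r**4 / (16 * L**2)

# Values from the problem statement: E = 200e9 Pa, r = 0.01 +/- 0.001 m, L = 5 m
loads = sample_buckling_loads(200e9, 0.01, 0.001, L=5, N=100, seed=1)
mean_load, std_load = loads.mean(), loads.std()

# Each of the 100 rods carries an equal share of the 1000-kg structure
load_per_rod = 1000 * 9.81 / 100

# Fraction of sampled rods whose buckling load is below their share
fraction_buckled = np.mean(loads < load_per_rod)
print(mean_load, std_load, fraction_buckled)
```

For part b, one option is to repeat this calculation over a range of L values and look for the length at which `fraction_buckled` is about 0.025.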