HW_01 - Generative AI writing analysis and Monte Carlo research

import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import pandas as pd


In this assignment, you will use prompt engineering with a generative AI tool to produce a 500-word document. Then you will edit that document to use active phrasing and add your own ideas.

Scenario: Your goal is to create a short technical report to convince your manager that you should use Monte Carlo models in your engineering work.

Choose your role:

  • lead engineer

  • junior engineer

  • managing engineer

Choose your company's product:

  • bicycles

  • airplane engines

  • toothbrushes

  • corkscrews

Goal: Justify the use of Monte Carlo methods in your role to design your product. This could include:

  • quantifying uncertainty in the design

  • modeling process times

  • estimating the mechanical strength of the design

  • estimating the fatigue limits of the design

  • modeling changes in heat and mass transfer of devices

  • accounting for variables outside the engineering scope of the design
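To make the goal concrete, here is a minimal sketch of one item from the list above: using Monte Carlo sampling to estimate how often a part fails when both its strength and its load are uncertain. The bicycle-frame scenario, the normal distributions, and every number below are assumptions chosen only for illustration; they are not measured data.

```python
import numpy as np

# Hypothetical inputs for illustration only; replace with measured data.
rng = np.random.default_rng(42)  # seeded so the run is repeatable
n_trials = 100_000

# Assumed: yield strength of a frame tube, normally distributed (MPa)
strength = rng.normal(loc=275.0, scale=15.0, size=n_trials)
# Assumed: peak stress from rider loading, normally distributed (MPa)
stress = rng.normal(loc=200.0, scale=25.0, size=n_trials)

# A trial counts as a failure when the applied stress exceeds the strength
failures = stress > strength
p_fail = failures.mean()

# Standard error of a proportion estimated from n_trials independent samples
std_err = np.sqrt(p_fail * (1 - p_fail) / n_trials)

print(f"estimated failure probability: {p_fail:.4f} +/- {std_err:.4f}")
```

Because both assumed inputs are normal, this toy case also has a closed-form answer of about 0.005, which gives a quick sanity check on the simulation. The Monte Carlo approach earns its keep when the inputs are not normal or the model is nonlinear, where no closed form is available.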

Prompt Input and Output

-> copy-paste your prompts and outputs here

Revised document

-> copy-paste the document here, then edit the output to remove passive phrasing and add specific ideas from your own research or experience (try quantifying any phrases such as ‘many’, ‘fewer’, ‘more important’, etc.).

Run the cell below to install the tf_idf package so its functions are ready to use.

! pip install tf-idf-cosimm==0.0.2
import nltk
# The installed NLTK (3.9.1) loads its tokenizer from the 'punkt_tab' resource.
# Without it, tf_idf.preprocess_text raises "LookupError: Resource punkt_tab not found."
nltk.download('punkt_tab')
import tf_idf.core as tf_idf
AI = '''text from chatGPT'''
compare = tf_idf.preprocess_text(AI)
ME = '''my edited text'''
compare = pd.concat([compare, tf_idf.preprocess_text(ME)], 
                    ignore_index=True)
compare
   DOCUMENT           LOWERCASE          CLEANING           TOKENIZATION           STOP-WORDS       STEMMING
0  text from chatGPT  text from chatgpt  text from chatgpt  [text, from, chatgpt]  [text, chatgpt]  [text, chatgpt]
1  my edited text     my edited text     my edited text     [my, edited, text]     [edited, text]   [edit, text]
tf_idf.cosineSimilarity(compare)
   DOCUMENT           STEMMING         COSIM
0  text from chatGPT  [text, chatgpt]  1.000000
1  my edited text     [edit, text]     0.336097
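To see where a COSIM value like this can come from, here is a minimal sketch of TF-IDF cosine similarity written with only the standard library. It assumes the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 and L2-normalized vectors, which are the defaults in scikit-learn's TfidfVectorizer. The helper names `tfidf_vectors` and `cosine_similarity` are hypothetical, and this is not necessarily how tf_idf.cosineSimilarity is implemented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Guard against an empty document, whose norm would be zero
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors

def cosine_similarity(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())

# Stemmed tokens copied from the STEMMING column of the example output
ai_tokens = ["text", "chatgpt"]
me_tokens = ["edit", "text"]

ai_vec, me_vec = tfidf_vectors([ai_tokens, me_tokens])
similarity = cosine_similarity(ai_vec, me_vec)
print(f"{similarity:.6f}")
```

Under these assumptions the sketch gives 0.336097 for the toy pair, matching the COSIM column. The only shared stem is "text", and because it appears in both documents it gets the lowest IDF weight, so the two documents score as mostly dissimilar.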

Document analysis

  • Make a list of all the improvements and changes you made to the document

  • Use the tf_idf.cosineSimilarity function to compare the AI version to your own

Write a report on your intellectual property in the ‘revised document’.

  • How much can you claim as yours?

  • How many ideas came from AI?

  • How many ideas came from you?

  • Is this a new document?

  • If you had made this work with another person, not AI, would you need to credit this person as a coauthor?

  • What else can you discuss about this comparison and this process?