Using ChatGPT has been a game-changer for me. With its numerous capabilities, it's no wonder that it's gaining popularity. During my time with ChatGPT, I noticed a challenge in organizing and saving the conversation details. There were just too many valuable insights buried within the thousands of responses I received in the past month. I wanted to keep track of these nuggets of wisdom and find a way to put them into action later.
If you want to jump straight into the repo, here is a link to the scripts and some details re: configuration: https://github.com/sfboss/chatgpt_ripper
Learning While Failing is Fun
I attempted several methods to achieve this goal, including Google Docs and Google Apps Script. This led me down a path of learning about Google Slides and slide design, but the process was still not streamlined enough for my needs.
Dev Tools and Chrome
If you're new to Developer Tools in Chrome, don't worry: opening them is as easy as right-clicking anywhere on a page and selecting "Inspect". For those unfamiliar with the tool, Developer Tools is a feature built into the browser that lets you monitor technical details of, and interact with, the page you're viewing. It has a lot of built-in functionality for developers to quickly access information about a page and experiment with its styling and other front-end characteristics. If you need to find the CSS classes for a specific element or ID, or make changes at the source level, Developer Tools is the perfect tool for the job.
One less talked about feature within Dev Tools is the "Network" tab. This tab logs every HTTP connection made while you browse, including each service that was hit and the corresponding request/response. For example, when using ChatGPT, the Network tab logs an entry every time you click a conversation, refresh the page, or perform any action that communicates with the backend. When you click a conversation in the navigation menu, an entry titled with its UUID (the unique ID in the URL for your chat) is logged in the Network tab. This happens automatically and is a normal part of how the web works.
Developers can tap into this data and use Developer Tools to access specific information, and even build commands based on that data or on elements of the page. They can also replay network events using JavaScript and CLI commands. For example, a developer might want to see what a cURL command would look like that performs the same action as a backend query from the site, including all of the header information and the user agent, as if it were a browser request.
While the full range of commands and capabilities of Dev Tools is beyond the scope of this article, it's important to know that this information is available. One specific command that is useful here is "Copy as cURL". If you click it, your clipboard will contain a CLI-ready command that includes your user agent and the authorization information from when you logged into ChatGPT.
It's important to note that this includes your session ID for ChatGPT, which acts as a password. So, treat this information with care and only use it when you've decided to export your ChatGPT conversations. The purpose of this tool is to save you time, not to provide unauthorized access to information. Logging into ChatGPT is a necessary prerequisite to having the authorization information needed to save the conversations.
Digging into the Data
Developer Tools also breaks down the request and response of each API call. From an API perspective, this is incredibly useful for viewing the headers, the payload, and all of the conversation details related to the API call that retrieves conversations.
When I saw that the UUIDs were being passed in this way, I was curious to see whether I could pull the information down on the command line and find an easy way to export the conversations. So, let's try it out by copying the transaction as cURL and seeing what happens:
Note: In the example below, I have replaced the values of the authorization and cookie headers (under their respective -H flags) because they contain sensitive information and are 100,000 characters long.
Right-click the "conversations" entry in the Network tab, select "Copy as cURL", and your clipboard should contain something similar to the command below:
curl 'https://chat.openai.com/backend-api/conversations?offset=20&limit=20' \
  -H 'authority: chat.openai.com' \
  -H 'accept: */*' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'authorization: $bearer_token' \
  -H 'content-type: application/json' \
  -H 'cookie: $cookie_detail' \
  -H 'referer: https://chat.openai.com/chat/cbeaa877-0f33-4d8a-82f3-b9fe1b8cddcb' \
  -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' \
  --compressed
Note that the authorization and cookie values are the only things changed to placeholders, because they are live access tokens.
If we run that command from the command line, it will repeat the request as if it were sent from a browser, including the user agent and the other parameters. The request will look like a browser request to the ChatGPT backend, and we should receive data in response just as if the backend were servicing the browser's clicks and interactions.
Here is an example of the command being run from the CLI:
Example Command
As you can see, the output includes a long string of data, but this should give you an idea of what we are dealing with.
Conversations API URL Parameters:
The Conversations API provides the ability to paginate responses. This means that if the resulting conversation list has 10,000 items, we can specify "limit" and "offset" values to control which batch of conversations to retrieve. For this demo, the limit is set to 20 and the offset to 0, but you may have different requirements. I wanted to avoid making any unnatural requests to their API that could potentially result in an IP ban. That said, I successfully pulled back 100 conversations in a single request under controlled conditions, so processing more data at once is possible. The goal of this step in the script is to create the "UUIDs.txt" file under the log folder; whatever is in that file will be processed. Understanding this file is key to understanding the architecture and operation of the script.
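As a rough sketch of that step (assuming the response is a JSON object whose items array carries an id field per conversation, and that $bearer_token and $cookie_detail hold the values copied from Dev Tools), it could look like this:
# Hypothetical sketch: fetch one page of conversations and append the UUIDs to log/UUIDs.txt
mkdir -p log
curl 'https://chat.openai.com/backend-api/conversations?offset=0&limit=20' \
  -H "authorization: $bearer_token" \
  -H "cookie: $cookie_detail" \
  --compressed | jq -r '.items[].id' >> log/UUIDs.txt
Adjust the offset and limit values to page through larger conversation histories in batches.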
Getting Conversation UUIDs and Processing:
Once we know how to acquire the UUIDs, the next step is to pull the UUIDs out of the conversations response and store them. Looking at the "Network" tab in Dev Tools, we can see that each time a conversation is loaded, its UUID is used to query for the conversation details. This is our entry point for taking the UUIDs from the conversations data and pulling down the chats. The following steps outline the process, with a sketch of the loop after the list:
- Get the conversation UUIDs via the API
- Get the chats individually for every UUID retrieved
- Time out for a random amount of time (10 to 20 seconds) between chat grabs
- Store the data as <UUID>.json files in the "/chats/" folder
- When complete, build one giant array of chats and save it as consolidated.json
Awesome! Now we have a plan to pull down the data and catalogue the ChatGPT conversations.
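To make the plan concrete, here is a minimal sketch of that loop, simplified from the actual script. It assumes the per-conversation endpoint mirrors what the Network tab shows (backend-api/conversation/<UUID>) and that the auth values have already been extracted from the config file:
#!/bin/bash
# Minimal sketch of the fetch loop; the real chatgpt_rip.sh handles more edge cases.
# Assumes log/UUIDs.txt exists and $bearer_token / $cookie_detail are set.
mkdir -p chats
while read -r uuid; do
  # Pull the conversation detail for this UUID, mirroring the request seen in the Network tab
  curl "https://chat.openai.com/backend-api/conversation/$uuid" \
    -H "authorization: $bearer_token" \
    -H "cookie: $cookie_detail" \
    --compressed -o "chats/$uuid.json"
  # Pause a random 10 to 20 seconds so the requests look like normal browsing (RANDOM is a bashism)
  sleep $((10 + RANDOM % 11))
done < log/UUIDs.txt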
Solving for the Dynamic Data Values in Our cURL Command:
The cURL command uses the session ID from the user's authentication into the ChatGPT website. You may have noticed that the website performs a check for human interaction, so there are various parameters to take into consideration. I wanted to make this process as simple as possible, so I decided that if a config file existed where you could paste the contents of the cURL command from your clipboard, I could programmatically extract the session ID, cookie, and referer values specific to your chat session and use them to run the commands.
Example of curl to clipboard process:
This solution was easier than trying to explain to people how to parse the individual strings of data, and it made the process more hands-off without requiring a proper authentication flow into ChatGPT.
Note: OpenAI does have an extensive API that is straightforward to use, and they provide example cURL commands for various tasks. At the time of writing this script, however, the API did not allow for downloading historical ChatGPT conversations, so I needed a "ripper" to get the chats. If you are developing a proper application with OpenAI and ChatGPT embedded, the API documentation can provide details on how to officially communicate with the backend. The scope of this post is non-technical users who don't know how to navigate code and JSON easily and want to rip the conversations to their own machine, and maybe learn something along the way.
Finally, paste the value into the config file so that the script can read what it needs without you having to dissect it.
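As a rough sketch of what that extraction could look like (assuming the pasted command lands in config/cURL.txt in the same -H 'name: value' format shown earlier; the real script may differ in its details):
# Hypothetical sketch: pull the three values the script needs out of config/cURL.txt
bearer_token=$(grep -o "authorization: [^']*" config/cURL.txt | sed 's/authorization: //')
cookie_detail=$(grep -o "cookie: [^']*" config/cURL.txt | sed 's/cookie: //')
referer=$(grep -o "referer: [^']*" config/cURL.txt | sed 's/referer: //')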
The Unsung Perk of ChatGPT: Writing Shell Scripts
ChatGPT's ability to write shell scripts is a hidden gem. While the syntax of shell scripting can be a pain, ChatGPT is great at it. By providing specific instructions, you can save an enormous amount of time writing shell scripts.
I recently demonstrated this by using ChatGPT for DevOps-related tasks. One of ChatGPT's key benefits is not only its ability to answer questions but also to write code in languages such as shell and Python. This takes the cognitive load of piecing the code together off your shoulders and lets you focus on the architecture and on creating modular scripts. However, it's important to note that ChatGPT won't make a non-technical person a CLI master or shell-script genius. The base knowledge still needs to be there, and the user must be prepared to handle and fix any errors that arise.
I put this to the test by using ChatGPT to write a shell script for a side project. The goal was to create an app that saves ChatGPT conversations and makes it easy for non-programmers to use. The shell script was disjointed, but it was still a success. I plan to simplify the script by translating it into Python as part two of this project. ChatGPT seems to do well with converting code to another language, as long as the original code works and performs as desired.
The app works as follows:
- User navigates to the ChatGPT OpenAI site and completes the login process.
- User updates the "/config/cURL.txt" file with the Dev Tools clipboard data.
- User runs the script (sh chatgpt_rip.sh from the root of the local chatgpt_ripper).
- The script retrieves the conversations' UUIDs and saves them.
- The script uses cURL to retrieve the data for each UUID.
- The conversations are saved in the "/chats/" folder as ".json" files named by UUID.
- The script pauses for a random 10-20 seconds between requests to avoid appearing like a bot.
- The files in the "chats" folder are processed and consolidated into one JSON array.
- The consolidated.json file is created and contains an array of objects that represent the chats.
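That final consolidation step can be as small as a single jq command in slurp mode, which reads every per-UUID file and wraps them in one array (a sketch, assuming one chat object per file):
# Combine every per-UUID JSON file into a single array of chat objects
jq -s '.' chats/*.json > consolidated.json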
Reading the Data
Now that the consolidated.json file is filled with the conversation details, you can start working with the data. I've been using basic jq commands to review it, and you can customize these commands to your needs. The following commands let you view the conversations, view only the user's messages, and view only the assistant's messages.
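For example (sketches, assuming each chat object keeps the backend's mapping structure, where each node's message carries an author.role and content.parts; adjust the paths if your export differs):
# View the conversations (one object per chat)
jq '.[]' consolidated.json

# View only the user's messages
jq -r '.[].mapping[].message | select(. != null) | select(.author.role == "user") | .content.parts[]?' consolidated.json

# View only the assistant's messages
jq -r '.[].mapping[].message | select(. != null) | select(.author.role == "assistant") | .content.parts[]?' consolidated.json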
That is it! I hope you were able to benefit from this and that it worked for you, and that it was as simple as pasting the cURL command into the config file and running the script. If not, I tried to break it down into small pieces as a learning experience, and I think that will aid anyone trying to customize it to their liking. One more time for posterity: this has zero to do with unauthorized access or hacking or being able to see anything you shouldn't. A prerequisite to this working is you logging into the UI and their backend giving you a token, so it's "authorized". You are just going in by a different route than the traditional web browser and getting some extra data.