Saturday, October 12, 2024

From Concept to PRD: My Journey Collaborating with AI on a RAG-based Extraction System

I recently embarked on an exciting project to overhaul our document attribute extraction system. What started as a simple idea quickly evolved into a comprehensive plan for a cutting-edge RAG-based system. Throughout this journey, I collaborated with an AI assistant, and I want to share how this partnership helped shape our project roadmap.

The Initial Concept


It all began when I approached our AI assistant with a straightforward question: "Code to fine tune a llama 3.2 model for nested structured data extraction from pdf files. What should be the training dataset format?"

Little did I know that this simple query would spark a series of discussions that would completely transform our approach to document extraction.

Embracing RAG: A Game-Changer

As we delved deeper into the possibilities, the AI suggested implementing a Retrieval-Augmented Generation (RAG) approach. This concept immediately piqued my interest. We explored how RAG could enhance our system's ability to handle complex, lengthy documents while maintaining high accuracy.

The AI provided a detailed explanation of how we could structure our system:
  1. Chunk the input document
  2. Create embeddings using an E5 model
  3. Generate synthetic answers with a fine-tuned LLaMA model
  4. Retrieve and re-rank relevant chunks using ColBERT v2
  5. Extract attributes from the top-ranked chunks
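
To make the retrieval half of that pipeline concrete, here is a rough toy sketch. It is not the design from the PRD: `embed` is a hypothetical bag-of-words stand-in for the E5 model, plain cosine ranking stands in for ColBERT v2 re-ranking, and the chunk size, overlap, and function names are all my own assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping word windows (size and overlap are assumed values)."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an E5 embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(document, query, k=2):
    """Rank chunks against the query (or a synthetic answer) and keep the best k."""
    q = embed(query)
    chunks = chunk_text(document)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]
```

In a real build, the top-ranked chunks would then be handed to the extraction model; here they are simply returned.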

This approach seemed promising, but I had concerns about performance, especially for transient documents that require quick processing.

Optimizing for Speed and Accuracy

Addressing my concerns, we brainstormed ways to optimize the system for transient documents. The AI suggested implementing a "fast track" pipeline that uses lighter models and skips some computationally expensive steps. This solution struck a balance between speed and accuracy, potentially processing transient documents 50-70% faster than the full pipeline.
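
One way to picture the fast-track idea is as a routing step in front of the pipeline. The sketch below is purely illustrative: the step names, the `is_transient` flag, and the choice of which steps get skipped are my assumptions, not details settled in the conversation.

```python
FULL_PIPELINE = [
    "chunk",
    "embed",
    "generate_synthetic_answer",
    "retrieve",
    "rerank",
    "extract",
]

# Assumed for illustration: the fast track drops the two most expensive steps.
FAST_TRACK_SKIPS = {"generate_synthetic_answer", "rerank"}


def plan_steps(is_transient):
    """Return the ordered list of steps to run for a document."""
    if not is_transient:
        return list(FULL_PIPELINE)
    return [step for step in FULL_PIPELINE if step not in FAST_TRACK_SKIPS]
```

Keeping the routing as data rather than branching code would make it easy to measure which skipped step actually buys the speed-up.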

Expanding Capabilities: Dependent Data Extraction

As we refined our plan, I realized we needed to handle more complex scenarios. I asked, "Can this be used for doing dependent data extraction? Like find a set of ids and extract specific set of attributes for each of those ids?"

The AI's response was enthusiastic and detailed. We worked together to design a two-stage extraction process:
  1. ID Extraction: Identify and extract a set of IDs from the document
  2. Attribute Extraction per ID: Perform targeted attribute extraction for each ID

This feature significantly expanded the versatility of our system, allowing it to handle nested data structures common in many business documents.
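
The two-stage flow can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `INV-<digits>` ID format and the `field: value` layout are hypothetical, and regular expressions stand in for the model calls that the real system would make at each stage.

```python
import re


def extract_ids(document):
    """Stage 1: collect unique IDs in order of first appearance.
    The INV-<digits> pattern is a hypothetical ID format."""
    seen = []
    for match in re.findall(r"INV-\d+", document):
        if match not in seen:
            seen.append(match)
    return seen


def extract_attributes(document, item_id, fields):
    """Stage 2: look only at lines mentioning this ID and pull 'field: value' pairs.
    A regex stands in for the targeted extraction call."""
    result = {}
    id_pattern = re.compile(rf"\b{re.escape(item_id)}\b")
    for line in document.splitlines():
        if not id_pattern.search(line):
            continue
        for field in fields:
            found = re.search(rf"{re.escape(field)}:\s*([^,;]+)", line)
            if found:
                result[field] = found.group(1).strip()
    return result


def extract_nested(document, fields):
    """Run both stages and return {id: {field: value}}."""
    return {
        item_id: extract_attributes(document, item_id, fields)
        for item_id in extract_ids(document)
    }
```

The word-boundary check in stage 2 keeps an ID such as `INV-1` from matching lines that only mention `INV-10`.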

Bringing It All Together: The PRD

As our ideas coalesced, I asked the AI to generate a Product Requirements Document (PRD). The resulting document was comprehensive, covering everything from key features and technical requirements to performance metrics and potential risks.

What impressed me most was how the PRD evolved through our conversation. When I requested updates to include new features or address specific concerns, the AI quickly incorporated these changes, resulting in a well-rounded, thoughtful project plan.

Lessons Learned

Reflecting on this experience, I've gained valuable insights into collaborating with AI:

  1. Iterative Refinement: Our initial idea evolved significantly through back-and-forth discussion. Don't be afraid to explore tangents or challenge the AI's suggestions.
  2. Leverage AI's Knowledge: The AI brought up concepts and technologies I hadn't considered, like using E5 for embeddings and ColBERT v2 for re-ranking. This broadened our solution space.
  3. Human Expertise is Crucial: While the AI provided extensive technical knowledge, my understanding of our specific needs and constraints was vital in shaping a practical solution.
  4. AI as a Brainstorming Partner: The AI excelled at generating ideas and fleshing out details, making it an excellent brainstorming partner.
  5. Clarity in Communication: Being clear and specific in my queries led to more targeted and useful responses.

Looking Ahead

This collaboration has set us on an exciting path. We now have a solid plan for a RAG-based document attribute extraction system that promises to be more accurate, flexible, and efficient than our current solution.

As we move into the implementation phase, I'm confident that the groundwork we've laid through this AI-assisted planning process will prove invaluable. It's a testament to how AI can augment human creativity and expertise, leading to more innovative and comprehensive solutions.

The journey from a simple question about dataset formats to a full-fledged PRD for a cutting-edge system has been enlightening. It's clear that AI assistants like the one I worked with are not just tools for answering questions, but partners in the creative and strategic thinking process.

I'm excited to see how this project unfolds and look forward to sharing more insights as we bring our RAG-based extraction system to life!


---


PS: This post and the entire conversation were produced with the Claude 3.5 Sonnet model.

Wednesday, June 2, 2021

Review of Intraday Trade Plan of 1st June 2021

Yesterday evening, I wasn't sure if I wanted to trade today. I still ended up trading because I had some time. Long story short, I came out positive today. I got burnt multiple times on both sides during reversals. I finally took a very low-risk trade, which pushed me into the green towards the end.


Ended the day up about 0.58% on the capital deployed. I guess for someone who's been trading without leverage for a while, the new changes didn't affect me much, based on the Basket Order Analysis I did in the morning.

Monday, May 31, 2021

Review of Intraday Trade Plan of 31st May 2021

This is a review of the trade plan that I posted here

A few points to highlight based on the trade plan:

I'm still afraid of profit booking and a downward fall before we continue to edge higher.

This played out right at the open. We opened flat and went down before getting bought up rapidly, with very strong bullishness until the end of the day. We were also dealing with a narrow CPR for the day. I started late today, after the 3 red candles, so I opened with the 15350 PE instead of the 15300 PE originally planned. Also, because we had already seen a big move down, I started with the 15600 CE on the top instead of the planned 15650 CE. At around 10:30 or so, when we crossed 15500, the 15600 CE premiums started to expand like crazy, so I switched to 15700 and eventually to 15750 as we took out each resistance through the day. As I was climbing up, I also moved the PEs up, booking profits along the way, all the way to 15450 towards the end. The last strangle I was operating was the 15450 PE and the 15750 CE.

The call and put premiums seemed to indicate a downward bias throughout the day. We saw 2 massive red candles at the close, which might be the beginning of profit booking as well. We also have large margin requirements starting tomorrow, so I'm not sure how that might affect prices overall.

We gained a little over 13 points for the day today, with an ROI of about 1% on the capital deployed.

All the paper trades that I took are described below:



If you're wondering why I'm taking paper trades instead of actual trades: I'm currently learning about the market and a few strategies. The idea is to get comfortable with position sizes and PnL before diving in with real money. This also introduces me to the whole ecosystem, so I can find my way around once the learning phase is over.

Large PUT side OI can be found at 15400, while a large amount of CALL side OI can be found at 16000. Tomorrow being a Tuesday, my workday schedule is fully packed, and we have new margin requirements as well. I'll probably sit tomorrow out and see how things go.

Trade Plan for 31st May 2021 - Intraday

We're at our ATH right now on Nifty. Also, given the huge gap up on Friday after the May monthly expiry, I'm still afraid of profit booking and a downward fall before we continue to edge higher. To play it safe, I feel a safe range for a short strangle (or an iron condor) on Monday (31st May) would be 15300 - 15650.

Base Trade - 15300 PE - 15650 CE

I'm hoping this 350-point strangle will have a little more than 40 - 45 points of premium in it, and I'm looking to capture around 8 - 10 points of decay.

Reasons behind 15300 support

We have very good support (OI) built up at 15400 and 15300. Even if we breach 15400, I feel 15300 would still be defended this week. We might change our view as the week progresses.

Reasons behind 15650 resistance

Even after a huge gap up and rapid buying by both FII and DII, we only made a jump of close to 140 points. I don't expect the 200-point difference to be taken out intraday.

Possible adjustments

We'll move our CE down if we start heading towards 15400. Similarly, we'll move our PE up once we cross 15500. Also, if we gap up again on Monday, cross over 15500, and don't get sold into in the first 15 minutes, we'll probably start directly with 15350 instead of 15300 in the base trade.

SL

The plan is not to lose more than 8 points on a Monday, since the decay isn't so great, especially when we don't have any overnight positions.
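
The decay target and the stop-loss both come down to simple arithmetic on the combined premium. Here is a minimal sketch with everything in index points; the 8-point target and 8-point stop come from this plan, while the function names and the premium values in the usage lines are made-up numbers for illustration.

```python
def strangle_pnl(entry_pe, entry_ce, current_pe, current_ce):
    """Points gained on a short strangle: premium collected minus cost to buy back."""
    return (entry_pe + entry_ce) - (current_pe + current_ce)


def next_action(pnl_points, target=8, stop=8):
    """Compare the running PnL with the plan's decay target and stop-loss."""
    if pnl_points >= target:
        return "book profit"
    if pnl_points <= -stop:
        return "exit at stop-loss"
    return "hold"


# Hypothetical premiums: sold at 22 + 21 = 43 points, now worth 18 + 16 = 34.
pnl = strangle_pnl(entry_pe=22, entry_ce=21, current_pe=18, current_ce=16)
print(pnl, next_action(pnl))  # 9 book profit
```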

Chart

Nifty Futures Chart


Wednesday, April 7, 2021

Trading Notes - 7th April 2021



An Inverted Cup and Handle formation is taking shape. It might take a few days, and the price might dive to the 85 - 75 range.



BEML seems to be bouncing off a support zone around 1240. The next target of 1540 is somewhere nearby. Also, for the last month or so, it seems to be forming some kind of bullish flag pattern.

BSE just broke out of a descending channel pattern. We're looking at a target of 650+ with an SL at around 580.


BAJAJCON seems to be forming an ascending triangle pattern. We're waiting for a breakout above 285. Once that happens, the target is around 350+, with an SL around 250.
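
For a setup like BAJAJCON, the levels translate directly into a risk-to-reward figure. A quick sketch, assuming an entry right at the 285 breakout level and treating the 350 target and 250 stop as exact numbers (the function name is my own):

```python
def risk_reward(entry, target, stop):
    """Reward per unit of risk for a long trade."""
    risk = entry - stop
    reward = target - entry
    if risk <= 0:
        raise ValueError("stop must be below entry for a long trade")
    return reward / risk


ratio = risk_reward(entry=285, target=350, stop=250)
print(round(ratio, 2))  # 65 / 35, roughly 1.86
```

So under those assumptions the trade risks 35 points to make 65, a little under 2:1.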



On BSOFT, we're waiting for a breakout above 285; the target would most likely be around 340+.



CARE RATING LTD. is in a bearish channel trend. It just took resistance at the channel at today's high. Unless it breaks out of that channel, it is going to continue downwards.


CCL has been in a bearish channel for more than 4 years now. Since last July, it seems to be taking support in the 220-225 range. Once it crosses 255-265, I guess we'll have a true breakout.