Llama(-2-7B) Do

R(etrieval) A(ugmented) G(eneration)

What is this and how does it work?

How it works:

You upload a collection of PDFs. (I designed it for research articles, but feel free to try whatever and let me know how it works.) Those PDFs are converted into text (in an imperfect manner; YMMV), and the text is then split into chunks and "embedded" into a "vector" "database". Basically, a pre-trained embedding model assigns each chunk a vector that captures the semantic content of that chunk.
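For the curious, here's a minimal sketch of what that ingestion step could look like. It assumes a Python stack with pypdf for text extraction and sentence-transformers for the embeddings, with a plain NumPy array standing in for the vector database; the filenames, chunk size, and model name are just illustrative, and the app itself may use different libraries.

```python
import json
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

CHUNK_SIZE = 1000  # characters per chunk; splitting on sentences would be smarter

def pdf_to_chunks(path: str) -> list[str]:
    """Extract text from a PDF (imperfectly! YMMV) and split it into fixed-size chunks."""
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]

# Embed every chunk: the model assigns each one a vector capturing its semantic content.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunks = [c for path in ["paper1.pdf", "paper2.pdf"] for c in pdf_to_chunks(path)]
vectors = embedder.encode(chunks, normalize_embeddings=True)  # shape: (n_chunks, dim)

np.save("vectors.npy", vectors)  # the "vector" "database"
with open("chunks.json", "w") as f:
    json.dump(chunks, f)  # keep the raw text so it can be handed to the LLM later
```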

Once this step is complete, you can ask the model (Llama-2-7B-Q3_K_M or ChatGPT-3.5-Turbo) a question related to the PDFs you uploaded. Your question gets a vector of its own, and the top_k most semantically similar chunks of text from your database are pulled and included as context, so the model can inform its response to your query with those chunks of text.
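And a matching sketch of the query step, under the same assumptions (the NumPy store from above, plus llama-cpp-python for the quantized Llama model; the GGUF filename, top_k default, and prompt template are mine, not necessarily the app's):

```python
import json
import numpy as np
from llama_cpp import Llama
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = np.load("vectors.npy")  # built during ingestion; rows are unit-normalized
with open("chunks.json") as f:
    chunks = json.load(f)

def retrieve(question: str, top_k: int = 4) -> list[str]:
    """Embed the question and return the top_k most semantically similar chunks."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = vectors @ q  # dot product == cosine similarity on normalized vectors
    return [chunks[i] for i in np.argsort(scores)[::-1][:top_k]]

llm = Llama(model_path="llama-2-7b.Q3_K_M.gguf")  # hypothetical local path

question = "What methods do these papers have in common?"
context = "\n\n".join(retrieve(question))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
print(llm(prompt, max_tokens=256)["choices"][0]["text"])
```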

How should I use it?

I mean, that's up to you. The use case I had in mind was something along the lines of: I have a few dozen papers on a related topic, and I want to be able to query them at a high level and get answers that incorporate findings/perspectives from all the relevant passages. You can see an example here. Test it out and let me know what you think.


Feedback: