I have been working with RavenDB for a long time, so I have learned a few things about it and the processes within it. Recently a developer named Kamran Ayub began his journey learning RavenDB and he had some questions. Here they are, and I will try to answer them to the best of my knowledge.
- Do I need to create indexes for getting things by ID (I use int IDs) or does Raven do that for me?
It is frowned upon to create indexes to get documents by id. The reason is that RavenDB really has two data storage engines: Esent and Lucene. Any query you perform against Lucene could potentially not be up to date, whereas Esent is ACID. Use the Load method to make sure your documents come back accurately and quickly when you fetch them by id.
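To make the difference concrete, here is a minimal sketch assuming the RavenDB 2.x .NET client. The `User` class, the server URL, and the database name are made up for illustration.

```csharp
using System.Linq;
using Raven.Client;
using Raven.Client.Document;

// Hypothetical document type, for illustration only.
public class User
{
    public string Id { get; set; }   // e.g. "users/1"
    public string Name { get; set; }
}

public static class LoadVersusQuery
{
    public static void Run()
    {
        // The URL and database name are assumptions; point them at your own server.
        using (IDocumentStore store = new DocumentStore
        {
            Url = "http://localhost:8080",
            DefaultDatabase = "Demo"
        }.Initialize())
        using (IDocumentSession session = store.OpenSession())
        {
            // Load reads straight from the document storage; no index is involved.
            User byId = session.Load<User>("users/1");

            // Query goes through a Lucene index, so it may lag behind recent writes.
            var byName = session.Query<User>()
                                .Where(u => u.Name == "Kamran")
                                .ToList();
        }
    }
}
```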
Another issue related to your question is the matter of integer identifiers. This is your relational mind trying to make sense of the NoSQL world. Leave integers behind and embrace strings. It will make your code so much cleaner.
- What does Store() and StoreAllFields() do behind the scenes? I have used Lucene in the past and understand it will store the values, but I'm interested in understanding how Raven "projects" from an index.
The main reason you want to Store fields is projections: storing lets RavenDB pull the values directly from the index with no need for a document lookup (that lookup is otherwise internal to RavenDB). Sorting is a separate matter. Lucene sorts on the indexed values, so if sorting on a static index does not behave the way you expect, look at how the field is indexed and at its sort options rather than at whether it is stored.
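As a rough sketch of what storing fields looks like, assuming the RavenDB 2.x client: the `Product` and `ProductListItem` classes and the index name are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Abstractions.Indexing;
using Raven.Client;
using Raven.Client.Indexes;

// Hypothetical document and projection types, for illustration only.
public class Product
{
    public string Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
    public string Description { get; set; }
}

public class ProductListItem
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class Products_ByName : AbstractIndexCreationTask<Product>
{
    public Products_ByName()
    {
        Map = products => from p in products
                          select new { p.Name, p.Price };

        // Keep the values of these fields inside the index itself.
        Store(x => x.Name, FieldStorage.Yes);
        Store(x => x.Price, FieldStorage.Yes);
        // Alternatively, store everything the map emits:
        // StoreAllFields(FieldStorage.Yes);
    }
}

public static class ProjectionUsage
{
    public static List<ProductListItem> ListItems(IDocumentSession session)
    {
        // AsProjection asks for only the fields that exist on ProductListItem.
        return session.Query<Product, Products_ByName>()
                      .AsProjection<ProductListItem>()
                      .ToList();
    }
}
```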
- What's the current best practice to project? I'm using AsProjection because it's easy to type in my query... I looked at Results Transformers but it didn't make sense to me. I know TransformResults is not recommended anymore. There's also "As"... is that the same?
If your concern is to bring back as little data as possible to meet your needs, then Transformers are your best option. They run server side and are very powerful. TransformResults is still recommended, just not within your map or map/reduce indexes. Split it out into separate transformers so you can reuse them.
You seem set on projecting from the index, but in my experience that is not a good idea. Projections from an index can often contain stale data and fail to reflect recent changes to your documents. Imagine changing values on a document and then moving to a page where you display the documents in tabular form. If the transition to that page is quick, you will notice stale results. This can lead to confusion.
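A standalone transformer might look roughly like this. It is a sketch assuming the RavenDB 2.5 client; the `Order` and `OrderSummary` classes and the transformer name are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client;
using Raven.Client.Indexes;

// Hypothetical document and result types, for illustration only.
public class Order
{
    public string Id { get; set; }
    public string Customer { get; set; }
    public decimal Total { get; set; }
    public List<string> Lines { get; set; }
}

public class OrderSummary
{
    public string Customer { get; set; }
    public decimal Total { get; set; }
}

// Lives on its own, separate from any index, so several queries can reuse it.
public class Orders_Summary : AbstractTransformerCreationTask<Order>
{
    public Orders_Summary()
    {
        // Runs on the server and shapes each result before it is sent back.
        TransformResults = orders => from o in orders
                                     select new { o.Customer, o.Total };
    }
}

public static class TransformerUsage
{
    public static List<OrderSummary> Summaries(IDocumentSession session)
    {
        // The transformer has to be deployed first,
        // e.g. new Orders_Summary().Execute(store) at startup.
        return session.Query<Order>()
                      .TransformWith<Orders_Summary, OrderSummary>()
                      .ToList();
    }
}
```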
- When do you put in the TReduceResult type? It seems I have to do that with a map-only index for Raven to understand I want to project even though I'm not reducing.
I almost always use a TReduceResult with all my indexes. It is a good practice to get into. You usually need them for all your indexes whether they be map or map/reduce.
- Is just using the Sort() call enough to optimize sorting? Why do I even have to call Sort() if Raven knows the field type (does it?)? For example, Sort(r => r.Count, xxx.Int)?
You are referring to the Sort call from within the index definition. The reason you call that method is when you want to change the default behavior from inside RavenDB. RavenDB is good about realizing the type of the sort necessary for each field, but this allows you to override it when it gets it wrong (which again, is not often).
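For reference, here is a sketch of an index that declares a TReduceResult and overrides the sort type, assuming the RavenDB 2.x client. The `Post` class and the index name are hypothetical; `SortOptions.Int` is the value I assume the `xxx.Int` in the question refers to.

```csharp
using System.Linq;
using Raven.Abstractions.Indexing;
using Raven.Client.Indexes;

// Hypothetical document type, for illustration only.
public class Post
{
    public string Id { get; set; }
    public string[] Tags { get; set; }
}

// The second type argument is the TReduceResult.
public class Posts_TagCount : AbstractIndexCreationTask<Post, Posts_TagCount.Result>
{
    public class Result
    {
        public string Tag { get; set; }
        public int Count { get; set; }
    }

    public Posts_TagCount()
    {
        Map = posts => from post in posts
                       from tag in post.Tags
                       select new { Tag = tag, Count = 1 };

        Reduce = results => from r in results
                            group r by r.Tag into g
                            select new { Tag = g.Key, Count = g.Sum(x => x.Count) };

        // Tell RavenDB explicitly to sort Count as an integer.
        Sort(r => r.Count, SortOptions.Int);
    }
}
```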
- What are some performance tips with Raven? I am seeing overall slower times in Raven compared to equivalent EF queries when in the cloud... is there some optimizations I can make or maybe some tricks I can do to speed it up? I think mainly it's network latency.
Network latency is always going to kill you. The issue I have with my Azure site calling RavenHQ is network latency. Queries against the same data set can take 3x longer. If this is a business critical app, I recommend that you move to AWS, which is on the same backbone as RavenHQ.
The biggest performance tip is to understand what is coming back from your server and reduce the size of requests to RavenDB. If you can query for 10 documents instead of 100 then you'll have a better user experience.
- Even though Raven returns 304 Not Modified, subsequent queries in the same request seem to hit the server just to get a 304 response... is there a way to avoid that? Why doesn't Raven know not to query again within the same request?
RavenDB leans heavily on REST principles and the HTTP protocol. If you want to avoid that call for a 304 Not Modified, I suggest you use the aggressive caching feature in RavenDB (AggressivelyCacheFor on the document store). It keeps your results in an in-memory cache for short periods of time and serves them from there without making the request you are talking about.
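Here is a minimal sketch of aggressive caching, assuming the RavenDB 2.x client. The `User` class and the five minute window are arbitrary choices for illustration.

```csharp
using System;
using Raven.Client;

// Hypothetical document type, for illustration only.
public class User
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class AggressiveCachingExample
{
    public static User LoadCached(IDocumentStore store, string id)
    {
        using (IDocumentSession session = store.OpenSession())
        // Inside this scope, cached responses younger than five minutes are
        // served locally without asking the server for a 304.
        using (store.AggressivelyCacheFor(TimeSpan.FromMinutes(5)))
        {
            return session.Load<User>(id);
        }
    }
}
```

The trade-off is that anything read inside the scope can be up to five minutes old, so it fits data that changes rarely.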
Kamran, your questions seemed to center around Projection from the Lucene index. I would personally suggest you do not go down this road. Lean more heavily on your documents as they are, and also lean on Transformers. You will find that they offer a better experience than projections, because they can never go stale, while your projections are at the mercy of Lucene and the indexing process (which can slow down at times).