Friday, March 28, 2008

Implementing Disconnected Deletion Change Tracking

In one of my previous blog posts, I described some of the difficulties with change tracking entities which have been removed (i.e. deleted).

The main problem was that once you remove an entity whilst "disconnected", it's no longer referenced by anything, and so the object disappears and hence the entity is no longer available when re-attaching to a new data context.

In the short term, I added a property called "IsDeleted" to the entity base which people could use instead of calling the remove method (or setting a reference to a child property to null), but this had its disadvantages - mainly that the user would have to set it themselves (i.e. it wouldn't get picked up automatically on remove) and would unnaturally need to keep the object around.

So the obvious thing to do was to keep a reference (somewhere) to the entity when it's deleted (removed), so it can be re-attached and deleted later on. But where would this entity be kept? In the parent that deleted it? In the root object perhaps? In an external change tracking object?

To keep the Entity Base consistent, I decided to keep all the functionality in the Entity Base class, which ruled out having an external object tracking the changes.

Then I went through a lot of options regarding where to store the detached objects and came up with the simplest solution possible - I used the existing infrastructure provided by my Entity Base class - the ToEntityTree() method - as this was the option which seemed the least troublesome for the developer to use.

So, what I have done is implement a "SetAsChangeTrackingRoot()" method which the developer can call before making any changes to the entity objects.

The developer would use this method to mark the section of the Entity Tree (the Entity branch) that would be change tracked.

When this method is invoked on an entity, the following would happen:

1. A snapshot of the entity branch would be taken, starting from the entity that the method was invoked on.

2. Each entity in the branch would be told that it is being change tracked.

This meant a snapshot of the entity branch would be kept locally with the root of the branch, and it also meant the root was the entity that would be used for synchronisation with the data context later on.

From there, it was just a matter of waiting for the property changed event to fire on an entity (exposed by INotifyPropertyChanged), and checking whether the property being changed was a Foreign Key reference (meaning a child to parent relationship) and whether the value was being set to NULL (i.e. detaching the entity from its parent). Once these conditions were met, I set the IsDeleted flag automatically, marking the object for deletion.
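The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the actual Entity Base code: the TrackedEntity, Order and OrderLine classes and the ParentReference attribute are hypothetical stand-ins (a real LINQ to SQL entity would carry an association attribute marked as a foreign key instead), and IsTracked is assumed to be the flag set when the branch is marked for change tracking.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

// Hypothetical marker standing in for LINQ to SQL's foreign key association attribute.
[AttributeUsage(AttributeTargets.Property)]
public sealed class ParentReferenceAttribute : Attribute { }

// Hypothetical, simplified entity base; not the real LINQEntityBase.
public abstract class TrackedEntity : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    // Assumed to be switched on when the entity's branch is being change tracked.
    public bool IsTracked { get; set; }
    public bool IsDeleted { get; private set; }

    protected TrackedEntity()
    {
        // The base class listens to its own change notifications.
        PropertyChanged += OnTrackedPropertyChanged;
    }

    protected void RaisePropertyChanged(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null) handler(this, new PropertyChangedEventArgs(propertyName));
    }

    private void OnTrackedPropertyChanged(object sender, PropertyChangedEventArgs e)
    {
        if (!IsTracked || string.IsNullOrEmpty(e.PropertyName)) return;

        PropertyInfo property = GetType().GetProperty(e.PropertyName);
        if (property == null) return;

        // A child-to-parent reference that has just been set to null means
        // the entity was detached from its parent, so mark it for deletion.
        bool isParentReference = Attribute.IsDefined(property, typeof(ParentReferenceAttribute));
        if (isParentReference && property.GetValue(this, null) == null)
        {
            IsDeleted = true;
        }
    }
}

public class Order : TrackedEntity { }

public class OrderLine : TrackedEntity
{
    private Order _order;

    [ParentReference]
    public Order Order
    {
        get { return _order; }
        set { _order = value; RaisePropertyChanged("Order"); }
    }
}
```

With this sketch, assigning an Order to a tracked OrderLine leaves IsDeleted as false, and setting the Order property back to null flips it to true.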

Next, I modified the ToEntityTree() method to include these "deleted" entities, as they would no longer be picked up in the traversal of the entity tree. It now returns a complete list of all entities, including the deleted objects.

The SyncroniseWithDataContext() method then used the information returned from the ToEntityTree() method to figure out what to attach, insert and delete.

One issue I came across was the deletion of child entities under the entity that was marked for deletion. If you simply removed an entity that already had children, the submit changes would fail. LINQ to SQL doesn't support cascading deletes unless they are specified in the database schema, so any Foreign Key constraint linking to the record being deleted would cause SQL Server to throw an exception.

I also couldn't rely on the developer to delete the child entities first and then delete the topmost entity, as they could do it in any order and the order of the deletions is crucial - the child objects must be deleted first.

Instead, I decided to have my own cascade delete functionality by default, so when an object is removed, I automatically remove any child objects, starting with the child leaves of the branch first. This was achieved by calling the ToEntityTree() method internally and using the reverse function, so that the order runs from the child leaves all the way back to the root of the change tracking.
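To illustrate why reversing the tree works, here is a small sketch. It assumes ToEntityTree() walks the branch parent-first (pre-order), which is an assumption about the traversal rather than something confirmed here, and the Node class and DeletionOrder() method are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an entity with child entities.
public class Node
{
    public string Name { get; private set; }
    public List<Node> Children { get; private set; }

    public Node(string name, params Node[] children)
    {
        Name = name;
        Children = new List<Node>(children);
    }

    // Parent-first traversal: every entity is returned before its descendants.
    public IEnumerable<Node> ToEntityTree()
    {
        yield return this;
        foreach (Node child in Children)
        {
            foreach (Node descendant in child.ToEntityTree())
            {
                yield return descendant;
            }
        }
    }

    // Reversing a parent-first traversal guarantees that every descendant
    // comes before its ancestors, which is the order deletes must run in.
    public IEnumerable<Node> DeletionOrder()
    {
        return ToEntityTree().Reverse();
    }
}
```

For a tree built as new Node("Order", new Node("Line1", new Node("Note")), new Node("Line2")), the traversal yields Order, Line1, Note, Line2, so the deletion order is Line2, Note, Line1, Order: each child is deleted before its parent.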

Even though calling SyncroniseWithDataContext() will perform cascading deletes by default, I have added an optional parameter so that this can be disabled if need be - which is handy if you don't expect there to be children of the object you are deleting OR you are handling cascading deletes in the database anyway.

So that's how I've achieved automatic deletion tracking :).

Some more thoughts

After building the LINQ Entity base class in this way, I realised it would be reasonably easy to move all the logic into an external object (not an entity) which was similar to the standard DataContext but performed the tasks in an offline way.

Some people would feel more comfortable with this perhaps, because of the similarities with the existing data context.

I may investigate this further in the near future, and perhaps we'll have an alternative if people want it for change tracking whilst disconnected.

6 comments:

Anonymous said...

I'm reading. Just trying to get my head around everything. I'm trying to build my own CRUD layer from scratch and your blog has been good reading for the attach/detach issues.

-Joe

Matthew Hunter said...

Thanks Joe, I'm glad you're finding it interesting.

I think LINQ is a great way to avoid having to write a lot of CRUD code.

Seeing that LINQ to SQL generates CRUD operations for you (and on the fly!) it certainly will unclutter the database of all those hundreds of repetitive stored procs that pretty much have exactly the same pattern. This will reduce maintenance in the long term and increase productivity in general.

Anyway... If you want to pass any ideas by me feel free. I've written plenty of data layers in my time!

And finally..

One thing that took me a while to get my head around is not trying to "wrap" LINQ in a logical layer.

The old "standard" way would be to encapsulate the database implementation, hiding it from the business layer. LINQ to SQL is now doing this for you - it's encapsulating the SQL that's actually written - so there is no reason to do it yourself. I realised that by encapsulating it, you're reducing productivity and increasing maintenance by writing extra code that really doesn't add much value.

Anyway, there's my 2 cents (more like $2 worth).

Anonymous said...

Hey Matt, how's it going?

Well I think I found another issue that does not make much sense to me :(

Do you know why I keep getting this error:

Object reference not set to an instance of an object.

At this point:

//Ask for these children for their section of the tree.
foreach (LINQEntityBase subEntity in (propInfo.GetValue(_entityRoot, null) as LINQEntityBase).ToEntityTree())
{
    yield return subEntity;
}

????

When I was debugging, it seemed that the property value was null and so it couldn't iterate through the EntityTree again... sounds weird, doesn't it?

Apart from that well done with the deleting stuff :) The only shame is that I don't need this in my current project as I'm marking my entities as deleted (not physically deleting them) so I always end up doing updates !!

One thing that I would like to know is if there is any way that we could perform the update just on the properties that have been changed, not on the whole entity. This really sucks for me as I'm using FormViews. I came up with a poor solution, which is to keep all the properties I want in my FormView, otherwise I will null them in the DB!!!

Cheers mate !

Andre.

Matthew Hunter said...

Doh!

Thanks again Andre!

I missed that one - easy fix.

What was happening is that while discovering the entity tree, I query all Parent --> Child relationships. With one to many, there's always a list object when I grab the value using reflection. However, in one to one relationships this is not the case, as the child entity may not exist at all, and hence in this case the value will be null. I've added a quick null check now and it should be corrected.
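A null check of this kind might look something like the sketch below. It is a simplified illustration rather than the actual fix: the Entity class and its OneToOneChild and Children properties are hypothetical, and the real code discovers the relationships through reflection instead of fixed properties.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified entity; the real code finds these relationships by reflection.
public class Entity
{
    public string Name { get; set; }

    // One to one: the child may not exist, so this can be null.
    public Entity OneToOneChild { get; set; }

    // One to many: the list object always exists, even when it is empty.
    public List<Entity> Children { get; private set; }

    public Entity()
    {
        Children = new List<Entity>();
    }

    public IEnumerable<Entity> ToEntityTree()
    {
        yield return this;

        // Guard the one to one case before asking the child for its section of the tree.
        if (OneToOneChild != null)
        {
            foreach (Entity subEntity in OneToOneChild.ToEntityTree())
            {
                yield return subEntity;
            }
        }

        foreach (Entity child in Children)
        {
            foreach (Entity subEntity in child.ToEntityTree())
            {
                yield return subEntity;
            }
        }
    }
}
```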

Grab the latest source code and you should be fine.

Thanks again for picking that one up! Your feedback is always appreciated ;).

In regards to the deletes, thanks for your thanks! It seemed like the right thing to do, keeping it fairly consistent, and the simplest way to go as well - it wasn't too hard in the end - it just took some thought.

In regards to updating only the changed fields, from memory, LINQ does this by default if you are in connected mode. In disconnected mode, it should work if the original object is attached along with the modified object. In order to do this, I'd have to have the original object available at attach time. I'll give it some thought! Perhaps I can add this as an option - it's very useful, but you may not want it all the time, as keeping original objects would totally bloat memory in some situations... Currently I've pretty much got around memory bloat just by keeping references and haven't been keeping originals. On this as well, could you explain Form Views better? I'm not sure that I understand the problem.

Anonymous said...

Hey Matt, good on you ! I will grab the latest code and test it out for you !

As per my last post, what I meant using FormView is that I am using two-way databinding to bind my entities to a FormView. This is classic right :) Have a look at this page:

http://msdn2.microsoft.com/en-us/library/ms227970(VS.85).aspx

Let's say that my FormView is bound to my Person class, which is a LINQ Entity. This class has two properties: Name and Age. I only have one textbox within the FormView, which binds the Name property. As I am not binding the Age to any field within my FormView, this value will be null when I do my FormView.UpdateItem to get the values from the form.

That's why I'd like to update just the properties that have been changed otherwise I will have to add the Age (in this example) to my formView as a hidden field for instance.

Did you understand? Sorry if I made that a bit confusing :(


I'll give you some feedback on the version 3.0 soon....

Cheers.

Andre.

Matthew Hunter said...

Ahhh! That Form View, now I know what you mean.

I haven't come across that problem, although I'm not using form views; I'm mainly using the new ListView control.

What I do for databinding (which should work for a FormView as well) is create an object data source that binds its select method property to a method on my web form's code-behind class. This method returns the LINQ entity classes or LINQ entity collection of my choice. (Not the LINQ data source; this doesn't work for updates in a disconnected scenario.)

If I need updates, I then also set the Insert, Delete and Update methods to methods on the code-behind class.

I then set my list control (GridView, FormView, ListView etc...) to use that object data source.

Then I handle the ObjectCreating event so that the data source uses the web form's code-behind class.

I store the LINQ entities in the session (using serialization if I need to use state server) and manipulate them with the code behind methods mentioned before...

Hence, it's keeping state on the server, which is fine for most business apps, but not for public facing ones where the number of users can be huge.

But I guess from your description you're not keeping state anywhere between requests. However, if your LINQ entity (or entities) is small enough, why not serialize it (see my latest demo) and store it in viewstate? That way you don't have to keep anything in session or use any annoying hidden fields - just deserialize and update, then commit to the backend.