Wednesday, April 23, 2008

LINQ to SQL Entity Base Improvement


I've been at it again... This time i've added some requested features that will come in handy for some of you.

These changes are:

1. There is no longer need to have a timestamp field
In earlier versions you had to have a timestamp column primarily because it was the easiest way for me to figure out if an object was new or not. If the column was NULL this meant that I could tell that the object was brand new, if it wasnt't null it meant that the object had been retrieved from the database.

This is no longer needed, as now when you invoke "SetAsChangeTrackingRoot()" it will go through the Entity Tree and mark all entities as IsNew = false. Then, when an entity added somewhere on the entity tree, i detect that it's a new entity (when the FK changes from NULL to the parents ID) and mark IsNew = true on the object.

2. Option to keep original entity values.
When a timestamp (version) column is available on a table, by default it is used by LINQ to SQL to peform concurrency checks when submitting updates and deletes to the database.

Because I have removed the requirement to have a timestamp (version) column on a table, I had to allow for the other method of concurrency checking which is to use the UpdateCheck property that is available on every column in the dbml model.

When the UpdateCheck property is set to "Always" on a column, LINQ to SQL compares the original value of this column against current value in the database before it updates or deletes the record. This is intended to make sure another process has not come in and changed the data since you last retrieved it.

Of course, it's up to the developer to choose the best column(s) for concurrency checks (usually a date/timestamp or update counter of some sort). By default if there is no timestamp column on the table, LINQ to SQL will set all columns = "Always" so that every column is checked in the record for change before an update is done. This is a very safe way to go, but is a little more expensive that just checking a single column.

In order for the UpdateCheck property to work however, the original value has to be available and when working in a disconnected model this is not the case. The first thought was to use the MemberwiseClone() method, but I needed to shallow copy the original version of the record somehow without including it's references (which MemberwiseClone does). To get around it, I created the following method:

        /// <summary>

        /// Make a shallow copy of column values without copying references of the source entity

        /// </summary>

        /// <param name="source">the source entity that will have it's values copied</param>

        /// <returns></returns>

        private LINQEntityBase ShallowCopy(LINQEntityBase source)


            PropertyInfo[] sourcePropInfos = source.GetType().GetProperties(BindingFlags.Public  BindingFlags.Instance);

            PropertyInfo[] destinationPropInfos = source.GetType().GetProperties(BindingFlags.Public  BindingFlags.Instance);


            // create an object to copy values into

            Type entityType = source.GetType();

            LINQEntityBase destination;

            destination = Activator.CreateInstance(entityType) as LINQEntityBase;


            foreach (PropertyInfo sourcePropInfo in sourcePropInfos)


                if (Attribute.GetCustomAttribute(sourcePropInfo, typeof(ColumnAttribute), false) != null)


                    PropertyInfo destPropInfo = destinationPropInfos.Where(pi => pi.Name == sourcePropInfo.Name).First();

                    destPropInfo.SetValue(destination, sourcePropInfo.GetValue(source, null), null);




            return destination;


The ShallowCopy method creates a new instance of the source entity, copies all the values from the original to the new instance and returns the new instance.

You can specify that you want to keep original values when calling the "SetAsChangeTrackingRoot()" method.

Some things to note:
1. LINQ to SQL will throw an exception if you specify that you want updates checks on columns and you have not indicated that you want to keep original values. This is reasonable, because you need the orignal values to do update checks in the first place...

2. I've updated the demo to include a tick box, clicking this will keep the original values as described above. I've removed the timestamp fields from the dbml model so to try out conurrency using the UpdateCheck property, simply modify the customer table and change one or more of the column's UpdateCheck properties to true.

3. An advantage of keeping the original information is that LINQ to SQL can detect just the columns that have changed and only generate an UPDATE T-SQL statement which updates those particular fields. This is good for those tables that have a lot of columns, becaues the T-SQL generated is smaller and there's less work for the database to do as it's potentially updating less columns.

For example,

// Running without the original values when updateing the freight value on an order
// will create this (note every column is updated)

UPDATE [dbo].[Orders]
SET [CustomerID] = @p1, [EmployeeID] = @p2, [OrderDate] = @p3, [RequiredDate] = @p4, [ShippedDate] = @p5, [ShipVia] = @p6, [Freight] = @p7, [ShipName] = @p8, [ShipAddress] = @p9, [ShipCity] = @p10, [ShipRegion] = @p11, [ShipPostalCode] = @p12, [ShipCountry] = @p13
WHERE [OrderID] = @p0

// Running with original values will create this (note it only updates the freight value)

UPDATE [dbo].[Orders]
SET [Freight] = @p1
WHERE [OrderID] = @p0

The Source Code

I'll be publishing the source code later today, when it's ready grab the source code here - I'll do a realease (Beta 4) once I'm satisified it's stable

Other notes

I would still recommend using a timestamp (rowversion) column if possible because it is the absolute best and easiest way to detect changes in a record.

The reason it is a good choice is that the timestamp (rowversion) column value is changed for every update made on a record.

Unlike other databases, the SQL Server version of a timestamp is not related to time, instead it's actually an 8 byte binary value that is unique within the entire database (not just unique within the table).

This is commonly mis-understood especially for people coming from other database systems which use a time based timestamp value. The issue being with time based timestamps is that you can get duplicates.

See books online for more information.


Martyn Tomkins said...

Nice work, I have converted the above example to VB as follows:

Private Function ShallowCopy(ByVal source As Object) As Object
Dim NewUnit As New HFT.CostingModel.Unit
Dim destination As Object
destination = Activator.CreateInstance(source.GetType)

Dim SourceProps() As Object = source.GetType.GetProperties(BindingFlags.Instance Or BindingFlags.Public)
Dim DestProps() As Object = source.GetType.GetProperties(BindingFlags.Instance Or BindingFlags.Public)
For Each sourceProp As PropertyInfo In SourceProps
Dim attributes As Object() = sourceProp.GetCustomAttributes(True)
For Each o As Object In attributes
If TypeOf o Is ColumnAttribute Then
Dim attribute As ColumnAttribute
attribute = DirectCast(o, ColumnAttribute)
Dim name As String = sourceProp.Name
Dim destPropInfo As PropertyInfo
'Dim pi As PropertyInfo
destPropInfo = (From x In DestProps Where = name).SingleOrDefault
destPropInfo.SetValue(destination, sourceProp.GetValue(source, Nothing), Nothing)
End If
Return destination
End Function

Matthew Hunter said...

Hi Martyn, thanks for that - i'm a VB coder of old ;).

Just be careful with the GetCustomAttributes(true) - I avoided using this because people have reported that it's etremely slow (just type "GetCustomAttributes Slow" into google and you'll see what i mean)