LINQ: Joining Queries

Dan Gray

2020/06/08

Preamble

LINQ is a modern query language which presents a simple, yet powerful way to query data-collections in a variety of sources. It offers a consistent model for working with data from a variety of sources (XML documents, SQL databases, ADO.NET databases, .NET collections etc.)

Motivation

LINQ presents an opportunity to manipulate data from within a .NET based environment - raising the bar for software-based data-manipulation and querying. In addition it reinforces learning-concepts and interaction modalities within the C# language.

(Inner) Joins

Join simply put groups data across two collections and are a common operation in data-querying and management. The syntax LINQ employs follows common convention.

First lets build our data objects.

// 1. 

        internal class Supplier
    {
        public string Name { get; set; }
      
        public int Age { get; set; }
        
        public string District { get; set;}
    }

        
    internal class Buyer
    {
        public string Name { get; set; }

        public int Age { get; set; }

        public string District { get; set; }
    }
    
     List<Buyer> buyers = new List<Buyer>() 
            { 
                new Buyer() {Name = "John", Age = 25, District = "Point Break"},
                new Buyer() {Name = "Paul", Age = 35, District = "West Side"},
                new Buyer() {Name = "Angie", Age = 30, District = "West Side"},
                new Buyer() {Name = "Flo", Age = 45, District = "Central"},
                new Buyer() {Name = "Mary", Age = 28, District = "Central"},
                new Buyer() {Name = "Joan", Age = 30, District = "South"},
                new Buyer() {Name = "George", Age = 55, District = "South"}
            };
            List<Supplier> suppliers = new List<Supplier>() 
            {
                new Supplier() {Name = "Andy", Age = 25, District = "Point Break"},
                new Supplier() {Name = "Tod", Age = 35, District = "West Side"},
                 new Supplier() {Name = "Ian", Age = 40, District = "West Side"},
                new Supplier() {Name = "Anna", Age = 30, District = "South"},
                new Supplier() {Name = "Jerry", Age = 28, District = "Central"},
                new Supplier() {Name = "Paul", Age = 33, District = "Central"}
            };

And then join on a single key. The clauses join...in... and on...equals are used to define the relationships between the collections.

Here with on... we can refer to the first range variable (s), and with equals the second (b).

The results are placed in an Anonymous object via new.

// 1. ....continued

      var innerJoin = from s in suppliers
                      join b in buyers on s.District equals b.District
                      select new
                      {
                        SupplierName = s.Name,
                        BuyerName = b.Name,
                        s.District
                      }
                      
      foreach (var item in innerJoin)
            {
                Console.WriteLine($"District: {item.District}, Supplier Name: {item.SupplierName}, Buyer Name: {item.BuyerName}");
            }
      
        District: Point Break, Supplier Name: Andy, Buyer Name: John
        District: West Side, Supplier Name: Tod, Buyer Name: Paul
        District: West Side, Supplier Name: Tod, Buyer Name: Angie
        District: West Side, Supplier Name: Ian, Buyer Name: Paul
        District: West Side, Supplier Name: Ian, Buyer Name: Angie
        District: South, Supplier Name: Anna, Buyer Name: Joan
        District: South, Supplier Name: Anna, Buyer Name: George
        District: Central, Supplier Name: Jerry, Buyer Name: Flo
        District: Central, Supplier Name: Jerry, Buyer Name: Mary
        District: Central, Supplier Name: Paul, Buyer Name: Flo
        District: Central, Supplier Name: Paul, Buyer Name: Mary

Composite Joins

Composite Joins occur when multiple keys are matched across the collections.

The comparison components are held in Anonymous objects - it’s important to note that their types must match, as well as their ordering within the respective objects.

We can capture the results (and the common keys, District and Age) in Supplier by referencing s directly.

In this case we only have 4 matches where Buyer/Supplier Age + District are identical.

// 2. 

        var compositeJoin = from s in suppliers
                            join b in buyers on new {s.District, s.Age} equals new {b.District, b.Age}
                            select new
                            {
                              Supplier = s,
                              BuyerName = b.Name
                            };
                          
         foreach (var item in compositeJoin)
            {
                Console.WriteLine($"District: {item.Supplier.District}, Age: {item.Supplier.Age}");
                Console.WriteLine($"    Buyer Name: {item.BuyerName}");
          }              
                          
          District: Point Break, Age: 25
              Buyer Name: John
          District: West Side, Age: 35
              Buyer Name: Paul
          District: South, Age: 30
              Buyer Name: Joan
          District: Central, Age: 28
              Buyer Name: Mary               

Group(ed) Joins

Group(ed) Joins fuse two collections by a common key, with a grouping of the secondary collection on the key.

This presents an improvement on standard inner joins where the common properties are (repeatedly) present in every unified item.

The syntax for the join is identical to that of the inner join with the additional of into...GroupName.

The grouping variable inside of the Anonymous Object new is itself of type IEnumerbale - and thus a nested loop is required to display the elements within.

The names of the items referred to within Console.WriteLine help make clear which of the data entities are being iterated through.

// 3.

        var groupJoin = from s in suppliers
                        join by in buyers on s.District equals b.Distrit into buyersGroup
                        select new 
                        {
                          s.Name,
                          s.District,
                          Buyers = buyersGroup
                        }
                        
          foreach(var supplier in groupJoin)
          {
            Console.WriteLine($"Supplier: {supplier.Name}, District: {supplier.District}");
            
            foreach(var buyer in supplier.Buyers)
            {
              Console.WriteLine($"    Buyer Name: {buyer.Name}, Buyer Age: {buyer.Age}");
            }
          }
          
        Supplier: Andy, District: Point Break
            Buyer Name: John, Buyer Age: 25
        Supplier: Tod, District: West Side
            Buyer Name: Paul, Buyer Age: 35
            Buyer Name: Angie, Buyer Age: 30
        Supplier: Ian, District: West Side
            Buyer Name: Paul, Buyer Age: 35
            Buyer Name: Angie, Buyer Age: 30
        Supplier: Anna, District: South
            Buyer Name: Joan, Buyer Age: 30
            Buyer Name: George, Buyer Age: 55
        Supplier: Jerry, District: Central
            Buyer Name: Flo, Buyer Age: 45
            Buyer Name: Mary, Buyer Age: 28
        Supplier: Paul, District: Central
            Buyer Name: Flo, Buyer Age: 45
            Buyer Name: Mary, Buyer Age: 28

Inner Group(ed) Joins

As we saw, the grouping variable inside the Anonymous Object is an IEnumerbale - and thus it can also be queried with LINQ (from...in, orderby etc.) - offering full flexibility for sorting, conditionals etc.

Taking the example from above.

// 4.

        var groupInnerJoin = from s in suppliers
                        join by in buyers on s.District equals b.Distrit into buyersGroup
                        select new 
                        {
                          s.Name,
                          s.District,
                          Buyers = from b in buyersGroup
                                   orderby b.Age
                                   select b
                        }
                        
          foreach(var supplier in groupInnerJoin)
          {
            Console.WriteLine($"Supplier: {supplier.Name}, District: {supplier.District}");
            
            foreach(var buyer in supplier.Buyers)
            {
              Console.WriteLine($"    Buyer Name: {buyer.Name}, Buyer Age: {buyer.Age}");
            }
          }
          
        Supplier: Andy, District: Point Break
            Buyer Name: John, Buyer Age: 25
        Supplier: Tod, District: West Side
            Buyer Name: Angie, Buyer Age: 30
            Buyer Name: Paul, Buyer Age: 35
        Supplier: Ian, District: West Side
            Buyer Name: Angie, Buyer Age: 30
            Buyer Name: Paul, Buyer Age: 35
        Supplier: Anna, District: South
            Buyer Name: Joan, Buyer Age: 30
            Buyer Name: George, Buyer Age: 55
        Supplier: Jerry, District: Central
            Buyer Name: Mary, Buyer Age: 28
            Buyer Name: Flo, Buyer Age: 45
        Supplier: Paul, District: Central
            Buyer Name: Mary, Buyer Age: 28
            Buyer Name: Flo, Buyer Age: 45

The data is now ordered (in ascending order) as expected.

(Left) Outer Joins

It begs the question - what happens in cases where the key does not match across the two collections? In these circumstances to is (often) desirable to return the items of the joined collection regardless of a match or not.

A non-matching Supplier was added to our Suppliers collections - no Buyers match this key.

The query looks the similar to the Group Join or Inner Group Join, building a result with select new but the DefaultIfEmpty() method is called, and a Buyer Object is created to define the properties within the given unmatched item.

// 5.

      var leftOuterJoin = from s in suppliers
                          join b in buyers on s.District equals b.District into buyersGroup
                          select new
                          {
                              s.Name,
                              S.District,
                              Buyers = buyersGroup.DefaultIfEmpty(
                                new Buyer()
                                {
                                    Name = "NA",
                                    District = "No District Matches"
                                }
                              )
                          };
                          
      foreach(var supplier in leftOuterJoin)
        {
            Console.WriteLine($"{supplier.District}, {supplier.name}");
            
            foreach(var buyer in supplier.Buyers)
            {
                Console.WriteLine($"   {buyer.District}, {buyer.Name}");
            }
      }
                          
      Point Break, Andy
        Point Break, John
      West Side, Tod
        West Side, Paul
        West Side, Angie
        West Side, Ian
        West Side, Paul
        West Side, Angie
      South, Anna
        South, Joan
        South, George
      Central, Jerry
        Central, Flo
        Central, Mary
        Central, Paul
        Central, Flo
        Central, Mary
      Forest, MrP
        No District Matches This Buyer, NA