Adina (adina_atl) wrote in sqlserver,
Adina
adina_atl
sqlserver

Query fun and games

I've found in general for SQL that there is more than one way to solve (almost) any problem. I've been playing around with query building today and decided to see how many ways I could solve a problem that recurs fairly frequently in my work, flattening subrecords into a single row.

This is my current standard solution, using the PIVOT function. It's quite fast, but limits you to a specific number of subrecords--it can be a high number, but you still have to decide on a maximum.
WITH cte AS (SELECT Person.contactid AS 'ID' , Person.FullName AS 'Name'
, 'Activity' = Activity.a422_rel_activityvalueidname
, 'Row' = ROW_NUMBER() OVER (PARTITION BY Person.contactid, Person.FullName ORDER BY Activity.a422_rel_activityvalueidname)
FROM Contact AS Person
INNER JOIN Task AS Activity ON Person.contactid = Activity.regardingobjectid)
SELECT ID, Name
, 'Activity1' = [1], 'Activity2' = [2], 'Activity3' = [3], 'Activity4' = [4], 'Activity5' = [5]
FROM cte
PIVOT (MAX(cte.Activity) FOR cte.[Row] IN ([1], [2], [3], [4], [5])) AS pvt


This is a new solution I found in surfing some SQL Server blogs, using FOR XML PATH to create a CSV list of values. It will include an indefinite number of subrecords, but only includes one field from the subrecords. It's significantly slower than the first example by at least an order of ten.
SELECT DISTINCT p.contactid AS 'ID' , p.FullName AS 'Name'
, SUBSTRING((SELECT ', ' + Activity.a422_rel_activityvalueidname
FROM task AS Activity
WHERE Activity.regardingobjectid = p.contactid
FOR XML PATH('')), 2, 4000) AS 'Activities'
FROM Contact AS p
INNER JOIN Task AS t ON p.contactid = t.regardingobjectid
ORDER BY p.contactid


This ugly looking creature is what I used to use before PIVOT came along, using many, many multiple self-joins. I'm pretty sure I had a slightly more elegant (and faster!) version of this, but it's been a long time since I've had to create one of these things (fortunately). The performance is...not as bad as you might expect.
SELECT 'ID' = p.contactid, 'Name' = p.fullname
, 'Activity1' = a1.a422_rel_activityvalueidname
, 'ActivityDate1' = a1.actualend
, 'Activity2' = a2.a422_rel_activityvalueidname
, 'ActivityDate2' = a2.actualend
, 'Activity3' = a3.a422_rel_activityvalueidname
, 'ActivityDate3' = a3.actualend
, 'Activity4' = a4.a422_rel_activityvalueidname
, 'ActivityDate4' = a4.actualend
, 'Activity5' = a5.a422_rel_activityvalueidname
, 'ActivityDate5' = a5.actualend
FROM Contact AS p
INNER JOIN Task AS a1
ON p.contactid = a1.regardingobjectid
LEFT JOIN Task AS not1
ON p.contactid = not1.regardingobjectid
AND not1.activityid < a1.activityid
LEFT JOIN Task AS a2
ON p.contactid = a2.regardingobjectid
AND a2.activityid > a1.activityid
LEFT JOIN Task AS not2
ON p.contactid = not2.regardingobjectid
AND not2.activityid > a1.activityid
AND not2.activityid < a2.activityid
LEFT JOIN Task AS a3
ON p.contactid = a3.regardingobjectid
AND a3.activityid > a2.activityid
LEFT JOIN Task AS not3
ON p.contactid = not3.regardingobjectid
AND not3.activityid > a2.activityid
AND not3.activityid < a3.activityid
LEFT JOIN Task AS a4
ON p.contactid = a4.regardingobjectid
AND a4.activityid > a3.activityid
LEFT JOIN Task AS not4
ON p.contactid = not4.regardingobjectid
AND not4.activityid > a3.activityid
AND not4.activityid < a4.activityid
LEFT JOIN Task AS a5
ON p.contactid = a5.regardingobjectid
AND a5.activityid > a4.activityid
LEFT JOIN Task AS not5
ON p.contactid = not5.regardingobjectid
AND not5.activityid > a4.activityid
AND not5.activityid < a5.activityid
WHERE not1.regardingobjectid Is Null
AND not2.regardingobjectid Is Null
AND not3.regardingobjectid Is Null
AND not4.regardingobjectid Is Null
AND not5.regardingobjectid Is Null


Using a recursive CTE almost works, except that for each main record it gives a row with one subrecord, another row with two subrecords, a row with three subrecords, and so on for as many subrecords as are available for that main record. It seems like there has to be a way around that, so if you have any ideas, let me know. Performance is not good, not horrible.
WITH cte AS (SELECT a1.regardingobjectid, a1.activityid
, 'Activities' = CONVERT(nvarchar(1000), a1.createdon, 113)
FROM Task AS a1
INNER JOIN Contact AS p
ON a1.regardingobjectid = p.contactid
LEFT JOIN Task AS not1
ON a1.regardingobjectid = not1.regardingobjectid
AND a1.activityid > not1.activityid
WHERE not1.activityid Is Null
UNION ALL
SELECT cte.regardingobjectid, a1.activityid
, 'Activities' = CONVERT(nvarchar(1000), (cte.Activities + N', ' + CONVERT(nvarchar, a1.createdon, 113)))
FROM cte
INNER JOIN Task AS a1
ON cte.regardingobjectid = a1.regardingobjectid
AND cte.activityid < a1.activityid
WHERE NOT EXISTS (SELECT *
FROM Task AS not1
WHERE cte.regardingobjectid = not1.regardingobjectid
AND not1.activityid > cte.activityid
AND not1.activityid < a1.activityid)
)
SELECT 'ID' = p.contactid, 'Name' = p.fullname
, cte.Activities
FROM cte
INNER JOIN Contact AS p
ON cte.regardingobjectid = p.contactid
ORDER BY p.fullname


Creating a custom aggregate function in CLR is another solution, but playing with that will have to be another day.
  • Post a new comment

    Error

    Anonymous comments are disabled in this journal

    default userpic

    Your IP address will be recorded 

  • 0 comments