Eliminating Observations Between Two Tables Based on a Formula
In this article, we will explore how to eliminate observations between two tables based on a specific formula. We will use SAS programming as an example, but the concepts can be applied to other languages and databases.
Background
The problem at hand involves two tables: table1 and table2. Each table contains information about a set of observations with variables such as name, date, time, and price. The goal is to merge these two tables based on the name and date columns, but exclude any observations that have prices outside a certain range.
Problem Statement
Given table1 and table2, create a new table want that contains only the observations from both tables where:
- The names match
- The dates match
- The price is within the range of 0.962 times to 1.0398 times the reference price, which the code below takes to be the previous day's Price from table2
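To make the steps below easier to follow, here is a minimal sketch of how the two inputs might be laid out. The column names intradayprice and Price follow the code used later in this article; the rows themselves are invented for illustration and are not the original data.

```sas
/* Hypothetical sample data; all values are invented for illustration */
data table1;                      /* intraday prices: several rows per name and date */
  input name $ date :date9. time :time8. intradayprice;
  format date date9. time time8.;
  datalines;
A 07MAY2008 11:32:41 4.00
A 07MAY2008 14:32:41 4.05
A 08MAY2008 11:32:41 3.95
A 08MAY2008 14:32:41 4.01
;
run;

data table2;                      /* daily prices: one row per name and date */
  input name $ date :date9. Price;
  format date date9.;
  datalines;
A 07MAY2008 4.02
A 08MAY2008 4.00
;
run;
```

With inputs of this shape, table1 supplies the first and last intraday price of each day, and table2 supplies the daily price whose lag defines the allowed band.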
Solution
The solution involves several steps:
Step 1: Sort Both Tables by Name and Date
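First, we need to sort both tables in ascending order by name and date. table1 is additionally sorted by time, so that the first and last intraday price of each day can be picked out in Step 4.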
proc sort data=table1; by name date time; run;
proc sort data=table2; by name date; run;
Step 2: Merge the Tables on Name and Date
Next, we create a new table table3 that contains all observations from both tables where the names match and the dates match.
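proc sql;
  create table table3 as
  /* take Price explicitly so name and date are not selected twice */
  select t1.*, t2.Price
  from table1 as t1, table2 as t2
  where t1.name = t2.name and t1.date = t2.date
  /* PROC SQL does not guarantee row order; the BY merges below need it */
  order by t1.name, t1.date, t1.time;
quit;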
Step 3: Calculate Yesterday’s Price
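To determine whether an observation is within the allowed price range, we need yesterday's price for each name, that is, the Price from the previous row of table2 within the same name. Because table2 is sorted by name and date, the LAG function provides this value.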
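data table2_new;
  set table2;
  by name;
  /* save price of yesterday; LAG must run on every row */
  lag_Price = lag(Price);
  /* the first row of a name has no previous day, so reset it to missing */
  if first.name then lag_Price = .;
run;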
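One detail worth spelling out: LAG returns the value from the previous time the function was executed, not from the previous observation. That is why the step above calls it on every row and resets the result afterwards. The sketch below shows what goes wrong with a conditional call; table2_bad is a hypothetical name used only for this comparison.

```sas
/* Hypothetical counter-example: LAG executed only on some rows */
data table2_bad;
  set table2;
  by name;
  /* LAG's queue is updated only when the function actually runs.
     Skipping first.name rows means the very first call returns missing,
     and the second row of each later name picks up a price left over
     from an earlier name instead of that name's own first price. */
  if not first.name then lag_Price = lag(Price);
run;
```

Comparing table2_bad with table2_new on data containing more than one name should make the difference visible.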
Step 4: Identify Observations Outside the Allowed Price Range
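We then identify the name and date combinations to exclude from the final result. For each date, the code compares the first and the last intraday price with the band around yesterday's price, and flags the date only when both fall outside it.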
data to_delete(keep = name date);
  merge table3     (in=in1)
        table2_new (in=in2);
  by name date;
  retain start_price last_price;
  if in1 and in2; /* deal with obs on both tables only */
  /* first and last intraday price of the day
     (assumes rows are in time order within each date) */
  if first.date then start_price = intradayprice;
  if last.date then last_price = intradayprice;
  if last.date then do;
    min_price = 0.962 * lag_Price;
    max_price = 1.0398 * lag_Price;
    /* flag the date only when both prices are outside the band;
       a missing lag_Price (first date of a name) makes both range
       tests false, so that date is flagged as well */
    if not (min_price le start_price le max_price) and not (min_price le last_price le max_price)
      then output;
  end;
run;
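The band arithmetic can be checked by hand. The sketch below uses invented numbers (a previous-day price of 4.00 and two assumed intraday prices) purely to show how the two range tests combine; none of these values come from the original tables.

```sas
/* Hypothetical numbers, chosen only to illustrate the band check */
data _null_;
  lag_Price   = 4.00;                 /* assumed previous-day price */
  start_price = 3.80;                 /* assumed first intraday price */
  last_price  = 4.10;                 /* assumed last intraday price */
  min_price = 0.962  * lag_Price;     /* 3.848 */
  max_price = 1.0398 * lag_Price;     /* 4.1592 */
  start_ok = (min_price le start_price le max_price);  /* 0: 3.80 is below 3.848 */
  last_ok  = (min_price le last_price  le max_price);  /* 1: 4.10 is inside the band */
  flag_for_delete = (not start_ok) and (not last_ok);  /* 0: the date is kept */
  put min_price= max_price= start_ok= last_ok= flag_for_delete=;
run;
```

Because the condition joins the two tests with AND, a single in-range price at either end of the day is enough to keep the date.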
Step 5: Create the Final Table
Finally, we merge table3 with to_delete and keep only the rows whose name and date do not appear in to_delete. The result is our final table want.
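data want;
  /* both inputs must be sorted by name and date for the BY merge */
  merge table3
        to_delete (in=indel);
  by name date;
  /* drop every row whose name and date were flagged in Step 4 */
  if not indel;
run;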
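The same filter can also be written as an anti-join in PROC SQL, which does not depend on table3 being sorted beforehand. This is a sketch of an alternative rather than part of the original solution; want_sql is a hypothetical name chosen so that it does not overwrite want.

```sas
/* Sketch of an anti-join alternative; want_sql is a hypothetical name */
proc sql;
  create table want_sql as
  select t.*
  from table3 as t
  /* keep a row only if its name and date were never flagged */
  where not exists
    (select 1
     from to_delete as d
     where d.name = t.name and d.date = t.date)
  order by t.name, t.date, t.time;
quit;
```

Either form should return the same rows; the DATA step merge is usually the more familiar idiom when the tables are already sorted.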
Result
The resulting table want will contain only the observations from both tables where:
- The names match
- The dates match
- The price is within the allowed range of 0.962 times to 1.0398 times the reference price.
Here’s an example of what the final result might look like:
| name | date     | time     | intraday_price |
+------+----------+----------+----------------+
| B    | 7-May-08 | 11:32:41 | 3.1            |
| B    | 7-May-08 | 12:32:41 | 1              |
| B    | 7-May-08 | 13:32:41 | 4              |
| B    | 7-May-08 | 14:32:41 | 2.9            |
| A    | 8-May-08 | 11:32:41 | 3.95           |
| A    | 8-May-08 | 12:32:41 | 3              |
| A    | 8-May-08 | 13:32:41 | 6              |
| A    | 8-May-08 | 14:32:41 | 4.01           |
Dates flagged in to_delete no longer appear in this result. For every name and date that remains, at least one of the first and last intraday prices lies inside the allowed range.
Conclusion
In conclusion, we have explored how to eliminate observations between two tables based on a specific formula in SAS programming. We walked through each step of the process, from sorting both tables by name and date to creating the final result table want.
Last modified on 2024-01-14