Introduction to Structured Query Language

Version 3.31

This page is a tutorial of the Structured Query Language (also known as SQL) and is a pioneering effort on the World Wide Web, as this is the first comprehensive SQL tutorial available on the Internet. SQL allows users to access data in relational database management systems, such as Oracle, Sybase, Informix, Microsoft SQL Server, Access, and others, by allowing users to describe the data the user wishes to see. SQL also allows users to define the data in a database, and manipulate that data. This page will describe how to use SQL, and give examples. The SQL used in this document is "ANSI", or standard SQL, and no SQL features of specific database management systems will be discussed until the "Nonstandard SQL" section. It is recommended that you print this page, so that you can easily refer back to previous examples.


Basics of the SELECT Statement

In a relational database, data is stored in tables. An example table would relate Social Security Number, Name, and Address:

EmployeeAddressTable

SSN        FirstName  LastName  Address          City          State
---------  ---------  --------  ---------------  ------------  --------
512687458  Joe        Smith     83 First Street  Howard        Ohio
758420012  Mary       Scott     842 Vine Ave.    Losantiville  Ohio
102254896  Sam        Jones     33 Elm St.       Paris         New York
876512563  Sarah      Ackerman  440 U.S. 110     Upton         Michigan

Now, let's say you want to see the address of each employee. Use the SELECT statement, like so:

SELECT FirstName, LastName, Address, City, State
FROM EmployeeAddressTable;

The following are the results of your query of the database:

First Name  Last Name  Address          City          State
----------  ---------  ---------------  ------------  --------
Joe         Smith      83 First Street  Howard        Ohio
Mary        Scott      842 Vine Ave.    Losantiville  Ohio
Sam         Jones      33 Elm St.       Paris         New York
Sarah       Ackerman   440 U.S. 110     Upton         Michigan

To explain what you just did, you asked for all of the data in the EmployeeAddressTable, and specifically, you asked for the columns called FirstName, LastName, Address, City, and State. Note that column names and table names do not have spaces...they must be typed as one word; and that the statement ends with a semicolon (;). The general form for a SELECT statement, retrieving all of the rows in the table, is:

SELECT ColumnName, ColumnName, ...
FROM TableName;

To get all columns of a table without typing all column names, use:

SELECT * FROM TableName;
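
If you have no DBMS handy, the two forms of SELECT above can be tried with a small sketch like the following. It assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for your own system; the column sizes in the CREATE TABLE are guesses made for illustration, not part of the example table.

```python
import sqlite3

# In-memory SQLite database standing in for whatever DBMS you actually use.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeAddressTable
  (SSN CHAR(9), FirstName CHAR(20), LastName CHAR(20),
   Address CHAR(40), City CHAR(20), State CHAR(20));
INSERT INTO EmployeeAddressTable VALUES
  ('512687458', 'Joe',   'Smith',    '83 First Street', 'Howard',       'Ohio'),
  ('758420012', 'Mary',  'Scott',    '842 Vine Ave.',   'Losantiville', 'Ohio'),
  ('102254896', 'Sam',   'Jones',    '33 Elm St.',      'Paris',        'New York'),
  ('876512563', 'Sarah', 'Ackerman', '440 U.S. 110',    'Upton',        'Michigan');
""")

# Named columns: only the listed columns come back, in the order listed.
named = conn.execute(
    "SELECT FirstName, LastName, Address, City, State FROM EmployeeAddressTable"
).fetchall()

# SELECT * returns every column of every row.
everything = conn.execute("SELECT * FROM EmployeeAddressTable").fetchall()

for row in named:
    print(row)
```

Each row comes back as a tuple; the first query yields five values per row and the second yields all six.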

Each database management system (DBMS) and database software has different methods for logging in to the database and entering SQL commands; see the local computer "guru" to help you get onto the system, so that you can use SQL.


Conditional Selection

To further discuss the SELECT statement, let's look at a new example table (for hypothetical purposes only):

EmployeeStatisticsTable

EmployeeIDNo  Salary  Benefits  Position
------------  ------  --------  -----------
010           75000   15000     Manager
105           65000   15000     Manager
152           60000   15000     Manager
215           60000   12500     Manager
244           50000   12000     Staff
300           45000   10000     Staff
335           40000   10000     Staff
400           32000   7500      Entry-Level
441           28000   7500      Entry-Level


Relational Operators

There are six Relational Operators in SQL, and after introducing them, we'll see how they're used:

=                      Equal
<> or != (see manual)  Not Equal
<                      Less Than
>                      Greater Than
<=                     Less Than or Equal To
>=                     Greater Than or Equal To

The WHERE clause is used to specify that only certain rows of the table are displayed, based on the criteria described in that WHERE clause. It is most easily understood by looking at a couple of examples.

If you wanted to see the EMPLOYEEIDNO's of those making at or over $50,000, use the following:

SELECT EMPLOYEEIDNO
FROM EMPLOYEESTATISTICSTABLE
WHERE SALARY >= 50000;

Notice that the >= (greater than or equal to) sign is used, as we wanted to see those who made greater than $50,000, or equal to $50,000, listed together. This displays:

EMPLOYEEIDNO
------------
010
105
152
215
244

The WHERE description, SALARY >= 50000, is known as a condition. The same can be done for text columns:

SELECT EMPLOYEEIDNO
FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION = 'Manager';

This displays the ID Numbers of all Managers. Generally, with text columns, stick to equal to or not equal to, and make sure that any text that appears in the statement is surrounded by single quotes (').
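
The two WHERE queries above can be checked with a short sketch, again assuming Python's sqlite3 module and an in-memory database. The helper name ids and the choice to store EmployeeIDNo as text (so the leading zero of 010 survives) are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeStatisticsTable
  (EmployeeIDNo CHAR(3), Salary INTEGER, Benefits INTEGER, Position CHAR(20));
INSERT INTO EmployeeStatisticsTable VALUES
  ('010', 75000, 15000, 'Manager'), ('105', 65000, 15000, 'Manager'),
  ('152', 60000, 15000, 'Manager'), ('215', 60000, 12500, 'Manager'),
  ('244', 50000, 12000, 'Staff'),   ('300', 45000, 10000, 'Staff'),
  ('335', 40000, 10000, 'Staff'),   ('400', 32000,  7500, 'Entry-Level'),
  ('441', 28000,  7500, 'Entry-Level');
""")

def ids(condition):
    """Return the EmployeeIDNo values of the rows satisfying a WHERE condition."""
    sql = ("SELECT EmployeeIDNo FROM EmployeeStatisticsTable WHERE "
           + condition + " ORDER BY EmployeeIDNo")
    return [row[0] for row in conn.execute(sql)]

high_paid = ids("Salary >= 50000")       # numeric condition
managers = ids("Position = 'Manager'")   # text condition, single-quoted

print(high_paid)
print(managers)
```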


More Complex Conditions: Compound Conditions

The AND operator joins two or more conditions, and displays a row only if that row's data satisfies ALL conditions listed (i.e. all conditions hold true). For example, to display all staff making over $40,000, use:

SELECT EMPLOYEEIDNO
FROM EMPLOYEESTATISTICSTABLE
WHERE SALARY > 40000 AND POSITION = 'Staff';

The OR operator joins two or more conditions, but returns a row if ANY of the conditions listed hold true. To see all those who make less than $40,000 or have less than $10,000 in benefits, listed together, use the following query:

SELECT EMPLOYEEIDNO
FROM EMPLOYEESTATISTICSTABLE
WHERE SALARY < 40000 OR BENEFITS < 10000;

AND & OR can be combined, for example:

SELECT EMPLOYEEIDNO
FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION = 'Manager' AND SALARY > 60000 OR BENEFITS > 12000;

Because the AND operation is done first, SQL first finds the rows where the position column is equal to Manager and the salary is greater than $60,000. It then adds any row where the Benefits column is greater than $12,000, whatever that row's position or salary. In other words, a row is displayed if it satisfies either the combined AND condition or the Benefits condition, so anyone with Benefits over $12,000 is included.

To generalize this process, SQL performs the AND operation(s) first, to determine the rows where the AND operation(s) hold true (remember: all of the conditions are true); these results are then combined with the OR conditions, and a row is displayed if any of the conditions joined by the OR operator holds true.

To perform ORs before ANDs, for example to see a list of employees who make a large salary (over $50,000) or have a large benefit package (over $10,000), and who also happen to be managers, use parentheses:

SELECT EMPLOYEEIDNO
FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION = 'Manager' AND (SALARY > 50000 OR BENEFITS > 10000);
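
To see the effect of parentheses on actual rows, here is a sketch that runs one compound condition with and without them. It assumes Python's sqlite3 module; the thresholds (Staff, $45,000, $12,000) are picked for this illustration only, because with the sample data they make the two results differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeStatisticsTable
  (EmployeeIDNo CHAR(3), Salary INTEGER, Benefits INTEGER, Position CHAR(20));
INSERT INTO EmployeeStatisticsTable VALUES
  ('010', 75000, 15000, 'Manager'), ('105', 65000, 15000, 'Manager'),
  ('152', 60000, 15000, 'Manager'), ('215', 60000, 12500, 'Manager'),
  ('244', 50000, 12000, 'Staff'),   ('300', 45000, 10000, 'Staff'),
  ('335', 40000, 10000, 'Staff'),   ('400', 32000,  7500, 'Entry-Level'),
  ('441', 28000,  7500, 'Entry-Level');
""")

def ids(condition):
    """Return the EmployeeIDNo values of the rows satisfying a WHERE condition."""
    sql = ("SELECT EmployeeIDNo FROM EmployeeStatisticsTable WHERE "
           + condition + " ORDER BY EmployeeIDNo")
    return [row[0] for row in conn.execute(sql)]

# AND binds tighter than OR: (Staff AND Salary > 45000) OR Benefits > 12000
and_first = ids("Position = 'Staff' AND Salary > 45000 OR Benefits > 12000")

# Parentheses force the OR to be evaluated before the AND.
or_first = ids("Position = 'Staff' AND (Salary > 45000 OR Benefits > 12000)")

print(and_first)
print(or_first)
```

Without parentheses every employee with benefits over $12,000 is listed, manager or not; with them, only staff rows can qualify.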


IN & BETWEEN

An easier method of using compound conditions uses IN or BETWEEN. For example, if you wanted to list all managers and staff:

SELECT EMPLOYEEIDNO
FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION IN ('Manager', 'Staff');

or to list those making greater than or equal to $30,000, but less than or equal to $50,000, use:

SELECT EMPLOYEEIDNO
FROM EMPLOYEESTATISTICSTABLE
WHERE SALARY BETWEEN 30000 AND 50000;

To list everyone not in this range, try:

SELECT EMPLOYEEIDNO
FROM EMPLOYEESTATISTICSTABLE
WHERE SALARY NOT BETWEEN 30000 AND 50000;

Similarly, NOT IN lists all rows excluded from the IN list.
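
The IN and BETWEEN queries above are shorthand for compound conditions, which the following sketch tries to confirm. It assumes Python's sqlite3 module and the sample EmployeeStatisticsTable; the helper name ids is an invention for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeStatisticsTable
  (EmployeeIDNo CHAR(3), Salary INTEGER, Benefits INTEGER, Position CHAR(20));
INSERT INTO EmployeeStatisticsTable VALUES
  ('010', 75000, 15000, 'Manager'), ('105', 65000, 15000, 'Manager'),
  ('152', 60000, 15000, 'Manager'), ('215', 60000, 12500, 'Manager'),
  ('244', 50000, 12000, 'Staff'),   ('300', 45000, 10000, 'Staff'),
  ('335', 40000, 10000, 'Staff'),   ('400', 32000,  7500, 'Entry-Level'),
  ('441', 28000,  7500, 'Entry-Level');
""")

def ids(condition):
    """Return the EmployeeIDNo values of the rows satisfying a WHERE condition."""
    sql = ("SELECT EmployeeIDNo FROM EmployeeStatisticsTable WHERE "
           + condition + " ORDER BY EmployeeIDNo")
    return [row[0] for row in conn.execute(sql)]

in_list = ids("Position IN ('Manager', 'Staff')")
in_range = ids("Salary BETWEEN 30000 AND 50000")
out_of_range = ids("Salary NOT BETWEEN 30000 AND 50000")

# BETWEEN includes both endpoints, so it matches the spelled-out comparison.
spelled_out = ids("Salary >= 30000 AND Salary <= 50000")

print(in_list, in_range, out_of_range)
```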


Using LIKE

Look at the EmployeeAddressTable, and say you wanted to see all people whose last names started with "L"; try:

SELECT SSN
FROM EMPLOYEEADDRESSTABLE
WHERE LASTNAME LIKE 'L%';

The percent sign (%) is used to represent any possible character (number, letter, or punctuation) or set of characters that might appear after the "L". To find those people with LastName's ending in "L", use '%L', or if you wanted the "L" in the middle of the word, try '%L%'. The '%' can be used for any characters, in that relative position to the given characters. NOT LIKE displays rows not fitting the given description. Other possibilities of using LIKE, or any of these discussed conditionals, are available, though it depends on what DBMS you are using; as usual, consult a manual or your system manager or administrator for the available features on your system, or just to make sure that what you are trying to do is available and allowed. This disclaimer holds for the features of SQL that will be discussed below. This section is just to give you an idea of the possibilities of queries that can be written in SQL.
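
As a rough illustration of LIKE patterns, the sketch below runs a few of them against the last names in the sample EmployeeAddressTable. It assumes Python's sqlite3 module; the patterns other than 'L%' are chosen for this example because no sample last name starts with "L". Note that SQLite's LIKE ignores case for ASCII letters by default, which other systems may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeAddressTable (SSN CHAR(9), FirstName CHAR(20), LastName CHAR(20));
INSERT INTO EmployeeAddressTable VALUES
  ('512687458', 'Joe', 'Smith'), ('758420012', 'Mary', 'Scott'),
  ('102254896', 'Sam', 'Jones'), ('876512563', 'Sarah', 'Ackerman');
""")

def last_names(pattern, negate=False):
    """Return the last names that match (or, with negate, do not match) a pattern."""
    op = "NOT LIKE" if negate else "LIKE"
    sql = ("SELECT LastName FROM EmployeeAddressTable WHERE LastName "
           + op + " ? ORDER BY LastName")
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_with_l = last_names("L%")               # nobody in the sample table
starts_with_s = last_names("S%")               # pattern anchored at the start
ends_with_man = last_names("%man")             # pattern anchored at the end
contains_o = last_names("%o%")                 # pattern anywhere in the name
not_s = last_names("S%", negate=True)          # NOT LIKE inverts the match

print(starts_with_s, ends_with_man, contains_o, not_s)
```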


Joins

In this section, we will only discuss inner joins, and equijoins, as in general, they are the most useful. For more information, try the SQL links at the bottom of the page.

Good database design suggests that each table lists data only about a single entity, and detailed information can be obtained in a relational database, by using additional tables, and by using a join.

First, take a look at these example tables:

AntiqueOwners

OwnerID  OwnerLastName  OwnerFirstName
-------  -------------  --------------
01       Jones          Bill
02       Smith          Bob
15       Lawson         Patricia
21       Akins          Jane
50       Fowler         Sam


Orders

OwnerID  ItemDesired
-------  -----------
02       Table
02       Desk
21       Chair
15       Mirror


Antiques

SellerID  BuyerID  Item
--------  -------  ------------
01        50       Bed
02        15       Table
15        02       Chair
21        50       Mirror
50        01       Desk
01        21       Cabinet
02        21       Coffee Table
15        50       Chair
01        15       Jewelry Box
02        21       Pottery
21        02       Bookcase
50        01       Plant Stand


Keys

First, let's discuss the concept of keys. A primary key is a column or set of columns that uniquely identifies the rest of the data in any given row. For example, in the AntiqueOwners table, the OwnerID column uniquely identifies that row. This means two things: no two rows can have the same OwnerID, and, even if two owners have the same first and last names, the OwnerID column ensures that the two owners will not be confused with each other, because the unique OwnerID column will be used throughout the database to track the owners, rather than the names.

A foreign key is a column in a table where that column is a primary key of another table, which means that any data in a foreign key column must have corresponding data in the other table where that column is the primary key. In DBMS-speak, this correspondence is known as referential integrity. For example, in the Antiques table, both the BuyerID and SellerID are foreign keys to the primary key of the AntiqueOwners table (OwnerID; for purposes of argument, one has to be an Antique Owner before one can buy or sell any items), as, in both tables, the ID columns are used to identify the owners or buyers and sellers, and the OwnerID is the primary key of the AntiqueOwners table. In other words, all of this "ID" data is used to refer to the owners, buyers, or sellers of antiques, themselves, without having to use the actual names.


Performing a Join

The purpose of these keys is so that data can be related across tables, without having to repeat data in every table--this is the power of relational databases. For example, you can find the names of those who bought a chair without having to list the full name of the buyer in the Antiques table...you can get the name by relating those who bought a chair with the names in the AntiqueOwners table through the use of the OwnerID, which relates the data in the two tables. To find the names of those who bought a chair, use the following query:

SELECT OWNERLASTNAME, OWNERFIRSTNAME
FROM ANTIQUEOWNERS, ANTIQUES
WHERE BUYERID = OWNERID AND ITEM = 'Chair';

Note the following about this query...notice that both tables involved in the relation are listed in the FROM clause of the statement. In the WHERE clause, first notice that the ITEM = 'Chair' part restricts the listing to those who have bought (and in this example, thereby own) a chair. Secondly, notice how the ID columns are related from one table to the next by use of the BUYERID = OWNERID clause. Only where ID's match across tables and the item purchased is a chair (because of the AND), will the names from the AntiqueOwners table be listed. Because the joining condition used an equal sign, this join is called an equijoin. The result of this query is two names: Smith, Bob & Fowler, Sam.

Dot notation refers to prefixing the table names to column names, to avoid ambiguity, as such:

SELECT ANTIQUEOWNERS.OWNERLASTNAME, ANTIQUEOWNERS.OWNERFIRSTNAME
FROM ANTIQUEOWNERS, ANTIQUES
WHERE ANTIQUES.BUYERID = ANTIQUEOWNERS.OWNERID AND ANTIQUES.ITEM = 'Chair';

As the column names are different in each table, however, this wasn't necessary.
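
The equijoin above can be run against the sample tables with the following sketch, which assumes Python's sqlite3 module and stores the IDs as text to keep their leading zeros. It also shows the same join written with the explicit INNER JOIN ... ON form that many systems accept; that second form does not appear in this tutorial and is included only for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AntiqueOwners (OwnerID CHAR(2), OwnerLastName CHAR(20), OwnerFirstName CHAR(20));
CREATE TABLE Antiques (SellerID CHAR(2), BuyerID CHAR(2), Item CHAR(40));
INSERT INTO AntiqueOwners VALUES ('01','Jones','Bill'), ('02','Smith','Bob'),
  ('15','Lawson','Patricia'), ('21','Akins','Jane'), ('50','Fowler','Sam');
INSERT INTO Antiques VALUES
  ('01','50','Bed'), ('02','15','Table'), ('15','02','Chair'), ('21','50','Mirror'),
  ('50','01','Desk'), ('01','21','Cabinet'), ('02','21','Coffee Table'),
  ('15','50','Chair'), ('01','15','Jewelry Box'), ('02','21','Pottery'),
  ('21','02','Bookcase'), ('50','01','Plant Stand');
""")

# Join condition and row filter both live in the WHERE clause.
where_join = conn.execute("""
    SELECT OwnerLastName, OwnerFirstName
    FROM AntiqueOwners, Antiques
    WHERE BuyerID = OwnerID AND Item = 'Chair'
    ORDER BY OwnerLastName
""").fetchall()

# Same join with the join condition moved into an ON clause.
explicit_join = conn.execute("""
    SELECT AntiqueOwners.OwnerLastName, AntiqueOwners.OwnerFirstName
    FROM AntiqueOwners INNER JOIN Antiques
      ON Antiques.BuyerID = AntiqueOwners.OwnerID
    WHERE Antiques.Item = 'Chair'
    ORDER BY AntiqueOwners.OwnerLastName
""").fetchall()

print(where_join)
```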


DISTINCT and Eliminating Duplicates

Let's say that you want to list the ID and names of only those people who have sold an antique. Obviously, you want a list where each seller is only listed once--you don't want to know how many antiques a person sold, just the fact that this person sold one (for counts, see the Aggregate Function section below). This means that you will need to tell SQL to eliminate duplicate sales rows, and just list each person only once. To do this, use the DISTINCT keyword.

First, we will need an equijoin to the AntiqueOwners table to get the detail data of the person's LastName and FirstName. However, keep in mind that since the SellerID column in the Antiques table is a foreign key to the AntiqueOwners table, a seller will only be listed if there is a row in the AntiqueOwners table listing the ID and names. We also want to eliminate multiple occurrences of the SellerID in our listing, so we use DISTINCT on the column where the repeats may occur.

To throw in one more twist, we will also want the list alphabetized by LastName, then by FirstName (on a LastName tie), then by OwnerID (on a LastName and FirstName tie). Thus, we will use the ORDER BY clause:

SELECT DISTINCT SELLERID, OWNERLASTNAME, OWNERFIRSTNAME
FROM ANTIQUES, ANTIQUEOWNERS
WHERE SELLERID = OWNERID
ORDER BY OWNERLASTNAME, OWNERFIRSTNAME, OWNERID;

In this example, since everyone has sold an item, we will get a listing of all of the owners, in alphabetical order by last name. For future reference (and in case anyone asks), this type of join is considered to be in the category of inner joins.


Aliases & In/Subqueries

In this section, we will talk about Aliases, In and the use of subqueries, and how these can be used in a 3-table example. First, look at this query which prints the last name of those owners who have placed an order and what the order is, only listing those orders which can be filled (that is, there is a buyer who owns that ordered item):

SELECT OWN.OWNERLASTNAME "Last Name", ORD.ITEMDESIRED "Item Ordered"
FROM ORDERS ORD, ANTIQUEOWNERS OWN
WHERE ORD.OWNERID = OWN.OWNERID
AND ORD.ITEMDESIRED IN
  (SELECT ITEM
   FROM ANTIQUES);

This gives:

Last Name Item Ordered
--------- ------------
Smith     Table
Smith     Desk
Akins     Chair
Lawson    Mirror

There are several things to note about this query:

  1. First, the "Last Name" and "Item Ordered" in the Select lines gives the headers on the report.
  2. The OWN & ORD are aliases; these are new names for the two tables listed in the FROM clause that are used as prefixes for all dot notations of column names in the query (see above). This eliminates ambiguity, especially in the equijoin WHERE clause where both tables have the column named OwnerID, and the dot notation tells SQL that we are talking about two different OwnerID's from the two different tables.
  3. Note that the Orders table is listed first in the FROM clause; this makes sure listing is done off of that table, and the AntiqueOwners table is only used for the detail information (Last Name).
  4. Most importantly, the AND in the WHERE clause forces the In Subquery to be invoked ("= ANY" or "= SOME" are two equivalent uses of IN). What this does is, the subquery is performed, returning all of the Items owned from the Antiques table, as there is no WHERE clause. Then, for a row from the Orders table to be listed, the ItemDesired must be in that returned list of Items owned from the Antiques table, thus listing an item only if the order can be filled from another owner. You can think of it this way: the subquery returns a set of Items against which each ItemDesired in the Orders table is compared; the In condition is true only if the ItemDesired is in that returned set from the Antiques table.
  5. Also notice that, in this case, there happened to be an antique available for each one desired...obviously, that won't always be the case. In addition, notice that when the IN, "= ANY", or "= SOME" is used, these keywords refer to any possible row matches, not column matches...that is, you cannot put multiple columns in the subquery Select clause, in an attempt to match the column in the outer Where clause to one of multiple possible column values in the subquery; only one column can be listed in the subquery, and the possible match comes from multiple row values in that one column, not vice-versa.
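
The points in the list above can be tried out with a sketch like this one, which assumes Python's sqlite3 module and loads all three sample tables. The column headers are written as double-quoted aliases, which SQLite accepts; the way your own system spells column aliases may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AntiqueOwners (OwnerID CHAR(2), OwnerLastName CHAR(20), OwnerFirstName CHAR(20));
CREATE TABLE Orders (OwnerID CHAR(2), ItemDesired CHAR(40));
CREATE TABLE Antiques (SellerID CHAR(2), BuyerID CHAR(2), Item CHAR(40));
INSERT INTO AntiqueOwners VALUES ('01','Jones','Bill'), ('02','Smith','Bob'),
  ('15','Lawson','Patricia'), ('21','Akins','Jane'), ('50','Fowler','Sam');
INSERT INTO Orders VALUES ('02','Table'), ('02','Desk'), ('21','Chair'), ('15','Mirror');
INSERT INTO Antiques VALUES
  ('01','50','Bed'), ('02','15','Table'), ('15','02','Chair'), ('21','50','Mirror'),
  ('50','01','Desk'), ('01','21','Cabinet'), ('02','21','Coffee Table'),
  ('15','50','Chair'), ('01','15','Jewelry Box'), ('02','21','Pottery'),
  ('21','02','Bookcase'), ('50','01','Plant Stand');
""")

# OWN and ORD are table aliases; the subquery supplies the set of items on hand.
cursor = conn.execute("""
    SELECT OWN.OwnerLastName "Last Name", ORD.ItemDesired "Item Ordered"
    FROM Orders ORD, AntiqueOwners OWN
    WHERE ORD.OwnerID = OWN.OwnerID
      AND ORD.ItemDesired IN (SELECT Item FROM Antiques)
    ORDER BY OWN.OwnerLastName, ORD.ItemDesired
""")
headers = [column[0] for column in cursor.description]
fillable_orders = cursor.fetchall()

print(headers)
for row in fillable_orders:
    print(row)
```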

Whew! That's enough on the topic of complex SELECT queries for now. Now on to other SQL statements.


Miscellaneous SQL Statements

Aggregate Functions

I will discuss five important aggregate functions: SUM, AVG, MAX, MIN, and COUNT. They are called aggregate functions because they summarize the results of a query, rather than listing all of the rows.

Looking at the tables at the top of the document, let's look at three examples:

SELECT SUM(SALARY), AVG(SALARY)
FROM EMPLOYEESTATISTICSTABLE;

This query shows the total of all salaries in the table, and the average salary of all of the entries in the table.

SELECT MIN(BENEFITS)
FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION = 'Manager';

This query gives the smallest figure of the Benefits column, of the employees who are Managers, which is 12500.

SELECT COUNT(*)
FROM EMPLOYEESTATISTICSTABLE
WHERE POSITION = 'Staff';

This query tells you how many employees have Staff status (3).
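
For a quick check of the three aggregate queries against the sample EmployeeStatisticsTable, something like the following sketch can be used. It assumes Python's sqlite3 module; the variable names are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeStatisticsTable
  (EmployeeIDNo CHAR(3), Salary INTEGER, Benefits INTEGER, Position CHAR(20));
INSERT INTO EmployeeStatisticsTable VALUES
  ('010', 75000, 15000, 'Manager'), ('105', 65000, 15000, 'Manager'),
  ('152', 60000, 15000, 'Manager'), ('215', 60000, 12500, 'Manager'),
  ('244', 50000, 12000, 'Staff'),   ('300', 45000, 10000, 'Staff'),
  ('335', 40000, 10000, 'Staff'),   ('400', 32000,  7500, 'Entry-Level'),
  ('441', 28000,  7500, 'Entry-Level');
""")

# Each aggregate query collapses its rows into a single summary row.
total_salary, average_salary = conn.execute(
    "SELECT SUM(Salary), AVG(Salary) FROM EmployeeStatisticsTable"
).fetchone()

(min_manager_benefits,) = conn.execute(
    "SELECT MIN(Benefits) FROM EmployeeStatisticsTable WHERE Position = 'Manager'"
).fetchone()

(staff_count,) = conn.execute(
    "SELECT COUNT(*) FROM EmployeeStatisticsTable WHERE Position = 'Staff'"
).fetchone()

print(total_salary, round(average_salary, 2), min_manager_benefits, staff_count)
```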


Views

In SQL, you might (check your DBA) have access to create views for yourself. What a view does is to allow you to assign the results of a query to a new, personal table, that you can use in other queries, where this new table is given the view name in your FROM clause. When you access a view, the query that is defined in your view creation statement is performed (generally), and the results of that query look just like another table in the query that you wrote invoking the view. For example, to create a view:

CREATE VIEW ANTVIEW AS SELECT ITEMDESIRED FROM ORDERS;

Now, write a query using this view as a table, where the table is just a listing of all Items Desired from the Orders table:

SELECT SELLERID
FROM ANTIQUES, ANTVIEW
WHERE ITEMDESIRED = ITEM;

This query shows all SellerID's from the Antiques table where the Item in that table happens to appear in the Antview view, which is just all of the Items Desired in the Orders table. The listing is generated by going through the Antique Items one-by-one until there's a match with the Antview view. Views can be used to restrict database access, as well as, in this case, simplify a complex query.
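
The view example can be exercised with the sketch below, assuming Python's sqlite3 module and the sample Orders and Antiques tables. Notice that a seller appears once for each matching item, so seller 15 is listed twice for the two chairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OwnerID CHAR(2), ItemDesired CHAR(40));
CREATE TABLE Antiques (SellerID CHAR(2), BuyerID CHAR(2), Item CHAR(40));
INSERT INTO Orders VALUES ('02','Table'), ('02','Desk'), ('21','Chair'), ('15','Mirror');
INSERT INTO Antiques VALUES
  ('01','50','Bed'), ('02','15','Table'), ('15','02','Chair'), ('21','50','Mirror'),
  ('50','01','Desk'), ('01','21','Cabinet'), ('02','21','Coffee Table'),
  ('15','50','Chair'), ('01','15','Jewelry Box'), ('02','21','Pottery'),
  ('21','02','Bookcase'), ('50','01','Plant Stand');

-- The view stores the query, not a copy of the data.
CREATE VIEW AntView AS SELECT ItemDesired FROM Orders;
""")

# The view is used in the FROM clause exactly like a table.
sellers = [row[0] for row in conn.execute("""
    SELECT SellerID
    FROM Antiques, AntView
    WHERE ItemDesired = Item
    ORDER BY SellerID
""")]

print(sellers)
```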


Creating New Tables

All tables within a database must be created at some point in time...let's see how we would create the Orders table:

CREATE TABLE ORDERS
(OWNERID INTEGER NOT NULL,
ITEMDESIRED CHAR(40) NOT NULL);

This statement gives the table name and tells the DBMS about each column in the table. Please note that this statement uses generic data types, and that the data types might be different, depending on what DBMS you are using. As usual, check local listings. The generic data types used in this document are INTEGER (whole numbers), CHAR(n) (character strings with a declared length of n), and DECIMAL(p,s) (numbers with p total digits, s of them after the decimal point).

One other note, the NOT NULL means that the column must have a value in each row. If NULL was used, that column may be left empty in a given row.


Altering Tables

Let's add a column to the Antiques table to allow the entry of the price of a given Item:

ALTER TABLE ANTIQUES ADD (PRICE DECIMAL(8,2) NULL);

The data for this new column can be updated or inserted as shown later.


Adding Data

To insert rows into a table, do the following:

INSERT INTO ANTIQUES VALUES (21, 01, 'Ottoman', 200.00);

This inserts the data into the table, as a new row, column-by-column, in the pre-defined order. Instead, let's change the order and leave Price blank:

INSERT INTO ANTIQUES (BUYERID, SELLERID, ITEM)
VALUES (01, 21, 'Ottoman');


Deleting Data

Let's delete this new row back out of the database:

DELETE FROM ANTIQUES
WHERE ITEM = 'Ottoman';

But if there is another row that contains 'Ottoman', that row will be deleted also. Let's delete all rows (one, in this case) that contain the specific data we added before:

DELETE FROM ANTIQUES
WHERE ITEM = 'Ottoman' AND BUYERID = 01 AND SELLERID = 21;


Updating Data

Let's update a Price into a row that doesn't have a price listed yet:

UPDATE ANTIQUES SET PRICE = 500.00 WHERE ITEM = 'Chair';

This sets all Chair's Prices to 500.00. As shown above, more WHERE conditionals, using AND, must be used to limit the updating to more specific rows. Also, additional columns may be set by separating equal statements with commas.
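
To follow the alter, insert, delete, and update statements in one place, here is a sketch assuming Python's sqlite3 module and a three-row excerpt of the Antiques table. Two details are assumptions specific to this sketch: SQLite writes the ALTER statement as ADD COLUMN without parentheses, and an extra "Price IS NULL" test is added to the DELETE so that, with both Ottoman rows present, only the row inserted without a price is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Antiques (SellerID CHAR(2), BuyerID CHAR(2), Item CHAR(40));
INSERT INTO Antiques VALUES ('01','50','Bed'), ('15','02','Chair'), ('15','50','Chair');
""")

# Existing rows get NULL in the new column.
conn.execute("ALTER TABLE Antiques ADD COLUMN Price DECIMAL(8,2)")

# Full-row insert, then an insert that names its columns and leaves Price NULL.
conn.execute("INSERT INTO Antiques VALUES ('21', '01', 'Ottoman', 200.00)")
conn.execute("INSERT INTO Antiques (BuyerID, SellerID, Item) VALUES ('01', '21', 'Ottoman')")

# Both Ottoman rows share seller 21 and buyer 01, so the Price test is what
# limits the delete to the row that was inserted without a price.
deleted = conn.execute(
    "DELETE FROM Antiques WHERE Item = 'Ottoman' AND BuyerID = '01' "
    "AND SellerID = '21' AND Price IS NULL"
).rowcount

# Every Chair row is updated, because the WHERE clause matches both.
updated = conn.execute(
    "UPDATE Antiques SET Price = 500.00 WHERE Item = 'Chair'"
).rowcount

final_rows = conn.execute(
    "SELECT Item, Price FROM Antiques ORDER BY Item, SellerID, BuyerID"
).fetchall()
print(deleted, updated, final_rows)
```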


Miscellaneous Topics

Indexes

Indexes allow a DBMS to access data quicker (please note: this feature is nonstandard/not available on all systems). The system creates this internal data structure (the index) which causes selection of rows, when the selection is based on indexed columns, to occur faster. This index tells the DBMS where a certain row is in the table given an indexed-column value, much like a book index tells you what page a given word appears on. Let's create an index for the OwnerID in the AntiqueOwners table:

CREATE INDEX OID_IDX ON ANTIQUEOWNERS (OWNERID);

Now on the names:

CREATE INDEX NAME_IDX ON ANTIQUEOWNERS (OWNERLASTNAME, OWNERFIRSTNAME);

To get rid of an index, drop it:

DROP INDEX OID_IDX;

By the way, you can also "drop" a table, as well (careful!--that means that your table is deleted). In the second example, the index is kept on the two columns, aggregated together--strange behavior might occur in this situation...check the manual before performing such an operation.

Some DBMS's do not enforce primary keys; in other words, the uniqueness of a column is not enforced automatically. What that means is, if, for example, I tried to insert another row into the AntiqueOwners table with an OwnerID of 02, some systems will allow me to do that, even though we do not want that, as that column is supposed to be unique to that table (every row value is supposed to be different). One way to get around that is to create a unique index on the column that we want to be a primary key, to force the system to enforce prohibition of duplicates:

CREATE UNIQUE INDEX OID_IDX ON ANTIQUEOWNERS (OWNERID);
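
A small sketch of what the unique index buys you, assuming Python's sqlite3 module: once the index exists, a second row with OwnerID 02 is refused. The duplicate row's names ('Doe', 'Pat') are made up for this illustration, and the exact error raised will differ from system to system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AntiqueOwners (OwnerID CHAR(2), OwnerLastName CHAR(20), OwnerFirstName CHAR(20));
INSERT INTO AntiqueOwners VALUES ('01','Jones','Bill'), ('02','Smith','Bob');
CREATE UNIQUE INDEX OID_IDX ON AntiqueOwners (OwnerID);
""")

try:
    # Hypothetical duplicate: OwnerID 02 is already taken by Smith, Bob.
    conn.execute("INSERT INTO AntiqueOwners VALUES ('02', 'Doe', 'Pat')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # SQLite reports a violated unique index as an IntegrityError.
    duplicate_rejected = True

(owner_count,) = conn.execute("SELECT COUNT(*) FROM AntiqueOwners").fetchone()
print(duplicate_rejected, owner_count)
```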


GROUP BY & HAVING

One special use of GROUP BY is to associate an aggregate function (especially COUNT; counting the number of rows in each group) with groups of rows. First, assume that the Antiques table has the Price column, and each row has a value for that column. We want to see the price of the most expensive item bought by each owner. We have to tell SQL to group each owner's purchases, and tell us the maximum purchase price:

SELECT BUYERID, MAX(PRICE)
FROM ANTIQUES
GROUP BY BUYERID;

Now, say we only want to see the maximum purchase price if that maximum is over $1000, so we use the HAVING clause, which places a condition on each group rather than on each row:

SELECT BUYERID, MAX(PRICE)
FROM ANTIQUES
GROUP BY BUYERID
HAVING MAX(PRICE) > 1000;
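
Here is a sketch of the two grouped queries, assuming Python's sqlite3 module. The tutorial never gives actual prices, so the Price values below are invented purely so that some buyers fall on each side of the $1000 line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Antiques (BuyerID CHAR(2), Item CHAR(40), Price DECIMAL(8,2));
-- Prices are hypothetical; they are not part of the sample data.
INSERT INTO Antiques VALUES
  ('50','Bed',1500.00), ('50','Mirror',300.00), ('15','Table',800.00),
  ('15','Jewelry Box',250.00), ('02','Chair',120.00), ('02','Bookcase',1200.00),
  ('21','Cabinet',950.00);
""")

# One output row per buyer, carrying that buyer's highest price.
max_per_buyer = conn.execute("""
    SELECT BuyerID, MAX(Price)
    FROM Antiques
    GROUP BY BuyerID
    ORDER BY BuyerID
""").fetchall()

# HAVING filters whole groups after the aggregate has been computed.
big_spenders = conn.execute("""
    SELECT BuyerID, MAX(Price)
    FROM Antiques
    GROUP BY BuyerID
    HAVING MAX(Price) > 1000
    ORDER BY BuyerID
""").fetchall()

print(max_per_buyer)
print(big_spenders)
```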


More Subqueries

Another common usage of subqueries involves the use of operators to allow a Where condition to include the Select output of a subquery. First, list the buyers who purchased an expensive item (the Price of the item is $100 greater than the average price of all items purchased):

SELECT BUYERID
FROM ANTIQUES
WHERE PRICE >
  (SELECT AVG(PRICE) + 100
   FROM ANTIQUES);

The subquery calculates the average Price, plus $100, and using that figure, a BuyerID is printed for every item costing over that figure. One could use DISTINCT BUYERID, to eliminate duplicates.

List the Last Names of those in the AntiqueOwners table, ONLY if they have bought an item:

SELECT OWNERLASTNAME
FROM ANTIQUEOWNERS
WHERE OWNERID IN
  (SELECT DISTINCT BUYERID
   FROM ANTIQUES);

The subquery returns a list of buyers, and the Last Name is printed for an Antique Owner if and only if the Owner's ID appears in the subquery list (sometimes called a candidate list).

For an Update example, we know that the gentleman who bought the bookcase has the wrong First Name in the database...it should be John:

UPDATE ANTIQUEOWNERS
SET OWNERFIRSTNAME = 'John'
WHERE OWNERID =
  (SELECT BUYERID
   FROM ANTIQUES
   WHERE ITEM = 'Bookcase');

First, the subquery finds the BuyerID for the person(s) who bought the Bookcase, then the outer query updates his First Name.

Remember this rule about subqueries: when you have a subquery as part of a WHERE condition, the Select clause in the subquery must have columns that match in number and type to those in the Where clause of the outer query. In other words, if you have "WHERE ColumnName = (SELECT...);", the Select must have only one column in it, to match the ColumnName in the outer Where clause, and they must match in type (both being integers, both being character strings, etc.).
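
The three kinds of subquery in this section (a single computed value, a candidate list, and a subquery inside an Update) can be sketched as follows, assuming Python's sqlite3 module. Only five of the Antiques rows are loaded, and their prices are invented for this example, so the results apply to this reduced data only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AntiqueOwners (OwnerID CHAR(2), OwnerLastName CHAR(20), OwnerFirstName CHAR(20));
CREATE TABLE Antiques (SellerID CHAR(2), BuyerID CHAR(2), Item CHAR(40), Price DECIMAL(8,2));
INSERT INTO AntiqueOwners VALUES ('01','Jones','Bill'), ('02','Smith','Bob'),
  ('15','Lawson','Patricia'), ('21','Akins','Jane'), ('50','Fowler','Sam');
-- Reduced data set with hypothetical prices (average is 914).
INSERT INTO Antiques VALUES
  ('01','50','Bed',1500.00), ('02','15','Table',800.00), ('15','02','Chair',120.00),
  ('21','02','Bookcase',1200.00), ('01','21','Cabinet',950.00);
""")

# Subquery returning one value: the average price plus 100.
expensive_buyers = [row[0] for row in conn.execute("""
    SELECT BuyerID FROM Antiques
    WHERE Price > (SELECT AVG(Price) + 100 FROM Antiques)
    ORDER BY BuyerID
""")]

# Subquery returning a candidate list of IDs, tested with IN.
owners_who_bought = [row[0] for row in conn.execute("""
    SELECT OwnerLastName FROM AntiqueOwners
    WHERE OwnerID IN (SELECT DISTINCT BuyerID FROM Antiques)
    ORDER BY OwnerLastName
""")]

# Subquery inside an UPDATE: find the bookcase buyer, then rename him.
conn.execute("""
    UPDATE AntiqueOwners SET OwnerFirstName = 'John'
    WHERE OwnerID = (SELECT BuyerID FROM Antiques WHERE Item = 'Bookcase')
""")
(bookcase_buyer_name,) = conn.execute(
    "SELECT OwnerFirstName FROM AntiqueOwners WHERE OwnerID = '02'"
).fetchone()

print(expensive_buyers, owners_who_bought, bookcase_buyer_name)
```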


EXISTS & ALL

EXISTS uses a subquery as a condition, where the condition is True if the subquery returns any rows, and False if the subquery does not return any rows; this is a nonintuitive feature with few unique uses. However, if a prospective customer wanted to see the list of Owners only if the shop dealt in Chairs, try:

SELECT OWNERFIRSTNAME, OWNERLASTNAME
FROM ANTIQUEOWNERS
WHERE EXISTS
  (SELECT *
   FROM ANTIQUES
   WHERE ITEM = 'Chair');

If there are any Chairs in the Antiques table, the subquery would return a row or rows, making the EXISTS clause true, causing SQL to list the Antique Owners. If there had been no Chairs, no rows would have been returned by the outside query.

ALL is another unusual feature, as ALL queries can usually be done with different, and possibly simpler methods; let's take a look at an example query:

SELECT BUYERID, ITEM
FROM ANTIQUES
WHERE PRICE >= ALL
  (SELECT PRICE
   FROM ANTIQUES);

This will return the largest priced item (or more than one item if there is a tie), and its buyer. The subquery returns a list of all Prices in the Antiques table, and the outer query goes through each row of the Antiques table, and if its Price is greater than or equal to every (or ALL) Price in the list, it is listed, giving the highest priced Item. The reason ">=" must be used is that the highest priced item will be equal to the highest price on the list, because this Item is in the Price list.
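
The following sketch tries out EXISTS and one of the "simpler methods" for the ALL query, assuming Python's sqlite3 module. As far as I know, SQLite does not accept the ALL quantifier, so the highest-priced item is found with a MAX subquery instead; that substitution, the reduced data, the invented prices, and the 'Sofa' item are all assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AntiqueOwners (OwnerID CHAR(2), OwnerLastName CHAR(20), OwnerFirstName CHAR(20));
CREATE TABLE Antiques (SellerID CHAR(2), BuyerID CHAR(2), Item CHAR(40), Price DECIMAL(8,2));
INSERT INTO AntiqueOwners VALUES ('01','Jones','Bill'), ('02','Smith','Bob'),
  ('15','Lawson','Patricia'), ('21','Akins','Jane'), ('50','Fowler','Sam');
-- Reduced data set with hypothetical prices.
INSERT INTO Antiques VALUES
  ('01','50','Bed',1500.00), ('02','15','Table',800.00), ('15','02','Chair',120.00),
  ('21','02','Bookcase',1200.00), ('01','21','Cabinet',950.00);
""")

def owners_if_shop_has(item):
    """List every owner, but only if some Antiques row has the given item."""
    return conn.execute("""
        SELECT OwnerFirstName, OwnerLastName
        FROM AntiqueOwners
        WHERE EXISTS (SELECT * FROM Antiques WHERE Item = ?)
    """, (item,)).fetchall()

with_chairs = owners_if_shop_has("Chair")   # subquery finds a row: all owners listed
with_sofas = owners_if_shop_has("Sofa")     # subquery finds nothing: no owners listed

# Stand-in for "Price >= ALL (SELECT Price ...)": compare against the maximum.
top_item = conn.execute("""
    SELECT BuyerID, Item FROM Antiques
    WHERE Price = (SELECT MAX(Price) FROM Antiques)
""").fetchall()

print(len(with_chairs), with_sofas, top_item)
```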


UNION & Outer Joins

There are occasions where you might want to see the results of multiple queries together, combining their output; use UNION. To merge the output of the following two queries, displaying the ID's of all Buyers, plus all those who have an Order placed:

SELECT BUYERID
FROM ANTIQUES
UNION
SELECT OWNERID
FROM ORDERS;

Notice that SQL requires that the Select list (of columns) must match, column-by-column, in data type. In this case BuyerID and OwnerID are of the same data type (integer). Also notice that SQL does automatic duplicate elimination when using UNION (as if they were two "sets"); in single queries, you have to use DISTINCT.

The outer join is used when a join query is "united" with the rows not included in the join, and is especially useful if constant text "flags" are included. First, look at the query:

SELECT OWNERID, 'is in both Orders & Antiques'
FROM ORDERS, ANTIQUES
WHERE OWNERID = BUYERID
UNION
SELECT BUYERID, 'is in Antiques only'
FROM ANTIQUES
WHERE BUYERID NOT IN
  (SELECT OWNERID
   FROM ORDERS);

The first query does a join to list any owners who are in both tables, and puts a tag line after the ID repeating the quote. The UNION merges this list with the next list. The second list is generated by first having the subquery list the ID's that are in the Orders table, which are the only ID's that can appear in the join query. Then, each row in the Antiques table is scanned, and if the BuyerID is not in this list, it is listed with its quoted tag. There might be an easier way to make this list, but it's difficult to generate the informational quoted strings of text.

This concept is useful in situations where a primary key is related to a foreign key, but the foreign key value for some primary keys is NULL. For example, in one table, the primary key is a salesperson, and in another table is customers, with their salesperson listed in the same row. However, if a salesperson has no customers, that person's name won't appear in the customer table. The outer join is used if the listing of all salespersons is to be printed, listed with their customers, whether the salesperson has a customer or not--that is, no customer is printed (a logical NULL value) if the salesperson has no customers, but is in the salespersons table. Otherwise, the salesperson will be listed with each customer.
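
The salesperson example can be sketched with the LEFT OUTER JOIN syntax that many systems (SQLite included) provide, assuming Python's sqlite3 module. The table names, column names, and rows below are invented for this illustration, and whether your DBMS supports this syntax is something to check locally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables: Baker is a salesperson with no customers.
CREATE TABLE Salespersons (SalesName CHAR(20));
CREATE TABLE Customers (CustName CHAR(20), SalesName CHAR(20));
INSERT INTO Salespersons VALUES ('Adams'), ('Baker');
INSERT INTO Customers VALUES ('Acme', 'Adams'), ('Zenith', 'Adams');
""")

# Inner join: a salesperson without customers drops out of the listing.
inner_rows = conn.execute("""
    SELECT S.SalesName, C.CustName
    FROM Salespersons S, Customers C
    WHERE C.SalesName = S.SalesName
    ORDER BY S.SalesName, C.CustName
""").fetchall()

# Outer join: every salesperson is kept, with NULL where no customer exists.
outer_rows = conn.execute("""
    SELECT S.SalesName, C.CustName
    FROM Salespersons S LEFT OUTER JOIN Customers C
      ON C.SalesName = S.SalesName
    ORDER BY S.SalesName, C.CustName
""").fetchall()

print(inner_rows)
print(outer_rows)
```

In Python the NULL customer shows up as None.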

ENOUGH QUERIES!!! you say?...now on to something completely different...


Embedded SQL--an ugly example (do not write a program like this...for purposes of argument ONLY)

/* -To get right to it, here is an example program that uses Embedded
    SQL. Embedded SQL allows programmers to connect to a database and
    include SQL code right in the program, so that their programs can
    use, manipulate, and process data from a database.
   -This example C Program (using Embedded SQL) will print a report.
   -This program will have to be precompiled for the SQL statements,
    before regular compilation.
   -The EXEC SQL parts are the same (standard), but the surrounding C
    code will need to be changed, including the host variable
    declarations, if you are using a different language.
   -Embedded SQL changes from system to system, so, once again, check
    local documentation, especially variable declarations and logging
    in procedures, in which network, DBMS, and operating system
    considerations are crucial. */

/************************************************/
/* THIS PROGRAM IS NOT COMPILABLE OR EXECUTABLE */
/* IT IS FOR EXAMPLE PURPOSES ONLY             */
/************************************************/

#include <stdio.h>
#include <stdlib.h>

/* This section declares the host variables; these will be the
   variables your program uses, but also the variables SQL will put
   values in or take values out. */
EXEC SQL BEGIN DECLARE SECTION;
  int BuyerID;
  char FirstName[100], LastName[100], Item[100];
EXEC SQL END DECLARE SECTION;

/* This includes the SQLCA variable, so that some error checking can be done. */
EXEC SQL INCLUDE SQLCA;

main() {

/* This is a possible way to log into the database */
EXEC SQL CONNECT UserID/Password;

/* This code either says that you are connected or checks if an error
   code was generated, meaning log in was incorrect or not possible. */
  if(sqlca.sqlcode) {
    printf("Error connecting to database server.\n");
    exit(1);
  }
  printf("Connected to database server.\n");

/* This declares a "Cursor". This is used when a query returns more
   than one row, and an operation is to be performed on each row
   resulting from the query. With each row established by this query,
   I'm going to use it in the report. Later, "Fetch" will be used to
   pick off each row, one at a time, but for the query to actually
   be executed, the "Open" statement is used. The "Declare" just
   establishes the query. */
EXEC SQL DECLARE ItemCursor CURSOR FOR
  SELECT ITEM, BUYERID
  FROM ANTIQUES
  ORDER BY ITEM;
EXEC SQL OPEN ItemCursor;

/* +-- You may wish to put a similar error checking block here --+*/

/* Fetch puts the values of the "next" row of the query in the host
   variables, respectively. However, a "priming fetch" (programming
   technique) must first be done. When the cursor is out of data, a
   sqlcode will be generated allowing us to leave the loop. Notice
   that, for simplicity's sake, the loop will leave on any sqlcode,
   even if it is an error code. Otherwise, specific code checking must
   be performed. */
EXEC SQL FETCH ItemCursor INTO :Item, :BuyerID;
  while(!sqlca.sqlcode) {

/* With each row, we will also do a couple of things. First, bump the
   price up by $5 (dealer's fee) and get the buyer's name to put in
   the report. To do this, I'll use an Update and a Select, before
   printing the line on the screen. The update assumes however, that
   a given buyer has only bought one of any given item, or else the
   price will be increased too many times. Otherwise, a "RowID" logic
   would have to be used (see documentation). Also notice the colon
   before host variable names when used inside of SQL statements. */

EXEC SQL UPDATE ANTIQUES
  SET PRICE = PRICE + 5
  WHERE ITEM = :Item AND BUYERID = :BuyerID;

EXEC SQL SELECT OWNERFIRSTNAME, OWNERLASTNAME
  INTO :FirstName, :LastName
  FROM ANTIQUEOWNERS
  WHERE OWNERID = :BuyerID;

    printf("%25s %25s %25s", FirstName, LastName, Item);

/* Ugly report--for example purposes only! Get the next row. */
EXEC SQL FETCH ItemCursor INTO :Item, :BuyerID;
  }

/* Close the cursor, commit the changes (see below), and exit the
   program. */
EXEC SQL CLOSE ItemCursor;
EXEC SQL COMMIT RELEASE;
  exit(0);
}


CommonSQL Questions--Advanced Topics (see FAQ link for several more)

  1. Why can't I just ask for the first three rows in a table? --Because in relational databases, rows are inserted in no particular order, that is, the system inserts them in an arbitrary order; so, you can only request rows using valid SQL features, like ORDER BY, etc.
  2. What is this DDL and DML I hear about? --DDL (Data Definition Language) refers to (in SQL) the Create Table statement...DML (Data Manipulation Language) refers to the Select, Update, Insert, and Delete statements.
  3. Aren't database tables just files? --Well, DBMS's store data in files declared by system managers before new tables are created (on large systems), but the system stores the data in a special format, and may spread data from one table over several files. In the database world, a set of files created for a database is called a tablespace. In general, on small systems, everything about a database (definitions and all table data) is kept in one file.
  4. (Related question) Aren't database tables just like spreadsheets? --No, for two reasons. First, spreadsheets can have data in a cell, but a cell is more than just a row-column intersection. Depending on your spreadsheet software, a cell might also contain formulas and formatting, which database tables cannot have (currently). Secondly, spreadsheet cells are often dependent on the data in other cells. In databases, "cells" are independent, except that columns are logically related (hopefully; together a row of columns describes an entity), and, other than primary key and foreign key constraints, each row in a table is independent of the others.
  5. How do I import a text file of data into a database? --Well, you can't do it directly...you must use a utility, such as Oracle's SQL*Loader, or write a program to load the data into the database. A program to do this would simply go through each record of a text file, break it up into columns, and do an Insert into the database.
  6. What is a schema? --A schema is a logical set of tables, such as the Antiques database above...usually, it is thought of as simply "the database", but a database can hold more than one schema. For example, a star schema is a set of tables where one large, central table holds all of the important information, and is linked, via foreign keys, to dimension tables which hold detail information, and can be used in a join to create detailed reports.
  7. What are some general tips you would give to make my SQL queries and databases better and faster (optimized)?
  8. What is normalization? --Normalization is a technique of database design that suggests that certain criteria be used when constructing a table layout (deciding what columns each table will have, and creating the key structure), where the idea is to eliminate redundancy of non-key data across tables. Normalization is usually referred to in terms of forms, and I will introduce only the first three, even though it is somewhat common to use other, more advanced forms (fourth, fifth, Boyce-Codd; see documentation).

    First Normal Form refers to moving data into separate tables where the data in each table is of a similar type, giving each table a primary key, and making sure each column holds a single value (no repeating groups of columns).

    Putting data in Second Normal Form involves moving data that is dependent on only a part of the key off to other tables. For example, if I had left the names of the Antique Owners in the items table, that would not be in second normal form because that data would be redundant; the name would be repeated for each item owned, so the names were placed in their own table. The names themselves don't have anything to do with the items, only the identities of the buyers and sellers.

    Third Normal Form involves getting rid of anything in the tables that doesn't depend solely on the primary key. Only include information that is dependent on the key, move data that is independent of the primary key off to other tables, and create primary keys for the new tables.

    There is some redundancy to each form, and if data is in 3NF (shorthand for 3rd normal form), it is already in 1NF and 2NF. In terms of data design then, arrange data so that any non-primary key columns are dependent only on the whole primary key. If you take a look at the sample database, you will see that the way then to navigate through the database is through joins using common key columns.

    Two other important points in database design are using good, consistent, logical, full-word names for the tables and columns, and the use of full words in the database itself. On the last point, my database is lacking, as I use numeric codes for identification. It is usually best, if possible, to come up with keys that are, by themselves, self-explanatory; for example, a better key would be the first four letters of the last name and first initial of the owner, like JONEB for Bill Jones (or for tiebreaking purposes, add numbers to the end to differentiate two or more people with similar names, so you could try JONEB1, JONEB2, etc.).
  9. What is the difference between a single-row query and a multiple-row query and why is it important to know the difference? --First, to cover the obvious, a single-row query is a query that returns one row as its result, and a multiple-row query is a query that returns more than one row as its result. Whether a query returns one row or more than one row is entirely dependent on the design (or schema) of the tables of the database. As query-writer, you must be aware of the schema, be sure to include enough conditions, and structure your SQL statement properly, so that you will get the desired result (either one row or multiple rows). For example, if you wanted to be sure that a query of the AntiqueOwners table returned only one row, consider an equal condition of the primary key column, OwnerID.

    Three reasons immediately come to mind as to why this is important. First, getting multiple rows when you were expecting only one, or vice-versa, may mean that the query is erroneous, that the database is incomplete, or simply, you learned something new about your data. Second, if you are using an update or delete statement, you had better be sure that the statement that you write performs the operation on the desired row (or rows)...or else, you might be deleting or updating more rows than you intend. Third, any queries written in Embedded SQL must be carefully thought out as to the number of rows returned. If you write a single-row query, only one SQL statement may need to be performed to complete the programming logic required. If your query, on the other hand, returns multiple rows, you will have to use the Fetch statement, and quite probably, some sort of looping structure in your program will be required to iterate processing on each returned row of the query.
  10. What are relationships? --Another design question...the term "relationships" (often termed "relation") usually refers to the relationships among primary and foreign keys between tables. This concept is important because when the tables of a relational database are designed, these relationships must be defined because they determine which columns are or are not primary or foreign keys. You may have heard of an Entity-Relationship Diagram, which is a graphical view of tables in a database schema, with lines connecting related columns across tables. See the sample diagram at the end of this section or some of the sites below in regard to this topic, as there are many different ways of drawing E-R diagrams. But first, let's look at each kind of relationship...

    A One-to-one relationship means that you have a primary key column that is related to a foreign key column, and that for every primary key value, there is one foreign key value. For example, in the first example, the EmployeeAddressTable, we add an EmployeeIDNo column. Then, the EmployeeAddressTable is related to the EmployeeStatisticsTable (second example table) by means of that EmployeeIDNo. Specifically, each employee in the EmployeeAddressTable has statistics (one row of data) in the EmployeeStatisticsTable. Even though this is a contrived example, this is a "1-1" relationship. Also notice the "has" in bold...when expressing a relationship, it is important to describe the relationship with a verb.

    The other two kinds of relationships may or may not use logical primary key and foreign key constraints...it is strictly a call of the designer. The first of these is the one-to-many relationship ("1-M"). This means that for every column value in one table, there are one or more related values in another table. Key constraints may be added to the design, or possibly just some sort of identifier column may be used to establish the relationship. An example would be that for every OwnerID in the AntiqueOwners table, there are one or more (zero is permissible too) Items bought in the Antiques table (verb: buy).

    Finally, the many-to-many relationship ("M-M") does not involve keys generally, and usually involves identifying columns. The unusual occurrence of a "M-M" means that one column in one table is related to another column in another table, and for every value of one of these two columns, there are one or more related values in the corresponding column in the other table (and vice-versa), or, a more common possibility, two tables have a 1-M relationship to each other (two relationships, one 1-M going each way). A [bad] example of the more common situation would be if you had a job assignment database, where one table held one row for each employee and a job assignment, and another table held one row for each job with one of the assigned employees. Here, you would have multiple rows for each employee in the first table, one for each job assignment, and multiple rows for each job in the second table, one for each employee assigned to the project. These tables have a M-M: each employee in the first table has many job assignments from the second table, and each job has many employees assigned to it from the first table. This is the tip of the iceberg on this topic...see the links below for more information and see the diagram below for a simplified example of an E-R diagram.
    Sample Simplified Entity-Relationship Diagram
  11. What are some important nonstandard SQL features (extremely common question)? --Well, see the next section...
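
To make questions 9 and 10 above a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout follows the AntiqueOwners and Antiques tables as described in this tutorial, but SQLite as the DBMS and the sample rows are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ANTIQUEOWNERS (OWNERID INTEGER PRIMARY KEY,
                                OWNERLASTNAME TEXT, OWNERFIRSTNAME TEXT);
    CREATE TABLE ANTIQUES (SELLERID INTEGER, BUYERID INTEGER, ITEM TEXT);
    INSERT INTO ANTIQUEOWNERS VALUES (1, 'Jones', 'Bill');
    INSERT INTO ANTIQUEOWNERS VALUES (2, 'Smith', 'Bob');
    INSERT INTO ANTIQUES VALUES (2, 1, 'Chair');
    INSERT INTO ANTIQUES VALUES (2, 1, 'Mirror');
""")

# Single-row query: an equality condition on the primary key column
# can match at most one row.
single = conn.execute(
    "SELECT OWNERLASTNAME FROM ANTIQUEOWNERS WHERE OWNERID = ?", (1,)
).fetchall()

# Multiple-row query over a 1-M relationship: one owner buys many items,
# so joining on the common key column returns one row per item bought.
bought = conn.execute(
    "SELECT ANTIQUEOWNERS.OWNERLASTNAME, ANTIQUES.ITEM "
    "FROM ANTIQUEOWNERS, ANTIQUES "
    "WHERE ANTIQUEOWNERS.OWNERID = ANTIQUES.BUYERID "
    "ORDER BY ANTIQUES.ITEM"
).fetchall()

print(single)
print(bought)
```

With these invented rows, the first query yields one row and the second yields one row per purchase, which is the situation where a host program would need a fetch loop.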


Nonstandard SQL..."check local listings"


Syntax Summary--For Advanced Users Only

Here are the general forms of the statements discussed in this tutorial, plus some extra important ones (explanations given). REMEMBER that all of these statements may or may not be available on your system, so check documentation regarding availability:

ALTER TABLE <TABLE NAME> ADD|DROP|MODIFY (COLUMN SPECIFICATION[S]...see Create Table); --allows you to add or delete a column or columns from a table, or change the specification (data type, etc.) on an existing column; this statement is also used to change the physical specifications of a table (how a table is stored, etc.), but these definitions are DBMS-specific, so read the documentation. Also, these physical specifications are used with the Create Table statement, when a table is first created. In addition, only one option can be performed per Alter Table statement--either add, drop, OR modify in a single statement.

COMMIT; --makes changes made to some database systems permanent (since the last COMMIT; known as a transaction)

CREATE [UNIQUE] INDEX <INDEX NAME>
ON <TABLE NAME> (<COLUMN LIST>);
--UNIQUE is optional; optional parts are shown within brackets.
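
As a hedged illustration of what UNIQUE adds to an index, the sketch below uses Python's built-in sqlite3 module. SQLite is an assumption here, and the index name OID_IDX and the sample rows are chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ANTIQUEOWNERS (OWNERID INTEGER, OWNERLASTNAME TEXT)")
conn.execute("CREATE UNIQUE INDEX OID_IDX ON ANTIQUEOWNERS (OWNERID)")
conn.execute("INSERT INTO ANTIQUEOWNERS VALUES (1, 'Jones')")

# A second row with the same OWNERID violates the unique index,
# so the DBMS rejects it with an error.
try:
    conn.execute("INSERT INTO ANTIQUEOWNERS VALUES (1, 'Smith')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

Without the UNIQUE keyword the index would only speed up lookups; the duplicate row would be accepted.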

CREATE TABLE <TABLE NAME>
(<COLUMN NAME> <DATA TYPE> [(<SIZE>)] <COLUMN CONSTRAINT>,
...other columns); (also valid with ALTER TABLE)
--where SIZE is only used on certain data types (see above), and constraints include the following possibilities (automatically enforced by the DBMS; failure causes an error to be generated):

  1. NULL or NOT NULL (see above)
  2. UNIQUE enforces that no two rows will have the same value for this column
  3. PRIMARY KEY tells the database that this column is the primary key column (only used if the key is a one column key, otherwise a PRIMARY KEY (column, column, ...) statement appears after the last column definition).
  4. CHECK allows a condition to be checked for when data in that column is updated or inserted; for example, CHECK (PRICE > 0) causes the system to check that the Price column is greater than zero before accepting the value...sometimes implemented as the CONSTRAINT statement.
  5. DEFAULT inserts the default value into the database if a row is inserted without that column's data being inserted; for example, BENEFITS INTEGER DEFAULT 10000
  6. FOREIGN KEY works the same as Primary Key, but is followed by: REFERENCES <TABLE NAME> (<COLUMN NAME>), which refers to the referenced primary key.
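
The constraints listed above can be tried out with a small sketch in Python's built-in sqlite3 module. Everything here is an assumption for illustration: SQLite as the DBMS (it needs the PRAGMA shown to enforce foreign keys), the column layout, the default value of 10, and the helper name try_insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
    CREATE TABLE ANTIQUEOWNERS (
        OWNERID INTEGER PRIMARY KEY,
        OWNERLASTNAME TEXT NOT NULL
    );
    CREATE TABLE ANTIQUES (
        ITEM TEXT NOT NULL,
        BUYERID INTEGER REFERENCES ANTIQUEOWNERS (OWNERID),
        PRICE REAL DEFAULT 10 CHECK (PRICE > 0),
        UNIQUE (ITEM, BUYERID)
    );
    INSERT INTO ANTIQUEOWNERS VALUES (1, 'Jones');
""")

def try_insert(sql):
    """Return True if the DBMS accepts the row, False if a constraint fails."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

ok_default = try_insert("INSERT INTO ANTIQUES (ITEM, BUYERID) VALUES ('Chair', 1)")
bad_check = try_insert("INSERT INTO ANTIQUES VALUES ('Bed', 1, -5)")    # CHECK
bad_null = try_insert("INSERT INTO ANTIQUES VALUES (NULL, 1, 20)")      # NOT NULL
bad_fk = try_insert("INSERT INTO ANTIQUES VALUES ('Lamp', 99, 20)")     # no owner 99
bad_unique = try_insert("INSERT INTO ANTIQUES VALUES ('Chair', 1, 30)") # UNIQUE
default_price = conn.execute(
    "SELECT PRICE FROM ANTIQUES WHERE ITEM = 'Chair'").fetchone()[0]
```

Only the first insert succeeds, and it picks up the default price because no PRICE value was supplied; each of the others trips one constraint.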

CREATE VIEW <TABLE NAME> AS <QUERY>;
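
A brief sketch of the view form, run through Python's built-in sqlite3 module. SQLite, the view name ANTVIEW, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ANTIQUES (SELLERID INTEGER, BUYERID INTEGER, ITEM TEXT);
    INSERT INTO ANTIQUES VALUES (2, 1, 'Chair');
    INSERT INTO ANTIQUES VALUES (1, 2, 'Bed');
    CREATE VIEW ANTVIEW AS SELECT ITEM FROM ANTIQUES;
""")

# The view is queried like a table, but it only exposes the ITEM column.
items = conn.execute("SELECT ITEM FROM ANTVIEW ORDER BY ITEM").fetchall()
print(items)
```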

DELETE FROM <TABLE NAME> WHERE <CONDITION>;

INSERT INTO <TABLE NAME> [(<COLUMN LIST>)]
VALUES (<VALUE LIST>);

ROLLBACK; --Takes back any changes to the database that you have made, back to the last time you gave a Commit command...beware! Some software uses automatic committing on systems that use the transaction features, so the Rollback command may not work.
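
How Commit and Rollback interact can be sketched with Python's built-in sqlite3 module, whose connection object exposes commit() and rollback() methods. SQLite and the sample row are assumptions; your system's transaction behavior (especially automatic committing) may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ANTIQUES (ITEM TEXT, PRICE REAL)")
conn.execute("INSERT INTO ANTIQUES VALUES ('Chair', 20)")
conn.commit()      # the insert is now permanent

conn.execute("UPDATE ANTIQUES SET PRICE = PRICE + 5")
conn.rollback()    # undoes the update, back to the last commit

price = conn.execute("SELECT PRICE FROM ANTIQUES").fetchone()[0]
print(price)
```

The price read back is the committed one, because the update was never committed before the rollback.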

SELECT [DISTINCT|ALL] <LIST OF COLUMNS, FUNCTIONS, CONSTANTS, ETC.>
FROM <LIST OF TABLES OR VIEWS>
[WHERE <CONDITION(S)>]
[GROUP BY <GROUPING COLUMN(S)>]
[HAVING <CONDITION>]
[ORDER BY <ORDERING COLUMN(S)> [ASC|DESC]];
--where ASC|DESC allows the ordering to be done in ASCending or DESCending order
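
As one possible instance of the general form, the sketch below uses every optional clause at once, run through Python's built-in sqlite3 module. SQLite, the PRICE values, and the thresholds in the conditions are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ANTIQUES (SELLERID INTEGER, BUYERID INTEGER,
                           ITEM TEXT, PRICE REAL);
    INSERT INTO ANTIQUES VALUES (2, 1, 'Chair', 20);
    INSERT INTO ANTIQUES VALUES (2, 1, 'Mirror', 35);
    INSERT INTO ANTIQUES VALUES (1, 2, 'Bed', 100);
    INSERT INTO ANTIQUES VALUES (1, 3, 'Lamp', 5);
""")

# WHERE filters rows first, GROUP BY forms one group per buyer,
# HAVING filters the groups, and ORDER BY sorts what is left.
rows = conn.execute("""
    SELECT BUYERID, COUNT(*), SUM(PRICE)
    FROM ANTIQUES
    WHERE PRICE > 10
    GROUP BY BUYERID
    HAVING SUM(PRICE) > 50
    ORDER BY BUYERID DESC
""").fetchall()
print(rows)
```

Buyer 3 drops out at the WHERE step (the lamp costs too little), while buyers 1 and 2 both pass the HAVING condition and are listed in descending BUYERID order.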

UPDATE <TABLE NAME>
SET <COLUMN NAME> = <VALUE>
[WHERE <CONDITION>];
--if the Where clause is left out, all rows will be updated according to the Set statement
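
The effect of leaving out the Where clause is easy to see in a small sketch with Python's built-in sqlite3 module, where the cursor's rowcount attribute reports how many rows an Update touched. SQLite and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ANTIQUES (ITEM TEXT, PRICE REAL);
    INSERT INTO ANTIQUES VALUES ('Chair', 20);
    INSERT INTO ANTIQUES VALUES ('Bed', 100);
""")

# With a WHERE clause only the matching row changes...
with_where = conn.execute(
    "UPDATE ANTIQUES SET PRICE = PRICE + 5 WHERE ITEM = 'Chair'").rowcount
# ...without one, every row in the table is updated.
without_where = conn.execute(
    "UPDATE ANTIQUES SET PRICE = PRICE + 5").rowcount

print(with_where, without_where)
```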


Important Links

Computing & SQL/DB Links: Netscape -- Oracle -- Sybase -- Informix -- Microsoft
SQL Reference Page -- Ask the SQL Pro -- SQL Pro's Relational DB Useful Sites
Programmer's Source -- DBMS Sites -- inquiry.com -- DB Ingredients
Web Authoring -- Computing Dictionary -- DBMS Lab/Links -- SQL FAQ -- SQL Databases
RIT Database Design Page -- Database Jump Site -- Programming Tutorials on the Web
Development Resources -- Query List -- IMAGESQL

Miscellaneous: CNN -- USA Today -- Pathfinder -- ZDNet -- Metroscope -- CNet
Internet Resource List -- Netcast Weather -- TechWeb -- LookSmart

Search Engines: Yahoo -- AltaVista -- Excite -- WebCrawler -- Lycos -- Infoseek -- search.com

These sites are not endorsed by the author.


Disclaimer