Resagratia

Exploratory Data Analysis in SQL

Data exploration! What does it mean to explore? To explore is to inquire into something or discuss it in detail. It is to travel through unfamiliar territory in order to learn about it and to conquer it, and that’s exactly what we intend to achieve in this article.

As a sequel to the article “A Data Analyst’s Introduction to MySQL”, we are going to start exploring the Vanni manufacturing database. If you have no experience with MySQL, we suggest you read that introduction to MySQL for data analysis first.

You can follow along and run the examples in this article by downloading the database.

SQL statements, clauses and operators

Remember that to access the vannimanufacturing database, you either double-click the database name or use the USE statement. The database name then appears in bold, which indicates that it has been activated.

[Figure: Database activation]
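As a small sketch, the USE statement below activates the database from the query editor. The database name vannimanufacturing comes from the text above; SELECT DATABASE() is a built-in MySQL function that reports which database is currently selected.

```sql
-- Make vannimanufacturing the default database for the statements that follow
USE vannimanufacturing;

-- Confirm which database is currently selected
SELECT DATABASE();
```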

SQL keywords are not case-sensitive, i.e. “select” is the same as “SELECT”. However, it is best practice to write SQL keywords in uppercase. When a statement lists multiple columns, separate them with a comma (,).

In addition, we use the keyboard shortcut “ctrl+shift+enter” or click the “yellow lightning icon” to run a SQL statement.

The semicolon (;) is the standard way to separate each SQL statement in database systems.
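To illustrate, the two statements below differ only in keyword case, and the semicolon marks where each one ends. This sketch assumes the customers table exists in the active database.

```sql
-- Keywords in uppercase (the usual convention)
SELECT * FROM customers;

-- The same statement with lowercase keywords; MySQL treats it identically
select * from customers;
```

Note that this applies to keywords only. Depending on the operating system and server settings, table names can be case-sensitive in MySQL.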

[Figure: Yellow lightning icon]

Some of the most important SQL statements

SQL statements are used to perform tasks such as updating or retrieving data from a database.

  1. SELECT – extracts data from a database
  2. UPDATE – updates data in a database
  3. DELETE – deletes data from a database
  4. INSERT INTO – inserts new data into a database
  5. CREATE DATABASE – creates a new database
  6. ALTER DATABASE – modifies a database
  7. CREATE TABLE – creates a new table
  8. ALTER TABLE – modifies a table
  9. DROP TABLE – deletes a table
  10. CREATE INDEX – creates an index (search key)
  11. DROP INDEX – deletes an index
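The statements listed above can be tried out safely on a scratch table. The sketch below uses a hypothetical table called demo_notes, which is not part of the vannimanufacturing database; it is created and dropped again within the example.

```sql
-- CREATE TABLE: hypothetical scratch table, used only for this illustration
CREATE TABLE demo_notes (
    id   INT PRIMARY KEY,
    note VARCHAR(100)
);

-- ALTER TABLE: add a column
ALTER TABLE demo_notes ADD COLUMN author VARCHAR(50);

-- CREATE INDEX: index the author column to speed up lookups
CREATE INDEX idx_demo_notes_author ON demo_notes (author);

-- INSERT INTO: add rows
INSERT INTO demo_notes (id, note, author)
VALUES (1, 'first note', 'Ada'),
       (2, 'second note', 'Ben');

-- UPDATE: change an existing row
UPDATE demo_notes SET note = 'edited note' WHERE id = 1;

-- DELETE: remove a row
DELETE FROM demo_notes WHERE id = 2;

-- SELECT: read what is left
SELECT id, note, author FROM demo_notes;

-- DROP INDEX and DROP TABLE: clean up
DROP INDEX idx_demo_notes_author ON demo_notes;
DROP TABLE demo_notes;
```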

Some of the most important SQL clauses

Clauses are the parts of a statement that shape its result: we use them to filter, sort and group the records a query returns. The most important ones are:

  1. DISTINCT clause – used to retrieve unique records
  2. FROM clause – used to list the tables to query and how they are joined
  3. WHERE clause – used to filter results
  4. ORDER BY – used to sort results
  5. GROUP BY – used to group records that share the same values in one or more columns
  6. HAVING – used to filter the groups produced by the GROUP BY clause
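The clauses above always appear in a fixed order within one statement. Here is a minimal sketch, assuming a customers table with country and creditLimit columns (the column names and the threshold values are assumptions made for illustration):

```sql
-- Count customers per country, looking only at customers with a credit limit,
-- keep countries with more than 5 such customers, and list the largest first
SELECT country, COUNT(*) AS customer_count
FROM customers
WHERE creditLimit > 0
GROUP BY country
HAVING COUNT(*) > 5
ORDER BY customer_count DESC;
```

WHERE filters individual records before they are grouped, while HAVING filters the groups afterwards.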

Some of the most important SQL logical operators

An operator is a reserved word or a character used primarily in a SQL statement’s WHERE clause to perform operation(s), such as comparisons and arithmetic operations.

We listed the SQL comparison operators in the previous article, “A Data Analyst’s Introduction to MySQL.” Here is a list of the most commonly used SQL logical operators.

  1. LIKE – used to compare values using wildcards (%, _)
  2. BETWEEN – used to search for values within a range
  3. IN – used to search for values within a specified list
  4. AND – requires all of multiple conditions to be true
  5. IS NULL – used to check whether a value is null
  6. OR – requires at least one of multiple conditions to be true
  7. NOT – the negate operator
  8. ANY – used to compare a value with any of the values returned by a subquery
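As a brief sketch of IS NULL, assuming a customers table in which the state column may be empty (customerName, country and state are assumed column names):

```sql
-- Customers with no state recorded
SELECT customerName, country
FROM customers
WHERE state IS NULL;

-- Customers with a state recorded
SELECT customerName, country, state
FROM customers
WHERE state IS NOT NULL;
```

Writing state = NULL would not work here: a comparison with NULL never evaluates to true, which is why the dedicated IS NULL operator exists.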

The SELECT statement in SQL

The SELECT statement is used to request or retrieve data from a database. The data returned is stored in a result table known as the result set.

The FROM clause names the table or tables to query. In MySQL it is used with the SELECT and DELETE statements.

SELECT column1, column2, ...
FROM table_name;
[Figure: SQL SELECT query]
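A minimal sketch of the syntax applied to the customers table. The column names customerName and country are assumptions; substitute the columns your copy of the table actually has.

```sql
-- Retrieve two columns for every record in the customers table
SELECT customerName, country
FROM customers;
```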

SELECT * Example

The following SQL statement selects all the columns from the “customers” table; the asterisk (*) stands for every column:

SELECT *
FROM customers;
[Figure: SQL SELECT * query]

SELECT DISTINCT Statement

The SELECT DISTINCT statement is used to return only unique values.

[Figures: SQL SELECT DISTINCT syntax; SELECT DISTINCT query]
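For instance, the sketch below lists each country once, however many customers it has. It assumes the customers table has a country column.

```sql
-- General form: SELECT DISTINCT column1, column2, ... FROM table_name;

-- One row per distinct country in the customers table
SELECT DISTINCT country
FROM customers;
```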

WHERE Clause

The WHERE clause is used to filter records, i.e. to extract only those records that fulfill a specified condition.

[Figures: SQL WHERE syntax; SQL WHERE clause query]
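A short sketch of a filtered query, assuming the customers table stores the country as text in a country column (the column names and the value 'USA' are used for illustration):

```sql
-- General form: SELECT column1, ... FROM table_name WHERE condition;

-- Only customers located in the USA
SELECT customerName, country
FROM customers
WHERE country = 'USA';
```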

The Logical Operators

These operators are used to specify conditions in a SQL statement and to serve as conjunctions for multiple conditions in a statement.

AND, OR and NOT Operators

The AND and OR operators are used to filter records based on more than one condition. The NOT operator returns a record if the condition is not true.

The AND operator

[Figures: SQL AND syntax; SQL AND operator query]
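A sketch of a query that combines two conditions with AND. The column names and the state value 'NY' are assumptions; the article's original query may have used a different state.

```sql
-- Both conditions must hold for a record to be returned
SELECT customerName, country, state
FROM customers
WHERE country = 'USA'
  AND state = 'NY';
```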

The AND operator excludes every record that does not satisfy both conditions, so only customers from the specified country and state are shown. In the vannimanufacturing database, that gives us just one result. If we use OR in the same statement, we get different results.

The OR operator

[Figures: SQL OR syntax; SQL OR operator query]
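The same kind of query with OR in place of AND, again with assumed column names and a hypothetical state value:

```sql
-- A record is returned if at least one of the two conditions holds
SELECT customerName, country, state
FROM customers
WHERE country = 'USA'
   OR state = 'NY';
```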

The OR operator includes every record that satisfies either condition, so now we get all customers from the USA, whatever their state. We can also use the NOT operator before a condition to exclude every record which meets that condition and include all other records.

The NOT operator

[Figures: SQL NOT syntax; SQL NOT operator query]
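One way such a query might look, assuming NOT is applied to the country condition and combined with a state condition through OR. The column names and the state value 'NY' are assumptions.

```sql
-- NOT negates only the country test; the state test is a separate condition
SELECT customerName, country, state
FROM customers
WHERE NOT country = 'USA'
   OR state = 'NY';
```

To exclude a set of records outright, use the NOT condition on its own, without the OR branch.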

NOT negates the first condition, so the query returns the records whose country is not the USA. Some USA records are still included because they meet the second condition, which is the state. NOT is very useful when we want to exclude certain records.

The BETWEEN operator

The BETWEEN operator allows us to select the records with a value that falls within a range. Both ends of the range are included.

[Figures: SQL BETWEEN syntax; SQL BETWEEN operator query]
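A minimal sketch of a range filter, assuming the customers table has a numeric creditLimit column; the bounds 50000 and 100000 are arbitrary illustration values.

```sql
-- General form: WHERE column BETWEEN low AND high

-- Customers whose credit limit lies in the range, both bounds included
SELECT customerName, creditLimit
FROM customers
WHERE creditLimit BETWEEN 50000 AND 100000;
```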

The LIKE operator

It allows us to search for values which match a specified pattern. We make use of two distinct types of wildcards in our search patterns:

  • % – this wildcard stands for zero, one or multiple characters
  • _ (underscore) – this wildcard stands for exactly one character

[Figure: SQL LIKE syntax]

Using MySQL LIKE with the percentage (%) wildcard examples

This example uses the LIKE operator to find employees whose first names start with the letter “A”:

[Figure: SQL LIKE % query]
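A sketch of such a query, assuming the employees table stores names in firstName and lastName columns (assumed column names):

```sql
-- 'a%' matches values that begin with "a", followed by anything (or nothing)
SELECT firstName, lastName
FROM employees
WHERE firstName LIKE 'a%';
```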

In this example, MySQL scans the whole employees table to find employees whose first names start with the character ‘a’ and are followed by any number of characters. With MySQL’s default case-insensitive collations, ‘a’ and ‘A’ match alike.

This example uses the LIKE operator to find employees whose last names end with ‘on’:

[Figure: SQL LIKE % query]
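A sketch with the wildcard placed in front of the search string, using the same assumed firstName and lastName columns:

```sql
-- '%on' matches values that end with "on"
SELECT firstName, lastName
FROM employees
WHERE lastName LIKE '%on';
```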

If you know the searched string is embedded in the middle of a string, you can use the percentage (%) wildcard at the beginning and at the end of the pattern.

For example, to find all employees whose last names contain ‘on’, you use the following query with the pattern %on%:

[Figure: SQL LIKE %% query]
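A sketch of the %on% pattern, assuming firstName and lastName columns in the employees table:

```sql
-- '%on%' matches values that contain "on" anywhere, including at either end
SELECT firstName, lastName
FROM employees
WHERE lastName LIKE '%on%';
```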

Using MySQL LIKE with underscore (_) wildcard examples

To find employees whose first names start with T, end with m, and contain any single character in between (e.g. Tom, Tim), you use the underscore (_) wildcard to construct the pattern as follows:

[Figure: SQL LIKE _ wildcard query]
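A sketch of the underscore pattern, with firstName and lastName as assumed column names:

```sql
-- 'T_m' matches exactly three characters: T, any single character, then m
SELECT firstName, lastName
FROM employees
WHERE firstName LIKE 'T_m';
```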

We have the option of combining both wildcards in a query.

For example, the following SQL statement selects all customers with a name that has the letter “R” in the second position:

[Figure: Combining LIKE wildcards]
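A sketch that combines both wildcards, assuming the customer's name is stored in a customerName column (an assumed name):

```sql
-- '_r%' = any single character, then "r", then any number of characters
SELECT customerName
FROM customers
WHERE customerName LIKE '_r%';
```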

Using MySQL LIKE operator with the NOT operator example

MySQL allows you to combine the NOT operator with the LIKE operator to find strings that do not match a specific pattern.

Suppose you want to search for employees whose last names don’t start with the character B. You can use NOT LIKE with a pattern as shown in the following query:

[Figure: SQL NOT LIKE query]
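A sketch of NOT LIKE, again assuming firstName and lastName columns in the employees table:

```sql
-- Keep every employee whose last name does not begin with "B"
SELECT firstName, lastName
FROM employees
WHERE lastName NOT LIKE 'B%';
```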

Let’s get wild… Understanding Wildcards

A wildcard is used to substitute one or more characters in a string. Wildcards can be used in combination. We have familiarized ourselves with a few wildcards in the LIKE operator. Here is a list of a few others:

[Figure: table of SQL wildcards]

Regular Expressions (REGEXP)

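Regular expressions describe patterns of text, which allows us to make more complex searches for strings than LIKE does. In MySQL, REGEXP matches when the pattern is found anywhere in the value, unless the pattern is anchored to the start or the end.

[Figure: table of regular expression symbols]

Let’s perform some searches using REGEXP…

The pipe (|), which means “or”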

Query the customers table for customers whose last names contain either worth or son.

[Figure: REGEXP using the pipe |]
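A sketch of the alternation, assuming the customers table keeps the contact's names in contactFirstName and contactLastName columns (assumed column names):

```sql
-- Matches last names that contain "worth" or "son" anywhere
SELECT contactFirstName, contactLastName
FROM customers
WHERE contactLastName REGEXP 'worth|son';
```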

The Square brackets [ ]

Query the database for customers whose last names contain “G”, “N”, “M” or “I” immediately before the letter E.

[Figure: REGEXP using []]
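A sketch using a bracket expression, with contactFirstName and contactLastName as assumed column names:

```sql
-- [gnmi] matches any one of the listed letters; it must be followed by "e"
SELECT contactFirstName, contactLastName
FROM customers
WHERE contactLastName REGEXP '[gnmi]e';
```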

Square brackets and dashes [-]

Instead of listing the search characters, you can search for a range. Query the database for customers whose first names contain “D” and any of the characters in the range [a-h].

[Figure: REGEXP using [-]]
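A sketch of a range inside brackets. It reads the task as “D” immediately followed by a character from a to h, which is one possible interpretation; the column names are assumptions.

```sql
-- [a-h] matches any single character from a through h
SELECT contactFirstName, contactLastName
FROM customers
WHERE contactFirstName REGEXP 'd[a-h]';
```

If the intended order is the reverse, the pattern would be '[a-h]d' instead.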

The Dollar sign $

Query the database for customers whose last name ends with either wn or ler.

[Figure: REGEXP $]
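A sketch that anchors both alternatives to the end of the value, with assumed column names:

```sql
-- $ anchors the match to the end of the string
SELECT contactFirstName, contactLastName
FROM customers
WHERE contactLastName REGEXP 'wn$|ler$';
```

The counterpart of $ is the caret (^), which anchors a match to the start of the string.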

These are just a few examples of how to perform more complex searches in MySQL using regular expressions. In another article, we will provide a more detailed write-up on regular expressions.

SQL IN Operator

The IN operator allows you to specify multiple values in the WHERE clause.

[Figures: SQL IN syntax; SQL IN operator query]
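A sketch of IN, assuming a country column in the customers table; the three country values are picked for illustration only.

```sql
-- General form: WHERE column IN (value1, value2, ...)

-- Equivalent to: country = 'USA' OR country = 'France' OR country = 'Germany'
SELECT customerName, country
FROM customers
WHERE country IN ('USA', 'France', 'Germany');
```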

The LIMIT clause

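The LIMIT clause is used in the SELECT statement to restrict the number of rows/records to be returned. The LIMIT clause accepts one or two arguments, and both must be zero or positive integers. With one argument, it sets the maximum number of rows to return; with two, the first is the offset of the first row to return and the second is the maximum number of rows.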

For example, let’s query the database for the top 5 customers with the highest credit limit:

[Figure: SQL LIMIT query]
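A sketch of a top-5 query, assuming the credit limit is stored in a numeric creditLimit column (an assumed name):

```sql
-- Sort from the highest credit limit down, then keep the first 5 rows
SELECT customerName, creditLimit
FROM customers
ORDER BY creditLimit DESC
LIMIT 5;
```

Without ORDER BY, LIMIT 5 would return five rows in no guaranteed order, so the sort is what makes this a “top 5”.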

The LIMIT clause is also used for pagination, i.e. to request a table’s records one page at a time. This is similar to the OFFSET function in Microsoft Excel.

For example, let’s query the database for the records on the second page of the product table:

[Figures: SQL LIMIT for separate pages; SQL LIMIT clause query]
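A sketch of fetching a second page, under several assumptions: the table is named products, it has a productCode column to sort by, and a page holds 10 records. None of these details are confirmed by the article.

```sql
-- Page 2 with 10 rows per page: skip the first 10 rows, return the next 10
SELECT *
FROM products
ORDER BY productCode
LIMIT 10, 10;

-- Equivalent, more explicit form
SELECT *
FROM products
ORDER BY productCode
LIMIT 10 OFFSET 10;
```

In general, the offset for page n is (n - 1) multiplied by the page size. Sorting on a unique column keeps the pages stable between queries.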

Conclusion

We have reached the end of this article. In this tutorial, you have learnt about the most important and most frequently used SQL statements, logical operators and clauses. We do hope that you found this article intuitive. Stay tuned for more tutorials on exploring data using SQL.

We appreciate our readers and learners and would love to get your feedback.

Don’t forget to like, share the article and subscribe for more exciting content.

Author

Geovanni Ubah. You can find me on LinkedIn.

