Data exploration! What does it mean to explore? To explore is to inquire into or discuss in detail. To explore is to travel through an unfamiliar path in order to learn about it, to conquer it and that’s exactly what we intend to achieve in this article.
Sequel to the article on “A Data Analyst’s Introduction to MySQL”, we are going to start exploring the Vanni manufacturing database. If you have no experience with MySQL, we suggest you read this introduction to MySQL for data analysis.
You can tag along and perform the examples in this article by downloading the database.
SQL statements, clauses and operators
Remember in order to access the vannimanufacturing database, you either double-click on the database name or use the “USE statement.” The Database name will get boldened which indicates it has been activated.
SQL keywords are not case sensitive i.e. “select is the same as SELECT.” However, it is best practice to write SQL keywords in Block letters and separate multiple columns using a comma(,).
In addition, we use the keyboard shortcut “ctrl+shift+enter” or click the “yellow lightning icon” to run a SQl statement.
The semicolon (;) is the standard way to separate each SQL statement in database systems.
Some of the most important SQL statements
SQL statements are used to perform tasks such as update or retrieve data from a database.
- SELECT – extracts data from a database
- UPDATE – updates data in a database
- DELETE – deletes data from a database
- INSERT INTO – inserts new data into a database
- CREATE DATABASE – creates a new database
- ALTER DATABASE – modifies a database
- CREATE TABLE – creates a new table
- ALTER TABLE – modifies a table
- DROP TABLE – deletes a table
- CREATE INDEX – creates an index (search key)
- DROP INDEX – deletes an index
Some of the most important SQL clauses
Basically, we use them to apply filters for queries and thus get a filtered result. The most important types are:
- DISTINCT clause – used to retrieve unique records
- FROM clause – used to list out tables and join conditions
- WHERE clause – used to filter results
- ORDER BY- used to sort results
- GROUP BY – used to group one or more columns
- HAVING – used to restrict the records returned by the GROUP BY clause
Some of the most important SQL logical operators
An operator is a reserved word or a character used primarily in a SQL statement’s WHERE clause to perform operation(s), such as comparisons and arithmetic operations.
We listed a definitive array of “SQL comparison operators” in the previous article titled “A Data Analyst’s Introduction to MySQL.” Here is a comprehensive list of SQL logical operators.
- LIKE – used to compare values using wildcards (%,-)
- BETWEEN – used to search for values within a range
- IN – used to search for values within a specified list
- AND – allows the existence of multiple conditions
- IS NULL – used to compare values with a null value
- OR – used to combine multiple conditions
- NOT – the negate operator
- ANY – used to compare any values in a specified list
The SELECT statement in SQL
The SELECT statement is used to request or retrieve data from a database. The data returned is stored in a result table known as the result-set.
FROM Clause in SQL. This clause is used with the SELECT, UPDATE and DELETE statements to query target tables.
SELECT column1, column2, ... FROM table_name;
SELECT * Example
The following SQL statement selects all the columns from the “Customers” table because of the (*) asterisk:
SELECT DISTINCT Statement
The SELECT DISTINCT statement is used to return only unique values.
The WHERE clause is used to filter records. i.e. It is used to extract only those records that fulfill a specified condition.
The Logical Operators
These Operators are used to specify conditions in a SQL statement and to serve as conjunctions for multiple conditions in a statement.
AND, OR and NOT Operators
The AND and OR operators are used to filter records based on more than one condition. The NOT operator displays a record if the condition(s) is NOT TRUE.
The AND operator
Using the AND operator excludes all results which doesn’t fit both conditions, so only the specified country and state are shown — in the vannimanufacturing database, that gives us just one result. If we use OR in the same statement, we get some different results.
The OR operator
Using the OR operator includes all records which satisfy either conditions, so now we have all customers who are from all states which are in the USA. We can also use the NOT operator before a condition, to exclude every record which meets that condition and include all other records.
The Not operator
Using the NOT condition gives us all records that do not satisfy the first condition which is not to return the USA. The USA was included here because it meets the second condition which is the state. This is very useful when we want to exclude certain records.
The BETWEEN operator
The BETWEEN operator allows us to select the records with a value that falls within a range.
The LIKE operator
It allows us to search for values which match a specified pattern. We make use two distinct types of wildcards to in our search patterns:
- % – this wildcard stands for zero, one or multiple characters
- _ (underscore) this wildcard stands for exactly one character
Using MySQL LIKE with the percentage (%) wildcard examples
This example uses the LIKE operator to find employees whose first names start with the letter “A”:
In this example, MySQL scans the whole employees table to find employees whose first names start with the character ‘a’ and are followed by any number of characters.
This example uses the LIKE operator to find employees whose last names end with ‘on’
If you know the searched string is embedded in the middle of a string, you can use the percentage ( % ) wildcard at the beginning and at the end of the pattern.
For example, to find all employees whose last names contain ‘on’ , you use the following query with the pattern %on%
Using MySQL LIKE with underscore( _ ) wildcard examples
To find employees whose first names start with T , end with m, and contain any single character between e.g., Tom , Tim, you use the underscore (_) wildcard to construct the pattern as follows:
We have the option of combining both wildcards in a query.
For example, the following SQL statement selects all customers with a name that has the letter “R” in the second position:
Using MySQL LIKE operator with the NOT operator example
The MySQL allows you to combine the NOT operator with the LIKE operator to find a string that does not match a specific pattern.
Suppose you want to search for employees whose last names don’t start with the character B, you can use the NOT LIKE with a pattern as shown in the following query:
Let’s get wild…Understanding Wildcards
A wildcard is used to substitute one or more characters in a string. All wildcards can be used in combinations. We have familiarized ourselves with a few wildcards in the Like operator. Here is a list of a few others:
Regular Expressions (REGEXP)
Regular expressions are used to match as much as possible. It allows us to make more complex searches for strings.
Let’s perform some searches using REGEXP…
The pipe (|) which means (or)
Query the customers table for customers whose last names contain either worth or son.
The Square brackets [ ]
Query the database for customers whose last names contains “G” or “N” Or “M” or “I” before the letter E.
Square brackets and dashes [-]
Instead of listing the search characters, you can search for a range. Query the database for customers whose first names contains “D” and any of the characters in the range [a-h].
The Dollar sign $
Query the database for customers whose last name ends with either wn or ler.
These are just a few examples of how to perform more complex searches in MySQL server using Regular expressions. In another article, we will provide a more detailed write up on how to explore the profound characteristics of regular expressions.
SQL IN Operator
The IN operator allows you to specify multiple values in the WHERE clause.
The LIMIT clause
The LIMIT clause is used in the SELECT statement to restrict the number of rows/records to be returned. The LIMIT clause accepts one or two arguments. The values of both arguments must be zero or positive integers.
For example, let’s query the database for the top 5 customers with the highest credit limit:
The LIMIT clause is also used for pagination i.e. it is used to request for data in different pages of a document. This is similar to the offset function in Microsoft Excel.
For example, let’s query the database for records that are in the second page of the product table:
We have reached the climax for this article. In this tutorial, you have learnt about the most important and consistently used SQL statements, logical operators and clauses. We do hope that you found this article profoundly intuitive. Stay tuned for more Avant Garde tutorials on Exploring Data using SQL.
We appreciate our readers and learners and would love to get your feedback.
Don’t forget to like, share the article and subscribe for more exciting content.
Do you want to learn how to automate tasks with Excel?
Receive Fresh, Weekly Analytics Content
Join 1K+ subscribers to receive original, freshly curated content every week.
Related article – Visualizing WhatsApp Data using Python and Power BI
Video series – Building an Expenses Management Dashboard in Excel