Week 3 CST 363: OLTP vs. OLAP

During our third week, we have been finishing up Part 1 of our first project, which has us designing a database with data capture, or OLTP (online transaction processing), in mind. This initial design of our database relies primarily on what we learned last week about normalizing databases to reduce data redundancy. By contrast, this week’s readings have focused on OLAP (online analytical processing), which serves a different purpose than OLTP. My understanding of these differences is outlined below:

  • Purpose: OLTP supports operational record keeping; OLAP supports analytical decision making and the evaluation of performance (answering how/what/why questions).
  • Workload: OLTP is optimized to process transactions quickly, usually one transaction at a time; OLAP queries may touch hundreds of thousands of records at a time.
  • History: an OLTP database is updated to maintain the current state, with no historical records kept; in OLAP, historical records are important for providing context when evaluating performance over time.

OLTP and OLAP differ fundamentally because the goals of the end user are different. While OLTP systems typically use relational databases that are normalized (usually to third normal form) to reduce data redundancy and to process single transactions quickly, OLAP makes use of dimensional modeling to deliver data in a form that is understandable and appealing to the end user.

Kimball and Ross (2013) use a restaurant metaphor to describe one form of OLAP architecture: Kimball’s Data Warehouse/Business Intelligence (DW/BI) architecture.

Figure 2 in Ross (2004)

In the restaurant metaphor, the kitchen is the staging area of the DW/BI system. Just as kitchen staff must ensure that the ingredients they procure are high quality and are prepared in a manner that suits the diner’s palate, we who design the data staging area are concerned with cleaning the data (e.g. by fixing misspellings or formatting inconsistencies) that we extract from multiple sources. This process is handled by the Extract, Transform, and Load (ETL) system.

Just as we would not expect diners to wander into the back kitchen, we would not expect users to have access to the staging area / ETL. Diners belong in the dining room, where they are presented with prepared food. A DW/BI user’s domain is in the presentation area, where they should be presented with data in a meaningful and easy-to-understand format.


References

Kimball, R. and Ross, M. (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (3rd Ed.). Indianapolis, IN: John Wiley and Sons.

Ross, M. (2004). Differences of Opinion. Kimball Group. https://www.kimballgroup.com/2004/03/differences-of-opinion/

Week 2 CST 363

This week we learned about grouping results (summaries) and subqueries before moving on to database design.

Summaries

When creating summary queries, you use one or more aggregate functions: AVG, SUM, MIN, MAX, COUNT (or COUNT(*)). Any SELECT clause having an aggregate function CANNOT include a non-aggregate column UNLESS that column is used to group results in a GROUP BY clause. As an example, you can display a sum of invoice totals (an aggregate column) grouped by vendor name (a non-aggregate column).

Syntax is very important when crafting summary queries. The GROUP BY clause must come before the search condition in the HAVING clause and the list in the ORDER BY clause to produce the intended results of a summary query.
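A minimal sketch of such a summary query, using SQLite in place of MySQL and made-up vendors/invoices tables:

```python
import sqlite3

# In-memory demo database; the table layouts and sample rows here
# are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY,
                           vendor_id INTEGER, invoice_total REAL);
    INSERT INTO vendors VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO invoices VALUES (1, 1, 100.0), (2, 1, 250.0), (3, 2, 75.0);
""")

# Aggregate column (SUM) grouped by a non-aggregate column (vendor_name);
# HAVING filters the groups after they are formed, ORDER BY comes last.
rows = conn.execute("""
    SELECT v.vendor_name, SUM(i.invoice_total) AS total
    FROM vendors v JOIN invoices i ON v.vendor_id = i.vendor_id
    GROUP BY v.vendor_name
    HAVING SUM(i.invoice_total) > 100
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('Acme', 350.0)]
```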

Subqueries

Most JOINs can be restated as a subquery and vice versa.

Subqueries can be coded in an outer SELECT statement in a:

  • WHERE clause as a search condition
  • HAVING clause as a search condition
  • FROM clause as a table specification
  • SELECT clause as a column specification

Subqueries can return a:

  • single value (generally used in WHERE and HAVING)
  • list of values, i.e. one column (generally used with IN in a WHERE/HAVING clause, or in a SELECT clause)
  • table of values, i.e. multiple columns (generally used in FROM)
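The first two return shapes can be sketched as follows (SQLite stands in for MySQL, and the invoices table is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway table for illustration
conn.executescript("""
    CREATE TABLE invoices (invoice_id INTEGER, vendor_id INTEGER,
                           invoice_total REAL);
    INSERT INTO invoices VALUES (1, 1, 100.0), (2, 1, 250.0), (3, 2, 75.0);
""")

# Single value: the subquery in WHERE returns one aggregate result.
above_avg = conn.execute("""
    SELECT invoice_id FROM invoices
    WHERE invoice_total > (SELECT AVG(invoice_total) FROM invoices)
""").fetchall()

# List of values: the subquery feeds an IN test.
big_vendors = conn.execute("""
    SELECT DISTINCT vendor_id FROM invoices
    WHERE vendor_id IN (SELECT vendor_id FROM invoices
                        WHERE invoice_total > 200)
""").fetchall()
print(above_avg, big_vendors)
```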

JOINs versus Subqueries

JOINs:
  • The SELECT clause of a JOIN can include columns from both tables.
  • More intuitive when relationships already exist between tables (primary and foreign keys).

Subqueries:
  • Can pass an aggregate value to the main query.
  • More intuitive when forming an ad hoc relationship between tables.
  • Long/complex queries are sometimes easier to code/read.

Both:
  • both used for queries that interact with 2+ tables
  • usually they translate back and forth
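To illustrate the back-and-forth translation, here is one question answered both ways (SQLite stands in for MySQL; the two tiny tables are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # hypothetical two-table setup
conn.executescript("""
    CREATE TABLE vendors (vendor_id INTEGER, vendor_name TEXT);
    CREATE TABLE invoices (invoice_id INTEGER, vendor_id INTEGER);
    INSERT INTO vendors VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO invoices VALUES (10, 1);
""")

# Vendors that have at least one invoice, written as a JOIN...
as_join = conn.execute("""
    SELECT DISTINCT v.vendor_name
    FROM vendors v JOIN invoices i ON v.vendor_id = i.vendor_id
""").fetchall()

# ...and the same question restated as a subquery with IN.
as_subquery = conn.execute("""
    SELECT vendor_name FROM vendors
    WHERE vendor_id IN (SELECT vendor_id FROM invoices)
""").fetchall()
print(as_join == as_subquery)  # True: both return [('Acme',)]
```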

Database Design

An important aspect of database design is normalization: separating data into separate but related tables to reduce data redundancy. However, in the real world, databases are not completely normalized. Out of the seven normal forms outlined by the Murach text, only the first, second, and third normal forms are generally used in practice.

Below is the first draft of my team’s database design for a system that will be able to give users the ability to search for a teacher’s schedule (the classes they teach, where they teach them, and when they teach them).

EER Model for the Staff Directory System

CST 363 Week 1

Today marks the first official day of CST 363, a Databases course taught with MySQL and Python 3. Because I am returning to work next week and because the professor posted our weekly work in advance, I made sure to get a head start to manage my time effectively. Therefore, I have already completed all of Week 1’s readings and the assignments.

I can remember the term CGI as far back as when I started using the Internet in the mid to late 90s on AOL. I recall seeing member pages with a directory named cgi-bin. Now I actually know what CGI (Common Gateway Interface) is and what the cgi-bin directory is for. The Common Gateway Interface allows web servers to execute applications and create webpages dynamically. The cgi-bin directory is where such programs must be stored on a web server in order to be executed correctly by the web server.

This week, I was able to create a web-based form that simulated a user attempting to log in or register on a server. First, I ran a Python script (provided by the professor) that started an HTTP server on my local machine. An HTTP server allows you to access an HTML file in your browser via a GET request to that server:

127.0.0.1 - - [09/Jan/2019 18:02:09] "GET /login.html HTTP/1.1" 200 -

As previously mentioned, our assignment was to create a web-based form to allow a user to login or register. A user’s login information (userid and password) had to be checked against a MySQL database in order to determine if the credentials were valid for either registration (userid can’t already exist) or logging in (password must match for an existing userid).

The HTML <form> tag requires two attributes: action and method. The action attribute names the file or program the HTTP server should send the form data to. The method attribute specifies whether the form data should be sent with a GET or POST request. In this case, our web form sent a POST request encoding key-value pairs (userid: ‘your_user_id’ and password: ‘your_password’) to a Python 3 program that used this data to authenticate the credentials (or insert them, in the case of a valid registration) against our MySQL database.
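As a sketch of what the server side sees: a form submitted with method="post" arrives as a URL-encoded body, which Python's standard library can split into key-value pairs. The field names mirror our form; the body below is a made-up example, not real credentials.

```python
from urllib.parse import parse_qs

# URL-encoded POST body as the browser would send it (placeholder values)
body = "userid=your_user_id&password=your_password"
fields = parse_qs(body)  # maps each field name to a list of values

userid = fields["userid"][0]
password = fields["password"][0]
print(userid, password)
```

In the actual cgi-bin script, a program like login.py would read this body from standard input and then check the values against MySQL using a parameterized query.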

127.0.0.1 - - [09/Jan/2019 18:53:04] "POST /cgi-bin/login.py HTTP/1.1" 200 -

Week 8 CST 205 : FINISHED!

A huge thanks to my team, Team SCSI Logic for being so supportive these past eight weeks. We all have quite busy lives, but we pulled through and created many projects we all should feel proud of. Our final project, The Crypts of Chelon, came out really well for a game created in JES. I look forward to seeing what our team can do with more robust resources.

You can view the trailer and demo for The Crypts of Chelon below:

Week 7: Web Scraping and HTML Creation

This week’s emphasis has been on scraping websites and creating HTML with Python. During the individual lab, we had to build off of a previous lab (Lab 14), in which we created a frequency dictionary of all the unique words in a modified version of Dr. Seuss’ Green Eggs and Ham.

This time, however, instead of merely printing out the words and their counts, we had to display each word in an HTML file, altering the size and color of the text based on how frequently the word occurs in the story. My first approach was to use the frequency count as the font size. However, many words had a frequency of 1 or 2, which made them quite hard to see since their font size was being set to 1 or 2 pixels. Therefore, I decided to add 10 to the font sizes, so that the minimum font size would be 10. I was also able to use the frequency to influence the greenness of the font of each word. To do this, I had to convert frequencies (integers) to hexadecimal and ensure that each hexadecimal number was two digits, to represent the green in the RGB hex code. I wound up scaling the frequencies so that the most frequent word would get the maximum amount of greenness (255, or FF in hexadecimal).
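The scaling described above can be sketched like this (the word counts are made up, not the real Green Eggs and Ham frequencies):

```python
# Hypothetical frequency dictionary standing in for the real one.
freqs = {"ham": 1, "eggs": 2, "green": 20}
max_freq = max(freqs.values())

for word, count in freqs.items():
    size = count + 10                      # keep a readable minimum of 10px
    green = round(count / max_freq * 255)  # most frequent word gets FF
    color = "#00{:02X}00".format(green)    # two-digit hex in the G slot
    print('<span style="font-size:{}px;color:{}">{}</span>'.format(
        size, color, word))
```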


We also worked on our final project! Here’s a screen cap:

Week 6: Text Manipulation

This week emphasized Python lists and dictionaries. Interestingly enough, our team utilized both of these structures in our text adventure game last week. This turned out to be an advantage since one of this week’s assignments was to refactor our text adventure game to implement these data structures in some way. We were ahead of the game in that respect. Great job, team!

Other assignments we were tasked with included creating a word frequency analysis program to identify all unique words in a modified version of Dr. Seuss’ Green Eggs and Ham, developing a news headline scraper, and implementing a MadLibs program that would revise a short news story.

You can play with my MadLibs program below:

Week 5: CST 205: Text Adventure Game

This week, our team worked together to design and implement a text adventure game. Because we all have varying skill levels in programming and programming with Python in particular, pair programming was essential to the development process. We also heavily relied on GitHub for collaboration and version control.

It was easy to see how feature creep could quickly become problematic, as we were continuously juggling different ideas for this seemingly simple text adventure game. However, the creation process and collaboration were fun and invaluable, and I truly enjoyed this assignment.

Try our game below!

You can find the source code to our game on GitHub!

CST 205: Week 4 Audio Manipulation

This week we started learning about audio manipulation with JES. I learned that audio is recorded using different sample rates (samples per second) and with different bit depths (bits per sample). The bit depth of an audio file determines how many unique values a single sample can have. A sample’s value corresponds to amplitude (how quiet or loud it is). It is the change in a sample’s value over discrete time steps that determines a sound wave’s frequency (how high or low the pitch).
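These ideas can be illustrated with a toy example (the sample rate, bit depth, and pitch below are arbitrary, and the quantization scheme is just one simple way to do it):

```python
import math

sample_rate = 8000               # samples per second
bit_depth = 8                    # bits per sample
levels = 2 ** bit_depth          # 256 unique values a sample can have
freq_hz = 440                    # pitch comes from how fast values change

samples = []
for n in range(sample_rate // 100):                   # 10 ms of audio
    t = n / sample_rate
    amplitude = math.sin(2 * math.pi * freq_hz * t)   # -1.0 .. 1.0
    # map the amplitude onto the discrete levels the bit depth allows
    quantized = round((amplitude + 1) / 2 * (levels - 1))
    samples.append(quantized)

print(len(samples), min(samples), max(samples))
```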

Ultimately, we applied what we learned about sound manipulation to create our own unique sound collages. I was selected to create a demo video on my approach and it can be seen below:

I decided I wanted to blend two of my samples together, so that I could have background music throughout my sound collage. It was not part of the assignment, but it was fun figuring it out.

After our brief section on audio manipulation, we were tasked to create a hangman game with a partner. It was fun figuring out all the error handling to redirect a player who may have entered unexpected input. It was satisfying creating a working copy of this simple game and I am excited to work with my team on our upcoming text-based adventure game!

Last, our midterm project was due this week. We had to create two filters: a CSUMB-themed filter, which we created last week, and one of our choice. Cody (my team partner) and I decided to create an effect that would make a photo appear like a bad VHS recording. While we didn’t get to include all the features we originally had hoped for (pixel shifting and color bleeding), I’m pretty satisfied with the results:

Unaltered Paramount logo

Paramount logo with our VHS effect applied

CST 205 Week 3 Midterm Update

We’re already planning a midterm project! This week, we worked on Lab 7 with a pair programming partner to create a design for a Thanksgiving card. I had the idea to create a scene with face cut-outs. After utilizing a similar algorithm to our chromakey procedure to get the photos drawn in the right place, we added functions that would scale the face photos up or down to fit the face cut-outs. After slapping on some text to wish our audience a happy Thanksgiving, it was done!

Thanksgiving card

I had a lot of fun with this project because we got to utilize everything we have learned about image manipulation so far. I also spent a fair amount of time refactoring the code to make it more readable and generalizable.

We are doing peer and self evaluations right now as well. Peer evaluations will involve code checks, where we go through and analyze one another’s code in terms of naming conventions (functions and variables are well named), functionality (everything compiles and executes in the expected manner), and use of abstraction (breaking down complex problems into smaller, easier-to-understand problems). It should be a great learning experience.

I am excited about the midterm project because we get to create our own filters. I have already completed the CSUMB filter. I found out how to programmatically create a gradient of two colors and to blend the colors of two source photos together (you take the average of reds, greens, blues, and use those averages to create a new color). I also added on a pixelated version of CSUMB’s mascot, an otter!
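The color-blending step can be sketched in a few lines (the two colors below are arbitrary stand-ins, not CSUMB's actual school colors):

```python
# Blend two RGB colors by averaging each channel, as described above.
def blend(color_a, color_b):
    return tuple((a + b) // 2 for a, b in zip(color_a, color_b))

navy = (0, 51, 102)       # placeholder school colors
gold = (255, 215, 0)
print(blend(navy, gold))  # (127, 133, 51)
```

Stepping this blend gradually from 100% of one color to 100% of the other, pixel column by pixel column, is one way to produce the gradient.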

Gradient of CSUMB’s school colors

A photo of me CSUMB-ified

Week 2 of CST 205: Manipulating Pixels

Lab Reflection

This week we used loops and control statements to manipulate images.

Lab 4

In this independent lab, we created a variety of mirroring functions by iterating through the pixels of part of an image and copying the colors of those pixels to another part of an image. Here are some examples:

Original image of Mui, my whitefaced cockatiel.

Vertical mirror, left to right

Vertical mirror, right to left

Horizontal mirror, top to bottom

Horizontal mirror, bottom to top

Quadruple mirroring, using right to left and top to bottom mirroring
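The indexing behind a left-to-right vertical mirror can be sketched on a toy "image" (a list of rows of pixel values, standing in for JES pixels):

```python
# 2x4 toy image; each number stands in for a pixel's color.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
width = len(image[0])

for row in image:
    for x in range(width // 2):
        # copy each left-half pixel onto its mirrored position on the right
        row[width - 1 - x] = row[x]

print(image)  # [[1, 2, 2, 1], [5, 6, 6, 5]]
```

The other mirrors work the same way, swapping which half is the source and whether the copy runs over columns or rows.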

Lab 5

This was another independent lab. In this lab, we learned how to create a blank Picture object and to copy pixels from other pictures onto it. This enabled us to copy multiple pictures onto a large empty canvas, creating a collage. My approach was to iterate through a list of pictures and to apply random effects to each picture before it got copied onto the blank canvas. I also added my name to the bottom using the addTextWithStyle JES function. Here is one of the results:

LAB 6

In this pair programming lab, my teammates and I developed functions to manipulate pictures based on the color values of pixels. In the warmup problem, we had to implement red eye correction. Essentially, we had to detect pixels that were within some range of red (using a distance function) and to change those pixels’ color values to something that looked more natural (we chose black). Below is the before and after:

Photo before redeye correction

Photo after redeye correction
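The "within some range of red" test uses a Euclidean color distance. A minimal sketch, where the reference red and threshold are illustrative values rather than the ones we actually tuned:

```python
import math

# Euclidean distance between two (r, g, b) colors.
def distance(c1, c2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

RED = (200, 30, 30)   # illustrative "red eye" reference color
THRESHOLD = 80        # illustrative cutoff

def correct(pixel):
    # recolor near-red pixels black, leave everything else alone
    return (0, 0, 0) if distance(pixel, RED) < THRESHOLD else pixel

print(correct((210, 40, 25)), correct((10, 120, 200)))
```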


We then developed a function that created a sepia effect. This was done by first converting the photo to grayscale (where the red, green, and blue values are equal or nearly equal). Then, based on each pixel’s red amplitude, we modified the amount of red and blue. This created the sepia effect:

Original photo of salmon roe rice bowl

Sepia effect
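The sepia recipe can be sketched per pixel like this; the band boundaries and multipliers below are illustrative stand-ins, not the exact values we used:

```python
# Sepia sketch: gray the pixel first, then tint red up and blue down
# by an amount that depends on the gray (red) level.
def sepia(pixel):
    r, g, b = pixel
    gray = (r + g + b) // 3                  # simple grayscale average
    if gray < 63:                            # shadows
        r, b = round(gray * 1.1), round(gray * 0.9)
    elif gray < 192:                         # midtones
        r, b = round(gray * 1.15), round(gray * 0.85)
    else:                                    # highlights, capped at 255
        r, b = min(round(gray * 1.08), 255), round(gray * 0.93)
    return (r, gray, b)

print(sepia((120, 100, 80)))  # (115, 100, 85)
```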


We also created a function with a fun posterizing, cartoony effect. This was a lot like the sepia function, in that the amount by which we changed the color of a pixel depended on its original color value. However, in this case, we modified each color channel (red, green, and blue) based on the previous values of red, green, and blue. Here is the effect:

Original photo of bike

“Artified” version of bike

Last, we developed a chroma key function to add a background onto a green screen photo. Essentially, we had to detect the degree to which each pixel’s color matched the green color we expect the green screen to be. If it matched within a reasonable range, we copied the colors of the background photo (the volcano picture in the example) onto the green pixels of the green screen photo (the photo of Shia LaBeouf in this example):

Shia LaBeouf in front of a green screen

Volcano

Shia defeating the volcano
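The chroma key idea can be sketched on toy pixel lists; the green reference and threshold here are illustrative values, not our tuned ones:

```python
import math

# Euclidean distance between two (r, g, b) colors.
def distance(c1, c2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

GREEN_SCREEN = (0, 255, 0)  # illustrative reference green
THRESHOLD = 120             # illustrative "close enough" cutoff

def chromakey(foreground, background):
    # wherever a foreground pixel is close enough to screen green,
    # show the background pixel instead
    return [bg if distance(fg, GREEN_SCREEN) < THRESHOLD else fg
            for fg, bg in zip(foreground, background)]

fg = [(10, 250, 15), (180, 60, 50)]   # one green pixel, one skin tone
bg = [(90, 90, 200), (90, 90, 200)]   # "volcano" stand-in pixels
print(chromakey(fg, bg))
```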