Monday, April 23, 2007

Wrapping Up

This will be my final post for ISys 590R. I wanted to share some of my thoughts and some suggestions for future web analytics classes. I would have liked to see more Search Engine Optimization(SEO) in the class. I think it would have been very beneficial for all of the students to start their own website at the very beginning of class. Using WordTracker they could find keyword phrases that people are searching for and how many sites are competing for those phrases. The search engines usually index you within about 4-8 weeks. The students could then use google analytics to track the traffic on their website and try out different techniques to drive traffic to different landing pages. The students could also use adwords to drive traffic to their website. Whoever has the most traffic to their website by the end of the semester receives an automatic A! Well, maybe not that last part. I think these suggestions would make the class better as a whole. I've learned a lot this semester. Thanks Dr. Liddle and Dr. Rogers!

Prepare to be amazed!

Actually for most of my family, you probably want to tune out of this post. This is a summary post for my ISys 532 class, Information Architecture. So unless web services and model-driven architecture is on your casual reading list, check back later and I’m sure Megan will have posted some extremely witty story. For those who dare, read on!

1. Understand enterprise information architecture so you can articulate a rational position on when and why various supporting methodologies, tools, and protocols are helpful or not helpful in a particular business environment. Major information architecture topics you should know well include the following:

This class has helped me gain the knowledge to move forward and gain a better understanding of what can be done and what is out there for enterprise systems. I liked how we had a general overview of many aspects of information architecture and hands-on assignments in many of them. Actually doing the work helps internalize the processes that we talked about in class.

a. Enterprise application integration

Enterprise systems are getting big, complex and contain terabytes of information. How do we get all that information under control? One simple phrase: Enterprise application integration. What may sound easy isn’t so easy in practice. Enterprise application integration is the concept that all systems can be interwoven through one single interface, to realize competitive advantages over other companies. Most of the time this is a very slow process, as large companies can have thousands of different programs, interfaces and functions (believe me, I worked at Wal-Mart last summer). Most major companies are working towards this goal or have already achieved this goal.

b. Service oriented architecture

I believe SOA is the architecture that will really take off in the years to come and it is already showing up in many organizations. This last week I had a BYU project manager interview for another class. One of his current projects was to implement SOA architecture here at BYU. If BYU is implementing SOA, then we know it has gone mainstream. BYU doesn’t adopt something unless they know it’s a best practice. SOA is a great idea that allows different functions of a system to be broken down into individual services, and these services are independent of one another. These services can then be mashed to together to perform the bigger overall functions. Different areas of the business can call these functions, pass in certain parameters specified by the service and receive a standard answer. SOA cuts down on redundant code and is platform independent.

c. Model-driven architecture and executable models

Kent and I did our paper on Model Driven Architecture. Model-driven architecture is the ability to create working programs by only completing the higher level data modeling. MDA provides great advantages in theory. Once you finish the data model, you have the ability to export code to any language that your tool is compatible with. After talking about MDA with numerous people, I found that MDA, at least among the younger generation, has a less than worthy reputation. Our generation doesn’t believe in MDA. We believe in the concept and think MDA is a good idea, but we don’t believe there is any tool out there that can complete the job from start to finish. I think we’re looking for a service that can produce the whole working system straight from our model, but the MDA programs aren’t currently capable of that. I also learned a lot about the abstraction of programming languages. Every generation of programming languages has gotten easier to understand by human minds. I believe MDA is where the world will move in the future but as of right now, there’s nothing capable of accomplishing the fully functional MDA services from start to finish.

d. Web services

Although there were many grumblings about the difficulty of the web services assignment, I was confident that I could tackle the web services assignment and succeed. Needless to say, the assignment took a lot longer than I anticipated. I wanted to write my web service in PHP, since that was the language I was using at work and was most familiar with and it seemed a lot easier than doing one in Java. There were several good tutorials on how to use PHP to write a web service. Having a PHP web service library abstracted a lot of the more confusing aspects of the web service. Creating a web service to handle one variable was very easy. Throwing in multiple variables of different types was a little trickier. By the end of the assignment, it was rewarding to refresh the page and know that you got this weather information from another web service somewhere out in the Internet. The concept of web services is fascinating. The notion of people creating hundreds of services out there for us to access, and the ability to access it in a like fashion, is where the internet is going. Google allows us to use their GoogleMaps API to access web services in their maps area. Programming a page to use a web service can take as little as five minutes, as showcased in www.ldstemplemaps.com.

2. Understand the key role of database management systems in enterprise information architecture and demonstrate the major elements of DBMS optimization, including DBMS server tuning and SQL query optimization.

The db optimization assignment really had me scared, but Swavek and I sat down and diligently started to chip away at the requirements. It is surprising how something as simple as an index can dramatically affect your query efficiency. Time after time we would test the query before and after and see incredible results. Well, savings of a few tenths of a second, but when you have millions of queries in an enterprise system, tenths of a second makes a huge difference. Afterwards, I reflected on what I had learned and how the assignment could help me at my current work. Right now at my work we’ve switched most of our tables to InnoDB, but we run into problems with tables that are used mostly for reading. InnoDB should be used for tables performing mostly updates and MyISAM should be used for tables performing mostly reads, I heard myself saying to my co-workers. Also, wherever you can make joins in the WHERE clause and not in the FROM statement. The WHERE is performed first and will limit the amount of rows you will actually join on. And lastly, after finishing the whole DB assignment, I admit it was actually a little bit of fun tuning the database.

3. Understand the importance of event logging and system monitoring within an enterprise-class system, and demonstrate how to put a logging/monitoring system in place.

When someone mentions logging and programming together in a sentence, a yawn is usually the most appropriate response. Dr. Liddle taught us otherwise. Logging is a very important function of any enterprise architecture. With million of things going on in your system, diagnosing a problem would be very difficult without a standardized way of dealing with errors. At my current workplace, on programmer in particular likes to put the error 2319 at many places in his code. If you’ve seen Monsters Inc., you’ll remember that number as the code for human contamination. Needless to say, that error doesn’t do a whole lot of good in explaining anything about the particular error. You might as well put nothing down! This is where logging would come in handy.

4. Enhance your teamwork and communication skills by delivering a solid group paper and an in-class presentation on an ISys 532 topic.

Kent and I had great expectations to jump on this paper from the get go and be finished with two weeks to spare. Needless to say, as always, other things popped up and we spent the last few days working vigorously to produce the paper and presentation. Overall, it was a good experience to delve more deeply into an information architecture topic. I enjoyed all of the presentations or snapshots of what is out there. Having presentations on lots of topics helped me choose which topics interest me and from there I can research further in those topics.

Conclusion

This class was great in that it exposed me to a lot of the different architectures to be found in any typical enterprise system. What I’ve realized is that there is a lot of information out there and I can’t begin to be an expert in everything or even a novice. But as a future manager and hopefully CIO, I’m expected to have a general surface knowledge of technologies that can and will help my company achieve better results. This class has taught me how to research these technologies and gain a better understanding of them through trial and error, reading articles and peer presentations.

Great class Dr. Liddle, keep up the good work!


Thursday, April 19, 2007

For the Love of the Game





Today was a reading day before finals so naturally we didn't study! Matt had a softball game tonight and let me tell you, he is one good ballplayer! I (the dedicated wife) braved the freezing weather-yes it's freezing and it's April!- and went out to cheer him on. Here are the stats: he was 4-4 with three runs scored, batting leadoff. What a champ! He made some great hits, catches and slides into home base...and sacrificed his body doing it, literally. The ball bounced off another guys glove and hit Matt in the "you know where" AND he sliced his leg upon sliding into home. His leg was dripping blood after it happened but these pictures don't really do it justice. So of coarse he was limping a little after the game. Despite all the sacrifice they lost 15 -20. This is a huge improvement for the team considering last weeks score was 16-0. GOOOO Circlepix!

Tuesday, April 17, 2007

DB Optimization Assignment

MRP Database

Complex and Simple Query Division

Below is our division of simple and complex queries. To make things more readable, we have included only the complex queries in this paper (Appendix A). The simple queries are all other queries not included in Appendix A.

In our division, we have mostly looked at typecasting, multiple joins, and nested queries. Queries which included any of these three elements were classified as complex.

Indexes

In general, indexes should be added to optimize the queries in two cases. First, indexes should be added whenever fields are used in the WHERE clause, especially to compute JOINs. Second, indexes should be added when fields are computed outside of the WHERE clause. An example would be SELECT Max(X) where the maximum function works in the select clause. The index on field X should speed things up. We assume that primary keys are indexed automatically, as this happened during our importing of the data into MySQL.

In accordance with the guidelines, the following indices should be added:

  • Activity.campID
  • Double index (order and reverse order) on adjacentcampground.groundID and adjacentcampground.adjacentTo (since this is a linking or associate table)
  • Agentstake.stakeID (this does not seem to be a part of composite primary key)
  • Agentstake.campID
  • Amenity.groundID
  • Announcement.campID
  • Double index (order and reverse order) on assignedto.stakeID and assignedto.campID (since this is a linking or associate table)
  • Campground.campID
  • Camppicture.campID
  • Camprule.catID
  • Camprulecategory.catID
  • Campsite.groundID
  • Made codes.code the PK
  • Direction.campID
  • Facility.campID
  • Made glossary.term the PK
  • Reservation.stake
  • Reservation.ward
  • Reservation.campID
  • Reservationdate.reservedate
  • Reservationdate.reserveID
  • Double index (order and reverse order) on reservationforcampsite.siteID and reservationforcampsite.reserveID (since this is a linking or associate table)
  • Ruletable.catID
  • Unavailabledate.campID
  • Made usertable.userID the PK
  • Double index (order and reverse order) on usercamp.campID and usercamp.userID (since this is a linking or associate table)
  • Made ward.wardID the PK
  • Ward.stakeID

After careful consideration, we selected the following complex queries to optimize:

Query Optimization 1

Before

SELECT announceID, announceTitle, announceText,

CAST(MONTH(annBeginDate) AS varchar) + '/' +

CAST(DAY(annBeginDate) AS varchar) + '/' +

CAST(YEAR(annBeginDate) AS varchar) annBeginDate,

CAST(MONTH(annEndDate) AS varchar) + '/' +

CAST(DAY(annEndDate) AS varchar) + '/' +

CAST(YEAR(annEndDate) AS varchar) annEndDate

FROM Announcement

WHERE campID='cid'

After

SELECT announceID, announceTitle, announceText, annBeginDate, annEndDate

FROM announcement

WHERE camped = 'cid'


This query suffers from several unnecessary complexities. First of all, casting the different parts of a date as a string does not make sense in this case, as it's reassembled in the exact format of the datetime data type. Thus, all the CASTing can be removed to improve query speed. Additionally, an index on Announcement.campID should be added.

This cuts the number of rows returned from several (e.g. 21 for campID=11) to 1. With the casting, we couldn't get the query to run at all despite an hour of research. After the casting was removed, the query took 0.0224 seconds uncached, 0.0003 cached. After the index was added, the query took .0011 seconds uncached and .0003 uncached. Since this table does not change a lot, this optimization actually does not help the database performance tremendously, because after the query is cached, the time to execute it is the same. The optimization does, however, show the principles of proper query design.

Query Optimization 2

Before

SELECT r.reservid, r.reservedBy, r.stake, r.ward, r.groupType,

CAST(r.startDate AS DateTime) AS arrival,

CAST(r.endDate AS DateTime) AS departure,

(SELECT stakeName FROM Stake s

WHERE s.stakeID=r.stake) AS stakeName,

(SELECT wardName FROM Ward w

WHERE CAST(w.wardID AS NVARCHAR)=r.ward) AS wardName

FROM Reservation r

INNER JOIN ReservationForCampsite rfc ON (r.reservid=rfc.reservid)

WHERE ((CAST(r.startDate AS DateTime) >= 'firstDate' AND

CAST(r.startDate AS DateTime) <= 'lastDate') OR

(CAST(r.endDate AS DateTime) >= 'firstDate' AND

CAST(r.endDate AS DateTime) <= 'lastDate')) AND r.campID=cid AND rfc.siteID=sid

ORDER BY departure

After

SELECT r.reservid, r.reservedBy, r.stake, r.ward, r.groupType, r.startDate AS arrival, r.endDate AS departure,

(SELECT stakeName FROM Stake s WHERE s.stakeID = r.stake ) AS stakeName,

(SELECT wardName FROM Ward w WHERE w.wardID = r.ward) AS wardName

FROM Reservation r, ReservationForCampsite rfc

WHERE r.reservid = rfc.reservid AND

((r.startDate >= 'firstDate' AND r.startDate <= 'lastDate') OR (r.endDate >= 'firstDate' AND r.endDate <= 'lastDate'))

AND r.campID=cid AND rfc.siteID=sid

ORDER BY departure


To begin the optimization, we took out the casting and changed all the varchars to datetimes, which is what we think they should be in the first place. Again, we could not get the original query to execute, so we don't know exactly how much time was gained through this change. Having seen optimization in practice before, however, we feel confident this change speeds up the query. We then changed the ward field by first replacing "all" with null and then changing the field type to int to remove the need to cast w.wardid as a varchar (now null became a special value to mean "all").

We indexed reservation.ward, changed reservation.stake to int(11) from varchar, then indexed reservation.stake as well. In the reservationforcampsite, an association (linking) table, we created two indexes for the two fields, rfc.siteID and rfc.reserveID. The indexes contained both fields, each index in different order.

When the query was first run, before the indexing, it took .0392 seconds uncached and .0004 seconds cached. After the indexing the query took .0030 seconds uncached and .0004 seconds cached.

Query Optimization 3

Before

SELECT s.siteID, s.siteName, s.siteCapacity, g.groundName, r.reservID

FROM Campsite s

INNER JOIN Campground g ON (s.groundID=g.groundID)

INNER JOIN Camp c ON (g.campID=c.campID)

LEFT JOIN ReservationForCampsite r ON (r.reservID=" + editMode AND r.siteID=s.siteID)

WHERE c.campID=cid AND c.campID=g.campID AND g.groundID=s.groundID

ORDER BY g.groundName, s.siteName

After

SELECT s.siteID, s.siteName, s.siteCapacity, g.groundName, r.reservID

FROM Campsite s, Campground g, Camp c, ReservationForCampsite r

WHERE c.campID=cid AND c.campID=g.campID AND g.groundID=s.groundID AND r.siteID = s.siteID

ORDER BY g.groundName, s.siteName


Before we started on the optimization, we decided to remove the mysterious 'r.reservID=" + editMode AND' from the FROM clause. This would actually cause the query to break, as the lone double quote would completely throw it off. So, part of the optimizing sometimes involves getting things to work. After this small change, we ran the query, which completed in .0473 seconds uncached and .0003 cached.

The next step involved removing the inner joins from the FROM clause, as the data is joined in the WHERE clause anyway. This improvement alone brought the query execution time to .0024 seconds uncached, and .0003 cached.

Next, we created indexes on campground.campID, campsite.groundID. There were already indexes in the reservationforcampsite table, so we didn't need to create them. All these changes brought down the query execution time to .0016 uncached, and .0003 cached.

Database Schema Changes

The main changes to the schema we recommend are data type changes. The data types in the beginning database are often not ideal for the data they store. In addition to the datatype changes already made for the query optimization, the following datatype changes should be made:

  • Reservationdate.reservdate from varchar to date
  • Unavailabledate.unavailDate from varchar to date
  • Reservation.startDate and endDate from varchar to date

Additionally, the following changes can be made to improve the schema:

  • The tables priority schedule and nextpriorityschedule should be combined, because the data would not lose any integrity, and logically nextpriorityschedule is simply a continuation of priorityschedule. Basically, currently we have horizontal fragmentation of a table with 12 rows, and that's just cruel.

Changes vs. Improvements

Whether the changes to the database are actually beneficial needs to be carefully measured. Specifically, changes made need to increase the efficiency of the database as measured by the time queries take to complete. If a change to a table increases the speed of the queries operating on that table, it is a good indication that the changes are really improvements.

We have sampled the complex queries optimized above and they all yielded improvements. Even though we have not measured every single index and datatype change, we have no reason to suspect they would yield a different result.

Scriptures Database

Index Changes

Given the database schema and the queries, there are several unnecessary indexes. Indexes are meant to speed up queries, and they shouldn't exist if they don't serve that purpose, because indexes also slow down transactions during updates. This is the underlying motivation for the JODDB index changes.

The general recommendation is to remove all indexes on fields which are not in the WHERE clauses or computation functions of queries. For example, the 'speaker' table contains three indexes which are not tied to any WHERE clauses or computation functions of queries. Therefore, each of these indexes could be removed. Additionally, the "LastName_2" is an index of two columns, which can also be used as an index for just one of the columns, making the "LastName" index unnecessary. Without listing every single index, suffice it to say that these two general tests ((1)indexes used in WHERE clauses and (2) index redundancy) should be implemented across the whole database.

Recommendations

The recommendations above to delete unnecessary indexes would actually increase database performance for updates as well as decrease database size, both of which are part of optimization.

Tuesday, April 10, 2007

Google is at it again!

Recently, I heard about two new Google products that I believe will have a big effect on website marketing.

First, we'll start with the google web optimizer. This program lets you create two or more similar landing pages, and then tell Google a percentage of your traffic you want to send to each page. All of this traffic would be coming from your pay-per-clicks. Google then runs these tests until the results become statistically significant and displays all the information in simple reports to tell you the effectiveness of each landing page. All you need to do is create the similar pages and embed a little bit of javascript code on the page. I haven't tried it yet, but knowing Google, I'm sure it is very easy. This would be very easy to implement on Choiceskills.com, even faster than Site Catalyst.

Second, I recently received an email from Google, giving me $400 towards my Google radio ads account. How cool would it be to do a radio spot for portableheavydutyaircleaners.com? Something that would have never been in reach for small, small businesses, is now available to all.

Go Google!

Friday, April 6, 2007

TGIF

This week has been a rough week. First I discovered that I have been blessed with allergies. So I have not had an uninterrupted nights sleep for 6 days! (Humor me here moms, I enjoy my 8 hours of sleep EVERY night. This may be one of the reasons I don't have children yet:) On top of that, my merciless Chemistry teacher scheduled us to have two tests in one week, UGH! So as I go through numerous boxes of tissue, and snarf down Sudafed cocktailed with Claritin, I'm expected to calculate the molecular freezing pt. of cyclohexanol and do a bunch of other useless calculations. Needless to say I survived the week and didn't do too bad on my tests. Now I am ready to relax and enjoy the weekend! WAHOO!

Thursday, April 5, 2007

Character Education Landing Page

Here is the ChoiceSkills.com character education landing page.

Character Education Landing Page

As you can see, it's still a work in progress, but most of the information is there. I put it in all the meta tags and all the links in the main body are H1 tags, changed with CSS, to look like normal links. The links at the bottom are all good keywords that can be linked to various pages in the website.

Tuesday, April 3, 2007

Log4J vs. JMS

To start off I'll give a brief description of each topic.

Log4J - is a java utility that allows a developer to turn logging off and on at runtime. Logging statements can be a resource consuming task and could take up thousands of lines of code. Logging can also slow down performance because of all the extra code. Log4J seeks to minimize the resource usage. Logging can be a very important tool, not only for error messaging but also for debugging purposes. Log4J uses inheritance to help control the message levels that are outputted.

JMS - This topic is a little more complicated than Log4J. As far as I can tell JMS is way for your java program to pass certain types of messages from one server to another.
  • Publish-Subscribe - When a single message needs to be received by multiple applications. (Seems like this could also be accomplished through web services)
  • Point-to-Point - When one application is sending a direct message to another application
  • Request-Reply - When an application sends a message and is waiting for a message in return
For more information on JMS, http://my.execpc.com/~gopalan/jms/jms.html.

JMS seems like more of a way to pass information from one application to another, much like web services. Log4J is used in one application for logging and debugging purposes. There are a lot of different versions of Log4J, including all of the major programming languages. On the other hand JMS is only for use with Java. I'm not exactly sure how Log4J and JMS will fit together. I'll have to wait for class tomorrow.

Choice Skills Banner


We've recently been working on improving a website in my web analytics class. We each made up mock banners that could go up on www.choiceskills.com. Here is my entry, though it wasn't as good as some of the other entries. This site is selling education materials for teaching people through character-based learning. If you know any teachers, pass them along to this website.