New Wednesday, new Just KNIME It! challenge!
This week we’ll take you back to high school to tackle a challenge on the cost of college education. What are the most important factors affecting cost in this scenario?
Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with tag JKISeason2-16.
Need help with tags? To add tag JKISeason2-16 to your workflow, go to the description panel on the right in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. Let us know if you have any problems!
Hello everyone, this is my solution.
- Using tuition fees to measure the quality of education (and the reputation of the school …), I chose the 10 states with the most expensive tuition fees.
- I ranked the above results by the increase in tuition fees to measure returns. The value in 2013 represents historical input costs, the value in 2020 represents potential value, and the difference between the two represents potential returns.
Since there are no specific universities in the data, I refer to the state name as the university cluster center. The state is a virtual university.
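For anyone who wants to try the same two-step ranking outside KNIME, here is a minimal Python sketch. It is not the workflow itself: the rows are made-up placeholder values, the function name `rank_by_increase` is hypothetical, and I am assuming the "most expensive" states are picked by their value in the end year.

```python
# Hypothetical sample rows: (state, year, average tuition in USD). Values are made up.
rows = [
    ("State A", 2013, 13000), ("State A", 2020, 17000),
    ("State B", 2013, 8000), ("State B", 2020, 9500),
    ("State C", 2013, 9000), ("State C", 2020, 11500),
    ("State D", 2013, 7000), ("State D", 2020, 8000),
]

def rank_by_increase(rows, start_year, end_year, top_n):
    """Pick the top_n most expensive states in end_year, then rank them by increase."""
    tuition = {(state, year): value for state, year, value in rows}
    states = {state for state, _, _ in rows}
    # Keep only states that have a value for both years.
    complete = [s for s in states
                if (s, start_year) in tuition and (s, end_year) in tuition]
    # Step 1: the most expensive states in the end year (an assumption, see above).
    most_expensive = sorted(complete, key=lambda s: tuition[(s, end_year)],
                            reverse=True)[:top_n]
    # Step 2: rank those states by the difference between end and start year.
    increases = [(s, tuition[(s, end_year)] - tuition[(s, start_year)])
                 for s in most_expensive]
    return sorted(increases, key=lambda pair: pair[1], reverse=True)

ranking = rank_by_increase(rows, 2013, 2020, top_n=3)
```

With the real data, `top_n=10` would correspond to the 10 states mentioned above.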
This is my solution.
The tuition average for Private is the highest, with a large difference in amounts.
Public In-State tuition is inexpensive.
Room/Board fees are not significantly different among Private, Public In-State, and Public Out-of-State.
Fees have been increasing year by year, especially the rate of increase in private tuition.
Vermont fees are the highest, and the rate of increase is greatest in Iowa.
Fees tend to be higher in coastal areas and lower in inland areas.
Here’s my solution. Read description for various features.
Hello everyone, this is my supplementary submission, since Ver1 did not answer the question of feature importance.
1. I used the Global Feature Importance component to observe how features affect prices. Since this component seems to support only classification, I discretized the target variable.
2. The two methods in the figure show that the most important feature is “state”, the least important is “year”, and the other features are relatively important.
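To illustrate what discretizing the target means here, the following is a minimal Python sketch of equal-frequency binning. This is an assumption on my side: the binning method and the class labels used in the workflow are not stated above, and `discretize` is a hypothetical helper with made-up cost values.

```python
def discretize(values, labels=("low", "medium", "high")):
    """Assign each value to an equal-frequency bin, returned in input order."""
    n_bins = len(labels)
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    binned = [None] * len(values)
    for rank, index in enumerate(order):
        # Map ranks 0..n-1 onto bin numbers 0..n_bins-1.
        binned[index] = labels[rank * n_bins // len(values)]
    return binned

# Made-up total costs; the resulting classes become the classification target.
costs = [12000, 30000, 8000, 45000, 20000, 15000]
cost_classes = discretize(costs)
```

The numeric price column is replaced by a class label, which is what allows a classification-only importance method to be applied.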
Here is version 1. I had a really great time building this tonight, but it is clearly an IKEA version because in the end it doesn’t really address the challenge as described. Nevertheless, I thought it would be fun to show this first version.
Here are the highlights:
1. The Data App. Data is visualized with a bar graph categorized by state. Single selection widgets allow you to slice and dice the data by filtering different columns. Another version could have used multiple selections, but the way this App is intended to be used, I thought single selection made more sense. Using this approach, it could be seen that “type” and “state” were major drivers of fees and tuition cost.
3. The JS Code. I “wrote” the code for the plotly view using KNIME nodes. I am sure many of you know how to code this in your preferred language, but frankly I find using KNIME nodes to construct code to be super fun. You can create the code for each bar (categorized by state) very easily with the Java Snippet (Simple) node:
and then concatenate all entries into the final data variable the plotly graph needs. Doing things this way means only six lines of code are needed in the plotly view:
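For readers who want to try the trace-building idea outside KNIME, here is a minimal Python sketch. It is not the Java Snippet code from the workflow: the numbers are made up, `bar_trace` is a hypothetical helper, and the variable name `data` is only assumed to match what the plotly view expects.

```python
import json

# Hypothetical aggregated cost per state (made-up numbers).
cost_by_state = {"State A": 16000, "State B": 9000, "State C": 8500}

def bar_trace(state, value):
    """Return one plotly bar trace as a JSON object literal."""
    return json.dumps({"x": [state], "y": [value], "name": state, "type": "bar"})

# One trace string per state, analogous to one table row per state.
traces = [bar_trace(state, value) for state, value in cost_by_state.items()]

# Concatenate all entries into the array that the plot call receives.
data_js = "var data = [" + ", ".join(traces) + "];"
```

Because the heavy lifting happens before the view, the JavaScript side only needs to declare `data` and pass it to the plot call.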
… but i did not address the challenge correctly so i will come back with a second attempt this weekend…
hope you like it!
Here’s my solution.
Grouped by state, type and length, the averages of tuition fee, room/board and total cost were calculated for the entire period. The Z-scores were also calculated and represented in a scatter plot and bar chart. Positive Z scores are above average, negative, below average. Thanks.
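As a small illustration of the Z-score step, here is a Python sketch using only the standard library. The group averages are made up, `z_scores` is a hypothetical helper, and I am assuming the population standard deviation; the workflow may use the sample version instead.

```python
from statistics import mean, pstdev

# Hypothetical group averages (made up): (state, type, length) -> mean total cost.
group_means = {
    ("State A", "Private", "4-year"): 30000,
    ("State A", "Public In-State", "4-year"): 20000,
    ("State A", "Public In-State", "2-year"): 10000,
}

def z_scores(values_by_group):
    """Return (value - mean) / standard deviation for every group."""
    mu = mean(values_by_group.values())
    sigma = pstdev(values_by_group.values())  # population std dev (assumption)
    if sigma == 0:
        # All groups are identical, so nothing is above or below average.
        return {group: 0.0 for group in values_by_group}
    return {group: (value - mu) / sigma for group, value in values_by_group.items()}

scores = z_scores(group_means)
```

A group whose average equals the overall mean gets a score of exactly 0, and the sign tells you on which side of the average a group sits.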
Here is my solution. I tried to represent both aggregated prices split into categories and tuition prices by type and state over the years. In my opinion it is not possible to estimate what we might call return on investment (ROI), since there is no additional information about the quality of education (or something similar) in the data set. Nor is it possible to understand the factors that influence the costs, because of the lack of data. And I strongly believe that "the more expensive, the better" is definitely not the case for education.
This way this dashboard might only be helpful to compare the prices between states and education type.
Just one random note I found: for some reason prices for 2-year public out-of-state in Colorado were higher than prices for 4-year in-state college. Perhaps there was higher demand for this type; it is hard to tell without further knowledge:
My take on challenge 16 … Not much cost modelling was done in this attempt, as users mostly want a selection and the query is broad and open-ended.
Aesthetics are really appealing.
@AnilKS Thank you very much!!
I took another attempt at this one and tried to build more of a comparison feature into the Data App… along the way I again got lost and don't think I answered the challenge correctly. Nonetheless, I want to share what I learned in the event it could be helpful to anyone else here.
1. The Data App. Similar look and feel, except all expenses are plotted, including the calculated total, on a time course. A multiple selection widget is used to select multiple states for comparison. I think this was a great way for people to debate different regions for school.
2. The JS Code. Again, the code is constructed in KNIME. I looped over each state selected in the multiple selection widget to construct the 6 arrays per state. Then the code is built as before using the Java Snippet (Simple) node. This time I directly coded the arrays for the x and y data (compared to solution 1, where I read the data into the generic JavaScript node using JS and the data input port).
4. Error Handling. It's not a big deal, but for the moments where no state is selected and the re-execution is triggered, the original version of the data app throws an error that would be annoying on KNIME Server. I handle simple cases like this, which generate an empty table, with an Empty Table Switch node. A message about having no content to display is given, and the error is avoided.
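The per-state arrays and the empty-selection guard can be sketched together in Python. This is only an analogy to the loop and the Empty Table Switch node, not the workflow's code: `build_series`, the row layout, and the message text are all hypothetical, and only one of the six arrays per state (the total) is shown.

```python
def build_series(rows, selected_states):
    """Return ({state: (years, totals)}, message); message is None when there is data."""
    filtered = [r for r in rows if r["state"] in selected_states]
    if not filtered:
        # Same role as the Empty Table Switch: skip plotting, show a message instead.
        return {}, "No content to display: please select at least one state."
    series = {}
    for r in sorted(filtered, key=lambda r: r["year"]):
        years, totals = series.setdefault(r["state"], ([], []))
        years.append(r["year"])
        totals.append(r["total"])
    return series, None

# Made-up rows for illustration.
rows = [
    {"state": "State A", "year": 2020, "total": 30000},
    {"state": "State A", "year": 2019, "total": 29000},
    {"state": "State B", "year": 2020, "total": 20000},
]
series, message = build_series(rows, ["State A"])
empty_series, empty_message = build_series(rows, [])
```

Returning a message instead of raising keeps the app responsive when the selection is cleared and re-execution is triggered.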
Hope you find it helpful.
Here is my solution this week:
I decided to start by filtering the data and keeping only the most recent year as this is most relevant to know the possible cost for this year. I also decided to calculate the Expense Values per year, so that a mean value could be calculated using both the 2-year and 4-year data.
In the interactive component, the user can select the Type of University using the -Single Selection Widget- node:
This filters the data, which is then displayed for both the Fees/Tuition and the Room/Board using separate -Choropleth Map- components:
Also as a simple Bar Chart:
I have also displayed unfiltered data using the -Conditional Box Plot- node, showing the Fees/Tuition for each type of University and each length of study. I have also repeated this for the Room/Board:
On average, private Universities are more expensive than public, but the states with the cheapest tuition fees are not necessarily the cheapest to live in and vice versa. For example, California has a high mean living cost but low tuition fees if going to a Public In-State University. Therefore, both choropleth maps should be reviewed when making a decision.
You can find my workflow on the hub here:
Here is mine:
My idea is different, so I believe it’s worth sharing.
This question is intriguing. Despite being classified as an easy challenge, it has proven to be the most time-consuming amongst my recent tasks. The American university education system differs significantly from China’s, necessitating a portion of my time dedicated to understanding their approach. A notable deviation lies within the cost framework; the same university applies differential fees for in-state and out-of-state students. This is an inevitable consequence of each state’s fiscal autonomy.
Consequently, 2-year public schools impose higher fees on out-of-state students than their in-state counterparts. Hence, a student considering a two-year program primarily for employment prospects should prioritize in-state colleges. However, if the student views the two-year education as a stepping stone towards further learning, a favorable option would be a two-year college, offering the possibility of eventually transitioning to a four-year university.
In simpler terms, selecting a two-year college within your state is a wise choice for those looking to graduate swiftly and secure employment.
The fee structure for four-year universities differs too. Tuition for public schools varies between in-state and out-of-state students, whereas private schools uphold a uniform cost. Given the considerable impact that scholarships can have on the overall cost, I employed a single total cost in my workflow to filter different states.
Key observations were made:
- Tuition fees, unsurprisingly, ascend annually.
- The chart on the left illustrates each state’s fees for its in-state students. California is the most affordable at $1270, with New Hampshire as the priciest at $7130.
- The subsequent graph highlights the disparity in fees for out-of-state and in-state students; a higher value correlates with the state’s lack of receptiveness to students from other states. Tennessee, with the difference standing at $12,558, essentially screams, “two-year students, stay away from our school!”
- Below, a general filter of states according to total cost is provided.
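The out-of-state versus in-state comparison boils down to a per-state subtraction followed by a sort. Here is a minimal Python sketch of that idea; the fee values are made-up placeholders rather than figures from the dataset, and `fee_gaps` is a hypothetical helper.

```python
# Made-up 2-year public fees per state, in USD.
fees = {
    "State A": {"in_state": 1500, "out_of_state": 7000},
    "State B": {"in_state": 4000, "out_of_state": 15000},
    "State C": {"in_state": 3000, "out_of_state": 6000},
}

def fee_gaps(fees):
    """Return (state, out-of-state minus in-state fee), largest gap first."""
    gaps = [(state, f["out_of_state"] - f["in_state"]) for state, f in fees.items()]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)

gaps = fee_gaps(fees)
cheapest_in_state = min(fees, key=lambda state: fees[state]["in_state"])
```

The first entry of `gaps` is the state with the largest penalty for out-of-state students, and `cheapest_in_state` is the most affordable option for residents.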
Why is there no further analysis? On a personal note, I believe that its value is limited. Choosing a school should involve a comprehensive review inclusive of the student’s preferences, family circumstances, and the school’s reputation academically and professionally. Tuition is merely one chapter of an extensive narrative.
As always on Tuesdays, here’s our solution to last week’s challenge!
This week our solution is very beginner-friendly, not even using data apps. We rely only on very simple plot types and models that lead to insights that are still powerful!
Thanks for the terrific visualizations and neat data apps and see you tomorrow for a new challenge!
Hello, very late, but here is my solution:
It is necessary to normalize the US states, because otherwise some data is not shown.