Introduction to Predictive Analytics with IBM SPSS – Part I

Most folks who know about SPSS are aware of it because they took a Statistics class in college and used IBM SPSS Statistics. It is a favorite among professors for teaching because of its easy-to-learn interface and the fact that the user doesn’t need to learn any coding language.

What’s not widely known is that there’s MUCH MORE than IBM SPSS Statistics in the brand family!

Because there is a lot to cover, I’m going to split this into “Beginner” and “Advanced” – this is aligned with the analytical maturity blog published earlier, so check where you are before you dive in.

The items listed in this “Beginner” post are more geared towards folks with little to no experience or exposure – maybe working in spreadsheets but interested in “getting predictive”.

IBM Analytic Answers: This new little gem is about as easy as predictive analytics gets. IBM has developed several blueprints for very specific needs: Insurance Renewals, Donor Contribution Growth, Prioritized Collections, Retail & Offer Targeting, Student Retention, and Telco Churn Prevention. The tag line for this gem is “if you know how to upload a picture to Facebook – you can do predictive analytics!” IBM hosts this offering in a SaaS model: you log into a webpage, upload a CSV file with variables pre-specified by IBM, and click submit. About a minute later, a file is returned to you with a prediction, a confidence interval, and an action. IBM won’t accept any personally identifiable information, so your data is very secure. Once you’ve signed up, it takes IBM about 2 weeks to build the model for you. When you submit a file, it runs against a Modeler (see below) instance in the cloud, through a Modeler stream that IBM has created specifically for your data. Lodestar (as an IBM Business Partner) can offer services adjacent to this offering: we can prep and submit the data for you, and provide results in a framework that will snap-fit into your business (e.g. a Cognos Dashboard). Pricing varies depending on which blueprint(s) you need, any additional data you’d like to run, and how many cases you want to run against the model per month. This is a VERY cost-effective launch for an organization that is interested in “getting predictive” but doesn’t know where to start.

IBM SPSS Data Collection: This family is the soup-to-nuts survey tool offered by IBM. Here you can create a survey and deploy it in multiple modes: paper, web, phone, mobile, even manual data entry. It also has a reporting feature to keep you up to date with your progress. There is a Text Analytics component as well. This lets you put those open-ended responses to work by providing valuable sentiment analysis and turning that qualitative data into quantitative data, which can boost the accuracy of any models you might create with the data. Licensing is modular, so you don’t need to buy any capabilities you don’t require and can create a snap-to-fit survey solution.

IBM SPSS Statistics: This is the old familiar spreadsheet-looking package you might remember from college. The licensing methodology is modular, so there is the Base and fourteen modules to choose from. IBM was kind enough to analyze SPSS Statistics customers’ buying habits and has created discounted bundles that align to various common analysts’ needs: Base, Standard, Professional and Premium. There are a few stand-alone products in this family: SamplePower and Viz Designer. IBM SPSS Statistics is an excellent way to avoid common errors found in spreadsheets and has many capabilities for data cleansing, organizing, and modeling. In the most recent releases, IBM has added Monte Carlo simulation. It furnishes the decision maker with a range of possible outcomes, along with how probable each outcome is (a confidence score). Statistics is friendly with open source R algorithms. They can even be stored in the drop-down menu, so any algos that don’t come stock can be added at any time. If you’re a Statistics customer already and notice that your jobs are running slow (usually because the data set is very wide, or just very large – terabytes or petabytes), there is a server component that can be added. This component pushes the “crunching” to the server for additional power, and when the job is finished, the results are sent back to the client. We can help you design a statistical package that meets your needs, so just let us know.
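For readers who haven’t met Monte Carlo simulation before, here is a minimal sketch of the general idea in plain Python. It is not how SPSS Statistics implements the feature. The profit model, the input distributions, and the names `simulate_profit` and `summarize` are all hypothetical, made up purely for illustration. The sketch draws uncertain inputs many times, computes an outcome for each draw, and then reports a range of outcomes plus the share of trials that beat a target.

```python
import random
import statistics


def simulate_profit(n_trials=10_000, seed=42):
    """Run a toy Monte Carlo simulation of monthly profit.

    All figures below are invented for illustration only.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    outcomes = []
    for _ in range(n_trials):
        units = rng.gauss(1000, 150)                # uncertain demand
        price = rng.uniform(9.0, 11.0)              # uncertain selling price
        unit_cost = rng.triangular(5.0, 8.0, 6.0)   # low, high, most likely
        fixed_costs = 2500
        outcomes.append(units * (price - unit_cost) - fixed_costs)
    return outcomes


def summarize(outcomes, target=0.0):
    """Report a range of outcomes and the share of trials above a target."""
    ordered = sorted(outcomes)
    n = len(ordered)
    return {
        "mean": statistics.fmean(ordered),
        "p05": ordered[int(0.05 * n)],   # roughly the 5th percentile
        "p95": ordered[int(0.95 * n)],   # roughly the 95th percentile
        "prob_above_target": sum(o > target for o in ordered) / n,
    }


if __name__ == "__main__":
    summary = summarize(simulate_profit(), target=0.0)
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

The 5th to 95th percentile band is the “range of possible outcomes”, and `prob_above_target` plays the role of the probability attached to an outcome of interest (here, turning a profit).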

IBM SPSS Modeler: This data mining workbench is the world-class gold standard. While Statistics is great for proving a hypothesis, Modeler is more focused on hypothesis generation and on allowing a user to uncover complex and hidden patterns and trends in large data sets. To be clear, this workbench is not a “black box” technique, as many claim data mining to be. While it’s the easiest data mining software on the market, one should most certainly invest in learning how to use it and why someone would choose one course of action over another. The workbench, unlike Statistics, begins as a blank canvas. The user pulls in icons from the tray at the bottom and connects them into a stream or workflow. Modeler is data agnostic and friendly with any ODBC-compliant source. Some capabilities included in Modeler are data cleansing, data merging, auto modeling, auto clustering, social analytics, entity resolution, text analytics and much, much more. Modeler is available in two “flavors”: Professional (quantitative), and Premium (qualitative, entity resolution, social analytics, etc.). As with Statistics, if you’re a Modeler user already and have jobs that are taking too long to crunch, there is a server component available. This is a good idea if you have very wide data sets, or are trying to crunch terabytes or petabytes (or more) of data.

Again – these are all good starting places. Lodestar can help you identify which, if any, would be useful for your organization. Please contact us to get started on the path to Predictive!

You can now see Part 2 of our SPSS Product Family Overview HERE.
