<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>grimbeek.com.au</title>
	<atom:link href="http://grimbeek.com.au/PGstats/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://grimbeek.com.au/PGstats</link>
	<description>Peter Grimbeek: Statistical anecdotes</description>
	<lastBuildDate>Sun, 06 Dec 2009 14:50:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Writing a result chapter based on survey data</title>
		<link>http://grimbeek.com.au/PGstats/?p=66</link>
		<comments>http://grimbeek.com.au/PGstats/?p=66#comments</comments>
		<pubDate>Sun, 06 Dec 2009 14:50:20 +0000</pubDate>
		<dc:creator>Andrianq</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://grimbeek.com.au/PGstats/?p=66</guid>
		<description><![CDATA[First, add variable labels and value labels where missing and set decimals to zero in the data sheet (i.e., prepare the sheet carefully). The descriptive statistics section reports personal characteristics in the text. One option is to expand this section by adding an Items and scales subsection. Report each of the sub-scales in turn, either via a table [...]]]></description>
			<content:encoded><![CDATA[<p>First, add variable labels and value labels where missing, and set decimals to zero in the data sheet (i.e., prepare the sheet carefully).</p>
<p>The descriptive statistics section reports personal characteristics in the text.<br />
One option is to expand this section by adding an &#8220;Items and scales&#8221; subsection.</p>
<p>Report each of the sub-scales in turn, either via a table or via a graph of the mean values (labels are important here).<br />
A graph makes it easier for the reader to compare items, which gives it some advantages over a table.<br />
Report the items to which participants responded positively vs. those to which they responded negatively.<br />
Also, for each sub-scale, report Cronbach&#8217;s alpha.</p>
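<p>As a minimal sketch of what alpha summarises (assuming each item is stored as a list of respondents&#8217; numeric scores, in the same respondent order and with no missing values), Cronbach&#8217;s alpha can be computed from the item variances and the variance of the summed scale score. The item data below are invented for illustration.</p>

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one sub-scale.

    items: a list of items, each a list of respondents' scores
    (same respondents in the same order; no missing values assumed).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items need the same number of respondents")
    # Scale score per respondent = sum of that respondent's item scores
    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_variances = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical three-item sub-scale answered by five respondents
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 1],
]
print(round(cronbach_alpha(items), 2))  # prints 0.92
```

<p>Statistics packages report the same quantity; the sketch only makes the formula visible (alpha rises as the items covary more strongly relative to their individual variances).</p>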
<p>At the end of this section, include a correlation matrix showing the bivariate correlations between the scale scores.</p>
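<p>A minimal sketch of such a matrix, assuming each scale score is a list of numbers for the same respondents and that no scale has zero variance. The scale names and scores are hypothetical.</p>

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(scales):
    """scales: dict mapping scale name -> list of scale scores."""
    names = list(scales)
    return {(a, b): pearson(scales[a], scales[b]) for a in names for b in names}


# Hypothetical scale scores for five respondents
scales = {
    "satisfaction": [3.2, 4.1, 2.5, 3.8, 4.6],
    "trust": [3.0, 4.4, 2.9, 3.5, 4.8],
    "workload": [4.1, 2.2, 4.5, 3.0, 1.9],
}
matrix = correlation_matrix(scales)
for a in scales:
    print(a.ljust(13), " ".join(f"{matrix[(a, b)]:6.2f}" for b in scales))
```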
<p>The following section is entitled Inferential statistics.<br />
In it, outline your intention to test a series of hypotheses, mainly by using linear and stepwise regression (and possibly also hierarchical regression to test moderating influences).</p>
<p>Note that you are transforming categorical IVs (personal characteristics such as gender and type of user) into dummy variables to facilitate these regressions.</p>
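<p>As a minimal sketch of that transformation, each non-reference category becomes its own 0/1 indicator, so a variable with k categories yields k - 1 dummy variables and the reference category is coded 0 on all of them. The user types and the <code>is_</code> naming scheme are hypothetical.</p>

```python
def dummy_code(values, reference):
    """Turn one categorical variable into k - 1 dummy (0/1) variables.

    values: list of category labels, one per participant.
    reference: the category coded 0 on every dummy.
    """
    if reference not in values:
        raise ValueError(f"reference category {reference!r} not found")
    categories = sorted(set(values) - {reference})
    return {
        f"is_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


user_type = ["novice", "regular", "expert", "regular", "novice"]
for name, column in dummy_code(user_type, reference="novice").items():
    print(name, column)
```

<p>Each regression coefficient for a dummy is then read as the difference from the reference group, so it helps to choose a reference category that makes a sensible baseline.</p>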
<p>It might be worth starting the inferential section by listing the hypotheses, together with relevant IVs and DVs.<br />
This organisational strategy should help one to think about which IVs and DVs best test a specific hypothesis.<br />
It would also prevent one from using the same list of variables to test differing hypotheses (not that I&#8217;m suggesting you do).</p>
<p>When you obtain a significant outcome, especially one involving a categorical (dummy) or ordinal-level IV, graph the outcome.</p>
<p>I use stepwise regression because it selects the most powerful IVs more clearly.</p>
<p>It might be worthwhile using SEM procedures to model the more complex relationships between IVs and DVs.</p>
]]></content:encoded>
			<wfw:commentRss>http://grimbeek.com.au/PGstats/?feed=rss2&amp;p=66</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SPSS Graduate Pack copy paste issues</title>
		<link>http://grimbeek.com.au/PGstats/?p=46</link>
		<comments>http://grimbeek.com.au/PGstats/?p=46#comments</comments>
		<pubDate>Mon, 16 Feb 2009 13:46:38 +0000</pubDate>
		<dc:creator>Aardvark</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://grimbeek.com.au/PGstats/?p=46</guid>
		<description><![CDATA[I&#8217;ve recently installed both Windows and Apple versions of SPSS 16 Graduate Pack. While the graduate pack software works, it does so in a curiously limping fashion. That is, the copy and paste functions seem to be crippled &#8211; both in terms of copying and pasting from SPSS to Excel and when copying and pasting to [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve recently installed both Windows and Apple versions of SPSS 16 Graduate Pack.</p>
<p>While the graduate pack software works, it does so in a curiously limping fashion. That is, the copy and paste functions seem to be crippled &#8211; both in terms of copying and pasting from SPSS to Excel and when copying and pasting to Word.</p>
<p>One workaround appears to be to increase the workspace to its maximum.</p>
<p>As follows:</p>
<p>set workspace 2097151.</p>
<p>show workspace.</p>
<p>Now copy and paste functions should work more normally.</p>
]]></content:encoded>
			<wfw:commentRss>http://grimbeek.com.au/PGstats/?feed=rss2&amp;p=46</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Starting points for qualitative analysis</title>
		<link>http://grimbeek.com.au/PGstats/?p=36</link>
		<comments>http://grimbeek.com.au/PGstats/?p=36#comments</comments>
		<pubDate>Sun, 21 Sep 2008 02:28:05 +0000</pubDate>
		<dc:creator>Aardvark</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://grimbeek.com.au/PGstats/?p=36</guid>
		<description><![CDATA[I&#8217;m currently evolving brief spiels on manual vs. semi-automated vs. fully automated text analysis. The claim that qualitative data analysis can proceed without the primary step of identifying a set of categories seems downright illogical. This is a first step regardless of whether one is identifying themes and patterns, nodes, or concepts (differing language, same [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m currently evolving brief spiels on manual vs. semi-automated vs. fully automated text analysis.</p>
<p>The claim that qualitative data analysis can proceed without the primary step of identifying a set of categories seems downright illogical. This is a first step regardless of whether one is identifying themes and patterns, nodes, or concepts (differing language, same thing).</p>
<p>Manual text analysis is what qualitative researchers do when they explore the themes and patterns in selected vignettes, usually brief extracts from larger chunks of transcribed conversations, etc.</p>
<p>Semi-automated analyses could be said to include the use of MS Word, Excel, or NVivo software, with the last qualifying most properly for this category.</p>
<p>With NVivo (<a href="http://www.qsrinternational.com/">QSR</a>), large chunks of text and other qualitative materials (photos, videos, etc.) are imported into what is an electronic database. A priori categories (termed nodes) are set up as tree nodes, and one of a number of documents (internal sources) is opened for coding purposes.</p>
<p>The researcher highlights words, phrases, or entire paragraphs, and links these to one or more of the tree nodes or creates additional (free) nodes.</p>
<p>One big plus in this software (from my perspective) is the word frequency query option. It generates a frequency listing of all the words in the selected documents, including words such as &#8220;I&#8221; and &#8220;am&#8221;. Selected words can be saved as new nodes or merged into existing nodes, which means one can conduct generic analyses of multiple documents very quickly indeed.</p>
<p>A negative is that my copy of the current version (v.8) quickly seizes up. It doesn&#8217;t seem able to sustain heavy duty collection of words linked to nodes for very long at all. I assume that QSR will fix this.</p>
<p><a href="http://www.leximancer.com/">Leximancer</a> is my preference for fully automated text analysis. I haven&#8217;t discussed it on this blog previously, but won&#8217;t go into details right now. For the purposes of comparison with NVivo, one thing of interest is the convergence between NVivo&#8217;s word frequency list and Leximancer&#8217;s Ranked concept list. Imagine NVivo&#8217;s list without the less clearly semantically relevant words, and you get the Leximancer list. This is comforting, as it suggests these search mechanisms are reliable and valid.</p>
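<p>To make the comparison concrete, here is a minimal sketch of a word frequency listing, first with every word and then with a small, hypothetical stop-word list removed. It only illustrates the idea; it is not how NVivo or Leximancer actually build their lists, and the two &#8220;documents&#8221; are invented.</p>

```python
import re
from collections import Counter

# Hypothetical stop-word list; real tools use far longer lists.
STOP_WORDS = {"i", "am", "the", "a", "an", "and", "to", "of", "was", "is", "it"}


def word_frequencies(documents, drop_stop_words=False):
    """Return (word, count) pairs across all documents, most frequent first."""
    counts = Counter()
    for text in documents:
        for word in re.findall(r"[a-z']+", text.lower()):
            if drop_stop_words and word in STOP_WORDS:
                continue
            counts[word] += 1
    return counts.most_common()


documents = [
    "I am happy with the support I get.",
    "The support was slow and I am annoyed.",
]
print(word_frequencies(documents)[:4])
print(word_frequencies(documents, drop_stop_words=True)[:3])
```

<p>With function words such as &#8220;I&#8221; and &#8220;am&#8221; filtered out, the top of the list is left with content words like &#8220;support&#8221;, which is the kind of convergence described above.</p>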
]]></content:encoded>
			<wfw:commentRss>http://grimbeek.com.au/PGstats/?feed=rss2&amp;p=36</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A statistical game: Categorical, ordinal, or equal interval?</title>
		<link>http://grimbeek.com.au/PGstats/?p=10</link>
		<comments>http://grimbeek.com.au/PGstats/?p=10#comments</comments>
		<pubDate>Sat, 12 Jan 2008 15:04:18 +0000</pubDate>
		<dc:creator>Aardvark</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://grimbeek.com.au/PGstats/?p=10</guid>
		<description><![CDATA[The starting point for a lot of my current interest in the range of statistical methods for making sense of the world of numbers came about in 1999 as a result of interactions with two clients (Fiona Bryer and Wendi Beamish) at the Mount Gravatt campus of Griffith University. They&#8217;d asked me to review statistical [...]]]></description>
			<content:encoded><![CDATA[<p>The starting point for a lot of my current interest in the range of statistical methods for making sense of the world of numbers came about in 1999 as a result of interactions with two clients (Fiona Bryer and Wendi Beamish) at the Mount Gravatt campus of Griffith University. They&#8217;d asked me to review statistical analyses of dichotomous(e.g., No/Yes)  and Likert scale responses (e.g., strongly disagree, disagree, unsure, agree, strongly agree) items developed using the Delphi technique (working with groups of participants to the point of consensus about what is important or relevant and what isn&#8217;t).</p>
<p>That dataset had previously been analysed by reporting correlations, means, and standard deviations per item, and by using t-tests to examine group differences on these categorical/ordinal items. That is, the analyses had assumed the items were equal interval (distance between 1 and 2 equivalent to distance between 3 and 4, etc.) and met the assumptions required for parametric analyses (e.g., normality), even though they generally did not. One outcome of this failure to take into account the nonparametric measurement properties of the responses was to produce overly frequent and unreliable reports of statistically significant group differences.</p>
<p>Starting with this insight, we developed methods that involved collapsing Likert-scale responses into dichotomous responses (e.g., the percentage in agreement, combining agree and strongly agree) that could be described very readily (tables, figures), analysed very straightforwardly (e.g., contingency tables), and that generated conservative (fewer) but more reliable reports of statistically significant group differences. <a class="alignleft" href="http://www.amazon.com/Consensus-about-Program-Quality-Beamish/dp/3639028783/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1221961553&amp;sr=1-1" target="_blank"><span class="aligncenter">Wendi Beamish has now published a book documenting these developments.</span></a></p>
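<p>A minimal sketch of the collapsing step and a 2 x 2 contingency-table test, assuming responses are stored as text labels and two groups are compared. It uses Pearson&#8217;s chi-square without a continuity correction; with one degree of freedom the p-value can be taken from the complementary error function. The responses are invented for illustration and this is not necessarily the exact procedure used in that work.</p>

```python
import math

AGREE = {"agree", "strongly agree"}


def collapse(responses):
    """Recode Likert labels to 1 (agree/strongly agree) or 0 (anything else)."""
    return [1 if r.strip().lower() in AGREE else 0 for r in responses]


def chi_square_2x2(group_a, group_b):
    """Pearson chi-square (no continuity correction) for agree vs. not, by group."""
    table = [[sum(g), len(g) - sum(g)] for g in (group_a, group_b)]
    n = sum(sum(row) for row in table)
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for row in table:
        for j in range(2):
            expected = sum(row) * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column total is zero")
            chi2 += (row[j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with df = 1
    return table, chi2, p


group_a = collapse(["agree"] * 5 + ["strongly agree"] * 3 + ["unsure", "disagree"])
group_b = collapse(["agree"] * 3 + ["disagree"] * 5 + ["strongly disagree", "unsure"])
table, chi2, p = chi_square_2x2(group_a, group_b)
print(f"% agree: {100 * sum(group_a) / len(group_a):.0f} vs {100 * sum(group_b) / len(group_b):.0f}")
print(f"chi-square(1) = {chi2:.2f}, p = {p:.3f}")
```

<p>With small expected counts (below about 5 per cell), an exact test would be the safer choice than this large-sample approximation.</p>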
<p>As a result of that work, I&#8217;ve developed a broad interest in forms of statistical analysis that take into account the measurement properties of the variables concerned. With that in mind, I was happy to review a book by <a title="Michell (1999) review" rel="attachment wp-att-11" href="http://grimbeek.com.au/PGstats/?attachment_id=11">Michell (1999)</a> that addressed these issues and did so very lucidly (the book not the review). For the same reason, I was also happy later on to review a book by <a title="Bond &amp; Fox (2001, 2007)" rel="attachment wp-att-12" href="http://grimbeek.com.au/PGstats/?attachment_id=12">Bond &amp; Fox (2001, 2007)</a> about Rasch analytic methods (the publisher, Erlbaum, used extracts from this review on its website to sell the book), and more recently to review a book <a title="Categorical data analysis methods" href="http://grimbeek.com.au/PGstats/wp-content/uploads/2008/01/aedpvol24no1_proof2.pdf">on new developments in categorical data analysis methods</a> (e.g., nonparametric factor analysis).</p>
<p>It is for similar reasons that I was happy to come across a modern analogue of <a title="Correspondence analysis" rel="attachment wp-att-14" href="http://grimbeek.com.au/PGstats/?attachment_id=14">correspondence analysis</a> (the technique Pierre Bourdieu used in the course of his pioneering sociological work), namely SPSS Optimal Scaling, which uses nonparametric factor analytic methods to identify clusters of responses within demographic and other nonparametric sets of variables.</p>
<p>Finally, a suspicion that experimental design of the kind practised in psychology and other laboratory settings is sufficient but not necessary when pursuing trails of cause and effect in data sets has led to a long-term interest in structural equation modelling (SEM). The charm of SEM is that it provides a way to test hypotheses based on survey and other data collected in semi-natural settings (though not without the usual reservations about the measurement properties of the variables concerned).</p>
]]></content:encoded>
			<wfw:commentRss>http://grimbeek.com.au/PGstats/?feed=rss2&amp;p=10</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data mining with SAS</title>
		<link>http://grimbeek.com.au/PGstats/?p=9</link>
		<comments>http://grimbeek.com.au/PGstats/?p=9#comments</comments>
		<pubDate>Sat, 29 Dec 2007 13:36:48 +0000</pubDate>
		<dc:creator>Aardvark</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://grimbeek.com.au/PGstats/?p=9</guid>
		<description><![CDATA[I&#8217;m playing around with SAS at the moment. It seems similar to an earlier version of SPSS inasmuch as it is primarily driven via a command-line interface, though it does include a drop-down menu style graphical user interface (GUI). For example, one can use the GUI to do Principal Components Analysis (PCA) or factor analysis (Maximum [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m playing around with SAS at the moment.</p>
<p>It seems similar to an earlier version of SPSS inasmuch as it is primarily driven via a command-line interface, though it does include a drop-down menu style graphical user interface (GUI). For example, one can use the GUI to do Principal Components Analysis (PCA) or factor analysis (maximum likelihood?), but the command-line interface is required if you want to rotate the factors. Similarly, it is possible to generate frequencies as in SPSS, but without the SPSS option of copying the output to Word in tabbed or figure format.</p>
<p>What is of interest is Enterprise Miner. I&#8217;ve wanted access to data mining software for almost 10 years now, and this compilation of SAS (9.1.3) includes it along with a swag of other stuff.</p>
<p>As you&#8217;d know, data mining allows one to examine which of a swag of variables most significantly influence a designated target variable. This process is in some ways equivalent to stepwise regression except that the procedure can include an element of model fitting, complete with training and confirmatory subsets from the specified sample.</p>
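<p>The training/confirmatory split itself is simple. A minimal sketch, assuming a random 70/30 partition with a fixed seed so that the split can be reproduced; the proportion and seed are arbitrary choices for illustration, not Enterprise Miner defaults.</p>

```python
import random


def partition(records, training_share=0.7, seed=2007):
    """Randomly split records into training and confirmatory (validation) subsets."""
    if not 0 < training_share < 1:
        raise ValueError("training_share must lie between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split can be repeated
    cut = round(len(shuffled) * training_share)
    return shuffled[:cut], shuffled[cut:]


training, confirmatory = partition(range(100))
print(len(training), len(confirmatory))  # prints 70 30
```

<p>A model is fitted on the training subset and then checked on the confirmatory subset; a large drop in fit between the two is a sign of overfitting.</p>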
<p>Of course, the process is not without hiccups. The steps required to do data mining à la Enterprise Miner are many, and not all of them seem to function as advertised. But I live in hope.</p>
]]></content:encoded>
			<wfw:commentRss>http://grimbeek.com.au/PGstats/?feed=rss2&amp;p=9</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On getting it right (or wrong)</title>
		<link>http://grimbeek.com.au/PGstats/?p=8</link>
		<comments>http://grimbeek.com.au/PGstats/?p=8#comments</comments>
		<pubDate>Sat, 22 Sep 2007 15:49:46 +0000</pubDate>
		<dc:creator>Aardvark</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://grimbeek.com.au/PGstats/?p=8</guid>
		<description><![CDATA[Originally published February 26th, 2007 A working assumption is that any data analytic work is done without errors. This is a trifle hopeful, and instead Murphy&#8217;s law prevails. That is, anything that can go wrong will go wrong. Checks and balances commonly in place include examining the data diagnostically to ensure that values are not [...]]]></description>
			<content:encoded><![CDATA[<p><small>Originally published February 26th, 2007 <!-- by Aardvark --></small></p>
<p class="entry">A working assumption is that any data analytic work is done without errors. This is a trifle hopeful, and instead Murphy&#8217;s law prevails. That is, anything that can go wrong will go wrong.</p>
<p>Checks and balances commonly in place include examining the data diagnostically to ensure that values are not out of range or drastically non-normally distributed. A rationale for doing the above is that results obtained with dodgy data are not worth reporting and should be avoided. If after doing the above, univariate and multivariate analyses are repeated a couple of times, this helps to ensure that outcomes are reliable (reproducible).</p>
<p>However, occasionally (more than once is too often here), the reported outcomes on subsequent examination turn out to be incorrect.</p>
<p>On one memorable occasion, I failed to notice the many occurrences of zeros as a response option in data used to examine the effect of a specific treatment. The effect of the treatment appeared to be statistically significant, an appearance that dissolved when the zeros were more properly treated as missing data.</p>
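<p>A routine recode-and-range check makes such zeros visible before any analysis is run. A minimal sketch, assuming 0 is the code for a missing response and valid scores run from 1 to 5 (both are placeholders to adjust per variable; the data are invented):</p>

```python
from statistics import mean


def recode_and_check(values, valid_min, valid_max, missing_codes=(0,)):
    """Recode missing-value codes to None and report out-of-range values.

    Returns (cleaned values, indices of out-of-range values).
    Out-of-range values are left in place so they can be inspected.
    """
    cleaned, out_of_range = [], []
    for index, value in enumerate(values):
        if value in missing_codes:
            cleaned.append(None)
        else:
            if not valid_min <= value <= valid_max:
                out_of_range.append(index)
            cleaned.append(value)
    return cleaned, out_of_range


raw = [3, 0, 5, 4, 0, 0, 2, 7]
cleaned, flagged = recode_and_check(raw, valid_min=1, valid_max=5)
print("mean treating 0 as a score:", mean(raw))
print("mean with 0 as missing:", mean(v for v in cleaned if v is not None))
print("out-of-range positions:", flagged)
```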
<p>On another occasion, AMOS was used to examine the plausibility of a model that incorporated a subset of interrelated variables. After incorporating the outcomes into a journal paper, and at the point of making editorially requested changes, I redid the analysis only to find the outcomes quite different. In fact, two of the variable labels had been swapped with the effect that associations were significant but not as reported.</p>
]]></content:encoded>
			<wfw:commentRss>http://grimbeek.com.au/PGstats/?feed=rss2&amp;p=8</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thoughts about MANOVA</title>
		<link>http://grimbeek.com.au/PGstats/?p=7</link>
		<comments>http://grimbeek.com.au/PGstats/?p=7#comments</comments>
		<pubDate>Sat, 22 Sep 2007 15:49:07 +0000</pubDate>
		<dc:creator>Aardvark</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://grimbeek.com.au/PGstats/?p=7</guid>
		<description><![CDATA[Originally published February 26th, 2007 SPSS MANOVA generates various bits of output, including: Between-Subjects factors This is a descriptive report of numbers per response category for each independent variable included in the analysis. Box test of the equality of covariance matrices In theory (See Tabachnick &#38; Fidell, Using multivariate statistics) this test examines whether variance-covariance [...]]]></description>
			<content:encoded><![CDATA[<p><small>Originally published February 26th, 2007 <!-- by Aardvark --></small></p>
<p class="entry"><span style="font-size: x-small;"><strong>SPSS MANOVA</strong> generates various bits of output, including:</span></p>
<p><strong><span style="font-size: x-small;">Between-Subjects factors</span></strong></p>
<p><span style="font-size: x-small;">This is a descriptive report of numbers per response category for each independent variable included in the analysis.</span></p>
<p><strong><span style="font-size: x-small;">Box test of the equality of covariance matrices</span></strong></p>
<p>In theory (see Tabachnick &amp; Fidell, Using multivariate statistics), this test examines whether the variance-covariance matrices within each cell of the design (e.g., male, male &amp; 20-30 yrs) are sampled from the same population variance-covariance matrix. Tabachnick and Fidell describe this test as notoriously sensitive, so it tends both to report statistically significant results and to be ignored.</p>
<p>To be fair, I have in fact seen this test report statistically non-significant outcomes, especially where cell sizes (N per cell) are equal and sample size is larger (100s -&gt; 1000s).</p>
<p><strong><span style="font-size: x-small;">Multivariate tests</span></strong></p>
<p>The table of multivariate effects provides information about the extent to which specific IVs and combinations of IVs are associated with the pooled DVs.</p>
<p>This test is useful because it lets you know whether a specific IV has a systematic effect across a range of, say, subscales. If not, then it becomes likely that this IV has opposite effects on related subscales, which usually does not make sense from the point of view of more detailed analysis (assuming these subscales are not negatively correlated).</p>
<p><strong><span style="font-size: x-small;">Levene&#8217;s test of equality of error variances</span></strong></p>
<p>Levene&#8217;s test examines the extent to which the error variances (squared standard deviations) differ from cell to cell of the design for specific DVs. This test probably should be taken seriously, because a statistically significant outcome suggests serious discrepancies in cell variances that could lead to unreliable statistical outcomes.</p>
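<p>A minimal sketch of the statistic behind this test, assuming the classic mean-centred form of Levene&#8217;s test (a one-way ANOVA on each score&#8217;s absolute deviation from its cell mean). It returns W and its degrees of freedom; the p-value would come from the F distribution, which is not computed here. The scores are invented.</p>

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W (mean-centred) for equality of variances across groups.

    Returns (W, (df1, df2)); compare W with an F distribution on those df.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each score from its own group (cell) mean
    z = []
    for g in groups:
        m = mean(g)
        z.append([abs(y - m) for y in g])
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((zij - zm) ** 2 for zg, zm in zip(z, z_means) for zij in zg)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Invented scores for one DV in two cells of a design
w, df = levene_w([1, 2, 3], [2, 4, 6])
print(f"W = {w:.2f}, df = {df}")
```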
<p>Should Levene&#8217;s test report significant outcomes, then one option is to report univariate outcomes anyway, on the basis that the parametric test involved (analysis of variance, ANOVA) is fairly robust to violations of its assumptions, particularly when cell sizes are equal. After all, not all variables are normally distributed. However, caution is warranted where there is doubt not only about the distributional properties but also about the measurement properties of the variable in question (i.e., does the variable possess equal-interval properties, or is it based on ordinal or categorical data and thus less than interval in its conception?).</p>
<p>Scale scores typically are produced by adding item scores, where these item scores represent ordinal responses to Likert scale items. It seems unlikely that the sum of ordinal scores is greater than its nonparametric parts, that is, that summing ordinal items produces a genuinely equal-interval score.</p>
<p>Under these conditions, after obtaining a statistically significant Levene test, it might be appropriate to use a nonparametric equivalent to ANOVA such as the Kruskal-Wallis test. This generates a chi-square statistic that evaluates the probability of obtaining a particular difference in mean rankings by chance.</p>
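<p>A minimal sketch of the Kruskal-Wallis H statistic, with tied scores given their average rank and the usual tie correction applied. H is referred to a chi-square distribution with (number of groups - 1) degrees of freedom; that lookup is left out. The group scores are invented.</p>

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Return (H, df) for two or more independent groups."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    tie_sizes = {}
    for y in pooled:
        tie_sizes[y] = tie_sizes.get(y, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1


h, df = kruskal_wallis([3, 4, 4, 5], [2, 3, 3], [5, 5, 4, 5])
print(f"H = {h:.2f}, df = {df}")
```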
<p><span style="font-size: x-small;"><strong><span style="font-size: x-small;">Tests of between-subject effects</span></strong></span></p>
<p><span style="font-size: x-small;">If it is OK to proceed with parametric testing, then the between-groups tests provide information about whether specific IVs or combinations of IVs are significantly associated with specific DVs. That is, this table reports the results of univariate tests using ANOVA.</span></p>
<p><span style="font-size: x-small;">Typically one reports statistically significant outcomes as follows:</span></p>
<p><span style="font-size: x-small;">F value, degrees of freedom (treatment), degrees of freedom (error), probability less than some cut-off point (usually .05, .01, .001).</span></p>
<p><span style="font-size: x-small;">In text, the above might be reported as (F(df trt, df err) = value, p &lt; cut-off).</span></p>
<p>An example might go: (F(1, 140) = 4.55, p &lt; .05).</p>
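<p>For a single between-groups IV, the F value and both degrees of freedom can be sketched as a one-way ANOVA, and the report string assembled from them. The p-value is assumed to come from the statistics package; the hypothetical helper below only picks the conventional cut-off to report.</p>

```python
from statistics import mean


def one_way_f(*groups):
    """One-way between-groups ANOVA: return (F, df treatment, df error)."""
    scores = [y for g in groups for y in g]
    grand = mean(scores)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_trt, df_err = len(groups) - 1, len(scores) - len(groups)
    return (ss_between / df_trt) / (ss_within / df_err), df_trt, df_err


def report_f(f, df_trt, df_err, p, cutoffs=(0.001, 0.01, 0.05)):
    """Format an F test for the text, using the smallest cut-off that p falls below."""
    for cutoff in cutoffs:
        if p < cutoff:
            return f"F({df_trt}, {df_err}) = {f:.2f}, p < {str(cutoff).lstrip('0')}"
    return f"F({df_trt}, {df_err}) = {f:.2f}, p = {p:.2f}"


print(one_way_f([1, 2, 3], [4, 5, 6]))  # prints (13.5, 1, 4)
print(report_f(4.55, 1, 140, p=0.035))  # prints F(1, 140) = 4.55, p < .05
```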
<p><span style="font-size: x-small;"><strong><span style="font-size: x-small;">Estimated marginal means</span></strong></span></p>
<p><span style="font-size: x-small;">The first mean to be reported is the grand mean. This, like the intercept, usually is ignored in relation to human data.</span></p>
<p><span style="font-size: x-small;">Estimates for specific IVs and specific DVs provide useful information that includes the mean, the standard error (standard deviation divided by the square root of the sample size), and the lower and upper bounds of the 95% confidence interval.</span></p>
<p><span style="font-size: x-small;">The combination of mean and standard error provides enough information to make judgments about the likelihood of particular mean scores being significantly different.</span></p>
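<p><span style="font-size: x-small;">For example, a graph containing these two bits of information summarises in visual format the significance of the gap between means. If the standard error bars for two means overlap, the difference is unlikely to be significant. Non-overlap alone is not enough, however: as a rough rule (for independent means with similar standard errors), the gap between the bars needs to be about one standard error wide before the difference reaches p &lt; .05.</span></p>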
<p><span style="font-size: x-small;">The confidence interval (CI) provides equivalent information. That is, if the 95% CI for one mean does not overlap the 95% CI for another mean, then the difference between these two means is likely to be statistically significant. The converse does not hold: two CIs can overlap slightly and the difference still be significant.</span></p>
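<p>A minimal sketch of these estimates, computed per group as described above and assuming samples large enough that 1.96 standard errors approximates the 95% interval. SPSS&#8217;s estimated marginal means use the pooled error term and the t distribution, so its values may differ somewhat. The scores are invented.</p>

```python
import math
from statistics import mean, stdev


def mean_se_ci(scores, z=1.96):
    """Mean, standard error (SD / sqrt(n)) and an approximate 95% CI."""
    m = mean(scores)
    se = stdev(scores) / math.sqrt(len(scores))
    return m, se, (m - z * se, m + z * se)


def intervals_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


group_1 = [2, 4, 4, 4, 5, 5, 7, 9]
group_2 = [6, 8, 7, 9, 8, 10, 9, 7]
m1, se1, ci1 = mean_se_ci(group_1)
m2, se2, ci2 = mean_se_ci(group_2)
print(f"group 1: mean {m1:.2f}, SE {se1:.2f}, CI {ci1[0]:.2f} to {ci1[1]:.2f}")
print(f"group 2: mean {m2:.2f}, SE {se2:.2f}, CI {ci2[0]:.2f} to {ci2[1]:.2f}")
print("CIs overlap:", intervals_overlap(ci1, ci2))
```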
<p><span style="font-size: x-small;"><strong><span style="font-size: x-small;">Pairwise comparisons</span></strong></span></p>
<p><span style="font-size: x-small;">This table reports the mean difference for a specific IV (e.g., male vs. female scores) related to a specific DV.</span></p>
<p><span style="font-size: x-small;">This table parallels the information provided by the estimates table and adds to it an estimate of the statistical significance of the gap between these means. This table can be useful for IVs with multiple levels (e.g., where participants are collapsed into multiple age groups such as young, young-old, and old that correspond to specific cut-offs). One might discover that the scores on a specific DV are significantly different for young participants versus young-old or old participants, but that young-old versus old participant scores are not statistically distinct.</span></p>
<p><span style="font-size: x-small;"><strong>General comments</strong></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">The output considered above is for between-groups MANOVA with two or more dependent variables (DVs) and one or more independent variables (IVs).</span></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">Other options include repeated measures MANOVA and mixed model ANOVA (repeated measures plus between-groups effects).</span></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">It is tempting to use repeated measures MANOVA to analyse multivariate DVs consisting of two or more subscales. Doing so generates a multivariate effect for scale, which may or may not be significant. If it is, then what you have learnt is that the average scores differ significantly from one scale to another. The issue is whether this discrepancy really matters.</span></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">I would suggest that the only time repeated measures makes sense, practically speaking, is when the scores entered reflect temporal differences of interest. Such differences might include a specific measure collected, say, before an intervention versus afterwards, where one might expect the average score to increase or decrease significantly if all goes well.</span></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">Where multivariate/univariate testing identifies statistically significant two-way or three-way interactions, one way to examine these is to split the dataset by one or more components of the interaction (you might need to experiment here with differing components).</span></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">The aim here is to hold one component constant (by splitting the file by its levels) and then look for a statistically significant main effect of the other component.</span></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">The idea behind an interaction is that one component has distinct effects at separate levels of the other component. For example, spending one or more years abroad might influence international knowledge for women but not for men.</span></span></p>
<p><span style="font-size: x-small;"><span style="font-size: x-small;">This kind of analysis can also be done by using SPSS syntax (it is not available via the GUI), but it can be tricky.</span> </span></p>
]]></content:encoded>
			<wfw:commentRss>http://grimbeek.com.au/PGstats/?feed=rss2&amp;p=7</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analysing survey data</title>
		<link>http://grimbeek.com.au/PGstats/?p=6</link>
		<comments>http://grimbeek.com.au/PGstats/?p=6#comments</comments>
		<pubDate>Sat, 22 Sep 2007 15:48:17 +0000</pubDate>
		<dc:creator>Aardvark</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://grimbeek.com.au/PGstats/?p=6</guid>
		<description><![CDATA[Originally published January 18th, 2007 Surveys can be used to collect information about the respondent&#8217;s demographic profile (life to this point) as well as information about his/her attitudes, beliefs, and knowledge. What you don&#8217;t get is information about what the respondent does as opposed to what they say they do, have done, or will do. [...]]]></description>
			<content:encoded><![CDATA[<p><small>Originally published January 18th, 2007 <!-- by peteg --></small></p>
<p class="entry">Surveys can be used to collect information about the respondent&#8217;s demographic profile (life to this point) as well as information about his/her attitudes, beliefs, and knowledge.</p>
<p>What you don&#8217;t get is information about what the respondent does as opposed to what they say they do, have done, or will do.</p>
<p>I analyse and report on this information as follows:</p>
<p>1. I report the demographic profile for the sample (gender, age, educational qualifications, etc.), item by item, noting which items vary and which are relatively constant (e.g., 90% or more giving the same response).</p>
<p>2. I report significant associations between these variables via cross-tabulation, correlation coefficients, and Optimal scaling.</p>
<p>Optimal scaling provides a 2-D spatial representation that is especially proficient at capturing complexity (for more information, see the relevant paper on the main website).</p>
<p>3. I report responses to individual variables measuring attitudes, beliefs, knowledge.</p>
<p>Typically, these variables employ Likert scale response categories (e.g., Strongly Disagree -&gt; Strongly Agree).</p>
<p>If Likert, then ordinal, and if so, then I prefer to report, say, the percent strongly agreeing (often using a graph).</p>
<p>4. I examine the extent to which variables with common Likert response categories form scales and subscales.</p>
<p>I do so by using exploratory factor analysis (EFA) or confirmatory factor analysis (CFA: I&#8217;ll provide more info on this another time), depending upon sample size (bigger is better) and the credibility of item-scale relationships (if you&#8217;ve just written the items and believe them to form a scale then I&#8217;d prefer to treat this as a hypothesis to be tested).</p>
<p>5. I generate scale scores based on the previous step.</p>
<p>Researchers tend to prefer average scores, and I will compute these where appropriate but I also like to use EFA to save factor scores. This has the advantage of taking loadings into account, using all items, and also generating a scale score with z-score qualities (mean=zero, standard deviation=one).</p>
<p>6. I use ANOVA, MANOVA, Regression, or structural equation modelling (SEM) procedures to examine the associations between conceptually or empirically relevant aspects of the demographic profile and outcome scores (usually sub-scale or scale scores).</p>
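<p>Steps 1 and 3 can be sketched in a few lines, assuming responses are stored as lists of labels with None for missing data. The 90% threshold follows the example in step 1; item names and responses are invented.</p>

```python
from collections import Counter


def answered(responses):
    """Drop missing responses (None)."""
    kept = [r for r in responses if r is not None]
    if not kept:
        raise ValueError("no non-missing responses")
    return kept


def modal_share(responses):
    """Proportion of respondents giving the most common response."""
    kept = answered(responses)
    _, count = Counter(kept).most_common(1)[0]
    return count / len(kept)


def percent_response(responses, category):
    """Percentage of non-missing responses equal to category."""
    kept = answered(responses)
    return 100 * sum(r == category for r in kept) / len(kept)


def near_constant_items(items, threshold=0.9):
    """Names of items where one response accounts for >= threshold of answers."""
    return [name for name, responses in items.items() if modal_share(responses) >= threshold]


profile = {
    "gender": ["F"] * 9 + ["M"],
    "age_group": ["18-30", "31-50", "51+", "18-30", None],
}
print(near_constant_items(profile))  # prints ['gender']
item_q1 = ["strongly agree", "agree", None, "strongly agree", "unsure"]
print(f"{percent_response(item_q1, 'strongly agree'):.0f}% strongly agree")  # prints 50% strongly agree
```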
<p>More about all of this at another time.</p>
]]></content:encoded>
			<wfw:commentRss>http://grimbeek.com.au/PGstats/?feed=rss2&amp;p=6</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Exploratory factor analysis</title>
		<link>http://grimbeek.com.au/PGstats/?p=5</link>
		<comments>http://grimbeek.com.au/PGstats/?p=5#comments</comments>
		<pubDate>Sat, 22 Sep 2007 15:47:22 +0000</pubDate>
		<dc:creator>Aardvark</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://grimbeek.com.au/PGstats/?p=5</guid>
		<description><![CDATA[Originally published January 5th, 2007 In conversation yesterday, I discussed the essentials of reporting for exploratory factor analysis. These include, of course, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (an index of factorability read on a 0 to 1 scale, much as Cronbach&#8217;s alpha is), Bartlett&#8217;s test of sphericity (preferably significant), the cumulative variance explained [...]]]></description>
			<content:encoded><![CDATA[<p><small>Originally published January 5th, 2007 <!-- by peteg --></small></p>
<p class="entry">In conversation yesterday, I discussed the essentials of reporting for exploratory factor analysis. These include, of course, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (an index of factorability read on a 0 to 1 scale, much as Cronbach&#8217;s alpha is), Bartlett&#8217;s test of sphericity (preferably significant), the cumulative variance explained (meaningful when factors are rotated orthogonally, e.g., Varimax), and the type of factor extraction: principal components (not highly regarded, but robust), principal axis factoring (takes error variance into account), or maximum likelihood (the default for structural equation modeling, and handy if you wish to align exploratory and confirmatory factor analyses).</p>
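<p>The KMO measure, Bartlett's test, and cumulative variance can all be computed from the item correlation matrix. The NumPy sketch below uses the standard formulas: KMO compares squared correlations with squared partial correlations, and Bartlett's statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom. Cumulative variance here is based on principal-component eigenvalues, an assumption for simplicity. Function names and simulated data are hypothetical; a p-value for Bartlett's test could be obtained with scipy.stats.chi2.sf(chi2, df).</p>

```python
import numpy as np


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (0 to 1)."""
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)


def bartlett_sphericity(corr, n):
    """Bartlett's chi-square statistic and df for H0: corr is an identity matrix."""
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    return chi2, p * (p - 1) // 2


def cumulative_variance(corr):
    """Cumulative percentage of total variance explained by successive components."""
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return 100 * np.cumsum(eigvals) / corr.shape[0]


# Hypothetical data: 400 respondents, six items driven by two latent traits.
rng = np.random.default_rng(2)
f1, f2 = rng.normal(size=(2, 400))
data = np.column_stack([f1, f1, f1, f2, f2, f2]) + rng.normal(scale=0.6, size=(400, 6))
corr = np.corrcoef(data, rowvar=False)
print("KMO:", round(kmo(corr), 3))
print("Bartlett chi2, df:", bartlett_sphericity(corr, n=len(data)))
print("Cumulative % variance:", np.round(cumulative_variance(corr), 1))
```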
<p>Then there are the number of factors, the simplicity of the factor structure, and the intelligibility of item clusters.</p>
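<p>Judging the simplicity of a structure and the intelligibility of item clusters can be supported by grouping items under the factor on which they load most strongly and flagging items that load on two or more factors or on none. The sketch below assumes a salience cut-off of 0.3, a common rule of thumb rather than anything stated above; the item names and loadings are hypothetical.</p>

```python
def summarise_structure(loadings, item_names, cutoff=0.3):
    """Group items by primary factor and flag cross-loading or non-loading items.

    loadings: one row per item, one column per factor.
    cutoff: salience threshold (0.3 is an assumed rule of thumb).
    """
    clusters = {}
    cross_loading, non_loading = [], []
    for name, row in zip(item_names, loadings):
        salient = [j for j, value in enumerate(row) if abs(value) >= cutoff]
        if not salient:
            non_loading.append(name)
            continue
        if len(salient) >= 2:
            cross_loading.append(name)
        primary = max(salient, key=lambda j: abs(row[j]))
        clusters.setdefault(primary, []).append(name)
    return clusters, cross_loading, non_loading


# Hypothetical rotated loadings for five items on two factors.
item_names = ["q1", "q2", "q3", "q4", "q5"]
loadings = [[0.78, 0.10], [0.71, 0.05], [0.45, 0.41], [0.08, 0.82], [0.12, 0.20]]
clusters, cross, none_salient = summarise_structure(loadings, item_names)
print("clusters:", clusters)
print("cross-loading:", cross, "non-loading:", none_salient)
```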
<p>One could also report the estimates of communalities (preferably 0.5 or better).</p>
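<p>For an orthogonal solution, an item's communality is the sum of its squared loadings across the retained factors. A minimal sketch, reusing hypothetical loadings and the 0.5 guideline from the text, might look like this:</p>

```python
def communalities(loadings):
    """Communality per item: sum of squared loadings (valid for orthogonal solutions)."""
    return [sum(value ** 2 for value in row) for row in loadings]


def low_communality_items(loadings, item_names, cutoff=0.5):
    """Items whose communality falls short of `cutoff` (0.5 as suggested in the text)."""
    return [name for name, h2 in zip(item_names, communalities(loadings))
            if cutoff > h2]


# Hypothetical loadings for five items on two orthogonal factors.
item_names = ["q1", "q2", "q3", "q4", "q5"]
loadings = [[0.78, 0.10], [0.71, 0.05], [0.45, 0.41], [0.08, 0.82], [0.12, 0.20]]
print([round(h2, 3) for h2 in communalities(loadings)])
print(low_communality_items(loadings, item_names))
```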
<p>What doesn&#8217;t always get a mention is the need to screen items before analysis, with the aim of excluding highly skewed items (90% or more of responses falling in one response category). Another thing to consider is the collinearity of items.<br />
Orthogonal rotation methods produce uncorrelated factors, which is convenient because the variance explained by each factor is additive. Oblique (non-orthogonal) rotations such as Oblimin make more sense, however, where responses across factors are likely to be correlated (i.e., respondents do not treat the items as expressing clearly distinct factors).</p>
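<p>To make the orthogonal case concrete, here is a NumPy sketch of varimax rotation using the widely used SVD-based iteration. Because the rotation matrix is orthogonal, each item's communality is unchanged, which is why variance explained remains additive across factors. Oblique rotations such as Oblimin are not shown. The function name, tolerance, and example loadings (a simple structure deliberately spun by 30 degrees) are hypothetical.</p>

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation of an items x factors loading matrix (SVD-based iteration).

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Direction that increases the variance of squared loadings within each factor.
        target = rotated ** 3 - rotated * (np.sum(rotated ** 2, axis=0) / n_items)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to the update
        previous, criterion = criterion, np.sum(s)
        if previous != 0 and tol * previous >= criterion - previous:
            break  # criterion has stopped improving
    return loadings @ rotation, rotation


# Hypothetical unrotated loadings: a simple structure spun by 30 degrees.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.75, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.75]])
theta = np.radians(30)
spin = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = simple @ spin
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```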
]]></content:encoded>
			<wfw:commentRss>http://grimbeek.com.au/PGstats/?feed=rss2&amp;p=5</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Peter Grimbeek&#8217;s blog</title>
		<link>http://grimbeek.com.au/PGstats/?p=4</link>
		<comments>http://grimbeek.com.au/PGstats/?p=4#comments</comments>
		<pubDate>Sat, 22 Sep 2007 15:43:30 +0000</pubDate>
		<dc:creator>Aardvark</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://grimbeek.com.au/PGstats/?p=4</guid>
		<description><![CDATA[I&#8217;m very pleased that I&#8217;ve finally got a blog up and running within my own website. Watch this space for ruminations about statistics, research methods, and the world in general. I initially set up this site so that others might publish blogs or comment on mine but I&#8217;ve now turned off the comments option, and [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m very pleased that I&#8217;ve finally got a blog up and running within my own website.</p>
<p>Watch this space for ruminations about statistics, research methods, and the world in general.</p>
<p>I initially set up this site so that others might publish blogs or comment on mine but I&#8217;ve now turned off the comments option, and I&#8217;ve yet to see anyone take advantage of the blogging option.</p>
<p>Originally published 28 December 2006</p>
]]></content:encoded>
			<wfw:commentRss>http://grimbeek.com.au/PGstats/?feed=rss2&amp;p=4</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
