A number of factors, such as age, mileage and engine size, affect the second hand price of a car, but the factor that I feel affects the price most is the mileage the car has done. I believe this because as a car travels more and more miles, the engine and the other parts of the car have a lot of strain put on them and so begin to wear, which makes the car less and less reliable. This is then reflected in the second hand price of the car.

The price spread of the small sized cars will be the smallest. This is because the engine size range of the small sized cars is quite small and small sized cars do not depreciate fast. The large sized cars will have the largest price spread, as their engine size range is the biggest and because luxury cars, which are quite expensive, tend to have large engines. That is why the price spread of the large sized cars will be the greatest.

Small sized cars depreciate a lot more slowly in price than large sized cars. I believe this because small sized cars have a fairly low price anyway, so their price won’t drop as dramatically as that of other cars. Also, small sized cars are in greater demand than large cars, as large cars are more expensive to maintain and run.

Introduction

In this investigation I will collect information on a sample of cars from a used car database and then present this data in many ways, such as scatter graphs and cumulative frequency graphs. I will do this to find out which is the main factor that affects the second hand price of a car.
Factors that can affect the second hand price of a car are as follows:

Age
A car that is five years old will cost less than the same car at less than a year old, as the five year old car has had more wear and tear and so will not be as reliable as it was before. This will lower its price on sale.

Make
A car of a prestigious make will cost more than a car that is not so prestigious, even if they have similar specifications, as the prestigious car is very much sought after and signifies the owner’s wealth and status.

Model
A newer model will cost more than an older model that has gone out of production, as the newer model is safer and is packed with the latest technology, whereas the older model will have outdated safety features and technology. The older model would not be in as much demand, so it will have to sell for less.

Options
The options fitted to a car help determine its price. A car with low-profile alloy wheels, satellite navigation, an upgraded audio system and so on will sell for a higher price than the same car with standard options, as the car with all the upgraded features is more sought after and more impressive than the basic car.

Mileage
A car with a high mileage will sell for a lower price than a similar car with a low mileage, because a high mileage means more wear on the engine, which will reduce the performance and reliability of the car.

Engine size
The engine size of a second hand car can affect its price, as a small engine car would have less power and do more miles to the gallon than a larger engine car, which has more power but does fewer miles to the gallon.

Colour
The colour of a car could affect its second hand price. For example, a car that is bright pink will appeal to fewer people than a more moderately coloured silver car.

In my investigation I will be looking at how the mileage, age and engine size of a second hand car affect its price.
I have chosen these factors to investigate as I think they are the most relevant to the price of a car and the most important things to consider when purchasing a second hand car.

Obtaining a sample of cars

From the database of used cars I will collect a sample of 50 cars. I will only use 50 cars in this investigation as I feel that this is enough to get a good outcome from. If the sample I take is too big it will be too hard to work with, and if the sample is too small then I won’t get a reliable outcome.

First I will split the database into groups of small, medium and large sized cars. I will do this so that I can get a sample of cars that covers the range of car sizes and is not concentrated in one particular size of car. In order to get a reliable sample I will take a stratified sample of 50 cars. This means that the ratio of the different sized cars in the sample of 50 will have to be the same as in the database.

To ensure that the sample I take is random and fair, I will first give every car in the database a unique number to identify it by. Then I will use the random button on a calculator to pick out the cars I will use in my sample.

What is meant by small, medium, and large size car?

The cars in the used car database are split into groups by the size of their engines. Small cars have an engine size of 1500cc or lower. Medium cars have an engine size between 1500cc and 1800cc. Large cars have engines greater than 1800cc.

Presenting and analysing the sample of cars

From the sample of cars I hope to test the hypotheses that have been stated previously and see if they are correct. To do this I have to interpret and analyse the sample of cars in many different ways. I will put the cars’ ages against the price, mileage against the price and engine size against the price.

Using the data collected from the sample of cars I will plot scatter diagrams for age against price, mileage against price and engine size against price.
I will then draw a line of best fit on each scatter diagram in order to be able to estimate the prices of other cars, as the line of best fit will show the trend in the second hand price of the cars against each of the factors. The lines of best fit on the scatter graphs will also show which size of car depreciates the fastest.

From the scatter diagrams I hope to see that mileage and age have a negative correlation with the second hand price of the cars. As these factors increase, the second hand price of the cars should decrease. This should be apparent in the scatter diagrams of these two factors against the second hand price. In contrast, I believe the scatter diagram of engine size against the second hand price should show a positive correlation. In other words, as the engine size of a car increases so should the price.

After using the scatter diagrams, I will use Spearman’s rank correlation coefficient to present the data. I will be using Spearman’s because it shows how strong the correlations are by measuring them on a scale from -1 to +1, with -1 being perfect (100%) negative correlation, +1 being perfect (100%) positive correlation and 0 being no correlation. Using Spearman’s rank correlation coefficient will mean it will be easy to see which factor has the best agreement with the price of a second hand car.

I will use cumulative frequency graphs and box plots to analyse the prices of the different car size groups in the sample. I will use them to see how big the inter-quartile range of each group is and also where the median fits in the cumulative frequency graphs.
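As a rough illustration of the cumulative frequency method described above, the Python sketch below counts prices into £1000 bands of the form lower < p ≤ upper and keeps a running total. The 14 prices are those of the small cars in the sample used in this investigation. The function name, its arguments and the use of Python's statistics.quantiles for the quartiles are assumptions made for this sketch; quartiles read off a hand-drawn cumulative frequency curve may differ slightly from the ones calculated here.

```python
import statistics


def cumulative_frequency(prices, band_width, top):
    """Return (lower, upper, frequency, cumulative frequency) for each band."""
    rows = []
    running_total = 0
    lower = 0
    while lower < top:
        upper = lower + band_width
        # A price belongs to the band when lower < p <= upper.
        frequency = sum(1 for p in prices if lower < p <= upper)
        running_total += frequency
        rows.append((lower, upper, frequency, running_total))
        lower = upper
    return rows


# Prices (in pounds) of the 14 small cars in the sample.
small_prices = [2995, 6399, 2400, 4999, 7399, 4499, 3495,
                8799, 7999, 7199, 9299, 6199, 2000, 3495]

table = cumulative_frequency(small_prices, band_width=1000, top=10000)
for lower, upper, frequency, running_total in table:
    print(f"{lower} < p <= {upper}: {frequency} {running_total}")

# Quartiles taken straight from the raw prices (an assumption: the report
# reads them off a graph instead, so the values may not match exactly).
lower_q, median, upper_q = statistics.quantiles(small_prices, n=4)
print("median:", median, "inter-quartile range:", upper_q - lower_q)
```

A smaller band_width gives more points on the cumulative frequency curve, which makes the median and quartiles easier to read off.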
From looking at how big the inter-quartile range and box plots are for each size of car I will be able to tell how big the spread in price is.

Hopefully, by handling the data with the methods mentioned previously, I will be able to reach a conclusion and find out the main factor that affects the price of a second hand car. I will also be able to find out which car size has the biggest price spread and depreciation.

Sample of cars

In order to carry out this investigation, I will need to collect a sample of 50 cars. I have already explained the reasons for collecting a sample of 50 cars, so now I am going to actually collect the sample.

First, as I want to collect a stratified sample of 50 cars, the numbers of small, medium and large cars have to be proportionally representative of the whole population of cars in the used car database from which I am collecting my sample.

There are a total of 251 cars in the database:

69 small cars
90 medium cars
92 large cars
Total number of cars = 251

So the ratio is 69:89:92.

Now to find how many cars of each size should be in the sample of 50 cars:

Small cars: (50 / 250) x 69 = 13.8
Medium cars: (50 / 250) x 89 = 17.8
Large cars: (50 / 250) x 92 = 18.4

In the sample of 50 cars there need to be:

14 small cars
18 medium cars
18 large cars

Now that I know the proportions of the different sized cars that are meant to be in the sample, I can go on to select the cars from the database. I am going to do this by giving every car in the database a number. Then I am going to use the random button to pick out the right number of cars from each category that I need in my sample.
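The stratified sampling steps above can be sketched in Python as follows. This is only a minimal illustration: the group sizes are the 69:89:92 ratio used in the calculation above, the function names are made up for this sketch, and Python's random module stands in for the random button on a calculator. Simple rounding happens to add up to 50 here, but it is not guaranteed to do so for every set of group sizes.

```python
import random


def stratified_allocation(group_sizes, sample_size):
    """Work out each group's share of the sample in proportion to its size."""
    total = sum(group_sizes.values())
    shares = {g: sample_size * n / total for g, n in group_sizes.items()}
    rounded = {g: round(share) for g, share in shares.items()}
    return shares, rounded


def pick_cars(car_numbers, how_many, rng):
    """Pick car numbers at random, never choosing the same car twice."""
    return sorted(rng.sample(car_numbers, how_many))


# Group sizes taken from the ratio 69:89:92 used in the calculation.
group_sizes = {"small": 69, "medium": 89, "large": 92}
shares, rounded = stratified_allocation(group_sizes, 50)
print(shares)
print(rounded)

# Hypothetical numbering: the small cars are numbered 1 to 69.
rng = random.Random(1)  # fixed seed only so the sketch is repeatable
small_numbers = list(range(1, group_sizes["small"] + 1))
chosen_small = pick_cars(small_numbers, rounded["small"], rng)
print(chosen_small)
```

The same pick_cars call would be repeated for the medium and large groups with their own car numbers.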
Then after doing this I will put the cars into a table.

Below is the sample of 50 cars that I selected.

Sample identity no.  Engine type  Engine size (cc)  Mileage  Age (years)  Price (£)
1    SMALL   1100  40000  5   2995
2    SMALL   1100  27000  2   6399
3    SMALL   1200  46000  5   2400
4    SMALL   1200  17395  1   4999
5    SMALL   1200  2760   1   7399
6    SMALL   1300  19880  2   4499
7    SMALL   1300  51000  5   3495
8    SMALL   1400  40     0   8799
9    SMALL   1400  3548   0   7999
10   SMALL   1400  12470  2   7199
11   SMALL   1400  9540   1   9299
12   SMALL   1400  19690  3   6199
13   SMALL   1500  80000  11  2000
14   SMALL   1595  98000  7   3495
15   MEDIUM  1600  19000  1   5799
16   MEDIUM  1600  41760  4   4699
17   MEDIUM  1600  6930   1   10199
18   MEDIUM  1600  17900  2   6099
19   MEDIUM  1600  24900  3   6299
20   MEDIUM  1600  10     0   11499
21   MEDIUM  1700  49260  5   6999
22   MEDIUM  1700  22070  2   5999
23   MEDIUM  1799  36534  5   6999
24   MEDIUM  1800  18620  2   9199
25   MEDIUM  1800  17200  3   9999
26   MEDIUM  1800  49589  4   5499
27   MEDIUM  1800  4940   1   17399
28   MEDIUM  1800  2920   0   14899
29   MEDIUM  1800  67780  3   8699
30   MEDIUM  1800  80000  7   5000
31   MEDIUM  1800  4930   1   13299
32   MEDIUM  1800  19770  1   8599
33   LARGE   1900  17320  2   10199
34   LARGE   1900  18190  1   15599
35   LARGE   1900  65000  4   8295
36   LARGE   1900  14160  0   14699
37   LARGE   2000  23294  2   8999
38   LARGE   2000  13130  1   10699
39   LARGE   2000  25520  4   5699
40   LARGE   2000  27420  1   18199
41   LARGE   2000  12879  1   11299
42   LARGE   2000  14345  2   14999
43   LARGE   2000  23490  3   6999
44   LARGE   2200  23050  2   19699
45   LARGE   2200  11120  2   12999
46   LARGE   2300  9279   1   12999
47   LARGE   2500  37000  2   13950
48   LARGE   2500  300    0   26999
49   LARGE   2800  53440  4   8999
50   LARGE   3900  25000  4   18000

Scatter graphs to show depreciation

Below are three scatter graphs that I have plotted to show which size of car depreciates the fastest and which size of car holds its value longest. On these graphs I also plotted lines of best fit, as through finding the equation of each line I can then work out the depreciation.

Scatter graphs

From the first three scatter graphs I was able to analyse the data from the sample of cars. All three factors, engine size, mileage and age, were put against the second hand price. I did this in order to find out how each factor affects the second hand price of a car.
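To show how the equation of a line of best fit gives the depreciation, here is a small Python sketch. It is an assumption on my part that a least-squares line is an acceptable stand-in for a line drawn by eye: both pass through the mean point, but the gradients will not be identical. The ages and prices are those of the 14 small cars in the sample, and the function names are made up for this sketch. The gradient is the change in price for each extra year of age, so a negative gradient is the yearly depreciation.

```python
def mean_point(xs, ys):
    """The mean point (mean of x, mean of y) that the line passes through."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


def line_of_best_fit(xs, ys):
    """Least-squares gradient and intercept of the line y = gradient * x + intercept."""
    mean_x, mean_y = mean_point(xs, ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # Choosing the intercept this way forces the line through the mean point.
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Age (years) and price (pounds) of the 14 small cars in the sample.
small_ages = [5, 2, 5, 1, 1, 2, 5, 0, 0, 2, 1, 3, 11, 7]
small_prices = [2995, 6399, 2400, 4999, 7399, 4499, 3495,
                8799, 7999, 7199, 9299, 6199, 2000, 3495]

gradient, intercept = line_of_best_fit(small_ages, small_prices)
print("change in price per year of age:", round(gradient))
print("estimated price of a 4 year old small car:", round(gradient * 4 + intercept))
```

Repeating this for the medium and large cars and comparing the three gradients would show which size of car loses value fastest.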
Below are the findings from the scatter graphs.

What do the scatter graphs illustrate?

Most basically, the scatter graphs show that as the engine size of cars increases so too does the second hand price. They also show that as the mileage and age of cars increase, the second hand price falls.

Scatter graph 1

Scatter graph 1 shows that engine size has a positive correlation with the second hand price of a car. This means that as the engine size increases so too does the price. The points on the scatter graph show quite clearly that this is true. As the scatter graph also shows that some cars with the same engine size have different prices, there must be other factors that affect the price too. Even so, the scatter graph clearly demonstrates that engine size is a major factor.

The mean point is in the middle of where most of the points are concentrated and lies alongside the line of best fit. This suggests that the prices of cars with different engine sizes are not that widely spread, which is also suggested by the fact that most of the points are concentrated quite near each other.

I can use the line of best fit on the scatter graph to estimate the value of other cars with different engine sizes. Estimating the value of a car whose engine size falls where most of the sample is concentrated would give a reliable estimate, because there is already data on the scatter graph to support this value. If the engine size fell where there aren’t many cars on the graph, the estimate wouldn’t be as accurate, as there isn’t enough data on similar cars.
For example, estimating the value of a car with an engine size of 1500cc, which the line of best fit shows could be around £8,500, would be accurate.

Scatter graph 2

Scatter graph 2 shows there is a negative correlation between the second hand price and the mileage of a car. The points on the scatter graph show that there is agreement between the two, although the second hand prices are more widely spread, as the points are fairly loose. The scatter graph also suggests that where the mileage is fairly small the fall in price is quite steep, but as the mileage gets higher the points become even more spread and varied, showing that the fall in price slows. The mean point is right next to the line of best fit, where most of the points are also situated. This means that the line of best fit is precise and reliable.

As I mentioned in the analysis of scatter graph 1, the line of best fit can also be used to estimate the value of other cars.

Scatter graph 3

Scatter graph 3 shows that there is a negative correlation between the second hand price and the age of a car, which means that as the car ages its value begins to fall. The line of best fit shows a similar trend in price to that in scatter graph 2. The scatter graph shows there are some cars of the same age that have different second hand values, which would mean that there are other factors acting upon the value of the cars. It can be seen that the newer a car is the higher its price is, but there are also some outliers on the scatter graph which still hold their value with age. This would suggest that these cars are more prestigious, upmarket cars, such as a Bentley or a limited edition car, whose value becomes higher as they age due to their rarity.
The mean point that I also plotted on the scatter graph shows the line of best fit is accurate, as the mean point is next to it. The line of best fit shows that the value of a car will go into the minuses at the age of 10, but this can’t be true because there is a point on the scatter graph for a car which is 11 years old and has a value of around £2000. So instead of a line of best fit there should be a curve of best fit, which would tail off towards the end.

Depreciation

I have also plotted three other scatter graphs in order to look at the depreciation of the different sizes of cars. By constructing a line of best fit on each scatter graph I am able to calculate the depreciation. To use the line of best fit to calculate the depreciation you need to find the equation of the line.

So first of all the gradients of the lines are needed:

\[ \text{Gradient} = \frac{\text{vertical height}}{\text{distance across}} \]

Spearman’s Rank correlation coefficient

After using scatter graphs to analyse the correlation between each factor and price, I want to use Spearman’s Rank correlation coefficient to get a more accurate reading of the correlations.

Using Spearman’s Rank correlation coefficient

After ranking the factors I can put each factor against the second hand price by using the Spearman’s Rank correlation coefficient formula, which is:

\[ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \]

r_s is the measure of the agreement between the two fields
d is the difference between corresponding ranks
n is the number of data pairs

In order to carry out Spearman’s Rank correlation I will need to rank the data to find the difference between the factors. After ranking the data I will compare engine size against price, mileage against price and age against price.
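The ranking and the formula above can be sketched in Python as follows. The function names are made up for this sketch. Tied values share the average of the ranks they cover, which is how the ranks in this investigation are assigned, and the simple formula is only an approximation when there are many ties. The five ages and prices are the first five cars in the sample, and the three totals of d² (6590, 32167 and 32777) are the totals from the rank tables in this report.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # average of the ranks in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_from_sum_d2(sum_d2, n):
    """Apply r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


def spearman(xs, ys):
    """Rank both lists, sum the squared rank differences, apply the formula."""
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return spearman_from_sum_d2(sum_d2, len(xs))


# Ages and prices of the first five cars in the sample.
ages = [5, 2, 5, 1, 1]
prices = [2995, 6399, 2400, 4999, 7399]
print("age against price, five cars:", spearman(ages, prices))

# Totals of d^2 for the full sample of 50 cars.
for factor, total in [("engine size", 6590), ("mileage", 32167), ("age", 32777)]:
    print(factor, "against price:", round(spearman_from_sum_d2(total, 50), 4))
```

Working from the totals of d² in this way is a quick check that the arithmetic in the formula has been carried out correctly.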
This will show how strong the correlation between the two factors is.

The table below shows the corresponding ranks of each car.

Engine size rank  Mileage rank  Age rank  Price rank
1.5    39     45     3
1.5    35     25.5   17
4      41     45     2
4      20     13     8
4      4      13     22
6.5    44     45     4
6.5    27     25.5   6
9.8    25     34     15
9.8    13     25.5   21
9.8    6      3.5    23
9.8    2      3.5    27
9.8    11     13     31
13     48.5   50     1
14     50     48.5   5
17.5   40     39.5   7
17.5   24     13     12
17.5   21     25.5   14
17.5   32     34     16
17.5   9      13     33.5
17.5   1      3.5    37
21.5   28     25.5   13
21.5   42     45     19
23     37     45     19
28     48.5   48.5   9
28     43     39.5   10
28     26     13     25
28     47     34     26
28     23     25.5   30
28     18     34     32
28     7      13     40
28     5      3.5    43
28     8      13     46
34.5   46     39.5   24
34.5   19     25.5   33.5
34.5   16     3.5    42
34.5   22     13     45
40     34     39.5   11
40     31     34     19
40     30     25.5   28.5
40     15     13     35
40     14     13     36
40     17     25.5   44
40     36     13     48
44.5   12     25.5   38.5
44.5   29     25.5   49
46     10     13     38.5
47.5   38     25.5   41
47.5   3      3.5    50
49     45     39.5   28.5
50     33     39.5   47

Engine size against Price

Engine size rank  Price rank  d  d²
1.5    3      -1.5    2.25
1.5    17     -15.5   240.25
4      2      2       4
4      8      -4      16
4      22     -18     324
6.5    4      2.5     6.25
6.5    6      0.5     0.25
9.8    15     -5.2    27.04
9.8    21     -11.2   125.44
9.8    23     -13.2   174.24
9.8    27     -17.2   295.84
9.8    31     -21.2   449.44
13     1      12      144
14     5      9       81
17.5   7      10.5    110.25
17.5   12     5.5     30.25
17.5   14     3.5     12.25
17.5   16     1.5     2.25
17.5   33.5   -16     256
17.5   37     -19.5   380.25
21.5   13     8.5     72.25
21.5   19     2.5     6.25
23     19     4       16
28     9      19      361
28     10     18      324
28     25     3       9
28     26     2       4
28     30     -2      4
28     32     -4      16
28     40     -12     144
28     43     -15     225
28     46     -18     324
34.5   24     10.5    110.25
34.5   33.5   1       1
34.5   42     -7.5    56.25
34.5   45     -10.5   110.25
40     11     29      841
40     19     21      441
40     28.5   11.5    132.25
40     35     5       25
40     36     4       16
40     44     -4      16
40     48     -8      64
44.5   38.5   6       36
44.5   49     -4.5    20.25
46     38.5   7.5     56.25
47.5   41     6.5     42.25
47.5   50     -2.5    6.25
49     28.5   20.5    420.25
50     47     3       9
Total of d² = 6590

Mileage against Price

Mileage rank  Price rank  d  d²
39     3      36      1296
35     17     18      324
41     2      39      1521
20     8      12      144
4      22     -18     324
44     4      40      1600
27     6      21      441
25     15     10      100
13     21     -8      64
6      23     -17     289
2      27     -25     625
11     31     -20     400
48.5   1      47.5    2256.25
50     5      45      2025
40     7      33      1089
24     12     12      144
21     14     7       49
32     16     16      256
9      33.5   -24.5   600.25
1      37     -36     1296
28     13     15      225
42     19     23      529
37     19     18      324
48.5   9      39.5    1560.25
43     10     33      1089
26     25     1       1
47     26     21      441
23     30     -7      49
18     32     -14     196
7      40     -33     1089
5      43     -38     1444
8      46     -38     1444
46     24     22      484
19     33.5   -14.5   210.25
16     42     -26     676
22     45     -23     529
34     11     23      529
31     19     12      144
30     28.5   1.5     2.25
15     35     -20     400
14     36     -22     484
17     44     -27     729
36     48     -12     144
12     38.5   -26.5   702.25
29     49     -20     400
10     38.5   -28.5   812.25
38     41     -3      9
3      50     -47     2209
45     28.5   16.5    272.25
33     47     -14     196
Total of d² = 32167

Age against Price

Age rank  Price rank  d  d²
45     3      42      1764
25.5   17     8.5     72.25
45     2      43      1849
13     8      5       25
13     22     -9      81
45     4      41      1681
25.5   6      19.5    380.25
34     15     19      361
25.5   21     4.5     20.25
3.5    23     -19.5   380.25
3.5    27     -23.5   552.25
13     31     -18     324
50     1      49      2401
48.5   5      43.5    1892.25
39.5   7      32.5    1056.25
13     12     1       1
25.5   14     11.5    132.25
34     16     18      324
13     33.5   -20.5   420.25
3.5    37     -33.5   1122.25
25.5   13     12.5    156.25
45     19     26      676
45     19     26      676
48.5   9      39.5    1560.25
39.5   10     29.5    870.25
13     25     -12     144
34     26     8       64
25.5   30     -4.5    20.25
34     32     2       4
13     40     -27     729
3.5    43     -39.5   1560.25
13     46     -33     1089
39.5   24     15.5    240.25
25.5   33.5   -8      64
3.5    42     -38.5   1482.25
13     45     -32     1024
39.5   11     28.5    812.25
34     19     15      225
25.5   28.5   -3      9
13     35     -22     484
13     36     -23     529
25.5   44     -18.5   342.25
13     48     -35     1225
25.5   38.5   -13     169
25.5   49     -23.5   552.25
13     38.5   -25.5   650.25
25.5   41     -15.5   240.25
3.5    50     -46.5   2162.25
39.5   28.5   11      121
39.5   47     -7.5    56.25
Total of d² = 32777

What does Spearman’s Rank Correlation show?

Spearman’s shows that the findings from each scatter graph are correct, as they match what Spearman’s shows.

Here I placed my findings from using Spearman’s on a scale to show the correlation.

Engine size against Price

The correlation between engine size and price was 0.6835534214, which was the strongest correlation against price out of all the factors. This shows that there is definitely a link between engine size and price, which means that as the engine size increases so too does the price.

Mileage against Price

Spearman’s showed that the correlation between mileage and price was -0.5446338535. This is a fairly strong negative correlation, which means that the two are related, as suggested before by the scatter graphs. It shows that as mileage increases the price of a car decreases.

Age against Price

Age against price had a correlation of -0.57392557, which is also moderately strong. As the number is negative, it shows that there is a negative correlation between the two factors, meaning that as the age of a car increases its value decreases.

Looking at the scatter graphs showed how the factors affected the price of a second hand car and also showed whether there was a negative or positive correlation.
The scatter graphs showed that engine size had a positive correlation with price and that mileage and age had a negative correlation with price. After this I wanted to find out how strong the correlation between each factor and price was, in order to see which factor had the biggest effect on price, so I used Spearman’s. This gave the same overview of the findings as the scatter graphs and also showed which factor had the strongest correlation. From Spearman’s I gathered that engine size is the main factor that affects the second hand price of a car. After engine size, age was the next biggest factor, then mileage.

Cumulative frequency

Now that I have found what affects the value of a second hand car the most and looked at depreciation, I can go on to look at which size of car has the biggest price spread and see how the prices range within the different sizes. To do this I construct cumulative frequency tables, graphs and box plots.

Cumulative frequency tables

Small cars

Range (£)  Frequency  Cumulative frequency
0 < p ≤ 1000        0   0
1000 < p ≤ 2000     1   1
2000 < p ≤ 3000     2   3
3000 < p ≤ 4000     2   5
4000 < p ≤ 5000     2   7
5000 < p ≤ 6000     0   7
6000 < p ≤ 7000     2   9
7000 < p ≤ 8000     3   12
8000 < p ≤ 9000     1   13
9000 < p ≤ 10000    1   14

Medium cars

Range (£)  Frequency  Cumulative frequency
0 < p ≤ 1000        0   0
1000 < p ≤ 2000     0   0
2000 < p ≤ 3000     0   0
3000 < p ≤ 4000     0   0
4000 < p ≤ 5000     2   2
5000 < p ≤ 6000     3   5
6000 < p ≤ 7000     4   9
7000 < p ≤ 8000     0   9
8000 < p ≤ 9000     2   11
9000 < p ≤ 10000    2   13
10000 < p ≤ 11000   2   15
11000 < p ≤ 12000   0   15
12000 < p ≤ 13000   0   15
13000 < p ≤ 14000   1   16
14000 < p ≤ 15000   1   17
15000 < p ≤ 16000   0   17
16000 < p ≤ 17000   0   17
17000 < p ≤ 18000   1   18

Large cars

Range (£)  Frequency  Cumulative frequency
0 < p ≤ 1000        0   0
1000 < p ≤ 2000     0   0
2000 < p ≤ 3000     0   0
3000 < p ≤ 4000     0   0
4000 < p ≤ 5000     0   0
5000 < p ≤ 6000     1   1
6000 < p ≤ 7000     1   2
7000 < p ≤ 8000     0   2
8000 < p ≤ 9000     3   5
9000 < p ≤ 10000    0   5
10000 < p ≤ 11000   2   7
11000 < p ≤ 12000   1   8
12000 < p ≤ 13000   2   10
13000 < p ≤ 14000   1   11
14000 < p ≤ 15000   2   13
15000 < p ≤ 16000   1   14
16000 < p ≤ 17000   0   14
17000 < p ≤ 18000   1   15
18000 < p ≤ 19000   1   16
19000 < p ≤ 20000   1   17
20000 < p ≤ 21000   0   17
21000 < p ≤ 22000   0   17
22000 < p ≤ 23000   0   17
23000 < p ≤ 24000   0   17
24000 < p ≤ 25000   0   17
25000 < p ≤ 26000   0   17
26000 < p ≤ 27000   1   18

After constructing the frequency tables I can move on to plotting graphs and box plots to analyse the data.

What I found out from using cumulative frequency

The results from the cumulative frequency graphs, combined with the box plots, show the spread of price between the car sizes.

The box plot for the small engine sized cars shows that the range of prices is very compact and relatively small compared to the other sizes of cars. From looking at the median and the inter-quartile range it can be seen that most of the prices are situated in a similar region and do not stray too far from each other.

The box plot for the medium engine sized cars tells quite a different story from the small cars. Here the inter-quartile range is far away from the highest priced car in the group, which means that a few of the cars were priced much higher than the others. Also, as the median is situated nearer to the lower quartile than the upper quartile, it shows that most of the cars must have been priced in the lower band of prices.

The box plot for the large engine sized cars clearly shows that these cars have the most diverse range of prices, which could mean that they are affected by many different factors. The median shows that most of the car prices are concentrated towards the upper band of prices.

Conclusion

After looking at the sample of data in many different ways I can now come to a conclusion about my findings and whether the hypotheses at the start were correct or not. My hypothesis that mileage was the main factor affecting the price of a second hand car was wrong, as the results from this investigation have shown. I found that it was engine size that had the largest effect on the price of a car, as both Spearman’s and the scatter graphs showed that it had the strongest relationship with the price.
Mileage and age came out to have very similar effects on price, but age was found to be just that bit more important to price than mileage. Perhaps if I looked at another sample of cars I would find that mileage has a bigger effect on price than age, as I still believe this to be so.

I have found that my second hypothesis, in which I stated that small cars have the smallest spread in price, is correct. By looking at the cumulative frequency graphs and box plots for the different sizes of cars I can see that small cars do have a small spread, where most of the car prices are concentrated. Large cars, on the other hand, have very varied prices ranging from quite low to high. This is apparent in their box plot, which is very big compared to the other two.

In relation to my third hypothesis, I can now say that this is not true either, as I have found that in the sample of cars that I took, large cars seem to hold their value strongest and do not depreciate in value as fast as medium or small cars.

Overall I feel that this investigation has been a great success, as I have managed to find the answers I was looking for at the start. I didn’t have much trouble with this investigation, except that it was fairly confusing to get started as there was a lot of data handling to be done.

Improvements

Improvements that could be made to this investigation are as follows:

Using a sample of cars that has even numbers of each size of car.
When looking at the cumulative frequency, using smaller intervals to plot the graph, as this will help in analysing the data.
Looking at cars with greater mileages and investigating more factors that could possibly affect the second hand price of a car.