<p><a href="http://carsonfarmer.com/">Carson Farmer</a></p>
<h2>Boulder Teen Science Café: Leave No (Digital) Trace</h2>
<p><em>Posted 2015-09-24 by cfarmer</em></p>
<p><a href="http://sciencediscovery.colorado.edu/program/teen-cafe/"><img alt="Teen Science Cafe Logo" class="left" src="http://carsonfarmer.com/images/teen-science-cafe.jpg" /></a></p>
<p>I am very excited to announce that I will be giving a talk at the <a href="http://sciencediscovery.colorado.edu/program/teen-cafe/">Boulder Teen
Science Cafe</a> next
Tuesday, September 29th, from 5:30-7pm at the <a href="https://cumuseum.colorado.edu"><span class="caps">CU</span> Museum of
Natural History</a>’s <a href="https://cumuseum.colorado.edu/exhibits/biolounge">BioLounge</a>.
My talk is entitled “Leave No (Digital) Trace: The Geography of Social Media”,
and here is the ‘snippet’ from the <a href="http://sciencediscovery.colorado.edu/teen-cafe-schedule/">website announcing my talk</a>:</p>
<blockquote>
<p>Our first Teen Café of the 2015-16 academic year features self-described “Geo Nerd” Dr. Carson Farmer, assistant professor in the <span class="caps">CU</span> Geography Department. When the average person thinks about geography, they might imagine a weathered, old map or a globe. When Dr. Farmer thinks about geography, he sees the streams of data about our locations, movements, and actions that we leave in a digital trail behind us every time we post to Facebook, send a text message, or upload a photo to Instagram. It turns out that social media is a geographer’s goldmine. Come learn about how social media can be used for good and evil, and about the fascinating ways in which scientists take advantage of the “big data” our many electronic devices now generate to study everything from the carbon footprint of your morning commute to snow and water processes around the globe.</p>
</blockquote>
<p>I should probably mention that the Teen Cafe is for <em>teenagers</em>. To quote the organizers:</p>
<blockquote>
<p>In order to maintain an atmosphere where teens can be themselves and feel comfortable
engaging with our speaker and their peers, we encourage parents to drop teens
off when possible and to leave younger siblings at home.</p>
</blockquote>
<p>In any case, it should be a lot of fun, and it sounds like an awesome program for teens in and around the Boulder and Denver area… go science!</p>
<h2>Citizen-based Environmental Monitoring in NYC</h2>
<p><em>Posted 2015-01-16 by cfarmer</em></p>
<p>Last fall, <a href="http://carsten.io/">Carsten Kessler</a> and I gave an online talk about our <a href="https://envirocar.org">EnviroCar project</a>.
The talk was entitled “EnviroCar - A <span class="caps">GIS</span> Framework for Citizen-based Environmental Monitoring in <span class="caps">NYC</span>”
and was hosted by the <a href="http://www.nysgis.net">New York State <span class="caps">GIS</span> Association</a>. It was the first talk of
the Transportation Professional Affiliation Group, which was quite an honor for Carsten and me. Check out
the <span class="caps">NYSGIS</span> <a href="https://www.youtube.com/user/NYSGISA">YouTube channel</a> for more awesome talks, and check out
the following embedded video to see the EnviroCar recording:</p>
<p><center>
<iframe width="640" height="360" src="//www.youtube.com/embed/_rStGIgqTXg" frameborder="0" allowfullscreen></iframe>
</center></p>
<h2>Interdisciplinary Workshop on Geospatial Computing</h2>
<p><em>Posted 2014-10-30 by cfarmer</em></p>
<p><a href="http://thespatiallab.org/?page_id=6576"><img alt="Geospatial Computing Workshop" class="left" src="http://carsonfarmer.com/images/geocomputing.png" /></a> </p>
<p>Hey all, I’ll be running a half-day workshop on Geospatial Computing with Python at the upcoming <strong>Interdisciplinary Workshop on Geospatial Computing (<span class="caps">IWGC</span>-2014)</strong> on <strong>November 20-21, 2014</strong> at <strong>TheMuseum, Kitchener, Ontario</strong>. It’s going to be a really great event, with lots of experts in the field of geospatial computing, and it is particularly relevant to students with an interest in the field.</p>
<p>Organizers: Colin Robertson (Laurier), Graham Taylor (Guelph), Rob Feick (Waterloo)</p>
<p>We invite participants to an Interdisciplinary Workshop on Geospatial Computing. The workshop is aimed broadly at bringing together disparate communities working with geospatial tools and data. We invite participants from academia (faculty, postdocs, grad students), industry (geospatial developers, data scientists, <span class="caps">GIS</span> analysts), and government to connect and share knowledge.</p>
<ul>
<li>Day 1 of the workshop will include invited talks from researchers from a variety of disciplines including engineering, statistics, ecology, geography, and information science. The day will conclude with a panel discussion and networking event.</li>
<li>Day 2 will include hands-on interactive sessions, so participants are encouraged to bring laptops and follow along. The day will conclude with a poster session and industry networking event, which will include product demos and research posters.</li>
</ul>
<p>Our list of speakers includes:</p>
<ul>
<li>Krista Amolins (Esri Canada)</li>
<li>Jennifer Baltzer (Laurier)</li>
<li>Andrew Davidson (Agriculture and Agri-Food Canada)</li>
<li>Carson Farmer (<span class="caps">CUNY</span>)</li>
<li>Bahram Gharabaghi (Guelph)</li>
<li>Steve Grise (vertex3)</li>
<li>Dan Gillis (Guelph)</li>
<li>Nicole Rabe (<span class="caps">OMAFRA</span>)</li>
<li>Tarmo Remmel (York)</li>
<li>Martin Sykora (Loughborough)</li>
<li>Matthew Tenney (McGill)</li>
</ul>
<p>Participation: <strong>Open to all</strong>.</p>
<p>Participants should <a href="http://www.thespatiallab.org/geoworkshop">register</a> for the specific day or days they are interested in attending. We also encourage participants (especially grad students) to submit a poster abstract and present a poster during the session on the afternoon of Nov 21. Poster abstracts can be submitted via the registration page or can be <a href="mailto:haydnlawrence@gmail.com">emailed directly</a>.</p>
<p>The deadline for poster abstract submissions is <strong>November 8th</strong>.</p>
<p>Cost: <strong>Totally Free!</strong></p>
<p>Space is limited for this event. We hope to see you in November!</p>
<h2>What Do We Do With All This Big Data?</h2>
<p><em>Posted 2014-10-28 by cfarmer</em></p>
<p>This is an interesting and thought-provoking <span class="caps">TED</span> Talk by <a href="http://susanetlinger.wordpress.com">Susan Etlinger</a> about how important it is to encourage ‘critical thinking’ when it comes to Big Data. Here, the idea is not to let the ‘data speak for themselves’ but rather to tell an engaging and intelligent story ‘with data’. Basically, we need to move <em>beyond</em> counting things in order to really understand anything.</p>
<p><CENTER>
<iframe src="https://embed-ssl.ted.com/talks/susan_etlinger_what_do_we_do_with_all_this_big_data.html" width="640" height="360" frameborder="0" scrolling="no" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe></CENTER></p>
<p>I agree with pretty much everything Susan says, and her use of a very personal example really deepens the impact of her talk. I really liked her focus on the need for ‘deepening our critical thinking skills’, and I think this is something we need to ensure we are passing on to our students as well. Putting less focus on what the <em>data</em> say, and more focus on what we can learn <em>with</em> the data, is something we should be teaching more at the undergraduate, graduate, and even high-school levels. I think in general, the statistics literature and curriculum are better at this perspective than their machine learning counterparts (although statistics curricula, my own included, aren’t always at the cutting edge of modern teaching methods either). But this need not be the case, and I hope as we continue to generate, collect, store, and analyze more and more data, we keep in mind the <a href="http://www.wired.com/2013/04/with-big-data-context-is-a-big-issue/">need for context</a>.</p>
<h2>History of Social Media in 90 Seconds</h2>
<p><em>Posted 2014-10-28 by cfarmer</em></p>
<p>I’ve been spending a fair bit of time looking at material on social media data and social media usage, and I came across this video from a <span class="caps">CNBC</span> piece called “11 Predictions on the future of social media” by <a href="http://twitter.com/mcwellons">Mary Catherine Wellons</a>:</p>
<p><CENTER>
<iframe width="640" height="360" src="//www.youtube.com/embed/LgF3xh76Hcg" frameborder="0" allowfullscreen></iframe>
</CENTER></p>
<p>It is very interesting to see how <em>short</em> the history of this relatively pervasive set of technologies is!</p>
<h2>GEOS seminar series at the Graduate Center</h2>
<p><em>Posted 2014-09-30 by cfarmer</em></p>
<p>The Geography, Earth Science and Oceanography Seminars (<span class="caps">GEOS</span>) is hosting Prof. Sean Ahearn and me this Thursday for a talk on “Computational GIScience: Current Research at the <span class="caps">CARSI</span> Lab”.</p>
<p><CENTER>
<img alt="GEOS Seminar Series Announcement" src="http://carsonfarmer.com/images/GEOS-Oct-2-Ahearn-Farmer.jpg" />
</CENTER></p>
<p>The talk will feature a bunch of the fun and exciting stuff we’re doing at <span class="caps">CARSI</span>, including our EnviroCar project and some of my recent work with spatial data flows. The talks will start around 5:30 <span class="caps">PM</span> on Thursday, October 2nd, 2014, in the Science Center, Room 4102 at the <span class="caps">CUNY</span> Graduate Center (365 5th Avenue, <span class="caps">NYC</span>). I’m told that light snacks and refreshments will be served, so if nothing else, it’s a chance to get some free grub!</p>
<p>Here’s a link to the <a href="http://carsonfarmer.com/uploads/GEOS-Oct-2.pdf">original flyer</a>. If you can’t make it but want to hear about what <span class="caps">CARSI</span> is up to these days, <a href="mailto:carson.farmer@hunter.cuny.edu">drop me a line</a>.</p>
<h2>What if Google was a Guy?</h2>
<p><em>Posted 2014-09-29 by cfarmer</em></p>
<p>This is absolutely hilarious!</p>
<p><CENTER>
<iframe width="640" height="360" src="//www.youtube.com/embed/YuOBzWF0Aws" frameborder="0" allowfullscreen></iframe>
</CENTER></p>
<p>Everything you ask Google sounds a lot less intelligent when you actually <strong>ask</strong> Google. Don’t forget to check out Parts 2 and 3!</p>
<h2>Become a Citizen Scientist with EnviroCar</h2>
<p><em>Posted 2014-09-26 by cfarmer</em></p>
<p><CENTER>
<a href="https://www.flickr.com/photos/joiseyshowaa/7454479488" title="World Class Traffic Jam 2 by joiseyshowaa, on Flickr">
<img src="https://farm9.staticflickr.com/8162/7454479488_9cf64433d6_z.jpg" width="640" height="465" alt="World Class Traffic Jam 2">
</a>
</CENTER></p>
<h2>Do you have an Android phone? Do you use a car on a regular basis?</h2>
<h1>Become a Citizen Scientist!</h1>
<p>enviroCar is an open platform to collect and analyze moving vehicle data. It accesses the car’s sensors with an Android smartphone and a Bluetooth <span class="caps">OBD</span>-<span class="caps">II</span> adapter that we will provide for free. The enviroCar app provides you with information about your car and your driving characteristics. By uploading the data to the enviroCar server, you contribute to a growing collection of anonymized open data, helping scientists and traffic experts understand our city. We are looking for volunteers to join the enviroCar community in <span class="caps">NYC</span> and help us learn more about traffic flows and emissions.</p>
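<p>To give a feel for what “accessing the car’s sensors” involves, here is a minimal Python sketch of decoding a few standard OBD-II mode 01 replies (engine RPM, vehicle speed, and mass air flow). It assumes the Bluetooth adapter answers in the ELM327-style hex format (for example <code>41 0C 1A F8</code>); the function names are hypothetical, and this is not the enviroCar app’s actual code.</p>

```python
def parse_response(line):
    """Split an ELM327-style reply such as "41 0C 1A F8" into (pid, data_bytes)."""
    parts = [int(token, 16) for token in line.split()]
    mode, pid, data = parts[0], parts[1], parts[2:]
    if mode != 0x41:  # 0x41 = positive response to a mode 01 ("current data") request
        raise ValueError(f"unexpected response mode: {mode:#04x}")
    return pid, data


def decode(line):
    """Turn a mode 01 reply into a (name, value, unit) tuple for a few common PIDs."""
    pid, data = parse_response(line)
    if pid == 0x0C:  # engine speed: ((A * 256) + B) / 4
        return ("rpm", (data[0] * 256 + data[1]) / 4, "rpm")
    if pid == 0x0D:  # vehicle speed: A
        return ("speed", data[0], "km/h")
    if pid == 0x10:  # mass air flow rate: ((A * 256) + B) / 100
        return ("maf", (data[0] * 256 + data[1]) / 100, "g/s")
    raise ValueError(f"PID {pid:#04x} is not handled in this sketch")


print(decode("41 0C 1A F8"))  # ('rpm', 1726.0, 'rpm')
print(decode("41 0D 3C"))     # ('speed', 60, 'km/h')
```

<p>A real client also has to send the requests, handle multi-line and “NO DATA” replies, and timestamp and geotag each reading; those parts are left out here.</p>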
<p>Sound exciting? Contact <a href="mailto:carson.farmer@hunter.cuny.edu">me</a> or <a href="mailto:carsten.kessler@hunter.cuny.edu">Carsten Kessler</a> to get your <strong>free</strong> <span class="caps">OBD</span>-<span class="caps">II</span> adapter and become a citizen scientist!</p>
<p>Learn more about enviroCar at <strong><a href="http://envirocar.org">http://envirocar.org</a></strong>. View the original <a href="http://carsonfarmer.com/uploads/ad-envirocar-participation.pdf"><span class="caps">PDF</span> poster</a>.</p>
<h2>Stephen Flood Guest Speaker</h2>
<p><em>Posted 2014-05-13 by cfarmer</em></p>
<p>Please join us <strong>Wednesday, May 21st</strong> from <strong>4:00-6:00 pm</strong> in the Hunter College Geography
Conference Room (<strong>Hunter North 1004</strong>) for a talk by guest speaker Stephen Flood, who will be talking about <em>Communicating across disciplines in the climate change sphere</em>.</p>
<p>If you can’t make it in person, there will also be a <a href="https://plus.google.com/u/0/events/ccludsocarc92guqs3p8370lr6g">live feed available here</a> (the feed will also be recorded, so you can view it at a later date as well).</p>
<p><strong>Topic</strong>: This seminar will leverage the work carried out in the Climate Changes Impacts and Implications for New Zealand (<span class="caps">CCII</span>) project to demonstrate some of the key issues and approaches associated with communicating across a diverse range of disciplines and stakeholders. The broad aim of the presentation is to provide the epistemic community within the climate change sphere with grounded examples of effective communication approaches while also pointing out potential pitfalls and how best to avoid them.</p>
<p><strong>Bio</strong>: Dr. Stephen Flood is a Postdoctoral Research Fellow at the Climate Change Research Institute (<span class="caps">CCRI</span>) at Victoria University of Wellington, New Zealand. His research is focused on climate change adaptation, communication and decision-making associated with the interdisciplinary Climate Changes Impacts and Implications (<span class="caps">CCII</span>) project. Dr. Flood is also a lead consultant with SmartEarth consulting, which works with governments and civil society in least developed countries (LDCs) to secure the necessary funding for capacity building and institutional development in the sphere of climate change mitigation and adaptation.</p>
<p>Stephen is also a former colleague of mine from the <a href="http://www.nuim.ie/">National University of Ireland Maynooth</a>, where he worked at the <a href="http://icarus.nuim.ie/">Irish Climate Analysis and Research Units (<span class="caps">ICARUS</span>)</a>.</p>
<p>For any questions, contact <a href="mailto:carson.farmer@hunter.cuny.edu">me via email</a> or <a href="http://twitter.com/carsonfarmer">@carsonfarmer</a>.</p>
<p>Hope to see you there!</p>
<h2>Google Summer of Code 2014</h2>
<p><em>Posted 2014-05-08 by cfarmer</em></p>
<p><a href="http://www.google-melange.com/gsoc/homepage/google/gsoc2014"><img alt="Google Summer of Code 2014 Logo" class="left" src="http://carsonfarmer.com/images/gsoc2014.png" /></a> </p>
<p>The decisions for <a href="http://www.google-melange.com/gsoc/homepage/google/gsoc2014">Google Summer of Code 2014</a> have been made, and I’ll be mentoring <a href="http://www.google-melange.com/gsoc/project/details/google/gsoc2014/rahul110392/5641332169113600">Rahul Raja</a> from India. He will be working on the <a href="https://envirocar.org/">enviroCar</a> app <span class="caps">UX</span> design. Right next door, Carsten Keßler (who has <a href="http://carsten.io/mentoring-google-summer-of-code-project/">already posted an announcement</a>) will be mentoring <a href="https://www.google-melange.com/gsoc/project/details/google/gsoc2014/taolin/5771770325893120">Tao Lin</a> from China, who will be working on the <a href="https://en.wikipedia.org/wiki/Linked_Data">Linked Data</a> service provided by enviroCar.</p>
<p>We are both looking forward to working with the students and with the folks at 52°North, who are also mentoring three more students in <a href="http://blog.52north.org/2014/04/22/welcome-google-summer-of-code-2014-studenta/">other projects</a>. Here is the full list of students and projects:</p>
<ul>
<li>Sensor Data Access for Rasdaman, Simona Badoiu (Romania)</li>
<li>Using the <span class="caps">ILWIS</span> framework for geo-data capture with a mobile application, Bouke Pieter Ottow (Netherlands)</li>
<li>Proposal for Access Control User Interface for <span class="caps">SOS</span> Servers, Dushyant Sabharwal (India)</li>
<li>enviroCar App <span class="caps">UX</span> Design, Rahul Raja (India)</li>
<li>The Improvements of enviroCar Linked Data Service, Tao Lin (China)</li>
</ul>
<h2>Playing with CitiBike Trip Histories</h2>
<p><em>Posted 2014-05-07 by cfarmer</em></p>
<p><a href="http://bl.ocks.org/cfarmer/11478345"><img alt="Chord flow diagram with time slider" class="right" src="http://carsonfarmer.com/images/chord_citibike.png" /></a> </p>
<p>I recently attended the April 2014 <a href="http://www.meetup.com/betanyc/">#BetaNYC</a> (#BikeNYC) <a href="https://twitter.com/CitibikeNYC">@CitiBikeNYC</a> Hacknight, and after seeing several interesting presentations on what people <em>are</em> doing, <em>want</em> to be doing, and are <em>thinking</em> of doing with the recently released <a href="https://citibikenyc.com/system-data">Citi Bike Trip Histories</a>, I was inspired. A few of us got together to ‘quickly’ and ‘easily’ hack together a <a href="http://bost.ocks.org/mike/sankey/">Sankey diagram</a> of the bike trip flows… Turns out this wasn’t nearly as quick and easy as we thought: by the end of the evening, we were still struggling to get <a href="http://d3js.org/">D3js</a>’s <a href="https://github.com/d3/d3-plugins/tree/master/sankey">Sankey</a> plugin to play nicely with our data (which I was manipulating in Python using <a href="http://pandas.pydata.org/">Pandas</a>). I ended up playing around with the data later, and opted to visualize the flows between <a href="http://nycdata.pediacities.com/dataset?tags=neighborhoods"><span class="caps">NYC</span> neighborhoods</a> using a simpler <a href="https://github.com/mbostock/d3/wiki/Chord-Layout">chord diagram</a>.</p>
<p>A chord diagram arranges the nodes (neighborhoods) radially, drawing thick curves between nodes. In my version, the thickness of the link between two neighborhoods encodes the relative frequency of rides between them: thicker links represent more frequent rides. Only flows of more than 1000 trips are shown, to avoid cluttering the diagram with many small flows. Links are directed, and each one is colored by its more frequent origin (i.e., according to where most of the trips originate). The colors themselves are pretty much arbitrary.</p>
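<p>As a rough illustration of the data preparation behind a chord diagram like this one, the Python sketch below counts trips into a square origin/destination matrix (the kind of input D3’s chord layout takes), drops neighborhood pairs with 1000 or fewer trips, and picks the dominant origin used for coloring. It is a hypothetical, standard-library-only sketch rather than the code behind the linked visualization: applying the threshold to the two-way total (rather than per direction) is my assumption, and the neighborhood names and counts in the usage example are made up.</p>

```python
from collections import Counter


def build_chord_matrix(trips, min_trips=1000):
    """Build a square matrix where matrix[i][j] counts trips from names[i] to names[j].

    trips: iterable of (origin_neighborhood, destination_neighborhood) pairs.
    Assumption: a pair is kept only if its two-way total exceeds min_trips.
    """
    counts = Counter(trips)
    names = sorted({name for pair in counts for name in pair})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for (origin, dest), n in counts.items():
        matrix[index[origin]][index[dest]] = n
    # A chord spans both directions, so threshold on the combined count.
    for i in range(len(names)):
        for j in range(i, len(names)):
            total = matrix[i][j] if i == j else matrix[i][j] + matrix[j][i]
            if total <= min_trips:
                matrix[i][j] = matrix[j][i] = 0
    return names, matrix


def dominant_origin(names, matrix, a, b):
    """Return whichever of a and b originates more of the trips between them (ties go to a)."""
    i, j = names.index(a), names.index(b)
    return a if matrix[i][j] >= matrix[j][i] else b


# Toy input (made-up counts): 900 Chelsea->Midtown, 400 Midtown->Chelsea, 300 Chelsea->SoHo.
trips = [("Chelsea", "Midtown")] * 900 + [("Midtown", "Chelsea")] * 400 + [("Chelsea", "SoHo")] * 300
names, matrix = build_chord_matrix(trips)
print(names)   # ['Chelsea', 'Midtown', 'SoHo']
print(matrix)  # [[0, 900, 0], [400, 0, 0], [0, 0, 0]]
print(dominant_origin(names, matrix, "Chelsea", "Midtown"))  # Chelsea
```

<p>With real trip histories, the same counts could just as well come from a Pandas group-by on the origin and destination neighborhood columns.</p>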
<p>The <a href="http://bl.ocks.org/cfarmer/11478345">visualization is here</a>, and you can move the slider around to change which year/month is shown. Try moving the slider and comparing flows over different time periods. Also watch for chord ‘flipping’, where the dominant flow direction changes from month to month. This is particularly common in the smaller flows, where there isn’t a strong dominant direction.</p>
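<p>Chord ‘flipping’ can also be checked for directly. A minimal sketch, assuming you already have monthly trip counts in each direction for one pair of neighborhoods (the tuple layout and function names are hypothetical): it reports the months where the dominant direction reverses relative to the most recent month that had a clear winner, skipping tied months.</p>

```python
def dominant_direction(a_to_b, b_to_a):
    """Return +1 if A->B trips dominate, -1 if B->A trips dominate, 0 on a tie."""
    return (a_to_b > b_to_a) - (a_to_b < b_to_a)


def find_flips(monthly_counts):
    """Return the months in which the dominant direction of a flow reverses.

    monthly_counts: time-ordered list of (month, a_to_b, b_to_a) tuples for one
    pair of neighborhoods. Tied months are skipped, so a flip is measured against
    the most recent month that had a clear dominant direction.
    """
    flips = []
    previous = 0
    for month, a_to_b, b_to_a in monthly_counts:
        current = dominant_direction(a_to_b, b_to_a)
        if current == 0:
            continue
        if previous != 0 and current != previous:
            flips.append(month)
        previous = current
    return flips


# Toy counts (made up) for one neighborhood pair over three months.
print(find_flips([("2013-07", 520, 480), ("2013-08", 470, 530), ("2013-09", 600, 410)]))
# ['2013-08', '2013-09']
```

<p>Small, evenly balanced flows like the toy one above flip easily, which matches the pattern described for the smaller chords.</p>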
<p>The whole thing was built using <a href="http://d3js.org/">D3js</a> and based very heavily on <a href="http://bost.ocks.org/mike/uberdata">this</a>, <a href="http://exposedata.com/tutorial/chord/latest.html">this</a>, and <a href="http://fleetinbeing.net/d3e/chord.html">this</a>. As I mentioned, the initial visualization was started at the April 2014 #BetaNYC Hacknight, and the version <a href="http://bl.ocks.org/cfarmer/11478345">linked here</a> is what I ended up with. Check out the linked visualization for details on the data sources and the actual code/data used to produce it.</p>
<h2>Free and Open Source Masterclass, Aug 2014</h2>
<p><em>Posted 2014-04-30 by cfarmer</em></p>
<p><strong><span class="caps">FREE</span> <span class="caps">AND</span> <span class="caps">OPEN</span> <span class="caps">SOURCE</span> <span class="caps">GIS</span></strong> – <strong>August 4th to 8th, 2014</strong></p>
<p>The <a href="http://www.geo.hunter.cuny.edu/">Department of Geography</a> at <a href="http://www.hunter.cuny.edu/">Hunter College</a> of the <a href="http://www.cuny.edu/">City University of New York</a> and <a href="http://www.hunter.cuny.edu/ce">Hunter Continuing Education</a> are offering a five-day professional course in <strong>Free and Open Source <span class="caps">GIS</span></strong> from <strong>August 4th to 8th, 2014</strong>. This five-day course will span the entire range of <span class="caps">GIS</span> data capture, management, analysis, and visualization of geographic information using Free and Open Source Software (<span class="caps">FOSS</span>). These different elements of the <span class="caps">GIS</span> workflow will be discussed over the first four days and will then be applied in a final project completed on Friday. The course will combine lectures with hands-on sessions where participants will work with different free and open source <span class="caps">GIS</span> packages. Since we expect participants from many different organizations in the tri-state area, this training course also presents an excellent networking opportunity.</p>
<p>The course is designed for experienced <span class="caps">GIS</span> users who want to broaden their skill set with expertise in the ever-growing world of free and open source <span class="caps">GIS</span>. Participants are expected to have a technical background and an interest in developing comprehensive workflows using multiple software components. While we do not require any programming experience, we will be working on the command line and developing some small scripts. Participants should be eager to master these valuable skills.</p>
<p>This course is offered at Hunter College, <span class="caps">CUNY</span>, in the heart of Manhattan’s Upper East Side, very convenient to public transportation. <a href="http://www.hunter.cuny.edu/ce/gis/">Click here</a> for the course description, tuition, instructor bios and contacts, or call the Hunter Continuing Education office at 212-650-3850.</p>
<h2>Two Guest Speakers</h2>
<p><em>Posted 2014-04-28 by cfarmer</em></p>
<p>Please join us <strong>Wednesday, April 30th</strong> from <strong>3:30-5:00 pm</strong> in the Hunter College Geography
Conference Room (<strong>Hunter North 1004</strong>) for a talk by guest speakers <span class="caps">JD</span> Godchaux and Lela Prashad, and Robert Buchanan, who will be talking about <em>Free <span class="amp">&</span> Open Source Software for Geospatial Applications</em>.</p>
<p>If you can’t make it in person, there will also be a <a href="https://plus.google.com/u/0/events/cctokhnpg32mkqta02cs3c83me8">live feed available here</a> (the feed will also be recorded, so you can view it at a later date as well).</p>
<p><strong>Announcement</strong>:
We are excited to announce that the department will be hosting the second Free and Open Source for Geospatial (<span class="caps">FOSS4G</span>) presentation/discussion this Wednesday, from 3:30-5 pm in the Geography Department conference room (<span class="caps">HN</span> 1004).</p>
<p>The discussion will be on use of <span class="caps">FOSS4G</span> and web mapping technologies for community-based projects. Our theme is water quality and safety in the <span class="caps">NYC</span> region, and our guest speakers will be discussing two <span class="caps">FOSS4G</span> projects:</p>
<p><span class="caps">JD</span> Godchaux and Lela Prashad from Nijel.org will discuss the <a href="http://nijel.org/nysewage/"><span class="caps">NY</span> State Sewage Project</a> which monitors sewage overflows in public waterways throughout the state.</p>
<p>Robert Buchanan from the New School will discuss a <a href="http://www.nycwatertrail.org/water_quality.html">separate project</a> on water quality mapping in the New York metropolitan region using volunteer data collection efforts and Google Maps.</p>
<p>Both tools have been employed by the <span class="caps">NY</span> chapter of the Surfrider Foundation as ways to monitor overall water quality and respond to deficiencies in government efforts to ensure clean and safe water for humans and other species to enjoy.</p>
<p>We hope you can join us if you have time; it should be a great discussion of the ways you can employ easy-to-use <span class="caps">FOSS4G</span> tools to support community service and non-profit causes of all kinds. Check out the <a href="http://carsonfarmer.com/uploads/GIS-Society-Announcement.pdf">flyer here</a>!</p>
<p>For any questions, contact <a href="mailto:erichte@hunter.cuny.edu">Gene via email</a>.</p>
<p>Hope to see you there!</p>
<h2>After the Irish Famine: Population Change in Cartograms</h2>
<p><em>Posted 2014-03-28 by cfarmer</em></p>
<p>In celebration of St. Patrick’s Day last week, I decided to dig up an old dataset on historical Irish populations by county, from when I was living and working in Ireland, and have a play with <a href="http://d3js.org/">D3js</a> and cartograms. <a href="http://carsonfarmer.com/maps/irish_famine/">Click here</a> to view it ‘live’. If you’ve read any of my <a href="http://carsonfarmer.com/2012/08/olympic-cartogram/">previous posts</a>, you’ll know that I like cartograms as a useful and fun way to visualize data. The <a href="http://en.wikipedia.org/wiki/Great_Famine_(Ireland)">Great Famine</a> was an important and significant event in Irish (and global) history, and cartograms provide a fun and informative way to explore the resultant population change in Ireland from around the Famine era (and beyond).</p>
<p><a href="http://carsonfarmer.com/maps/irish_famine/"><img alt="Population cartogram" class="right" src="http://carsonfarmer.com/images/irish-famine.png" /></a></p>
<p>The cartogram ‘time-series’ provides a simple visualization of population change in Ireland after the Famine era. It uses <a href="http://lambert.nico.free.fr/tp/biblio/Dougeniketal1985.pdf">continuous area cartograms</a> and population estimates from 1841 to 2001 to demonstrate change. The cool thing about this visualization is how dramatically it emphasizes population loss from 1841 to 1851 (and beyond), and how, even in modern Ireland, many counties remain well below their pre-famine population levels. As a whole, the population of Ireland remains less than 70% of its pre-famine levels!</p>
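<p>For readers curious about what a continuous area cartogram does under the hood, here is a simplified Python sketch of one iteration of the Dougenik et al. (1985) rubber-sheet algorithm, the method that d3-cartogram implements. It is written for illustration, not taken from this map’s code. Each region gets a “mass” from the gap between its current and desired area, every vertex is pushed away from (or pulled toward) each region’s centroid by a force that depends on that mass and the distance, and all forces are damped by the mean size error. Polygons are assumed to be simple rings (lists of <code>(x, y)</code> tuples) and values must be positive; multipolygons, holes and convergence checks are left out.</p>

```python
import math


def polygon_area(ring):
    """Absolute area of a simple polygon given as a list of (x, y) tuples (shoelace formula)."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon."""
    signed = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        cross = x1 * y2 - x2 * y1
        signed += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # signed holds twice the signed area, so 6 * area == 3 * signed
    return cx / (3.0 * signed), cy / (3.0 * signed)


def cartogram_step(polygons, values):
    """One Dougenik-style iteration: returns new rings whose areas move toward
    being proportional to values (e.g. county populations)."""
    areas = [polygon_area(r) for r in polygons]
    centroids = [polygon_centroid(r) for r in polygons]
    total_area, total_value = sum(areas), sum(values)
    radii, masses, size_errors = [], [], []
    for area, value in zip(areas, values):
        desired = total_area * value / total_value
        radius = math.sqrt(area / math.pi)
        radii.append(radius)
        masses.append(math.sqrt(desired / math.pi) - radius)
        size_errors.append(max(area, desired) / min(area, desired))
    # Damp every force by the mean size error to keep the iteration stable.
    reduction = 1.0 / (1.0 + sum(size_errors) / len(size_errors))

    def move(x, y):
        dx = dy = 0.0
        for (cx, cy), radius, mass in zip(centroids, radii, masses):
            dist = math.hypot(x - cx, y - cy)
            if dist == 0:
                continue
            if dist > radius:
                force = mass * radius / dist
            else:
                ratio = dist / radius
                force = mass * ratio * ratio * (4.0 - 3.0 * ratio)
            dx += force * (x - cx) / dist
            dy += force * (y - cy) / dist
        return x + dx * reduction, y + dy * reduction

    return [[move(x, y) for x, y in ring] for ring in polygons]


# Toy example: two unit squares sharing an edge, with values 3 and 1.
rings = [[(0, 0), (1, 0), (1, 1), (0, 1)], [(1, 0), (2, 0), (2, 1), (1, 1)]]
for _ in range(5):
    rings = cartogram_step(rings, [3, 1])
print([round(polygon_area(r), 3) for r in rings])
```

<p>Because each vertex’s displacement depends only on its position, a vertex shared by two neighboring counties moves the same way in both rings, so borders stay stitched together without any explicit topology.</p>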
<p>I’ve added a few nice interactive features to the map, including a popover that gives you additional information on mouseover. This includes the county name, the total population for the given year, and a nice little <a href="http://en.wikipedia.org/wiki/Sparkline">sparkline</a> showing population change over time (with the current year highlighted for reference). This gives you a quick feel for the population change over time, and was pretty easy to do using D3js and Twitter Bootstrap.</p>
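<p>The sparkline itself is just an SVG path. As a rough sketch (in Python rather than the JavaScript/D3 used for the map, with hypothetical function names), the series is scaled into a small box, flipping the y values because SVG’s y axis points down, and the scaled points are joined into a path’s <code>d</code> string; the same scaling gives the position of the highlighted year.</p>

```python
def _scale(values, width, height):
    """Map a series onto (x, y) points inside a width x height SVG box."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a flat series
    step = width / (len(values) - 1) if len(values) > 1 else 0.0
    # SVG y grows downward, so larger values get smaller y coordinates.
    return [(i * step, height - (v - lo) / span * height) for i, v in enumerate(values)]


def sparkline_path(values, width=100.0, height=20.0):
    """Return an SVG path 'd' string that draws the series as a sparkline."""
    points = _scale(values, width, height)
    return "M" + "L".join(f"{x:.1f},{y:.1f}" for x, y in points)


def highlight_point(values, index, width=100.0, height=20.0):
    """Return the (x, y) position of one entry, e.g. the currently selected year."""
    return _scale(values, width, height)[index]


print(sparkline_path([4, 2, 3], width=10, height=2))     # M0.0,0.0L5.0,2.0L10.0,1.0
print(highlight_point([4, 2, 3], 2, width=10, height=2))  # (10.0, 1.0)
```

<p>In the browser, D3’s line generator and linear scales handle this scaling and path building; the path string would go into a <code>&lt;path&gt;</code> element and the highlighted point into a small <code>&lt;circle&gt;</code>.</p>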
<p>To produce the visualization, I leaned heavily on <a href="http://d3js.org/">D3js</a>, <a href="http://colorbrewer2.org">colorbrewer</a>, Twitter’s <a href="http://getbootstrap.com/">Bootstrap</a>, <a href="http://jquery.com/">jQuery</a>, and some helpful examples from <a href="http://www.tnoda.com/blog/2013-12-19">here</a>, <a href="http://benjchristensen.com/2011/08/08/simple-sparkline-using-svg-path-and-d3-js/">here</a>, and <a href="http://jsfiddle.net/eQmYX/77/">here</a> (among others). The code is based on the <a href="https://github.com/shawnbot/d3-cartogram">d3-cartogram</a> example by <a href="http://stamen.com/studio/shawn">Shawn Allen</a> at <a href="http://stamen.com">Stamen</a>.</p>
<h2>Python Resources for QGIS Users</h2>
<p><em>Posted 2014-03-18 by cfarmer</em></p>
<p>There’s a discussion thread on the <span class="caps">QGIS</span> LinkedIn Group page about Python tutorials and resources. There were a few good suggestions, so I thought I’d share these with others. It starts with a <em>very</em> common question from a user of <span class="caps">GIS</span> (or of any software that supports scripting):</p>
<blockquote>
<p>I’m a real ‘end-user’ of qgis and I want to improve my skills a little… I’ve found many python tutorials online but I don’t know which are any good. Can anyone point me to some good resources?</p>
</blockquote>
<p>The responses were useful, but not exhaustive:</p>
<ul>
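<li>The PyQGIS Programmer’s Guide: <a href="http://locatepress.com/ppg">http://locatepress.com/ppg</a></li>
<li>PyQGIS developer cookbook: <a href="http://www.qgis.org/en/docs/pyqgis_developer_cookbook/">http://www.qgis.org/en/docs/pyqgis_developer_cookbook/</a></li>
<li>Geoprocessing with Python using <span class="caps">FOSS</span> <span class="caps">GIS</span>: <a href="http://www.gis.usu.edu/~chrisg/python/2009/">http://www.gis.usu.edu/~chrisg/python/2009/</a></li>
<li>10 Resources to Learn Python Programming Language: <a href="http://codecondo.com/10-ways-to-learn-python/">http://codecondo.com/10-ways-to-learn-python/</a></li>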
</ul>
<p>Do you have another suggestion? Please sound off in the comments below!</p>
<p>[<span class="caps">UPDATE</span>]: I’ve added a link to resources for learning Python in general; it’s a very useful and comprehensive list. <a href="http://codecondo.com/10-ways-to-learn-python/">Check it out</a>!</p>
<h2>Submission deadline extended!</h2>
<p><em>Posted 2014-03-16 by cfarmer</em></p>
<p><a href="https://conference.scipy.org/scipy2014/"><img alt="SciPy 2014 Logo" class="left" src="http://carsonfarmer.com/images/scipy2014_logo_simple.png" /></a></p>
<p>Due to popular demand, the deadline for submitting talks, tutorials and posters has been extended to <strong>April 1, 2014</strong> - no ‘foolin!’. We encourage submissions related to general scientific computing with Python, one of the two special themes for this year, or the domain-specific mini-symposia held during the conference. Take a look at a <a href="http://conference.scipy.org/past.html">few talks</a> from previous years and at our <a href="https://conference.scipy.org/scipy2014/participate/presentations/">guidelines for this year</a>. We look forward to reviewing your submissions!</p>
<p><a href="https://conference.scipy.org/scipy2014/">Submit your abstracts today</a>!</p>
<h2>ESRI and Open Source</h2>
<p><em>Posted 2014-03-12 by cfarmer</em></p>
<p><a href="http://blogs.esri.com/esri/esri-insider/2014/03/05/esri-open-source-growing/?WT.mc_id=EmailCampaignh25209"><img alt="ESRI & Open Source" class="right" src="http://carsonfarmer.com/images/esri_open.png" /></a></p>
<p>Here’s a <a href="http://blogs.esri.com/esri/esri-insider/2014/03/05/esri-open-source-growing/?WT.mc_id=EmailCampaignh25209">blog post from <span class="caps">ESRI</span></a> about <span class="caps">ESRI</span>’s transition to open source, open development, and social coding.<a href="#kessler">*</a> It features <a href="https://github.com/">GitHub</a> pretty prominently, which continues to be an awesome resource for collaborative work, and not just for code. My colleagues and I have started using it for planning meetings and workshops, developing research papers, maintaining websites (this site is hosted on GitHub), and yes, even open source software projects. <span class="caps">ESRI</span> obviously also thinks GitHub is a useful resource: the keynote speaker for the <a href="http://www.esri.com/events/devsummit"><span class="caps">ESRI</span> DevSummit</a> is <a href="https://twitter.com/defunkt">GitHub <span class="caps">CEO</span> and Co-Founder Chris Wanstrath</a>!</p>
<p>Here’s a particularly nice snippet from the post:</p>
<blockquote>
<p>The value is clear: through active and public collaboration we can effectively deliver a platform that empowers anyone with the freedom and ownership to customize solutions to their own domain. Our community spans the entire domain of government, science, education, commercial, and industrial practice. As a company it would be impossible for us to effectively serve every users need. Instead, open-source enables the community to scale itself.</p>
</blockquote>
<p><a name="kessler">*</a>Thanks to <a href="http://carsten.io/">Carsten Keßler</a> for the link</p>SciPy 2014 Geospatial Data in Science2014-03-06T13:30:00-05:00cfarmertag:carsonfarmer.com,2014-03-06:2014/03/scipy_2014_mini_symposium/<p><a href="https://conference.scipy.org/scipy2014/"><img alt="SciPy 2014 Logo" class="left" src="http://carsonfarmer.com/images/scipy2014_logo_simple.png" /></a></p>
<p>I have recently been asked to help out with the <strong>Geospatial Data in Science</strong> track for <a href="https://conference.scipy.org/scipy2014/">SciPy 2014</a>. The conference is being held at the <a href="https://www.google.com/maps/place/AT%26T+Executive+Education+and+Conference+Center/@30.282362,-97.7401074,17z/data=!3m1!4b1!4m2!3m1!1s0x0:0x7ef52b1ad3321879"><span class="caps">AT</span>&T Executive Education and Conference Center</a> on the University of Texas campus in Austin, Texas from <strong>July 6th to 12th, 2014</strong>. It promises to be an awesome gathering of scientific Python users, developers, and organizations. You can check out the <a href="https://conference.scipy.org/scipy2014/about/">conference announcement</a> on the <a href="https://conference.scipy.org/scipy2014/">SciPy 2014 website</a>, where you can also <a href="https://conference.scipy.org/scipy2014/account/signup/">register</a> to submit a proposal and/or abstract and find out all about the SciPy community and conference.</p>
<!--more-->
<p>I’m really excited about the conference, especially because this year’s two main <em>Specialized Tracks</em> (which run parallel to the main conference) are <strong>Scientific Computing in Education</strong> <em>and</em> <strong>Geospatial Data in Science</strong>, both topics near and dear to my heart! In particular, the geospatial track “will focus on libraries, tools and techniques for processing Geospatial data of all types and for all purposes — from low-volume to high-volume, local and global”. Cool stuff!</p>
<p>In addition to these two specialized tracks, there are also <em>Domain-specific Mini-symposia</em> that you might be interested in:</p>
<blockquote>
<p>Introduced in 2012, mini-symposia are held to discuss scientific computing applied to a specific scientific domain/industry during a half afternoon after the general conference. Their goal is to promote industry specific libraries and tools, and gather people with similar interests for discussions.</p>
</blockquote>
<p>Mini-symposia on the following topics will take place this year:</p>
<ul>
<li>Astronomy and astrophysics</li>
<li>Bioinformatics</li>
<li>Geophysics</li>
<li>Vision, Visualization, and Imaging</li>
<li>Computational Social Science and Digital Humanities</li>
<li>Engineering</li>
</ul>
<p>If you decide to submit an abstract, be sure to select <em>Geospatial Data in Science</em> as your “Topic Track”. If you have any questions, feel free to <a href="http://www.carsonfarmer.com/contact/">get in touch</a>. And don’t forget, the submission deadline is <strong>March 14th, 2014</strong>!</p>
<h4>Important Dates</h4>
<ul>
<li><strong>March 14th</strong>: Deadline for presentation abstract, poster, and tutorial submissions, and for sponsorship applications.</li>
<li>April 17th: Speakers selected</li>
<li>April 22nd: Sponsorship acceptance deadline</li>
<li>May 1st: Speaker schedule announced</li>
<li>May 6th: Early-bird registration ends (or after 150 registrants)</li>
<li>July 6-12th: 2 days of tutorials, 3 days of conference, 2 days of sprints</li>
</ul>
<p>Hope to see you in Austin!</p>CARSI is looking for research assistants2014-03-05T14:00:00-05:00cfarmertag:carsonfarmer.com,2014-03-05:2014/03/carsi_is_hiring_feb_2014/<p>We have two openings for undergraduate research assistants here at the <a href="http://carsilab.org/">Center for Advanced Research of Spatial Information</a>. You can find the <a href="http://carsten.io/openings/">original announcement</a> on <a href="http://carsten.io/contact/">Dr. Carsten Keßler</a>‘s website. In short, we are looking for</p>
<p><a href="http://carsilab.org/"><img alt="carsi logo" class="right" src="http://carsonfarmer.com/images/CARSI1-300x116.png" /></a></p>
<ol>
<li>A student who can help us move our website over to WordPress (<a href="http://carsten.io/ad-website.pdf">detailed description</a>); and</li>
<li>A student who can support us with Android app development in the enviroCar project (<a href="http://carsten.io/ad-envirocar.pdf">detailed description</a>).</li>
</ol>
<p>If either of these topics sounds interesting to you, please get in touch with <a href="http://www.carsonfarmer.com/contact/">me</a> or <a href="http://carsten.io/contact/">Dr. Carsten Keßler</a>.</p>
<!--more-->First GTECH Experiment Recording2014-03-05T12:00:00-05:00cfarmertag:carsonfarmer.com,2014-03-05:2014/03/first_gtech_experiement/<p>In case you missed <a href="https://twitter.com/ebrelsford">Eric Brelsford’s</a> talk last Wednesday on <em>Free <span class="amp">&</span> Open Source Software for Geospatial Applications</em>, I’ve embedded the recording below for your viewing pleasure (Eric’s slides are <a href="http://ebrelsford.github.io/talks/2014/Hunter/">also available</a>). This is the first in a series of talks organized within the department of <a href="http://www.geo.hunter.cuny.edu/">Geography</a> at <a href="http://www.hunter.cuny.edu/main/">Hunter College</a>, <span class="caps">CUNY</span> around <span class="caps">GIS</span> and Technology (we’re calling them <span class="caps">GTECH</span> Experiments). Each talk is organized by a student (thanks go to <a href="https://twitter.com/maragittleman">Mara Gittleman</a> this time), and features a member of the wider geo-technology community. Check out the video below:</p>
<div class="youtube" align="center">
<iframe width="640" height="360"
src="//www.youtube.com/embed/ngNLvbfup3g"
frameborder="0" allowfullscreen>
</iframe>
</div>
<p>Stay tuned for news and events around <span class="caps">GTECH</span> Experiments in the future!</p>
<!--more-->NYC Geoclient REST API from Python2014-03-01T19:53:00-05:00cfarmertag:carsonfarmer.com,2014-03-01:2014/03/nyc_geocoder/<p>Recently, on the <a href="http://www.meetup.com/betanyc/">betaNYC</a> Meetup email list, <a href="http://blog.accursedware.com/">John Krauss</a> and
<a href="http://29degreesnorth.blogspot.com/">Tom Swanson</a> both posted Python code for accessing the <a href="http://developer.cityofnewyork.us/api/geoclient-api-beta"><span class="caps">NYC</span> Geoclient
<span class="caps">REST</span> <span class="caps">API</span></a>, which is an awesome resource developed by the <a href="http://www.nyc.gov/html/doitt/html/home/home.shtml"><span class="caps">NYC</span> Department
of Information Technology and Telecommunications</a> <span class="caps">GIS</span>/Mapping unit.</p>
<blockquote>
<p>The Geoclient <span class="caps">API</span> is a RESTful web service interface to the <span class="caps">NYC</span> Department of City Planning’s Geosupport system developed
by the Department of Information Technology and Telecommunications <span class="caps">GIS</span>/Mapping unit. Geosupport is a mainframe-based
geocoding system used by <span class="caps">NYC</span> government. Geosupport provides coordinate and geographic attributes for supported input
locations (address, intersection, blockface). Geoclient exposes the most widely used Geosupport functions and provides them
in a more intuitive and modern manner.</p>
</blockquote>
<p>This is what John has to say about <a href="https://github.com/talos/nyc-geoclient">his code</a>:</p>
<blockquote>
<p>I’ve been messing around with <span class="caps">NYC</span>’s geoclient <span class="caps">API</span>. It’s quite powerful! I wrapped the <span class="caps">REST</span> calls in a Python module,
which is accessible for all on PyPI. You can check it out here:</p>
<p>https://github.com/talos/nyc-geoclient</p>
<p>And the documentation here:</p>
<p>http://nyc-geoclient.readthedocs.org/en/latest/index.html</p>
<p>On a side-note, according to Geoclient, almost 20% of the intersections in the city’s own collision statistics releases are
ambiguous or invalid.</p>
</blockquote>
<p>And this is what Tom has to say <a href="https://github.com/tswanson/NYCParkingGeocode">about his</a>:</p>
<blockquote>
<p>My code is nowhere near as clean as John’s but if might be of interest that I ran ~3millions records through the <span class="caps">NYC</span>
GeoClient in December. Overall, the services worked great and was able to make ~1,500 calls per min. I was geocoding
the parking ticket data on nyc open data.</p>
<p>https://github.com/tswanson/NYCParkingGeocode</p>
</blockquote>
<p>In addition to the above two Python implementations, <a href="http://gonzalez.io">Edgar Gonzalez</a> also recently released a Ruby gem for the <span class="caps">NYC</span> GeoClient <span class="caps">API</span>:</p>
<blockquote>
<p>Github: http://github.com/edgar/NYCGeoClient<br />
Rubygems: http://rubygems.org/gems/nyc_geo_client</p>
</blockquote>
<p>Note that you need to <a href="http://developer.cityofnewyork.us/">register an app with DoITT</a> (and sign it up
for the Geoclient <span class="caps">API</span>) <s>then wait a few days before being able</s> to use the <span class="caps">API</span>.
So get registered <span class="caps">ASAP</span>!</p>Eric Brelsford Guest Speaker2014-02-21T17:00:00-05:00cfarmertag:carsonfarmer.com,2014-02-21:2014/02/guest_speaker_feb_2014/<p>Please join us <strong>Wednesday, February 26th</strong> from <strong>3:00 to 5:00 pm</strong> in the Hunter College Geography
Conference Room (<strong>Hunter North 1004</strong>) for a talk by guest speaker Eric Brelsford, who will be talking
about <em>Free <span class="amp">&</span> Open Source Software for Geospatial Applications</em>.</p>
<p><span class="caps">UPDATE</span>: If you can’t make it in person, check out a <a href="https://plus.google.com/u/0/events/ca20rt7putlltjhl1k6iqppt44o">live feed of the talk here</a>.</p>
<p><strong>Topic</strong>:
Eric will give us an overview of the Free <span class="amp">&</span> Open Source Software for Geospatial Applications (<span class="caps">FOSS4G</span>)
landscape, followed by a few example workflows. What tools are out there for making useful and interesting
online maps? What is “open source” software, and how is it different from other software? Where does it fit in
the history of web mapping? Who are the people (and what is the technology) on the cutting edge?</p>
<p>Eric will then discuss which programs talk to each other and what types of things you can do with each platform.
He’ll talk about (and hopefully have time to walk us through) CartoDB (and CartoCSS), GeoJSON, TileMill, and
more, depending on what we have time to cover. If there’s interest, he can also go over using JavaScript libraries
like Leaflet to customize these online maps.</p>
<p><strong>Bio</strong>:
Eric teaches web mapping (mostly open source) at The New School and co-runs <a href="http://596acres.org/">596 Acres</a>. He’s
never taken a <span class="caps">GIS</span> class or used ArcGIS, but has produced maps <a href="http://596acres.org/">here</a>, <a href="http://www.growingcitiesmovie.com/">here</a>, and <a href="http://groundedinphilly.org/">here</a>
(among other places). He’s also active in the OpenStreetMap community, and he just started a thing called
Maptime and led its first meeting. Maptime is an idea that originated on the West Coast: “Our mission is to create a safe space where
interdisciplinary communities can learn, socialize, and code maps with each other (and for each other)”.</p>
<p>He’s also on the internet <a href="https://twitter.com/ebrelsford">@ebrelsford</a>.</p>Call for Abstracts: 2nd International Conference UGEC 20142014-02-07T12:00:00-05:00cfarmertag:carsonfarmer.com,2014-02-07:2014/02/call_for_abstracts_ugec_2014/<p><strong>The Urbanization and Global Environmental Change Project is pleased to
announce that we will begin accepting abstracts for our 2nd International
Conference on February 7th.</strong></p>
<p><a href="http://www.ugec2014.org/"><img alt="UGEC 2014 Conference Header" class="center" src="http://carsonfarmer.com/images/ugec_conf_2014.jpg" /></a></p>
<p>We invite abstract submissions for oral and poster presentations. We
particularly encourage contributions that exhibit an innovative set of
conceptual and methodological approaches. Abstracts should be focused on
synthesizing <span class="caps">UGEC</span> research, lessons learned, key ways forward, and should fall
under one of the four conference themes:</p>
<ol>
<li>Urbanization Patterns and Processes</li>
<li>Urban Responses to Climate Change: Adaptation, Mitigation and Transformation</li>
<li>Global Environmental Change, Urban Health and Well-Being</li>
<li>Equity and Environmental Justice in Urban Areas</li>
</ol>
<p><a href="http://ugec2014.squarespace.com/themes">Click here for more details on the themes</a>.</p>
<p><a href="http://ugec2014.squarespace.com/concept-and-themes">Click here to read the concept note</a>.</p>
<p><strong>Abstract Submission Guidelines</strong>:</p>
<ul>
<li>Abstracts should be written in English and should not exceed 300 words.
Abstracts will be selected based on clarity, appropriateness of the topic,
methodology, originality, and contribution to the overall synthesis focus of
our conference.</li>
<li>Please indicate the title of the session(s) under which you are submitting
your abstract. If you have no preference, leave this blank. A list of sessions will be made
available before February 7th.</li>
</ul>
<p><strong>Deadline for submission is March 30th, 2014!</strong></p>Urbanization and Global Environmental Change Conference 20142014-02-03T15:45:00-05:00cfarmertag:carsonfarmer.com,2014-02-03:2014/02/ugec_conf_2014/<p>The <a href="http://ugec.org/">Urbanization and Global Environmental Change (<span class="caps">UGEC</span>)</a> 2nd
International Conference on
<a href="http://www.ugec2014.org/">“Urban Transitions and Transformations: Science, Synthesis and Policy”</a>
is scheduled to take place in Taipei, Taiwan from November
6th-8th, 2014. I am organizing a special session entitled “Forecasting
Urbanization: Population and Land Dimensions” which promises to be very exciting.</p>
<p>This is the ‘final wrap-up’ for <span class="caps">UGEC</span>, so the main purpose of the conference is
to provide a synthesis of <span class="caps">UGEC</span> research and practice. Sessions are supposed to
be reflective (i.e., not solely presentations on ‘new research’), so the talks
should have a ‘lessons learned’ focus. Hopefully there will be time in our
session to discuss key points, research gaps, and ways forward for forecasting
urban population growth. I’m going to start soliciting papers for this session
pretty soon, but in the meantime, if you are interested in presenting, please
<a href="mailto:carson.farmer@hunter.cuny.edu">get in touch</a>!</p>
<p>Abstracts can be <a href="https://ugec.conference-services.net/authorlogin.asp?conferenceID=3848&language=en-uk">submitted here</a>, and you can <a href="http://www.ugec2014.org/">register here</a>
(click on the Registration tab). If you want to be included in my session, please
indicate so when submitting your abstract online. It’s a good idea to register
early for the conference anyway, since the Early Bird rates will only be available
through June 15th, 2014. If you have any questions about
the conference, you can <a href="mailto:ugec2014@asu.edu">email <span class="caps">UGEC</span></a>.</p>
<p><span class="caps">P.S.</span> If you aren’t yet convinced that you should attend the 2014 <span class="caps">UGEC</span> Synthesis
Conference, maybe <a href="http://environment.yale.edu/profile/seto/">Dr. Karen Seto</a> can help convince you:</p>
<div class="vimeo" align="center">
<iframe src="http://player.vimeo.com/video/75219567" width="500" height="281"
webkitallowfullscreen mozallowfullscreen allowfullscreen>
</iframe>
</div>One year in New York City2014-01-03T17:50:00-05:00cfarmertag:carsonfarmer.com,2014-01-03:2014/01/one_year_in_nyc/<p>One year ago today, January 3rd, 2013, <a href="https://twitter.com/AmandaFarmerNYC">my wife</a> and I officially moved to New York City. It has flown by extremely fast for both of us, but we’ve managed to enjoy the city and all of the benefits that come with it. Some highlights (in no particular order) include:</p>
<ul>
<li>Starting my own research agenda</li>
<li>Skating at Rockefeller Center</li>
<li>Seeing Chicago on Broadway</li>
<li>Wandering through Manhattan at Christmas time</li>
<li>Road-trip to Boston</li>
<li>Living in North America again (Europe was wonderful, but it’s nice to be closer to family)</li>
<li>Trip up to Montreal</li>
<li>Seeing The Nance on Broadway</li>
<li>Staff passes to Macy’s Thanksgiving Day Parade</li>
<li>Meeting new friends and colleagues</li>
<li>Seeing the Rockettes at Radio City Music Hall</li>
<li>Teaching full courses for the first time</li>
<li>Getting my first keynote speaker invite</li>
<li>Wedding 2.0</li>
<li>Getting <em>two</em> Thanksgivings</li>
<li><span class="caps">NYC</span> parades (Pride, St Patrick’s, Thanksgiving)</li>
<li>Movie date-nights in Bryant Park</li>
<li>Amazing skyline views from our rooftop</li>
<li>Entertaining friends and family</li>
<li>Ordering pretty much anything I could possibly want on-line</li>
<li>Taking the Subway every day (Sometimes that sucks…)</li>
<li>Getting a new dog</li>
<li>Having my own office :-)</li>
<li>Celebrating New Year’s Eve only blocks from Times Square</li>
<li>Walking the High Line</li>
<li>Seeing the New York Philharmonic in Central Park (With fireworks!)</li>
<li>Having access to so many world class restaurants all the time</li>
<li>Street vendors (Street meat)</li>
<li>Turning 30</li>
<li>Extreme temperatures</li>
<li>And many more things, too numerous to list!</li>
</ul>
<p>Here’s to the next year: may it be even more fun and exciting than the last.</p>A quick bookmarklet for off-campus library access2013-12-17T14:56:00-05:00cfarmertag:carsonfarmer.com,2013-12-17:2013/12/bookmarklet_for_off_campus_library_access/<p>I have been doing a fair bit of research off-campus lately, and as usual, have been having trouble accessing research materials (mainly academic publications) from home. <em>Fortunately</em>, Hunter College provides <a href="http://library.hunter.cuny.edu/find/accessfromhome">off-campus access</a> to all electronic resources available to Hunter students, faculty and staff via its library proxy server. <em>Unfortunately</em>, it turns out to be a huge pain to use anything other than the library search facilities (like <a href="http://scholar.google.com/">Google Scholar</a>) through the proxy server*. In fact, when working off-campus, you actually have to prefix the <span class="caps">URL</span> of each licensed resource with
<code>http://proxy.wexler.hunter.cuny.edu/login?url=</code> in order to be able to access it. Not very handy…</p>
<p><a href="http://en.wikipedia.org/wiki/Bookmarklet">Bookmarklets</a> to the rescue! This problem is actually something that bookmarklets are perfect for. A bookmarklet is (usually) just a small piece of JavaScript that resides in your browser and provides additional functionality to a web page. With that in mind, I decided to create a simple bookmarklet that automatically reloads the current page with the above prefix prepended to its <span class="caps">URL</span>. This gives me access to the material via the library proxy server, while still letting me use whatever search tools I want. In this case, all the bookmarklet contains is the following JavaScript code:</p>
<div class="highlight"><pre><span class="nx">javascript</span><span class="o">:</span> <span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="o">=</span><span class="s2">"http://proxy.wexler.hunter.cuny.edu/login?url="</span><span class="o">+</span><span class="nx">location</span><span class="p">.</span><span class="nx">href</span>
</pre></div>
<p>So that the whole link is simply:</p>
<div class="highlight"><pre><span class="nt"><a</span> <span class="na">href=</span><span class="s">"javascript: location.href='http://proxy.wexler.hunter.cuny.edu/login?url='+location.href"</span><span class="nt">></span>Library Proxy<span class="nt"></a></span>
</pre></div>
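<p>For anyone who wants the same trick outside the browser (for example, when scripting a list of article links), here is a minimal Python sketch of the bookmarklet’s logic. The <code>proxy_url</code> helper is hypothetical, and it assumes, as described above, that simply prepending the prefix is all the Hunter proxy needs. Unlike the bookmarklet, it also skips URLs that are already proxied, so the prefix is never added twice.</p>

```python
# Prefix used by the Hunter College library proxy (taken from the post above)
PROXY_PREFIX = "http://proxy.wexler.hunter.cuny.edu/login?url="


def proxy_url(url: str, prefix: str = PROXY_PREFIX) -> str:
    """Return `url` routed through the library proxy (hypothetical helper).

    Mirrors the bookmarklet: the original URL is appended, unencoded,
    to the proxy prefix. URLs that already start with the prefix are
    returned unchanged.
    """
    if url.startswith(prefix):
        return url
    return prefix + url


print(proxy_url("http://www.example.com/article/123"))
# http://proxy.wexler.hunter.cuny.edu/login?url=http://www.example.com/article/123
```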
<p><span class="quo">‘</span>Installing’ a bookmarklet is as simple as dragging it onto your browser’s bookmarks toolbar (I think on some versions of Internet Explorer, you might have to right-click and select “Add to Favorites…”). If you drag this <a href="javascript:location.href='http://proxy.wexler.hunter.cuny.edu/login?url='+location.href">Library Proxy</a> link onto your bookmarks bar, you’ll have a handy little tool to automatically access the current page via the Hunter College library proxy (note that you’ll need Hunter College credentials for this to work), instantly increasing your productivity by 12.45%… or so.</p>
<p><em>*</em> I might just be missing something, in which case, hopefully someone will correct me.</p>Lectureships (x2) in GeoInformatics at the University of St Andrews2013-12-03T11:57:00-05:00cfarmertag:carsonfarmer.com,2013-12-03:2013/12/cgi-lecturer-positions-2014/<p>Researchers in the <a href="http://www.st-andrews.ac.uk/geoinformatics/">Centre for GeoInformatics (<span class="caps">CGI</span>)</a> in the School of
Geography and Geosciences at the University of St Andrews have been selected
for a prestigious award under the <a href="http://www.nuffieldfoundation.org/q-step">Q-Step Quantitative Methods Programme</a>
funded by a combination of the Nuffield Foundation and the <span class="caps">ESRC</span>. This programme
will employ two new lecturers to add substantial new courses to the School of
Geography and Geosciences’ existing undergraduate curriculum and help deliver a
new MSc in GeoInformatics. They are looking for candidates with research
interests in each of:</p>
<ol>
<li>Remote sensing</li>
<li>Spatio-temporal analysis, specializing in spatial statistics</li>
</ol>
<p>For more information on these two positions please <a href="http://www.st-andrews.ac.uk/geoinformatics/lectureships-in-geoinformatics-2-posts/">click here</a>.</p>
<p>Should you have any informal queries about the posts, the university, or life
in St Andrews, contact the <a href="http://www.st-andrews.ac.uk/geoinformatics/people/faculty/"><span class="caps">CGI</span> folks here</a>.</p>Describing Variation2013-11-22T10:49:00-05:00cfarmertag:carsonfarmer.com,2013-11-22:2013/11/statistical-modeling-python-variation/<p>The 3rd in a <a href="http://www.carsonfarmer.com/category/statistical-modeling-for-python.html">series of tutorials</a> on using Python for introductory
statistical analysis, this tutorial covers methods for <strong>describing</strong> data via
simple statistical calculations and statistical graphics. As always, the
notebook for this tutorial is <a href="https://github.com/cfarmer/stat-mod-fresh-approach-python">available here</a>.</p>
<p>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>In the 1880s, Sir Francis Galton, one of the pioneers of statistics, collected data on the heights of approximately 900 adult children and their parents from about 200 families in London. Galton was interested in studying the relationship between a full-grown child’s height and his or her mother’s and father’s heights.</p>
<p>As a setting to illustrate computer techniques for describing variability, take the data that Galton collected on the heights of adult children and their parents. The file <code>"galton.csv"</code> stores these data in a modern, case/variable format.</p>
<p><span class="dataset shadow"><i class="icon-flag" style="font-size: 1.5em;"></i> <a href="http://www.mosaic-web.org/go/datasets/galton.csv"><code>galton.csv</code></a></span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [1]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="n">gal</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"http://www.mosaic-web.org/go/datasets/galton.csv"</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Simple-Statistical-Calculations">Simple Statistical Calculations<a class="anchor-link" href="#Simple-Statistical-Calculations">¶</a></h2><p>Simple numerical descriptions are easy to compute. Here are the mean, median, standard deviation and variance of the children’s heights (in inches).</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [2]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[2]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>66.760690423162515</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [3]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">median</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[3]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>66.5</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [4]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">std</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[4]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>3.5829184699744614</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [5]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">var</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[5]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>12.837304762484134</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Notice that the variance function (<code>var()</code>) returns the square of the standard deviation (<code>std()</code>). Having both is merely a convenience. A percentile tells where a given value falls in a distribution. For example, a height of 63 inches is on the short side in Galton’s data:</p>
</div>
</div>
</div>
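<p>The relationship between the two is easy to check directly. The sketch below uses a small made-up sample rather than Galton’s data. It relies on pandas’ defaults, under which <code>std()</code> and <code>var()</code> use the sample (n - 1) denominator, whereas NumPy’s <code>np.std()</code> and <code>np.var()</code> default to the population (n) denominator unless you pass <code>ddof=1</code>.</p>

```python
import numpy as np
import pandas as pd

# Made-up heights in inches (NOT Galton's data), just for illustration
heights = pd.Series([61.0, 63.5, 65.0, 66.5, 68.0, 70.0, 72.5])

sd = heights.std()   # pandas default: sample standard deviation (ddof=1)
var = heights.var()  # pandas default: sample variance (ddof=1)

# The variance is the square of the standard deviation
print(np.isclose(var, sd ** 2))  # True

# NumPy defaults to ddof=0 (divide by n), so results differ unless ddof=1 is given
print(np.isclose(np.var(heights.values), var))          # False
print(np.isclose(np.var(heights.values, ddof=1), var))  # True
```

<p>The difference between the two denominators shrinks as the sample grows, so for Galton’s roughly 900 cases it is small, but it is worth knowing which convention a function uses.</p>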
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [6]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="nn">st</span> <span class="c"># import some useful stats functions from the scipy.stats library</span>
<span class="n">st</span><span class="o">.</span><span class="n">percentileofscore</span><span class="p">(</span><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">values</span><span class="p">,</span> <span class="mi">63</span><span class="p">,</span> <span class="n">kind</span><span class="o">=</span><span class="s">"weak"</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[6]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>19.153674832962139</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note right shadow">
Note that in the above case, <code>kind="weak"</code> corresponds to the definition of a ‘cumulative distribution function’: a <code>percentileofscore()</code> of 80% means that 80% of values are less than or equal to the provided score. This usage is consistent with how the ‘mosaic’ R package calculates percentiles, using the <code>pdata</code> function.
</span></p>
</div>
</div>
</div>
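<p>To see what <code>kind="weak"</code> computes, the sketch below applies <code>percentileofscore()</code> to a tiny made-up array with a tie at the score of interest and compares the result with the same percentage worked out by hand. It also shows <code>kind="strict"</code>, which counts only values strictly below the score, to make the “less than or equal to” part of the definition visible.</p>

```python
import numpy as np
import scipy.stats as st

# Tiny made-up sample (NOT Galton's data) with a tie at the score of interest
a = np.array([1, 2, 3, 3, 4])
score = 3

# kind="weak": percentage of values <= score, i.e. the empirical CDF at `score`
weak = st.percentileofscore(a, score, kind="weak")
by_hand = np.mean(a <= score) * 100  # 4 of 5 values are <= 3

# kind="strict": percentage of values strictly < score
strict = st.percentileofscore(a, score, kind="strict")

print(weak, by_hand, strict)  # 80.0 80.0 40.0
```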
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Only about 19% of the cases have a height less than or equal to 63 inches. The <code>percentileofscore()</code> function from the <code>scipy.stats</code> package takes an array (the values in a column) as its first argument and finds where the ‘score’ (second argument) falls in the distribution of values in that array. We use the <code>values</code> attribute of the <code>height</code> column from the <code>gal</code> data frame to get the values in the column as an <code>array</code>.</p>
<p>A quantile refers to the same sort of calculation, but inverted. Instead of giving a value in the same units as the distribution, you give a percentage: a number between 0 and 100. The <code>scoreatpercentile()</code> function then calculates the value that falls at that percentile of the distribution:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [7]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">st</span><span class="o">.</span><span class="n">scoreatpercentile</span><span class="p">(</span><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">values</span><span class="p">,</span> <span class="n">per</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[7]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>63.5</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note left shadow">
Note that numpy has a simpler version of this function, but it uses a different naming convention which readers of ‘Statistical Modeling: A Fresh Approach’ and these tutorials might find confusing. In this case, the function is called `percentile`, and returns the ‘score’ or value at a given percentile. For example, `np.percentile(gal.height, 20)` returns `63.5` as in the above example.
</span></p>
</div>
</div>
</div>
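<p>As a quick sanity check, the sketch below (using a hypothetical five-value array rather than the Galton heights) shows that <code>np.percentile()</code> and <code>scipy.stats.scoreatpercentile()</code> agree under their default linear interpolation. It also shows the <code>interpolation_method</code> argument of <code>scoreatpercentile()</code>, which can return an actual data value instead of an interpolated one.</p>

```python
import numpy as np
from scipy import stats as st

values = np.array([1, 2, 3, 4, 5])  # hypothetical toy data

# Value at the 20th percentile, two ways; both interpolate linearly by default
score_np = np.percentile(values, 20)
score_sp = st.scoreatpercentile(values, 20)
print(score_np, score_sp)  # both approximately 1.8

# scoreatpercentile can also snap to the neighbouring data values
lower = st.scoreatpercentile(values, 20, interpolation_method="lower")    # 1
higher = st.scoreatpercentile(values, 20, interpolation_method="higher")  # 2
print(lower, higher)
```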
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>A simpler option is the <code>quantile()</code> method from ‘pandas’, which is consistent with how the ‘mosaic’ R package’s <code>qdata</code> function calculates quantiles. Note that it takes a probability (a number between 0 and 1) instead of a percentage, but the output is the same.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [8]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="mf">0.2</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[8]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>63.5</pre>
</div>
</div>
</div>
</div>
</div>
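<p>The only difference between the two conventions is the scale of the argument. A minimal sketch, on a hypothetical <code>Series</code> of heights (not the real data): pandas takes a probability, NumPy takes a percentage, and both interpolate linearly by default. <code>quantile()</code> also accepts a list of probabilities and returns a <code>Series</code> indexed by them.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical heights, for illustration only
heights = pd.Series([60.0, 62.5, 63.0, 64.0, 66.5, 68.0, 69.7, 71.0, 73.0])

q_pd = heights.quantile(0.2)      # probability in [0, 1]
q_np = np.percentile(heights, 20)  # percentage in [0, 100]
print(q_pd, q_np)                  # the same value

# Several quantiles at once: the result is indexed by the probabilities
several = heights.quantile([0.25, 0.5, 0.75])
print(several)
```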
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Building on this, the 25th and 75th percentiles, which bound the 50 percent coverage interval, can be computed as:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [9]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="o">.</span><span class="mi">25</span><span class="p">),</span> <span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="o">.</span><span class="mi">75</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[9]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>(64.0, 69.700000000000003)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The 50 percent coverage interval can <em>also</em> be computed with a single command using a ‘list comprehension’, a ‘Pythonic’ way to construct lists clearly and concisely.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [10]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="p">[</span><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="n">q</span><span class="p">)</span> <span class="k">for</span> <span class="n">q</span> <span class="ow">in</span> <span class="p">[</span><span class="o">.</span><span class="mi">25</span><span class="p">,</span> <span class="o">.</span><span class="mi">75</span><span class="p">]]</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[10]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>[64.0, 69.700000000000003]</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note right shadow">
List comprehensions can be used to construct lists in a very natural, easy way, much as a mathematician would. The following are common ways to describe lists (or sets, or tuples, or vectors) in mathematics:
$S = \{x^{2} : x \in \{0, \dots, 9\}\}$
$V = (1, 2, 4, 8, \dots, 2^{12})$
$M = \{x \mid x \in S \text{ and } x \text{ even}\}$
You probably know notation like this from previous math courses. In Python, you can write these expressions almost exactly as a mathematician would, without having to remember any special cryptic syntax. This is how you do the above in Python:
`S = [x**2 for x in range(10)]`
`V = [2**i for i in range(13)]`
`M = [x for x in S if x % 2 == 0]`
</span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>We can use the same technique to compute the 2.5th and 97.5th percentiles, which bound the 95 percent coverage interval:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [11]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="p">[</span><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="n">q</span><span class="p">)</span> <span class="k">for</span> <span class="n">q</span> <span class="ow">in</span> <span class="p">[</span><span class="o">.</span><span class="mi">025</span><span class="p">,</span> <span class="o">.</span><span class="mi">975</span><span class="p">]]</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[11]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>[60.0, 73.0]</pre>
</div>
</div>
</div>
</div>
</div>
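<p>Because the 50 and 95 percent intervals differ only in how much probability is cut from each tail, the list comprehension can be wrapped in a small function. This is a sketch: <code>coverage_interval()</code> is a hypothetical helper, not part of pandas, and the data below are the integers 1 to 100 rather than the Galton heights.</p>

```python
import pandas as pd

def coverage_interval(series, level):
    """Hypothetical helper: return (lower, upper) bounds of the central
    `level` coverage interval of a pandas Series, e.g. level=0.95 gives
    the 2.5th and 97.5th percentiles."""
    tail = (1 - level) / 2  # probability left out in each tail
    return tuple(series.quantile(q) for q in [tail, 1 - tail])

s = pd.Series(range(1, 101))  # toy data: 1, 2, ..., 100
fifty = coverage_interval(s, 0.50)
ninety_five = coverage_interval(s, 0.95)
print(fifty, ninety_five)
```

<p>Called as <code>coverage_interval(gal.height, 0.95)</code>, it should reproduce the list-comprehension result above.</p>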
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Another simple way to compute different coverage intervals is to provide a <code>percentile_width</code> argument to the <code>describe()</code> function:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [12]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">describe</span><span class="p">(</span><span class="n">percentile_width</span><span class="o">=</span><span class="mi">50</span><span class="p">)[[</span><span class="mi">4</span><span class="p">,</span><span class="mi">6</span><span class="p">]]</span> <span class="c"># Subset to return only the coverage interval values</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[12]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>25% 64.0
75% 69.7
Name: height, dtype: float64</pre>
</div>
</div>
</div>
</div>
</div>
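<p>The <code>percentile_width</code> argument belongs to older pandas releases; it was deprecated and later removed. In newer versions, <code>describe()</code> takes a <code>percentiles</code> list of probabilities instead (the median is always included). The sketch below, on hypothetical data, also selects the interval endpoints by label rather than by position, which avoids relying on the order of the summary rows.</p>

```python
import pandas as pd

# Hypothetical heights, for illustration only
s = pd.Series([60.0, 62.5, 63.0, 64.0, 66.5, 68.0, 69.7, 71.0, 73.0])

# Request the 25th and 75th percentiles (50% is added automatically)
summary = s.describe(percentiles=[0.25, 0.75])

# Select the coverage interval endpoints by their labels
interval = summary[["25%", "75%"]]
print(interval)
```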
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The interquartile range is the width of the 50 percent coverage interval, that is, the difference between the 75th and 25th percentiles:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [13]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">diff</span><span class="p">([</span><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">quantile</span><span class="p">(</span><span class="n">q</span><span class="p">)</span> <span class="k">for</span> <span class="n">q</span> <span class="ow">in</span> <span class="p">[</span><span class="o">.</span><span class="mi">25</span><span class="p">,</span> <span class="o">.</span><span class="mi">75</span><span class="p">]])</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[13]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>array([ 5.7])</pre>
</div>
</div>
</div>
</div>
</div>
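<p><code>np.diff()</code> returns a one-element array. If a plain number is more convenient, subtracting the two quantiles directly gives the same value, and SciPy (version 0.18 and later) provides <code>scipy.stats.iqr()</code> for the same calculation. A minimal sketch on the toy values 1 to 100:</p>

```python
import numpy as np
import pandas as pd
from scipy import stats as st

s = pd.Series(range(1, 101))  # toy data, not the Galton heights

# Subtracting the two quantiles gives a scalar
iqr_manual = s.quantile(0.75) - s.quantile(0.25)
# The np.diff approach used above gives a one-element array
iqr_diff = np.diff([s.quantile(q) for q in [0.25, 0.75]])
# scipy.stats.iqr uses the 25th and 75th percentiles by default
iqr_scipy = st.iqr(s.values)
print(iqr_manual, iqr_diff, iqr_scipy)
```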
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Simple-Statistical-Graphics">Simple Statistical Graphics<a class="anchor-link" href="#Simple-Statistical-Graphics">¶</a></h2><p>There are several basic types of statistical graphics for displaying the distribution of a variable: histograms, density plots, and boxplots. In terms of syntax, these work much like <code>mean()</code>, <code>quantile()</code> and so on, but there are a few important additional items to consider. First, most of the plotting functions we use will come from the ‘matplotlib’ Python library, which integrates nicely with numpy and other scientific Python libraries. There are several ways to interact with ‘matplotlib’, including the <code>pyplot</code> interface (a submodule of ‘matplotlib’ that is useful for ‘scripting’) and the <code>pylab</code> interface, which is useful for interactive plotting.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note right shadow">
If you prefer to use the `pyplot` interface, remember to import it in the usual way at the beginning of your Python session: `import matplotlib.pyplot as plt`
</span></p>
</div>
</div>
</div>
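<p>For comparison with the <code>pylab</code> commands used below, here is a sketch of the same kind of histogram through the <code>pyplot</code> interface. The heights are simulated stand-in values, not the Galton data, and the non-interactive <code>"Agg"</code> backend and the output file name are assumptions made so the script also runs outside a notebook.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw to a file, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
heights = rng.normal(loc=66, scale=4, size=500)  # simulated stand-in values

fig, ax = plt.subplots()
# ax.hist returns the bin counts, the bin edges and the drawn bars
counts, edges, patches = ax.hist(heights, bins=10)
ax.set_xlabel("height (inches)")
ax.set_ylabel("count")
fig.savefig("height_hist.png")  # hypothetical output file name
plt.close(fig)
```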
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note left shadow">
If you want interactive plotting (and why wouldn’t you?!), start ‘IPython’ with the <code>--pylab</code> flag. With this flag enabled, you don’t have to <code>import matplotlib</code> yourself, as it is done for you automatically:</p>
<pre><code>ipython --pylab</code></pre>
<p>Or, for inline figures in an IPython qtconsole or notebook, use:</p>
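<pre><code>ipython notebook --pylab inline</code></pre>
<p>In more recent IPython and Jupyter releases the <code>--pylab</code> flag is deprecated; the recommended alternative is to run the <code>%matplotlib inline</code> magic in the notebook and import what you need explicitly (for example, <code>import matplotlib.pyplot as plt</code>).</p>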
<p></span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Histograms-and-Distributions">Histograms and Distributions<a class="anchor-link" href="#Histograms-and-Distributions">¶</a></h3><p>Constructing a histogram involves dividing the range of a variable into bins and counting how many cases fall into each bin. This is done almost entirely automatically by calling the <code>hist()</code> method on a column of a data frame:</p>
</div>
</div>
</div>
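<p>The counting step behind a histogram is simple enough to write out by hand. The sketch below uses a hypothetical helper, <code>count_in_bins()</code>, that splits the range into equal-width bins and tallies the values, then compares the result with <code>np.histogram()</code>. Like NumPy, it treats each bin as half-open on the right except the last, which includes the maximum.</p>

```python
import numpy as np

def count_in_bins(values, n_bins):
    """Hypothetical helper: divide the range of `values` into `n_bins`
    equal-width bins and count how many values fall into each one.
    Bins are [left, right) except the last, which includes its right edge."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    counts = np.zeros(n_bins, dtype=int)
    for v in values:
        # Index of the last edge <= v gives the bin number
        i = np.searchsorted(edges, v, side="right") - 1
        # The maximum lands past the last bin, so clip it into the final bin
        counts[min(i, n_bins - 1)] += 1
    return counts, edges

small_counts, small_edges = count_in_bins([1, 2, 2, 3, 3, 3], 3)
print(small_counts)  # [1 2 3]

rng = np.random.default_rng(1)
data = rng.normal(size=1000)  # simulated data
my_counts, my_edges = count_in_bins(data, 10)
np_counts, np_edges = np.histogram(data, bins=10)
```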
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [14]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">h</span> <span class="o">=</span> <span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">hist</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAXgAAAD9CAYAAAC2l2x5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAGQtJREFUeJzt3W1sU+fdx/Gf20SbbkEIKWBozG4jBQqBkGRlKZrEmg4C
olqz0CK6dOsChGoq2jQEEkNa6eikkvBqPGyV0AQi2iQe3vCgWyLK0Djdg0bddfCmQYVqyZbmibIk
JcBKgJz7BY1tCibxSezris/3I0VwHDv/vy/O9Y/zs2MCruu6AgBknMdMNwAASA0GPABkKAY8AGQo
BjwAZCgGPABkKAY8AGSoRw74trY2Pffcc5o/f74WLFigvXv3SpJ27NihUCik0tJSlZaW6vTp09Hb
1NXVafbs2Zo7d66amppS2z0AIKHAo14H39XVpa6uLpWUlOj69et6+umndeLECR07dkwTJ07U5s2b
77t+c3OzXnnlFb3//vtqb2/XsmXLdOnSJT32GD8oAEC6PXLyTp8+XSUlJZKkCRMmaN68eWpvb5ck
Pez7wsmTJ1VdXa3s7GyFw2EVFBQoEomkoG0AwHBG/NC6tbVV58+f1+LFiyVJ+/btU3FxsWpra9XX
1ydJ6ujoUCgUit4mFApFvyEAANIrayRXun79ulavXq09e/ZowoQJev311/Xmm29KkrZv364tW7bo
wIEDD71tIBAY0WUAgOEl8+4ywz6Cv337tl566SX94Ac/UFVVlSRp2rRpCgQCCgQC2rBhQzSGyc/P
V1tbW/S2n3zyifLz8xM2yYerX/ziF8Z7sOWDtWAtWItHfyTrkQPedV3V1taqsLBQmzZtil7e2dkZ
/fvx48dVVFQkSaqsrNSRI0c0MDCglpYWXb58WWVlZUk35Setra2mW7AGaxHDWsSwFt49MqL561//
qt///vdauHChSktLJUk7d+7U4cOHdeHCBQUCAc2aNUv79++XJBUWFmrNmjUqLCxUVlaW3nnnHeIY
ADDkkS+TTFnRQMDTjxuZyHEclZeXm27DCqxFDGsRw1rEJDs7GfAAME4kOzv5DSTDHMcx3YI1WIsY
1iKGtfCOAQ8AGYqIBgDGCSIaAIAkBrxx5IsxrEUMaxHDWnjHgAeADEUGDwDjBBk8AEASA9448sUY
1iKGtYhhLbxjwANAhiKDB4BxggweACCJAW8c+WIMaxHDWsSwFt4x4AEgQ5HBA8A4QQYPAJDEgDeO
fDGGtYhhLWJYC+8Y8ACQocjgAWCcIIMHAEhiwBtHvhjDWsSwFjGshXcMeADIUGTwADBOkMEDACQx
4I0jX4xhLWJYixjWwjsGPABkKDJ4ABgnyOABAJIY8MbZnC/m5OQpEAik/SMnJ8/0XTfO5vMi3VgL
7xjwSKi/v1eSm8aPs5LcL+oCGC0yeCQUCAR0b/CmvTLnB/AQZPAAAEkMeOPIF+M5phuwBudFDGvh
HQMeADIUGTwSIoMH7DKmGXxbW5uee+45zZ8/XwsWLNDevXslST09PaqoqNCcOXO0fPly9fX1RW9T
V1en2bNna+7cuWpqavJ4NwAAo/XIAZ+dna1f/epX+vDDD3Xu3Dn95je/0cWLF1VfX6+KigpdunRJ
S5cuVX19vSSpublZR48eVXNzsxobG7Vx40YNDg6m5Y6MV+SL8RzTDViD8yKGtfDukQN++vTpKikp
kSRNmDBB8+bNU3t7u06dOqWamhpJUk1NjU6cOCFJOnnypKqrq5Wdna1wOKyCggJFIpEU3wUAwMNk
jfSKra2tOn/+vJ555hl1d3crGAxKkoLBoLq7uyVJHR0dWrx4cfQ2oVBI7e3tD/16a9euVTgcliTl
5uaqpKRE5eXlkmLfsf1wXF5eblU/8ccxQ8flaTq+14Pp+2/6eIgt/Zg6HrrMln7Seew4jg4dOiRJ
0XmZjBE9yXr9+nU9++yz2r59u6qqqjR58mT19sZ+2zAvL089PT36yU9+osWLF+v73/++JGnDhg16
/vnn9eKLL95flCdZxwWeZAXsMua/6HT79m299NJLevXVV1VVVSXp3qP2rq4uSVJnZ6emTZsmScrP
z1dbW1v0tp988ony8/OTugN+8+CjZT9zTDdgDc6LGNbCu0cOeNd1VVtbq8LCQm3atCl6eWVlpRoa
GiRJDQ0N0cFfWVmpI0eOaGBgQC0tLbp8+bLKyspS2D4AIJFHRjR/+ctf9K1vfUsLFy784sf1ey+D
LCsr05o1a/Tvf/9b4XBYx44dU25uriRp586dOnjwoLKysrRnzx6tWLHiwaJENOMCEQ1gl2RnJ7/o
hIQY8IBdeLOxcYZ8MZ5jugFrcF7EsBbeMeABIEMR0SAhIhrALkQ0AABJDHjjyBfjOaYbsAbnRQxr
4R0DHgAyFBk8EiKDB+xCBg8AkMSAN458MZ5jugFrcF7EsBbeMeABIEORwSMhMnjALmTwAABJDHjj
yBfjOaYbsAbnRQxr4R0DHgAyFBk8EiKDB+xCBg8AkMSAN458MZ5jugGjcnLyFAgEjHzk5OSZvvsJ
sUe8yzLdAPCgrOh/EZluEydO1rVrPUZq9/f3KhaJOZLK01jbzHojtcjgkZDJDN5M3Xu1TZ2b5tZb
4nmP8YEMHgAgiQFvHPliPMd0AxZxTDdgDfaIdwx4AMhQZPBIiAw+zZXJ4DEMMngAgCQGvHHki/Ec
0w1YxDHdgDXYI97xOnjgPuZegw+MNTJ4JOTXDN6vtdmT9iODBwBIYsAbR74YzzHdgEUc0w1Ygz3i
HQMeADIUGTwSIoP3V232pP3I4AEAkhjwxpEvxnNMN2ARx3QD1mCPeMeAB4AMRQaPhMjg/VWbPWm/
Mc/g169fr2AwqKKiouhlO3bsUCgUUmlpqUpLS3X69Ono5+rq6jR79mzNnTtXTU1NSbYPABgrww74
devWqbGx8b7LAoGANm/erPPnz+v8+fNauXKlJKm5uVlHjx5Vc3OzGhsbtXHjRg0ODqam8wxBvhjP
Md2ARRzTDViDPeLdsAN+yZIlmjx58gOXP+zHhJMnT6q6ulrZ2dkKh8MqKChQJBIZm04BAEnx/CTr
vn37VFxcrNraWvX19UmSOjo6FAqFotcJhUJqb28ffZcZrLy83HQLFik33YBFyk03YA32iHee3k3y
9ddf15tvvilJ2r59u7Zs2aIDBw489LqJ3plv7dq1CofDkqTc3FyVlJRE/yGHfiTj2OxxzNBxeZqO
hy5LVz1bjjXM59NT35bzj+NyOY6jQ4cOSVJ0XibFHYGWlhZ3wYIFw36urq7Orauri35uxYoV7rlz
5x64zQjL+sLZs2dNt5CQJFdy0/hx9os/0103/sOW2mfTXttWNu+RdEv238lTRNPZ2Rn9+/Hjx6Ov
sKmsrNSRI0c0MDCglpYWXb58WWVlZV5KAABGadjXwVdXV+vdd9/V1atXFQwG9dZbb8lxHF24cEGB
QECzZs3S/v37FQwGJUk7d+7UwYMHlZWVpT179mjFihUPFuV18OMCr4P3V232pP2SnZ38ohMSYsD7
qzZ70n682dg4w2t84zmmG7CIY7oBa7BHvGPAA0CGIqJBQkQ0/qrNnrQfEQ0AQBID3jjyxXiO6QYs
4phuwBrsEe8Y8ACQocjgkRAZvL9qsyftRwYPAJDEgDeOfDGeY7oBizimG7AGe8Q7BjwAZCgyeCRE
Bu+v2uxJ+5HBAwAkMeCNI1+M55huwCKO6QaswR7xjgEPABmKDB4JkcH7qzZ70n5k8AAASQx448gX
4zmmG7CIY7oBa7BHvGPAA0CGIoNHQmTw/qrNnrQfGTwAQBID3jjyxXiO6QYs4phuwBrsEe8Y8ACQ
ocjgkRAZvL9qsyftRwYPAJDEgDeOfDGeY7oBizimG7AGe8Q7BjwAZCgyeCREBu+v2uxJ+5HBAwAk
MeCNI1+M55huwCKO6QaswR7xjgEPABmKDB4JkcH7qzZ70n5k8AAASQx448gX4zmmG7CIY7oBa7BH
vGPAA0CGIoNHQmTw/qrNnrQfGTwAQNIIBvz69esVDAZVVFQUvaynp0cVFRWaM2eOli9frr6+vujn
6urqNHv2bM2dO1dNTU2p6TqDkC/Gc0w3YBHHdAPWYI94N+yAX7dunRobG++7rL6+XhUVFbp06ZKW
Ll2q+vp6SVJzc7OOHj2q5uZmNTY2auPGjRocHExN5wCARxp2wC9ZskSTJ0++77JTp06ppqZGklRT
U6MTJ05Ikk6ePKnq6mplZ2crHA6roKBAkUgkBW1njvLyctMtWKTcdAMWKTfdgDXYI955yuC7u7sV
DAYlScFgUN3d3ZKkjo4OhUKh6PVCoZDa29vHoE0AQLKyRvsFAoHAF6+2SPz5h1m7dq3C4bAkKTc3
VyUlJdHv1EOZmx+O4/NFG/qJP44ZOi5P8fGXL0t1PduOFXd8QdImI/VtOf+Gjnfv3u3r+XDo0CFJ
is7LpLgj0NLS4i5YsCB6/NRTT7mdnZ2u67puR0eH+9RTT7mu67p1dXVuXV1d9HorVqxwz50798DX
G2FZXzh79qzpFhKS5EpuGj/OfvFnuuvGf9hS+2zaa9vK5j2Sbsn+O3mKaCorK9XQ0CBJamhoUFVV
VfTyI0eOaGBgQC0tLbp8+bLKysq8lPAN8sV45aYbsEi56QaswR7xbtiIprq6Wu+++66uXr2qmTNn
6pe//KW2bdumNWvW6MCBAwqHwzp27JgkqbCwUGvWrFFhYaGysrL0zjvvPDK+AQCkDr/JapjjONY+
Qkn/b7I6uvfI1b+/TRqr7Si9j+Lt3ZM275F04zdZAQCSeARvvZycPPX39xrswMS/ky2Pov1Vmz1p
v2RnJwPecube8EsyN3D8eJ/N12ZP2o+IZpzhfTbiOaYbsIhjugFrsEe8Y8ADQIYiorEcEQ2101Wb
PWk/IhoAgCQGvHHki/Ec0w1YxDHdgDXYI94x4AEgQ5HBW44Mntrpqs2etB8ZPABAEgPeOPLFeI7p
BizimG7AGuwR7xjwAJChyOAtRwZP7XTVZk/ajwweACCJAW8c+WI8x3QDFnFMN2AN9oh3DHgAyFBk
8JYjg6d2umqzJ+1HBg8AkMSAN458MZ5jugGLOKYbsAZ7xDsGPABkKDJ4y5HBUztdtdmT9iODBwBI
YsAbR74YzzHdgEUc0w1Ygz3iHQMeADIUGbzlyOCpna7a7En7kcEDACQx4I0jX4znmG7AIo7pBqzB
HvGOAQ8AGYoM3nJk8NROV232pP3I4AEAkhjwxpEvxnNMN2ARx3QD1mCPeMeAB4AMRQZvOTJ4aqer
NnvSfmTwAABJDHjjyBfjOaYbsIhjugFrsEe8yxrNjcPhsHJycvT4448rOztbkUhEPT09evnll/Wv
f/1L4XBYx44dU25u7lj1CwAYoVFl8LNmzdIHH3ygvLy86GVbt27VlClTtHXrVu3atUu9vb2qr6+/
vygZ/IiRwVM7XbXZk/ZLewb/5WKnTp1STU2NJKmmpkYnTpwYbQkAgAejimgCgYCWLVumxx9/XD/6
0Y/02muvqbu7W8FgUJIUDAbV3d390NuuXbtW4XBYkpSbm6uSkhKVl5dLimVufjiOzxcTXT+Wx6b7
WMN8PhX14i9LdT3bjhV3fEHSJiP1bdofkrR7925fz4dDhw5JUnReJmNUEU1nZ6dmzJihTz/9VBUV
Fdq3b58qKyvV29sbvU5eXp56enruL0pEE+U4Ttwgf5C/IhpH94aOn+5zotqOYgM4PbVt3ZPD7RE/
SWtEM2PGDEnS1KlTtWrVKkUiEQWDQXV1dUm69w1g2rRpoymR8Thx45WbbsAi5aYbsAZ7xDvPA/7m
zZvq7++XJN24cUNNTU0qKipSZWWlGhoaJEkNDQ2qqqoam04BAEnxPOC7u7u1ZMkSlZSU6JlnntF3
vvMdLV++XNu2bdMf/vAHzZkzR3/84x+1bdu2sew34/Aa33iO6QYs4phuwBrsEe88P8k6a9YsXbhw
4YHL8/LydObMmVE1BQAYPd6LxnL+epLVdF1/12ZP2o/3ogEASGLAG0e+GM8x3YBFHNMNWIM94h0D
HgAyFBm85cjgqZ0e2ZLuGKk8ceJkXbvWM/wVkfTsHNVbFQDIFHdk6ptLf3/ASF0/IKIxjHwxnmO6
AYs4phuwBnvEOwY8AGQoMnjLkcFT2w+1mQcjw+vgAQCSGPDGkS/Gc0w3YBHHdAPWYI94x4AHgAxF
Bm85Mnhq+6E282BkyOABAJIY8MaRL8ZzTDdgEcd0A9Zgj3jHgAeADEUGPwI5OXnq7+8d/oop47dc
1r9ZtF9rj6d5YFKys5MBPwL+fKLTZG0/3md/1x5P88AknmQddxzTDVjEMd2ARRzTDViDDN47BjwA
ZCgimhEgovFLXWqbqj2e5oFJRDQAAEkMeAs4phuwiGO6AYs4phuwBhm8dwx4AMhQZPAjQAbvl7rU
NlV7PM0Dk8jgAQCSGPAWcEw3YBHHdAMWcUw3YA0yeO8Y8ACQocjgR4AM3i91qW2q9niaByaRwQMA
JDHgLeCYbsAijukGLOKYbsAaZPDeMeABIEORwY8AGbxf6lLbVO3xNA9MSnZ2ZqWwlzEzODioS5cu
aXBw0HQrAMZc1hcPotJr4sTJunatJ+1102lcDPizZ89q5cpKffWr/5v22rdufZriCo6k8hTXGC8c
sRZDHPlnLe7o0T89OErFWvT3p/+bSrqlZMA3NjZq06ZNunv3rjZs2KCf/exno/p6t2/f1v/8zxJ9
9lnjGHWYjLclvZHCr39B/tnIw2EtYliLGNbCqzF/kvXu3bv68Y9/rMbGRjU3N+vw4cO6ePHiWJfJ
IH2mG7AIaxHDWsSwFl6N+YCPRCIqKChQOBxWdna2vve97+nkyZNjXQYARule9m/iIycnL033cIy1
t7dr5syZ0eNQKKT33ntvVF/zscce0+eff6BJk14YbXtJ+/zzy7p1K5UVWlP5xceZVtMNWKTVdAMW
aU3R1x0u+0+ddOX/Yz7gR/psuJdnzW/d+r+kbzN2UvkP0mCw9nDSXXtoLfx0nxPVHu68SGXtdBuu
dqrWwtx9Tscrh8Z8wOfn56utrS163NbWplAodN91eM0rAKTemGfwixYt0uXLl9Xa2qqBgQEdPXpU
lZWVY10GADCMMX8En5WVpV//+tdasWKF7t69q9raWs2bN2+sywAAhpGS96JZuXKlPvroI3388cfa
v3+/Fi5cqNLSUpWVlUmSduzYoVAopNLSUpWWlqqx0cTr29Ovr69Pq1ev1rx581RYWKj33ntPPT09
qqio0Jw5c7R8+XL19fnjJWFfXotz58758rz46KOPove3tLRUkyZN0t69e315XjxsLfbs2ePL80KS
6urqNH/+fBUVFemVV17RrVu3kj4vUv5eNLNmzdIHH3ygvLzYy4LeeustTZw4UZs3b05laevU1NTo
2Wef1fr163Xnzh3duHFDb7/9tqZMmaKtW7dq165d6u3tVX19velWU+5ha7F7925fnhdDBgcHlZ+f
r0gkon379vnyvBgSvxYHDx703XnR2tqqb3/727p48aK+8pWv6OWXX9bzzz+vDz/8MKnzIi3vJvmw
7yF+e6L1s88+05///GetX79e0r0oa9KkSTp16pRqamok3Rt6J06cMNlmWiRaC8l/50W8M2fOqKCg
QDNnzvTleREvfi1c1/XdeZGTk6Ps7GzdvHlTd+7c0c2bN/Xkk08mfV6kfMAHAgEtW7ZMixYt0m9/
+9vo5fv27VNxcbFqa2t98eNnS0uLpk6dqnXr1unrX/+6XnvtNd24cUPd3d0KBoOSpGAwqO7ubsOd
pt7D1uLmzZuS/HdexDty5Iiqq6slyZfnRbz4tQgEAr47L/Ly8rRlyxZ97Wtf05NPPqnc3FxVVFQk
f164KdbR0eG6ruteuXLFLS4udv/0pz+53d3d7uDgoDs4OOj+/Oc/d9evX5/qNox7//333aysLDcS
ibiu67o//elP3TfeeMPNzc2973qTJ0820V5aPWwttm/f7l65csV358WQW7duuVOmTHGvXLniuq7r
y/NiyJfXwo/z4uOPP3bnzZvnXr161b19+7ZbVVXl/u53v0v6vEj5I/gZM2ZIkqZOnapVq1YpEolo
2rRp0V/Z3bBhgyKRSKrbMC4UCikUCukb3/iGJGn16tX6xz/+oenTp6urq0uS1NnZqWnTpplsMy0S
rcXUqVN9d14MOX36tJ5++mlNnTpV0r1HZ347L4Z8eS38OC/+/ve/65vf/KaeeOIJZWVl6cUXX9Tf
/va3pOdFSgf8zZs31d/fL0m6ceOGmpqaVFRUFG1Qko4fP66ioqJUtmGF6dOna+bMmbp06ZKkexnj
/Pnz9cILL6ih4d5v6TU0NKiqqspkm2mRaC38eF4MOXz4cDSSkKTKykrfnRdDvrwWnZ2d0b/75byY
O3euzp07p//+979yXVdnzpxRYWFh8vMihT9luP/85z/d4uJit7i42J0/f767c+dO13Vd99VXX3WL
iorchQsXut/97nfdrq6uVLZhjQsXLriLFi1yFy5c6K5atcrt6+tz//Of/7hLly51Z8+e7VZUVLi9
vb2m20yLL69Fb2+vb8+L69evu0888YR77dq16GV+PS8ethZ+PS927drlFhYWugsWLHB/+MMfugMD
A0mfF0b+yz4AQOrxn24DQIZiwANAhmLAA0CGYsADQIZiwANAhmLAA0CG+n+Zt3Vv6TYhiwAAAABJ
RU5ErkJggg==
"
>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>You can also create a histogram using ‘pylab’ commands directly:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [15]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">h</span> <span class="o">=</span> <span class="n">hist</span><span class="p">(</span><span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">values</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAXgAAAD9CAYAAAC2l2x5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAE8FJREFUeJzt3X9MVff9x/HXsZdsWQR/VLjYe9n3moCBi4B3OtYs6XY7
waZLvMNp6HBrScFm6bJkRpNuy2oHSyr435TFxCx2IVsi9R/BLJGQJrtdt6y9rtNkGU2hGa544V7r
gAqyisr5/mG9VYvgBe498LnPR0Jyufec+/7cD5/7OocPH+6xbNu2BQAwzgqnGwAASA0CHgAMRcAD
gKEIeAAwFAEPAIYi4AHAULMG/ODgoJ588kmVlpZq06ZNOnr0qCSpqalJXq9XgUBAgUBAZ8+eTezT
0tKioqIiFRcXq6enJ7WtBwA8kDXbOvhYLKZYLKbNmzdrYmJCW7ZsUWdnp06dOqXs7Gzt37//nu17
e3u1Z88enTt3TtFoVFVVVerr69OKFfyiAADpNmvy5ufna/PmzZKklStXqqSkRNFoVJI003Ghq6tL
dXV1ysrKks/nU2FhoSKRSAqaDQCYy0OfWl+8eFHnz5/X448/Lklqa2tTRUWFGhsbNTY2JkkaGhqS
1+tN7OP1ehMHBABAerkeZqOJiQnt3r1bR44c0cqVK/Xiiy/qlVdekSQdPHhQBw4c0IkTJ2bc17Ks
h7oPADC3ZD5dZs4z+Bs3bmjXrl36wQ9+oJqaGklSXl6eLMuSZVnau3dvYhrG4/FocHAwse+lS5fk
8Xge2Ei+bP3yl790vA1L5Yu+oC/oi9m/kjVrwNu2rcbGRvn9fu3bty9x//DwcOL26dOnVVZWJkkK
hULq6OjQ1NSUBgYG1N/fr8rKyqQbBQBYuFmnaP7617/qD3/4g8rLyxUIBCRJhw4d0smTJ3XhwgVZ
lqUNGzbo+PHjkiS/36/a2lr5/X65XC4dO3aM6RgAcMisyyRTVtSy5vXrhonC4bCCwaDTzVgS6IvP
0BefoS8+k2x2EvAAsEwkm538BxIAGIqABwBDEfAAYCgCHgAMRcADgKEIeAAwFAEPAIYi4AHAUAQ8
ABiKgAcAQxHwAGAoAh4ADEXAA4ChCHgAMBQBDwCGIuABwFAEPAAYioAHAEMR8ABgKAIeAAxFwAOA
oQh4ADAUAQ8AhiLgAcBQBDwAGIqABwBDEfAAYCgCHgAM5XK6AVi6cnLWanx8NO11s7PX6OrVkbTX
BUxj2bZtp72oZcmBskiSZVmSnPg5MT6AmSSbnUzRAIChCHgAMBQBDwCGIuABwFAEPAAYataAHxwc
1JNPPqnS0lJt2rRJR48elSSNjIyourpaGzdu1Pbt2zU2NpbYp6WlRUVFRSouLlZPT09qWw8AeKBZ
l0nGYjHFYjFt3rxZExMT2rJlizo7O/W73/1O69at00svvaTDhw9rdHRUra2t6u3t1Z49e3Tu3DlF
o1FVVVWpr69PK1bcexxhmeTywDJJYGlZ1GWS+fn52rx5syRp5cqVKikpUTQa1ZkzZ1RfXy9Jqq+v
V2dnpySpq6tLdXV1ysrKks/nU2FhoSKRyHxfCwBgAR76P1kvXryo8+fP62tf+5ri8bjcbrckye12
Kx6PS5KGhob0+OOPJ/bxer2KRqMzPl9TU1PidjAYVDAYnEfzAcBc4XBY4XB43vs/VMBPTExo165d
OnLkiLKzs+95zLKsT3+Vn9mDHrs74AEAn3f/yW9zc3NS+8+5iubGjRvatWuXnn32WdXU1Ei6fdYe
i8UkScPDw8rLy5MkeTweDQ4OJva9dOmSPB5PUg0CACyOWQPetm01NjbK7/dr3759iftDoZDa29sl
Se3t7YngD4VC6ujo0NTUlAYGBtTf36/KysoUNh8A8CCzrqL5y1/+om984xsqLy9PTLW0tLSosrJS
tbW1+vDDD+Xz+XTq1CmtXr1aknTo0CG99tprcrlcOnLkiJ566qnPF2UVzbLAKhpgaUk2O/k0STwQ
AQ8sLXyaJABAEgEPAMYi4AHAUAQ8ABiKgAcAQxHwAGAoAh4ADEXAA4ChCHgAMBQBDwCGIuABwFAE
PAAYioAHAEMR8ABgKAIeAAxFwAOAoR7qotsAUi8nZ63Gx0cdqZ2dvUZXr444UhupwxWd8EDOXdEp
S9JNB+o6G3TO9bfEVbSWh2SzkzN4LEE35VTQjY9bjtQFUoE5eAAwFAEPAIYi4AHAUAQ8ABiKgAcA
QxHwAGAolkkC93B9uh4dWP4IeOAezq3BlziwYHExRQMAhiLgAcBQBDwAGIqABwBDEfAAYCgCHgAM
RcADgKEIeAAw1JwB39DQILfbrbKyssR9TU1N8nq9CgQCCgQCOnv2bOKxlpYWFRUVqbi4WD09Palp
NQBgTnNesu+tt97SypUr9dxzz+mf//ynJKm5uVnZ2dnav3//Pdv29vZqz549OnfunKLRqKqqqtTX
16cVK+49jnDJvuXBuUvIOXvpukytzXty6Us2O+c8g3/iiSe0Zs2az90/U5Guri7V1dUpKytLPp9P
hYWFikQiD90YAMDimfccfFtbmyoqKtTY2KixsTFJ0tDQkLxeb2Ibr9eraDS68FYCAJI2rw8be/HF
F/XKK69Ikg4ePKgDBw7oxIkTM277oE/ma2pqStwOBoMKBoPzaQoAGCscDiscDs97/3kFfF5eXuL2
3r17tWPHDkmSx+PR4OBg4rFLly7J4/HM+Bx3BzwA4PPuP/ltbm5Oav95TdEMDw8nbp8+fTqxwiYU
Cqmjo0NTU1MaGBhQf3+/Kisr51MCALBAc57B19XV6c0339SVK1dUUFCg5uZmhcNhXbhwQZZlacOG
DTp+/Lgkye/3q7a2Vn6/Xy6XS8eOHePiCQDgkDmXSaakKMsklwWWSWZWbd6TS9+iL5MEACxPBDwA
GIqABwBDEfAAYCgCHgAMRcADgKEIeAAwFAEPAIYi4AHAUAQ8ABiKgAcAQxHwAGAoAh4ADEXAA4Ch
CHgAMBQBDwCGIuABwFAEPAAYioAHAEMR8ABgKAIeAAxFwAOAoQh4ADAUAQ8AhiLgAcBQBDwAGIqA
BwBDEfAAYCgCHgAMRcADgKEIeAAwFAEPAIYi4AHAUAQ8ABiKgAcAQxHwAGCoOQO+oaFBbrdbZWVl
iftGRkZUXV2tjRs3avv27RobG0s81tLSoqKiIhUXF6unpyc1rQYAzGnOgH/++efV3d19z32tra2q
rq5WX1+ftm3bptbWVklSb2+vXn/9dfX29qq7u1s/+tGPND09nZqWAwBmNWfAP/HEE1qzZs099505
c0b19fWSpPr6enV2dkqSurq6VFdXp6ysLPl8PhUWFioSiaSg2QCAucxrDj4ej8vtdkuS3G634vG4
JGloaEherzexndfrVTQaXYRmAgCS5VroE1iWJcuyZn18Jk1NTYnbwWBQwWBwoU0BAKOEw2GFw+F5
7z+vgHe73YrFYsrPz9fw8LDy8vIkSR6PR4ODg4ntLl26JI/HM+Nz3B3wAIDPu//kt7m5Oan95zVF
EwqF1N7eLklqb29XTU1N4v6Ojg5NTU1pYGBA/f39qqysnE8JAMACzXkGX1dXpzfffFNXrlxRQUGB
fvWrX+lnP/uZamtrdeLECfl8Pp06dUqS5Pf7VVtbK7/fL5fLpWPHjs06fQMASB3Ltm077UUtSw6U
RZJuH5yd+Dk5VTeza/OeXPqSzU7+kxUADLXgVTRIrZyctRofH3W6GQCWIaZoljjnpkkk56YMMvE1
O1+b9+TSxxQNAEASAQ8AxiLgAcBQBDwAGIqABwBDEfAAYCgCHgAMRcADgKEIeAAwFAEPAIYi4AHA
UAQ8ABiKgAcAQxHwAGAoAh4ADEXAA4ChCHgAMBQBDwCGIuABwFAEPAAYioAHAEMR8ABgKAIeAAxF
wAOAoQh4ADAUAQ8AhiLgAcBQBDwAGIqABwBDEfAAYCgCHgAMRcADgKEIeAAwlGshO/t8PuXk5OiR
Rx5RVlaWIpGIRkZG9Mwzz+g///mPfD6fTp06pdWrVy9WewEAD2lBZ/CWZSkcDuv8+fOKRCKSpNbW
VlVXV6uvr0/btm1Ta2vrojQUAJCcBU/R2LZ9z/dnzpxRfX29JKm+vl6dnZ0LLQEAmIcFTdFYlqWq
qio98sgj+uEPf6gXXnhB8XhcbrdbkuR2uxWPx2fct6mpKXE7GAwqGAwupCkAYJxwOKxwODzv/S37
/lPwJAwPD2v9+vX66KOPVF1drba2NoVCIY2Ojia2Wbt2rUZGRu4talmfO/PHzCzLkuRUXzlVOxNf
s/O1eU8ufclm54KmaNavXy9Jys3N1c6dOxWJROR2uxWLxSTdPgDk5eUtpAQAYJ7mHfCTk5MaHx+X
JF27dk09PT0qKytTKBRSe3u7JKm9vV01NTWL01IAQFLmPUUzMDCgnTt3SpJu3ryp73//+/r5z3+u
kZER1dbW6sMPP3zgMkmmaB4eUzTUTldt3pNLX7LZuaA5+Pki4B8eAU/tdNXmPbn0pXUOHgCwdBHw
AGAoAh4ADEXAA4ChFvSfrABM4fr0D/rpl529Rlevjsy9IZJGwAOQdFNOreAZH3fmwJIJmKIBAEMR
8ABgKAIeAAxFwAOAoQh4ADAUAQ8AhiLgAcBQBDwAGIqABwBDEfAAYCg+quAh5OSs1fj46NwbAsAS
whWdHkJmXlXJydqZ+Jozu/ZyygMncUUnAIAkAh4AjEXAA4ChCHgAMBQBDwCGIuABwFAEPAAYioAH
AEMR8ABgKAIeAAxFwAOAoQh4ADAUAQ8AhiLgAcBQBDwAGGpZXPBjenpafX19mp6edropABad69Nr
LqRXdvYaXb06kva66bQsAv5Pf/qTnn46pC9+8f/SXvv69Y/SXhPILDflxMVGxsfTf1BJt5QEfHd3
t/bt26dbt25p7969+ulPf7qg57tx44a+9KUn9PHH3YvUwmS8KunlFD5/WFIwhc+/nIRFX9wRFn1x
R1j0xfws+hz8rVu39OMf/1jd3d3q7e3VyZMn9d577y12GYOEnW7AEhJ2ugFLSNjpBiwhYacbsGwt
esBHIhEVFhbK5/MpKytL3/ve99TV1bXYZQBggW7P/TvxlZOzNk2vcJFFo1EVFBQkvvd6vXrnnXcW
9JwrVqzQJ5+8q1Wrdiy0eUn75JN+Xb+e9rIAUs6ZuX8pffP/ix7wD/vX8Pn81fz69T8mvc/iSeUP
pNnB2nNJd+07fZFJr/lBtecaF6msnW5z1U5VXzj3mtOxcmjRA97j8WhwcDDx/eDgoLxe7z3b2LYz
R00AyCSLPge/detW9ff36+LFi5qamtLrr7+uUCi02GUAAHNY9DN4l8ul3/zmN3rqqad069YtNTY2
qqSkZLHLAADmkJKPKnj66af1/vvv64MPPtDx48dVXl6uQCCgyspKSVJTU5O8Xq8CgYACgYC6u51Y
355+Y2Nj2r17t0pKSuT3+/XOO+9oZGRE1dXV2rhxo7Zv366xsTGnm5kW9/fF22+/nZHj4v3330+8
3kAgoFWrVuno0aMZOS5m6osjR45k5LiQpJaWFpWWlqqsrEx79uzR9evXkx4Xlp3iCfENGzbo3Xff
1dq1ny0Lam5uVnZ2tvbv35/K0ktOfX29vvnNb6qhoUE3b97UtWvX9Oqrr2rdunV66aWXdPjwYY2O
jqq1tdXppqbcTH3x61//OiPHxR3T09PyeDyKRCJqa2vLyHFxx9198dprr2XcuLh48aK+9a1v6b33
3tMXvvAFPfPMM/r2t7+tf/3rX0mNi7R82NhMx5BM+0Prxx9/rLfeeksNDQ2Sbk9lrVq1SmfOnFF9
fb2k26HX2dnpZDPT4kF9IWXeuLjbG2+8ocLCQhUUFGTkuLjb3X1h23bGjYucnBxlZWVpcnJSN2/e
1OTkpB577LGkx0XKA96yLFVVVWnr1q367W9/m7i/ra1NFRUVamxszIhfPwcGBpSbm6vnn39eX/nK
V/TCCy/o2rVrisfjcrvdkiS32614PO5wS1Nvpr6YnJyUlHnj4m4dHR2qq6uTpIwcF3e7uy8sy8q4
cbF27VodOHBAX/7yl/XYY49p9erVqq6uTn5c2Ck2NDRk27ZtX7582a6oqLD//Oc/2/F43J6enran
p6ftX/ziF3ZDQ0Oqm+G4c+fO2S6Xy45EIrZt2/ZPfvIT++WXX7ZXr159z3Zr1qxxonlpNVNfHDx4
0L58+XLGjYs7rl+/bq9bt86+fPmybdt2Ro6LO+7vi0zMiw8++MAuKSmxr1y5Yt+4ccOuqamxf//7
3yc9LlJ+Br9+/XpJUm5urnbu3KlIJKK8vLzEv+zu3btXkUgk1c1wnNfrldfr1Ve/+lVJ0u7du/WP
f/xD+fn5isVikqTh4WHl5eU52cy0eFBf5ObmZty4uOPs2bPasmWLcnNzJd0+O8u0cXHH/X2RiXnx
97//XV//+tf16KOPyuVy6bvf/a7+9re/JZ0XKQ34yclJjY+PS5KuXbumnp4elZWVJRooSadPn1ZZ
WVkqm7Ek5Ofnq6CgQH19fZJuzzGWlpZqx44dam9vlyS1t7erpqbGyWamxYP6IhPHxR0nT55MTElI
UigUyrhxccf9fTE8PJy4nSnjori4WG+//bb+97//ybZtvfHGG/L7/cnnRQp/y7D//e9/2xUVFXZF
RYVdWlpqHzp0yLZt23722WftsrIyu7y83P7Od75jx2KxVDZjybhw4YK9detWu7y83N65c6c9NjZm
//e//7W3bdtmFxUV2dXV1fbo6KjTzUyL+/tidHQ0Y8fFxMSE/eijj9pXr15N3Jep42KmvsjUcXH4
8GHb7/fbmzZtsp977jl7amoq6XGR8mWSAABncE1WADAUAQ8AhiLgAcBQBDwAGIqABwBDEfAAYKj/
B9dw2CJD1CkpAAAAAElFTkSuQmCC
"
>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>When constructing a histogram, pandas (which hands the plotting to matplotlib) uses 10 equal-width bins by default. If you like, you can control the number of bins yourself with the <code>bins</code> argument. For instance:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [16]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">h</span> <span class="o">=</span> <span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">bins</span><span class="o">=</span><span class="mi">25</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAXgAAAD9CAYAAAC2l2x5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAGURJREFUeJzt3X1sU9f9x/FPqrCp0gghQJxBqDwNopA0kJSHadW2hjGn
WiVYShkblbpQ6DRpWlW26sfYumoP0oZZp4k+qP90XbH6xyj/NKWThrpoPYxKhbSjdNP6kD4QQWli
1CUpgTBo4P7+yGIbCqnta/sc3/t+SZZ6nXN9vzu755vLx851hed5ngAAgXON7QIAAMVBgweAgKLB
A0BA0eABIKBo8AAQUDR4AAioKRv8pk2bFIlE1NLSknpuaGhIsVhMDQ0N6ujo0MjISOpn27dv18KF
C9XY2KjnnnuueFUDAD7RlA3+zjvv1L59+y55Lh6PKxaLqa+vT6tWrVI8Hpckvfbaa3rqqaf02muv
ad++ffr+97+vixcvFq9yAMCUpmzwX/7ylzVz5sxLntu7d6+6urokSV1dXeru7pYkPfPMM9qwYYOm
TZumaDSqBQsWqLe3t0hlAwA+Sc4ZfDKZVCQSkSRFIhElk0lJ0vvvv6/6+vrUuPr6ep04caJAZQIA
clXpZ+eKigpVVFRM+fNsngMAfLJc7yyT8xV8JBLR4OCgJGlgYEC1tbWSpHnz5un48eOpce+9957m
zZt31SJ5ePr5z39uvQZXHswFc8FcTP3IR84Nfs2aNUokEpKkRCKhzs7O1PO7d+/W+fPndfToUb31
1ltasWJFXkWFRX9/v+0SnMFcpDEXacyFP1NGNBs2bND+/fv1wQcfaP78+frVr36lbdu2af369Xr8
8ccVjUa1Z88eSVJTU5PWr1+vpqYmVVZW6tFHHyWOAQCLKrx8r/3zPWBFRd7/3AgaY4za29ttl+EE
5iKNuUhjLtLy6Z00eAAoA/n0Tm5VYJExxnYJzmAu0piLNObCHxo8AAQUEQ0AlAEiGgBACg3eIvLF
NOYijblIYy78ocEDQECRwQNAGSCDRyBUVdWkbmSXzaOqqsZ2yYCTaPAWkS+mZc7F6OiwJC/rx8T4
4OC8SGMu/KHBA0BAkcHDORM3qcvlHOGcQvCRwQMAUmjwFpEvpjEXacxFGnPhDw0eAAKKDB7OIYMH
Po4MHgCQQoO3iHwxjblIYy7SmAt/aPAAEFBk8Ci6qqqaPP7aNHwZfK7zNH36TJ06NVTEiuASvpMV
TsrnTdMwNnjeXMZUeJO1zJAvZjK2C3CIsV2AM1gj/tDgASCgiGhQdEQ02SGiwVSIaAAAKTR4i8gX
MxnbBTjE2C7AGawRf2jwABBQZPAoOjL47JDBYypk8ACAFBq8ReSLmYztAhxibBfgDNaIPzR4AAgo
MngUHRl8dsjgMRUyeABACg3eIvLFTMZ2AVOqqqpRRUVF1o+qqhofRzOFKrvssUb8qbRdAFAOJm7j
m/0/j0dHK4pXDJAlMngUXRAy+FLk42TwmEpJM/jt27erublZLS0tuv3223Xu3DkNDQ0pFoupoaFB
HR0dGhkZyfflAQA+5dXg+/v79dhjj+nw4cP617/+pQsXLmj37t2Kx+OKxWLq6+vTqlWrFI/HC11v
oJAvZjK2C3CIsV2AM1gj/uTV4KuqqjRt2jSNjY1pfHxcY2Njmjt3rvbu3auuri5JUldXl7q7uwta
LAAge3m9yVpTU6N7771X1113na699lrdfPPNisViSiaTikQikqRIJKJkMnnF/Tdu3KhoNCpJqq6u
Vmtrq9rb2yWlf2OHYbu9vd2peoq5nTa53f4J2/mNd63+XI+X6/9+V/7/Ldb25HOu1FPKbWOMdu3a
JUmpfpmrvN5kfeedd7R69WodOHBAM2bM0De/+U3ddtttuvvuuzU8nP7S4JqaGg0NXfqlwLzJGj68
yerOMVC+SvYm68svv6wbb7xRs2bNUmVlpdauXasXX3xRdXV1GhwclCQNDAyotrY2n5cPjY9fHYaZ
sV2AQ4ztApzBGvEnrwbf2NiogwcP6uzZs/I8Tz09PWpqatLq1auVSCQkSYlEQp2dnQUtFgCQvbw/
B//b3/5WiURC11xzjW644Qb94Q9/0OjoqNavX69jx44pGo1qz549qq6uvvSARDShQ0TjzjFQvvLp
nfyhE4qOBu/OMVC+uNlYmSFfzGRsF+AQY7sAZ7BG/KHBIwAqc7oRmP+bgQHlgYgGRVeKiCa38RP7
5HIeEtHANiIa5Ky0t8EFUEo0eItcyBfTt8HN7jExvhhMkV63HBnbBTjDhTVSzmjwABBQZPAh52q2
TAZfnGOgfJHBAwBSaPAWkS9mMrYLcIixXYAzWCP+0OABIKDI4EPO1WyZDL44x0D5IoMHAKTQ4C0i
X8xkbBfgEGO7AGewRvyhwQNAQJHBh1zuue80SeN5HIkM3oVjoHzl0zvz+tJthNm48mmmAEqPiMYi
8sVMxnYBDjG2C3AGa8QfGjwABBQZfMgV/zPq+exDBl+sY6B88Tl4AEAKDd4i8sVMxnYBDjG2C3AG
a8QfGjwABBQZfMiRwWc5mgwelpHBAwBSaPAWkS9mMrYLcIixXYAzWCP+0OABIKDI4EOODD7L0WTw
sIwMHgCQQoO3iHwxk7FdgEOM7QKcwRrxhwYPAAFFBh9yZPBZjnb0vvmspfDgfvCAM7hvPuwjorGI
fDGTsV2AQ4ztApzBGvGHBg8AAUUGH3Jk8FmOdnSeWEvhUdLPwY+MjGjdunVatGiRmpqadOjQIQ0N
DSkWi6mhoUEdHR0aGRnJ9+UBAD7l3eDvuece3XLLLXr99df1z3/+U42NjYrH44rFYurr69OqVasU
j8cLWWvgkC9mMrYLcIixXYAzWCP+5NXgP/zwQx04cECbNm2SJFVWVmrGjBnau3evurq6JEldXV3q
7u4uXKUAgJzklcEfOXJE3/ve99TU1KRXX31VS5cu1c6dO1VfX6/h4WFJkud5qqmpSW2nDkgG7xRX
s2Uy+OzGs5bCo2Sfgx8fH9fhw4f1yCOPaPny5dqyZcvH4piKior/LYqP27hxo6LRqCSpurpara2t
am9vl5T+JxnbpdmeYCS1Z/y3ptjOdfzktj7h53bGZztfxa8nv23b5w/bxds2xmjXrl2SlOqXucrr
Cn5wcFBf/OIXdfToUUnSCy+8oO3bt+vdd9/V888/r7q6Og0MDGjlypV64403Lj0gV/ApxpjLGm3p
uXNlapRuYmG/gje69Bfq1ccHfS25sEZcUbJP0dTV1Wn+/Pnq6+uTJPX09Ki5uVmrV69WIpGQJCUS
CXV2dubz8gCAAsj7c/Cvvvqq7rrrLp0/f16f//zn9cQTT+jChQtav369jh07pmg0qj179qi6uvrS
A3IF7xR3ruBLOX5iH3ev4LMfz1oKj3x6J3/oFHKuNi4afHbjWUvhwRd+lBk+45vJ2C7AIcZ2Ac5g
jfhDgweAgCKiCTlXowcimuzGs5bCg4gGAJBCg7eIfDGTsV2AQ4ztApzBGvGHBg8AAUUGH3KuZstk
8NmNZy2FBxk8ACCFBm8R+WImY7sAhxjbBTiDNeIPDR4AAooMPuRczZbJ4LMbz1oKDzJ4AEAKDd4i
8sVMxnYBDjG2C3AGa8QfGjwABBQZfMi5mi2TwWc3nrUUHmTwQKhUpr77OJtHVVWN7YJRYjR4i8gX
MxnbBTjEZDluXBNX/Nk9RkeHC15psbFG/KHBA0BAkcGHnKvZMhl8ccaz9soXGTwAIIUGbxH5YiZj
uwCHGNsFOIM14g8NHgACigw+5IKSLZPBZzeetVe+yOABACk0eIvIFzMZ2wU4xNguwBmsEX9o8AAQ
UGTwIReUbJkMPrvxrL3yRQYPAEihwVtEvpjJ2C7AIcZ2Ac5gjfhDgweAgCKDD7mgZMtk8NmNZ+2V
LzJ4AEAKDd4i8sVMxnYBDjG2C3AGa8QfGnzAVFXV5PQtPwCCiww+YIqfFbuZLZPBZzeetVe+yOAB
ACm+GvyFCxfU1tam1atXS5KGhoYUi8XU0NCgjo4OjYyMFKTIoCJfzGRsF+AQY7sAZ7BG/PHV4B98
8EE1NTWlstx4PK5YLKa+vj6tWrVK8Xi8IEUCAHKXdwb/3nvvaePGjbrvvvv0+9//Xs8++6waGxu1
f/9+RSIRDQ4Oqr29XW+88calBySDLyoy+Oz3IYNHOSlpBv/DH/5QDzzwgK65Jv0SyWRSkUhEkhSJ
RJRMJvN9eQCAT5X57PTnP/9ZtbW1amtru2pGNtXH8DZu3KhoNCpJqq6uVmtrq9rb2yWlM7cwbGfO
XaFef4KR1J7x35piu9jjJ7eVxc/bcxif6+tfeXy281v8ejK3j0ja4mP/q2+7dP5ns71z585Q94dd
u3ZJUqpf5iqviOanP/2pnnzySVVWVuq///2vTp06pbVr1+qll16SMUZ1dXUaGBjQypUriWimYIy5
rDH7V74RjVG6KYU9ojG69BdqoY5RfmuvGGukXOXTO31/Dn7//v363e9+p2effVZbt27VrFmz9OMf
/1jxeFwjIyMfe6OVBl9c5dvgSzl+Yh93G3yxxk+TNJ5TRdOnz9SpU0M57YPisPY5+MkoZtu2bfrr
X/+qhoYG/e1vf9O2bdsK8fIACmJcE78Qsn+Mjg7bKRUFwV+yWkREk8mIiKb4EU2x56nQiGjS+EtW
AEAKV/ABU75X8KUcP7GPu1fwroyf2If16gau4AEAKTR4i7jPRiZjuwCHGNsFOIM14g8NHgACigw+
YMjgs9+HDD67fVivbiCDBwCk0OAtIl/MZGwX4BBjuwBnsEb8ocEDQECRwQcMGXz2+5DBZ7cP69UN
ZPAAgBQavEXki5mM7QIcYmwX4AzWiD80eAAIKDL4gCGDz34fMvjs9mG9uoEMHgCQQoO3iHwxk7Fd
gEOM7QKcwRrxhwYPAAFFBh8wZPDZ70MGn90+rFc3kMEDAFJo8BaRL2YytgtwiLFdgDNYI/7Q4AEg
oMjgA4YMPvt9yOCz24f16gYyeABACg3eIvLFTMZ2AQ4xtgtwBmvEHxo8AAQUGXzAkMFnvw8ZfHb7
sF7dQAYPoMAqVVFRkfWjqqrGdsHIQIO3iHwxk7FdgEOM7QIyjGviqj+7x+jocEGPzhrxp9J2AWFS
VVWT4wKYJumjYpUDIODI4EsozDmuizWRwRfnGGFd38VGBg8ASKHBW2VsF+AQY7sAhxjbBTiDDN4f
GjwABBQZfAmR47oyfmIfMvjiHCOs67vYyOABACk0eKuM7QIcYkp8vNz+gKe0TImP5y4yeH/yavDH
jx/XypUr1dzcrOuvv14PPfSQJGloaEixWEwNDQ3q6OjQyMhIQYsFCie3P+ABylFeGfzg4KAGBwfV
2tqq06dPa+nSperu7tYTTzyh2bNna+vWrdqxY4eGh4cVj8cvPSAZfC575Dg+n32oyY3xpTgGGXw5
K1kGX1dXp9bWVknSZz7zGS1atEgnTpzQ3r171dXVJUnq6upSd3d3Pi8PACgA37cq6O/v1yuvvKIv
fOELSiaTikQikqRIJKJkMnnFfTZu3KhoNCpJqq6uVmtrq9rb2yWlM7egbqfz1XZdmrVe6ef5bE8+
58r4yW1l8fP2HMbn+vquj8/cPiJpi4/9C7k9+Vz2440xBVsvO3fuDFV/yNw2xmjXrl2SlOqXufL1
McnTp0/rpptu0v3336/Ozk7NnDlTw8Ppe63U1NRoaGjo0gMS0WQ8Y3TpQvrYHgrPP/ON0nPhSk2l
HJ+5j9HU54WNmrIfX8j1nfnLIuxK+jHJjz76SLfddpvuuOMOdXZ2Spq4ah8cHJQkDQwMqLa2Nt+X
D4l22wU4pN12AQ5pt12AM2ju/uTV4D3P0+bNm9XU1KQtW7aknl+zZo0SiYQkKZFIpBo/AKD08opo
XnjhBX3lK1/R4sWLU58R3r59u1asWKH169fr2LFjikaj2rNnj6qrqy89IBFNxjNGRDSTjIhoiGgu
R0STlk/vzOtN1i996Uu6ePHiFX/W09OTz0sCAAqMe9GUEJ+Dd2V8KY4R3prCur6LjXvRAABSaPBW
GdsFOMTYLsAhxnYBzuBeNP7Q4AEgoMjgS4gM3pXxpThGeGsK6/ouNjJ4AEAKDd4qY7sAhxjbBTjE
2C7AGWTw/tDgASCgyOBLiAzelfGlOEZ4awrr+i42MngAQAoN3ipjuwCHGNsFOMTYLsAZZPD+0OAB
IKDI4EuIDN6V8aU4RnhrCuv6LjYyeABACg3eKmO7AIcY2wU4xNguwBlk8P7Q4AEgoMjgS4gM3pXx
pThGWGuaJmk869HTp8/UqVNDOdYUTiX7RicAuLJx5fILYXS0onilgIjGLmO7AIcY2wU4xNguwBlk
8P7Q4AEgoMjgS4gM3pXxpTgGNWU7Pqz9IFd8Dh4AkEKDt8rYLsAhxnYBDjG2C3AGGbw/NHgACCgy
+BIig3dlfCmOQU3Zjg9rP8gVGTwAIIUGb5WxXYBDjO0CHGJsF+AMMnh/aPAAEFBk8CVEBu/K+FIc
g5qyHR/WfpArMngAQAoN3ipjuwCHGNsFOMTYLqCEKlVRUZHTo6qqxnbRZYMGD8CiybtPXu3x/Mee
Gx0dtlNqGQpsBl9VVZPTiVCK+1KTwbsyvhTHoKbijJ/YJ4y5fT69M7ANPp9mWuy6aPCujC/FMaip
OOMn9qHBZ4eIxipjuwCHGNsFOMTYLsAhxnYBZa3gDX7fvn1qbGzUwoULtWPHjkK/fMAcsV2AQ5iL
NOYijbnwo6Bf2XfhwgX94Ac/UE9Pj+bNm6fly5drzZo1WrRoka/Xvfvu/9Pjj/+hQFW6ZMR2AQ5h
LtKYizTmwo+CNvje3l4tWLBA0WhUkvTtb39bzzzzjO8G/847x3X27AOSbstyj6clbc7xKJX/y8hz
MU3SRznuA8CfXNdqrus0n3Wd2z6l+rLxgjb4EydOaP78+ant+vp6HTp0yPfrTpt2ja699lF96lPP
ZDX+/Pn3dfZsrkfJ7cuCJ+TzhlKm/hyPF2T9tgtwSL/tAhzSf4Xncl2r7r0ZXaovGy9og8/2t2ru
V8oTzp59Jcc9cj1OPnX5PUaiwK+fzz6uzFPmXLhSUynHZ+7zSedFvscox3m60lzYrsn/Pvn2wVwU
tMHPmzdPx48fT20fP35c9fX1l4wJ48ebAMCGgn6KZtmyZXrrrbfU39+v8+fP66mnntKaNWsKeQgA
QJYKegVfWVmpRx55RDfffLMuXLigzZs3+36DFQCQn4J/Dv7rX/+63nzzTb399tv6yU9+omg0qsWL
F6utrU0rVqyQJP3iF79QfX292tra1NbWpn379hW6DCeNjIxo3bp1WrRokZqamnTo0CENDQ0pFoup
oaFBHR0dGhkJx8fCLp+LgwcPhvK8ePPNN1P/e9va2jRjxgw99NBDoTwvrjQXDz74YCjPC0navn27
mpub1dLSottvv13nzp3L+bwo+q0KPve5z+kf//iHamrSd4D75S9/qenTp+tHP/pRMQ/tnK6uLt10
003atGmTxsfHdebMGf3617/W7NmztXXrVu3YsUPDw8OKx+O2Sy26K83Fzp07Q3leTLp48aLmzZun
3t5ePfzww6E8LyZlzsUf//jH0J0X/f39+upXv6rXX39dn/70p/Wtb31Lt9xyi/7973/ndF6U5FYF
V/odErY3Wz/88EMdOHBAmzZtkjQRZ82YMUN79+5VV1eXpImm193dbbPMkrjaXEjhOy8y9fT0aMGC
BZo/f34oz4tMmXPheV7ozouqqipNmzZNY2NjGh8f19jYmObOnZvzeVH0Bl9RUaGvfe1rWrZsmR57
7LHU8w8//LCWLFmizZs3h+Kfn0ePHtWcOXN055136oYbbtB3v/tdnTlzRslkUpFIRJIUiUSUTCYt
V1p8V5qLsbExSeE7LzLt3r1bGzZskKRQnheZMueioqIidOdFTU2N7r33Xl133XWaO3euqqurFYvF
cj8vvCJ7//33Pc/zvJMnT3pLlizx/v73v3vJZNK7ePGid/HiRe++++7zNm3aVOwyrHvppZe8yspK
r7e31/M8z7vnnnu8n/3sZ151dfUl42bOnGmjvJK60lzcf//93smTJ0N3Xkw6d+6cN3v2bO/kyZOe
53mhPC8mXT4XYewXb7/9trdo0SLvgw8+8D766COvs7PTe/LJJ3M+L4p+Bf/Zz35WkjRnzhzdeuut
6u3tVW1tberbWe666y719vYWuwzr6uvrVV9fr+XLl0uS1q1bp8OHD6uurk6Dg4OSpIGBAdXW1tos
sySuNhdz5swJ3Xkx6S9/+YuWLl2qOXPmSJq4OgvbeTHp8rkIY794+eWXdeONN2rWrFmqrKzU2rVr
9eKLL+bcL4ra4MfGxjQ6OipJOnPmjJ577jm1tLSkCpSkp59+Wi0tLcUswwl1dXWaP3+++vr6JE1k
jM3NzVq9erUSiYm/1EskEurs7LRZZklcbS7CeF5M+tOf/pSKJCRpzZo1oTsvJl0+FwMDA6n/Dst5
0djYqIMHD+rs2bPyPE89PT1qamrKvV8U8V8Z3rvvvustWbLEW7Jkidfc3Oz95je/8TzP8+644w6v
paXFW7x4sfeNb3zDGxwcLGYZzjhy5Ii3bNkyb/Hixd6tt97qjYyMeP/5z3+8VatWeQsXLvRisZg3
PDxsu8ySuHwuhoeHQ3tenD592ps1a5Z36tSp1HNhPS+uNBdhPS927NjhNTU1eddff733ne98xzt/
/nzO50XJv9EJAFAafKMTAAQUDR4AAooGDwABRYMHgICiwQNAQNHgASCg/h+AKTHGCd5KJwAAAABJ
RU5ErkJggg==
"
>
</div>
</div>
</div>
</div>
</div>
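<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Conceptually, the <code>bins</code> argument decides how the range of the data is cut into equal-width intervals before the cases in each interval are counted. The NumPy sketch below shows that computation on a small made-up sample. The values in <code>heights</code> are illustrative assumptions standing in for <code>gal.height</code>, not the galton data:</p>
<pre><code>import numpy as np

# Hypothetical stand-in for gal.height (inches); not the real galton data.
heights = np.array([62.0, 64.5, 65.0, 66.2, 67.0, 67.5,
                    68.0, 69.3, 70.1, 72.0, 74.0])

# Split [min, max] = [62.0, 74.0] into 5 equal-width bins (2.4 inches each)
# and count how many heights fall into each bin.
counts, edges = np.histogram(heights, bins=5)

print(edges)   # 6 edges: 62.0, 64.4, 66.8, 69.2, 71.6, 74.0
print(counts)  # cases per bin; they sum to len(heights)
</code></pre>
<p>More bins give narrower bars and a more detailed, but noisier, picture of the distribution; fewer bins smooth it out.</p>
</div>
</div>
</div>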
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The horizontal axis of the histogram is always in the units of the variable. For the histograms above, the horizontal axis is in inches because that is the unit of the height variable in the galton dataset. The vertical axis is conventionally drawn in one of two ways, controlled by an
optional argument named <code>density</code> (called <code>normed</code> in matplotlib versions before 2.1; the old name was removed in 3.1):</p>
<ul>
<li>Absolute Frequency or Counts - A simple count of the number of cases that fall into each bin. This is the default, as in:<br>
<code>gal.height.hist()</code></li>
<li>Normalized - The <em>area</em> of each bar gives the proportion of cases that fall into the bin. In other words, the areas can be interpreted as probabilities, and the total area under the histogram is equal to 1. Set the <code>density</code> argument to <code>True</code>, as in:<br>
<code>gal.height.hist(density=True)</code></li>
</ul>
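<p>To see why the areas of a normalized histogram add up to 1, here is a minimal NumPy sketch of the rescaling, using an assumed made-up sample in place of <code>gal.height</code>. Each count is divided by the total number of cases times the bin width, which is what <code>density=True</code> does in both matplotlib and <code>np.histogram</code>:</p>
<pre><code>import numpy as np

# Illustrative sample only (assumed values), standing in for gal.height.
heights = np.array([62.0, 64.5, 65.0, 66.2, 67.0, 67.5,
                    68.0, 69.3, 70.1, 72.0, 74.0])

counts, edges = np.histogram(heights, bins=5)
widths = np.diff(edges)

# Density height of each bar = count / (total cases * bin width),
# so each bar's area (height * width) is the proportion of cases in that bin.
density = counts / (counts.sum() * widths)

areas = density * widths
print(areas)        # proportion of cases in each bin
print(areas.sum())  # total area under the histogram: 1 (up to rounding)
</code></pre>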
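<p>You can also produce a histogram of relative frequencies, where the vertical axis is scaled so that the height of each bar gives the percentage of cases that fall into the bin (this assumes NumPy has been imported as <code>np</code>; replace <code>100.</code> with <code>1.</code> to get proportions instead):</p>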
<pre><code>gal.height.hist(weights=np.zeros_like(gal.height) + 100. / len(gal.height))</code></pre>
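<p>The <code>weights</code> trick works because each case contributes its weight, rather than 1, to the height of its bar. A short NumPy sketch with an assumed made-up sample shows that weights of 100/n turn counts into percentages:</p>
<pre><code>import numpy as np

# Illustrative sample only (assumed values), standing in for gal.height.
heights = np.array([62.0, 64.5, 65.0, 66.2, 67.0, 67.5,
                    68.0, 69.3, 70.1, 72.0, 74.0])

# Every case gets the same weight, 100 / n, so a bar's height is the
# percentage of cases in that bin (use 1.0 / n to get proportions instead).
weights = np.zeros_like(heights) + 100.0 / len(heights)
pct, edges = np.histogram(heights, bins=5, weights=weights)

print(pct)        # percentage of cases in each bin
print(pct.sum())  # the bar heights add up to 100 (up to rounding)
</code></pre>
<p>Unlike the normalized histogram, the bar heights here do not depend on the bin width.</p>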
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Other useful ‘pylab’ commands set the axis labels and the title of the graph, and the optional <code>color</code> argument to <code>hist()</code> sets the color of the bars. For example:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [17]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">xlabel</span><span class="p">(</span><span class="s">"Height (inches)"</span><span class="p">)</span>
<span class="n">ylabel</span><span class="p">(</span><span class="s">"Density"</span><span class="p">)</span>
<span class="n">title</span><span class="p">(</span><span class="s">"Distribution of Heights"</span><span class="p">)</span>
<span class="n">grid</span><span class="p">()</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">density</span><span class="o">=</span><span class="k">True</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"grey"</span><span class="p">)</span>  <span class="c1"># density=True replaces the older normed=True</span>
<span class="n">show</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAYsAAAEXCAYAAABcRGizAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAIABJREFUeJzt3XlYVNf5B/DvsMQo27gQCIuOBiKLAipqXH7RVI1CAtUU
IzZuERPqU6NJbdU2aUsSEyStO2mLtuJCoqTxqRIlY2sbmsQUiQaTNGgC6sgmLomAiBEczu8PO3MY
WS4wzgzOfD/PM8/jnTn33jOv98x773vmDiohhAAREVE7nGzdASIi6v6YLIiISBGTBRERKWKyICIi
RUwWRESkiMmCiIgUMVmQ1S1evBirV6++I9sqLS2Fh4cHDN8AnzhxIv7yl7/ckW0DQGxsLHbt2nXH
ttdRL730Ery9veHn52eR7Q8ZMgQffvhhh9pqNBr885//tEg/6O7BZEF3lEajQa9eveDp6YnevXtj
3LhxyMjIQPPbef74xz/ipZde6tC2/vWvf7Xbpn///rh69SpUKhUAQKVSGf/dWSkpKZg7d67Jc7m5
uS2es7TS0lKsW7cOp06dQmVlZYvX8/LyEBgY2OL5ziTK//73v3j44Yc71La9mLbVF7I/TBZ0R6lU
Khw4cAC1tbUoLS3FqlWrkJaWhqSkpC5tq717Rm/evGlOV7ut0tJS9O3bF3379u3UeuYkSiIlTBZk
MR4eHoiLi0N2djZ27NiBoqIiAMCCBQvw61//GgBw+fJlPP744+jduzf69u2Lhx9+GEIIzJ07F6Wl
pYiLi4OHhwd+//vfQ6fTwcnJCdu2bcOAAQMwefJknDt3Dk5OTmhqajLut6SkBKNHj4aXlxemT5+O
K1euAGj9LNhQYtFqtUhNTUV2djY8PDwwbNgwAKZn60IIrF69GhqNBj4+Ppg/fz5qa2sBwNi3nTt3
YsCAAfD29sbrr7/eZmxqamowb9483HfffdBoNHjttdcghMDhw4fx6KOPorKyEh4eHli4cGGX43/g
wAFERUUZr/C+/PLLFu8bAK5fv4758+ejT58+CAsLwxtvvNEiToWFhYiMjIRarUZiYiJu3LiBa9eu
ISYmxthXT09PVFVVoaCgANHR0fDy8oKvry+WL1/e5fdA3QeTBVncyJEjERAQgI8++giA6Rnw2rVr
ERgYiMuXL+PixYtITU2FSqXCrl270L9/fxw4cABXr17Fz3/+c+P2PvzwQ5w6dQqHDh1qceUhhMDO
nTuRmZmJ8+fPw8XFBUuXLm2zb4a+TJs2Db/61a+QmJiIq1evorCwsEVfMzMzsWPHDuTl5eHMmTOo
q6vDkiVLTLZ35MgRfPPNN/jnP/+JV155BadOnWp1v8899xyuXr2Ks2fP4t///rexz5MnT8b7778P
Pz8/XL16Fdu2betktG8pLCxEUlIStm7diu+++w7JycmIj49HY2Nji/f18ssvo7S0FGfPnsU//vEP
ZGVlmVyhCCHw17/+FYcOHcLZs2fxxRdfYPv27XBzc4NWqzX2tba2Fr6+vli2bBleeOEF1NTU4MyZ
M3jyySe79B6oe2GyIKvw8/PDd9991+L5e+65B+fPn4dOp4OzszPGjRunuK2UlBT07NkTPXr0aPGa
SqXCvHnzEBYWhl69euHVV1/FO++80245y0AI0W67t956C8uXL4dGo4GbmxtSU1OxZ88ek6ua3/72
t+jRowciIiIQGRmJzz//vMV29Ho9srOzkZqaCjc3NwwYMADLly83TqR3pK+VlZXo3bu3yePjjz82
vr5lyxYkJydj5MiRxpj06NED+fn5Lbb117/+Fb/61a/g5eUFf39/LFu2zKQPKpUKS5cuha+vL3r3
7o24uDicOHGizb7ec889KC4uxuXLl9GrVy+MHj1a8f1Q98dkQVZRXl6OPn36GJcNHzK/+MUvEBQU
hEcffRQPPPAA0tLSFLelNKHa/PX+/fujsbERly9f7mLPpfPnz2PAgAEm27558yYuXLhgfM7X19f4
7169euHatWsttnP58mU0Nja22FZFRUWH++Ln54crV66YPMaPH298/dy5c1i7dq1JMikvL291wryy
stIkZgEBAS3aNH9fPXv2RF1dXZt9+8tf/oJvvvkGoaGhGDVqFA4ePNjh90XdF5MFWdynn36KyspK
kw8zA3d3d/z+97/H6dOnkZOTg3Xr1uGDDz4AgDYna5UmcUtLS03+7erqin79+sHNzQ319fXG1/R6
PS5dutTh7fr5+UGn05ls28XFBT4+Pu2ud7t+/frB1dW1xbZa+5Duqv79++PFF180SSZ1dXWYNWtW
i7b3338/ysrKjMvN/62ktZgFBQXh7bffxqVLl7By5UokJCTg+vXrXXsj1G0wWdAdZ7hqqK2txYED
BzB79mzMnTsX4eHhJq8DtyZhS0pKIISAp6cnnJ2d4eR067D08fHB6dOnO73vrKwsnDx5EvX19fjN
b36DmTNnQqVS4cEHH8T333+P3NxcNDY2YvXq1bhx44ZxXV9fX+h0ujbLQLNnz8b69euh0+lQV1dn
nOMw9Le9WDTn7OyMJ598Ei+++CLq6upw7tw5rF+/HnPmzOnUe23PM888gz/96U8oKCiAEALXrl3D
wYMHW70iePLJJ5Gamorq6mpUVFQgPT29w9+q8vHxwbfffmuc6AeArKwsYxL28vKCSqVqN0Z0d+D/
IN1xcXFx8PT0RP/+/ZGamorly5cjMzPT+HrzydWSkhJMmTIFHh4eGDt2LH76059iwoQJAIBf/vKX
WL16NXr37o1169YZ171d8+cM9fkFCxbg/vvvR0NDAzZt2gTg1gfXH/7wByxatAgBAQFwd3c3Kb/M
nDkTANC3b19ER0e32M/ChQsxd+5cPPzwwxg0aBB69eqFzZs3t9qP9p4DgM2bN8PNzQ2DBg3C//3f
/+Gpp57C008/rbheR18fMWIEtm7diiVLlqBPnz4IDg7Gzp07W13vN7/5DQICAjBw4EA8+uijmDlz
Ju655552923YTkhICGbPno1BgwahT58+OH/+PA4dOoQhQ4bAw8MDL7zwAvbs2dPq/BLdXVT840dE
1Nwf//hHvPPOO8ZyIBFghSsLrVaLkJAQBAcHtzp5eerUKYwZMwb33nsv1q5da3y+rKwMjzzyCMLD
wzFkyBDj2SER3VlVVVU4cuQImpqa8PXXX2PdunWYMWOGrbtF3YxFryz0ej0GDx6Mw4cPw9/fHyNH
jsTu3bsRGhpqbHPp0iWcO3cO+/btQ+/evY038FRVVaGqqgpRUVGoq6vDiBEjsG/fPpN1ich8paWl
eOyxx3D27Fmo1WrMnj0bqampcHFxsXXXqBux6NFQUFCAoKAgaDQaAEBiYiL2799v8oHv7e0Nb2/v
Fl+v8/X1NX5dz93dHaGhoaisrGSyILrD+vfvb3J3N1FrLFqGqqioaPH97c58l9xAp9OhsLCQN/cQ
EdmIRa8s7sSPmtXV1SEhIQEbN26Eu7v7Hd8+EZEj6uwMhEWvLPz9/Vvc7NOZG48aGxvxox/9CHPm
zMH06dNbbWP4iQZHf/z2t7+1eR+6y4OxYCwYi/YfXWHRZBEdHY3i4mLodDo0NDQgOzsb8fHxrba9
/Q0IIZCUlISwsDA8//zzluymXWh+N7CjYywkxkJiLMxj0TKUi4sL0tPTMXXqVOj1eiQlJSE0NBQZ
GRkAgOTkZFRVVWHkyJGora2Fk5MTNm7ciKKiIpw4cQJZWVmIiIgw/lx0amoqpk2bZskuExFRK+7q
m/KU/jiOI8nLy8PEiRNt3Y1ugbGQGAuJsZC68tnJZEFE5GC68tnJ34ayE3l5ebbuQrfBWEiMhcRY
mIfJgoiIFLEMRUTkYFiGIiIii2CysBOsx0qMhcRYSIyFeZgsiIhIEecsiIgcDOcsiIjIIpgs7ATr
sRJjITEWEmNhHiYLIiJSxDkLIiIHwzkLIiKyCCYLO8F6rMRYSIyFxFiYh8mCiIgUcc6CiMjBcM6C
iIgsgsnCTrAeKzEWEmMhMRbmYbIgIiJFnLMgInIwnLMgIiKLYLKwE6zHSoyFxFhIjIV5mCyIiEgR
5yyIiBwM5yyIiMgimCzsBOuxEmMhMRYSY2EeJgsiIlLEOQsiIgfT7eYstFotQkJCEBwcjLS0tBav
nzp1CmPGjMG9996LtWvXdmpdunuo1WqoVCqbPNRqta3fPpFdsNiVhV6vx+DBg3H48GH4+/tj5MiR
2L17N0JDQ41tLl26hHPnzmHfvn3o3bs3li9f3uF1AV5ZNJeXl4eJEyfauhutUqlUSElJsdr+dDod
NBoNACAlJcWhj5HufFxYG2Mhdasri4KCAgQFBUGj0cDV1RWJiYnYv3+/SRtvb29ER0fD1dW10+sS
EZH1WCxZVFRUIDAw0LgcEBCAiooKi6/rqHjGJBmuKojHRXOMhXlcLLVhlUpllXUXLFhg/HBQq9WI
iooyHhSGr8px2bbLBjqdDoD8MLfWskF3iQeXuWzt5by8PGzfvh1A10+mLDZnkZ+fj5SUFGi1WgBA
amoqnJycsHLlyhZtX375Zbi7uxvnLDq6LucspLxuXI/lnIXtdOfjwtoYC6lbzVlER0ejuLgYOp0O
DQ0NyM7ORnx8fKttb+90Z9YlIiLLs1gZysXFBenp6Zg6dSr0ej2SkpIQGhqKjIwMAEBycjKqqqow
cuRI1NbWwsnJCRs3bkRRURHc3d1bXZfaxjMmiXMWEo8LibEwD2/KI4uzdhmqOUcvQxG1pluVoci6
bp9MdmS3T2w7Mh4XEmNhHiYLIiJSxGRhJ1iPlThnIfG4kBgL8zBZEBGRIiYLO8F6rMQ5C4nHhcRY
mIfJgoiIFDFZ2AnWYyXOWUg8LiTGwjxMFkREpIjJwk6wHitxzkLicSExFuZhsiAiIkVMFnaC9ViJ
cxYSjwuJsTAPkwURESlisrATrMdKnLOQeFxIjIV5mCyIiEgRk4WdYD1W4pyFxONCYizMw2RBRESK
mCzsBOuxEucsJB4XEmNhHiYLIiJSxGRhJ1iPlThnIfG4kBgL8zBZEBGRIiYLO8F6rMQ5C4nHhcRY
mIfJgoiIFLnYugN0Z7AeK3HOAlCr1aipqbH6fr28vFBdXW31/XYEx4h5mCyI7FBNTQ1SUlKsvl9b
7JOsg2UoO8F6rNR8zsLJyQkqlcomD7Vabbsg/A/nbySOEfPwyoLsWlNTk83OdnmWTfaEVxZ2gvVY
iXMWEmMhcYyYh8mCiIgUMVnYCdZjJdbpJcZC4hgxj0WThVarRUhICIKDg5GWltZqm6VLlyI4OBiR
kZEoLCw0Pp+amorw8HAMHToUP/7xj3Hjxg1LdpWIiNphsWSh1+uxZMkSaLVaFBUVYffu3Th58qRJ
m9zcXJSUlKC4uBhbtmzB4sWLAdw6G9q6dSs+++wzfPnll9Dr9dizZ4+lumoXWI+VWKeXGAuJY8Q8
FksWBQUFCAoKgkajgaurKxITE7F//36TNjk5OZg/fz4AYPTo0aiursaFCxfg6ekJV1dX1NfX4+bN
m6ivr4e/v7+lukpkEbb82i7RnWaxr85WVFQgMDDQuBwQEICjR48qtqmoqMDw4cOxfPly9O/fHz17
9sTUqVMxefLkVvezYMEC49mTWq1GVFSU8QzCUKN0hOXm9dju0J/mywaG+rnh/8tSy4bnbq/XW2v/
huWmpiaT49Oa+09JSYFOp0NVVRUeeughq+7foLscf4blDRs2OPTnw/bt2wF0/WpTJYQQXVpTwd69
e6HVarF161YAQFZWFo4ePYrNmzcb28TFxWHVqlUYN24cAGDy5Ml444034OXlhbi4OHz00Ufw8vLC
zJkzkZCQgKeeesq08yoVLNT9u05eXl63vcxWqVRWvedAp9MZB0RKSopN77Ow9b6bx8Ja++2uY7I7
jxFr68pnp8XKUP7+/igrKzMul5WVISAgoN025eXl8Pf3x7FjxzB27Fj07dsXLi4ueOKJJ/DJJ59Y
qqt2gYNAYp1eYiwkjhHzWCxZREdHo7i4GDqdDg0NDcjOzkZ8fLxJm/j4eOzcuRMAkJ+fD7VaDR8f
HwwePBj5+fm4fv06hBA4fPgwwsLCLNVVIiJSYLFk4eLigvT0dEydOhVhYWGYNWsWQkNDkZGRgYyM
DABAbGwsBg0ahKCgICQnJ+MPf/gDACAqKgrz5s1DdHQ0IiIiAADPPvuspbpqF/gdcon3FkiMhcQx
Yh6L/jZUTEwMYmJiTJ5LTk42WU5PT2913RUrVmDFihUW6xsREXUc7+C2E6zHSqzTS4yFxDFiHiYL
IiJSxGRhJ1iPlVinlxgLiWPEPEwWRESkiMnCTrAeK7FOLzEWEseIeZgsiIhIEZOFnWA9VmKdXmIs
JI4R8zBZEBGRIiYLO8F6rMQ6vcRYSBwj5mGyICIiRUwWdoL1WIl1eomxkDhGzMNkQUREipgs7ATr
sRLr9BJjIXGMmIfJgoiIFCkmiyeeeAIHDx5EU1OTNfpDXcR6rMQ6vcRYSBwj5lFMFosXL8Zbb72F
oKAgrFq1Cl9//bU1+kVERN2IYrKYMmUK3n77bXz22WfQaDSYNGkSxo4di8zMTDQ2Nlqjj9QBrMdK
rNNLjIXEMWKeDs1ZfPvtt9i+fTv+/Oc/Y/jw4Vi6dCmOHz+OKVOmWLp/RETUDSgmixkzZmD8+PGo
r6/He++9h5ycHCQmJiI9PR1Xr161Rh+pA1iPlVinlxgLiWPEPIp/g/uZZ55BbGysyXM3btxAjx49
cPz4cYt1jIiIug/FK4sXX3yxxXNjxoyxSGeo61iPlVinlxgLiWPEPG1eWZw/fx6VlZW4fv06Pvvs
MwghoFKpUFtbi/r6emv2kYiIbKzNZHHo0CHs2LEDFRUVWL58ufF5Dw8PvP7661bpHHVcXl4ez5z+
R6fT8Yz6fxgLiWPEPG0miwULFmDBggXYu3cvfvSjH1mzT0RE1M20mSx27dqFuXPnQqfTYd26dcbn
DeWon/3sZ1bpIHUMz5gknklLjIXEMWKeNpOFYV7i6tWrUKlUxucNyYKIiBxHm8kiOTkZAJCSkmKt
vpAZWI+VWKeXGAuJY8Q8il+dXbFiBWpra9HY2IhJkyahX79+2LVrlzX6RkRE3YRisjh06BA8PT1x
4MABaDQanD59Gr/73e86tHGtVouQkBAEBwcjLS2t1TZLly5FcHAwIiMjUVhYaHy+uroaCQkJCA0N
RVhYGPLz8zv4lhwTz5gknklLjIXEMWIexWRx8+ZNAMCBAweQkJAALy+vDs1Z6PV6LFmyBFqtFkVF
Rdi9ezdOnjxp0iY3NxclJSUoLi7Gli1bsHjxYuNry5YtQ2xsLE6ePIkvvvgCoaGhnX1vRER0hygm
i7i4OISEhOD48eOYNGkSLl68iHvvvVdxwwUFBQgKCoJGo4GrqysSExOxf/9+kzY5OTmYP38+AGD0
6NGorq7GhQsXUFNTg48++ggLFy4EALi4uMDLy6sr789h8HdvJP4eksRYSBwj5lH8bag1a9bgF7/4
BdRqNZydneHm5tbiQ781FRUVCAwMNC4HBATg6NGjim3Ky8vh7OwMb29vPP300/j8888xYsQIbNy4
Eb169WqxnwULFhgvtdVqNaKiooyXm4aDg8u2XTYwfHAZ/r8stXz7/qy9/+aln+YTzNbev06nQ1VV
ldX3b9Bdjj/D8okTJ7pVf6y5nJeXh+3btwPoemlSJYQQSo2OHDmCc+fOGf9+hUqlwrx589pdZ+/e
vdBqtdi6dSsAICsrC0ePHsXmzZuNbeLi4rBq1SqMGzcOADB58mS88cYbaGpqwpgxY/DJJ59g5MiR
eP755+Hp6YlXXnnFtPMqFTrQfbIxlUpls2/VpaSkcN9W3i/HZPfXlc9OxSuLOXPm4MyZM4iKioKz
s7PxeaVk4e/vj7KyMuNyWVkZAgIC2m1TXl4Of39/CCEQEBCAkSNHAgASEhKwZs2ajr0jIiK64xST
xfHjx1FUVNTpG/Gio6NRXFwMnU4HPz8/ZGdnY/fu3SZt4uPjkZ6ejsTEROTn50OtVsPHxwcAEBgY
iG+++QYPPvggDh8+jPDw8E7t39HwO+QS7y2QGAuJY8Q8isliyJAhOH/+PPz8/Dq3YRcXpKenY+rU
qdDr9UhKSkJoaCgyMjIA3LrpLzY2Frm5uQgKCoKbmxsyMzON62/evBlPPfUUGhoa8MADD5i8RkRE
1qWYLC5duoSwsDCMGjUKPXr0AHCr3pWTk6O48ZiYGMTExJg8Z7gz3CA9Pb3VdSMjI/Hpp58q7oNu
4RmTxDNpibGQOEbMo5gsDJNkzSdE+NtQRESORfE+i4kTJ0Kj0aCxsRETJ07EqFGjMGzYMGv0jTqB
3yGXeG+BxFhIHCPmUUwWW7ZswcyZM43lo/LycsyYMcPiHSMiou5DMVm8+eab+Pjjj+Hp6QkAePDB
B3Hx4kWLd4w6h/VYiXV6ibGQOEbMo5gsevToYZzYBm79VhTnLIiIHItispgwYQJee+011NfX4x//
+AdmzpyJuLg4a/SNOoH1WIl1eomxkDhGzKOYLNasWQNvb28MHToUGRkZiI2NxerVq63RNyIi6iYU
vzrr7OyM6dOnY/r06bjvvvus0SfqAtZjJdbpJcZC4hgxT5tXFkIIpKSkoF+/fhg8eDAGDx6Mfv36
4eWXX+YPhREROZg2k8X69etx5MgRfPrpp7hy5QquXLmCgoICHDlyBOvXr7dmH6kDWI+VWKeXGAuJ
Y8Q8bSaLnTt34u2338bAgQONzw0aNAhvvfUWdu7caZXOERFR99Bmsrh58ya8vb1bPO/t7W38U6vU
fbAeK7FOLzEWEseIedpMFq6urm2u1N5rRERkf9pMFl988QU8PDxafXz55ZfW7CN1AOuxEuv0EmMh
cYyYp82vzur1emv2g4iIujHFm/Lo7sB6rMQ6vcRYSBwj5mGyICIiRUwWdoL1WIl1eomxkDhGzMNk
QUREipgs7ATrsRLr9BJjIXGMmIfJgoiIFDFZ2AnWYyXW6SXGQuIYMY/iT5STfVCr1aipqbF1N4jo
LsVkYSeU6rE1NTVISUmxSl9uZ+39sk4vMRYS5yzMwzIUEREpYrKwE6zHSqzTS4yFxDFiHiYLIiJS
xGRhJ1iPlVinlxgLiWPEPBZNFlqtFiEhIQgODkZaWlqrbZYuXYrg4GBERkaisLDQ5DW9Xo9hw4Yh
Li7Okt0kIiIFFksWer0eS5YsgVarRVFREXbv3o2TJ0+atMnNzUVJSQmKi4uxZcsWLF682OT1jRs3
IiwsDCqVylLdtBusx0qs00uMhcQxYh6LJYuCggIEBQVBo9HA1dUViYmJ2L9/v0mbnJwczJ8/HwAw
evRoVFdX48KFCwCA8vJy5ObmYtGiRRBCWKqbRETUARa7z6KiogKBgYHG5YCAABw9elSxTUVFBXx8
fPDCCy/gd7/7HWpra9vdz4IFC4x1WbVajaioKGNt0nAm4QjLEydOVGxvOMs0xMtaywaOtn/Dc9Z+
v7Z+/wbdaXw071N36Y81l/Py8rB9+3YAXZ/HUgkLnbbv3bsXWq0WW7duBQBkZWXh6NGj2Lx5s7FN
XFwcVq1ahXHjxgEAJk+ejLS0NJw/fx7vv/8+3nzzTeTl5WHt2rV47733WnZepeJVRwepVCqb3pTH
fTvGvlNSUjgm7wJd+ey0WBnK398fZWVlxuWysjIEBAS026a8vBz+/v745JNPkJOTg4EDB2L27Nn4
17/+hXnz5lmqq3aB9ViJdXqJsZA4RsxjsWQRHR2N4uJi6HQ6NDQ0IDs7G/Hx8SZt4uPjsXPnTgBA
fn4+1Go1fH198frrr6OsrAxnz57Fnj178IMf/MDYjoiIrM9icxYuLi5IT0/H1KlTodfrkZSUhNDQ
UGRkZAAAkpOTERsbi9zcXAQFBcHNzQ2ZmZmtbovfhlLG75BLvLdAYiwkjhHzWPSHBGNiYhATE2Py
XHJysslyenp6u9uYMGECJkyYcMf7RkREHcc7uO0E67ES6/QSYyFxjJiHyYKIiBQxWdgJ1mMl1ukl
xkLiGDEPkwURESlisrATrMdKrNNLjIXEMWIeJgsiIlLEZGEnWI+VWKeXGAuJY8Q8TBZERKSIycJO
sB4rsU4vMRYSx4h5mCyIiEgRk4WdYD1WYp1eYiwkjhHzMFkQEZEiJgs7wXqsxDq9xFhIHCPmYbIg
IiJFTBZ2gvVYiXV6ibGQOEbMw2RBRESKmCzsBOuxEuv0EmMhcYyYh8mCiIgUMVnYCdZjJdbpJcZC
4hgxD5MFEREpYrKwE6zHSqzTS4yFxDFiHiYLIiJSxGRhJ1iPlVinlxgLiWPEPEwWRESkiMnCTrAe
K7FOLzEWEseIeZgsiIhIEZOFnWA9VmKdXmIsJI4R8zBZEBGRIosnC61Wi5CQEAQHByMtLa3VNkuX
LkVwcDAiIyNRWFgIACgrK8MjjzyC8PBwDBkyBJs2bbJ0V+9qrMdKrNNLjIXEMWIeiyYLvV6PJUuW
QKvVoqioCLt378bJkydN2uTm5qKkpATFxcXYsmULFi9eDABwdXXF+vXr8dVXXyE/Px9vvvlmi3WJ
iMg6LJosCgoKEBQUBI1GA1dXVyQmJmL//v0mbXJycjB//nwAwOjRo1FdXY0LFy7A19cXUVFRAAB3
d3eEhoaisrLSkt29q7EeK7FOLzEWEseIeSyaLCoqKhAYGGhcDggIQEVFhWKb8vJykzY6nQ6FhYUY
PXq0JbtLRERtcLHkxlUqVYfaCSHaXK+urg4JCQnYuHEj3N3dW6y7YMEC49mTWq1GVFSU8QzCUKN0
hOXm9di22hvq14Z4WWvZwJr702g0Ntt/87N5nU5n9Xg3X66qqsJDDz1k1f0bdKfxAQAbNmxw6M+H
7du3A+j61aZK3P5JfQfl5+cjJSUFWq0WAJCamgonJyesXLnS2OYnP/kJJk6ciMTERABASEgI/v3v
f8PHxweNjY14/PHHERMTg+eff75l51WqFonGUeXl5bV7ma1SqZCSkmK1/jSXkpJi1X03/4C29r6b
6w77bh4La+23u45JpTHiSLry2WnRMlR0dDSKi4uh0+nQ0NCA7OxsxMfHm7SJj4/Hzp07AdxKLmq1
Gj4+PhC2CLKfAAANR0lEQVRCICkpCWFhYa0mCjLFQSCxTi9ZOxZOTk5QqVQ2eajV6nb7xjFiHouW
oVxcXJCeno6pU6dCr9cjKSkJoaGhyMjIAAAkJycjNjYWubm5CAoKgpubGzIzMwEAR44cQVZWFiIi
IjBs2DAAt65Mpk2bZskuE5EZmpqabHo1RZZj0WQBADExMYiJiTF5Ljk52WQ5PT29xXrjx49HU1OT
RftmT3iJLVm79NKdMRYSx4h5eAc3EREpYrKwEzxjkngmLTEWEseIeZgsiIhIEZOFneDv3kj8PSSJ
sZA4RszDZEFERIqYLOwE67ES6/QSYyFxjJiHyYKIiBQxWdgJ1mMl1uklxkLiGDGPxW/KI1NqtRo1
NTW27gYRUacwWVhZTU2NTX6WwJF+CoF1eomxkDhnYR6WoYiISBGThZ1gbVpiLCTGQuKchXmYLIiI
SBGThZ1gbVpiLCTGQuKchXmYLIiISBGThZ1gbVpiLCTGQuKchXmYLIiISBGThZ1gbVpiLCTGQuKc
hXmYLIiISBGThZ1gbVpiLCTGQuKchXmYLIiISBGThZ1gbVpiLCTGQuKchXmYLIiISBGThZ1gbVpi
LCTGQuKchXmYLIiISBGThZ1gbVpiLCTGQuKchXkc7o8fVVRU4Pz58zbZt4eHh032S+QInJycoFKp
bLJvLy8vVFdX22Tf1uJwyWLGjBkoLy/Hvffea/V9l5aWWmzbOp2OZ5H/w1hIjhSLpqamdv8ipCVj
4Qh/idKiyUKr1eL555+HXq/HokWLsHLlyhZtli5divfffx+9evXC9u3bMWzYsA6v2xUNDQ147LHH
EBAQcEe21xmvvvqqxbZdVVXlMB8KShgLibGQGAvzWGzOQq/XY8mSJdBqtSgqKsLu3btx8uRJkza5
ubkoKSlBcXExtmzZgsWLF3d4XTL1/fff27oL3QZjITEWEmNhHosli4KCAgQFBUGj0cDV1RWJiYnY
v3+/SZucnBzMnz8fADB69GhUV1ejqqqqQ+sSEXUXhvkSWzzUarVV3qPFylAVFRUIDAw0LgcEBODo
0aOKbSoqKlBZWam4ble5uLjggw8+gJub2x3ZXmc0NTVZbNv2PrnWGYyFxFhIloyF0nyJJVltv8JC
3n33XbFo0SLj8q5du8SSJUtM2jz++OPi448/Ni5PmjRJHDt2rEPrCiEEAD744IMPPrrw6CyLXVn4
+/ujrKzMuFxWVtZiUvn2NuXl5QgICEBjY6PiugBwK18QEZGlWWzOIjo6GsXFxdDpdGhoaEB2djbi
4+NN2sTHx2Pnzp0AgPz8fKjVavj4+HRoXSIish6LXVm4uLggPT0dU6dOhV6vR1JSEkJDQ5GRkQEA
SE5ORmxsLHJzcxEUFAQ3NzdkZma2uy4REdmGStxFtRyNRgNPT084OzvD1dUVBQUFSElJwZ///Gd4
e3sDAFJTUzFt2jQb99TyqqursWjRInz11VdQqVTIzMxEcHAwZs2ahXPnzkGj0eCdd96x2jclbOn2
WGzbtg1ardbhjouvv/4aiYmJxuUzZ87g1VdfxZw5cxzuuGgtFq+88gquXLnicMdFamoqsrKy4OTk
hKFDhyIzMxPXrl3r9DFxVyWLgQMH4vjx4+jTp4/xuZdffhkeHh742c9+ZsOeWd/8+fMxYcIELFy4
EDdv3sS1a9fw2muvoV+/flixYgXS0tJw5coVrFmzxtZdtbjWYrFhwwaHPC4Mmpqa4O/vj4KCAmze
vNkhjwuD5rHYtm2bQx0XOp0OP/jBD3Dy5En06NEDs2bNQmxsLL766qtOHxN33Q8Jtpbb7qJ8d0fU
1NTgo48+wsKFCwHcKtt5eXmZ3Lcyf/587Nu3z5bdtIq2YgE43nHR3OHDhxEUFITAwECHPC6aax4L
IYRDHReenp5wdXVFfX09bt68ifr6evj5+XXpmLirkoVKpcLkyZMRHR2NrVu3Gp/fvHkzIiMjkZSU
5BDfKz979iy8vb3x9NNPY/jw4XjmmWdw7do1XLhwAT4+PgAAHx8fXLhwwcY9tbzWYlFfXw/A8Y6L
5vbs2YPZs2cDgEMeF801j4VKpXKo46JPnz5Yvnw5+vfvDz8/P6jVakyZMqVrx0Snv2xrQ5WVlUII
IS5evCgiIyPFhx9+KC5cuCCamppEU1OTePHFF8XChQtt3EvL+/TTT4WLi4soKCgQQgixbNky8dJL
Lwm1Wm3Srnfv3rbonlW1Fotf//rX4uLFiw53XBjcuHFD9OvXT1y8eFEIIRzyuDC4PRaO9nlRUlIi
QkNDxeXLl0VjY6OYPn262LVrV5eOibvqyuL+++8HAHh7e2PGjBkoKCjAfffdZ7ztfdGiRSgoKLBx
Ly0vICAAAQEBGDlyJAAgISEBn332GXx9fVFVVQUAOH/+PO677z5bdtMq2oqFt7e3wx0XBu+//z5G
jBhhnMT18fFxuOPC4PZYONrnxbFjxzB27Fj07dsXLi4ueOKJJ/Cf//ynS58Vd02yqK+vx9WrVwEA
165dw9///ncMHTrU+IYB4G9/+xuGDh1qqy5aja+vLwIDA/HNN98AuFWTDQ8PR1xcHHbs2AEA2LFj
B6ZPn27LblpFW7FwxOPCYPfu3cayC3DrfiZHOy4Mbo9F879l4wjHRUhICPLz83H9+nUIIXD48GGE
hYV17bPCQlc/d9yZM2dEZGSkiIyMFOHh4eL1118XQggxd+5cMXToUBERESF++MMfiqqqKhv31DpO
nDghoqOjRUREhJgxY4aorq4W3377rZg0aZIIDg4WU6ZMEVeuXLF1N63i9lhcuXLFYY+Luro60bdv
X1FbW2t8zlGPi9Zi4YjHRVpamggLCxNDhgwR8+bNEw0NDV06Ju6qr84SEZFt3DVlKCIish0mCyIi
UsRkQUREipgsiIhIEZMF2QV3d3eT5e3bt+O5555rd5333nsPaWlp7bbJy8tDXFxcq69t2LAB169f
b3PdWbNm4cyZMwCAxx57DLW1te3uq7P774ycnBy8+uqrZm+HHBeTBdkFlUrV7nJr4uLisHLlyi7v
c+PGjcafFrldSUkJrl27hkGDBgEADh48CE9Pzy7vy1xxcXHYu3cvGhsbbdYHursxWZBdav6N8EuX
LiEhIQGjRo3CqFGj8MknnwAwvfo4ffo0HnroIUREROCll16Ch4eHcf26ujrMnDkToaGhmDNnDgBg
06ZNqKysxCOPPIJJkya12P+ePXtM/mCXRqPBd999B51Oh9DQUDz77LMYMmQIpk6diu+//x7ArQQz
efJkREVFYcSIEThz5gxUKlWr+weA48ePY+LEiYiOjsa0adOMNyJu2rQJ4eHhiIyMNPlNpDFjxuDv
f//7HYkvOSBL3xBCZA3Ozs4iKirK+Ojfv7947rnnhBBCzJ492/i33s+dOydCQ0OFEEJkZmYa/7b7
Y489Jvbs2SOEEOJPf/qTcHd3F0II8cEHHwgvLy9RUVEhmpqaxJgxY8SRI0eEEEJoNBrx7bffttqf
adOmiePHjxuXDW3Pnj0rXFxcxOeffy6EEOLJJ58UWVlZQgghRo0aJfbt2yeEuPWbRvX19a3u/+OP
PxYNDQ1izJgx4vLly0IIIfbs2WP8nSM/Pz/R0NAghBCipqbG2Idt27aJFStWdD3I5NAs9pfyiKyp
Z8+eKCwsNC7v2LEDx44dA3DrJ0BOnjxpfO3q1au4du2ayfr5+fnIyckBAMyePRs///nPja+NGjUK
fn5+AICoqCjodDqMHTu23f6cO3fO+Ftmtxs4cCAiIiIAACNGjIBOp0NdXR0qKyvxwx/+EABwzz33
tLt/Ly8vfPXVV5g8eTIAQK/XG9tERETgxz/+MaZPn27yMw5+fn7QarXt9puoLUwWZJdEszKUEAJH
jx41+QAGOjavAQA9evQw/tvZ2Rk3b97sdB/a256hDNXZ/YeHhxtLas0dPHgQH374Id577z289tpr
+O9//wsnJyc0NTV1+D0T3Y5zFmT3Hn30UWzatMm4fOLECQCmH+YPPfQQ3n33XQC35hs6wsPDo81v
OA0YMMDkR+vaI4SAu7s7AgICsH//fgDAjRs32vymlUqlwuDBg3Hp0iXk5+cDABobG1FUVAQhBEpL
SzFx4kSsWbMGNTU1qKurA3DrR/QGDBjQoT4R3Y7JguxCa9+GMjy3adMmHDt2DJGRkQgPD8eWLVta
tNmwYQPWrVuHqKgonD592vjX9lrbtsGzzz6LadOmtTrBPX78eGMZ7PZttPXNrV27dmHTpk2IjIzE
+PHjUVVVZdLH5lxdXfHuu+9i5cqViIqKwrBhw/Cf//wHer0ec+fORUREBIYPH45ly5YZv4VVUFCA
hx9+uI0IErWPPyRIBOD69evo2bMngFtXFtnZ2fjb3/7W5e2dOXMGzz33HA4ePHinumiWpqYmDB8+
HMeOHYOLC6vP1Hk8aohw62uoS5YsgRACvXv3xrZt28za3qBBg+Dh4YHTp0/jgQceuEO97LoDBw4g
ISGBiYK6jFcWRESkiHMWRESkiMmCiIgUMVkQEZEiJgsiIlLEZEFERIqYLIiISNH/A4RJnuZuW/ta
AAAAAElFTkSuQmCC
"
>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The plot above takes several commands to produce. Each command acts on the current figure as it is issued: the labels, title, and grid are set first, then the histogram is drawn, and finally <code>show()</code> displays the finished plot. Notice also the use of quotation marks to delimit the labels and names such as “grey”.</p>
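<p>The pylab commands act on an implicit “current” figure. The same plot can also be built by calling methods on explicit figure and axes objects, which makes it clearer which plot each command modifies. A sketch, assuming matplotlib 2.1 or later (for <code>density=True</code>) and a made-up sample in place of <code>gal.height</code>:</p>
<pre><code>import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Illustrative sample only (assumed values), standing in for gal.height.
heights = np.array([62.0, 64.5, 65.0, 66.2, 67.0, 67.5,
                    68.0, 69.3, 70.1, 72.0, 74.0])

fig, ax = plt.subplots()
ax.hist(heights, bins=5, density=True, color="grey")  # normalized, grey bars
ax.set_xlabel("Height (inches)")
ax.set_ylabel("Density")
ax.set_title("Distribution of Heights")
ax.grid(True)
# In an interactive session, plt.show() would display the figure here.
</code></pre>
<p>A pandas histogram can be drawn onto the same axes by passing them in, as in <code>gal.height.hist(ax=ax, density=True, color="grey")</code>.</p>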
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Density-Plots">Density Plots<a class="anchor-link" href="#Density-Plots">¶</a></h3><p>A density plot avoids the need to create bins and instead draws the distribution as a smooth, continuous curve. Making a density plot generally involves two operations: first, a density function estimates the density over a range of values, and then the result is displayed using the <code>plot()</code> function. Pandas provides a shorthand for these two operations that produces a simple ‘density plot’ (<code>kind="kde"</code> stands for “kernel density estimate”; pandas computes the estimate with SciPy, so SciPy must be installed):</p>
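<p>The two operations described above can be sketched by hand. The function <code>gaussian_kde_1d</code> below is a hypothetical helper, not part of pandas: it places a Gaussian “bump” on every data point and averages them, with the bandwidth chosen by Scott’s rule (also the default of SciPy’s <code>gaussian_kde</code>, which pandas uses for <code>kind="kde"</code>). The sample values are made up:</p>
<pre><code>import numpy as np

def gaussian_kde_1d(data, grid):
    """Evaluate a Gaussian kernel density estimate of data at each grid point."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Scott's rule of thumb for the bandwidth in one dimension.
    bw = data.std(ddof=1) * n ** (-1.0 / 5.0)
    # Standardized distance from every grid point to every data point.
    z = (grid[:, None] - data[None, :]) / bw
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    # Average of the n kernels, rescaled so the curve integrates to 1.
    return kernels.sum(axis=1) / (n * bw)

# Illustrative sample only (assumed values), standing in for gal.height.
heights = np.array([62.0, 64.5, 65.0, 66.2, 67.0, 67.5,
                    68.0, 69.3, 70.1, 72.0, 74.0])

# Step 1: compute the density on a fine grid of heights.
grid = np.linspace(40.0, 95.0, 1101)
dens = gaussian_kde_1d(heights, grid)

# Step 2: display it, e.g. with plt.plot(grid, dens).
area = dens.sum() * (grid[1] - grid[0])
print(area)  # close to 1, as expected for a density
</code></pre>
<p>The pandas shorthand performs both steps in a single call:</p>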
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [18]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">d</span> <span class="o">=</span> <span class="n">gal</span><span class="o">.</span><span class="n">height</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"kde"</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAY0AAAD9CAYAAABA8iukAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xt0FPX5+PF3JJEqKkEIQRNwgYRcCCYgIdXir1HACEIU
kRopCgh+UzwUsfT4lbbfFlulUL8oWPyeorYoqEC9EjEEGyEtRkIgCRcNlwCJJCFcFMJFkFz4/P4Y
Ewkk5LI7OzOffV7n5MDszu4+z5nJPpnPM/MZP6WUQgghhGiBK6wOQAghhHNI0RBCCNFiUjSEEEK0
mBQNIYQQLSZFQwghRItJ0RBCCNFiphaNjIwMIiMjCQ8PZ968eZc8v2vXLm699VZ+9KMfMX/+/Fa9
VgghhPf5mXWdRm1tLREREWRmZhISEkJ8fDzLly8nKiqqfp2jR4/y1Vdf8eGHH9KpUydmzpzZ4tcK
IYTwPtOONHJzcwkLC8PlchEQEEBKSgqrVq1qsE5QUBADBw4kICCg1a8VQgjhfaYVjfLycrp3716/
HBoaSnl5uemvFUIIYR5/s97Yz8/P9Ne68xlCCOHL2tqZMO1IIyQkhNLS0vrl0tJSQkNDPf5apZS2
P3/4wx8sj0Hyk/x8MT+dc1PKvTa2aUVj4MCBFBUVUVJSQlVVFStXriQ5ObnRdS9OojWv1VlJSYnV
IZhK8nM2nfPTOTd3mTY85e/vz6JFi0hKSqK2tpbJkycTFRXF4sWLAUhNTeXQoUPEx8dz8uRJrrji
ChYuXEhhYSHXXHNNo68VQghhLdNOufUGPz8/tw+17CwrK4vExESrwzCNFfl9+y2cPQtdupj/WbL9
nEvn3MC9704pGsJnfPoppKRAdTXMmAGzZ1sdkRDWcOe7U6YRsbGsrCyrQzCVN/P7+mv4+c/hnXeg
qAjefBPef9/cz5Tt51w65+YuKRrCJzz3HIwdC4mJEBQES5bAk0/CuXNWRyaEs8jwlNBeZSX06gXb
t8OFZ26PGAH33gupqdbFJoQVZHhKiMtYuRKGDGlYMABmzoSXXwb5u0OIlpOiYWO6j6t6K7/ly41+
xsXuvBOqqiA725zPle3nXDrn5i4pGkJr5eWwYwcMH37pc35+MGUKvP6618MSwrGkpyG0tngxbNhg
nC3VmLIyuPlmqKiA9u29G5sQVpGehhBNyMyEu+5q+vnQUIiNhTVrvBeTEE4mRcPGdB9XNTu/2lpY
t85ogl/OuHHw9tue/3zZfs6lc27ukqIhtLV1KwQHQ0jI5dcbMwbWroWTJ70TlxBOJj0Noa1584xG
+EsvNb9ucrJRPCZMMD8uIawmPQ0hGpGZCUOHtmzd8ePhrbfMjUcIHUjRsDHdx1XNzO/sWcjJgZ/+
tGXrjxoFmzcbZ1F5imw/59I5N3dJ0RBa+vxz6NcPOnZs2fpXXQX33QcrVpgblxBOJz0NoaVZsyAg
AP74x5a/JjMTnn4atmwxLy4h7EB6GkJcpDX9jDp33AEHD8LOnebEJIQOpGjYmO7jqmbld+wY7N4N
P/5x617Xrh1MnGhcRe4Jsv2cS+fc3CVFQ2hn/XoYPBiuvLL1r/3FL2DZMjh92vNxCaED6WkI7Uyd
CuHh8Ktfte31999vDG09/rhn4xLCLuQe4UJcICzMuJXrzTe37fUbNxr3Et+zRyYxFHqSRrimdB9X
NSO/4mI4dco43batbr3VKDh/+5t7scj2cy6dc3OXFA2hlU8/NYaW/Pzce5/nnoM5c+DwYc/EJYQu
ZHhKaCUlBZKSYNIk999r1izjLKz33nO/CAlhJ9LTEAI4f96Y1TYvD3r0cP/9zp2DQYOMAjRjhvvv
J4RdSE9DU7qPq3o6v+3b4frrPVMwwGiCp6XBX/4CH37Y+tfL9nMunXNzl7/VAQjhKW25Crw5N90E
H30EI0YYF/+NGuXZ9xfCaWR4Smjj7rshNRVGj/b8e2/eDCNHwt//bvwrhJNJT0P4vHPnoEsXOHAA
OnUy5zNyc40jDSkcwumkp6Ep3cdVPZnfxo0QHW1ewQCjKb56NUye3LIeh2w/59I5N3dJT0NowYx+
RmPi42HNGqPHUV0NY8ea/5lC2IkMTwkt3HqrcUHenXd65/O2bTN6KPPnw7hx3vlMITxFehrCp50+
Dd26wdGjxh34vOXLL+Guu+DRR+H3vzdu+iSEE0hPQ1O6j6t6Kr/PP4cBA7xbMAD69jUuJNy8GX7y
E+M6kQvJ9nMunXNzlxQN4XhZWZCYaM1nd+sG6enwX/9l9FR+9zv47jtrYhHCG2R4SjjebbfBn/4E
Q4ZYG0dFBUybBiUlRiG79lpr4xGiKbYdnsrIyCAyMpLw8HDmzZvX6DrTp08nPDyc2NhYCgoK6h//
85//TN++fenXrx/jxo3j3LlzZoYqHOr0aaMpfeutVkcCN9wA774Lt9xi3AhKCB2ZVjRqa2uZNm0a
GRkZFBYWsnz5cnbu3NlgnfT0dPbu3UtRURGvvPIKU7//TSspKeHVV18lPz+fHTt2UFtby4oVK8wK
1bZ0H1f1RH51/Yyrr3Y/Hk/w84MXX4TsbFi4MMvqcEyl8/6pc27uMq1o5ObmEhYWhsvlIiAggJSU
FFatWtVgnbS0NCZMmABAQkIClZWVHD58mOuuu46AgADOnDlDTU0NZ86cISQkxKxQhYNt2AD/7/9Z
HUVDHTrAb38Ly5dbHYkQnmda0SgvL6d79+71y6GhoZSXl7doneuvv56ZM2fSo0cPbrzxRgIDAxna
xJVbEydOZPbs2cyePZsFCxY0+AshKyvL0ct1j9klHjvmt2ZNVv3QlNX5XLg8fjzs2gVvv22PeMxY
rnvMLvF4cjkxMdFW8bi7nJWVxcSJE+u/L92iTPLuu++qKVOm1C8vW7ZMTZs2rcE6I0eOVJ999ln9
8pAhQ1ReXp7au3evioqKUl9//bWqrq5W9913n3rzzTcv+QwTwxcOUFurVMeOSh05YnUkjUtNVerP
f7Y6CiEu5c53p2lHGiEhIZSWltYvl5aWEhoaetl1ysrKCAkJYcuWLdx222107twZf39/7r//fj7/
/HOzQrWti/+q0427+e3ebdw/IyjIM/F4WkREFu+8Y3UU5tF5/9Q5N3eZVjQGDhxIUVERJSUlVFVV
sXLlSpKTkxusk5yczNKlSwHIyckhMDCQ4OBgIiIiyMnJ4ezZsyilyMzMJDo62qxQhUPl5kJCgtVR
NK1fP9i3D44csToSITzH1Os01qxZw4wZM6itrWXy5MnMmjWLxYsXA5CamgpQf4ZVhw4dWLJkCQMG
DADgL3/5C2+88QZXXHEFAwYM4LXXXiPgonka5DoN3/b44xAeDk8+aXUkTbv3XuO+5Q89ZHUkQvxA
5p4SPik+HhYsMKbwsKu//tWYXuTVV62ORIgf2PbiPuEe3cdV3cmvpgYKC+Hmmz0Xj6dlZRlnduXk
WB2JOXTeP3XOzV1SNIQj7d1rzPtk96k6br4Z9u+HU6esjkQIz5DhKeFI//wnrFgB779vdSTNu+02
mDPHukkVhbiYDE8Jn7N9u72Hpi40aJBxppcQOpCiYWO6j6u6k9+2bfYvGnX5xcfDli3WxmIGnfdP
nXNzlxQN4UhOOtLo1w+++MLqKITwDOlpCMc5cQJCQuDkSbjCAX/2VFVBx45QWQnt21sdjRDS0xA+
Zvdu6NPHGQUD4MoroVcvYwJDIZzOIb92vkn3cdW25rd7N0REeDYWM1yYX79+sGOHdbGYQef9U+fc
3CVFQzjOnj3OKBoXiomRvobQg/Q0hOP87Gdw330wbpzVkbTcBx/AP/4BH31kdSRCSE9D+BinDE9d
KDwcioqsjkII90nRsDHdx1Xbkt/588YUIn36eD4eT7swv969oaTEmDNLFzrvnzrn5i4pGsJRysog
MND+c05d7KqrIDgYvvrK6kiEcI/0NISjZGbCc8/B+vVWR9J6w4bBzJlw991WRyJ8nfQ0hM/Yu9fo
DzhReLhx5pcQTiZFw8Z0H1dtS37FxdCzp+djMcPF+fXpo1czXOf9U+fc3CVFQzhKcTG4XFZH0TZy
pCF0ID0N4SiDBsFLL8GPf2x1JK1XVAR33WUUPiGsJD0N4TOcNDx1MZcLDh6E6mqrIxGi7aRo2Jju
46qtze/0afj2W+ja1Zx4PO3i/AICjFvUlpZaE4+n6bx/6pybu6RoCMeo62f4+VkdSdu5XDI8JZxN
ehrCMT76CP72N/j4Y6sjabuJE+H222HyZKsjEb5MehrCJzi5n1GnZ0850hDOJkXDxnQfV21tfk47
3bax/HQqGjrvnzrn5i4pGsIxSkrkSEMIq0lPQzhGXJxxT4oBA6yOpO1KSyEhwTj1VgirSE9D+ISv
voIePayOwj033gjHjsHZs1ZHIkTbSNGwMd3HVVuT3+nT8N130LmzefF4WmP5tWsH3bvrMUW6zvun
zrm5S4qGcITycggNdfY1GnWkryGcTHoawhE+/dS4j8a6dVZH4r7UVIiNhccftzoS4aukpyG0V1Zm
HGnowOUyzgQTwomkaNiY7uOqrcnPiUWjqfx0GZ7Sef/UOTd3SdEQjuDEotGUnj3lSEM4l/Q0hCOM
GgWPPQbJyVZH4r5Dh6BfPzh61OpIhK+ybU8jIyODyMhIwsPDmTdvXqPrTJ8+nfDwcGJjYykoKKh/
vLKykgceeICoqCiio6PJyckxM1Rhc6Wl+hxpBAf/MM27EE5jWtGora1l2rRpZGRkUFhYyPLly9m5
c2eDddLT09m7dy9FRUW88sorTJ06tf65J554ghEjRrBz5062b99OVFSUWaHalu7jqr7a0/DzMy5S
dPq1Gjrvnzrn5i7TikZubi5hYWG4XC4CAgJISUlh1apVDdZJS0tjwoQJACQkJFBZWcnhw4c5ceIE
GzZs4NFHHwXA39+fjh07mhWqsLmzZ42/zIOCrI7Ec1wu5xcN4ZtMKxrl5eV07969fjk0NJTy8vJm
1ykrK6O4uJigoCAmTZrEgAEDeOyxxzhz5oxZodpWYmKi1SGYqqX5lZdDSIjzLuy7XH433eT8ZrjO
+6fOubnL36w39mvhb/jFzRg/Pz9qamrIz89n0aJFxMfHM2PGDObOncsf//jHS14/ceJEXN/Plx0Y
GEhcXFz9Bq87xJRlZy9DIqGh9onHE8suF2zYkEVUlD3ikWW9l7Oysnj99dcB6r8v20yZZOPGjSop
Kal+ec6cOWru3LkN1klNTVXLly+vX46IiFCHDh1SFRUVyuVy1T++YcMGdc8991zyGSaGbwvr16+3
OgRTtTS/ZcuUGjfO3FjMcLn83nxTqQcf9F4sZtB5/9Q5N6Xc++40bXhq4MCBFBUVUVJSQlVVFStX
riT5ovMlk5OTWbp0KQA5OTkEBgYSHBxMt27d6N69O3v27AEgMzOTvn37mhWqsDknNsGbIz0N4VSm
XqexZs0aZsyYQW1tLZMnT2bWrFksXrwYgNTUVID6M6w6dOjAkiVLGPD9zRK2bdvGlClTqKqqonfv
3ixZsuSSZrhcp+Ebpk2DiAj45S+tjsRzysogPh4qKqyORPgid7475eI+YXv33gsTJ8Lo0VZH4jm1
tXD11XDiBPzoR1ZHI3yNqRf33X///Xz88cecP3++TR8g2u6HRrCeWppfWZlxDwqnuVx+7doZQ24H
DngvHk/Tef/UOTd3NVs0pk6dyltvvUVYWBhPP/00u3fv9kZcQtTTsacB0tcQztTi4anKykpWrFjB
s88+S48ePXjssccYP348AQEBZsfYJBme0t+5c3DddcYFfldoNr3mo4/Crbcac2oJ4U2mzz31zTff
8Prrr/Paa68xYMAApk+fTl5eHsOGDWvThwrRUgcPwg036FcwQI40hDM1+6s4evRoBg8ezJkzZ/jo
o49IS0sjJSWFRYsWcerUKW/E6LN0H1dtSX5OHppqLj+nXxWu8/6pc27uavaK8Mcee4wRI0Y0eOzc
uXO0b9+evLw80wITApxdNJojRxrCiZrtafTv37/BlOUAAwYMID8/39TAWkJ6Gvp7/nnj/hPz51sd
ieeVlMDttxvTvgvhTe58dzZ5pFFRUcHBgwc5e/Ys+fn5KKXw8/Pj5MmTPjl5oLBGWZnxF7mOQkPh
yBGoqoIrr7Q6GiFapsmextq1a/n1r39NeXk5M2fO5Ne//jUzZ87khRdeYM6cOd6M0WfpPq7akvzK
y507PNVcfv7+0K2bURidSOf9U+fc3NXkkcbEiROZOHEi7733HmPGjPFmTELUKyszpkXXVV1fo1cv
qyMRomWa7GksW7aMhx9+mPnz5zeY5rxumOpXv/qV14JsivQ09Ne9O3z2mXGmkY4eeQTuuAMmTbI6
EuFLTOlp1PUtTp061WjREMJstbVw+LBxnYau5Awq4TQyYaGNZWVl1d9QRUfN5XfwIAwYYJw95UQt
2X5//zts2ADf3x/HUXTeP3XODUy+Ivypp57i5MmTVFdXM2TIELp06cKyZcva9GFCtEbdbV515nI5
+wI/4XuaPdKIjY1l27ZtfPDBB6xevZoXXniB22+/ne3bt3srxibpfqTh6z78EJYsgVWrrI7EPPv2
wdChUFxsdSTCl5h6pFFTUwPA6tWreeCBB+jYsaP0NIRX6H7mFBiN/vJy+P7XTAjba7ZojBo1isjI
SPLy8hgyZAhHjhzhR3LXGK/Q/Vzx5vJz+vBUS7bflVdC165G/8ZpdN4/dc7NXc0Wjblz55KdnU1e
Xh5XXnklHTp0YJXO4wXCNpxeNFrK6RMXCt/SorOnsrOz+eqrr6iurjZe5OfHI488YnpwzZGeht7u
vBN+8xtjzF9nP/853H03PPyw1ZEIX2HKdRp1xo8fz/79+4mLi6Ndu3b1j9uhaAi9yZGGEPbTbNHI
y8ujsLBQmt8W0P1c8cvlp5TzG+Et3X4uF+Tmmh6Ox+m8f+qcm7ua7WnExMRQUVHhjViEqHfiBLRr
Z9zqVXdypCGcpNmeRmJiIlu3bmXQoEG0b9/eeJGfH2lpaV4J8HKkp6GvL7+EBx6AnTutjsR8u3fD
yJFQVGR1JMJXmNrTmD179iUfIkNVwmxOH5pqjR49jBsxnT+v573QhV6a3UUTExNxuVxUV1eTmJjI
oEGD6N+/vzdi83m6nyt+ufycfB+NOi3dflddBYGBzptjS+f9U+fc3NVs0XjllVcYO3YsqampAJSV
lTF69GjTAxO+zVfOnKojfQ3hFM0WjZdffpnPPvuM677vSPbp04cjR46YHphA+7M3LpefDsNTrdl+
TpwiXef9U+fc3NVs0Wjfvn19AxyMuaikpyHMJkcaQthTs0Xjpz/9Kc899xxnzpzhX//6F2PHjmXU
qFHeiM3n6T6uKj2NHzjxSEPn/VPn3NzVormngoKC6NevH4sXL2bEiBE8++yz3ohN+DAdhqdaQ440
hFO0aO6puh5G165dTQ+oNeQ6DT19951xUd933/nOKai+dF2KsJ4p99NQSjF79my6dOlCREQEERER
dOnShWeeeUa+qIWp6o4yfKVggHGk8dVXxvQpQthZk7+WL774ItnZ2WzevJnjx49z/PhxcnNzyc7O
5sUXX/RmjD5L93HVpvI7cMD4EnW61my/a66Bq6+Go0fNi8fTdN4/dc7NXU0WjaVLl/L222/Ts2fP
+sd69erFW2+9xdKlS70SnPBNBw4YV0n7GulrCCdosmjU1NQQFBR0yeNBQUH1t4AV5tL9XPGm8vvq
Kz2KRmu3n8vlrKKh8/6pc27uarJoBAQENPmiyz0nhLt8+UjDaafdCt/TZNHYvn071157baM/O3bs
aNGbZ2RkEBkZSXh4OPPmzWt0nenTpxMeHk5sbCwFBQUNnqutraV///4+e12I7uOq0tNoyGlHGjrv
nzrn5q4mZ7mtra11641ra2uZNm0amZmZhISEEB8fT3JyMlFRUfXrpKens3fvXoqKiti0aRNTp04l
Jyen/vmFCxcSHR3NqVOn3IpFOIsvH2l88onVUQhxeaad1Jibm0tYWBgul4uAgABSUlJYtWpVg3XS
0tKYMGECAAkJCVRWVnL48GHAmBgxPT2dKVOm+OwpvrqPqzaWn1JG0eje3fvxeJr0NJxL59zcZVrR
KC8vp/sFv/mhoaGUl5e3eJ0nn3yS559/nit86WR9wdGjxqmn11xjdSTeJ9dqCCdo9iZMbdXSSQ0v
PopQSrF69Wq6du1K//79mx1bnDhxIi6XC4DAwEDi4uLq/0qoe61TlxcsWKBVPi3Jb/duuOkme8Tn
7e23dWsWSsGxY4l07mx9/L68f174vWOHeDyRz+uvvw5Q/33ZZsokGzduVElJSfXLc+bMUXPnzm2w
Tmpqqlq+fHn9ckREhKqoqFCzZs1SoaGhyuVyqW7duqmrr75aPfzww5d8honh28L69eutDsFUjeX3
3ntK3Xuv92MxQ1u2X2ysUnl5no/FDDrvnzrnppR7352mfetWV1erXr16qeLiYnXu3DkVGxurCgsL
G6zz8ccfq+HDhyuljCKTkJBwyftkZWWpkSNHNvoZuhcNX/Tii0r98pdWR2Gd5GSjcAphJne+O00b
nvL392fRokUkJSVRW1vL5MmTiYqKYvHixQCkpqYyYsQI0tPTCQsLo0OHDixZsqTR95L7d/gOXS7s
aysnTpEufEuLZrm1K91nuc3Kyqofn9RRY/mNGQMpKTB2rDUxeVJbtt8LLxhFY+FCc2LyJJ33T51z
A5NmuRXCCrqcbttWLhcUF1sdhRBNkyMNYStdukBhIdjs1i1es307PPSQcX8NIcziznenFA1hGydO
GPfROHUKfLWNdfq0UTBPn/at+4kI75LhKU1deK64ji7Ob/9+6NVLn4LRlu13zTXQsSMcPOj5eDxN
5/1T59zcJUVD2Mb+/dC7t9VRWC8sDPbutToKIRonw1PCNp5/Hg4dgvnzrY7EWpMmwU9+AlOmWB2J
0JUMTwkt1A1P+brevWHfPqujEKJxUjRsTPdx1Yvz27dPr6LR1u3nlOEpnfdPnXNzlxQNYRvS0zA4
pWgI3yQ9DWELNTXQoQOcPAnt21sdjbWOHzemST9xQp8zyYS9SE9DOF5ZGQQHS8EA6NQJAgKMe4sI
YTdSNGxM93HVC/Pbt0+/oSl3tp8Thqh03j91zs1dUjSELejWBHdXWJicQSXsSXoawhZmzjSGp556
yupI7OH3vzf6Gc88Y3UkQkfS0xCOt3s3RERYHYV9OGF4SvgmKRo2pvu46oX57d4NffpYF4sZ3O1p
2H14Suf9U+fc3CVFQ1iuqgpKS/VrhLtDjjSEXUlPQ1hu505IToaiIqsjsQ+l4LrrjJtSdepkdTRC
N9LTEI4m/YxL+fk5Y4hK+B4pGjam+7hqXX66Fg13t1+fPrBnj2diMYPO+6fOublLioaw3J49ehYN
d0VEGAVVCDuRnoaw3ODB8OyzkJhodST28tZbkJYGK1daHYnQjfQ0hKPpOjzlLjnSEHYkRcPGdB9X
zcrK4tgxOHcOunWzOhrP80RPo6gIzp/3TDyepvP+qXNu7pKiISy1cydERsoU4I257jro2NGYAVgI
u5CehrDUK6/Axo2wZInVkdjTnXfCrFkwbJjVkQidSE9DONaXX0JMjNVR2Jf0NYTdSNGwMd3HVbOy
svjiC+jb1+pIzOGJ7RcRAbt2uR+LGXTeP3XOzV1SNISlvvxS36LhCXKkIexGehrCMt98Y9x4qbJS
GuFN2b/fuH7lwAGrIxE6kZ6GcKS6owwpGE276SbjXuHffmt1JEIYpGjYmO7jqu+/n6X10JQntl+7
dsaU8XacAVjn/VPn3NwlRUNYpqREzpxqCTs3w4XvkZ6GsExiIvzudzB0qNWR2NtvfwtXXgl/+IPV
kQhdSE9DOI5SaH26rSdFR0NhodVRCGGQomFjOo+rHjkCVVVZWs45VcdT269vX6PA2o3O+6fOubnL
9KKRkZFBZGQk4eHhzJs3r9F1pk+fTnh4OLGxsRQUFABQWlrKHXfcQd++fYmJieGll14yO1ThRV9+
CS6XnDnVEpGRxqm3VVVWRyIEoExUU1OjevfurYqLi1VVVZWKjY1VhYWFDdb5+OOP1fDhw5VSSuXk
5KiEhASllFIVFRWqoKBAKaXUqVOnVJ8+fS55rcnhCxMtXKjUL35hdRTOERGh1I4dVkchdOHOd6ep
Rxq5ubmEhYXhcrkICAggJSWFVatWNVgnLS2NCRMmAJCQkEBlZSWHDx+mW7duxMXFAXDNNdcQFRXF
wYMHzQxXeJHMOdU6MTH2HKISvsffzDcvLy+ne/fu9cuhoaFs2rSp2XXKysoIDg6uf6ykpISCggIS
EhIu+YyJEyficrkACAwMJC4ujsTvbwFXNy7p1OUFCxZolc+Fy198AQEBC8jK0jM/8Oz2i4mBjz82
ekA65me35br/2yUeT+Tz+uuvA9R/X7aZB494LvHuu++qKVOm1C8vW7ZMTZs2rcE6I0eOVJ999ln9
8pAhQ1ReXl798qlTp9Qtt9yiPvjgg0ve3+TwLbd+/XqrQzBFba1S116rVFraeqtDMZUnt9877yh1
770eezuP0HX/VErv3JSy8fBUSEgIpaWl9culpaWEhoZedp2ysjJCQkIAqK6uZsyYMYwfP5777rvP
zFBtqe4vBt0UF0NgIIwalWh1KKby5Paz4/CUrvsn6J2bu0wtGgMHDqSoqIiSkhKqqqpYuXIlycnJ
DdZJTk5m6dKlAOTk5BAYGEhwcDBKKSZPnkx0dDQzZswwM0zhZdu2QWys1VE4S1gYlJfLHFTCeqYW
DX9/fxYtWkRSUhLR0dE8+OCDREVFsXjxYhYvXgzAiBEj6NWrF2FhYaSmpvJ///d/AGRnZ/Pmm2+y
fv16+vfvT//+/cnIyDAzXNu5cFxVJ1u3GkVD1/zqeDI/f39jOpGdOz32lm7TefvpnJu7TG2EAwwf
Ppzhw4c3eCw1NbXB8qJFiy553eDBgzl//rypsQlrbNsG48dbHYXzxMTA9u0wcKDVkQhfJnNPCa9z
ueCTT6BPH6sjcZYXXjD6QX/9q9WRCKeTuaeEY1RWwtdfG9N9i9YZMADy862OQvg6KRo2puO46vbt
xjBLu3YDev5VAAAMIUlEQVR65nchT+cXF2cM7dXWevRt20zn7adzbu6SoiG8ats248tPtF5gIHTr
JvcMF9aSnobwqilT4JZbYOpUqyNxpp/9DJKT5UQC4R7paQjHkGs03CN9DWE1KRo2ptu4ak2NMVFh
v37Gsm75XcyM/G65xT5FQ+ftp3Nu7pKiIbxm504IDYVrr7U6EueqO9KoqbE6EuGrpKchvGbJEsjM
hLfesjoSZ4uKgrffhv79rY5EOJX0NIQjbNkiVzN7wk9+AtnZVkchfJUUDRvTbVx182aIj/9hWbf8
LmZWfnYpGjpvP51zc5cUDeEVVVXG1N4ypOI+uxQN4ZukpyG8Ij8fJkyAHTusjsT5lILgYMjLgwtu
eilEi0lPQ9je5s3Sz/AUPz+47TY52hDWkKJhYzqNq+bmXlo0dMqvMWbmd8cd8Omnpr19i+i8/XTO
zV1SNIRXZGfD4MFWR6GPpCRYu9YYqhLCm6SnIUx39CiEh8M33xiz2wr3KWXclyQjw7huQ4jWkJ6G
sLXsbLj1VikYnuTnZxxtfPKJ1ZEIXyNFw8Z0GVf97LPGh6Z0ya8pZueXlATp6aZ+xGXpvP10zs1d
UjSE6TZskH6GGe6+G3Jy4NgxqyMRvkR6GsJUx4/DTTfB4cNw1VVWR6OfsWNh+HB49FGrIxFOIj0N
YVuffmpcwSwFwxxjx8I//2l1FMKXSNGwMR3GVdeuNYZRGqNDfpfjjfzuuQc2bYKKCtM/6hI6bz+d
c3OXFA1hmvPnYc0ao2ErzNGhg3G08fe/Wx2J8BXS0xCm+fxz457gX35pnCIqzFFQAPfeC8XFclqz
aBnpaQhb+uc/4cEHpWCYrX9/CAmBDz6wOhLhC6Ro2JiTx1Wrq38oGk1xcn4t4c38fvtbeOYZY0jQ
W3Tefjrn5i4pGsIUH3wAYWEQGWl1JL7hnnvg6qvhnXesjkToTnoawuOUMk6znTkTxoyxOhrfkZUF
jzxi9JCuvdbqaISdSU9D2Ep6unGV8r33Wh2Jb0lMhKFD4Te/sToSoTMpGjbmxHHVU6fgiSdg/nzw
97/8uk7MrzWsyO9//xfS0uC998z/LJ23n865uauZX2shWq6qCn7+c7jzTmOMXXjf9dcbBWP4cONW
sIMGWR2R0I30NITblDImzvvlL417PCxfDgEBVkfl21avNuaj+uADo78kxIXc+e6UoiFa7dtvjakr
cnKM27hu2mRcmfz00zB5slyXYRdr1xqN8d/8xijoV8hgtPieNMI1Zadx1cOHYcEC42ZKXbvC735n
3Ilv3DijeBQVGVd/t6Zg2Ck/M1idX1ISbNwIK1dCQoIxeaQn/8ayOj8z6Zybu0wtGhkZGURGRhIe
Hs68efMaXWf69OmEh4cTGxtLQUFBq16ru61bt1r6+WfOGENNI0ZARATk58Ps2fD118YUIfPnw89+
Zkx93pajC6vzM5sd8uvVy7gJ1hNPwPTpEBMD//M/8J//uH8fDjvkZxadc3OXaY3w2tpapk2bRmZm
JiEhIcTHx5OcnEzUBTc0Tk9PZ+/evRQVFbFp0yamTp1KTk5Oi17rCyorK73+mTU1sG4dvPWWcRbO
oEHw8MPGRWMdOnj2s6zIz5vskt8VV8D48cZJCtnZRr/jqaegsNB4rlMnCAw0ru245hrjp+7/ISHG
HwwREdC7d8NelV3yM4POubnLtKKRm5tLWFgYLpcLgJSUFFatWtXgiz8tLY0JEyYAkJCQQGVlJYcO
HaK4uLjZ1wr3nT9vDDsVFsL27fDvfxsXiPXpY3zBzJsH3bpZHaXwFD8/4w6KdXdRVMq4SdaJE8a/
p0//8HPqlPFz4ICxX+zaBQcPQng49O1r/OzaBXv2GEczzZ1eLfRh2qYuLy+ne/fu9cuhoaFs2rSp
2XXKy8s5ePBgs6/1htOnISXlh3FgT/zbmnWLikr41788814X/ltdDUePGsMTgYEQHQ39+hnzRL3y
itGz8IaSkhLvfJBF7J6fn59xiu7110PPns2vf+aMUSi+/NL4yc4uISkJSkuNI5POnaFjR+No5MKf
5mbebW5o83LPm3XSxdatJeTlGf8fNswY2hMG04qGXwu3prtnP7X0c5zqyJE3TH3/r782xrf/8x94
+WVTP6pRb7xhbn5W0z0/MPKrrDR+dFJWZuS2erXRExIG04pGSEgIpaWl9culpaWEhoZedp2ysjJC
Q0Oprq5u9rXgfsERQgjROqadPTVw4ECKioooKSmhqqqKlStXkpyc3GCd5ORkli5dCkBOTg6BgYEE
Bwe36LVCCCG8z7QjDX9/fxYtWkRSUhK1tbVMnjyZqKgoFi9eDEBqaiojRowgPT2dsLAwOnTowJIl
Sy77WiGEEBZTDlNTU6Pi4uLUyJEjlVJKffPNN2ro0KEqPDxcDRs2TB0/ftziCNvupptuUv369VNx
cXEqPj5eKaVPfsePH1djxoxRkZGRKioqSuXk5GiT265du1RcXFz9z3XXXacWLlyoTX5KKTVnzhwV
HR2tYmJi1EMPPaS+++47rfJbsGCBiomJUX379lULFixQSjn7d2/SpEmqa9euKiYmpv6xy+UzZ84c
FRYWpiIiItTatWsv+96OuyJ84cKFREdH1zfA586dy7Bhw9izZw9Dhgxh7ty5FkfYdn5+fmRlZVFQ
UEBubi6gT35PPPEEI0aMYOfOnWzfvp3IyEhtcouIiKCgoICCggLy8vK4+uqrGT16tDb5lZSU8Oqr
r5Kfn8+OHTuora1lxYoV2uT3xRdf8Nprr7F582a2bdvG6tWr2bdvn6PzmzRpEhkZGQ0eayqfwsJC
Vq5cSWFhIRkZGTz++OOcv9wtIE0rdSYoLS1VQ4YMUevWras/0oiIiFCHDh1SSilVUVGhIiIirAzR
LS6XS3399dcNHtMhv8rKStWzZ89LHtcht4utXbtWDR48WCmlT37ffPON6tOnjzp27Jiqrq5WI0eO
VJ988ok2+b3zzjtq8uTJ9ct/+tOf1Lx58xyfX3FxcYMjjabymTNnjpo7d279eklJSWrjxo1Nvq+j
jjSefPJJnn/+ea64YOa1w4cPExwcDEBwcDCHDx+2Kjy3+fn5MXToUAYOHMirr74K6JFfcXExQUFB
TJo0iQEDBvDYY4/x7bffapHbxVasWMFDDz0E6LHtAK6//npmzpxJjx49uPHGGwkMDGTYsGHa5BcT
E8OGDRs4duwYZ86cIT09nbKyMm3yq9NUPgcPHmxwdmrd9XJNcUzRWL16NV27dqV///5Nnmrr5+fn
6Os2srOzKSgoYM2aNbz88sts2LChwfNOza+mpob8/Hwef/xx8vPz6dChwyWH+k7N7UJVVVV89NFH
jB079pLnnJzfvn37WLBgASUlJRw8eJDTp0/z5ptvNljHyflFRkby3//939x1110MHz6cuLg42l10
RaKT82tMc/lc7jnHFI3PP/+ctLQ0evbsyUMPPcS6det4+OGHCQ4O5tChQwBUVFTQ1VuXM5vghhtu
ACAoKIjRo0eTm5urRX6hoaGEhoYSHx8PwAMPPEB+fj7dunVzfG4XWrNmDbfccgtBQUEAWmw7gC1b
tnDbbbfRuXNn/P39uf/++9m4caNW2+/RRx9ly5Yt/Pvf/6ZTp0706dNHm+1Xp6l8GrteLiQkpMn3
cUzRmDNnDqWlpRQXF7NixQruvPNOli1bRnJycv1Vt2+88Qb33XefxZG2zZkzZzh16hQA3377LZ98
8gn9+vXTIr9u3brRvXt39uzZA0BmZiZ9+/Zl1KhRjs/tQsuXL68fmgK02HZg/CWek5PD2bNnUUqR
mZlJdHS0VtvvyJEjABw4cID333+fcePGabP96jSVT3JyMitWrKCqqori4mKKiooYdLlbPprSgTFZ
VlaWGjVqlFLKaNINGTLEkafFXWj//v0qNjZWxcbGqr59+6o5c+YopfTJb+vWrWrgwIHq5ptvVqNH
j1aVlZXa5KaUUqdPn1adO3dWJ0+erH9Mp/zmzZtXf8rtI488oqqqqrTK7/bbb1fR0dEqNjZWrVu3
Tinl7O2XkpKibrjhBhUQEKBCQ0PVP/7xj8vm89xzz6nevXuriIgIlZGRcdn3dvSd+4QQQniXY4an
hBBCWE+KhhBCiBaToiGEEKLFpGgIIYRoMSkaQgghWkyKhhBCiBb7/z0NMEjbQNfHAAAAAElFTkSu
QmCC
"
>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Box-and-Whisker-Plots">Box-and-Whisker Plots<a class="anchor-link" href="#Box-and-Whisker-Plots">¶</a></h3><p>Box-and-whisker plots are made with the data frame's <code>boxplot()</code> method:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [19]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">b</span> <span class="o">=</span> <span class="n">gal</span><span class="o">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s">"height"</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAETtJREFUeJzt3X9oVfUfx/HXKeVLP+Zuy3mHm3mXMnQ63dVZ2B+1sBkG
CwsVNHLzN0Q/ZNIPCvmuf9oiJG1BiMy8Jbik0KRIYuDtB2bL9OrCcqG7Cbld/5gT26wxvd8/PN6+
Zl6veu7OPZ/7fMANz36cvU/Yy0+vc+5HKx6PxwUAMMotbg8AAHAe4Q4ABiLcAcBAhDsAGIhwBwAD
Ee4AYKBrhntDQ4MmTZqksrIyLVq0SH/99Zd6enpUVVWlkpISzZ49W729vUMxKwAgRUnDPRqNatOm
TTpw4IDa29t1/vx5tbS0qLGxUVVVVero6NCsWbPU2Ng4VPMCAFKQNNxHjBih4cOHq7+/X4ODg+rv
79fo0aO1a9cu1dTUSJJqamq0c+fOIRkWAJCapOGel5enNWvW6J577tHo0aPl8/lUVVWlWCwmv98v
SfL7/YrFYkMyLAAgNcOSffLYsWNav369otGocnNzNX/+fG3duvWyr7EsS5Zl/ev3X+3jAABnXG0H
maThvn//fj3wwAO6++67JUlPPvmkvvvuOxUUFKi7u1sFBQXq6urSqFGjrvsHA5mmvr5e9fX1bo8B
pCzZAjppLTNhwgTt27dP586dUzweV2trq0pLS1VdXa1QKCRJCoVCmjt3rrMTAwBuStKV+9SpU7V4
8WJVVFTolltu0bRp07Ry5UqdPXtWCxYsUHNzswKBgLZv3z5U8wJpE41G3R4BcIyVzi1/LcuiloFn
rF+/XqtXr3Z7DCBlyTKWd6gCtvJygh3mINwBWzjs9gSAcwh3wBaNht0eAXBM0huqgOnC4b9X7KGQ
FAhc/HVl5cUX4FXcUAVs9fUXX4BXcEMVALIM4Q7YfL6w2yMAjiHcAVt5udsTAM6hcwcAj6JzB4As
Q7gDtjDvYoJBCHcAMBBvYoKRhvIviuG+EjIR4Q4jEbjIdtQygI3OHSYh3AHAQDznDgAexXPuAJBl
CHfAVlsbdnsEwDHUMoDNssKKxyvdHgNIWbKMJdwBm2VJ/HaFl9C5A0CWIdyBhLDbAwCOIdwBwECE
O2D7738r3R4BcAw3VAHAo7ihCqSAvWVgEsIdAAxELQMAHkUtAwBZhnAHbOwtA5NQywA29paB17C3
DJAC9paB19C5A0CWuWa4Hz16VMFgMPHKzc3Vhg0bVF9fr6KiosTHd+/ePRTzAmkUdnsAwDHXVctc
uHBBhYWFamtr0+bNm5WTk6O6urqrn5xaBh5C5w6vcayWaW1t1fjx4zVmzBjF43GCG0ZhbxmY5LrC
vaWlRQsXLpR08U+MpqYmTZ06VcuWLVNvb29aBgSGSn292xMAzkm5lhkYGFBhYaGOHDmi/Px8nTp1
Svn5+ZKktWvXqqurS83NzZef3LJUU1OjQCAgSfL5fCovL1dlZaWkv/fy4JjjTDhev349vz85zujj
SCSSWEhHo1GFQqGbfxTy008/1XvvvfevN06j0aiqq6vV3t5++cnp3OEh4XA48R8S4AWOdO7btm1L
VDKS1NXVlfj1jh07VFZWdhMjAu4j2GGSlFbufX19Gjt2rDo7O5WTkyNJWrx4sSKRiCzLUnFxsTZu
3Ci/33/5yVm5A0Da8A5VIAW1tWFt2VLp9hhAygh3IAU85w6vIdyBFLC3DLyGvWUAIMsQ7kBC2O0B
AMcQ7gBgIMIdsLG3DEzCDVUA8ChuqAIpuLSXB2ACwh0ADEQtAwAeRS0DAFmGcAdstbVht0cAHEMt
A9jYWwZew94yQArYWwZeQ+cOAFmGcAcSwm4PADiGcAcAAxHugI29ZWASbqgCgEdxQxVIAXvLwCSE
OwAYiFoGADyKWgYAsgzhDtjYWwYmoZYBbOwtA69hbxkgBewtA6+hcweALEO4AwlhtwcAHEO4A4CB
CHfAxt4yMAk3VAHAo7ihCqSAvWVgEsIdAAxELQMAHkUtAwBZJmm4Hz16VMFgMPHKzc3VO++8o56e
HlVVVamkpESzZ89Wb2/vUM0LpA17y8AkKdcyFy5cUGFhodra2tTU1KSRI0fqpZde0ptvvqnTp0+r
sbHxypNTy8BD2FsGXuNILdPa2qrx48drzJgx2rVrl2pqaiRJNTU12rlzpzOTAq6qdHsAwDHDUv3C
lpYWLVy4UJIUi8Xk9/slSX6/X7FY7KrfV1tbq0AgIEny+XwqLy9XZWWlpL8fPeOY40w4lsIKhzNn
Ho45/udxJBJJ1ODRaFTJpFTLDAwMqLCwUEeOHFF+fr7uuusunT59OvH5vLw89fT0XHlyahl4CLUM
vOama5kvvvhC06dPV35+vqSLq/Xu7m5JUldXl0aNGuXQqAAAJ6QU7tu2bUtUMpL0+OOPKxQKSZJC
oZDmzp2bnukASXl5F/daT/dLqhySn5OX5/a/UWSDa9YyfX19Gjt2rDo7O5WTkyNJ6unp0YIFC3Ti
xAkFAgFt375dPp/vypNTy8ABpv0lGqZdD9zD38QETxuqMAyHw/93czV9CHc4hXeoAkCWYeWOjGfa
Ste064F7WLkDQJYh3AHbpTeNACYg3AHAQHTuyHimddSmXQ/cQ+cOAFmGcAdsdO4wCeEOAAaic0fG
M62jNu164B46dwDIMoQ7YKNzh0kIdwAwEJ07Mp5pHbVp1wP30LkDQJYh3AEbnTtMQrgDgIHo3JHx
TOuoTbseuIfOHQCyDOEO2OjcYRLCHQAMROeOjGdaR23a9cA9dO4AkGUId8BG5w6TDHN7AOBa4rIk
y+0pnBP/v38C6ULnjoxnWkdt2vXAPXTuAJBlCHfARucOkxDuAGAgOndkPNM6atOuB+6hcweALEO4
AzY6d5iEcAcAA9G5I+OZ1lGbdj1wz0117r29vZo3b54mTpyo0tJS7du3T/X19SoqKlIwGFQwGNTu
3bsdHxoAcOOuuXKvqanRQw89pKVLl2pwcFB9fX1av369cnJyVFdXl/zkrNzhgKFa6YbDYVVWVqb9
57Byh1OSZWzSvWXOnDmjb775RqFQ6OIXDxum3NxcSSK0ASCDJQ33zs5O5efna8mSJTp06JCmT5+u
DRs2SJKampr0wQcfqKKiQuvWrZPP5/vXc9TW1ioQCEiSfD6fysvLE6ujS08ncMxxsmNpaH7epY+Z
cj0cm3cciUTU29srSYpGo0omaS2zf/9+zZw5U3v37tWMGTO0evVqjRgxQs8995xGjhwpSVq7dq26
urrU3Nx85cmpZeAA02oM064H7rnhG6pFRUUqKirSjBkzJEnz5s3TgQMHlJ+fL8uyZFmWli9frra2
NuenBobY3ytrwPuShntBQYHGjBmjjo4OSVJra6smTZqk7u7uxNfs2LFDZWVl6Z0SAHBdrvm0zKFD
h7R8+XINDAxo3Lhx2rx5s55//nlFIhFZlqXi4mJt3LhRfr//ypNTy8ABptUYpl0P3JMsY3kTEzKe
aWFo2vXAPWwcBqSAzh0mIdwBwEDUMsh4ptUYpl0P3HPD71AFMoVluT2Bc+66y+0JkA0Id2S8oVrl
WlZY8Xjl0PwwIM3o3AHAQHTugI0uHF7Do5AAkGUIdyAh7PYAgGMId8BWU+P2BIBz6NwBwKPo3AEg
yxDugI29ZWASwh0ADETnDgAeRecOpKC+3u0JAOewcgds7C0Dr2HlDgBZhpU7YGNvGXgNK3cAyDKE
O5AQdnsAwDGEO2BjbxmYhM4dADyKzh0AsgzhDtjYWwYmIdwBwEB07gDgUXTuQArYWwYmYeUO2Nhb
Bl7Dyh0Asgwrd8DG3jLwGlbuAJBlCHcgIez2AIBjCHfAxt4yMMk1w723t1fz5s3TxIkTVVpaqu+/
/149PT2qqqpSSUmJZs+erd7e3qGYFUirLVsq3R4BcMw1w/2FF17QY489pp9//lmHDx/WhAkT1NjY
qKqqKnV0dGjWrFlqbGwcilkBAClK+rTMmTNnFAwGdfz48cs+PmHCBH311Vfy+/3q7u5WZWWlfvnl
lytPztMy8JBwOKzKykq3xwBSdsNPy3R2dio/P19LlizRtGnTtGLFCvX19SkWi8nv90uS/H6/YrGY
81MDAG7YsGSfHBwc1IEDB/Tuu+9qxowZWr169RUVjGVZsizrqueora1VIBCQJPl8PpWXlydWR5d2
4eOY40w4vvSxTJmHY47/eRyJRBL3OKPRqJJJWst0d3dr5syZ6uzslCR9++23amho0PHjx7Vnzx4V
FBSoq6tLDz/8MLUMPK++nv1l4C03XMsUFBRozJgx6ujokCS1trZq0qRJqq6uVigUkiSFQiHNnTvX
4ZGBoff662G3RwAcc83tBw4dOqTly5drYGBA48aN0/vvv6/z589rwYIFOnHihAKBgLZv3y6fz3fl
yVm5w0PYOAxekyxj2VsGsLG3DLyGvWUAIMsQ7kBC2O0BAMcQ7oCNvWVgEjp3APAoOncAyDKEO2C7
9I5AwASEOwAYiM4dADyKzh1IAfvKwCSs3AEb2w/Aa1i5A0CWYeUO2NhbBl7Dyh0AsgzhDiSE3R4A
cAzhDtjYWwYmoXMHAI+icweALEO4Azb2loFJCHcAMBCdOwB4FJ07kAL2loFJWLkDNvaWgdewcgeA
LMPKHbCxtwy8hpU7AGQZwh1ICLs9AOAYwh2wsbcMTELnDgAeRecOAFlmmNsDAOlgWdaQ/Sz+7xSZ
iJU7jBSPx6/7tWfPnhv6PiAT0bkDgEfRuQNAliHcARv7ucMkKYV7IBDQlClTFAwGdd9990mS6uvr
VVRUpGAwqGAwqN27d6d1UCDdIpGI2yMAjknpaRnLshQOh5WXl3fZx+rq6lRXV5e24YCh1Nvb6/YI
gGNSrmX+rbTnZikAZKaUwt2yLD3yyCOqqKjQpk2bEh9vamrS1KlTtWzZMlY98LxoNOr2CIBz4ik4
efJkPB6Px0+dOhWfOnVq/Ouvv47HYrH4hQsX4hcuXIi/9tpr8aVLl17xfZJ48eLFi1caX1dz3c+5
v/7667rzzju1Zs2axMei0aiqq6vV3t5+PacCAKTJNWuZ/v5+nT17VpLU19enL7/8UmVlZeru7k58
zY4dO1RWVpa+KQEA1+WaT8vEYjE98cQTkqTBwUE99dRTmj17thYvXqxIJCLLslRcXKyNGzemfVgA
QGrSuv0AMNSutyLcuHGjbr/9dj399NNX/ZotW7boxx9/VFNT0xWfe+ONN/Tqq6/e8LxAuvAOVWS1
VatWJQ12KfkOkw0NDU6PBDiCcIdxzp8/r5UrV2ry5Ml69NFH9eeff+rYsWOaM2eOKioq9OCDD+ro
0aOSLr7Tet26dZKkH374IfFO7BdffDFxHykej+vkyZOaM2eOSkpK9PLLL0uSXnnlFZ07d07BYPCa
f0AAQ41wh3F+/fVXPfvss/rpp5/k8/n0ySefaNWqVWpqatL+/fv11ltv6ZlnnpF0cVV+aWW+ZMkS
bdq0SQcPHtSwYcMuW7FHIhFt375d7e3t+uijj/T777+rsbFRt912mw4ePKgPP/zQlWsFroa/rAPG
KS4u1pQpUyRJ06dPVzQa1d69ezV//vzE1wwMDFz2PWfOnNEff/yh+++/X5K0aNEiffbZZ4nPz5o1
Szk5OZKk0tJS/fbbbyosLEz3pQA3jHCHcf7zn/8kfn3rrbcqFovJ5/Pp4MGDKZ/jn88Z/POcg4OD
Nz8okEbUMjDeiBEjdO+99+rjjz+WdDG4Dx8+nPh8PB5Xbm6ucnJy1NbWJklqaWlJ6dzDhw8n6JGR
CHcY559Pt1iWpa1bt6q5uVnl5eWaPHmydu3adcXXNzc3a8WKFQoGg+rv71dubm7i81d7YmblypWa
MmUKN1SRcXjOHbD19fXpjjvukCQ1NjYqFovp7bffdnkq4MbQuQO2zz//XA0NDRocHFQgENCWLVvc
Hgm4YazcAcBAdO4AYCDCHQAMRLgDgIEIdwAwEOEOAAYi3AHAQP8DJL3+EoSWCWoAAAAASUVORK5C
YII=
"
>
</div>
</div>
</div>
</div>
</div>
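<p>The plot above is drawn from a handful of summary numbers. The plain-Python sketch below computes them for a small set of made-up heights (not the Galton data). It assumes the defaults of matplotlib, which pandas uses for its plots: quartiles come from linear interpolation between sorted values, and the whiskers reach the most extreme observations lying within 1.5 × IQR of the box. The helper names <code>percentile</code> and <code>box_stats</code> are hypothetical.</p>

```python
# Sketch of the statistics behind a default box-and-whisker plot.
# Assumptions: linear-interpolation quartiles and whis=1.5 (matplotlib defaults).


def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def box_stats(values, whis=1.5):
    """Return quartiles, whisker ends and outliers for a list of numbers."""
    data = sorted(values)
    q1, median, q3 = (percentile(data, q) for q in (25, 50, 75))
    iqr = q3 - q1  # height of the box
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers stop at the most extreme observations still inside the fences;
    # anything beyond the fences is drawn as an outlier.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up heights in inches, for illustration only.
heights = [60, 62, 63, 64, 65, 65, 66, 67, 68, 70, 79]
print(box_stats(heights))
```

<p>Here the box runs from 63.5 to 67.5, so the fences sit at 57.5 and 73.5; the value 79 falls outside them and would be drawn as an outlier, while the whiskers end at 60 and 70.</p>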
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The median is represented by the red line inside the box, and the box itself spans the first to third quartiles. Outliers, if any, are marked by <code>x</code>’s beyond the whiskers.</p>
<p>Note that instead of operating on <code>gal.height</code> directly, <code>boxplot()</code> is called on the data frame and takes the column name via the <code>column</code> argument. This is because the real power of the box-and-whisker plot lies in <em>comparing</em> distributions. This will be covered more systematically in later tutorials, but as a quick illustration, here is how to compare the heights of males and females:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [20]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">b</span> <span class="o">=</span> <span class="n">gal</span><span class="o">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s">"height"</span><span class="p">,</span> <span class="n">by</span><span class="o">=</span><span class="s">"sex"</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_png output_subarea ">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAXcAAAEYCAYAAACnYrZxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz
AAALEgAACxIB0t1+/AAAHwVJREFUeJzt3X1UFXX+B/D3+NCmhgKFkKBdS0WFCxdFysq6LIJl0WKZ
iplg6VYbpmtm27ptmLsru2ctK39bHtaC1kfqrFsuGxUr41OaJSCUFh3hapuAKIKID6j3+/uDuEnA
hUvMDDPzfp1zd537MPMhxg9f3/OdL5IQQoCIiAylh9YFEBFR12NzJyIyIDZ3IiIDYnMnIjIgNnci
IgNicyciMiA2d+qQnj17IiIiAjabDWPHjsWePXu6dP+yLCM+Pt7te7Zv397lx1WDxWJBdXV1i+ev
ueYaDaohs+ildQGkD3379kVBQQEA4KOPPsJzzz0HWZZVrSEvLw9eXl4YP358pz7fdEuHJEldWVa7
2jqe2nWQuXDkTh6rra2Fr68vgMaG+cwzz8BqtSIsLAxZWVkAgIULF2L58uUAgA8//BB33nknhBBI
Tk7G448/jnHjxiE4OBjZ2dkt9l9dXY2EhASEh4dj/PjxKC4uhsPhwJo1a/Dyyy8jIiICu3btavaZ
qqoqxMbGIjQ0FPPmzXONlh0OB4KDg5GUlASr1Ypvv/221Xp//C+HlJQUZGZmAmgceT/77LMICwvD
zTffjMOHD7uOOXXqVERFRSEqKgqffPIJAODkyZOIi4tz1eLuPsFFixYhNDQUEydOxIkTJ3D48GGM
HTvW9fo333zTbLvJq6++ipCQEISHhyMxMREAUF9fj0ceeQQ333wzxowZg/fff7/N7wWZgCDqgJ49
ewqbzSZGjhwpBgwYIPLz84UQQrz77rsiNjZWOJ1OUVlZKYYMGSIqKirE2bNnRUhIiNi2bZsIDg4W
paWlQgghkpKSxN133y2EEOKbb74RQUFB4vz58yIvL0/ce++9QgghUlJSxIsvviiEEGLbtm3CZrMJ
IYRITU0VK1eubLW+J598UqSlpQkhhMjJyRGSJImTJ0+KsrIy0aNHD/Hpp5+2WW95eXmz4zfVkJmZ
KYQQwmKxiD/96U9CCCHefvtt1/sSExPFrl27hBBCHDlyRIwaNUoIIcT8+fPF8uXLhRBCZGdnu2r5
MUmSxIYNG4QQQrz44osiJSVFCCFEdHS0KCwsFEII8dxzz4nVq1e3+OygQYNEQ0ODEEKI2tpa13vX
rVsnhBDi1KlTYsSIEeLs2bNtfi/I2Dhypw7p06cPCgoKcOjQIeTk5ODhhx8GAOzatQszZ86EJEkY
OHAg7rzzTuzbtw99+vRBeno6YmNjMX/+fAwdOhRAYxQxbdo0AMCwYcNw44034quvvmp2rN27d7v2
Hx0djZMnT6Kurg4A2hwF7969GzNmzAAATJo0CT4+Pq7XbrjhBkRFRbne9+N6P/vss3YjkqbR8YwZ
M1y5f25uLlJSUhAREYFf/OIXqKurQ319PXbu3IlZs2YBACZPntysliv16NED06dPBwDMmjXL9a+R
uXPn4q233oLT6URWVhZmzpzZ4rNhYWGYOXMm1q9fj549ewJojMvS0tIQERGB6OhoXLhwAUePHm3z
e0HGxsydPHbLLbfgxIkTqKqqgiRJzRquEMLVKIuKiuDn54fvvvvO7f569Gg5xmiribvT1mf69evn
9n2SJKFXr15wOp2u586dO9fmcZq+PiEEPv30U1x11VUdrqUtV/53u//++7Fs2TL8/Oc/R2RkZKs/
HLKzs7Fjxw5s3boVf/zjH1FcXAwA+Oc//4nhw4e3eH9HvxdkHBy5k8e++uorOJ1OXHfddZgwYQI2
b94Mp9OJqqoq7Ny5E1FRUThy5AheeuklFBQU4IMPPsC+ffsANDaxd955B0IIHD58GKWlpQgODm62
/wkTJmD9+vUAGrNwPz8/eHl5wcvLyzWC/7HbbrvNlZ9/9NFHOHXqVKvv+3G9O3bsQFRUFIYMGYKD
Bw+ioaEBNTU12LZtW7PPbd682fX/t956KwAgLi4Or776qus9Bw4cAADccccd2LBhAwDggw8+aLMW
p9OJd955BwCwYcMGTJgwAQBw9dVXY9KkSXjiiScwZ86cFp8TQuDo0aOw2+1IS0tDbW0tzpw5g0mT
JjWrp+kCeFvfCzI4bdIg0pumzN1ms4nw8HDxn//8x/XaM888I0JDQ4XVahVZWVlCCCEmTpwotm7d
KoQQYv/+/cJqtYrz58+L5ORk8fjjj4vIyEgxYsQIkZ2dLYQQQpZlER8fL4QQorq6WiQkJIiwsDAx
fvx4UVxcLIQQoqSkRISFhQmbzebKupscP35cxMTEiNDQUDFv3jxx/fXXi4aGBlFWViasVmuz97ZW
rxBCLFmyRAwfPlzExcWJBx54oFnm/uyzz4qwsDARFRUlDh8+LIQQ4sSJE2L69OkiLCxMjB49Wjzx
xBNCCCFOnjwp4uLiREhIiJg3b56wWCytZu7XXHONWLRokQgNDRUxMTHixIkTrtf27NkjgoKChNPp
bPG5ixcvittvv11YrVYRGhoq/vznPwshhDh37px47LHHhNVqFSEhIa7/nq19Ly5cuNDWt5oMQhKC
S/6SeubMmYP4+Hjcf//9XbZPi8WCN954A7GxsejZsyf27NmDJ598Evn5+W4/Fxoair/97W+44447
3L5v6NChuHTpEjIyMhATE9Nldbvz17/+FXV1dVi2bJkqxyPjYeZOuidJEqqqqjBu3Dg4nU5cddVV
SE9Pb/dzX3zxRYf33/RojSzLePjhh/Htt996VHdbpkyZgrKyshbREJEn2NxJVW+99ZYi+w0MDGx3
pN5ZpaWlqs4w2bJli2rHIuPiBVUyhIKCAoSHh8Pb2xszZszAhQsXAAD//ve/YbPZ4OPjg9tuu801
qwRojHP++9//AmicHZOUlARfX1+MHj0af/nLXzB48OB2j1FfX4+7774bx44dg5eXF/r374+Kigr1
vnCiNrC5k+6J72fgfPjhhygrK0NRUREyMjJQUFCARx99FOnp6aiursZjjz2G++67DxcvXgSAZlHL
smXLcPToUZSVleHjjz/GunXrmsUwbR2jX79+yMnJwaBBg1BXV4fTp08jICBAk/8ORFdicyfdkyQJ
Tz31FAICAuDj44P4+HgUFhYiPT0djz32GMaNGwdJkjB79mz87Gc/w969e1vs45133sFvf/tbDBgw
AIGBgViwYEGzueptHQPo3Jx8IqWxuZMhXDla7tu3L86cOYMjR45g5cqV8PHxcT3+97//4dixYy0+
f+zYsWYxTFBQkNtj9OnTB2fOnOnir4Ko67C5k2ENHjwYS5cuxalTp1yPM2fOuG75v9L111/fbLaL
JzNfuLojdUds7mQ4TTHJvHnz8MYbb2Dfvn0QQqC+vh7Z2dmtjrinTZuGFStWoKamBt999x1Wr17d
4abt7++PkydP4vTp0136dRD9FGzuZDhNF0rHjh2L9PR0pKSkwNfXF8OHD8fbb7/datP+/e9/j6Cg
IAwdOhRxcXF48MEHW10z5sfHAICRI0ciMTERN954I3x9fTlbhroF3qFK1IrXX38dWVlZyMvL07oU
ok5pd+S+YsUKhISEwGq1YubMmbhw4QKqq6sRGxuLESNGIC4uDjU1NWrUSqSYiooK7N69G06nE19/
/TVeeuklTJkyReuyiDrNbXN3OBxIT09Hfn4+iouLcfnyZWzatAlpaWmIjY1FSUkJYmJikJaWpla9
RIpoaGjA448/jv79+yMmJgYJCQn41a9+pXVZRJ3mdvmB/v37o3fv3jh79ix69uyJs2fPYtCgQVix
YgW2b98OAEhKSnItPUqkV0OGDGl29yqR3rkdufv6+uLpp5/GkCFDMGjQIHh7eyM2NhaVlZXw9/cH
0DhToLKyUpViiYioY9yO3A8fPoxVq1bB4XBgwIABePDBB7Fu3bpm73G3Wh7n/xIRKautOTFum/vn
n3+OW2+9Fddeey2Axl//tWfPHgQEBKCiogIBAQEoLy/HwIEDPT4wdV5qaipSU1O1LoOow3jOKsPd
ANptLDNy5Ejs3bsX586dgxACubm5GD16NOLj45GZmQkAyMzMREJCQtdWTEREP4nbkXt4eDhmz56N
yMhI9OjRA2PGjMEvf/lL1NXVYdq0aVi7di0sFovrd1eSOhwOh9YlEHmE56z62v1lHUuWLMGSJUua
Pefr64vc3FzFiiL3bDab1iUQeYTnrPoUvUNVkiRm7kRECnHXY7m2DBEpTpa1rsB82Nx1SObfFNKZ
jAxZ6xJMh82diMiA2r2gSt2P3W7XugSidsnyD3FMZqYdFkvjn+32xgcpi82diBTx4ybOe5jUxVhG
h5i5k944HLLWJZgOmzsRKY7T3NXHee5ERDrFee5EpCkmiepjc9chZu6kN5znrj42dyIiA+JUSB3i
PHfSA85z1xabOxEpgvPctcVYRoeYuZPecJ67+tjciUhxnOeuPs5zJyLSKc5zJyIyGTZ3HWLmTnrD
c1Z9bO5ERAbEzJ2ISKeYuRMRmQybuw4xvyS9WbVK1roE02FzJyLFFRZqXYH5sLnrENeWIb2xWOxa
l2A6XFuGiBRx5cJhy5b98DwXDlMHZ8vokCzLHL2TriQny8jIsGtdhuFwtgwRkclw5E5EipNlRjFK
cNdjmbkTUZeQJKlTn+MAUBmMZXSI89ypOxJCtPnIy8tr8zVSBps7ESkuI0PrCsyHmTsRKU6SALaC
rsfZMkREJtNuc//6668RERHhegwYMACvvPIKUlNTERQU5Ho+JydHjXoJzNxJj2StCzAdj2IZp9OJ
wMBA7Nu3D2+++Sa8vLywaNGitnfOWEYRvImJ9EaSZAhh17oMw+myWCY3NxfDhg3D4MGDeaVbQ2zs
pD92rQswHY+a+6ZNm5CYmAig8SfGa6+9hvDwcDz66KOoqalRpEAi0r8XXtC6AvPpcCzT0NCAwMBA
HDx4EH5+fjh+/Dj8/PwAAM8//zzKy8uxdu3a5juXJCQlJcFisQAAvL29YbPZXCPPpuyY255tNz3X
XerhNrfb2/7xuat1PXrdLiwsdA2kHQ4HMjMz20xQOtzc33vvPbz++uutXjh1OByIj49HcXFx850z
c1eEzMyddIbnrDK6JHPfuHGjK5IBgPLycteft2zZAqvV+hNKJE/wLwnpDc9Z9XVo5F5fX48bbrgB
ZWVl8PLyAgDMnj0bhYWFkCQJQ4cOxZo1a+Dv79985xy5ExEpxl2P5R2qOsR/4pLe8JxVBu9QJSJN
cW0Z9XHkTkSK49oyyuDInYjIZNjcdejKOcNE+iBrXYDpsLkTERkQM3ciUhwzd2UwcyciTXFtGfWx
uesQM3fSG7td1roE02FzJyIyIGbuREQ6xcydiMhk2Nx1iJk76Q3PWfWxuROR4ri2jPqYuROR4jjP
XRnM3ImITIbNXYeYX5L+yFoXYDps7kREBsTMnYgUx8xdGczciUhTXFtGfWzuOsTMnfSGa8uoj82d
iMiAmLkTEekUM3ciIpNhc9chZu6kNzxn1cfmTkSK49oy6mPmTkSK4zx3ZTBzJyIyGTZ3HWJ+Sfoj
a12A6bC5ExEZEDN3IlIcM3dlMHMnoi7j69vYrD15AJ5/xtdX269T79jcdWjKFFnrEsjETp1qHIV7
8sjLkz3+zKlTWn+l+sbmrkN79mhdARF1d8zcdchiARwOrasgs1IrP2dO3z5m7gaQktLY1C0W4MiR
H/6ckqJtXUTUPbkduX/99deYMWOGa7u0tBTLly/HrFmzMH36dBw5cgQWiwVZWVnw9vZuuXOO3BUR
ECCjosKudRlkUp0ZUcuyDLvdrvhxzKbTI/fg4GAUFBSgoKAA+/fvR9++fTFlyhSkpaUhNjYWJSUl
iImJQVpamiKFExFR53Q4lsnNzcWwYcMwePBgvP/++0hKSgIAJCUl4V//+pdiBVJLU6fatS6ByCOe
jtrpp+vV0Tdu2rQJiYmJAIDKykr4+/sDAPz9/VFZWdnm55KTk2GxWAAA3t7esNlsrm9002303G65
LTVNDm7D//1f68/n5eV1i/q5bdxtQJ3jATJkWfuvtzttFxYWoqamBgDgaGdWRYdmyzQ0NCAwMBAH
Dx6En58ffHx8cOqKSai+vr6orq5uuXNm7oqQJBlC2LUug0yKmXv38ZNny3zwwQcYO3Ys/Pz8ADSO
1isqKgAA5eXlGDhwYBeVSkREXaFDzX3jxo2uSAYA7rvvPmRmZgIAMjMzkZCQoEx11KoXXrBrXQKR
RzwdtdNP124sU19fjxtuuAFlZWXw8vICAFRXV2PatGk4evQop0ISmQxvYuo+3PVY3qGqQ53JL4m6
CjP37oN3qBIRmQxH7kTkmXam6XYp9g+33PXYDs9zJyICAAlCvcxd+cMYFmMZHUpOlrUugcgjP9wA
RWphc9eh72ehEhG1iZm7DnEWAWmJUyG7D86WISIyGTZ3XZK1LoDII8zc1cfmTkRkQGzuOsS1ZUhv
eEe1+nhBlYg8wguq3QcvqBoM80vSG56z6mNzJyIyIMYyROQRxjLdB2MZIiKTYXPXIa4tQ3rDzF19
jGV0iL8gm7TUuRV/ZQB2jz7h4wNUV3fmWObB38RkMMwiSW94ziqDmTsRkcmwueuSrHUBRB6StS7A
dNjciYgMiM1dh7i2DOmPXesCTIfNXYdSU7WugMgzL7ygdQXmw+auQ5wzTHpjt8tal2A6bO5ERAbE
ee5ERDrFee5ERCbD5q5DXFuG9IbXidTH5q5DmZlaV0DkmYwMrSswH2buOsR1OkhveM4qg5k7EZHJ
sLnrkqx1AUQekrUuwHTY3ImIDIiZu8Z8fYFTp5Q/Dn/xAWmJmbsyflLmXlNTg6lTp2LUqFEYPXo0
9u7di9TUVAQFBSEiIgIRERHIycnp8qLN4tSpxpNe6YcaP0CI2sK1ZdTXbnNfsGABJk+ejEOHDqGo
qAijRo2CJElYtGgRCgoKUFBQgLvuukuNWul7nDNMesO1ZdTXy92LtbW12LlzJzK/n1jdq1cvDBgw
AAAYtxARdWNum3tZWRn8/PwwZ84cHDhwAGPHjsUrr7wCAHjttdfw9ttvIzIyEitXroS3t3er+0hO
TobFYgEAeHt7w2azwW63A/hhBGr27aa1rrtLPdzmdldv2+32blWPXrcLCwtRU1MDAHA4HHDH7QXV
zz//HOPHj8cnn3yCcePGYeHChejfvz/mz5+P6667DgDw/PPPo7y8HGvXrm25c15QbZdaF5p4QYvI
eDp9QTUoKAhBQUEYN24cAGDq1KnIz8+Hn58fJEmCJEmYO3cu9u3b1/VVU5uafqIT6QXPWfW5be4B
AQEYPHgwSkpKAAC5ubkICQlBRUWF6z1btmyB1WpVtkoi0jWuLaO+due5HzhwAHPnzkVDQwNuuukm
vPnmm3jqqadQWFgISZIwdOhQrFmzBv7+/i13zlimXYxlyAx4/inDXY/lTUwaY3MnM+D5pwwuHGYw
zC9Jf2StCzAdNnciIgNiLKMxxjJkBjz/lOGux7q9iYmUJyABkhrH+eF/idTGtWXUx1hGYxI8XwVM
zsvz+DMSGztpiGvLqI/NnYjIgJi5a4yZOxF1FqdCEhGZDJu7DnGeO+kNz1n1sbkTkeK4toz6mLlr
jJk7mQHPP2UwcyciMhk2dx1ifkn6I2tdgOmwuRMRGRAzd40xcycz4PmnDGbuRKQpri2jPo7cNSZ1
atEwGYDdo0/4+ADV1Z05FlHHSJ07mdkjfgKO3LsxD9f/cv3T1tPPsLGT0oQQbT7y8vLafI2UwZG7
DjG/JCKAI3ciItNhc9clWesCiDzCezPUx+ZORGRAbO469MILdq1LIPKI3W7XugTT4QVVIiKd4gVV
g2F+SXrDc1Z9bO5ERAbEWIaISKcYyxARmQybuw4lJ8tal0DkEWbu6mNz16HMTK0rIKLujpm7DnFt
GSICmLkTEZkOm7suyVoXQOQRZu7qY3MnIjKgdpt7TU0Npk6dilGjRmH06NH49NNPUV1djdjYWIwY
MQJxcXGoqalRo1b6HteWIb3h2jLqa7e5L1iwAJMnT8ahQ4dQVFSEkSNHIi0tDbGxsSgpKUFMTAzS
0tLUqJW+l5qqdQVEnmEqoz63s2Vqa2sRERGB0tLSZs+PHDkS27dvh7+/PyoqKmC32/HVV1+13Dln
yyhClmWOhEhXkpNlZGTYtS7DcDo9W6asrAx+fn6YM2cOxowZg3nz5qG+vh6VlZXw9/cHAPj7+6Oy
srLrqyYiok7r5e7FS5cuIT8/H6tXr8a4ceOwcOHCFhGMJEluf+t5cnIyLBYLAMDb2xs2m8016my6
gs5tbnPbeNurVskoLAQsFjsyM+1omuWVnGyH3a59fXrcLiwsdF3jdDgccMdtLFNRUYHx48ejrKwM
ALBr1y6sWLECpaWlyMvLQ0BAAMrLyxEdHc1YhojalJrKa0VK6HQsExAQgMGDB6OkpAQAkJubi5CQ
EMTHxyPz+3vgMzMzkZCQ0MUlkztcW4b0xuGQtS7BdNpdfuDAgQOYO3cuGhoacNNNN+Gtt97C5cuX
MW3aNBw9ehQWiwVZWVnw9vZuuXOO3BUhSTKEsGtdBlGHrVolY+FCu9ZlGI67Hsu1ZXSIa8sQEcC1
ZYiITIfNXZdkrQsg8kjTzA9SD5s7EZEBsbnrENeWIb1pmqtN6uEFVSIineIFVYNhfkl6w3NWfWzu
REQGxFiGiEinGMsQEZkMm7sOcW0Z0htm7upjc9eh79dsIyJqEzN3HeLaMkQEuO+xbn9ZB2nH3S9A
aXy99ef5w5SIAMYy3ZYQos3Hyy/ntfkaUXfEzF19bO46VFiodQVE1N2xueuQxWLXugQij3BtGfUx
c9cJWW58AMCyZT88b7c3PoiIrsTZMjqUnCwjI8OudRlEHSbLMkfvCuAdqgZz8KDWFRBRd8eRuw7Z
bLyoSkQcuRuOt7fWFRBRd8fmrhOrVv1w8XT7dtn151WrtK2LqCM4z119nC2jEwsXNj6AxliGf1eI
yB1m7jpkt7O5ExEzd8NJSNC6AiLq7tjcdchmk7UugcgjzNzVx+auQ5wGSUTtYXPXoZoau9YlEHmE
d6eqj82diMiAOBVSJ5ovHCYDsAPgwmGkD1xbRn1s7jpxZRN3OIDUVO1qIaLuj7GMDnE9d9IbjtrV
x+auQ/x7QkTtYXPXJVnrAog8wnnu6utQc7dYLAgLC0NERASioqIAAKmpqQgKCkJERAQiIiKQk5Oj
aKH0g0JOdCed4Tmrvg5dUJUkCbIsw9fXt9lzixYtwqJFixQrjlpXU1OjdQlEHuE5q74OxzKtLU7D
RcGIiLqnDjV3SZIwceJEREZGIj093fX8a6+9hvDwcDz66KP8yawih8OhdQlEHuE5qwHRAceOHRNC
CHH8+HERHh4uduzYISorK4XT6RROp1MsXbpUPPLIIy0+B4APPvjggw8FH23xeD33ZcuW4ZprrsHT
Tz/tes7hcCA+Ph7FxcWe7IqIiBTSbixz9uxZ1NXVAQDq6+vx0UcfwWq1oqKiwvWeLVu2wGq1Klcl
ERF5pN3ZMpWVlZgyZQoA4NKlS3jooYcQFxeH2bNno7CwEJIkYejQoVizZo3ixRIRUcco+mv2qGv1
7NkTYWFhru333nsPQ4YM0bAiotb16NEDDz30EP7xj38AaBwYXn/99bjllluwdetWjaszBy4cpiN9
+/ZFQUGB1mUQtatfv3748ssvcf78eVx99dX4+OOPERQUBEmStC7NNLj8ABEpYvLkycjOzgYAbNy4
EYmJibw3RkVs7jpy7tw513IPDzzwgNblELk1ffp0bNq0CRcuXEBxcTFuvvlmrUsyFcYyOtKnTx/G
MqQbVqsVDocDGzduxD333KN1OabD5k5EirnvvvuwePFibN++HVVVVVqXYyps7kSkmEceeQQ+Pj4I
CQnhsr8qY+auI5xpQHrRdK4GBgYiJSXF9RzPYfVwnjsRkQFx5E5EZEBs7kREBsTmTkRkQGzuREQG
xOZORGRAbO5kaA6HA3369MGYMWO6bJ/R0dHw8vLC/v37u2yfRF2NzZ0Mb9iwYcjPz++y/eXl5SEy
MpJztqlbY3Mn06ivr8c999wDm80Gq9WKrKwsAMD+/ftht9sRGRmJu+66CxUVFaitrcXIkSNRUlIC
AEhMTMTf//53Lcsn8giXHyDTyMnJQWBgoGsZ2tOnT+PixYuYP38+tm7dimuvvRabN2/G0qVLsXbt
WqxevRrJycl46qmnUFtbi7lz52r8FRB1HJs7mUZYWBgWL16M3/zmN7j33ntx++2344svvsCXX36J
iRMnAgAuX76MQYMGAQAmTpyIrKwspKSkoKioSMvSiTzG5k6mMXz4cBQUFCA7Oxu/+93vEBMTgylT
piAkJASffPJJi/c7nU4cOnQI/fr1Q3V1tavpE+kBM3cyjfLyclx99dV46KGHsHjxYhQUFCA4OBhV
VVXYu3cvAODixYs4ePAgAODll19GSEgI1q9fjzlz5uDSpUtalk/kEY7cyTSKioqwZMkS9OjRA717
98Ybb7yB3r17491333Xl6pcuXcKvf/1r9OrVC2vXrsVnn32Gfv364Y477sAf/vAHpKamav1lEHUI
V4UkQ3M4HIiPj0dxcXGX7jc6OhorV67s0vnzRF2JsQwZWq9evVBbW9vlNzGVlZWhd+/eXbZPoq7G
kTsRkQFx5E5EZEBs7kREBsTmTkRkQGzuREQGxOZORGRA/w+s9D/FODtmvwAAAABJRU5ErkJggg==
"
>
</div>
</div>
</div>
</div>
</div>
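<p>Conceptually, the <code>by</code> argument splits the data at each level of the grouping variable and summarizes each group separately, which is what puts the two boxes side by side. The sketch below mimics that split in plain Python on a few hypothetical (sex, height) records standing in for rows of the data frame; the helper <code>split_by</code> is illustrative only.</p>

```python
# Sketch of the split-then-summarize idea behind a grouped box plot.
# The records are hypothetical, not taken from the Galton data.
from collections import defaultdict
from statistics import median

rows = [
    ("M", 69.0), ("F", 64.0), ("M", 71.5), ("F", 62.5),
    ("M", 68.0), ("F", 65.5), ("M", 73.0), ("F", 63.0),
]


def split_by(records):
    """Group the values by their category label, preserving input order."""
    groups = defaultdict(list)
    for level, value in records:
        groups[level].append(value)
    return dict(groups)


groups = split_by(rows)
for level in sorted(groups):
    # One summary per level, just as each box summarizes one group.
    print(level, "median height:", median(groups[level]))
```

<p>With pandas itself, a similar per-group summary of the real data is available through <code>gal.groupby("sex")["height"].median()</code>.</p>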
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Displays-of-Categorical-Variables">Displays of Categorical Variables<a class="anchor-link" href="#Displays-of-Categorical-Variables">¶</a></h3><p>For categorical variables, it makes no sense to compute descriptive statistics such as the mean, standard deviation, or variance. Instead, look at the number of cases at each level of the variable.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [21]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">gal</span><span class="o">.</span><span class="n">sex</span><span class="o">.</span><span class="n">value_counts</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[21]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>M 465
F 433
dtype: int64</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
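<p>Proportions can be found in a similar way, by dividing the counts by the total number of cases. Wrapping the total in <code>float()</code> ensures floating-point rather than integer division, which matters in Python 2:</p>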
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [22]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">gal</span><span class="o">.</span><span class="n">sex</span><span class="o">.</span><span class="n">value_counts</span><span class="p">()</span><span class="o">/</span><span class="nb">float</span><span class="p">(</span><span class="n">gal</span><span class="o">.</span><span class="n">sex</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[22]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>M 0.517817
F 0.482183
dtype: float64</pre>
</div>
</div>
</div>
</div>
</div>
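<p>Current versions of ‘pandas’ can also compute proportions directly: <code>value_counts()</code> accepts a <code>normalize=True</code> argument that divides each count by the total. Below is a minimal sketch. It uses a small made-up series in place of the Galton data, so the values are illustrative only.</p>

```python
import pandas as pd

# A small, made-up stand-in for the categorical 'sex' column
# (the Galton data used above is not reproduced here)
sex = pd.Series(["M", "F", "M", "M", "F"])

counts = sex.value_counts()                       # cases at each level
props = sex.value_counts(normalize=True)          # counts divided by the total
manual = sex.value_counts() / float(sex.count())  # the approach used above

print(counts)
print(props)
```

<p>Both approaches give the same numbers. <code>normalize=True</code> simply saves you from computing the denominator yourself.</p>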
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Reference">Reference<a class="anchor-link" href="#Reference">¶</a></h3><p>As with all ‘Statistical Modeling: A Fresh Approach for Python’ tutorials, this tutorial is based directly on material from <a href="http://www.mosaic-web.org/go/StatisticalModeling/">‘Statistical Modeling: A Fresh Approach (2nd Edition)’</a> by <a href="http://www.macalester.edu/~kaplan/">Daniel Kaplan</a>. This tutorial is based on Chapter 3: Describing Variation.</p>
<p>I have made an effort to keep the text and explanations consistent between the original (R-based) version and the Python tutorials, in order to keep things comparable. With that in mind, any errors, omissions, and/or differences between the two versions are mine, and any questions, comments, and/or concerns should be <a href="mailto:carson.farmer@gmail.com">directed to me</a>.</p>
</div>
</div>
</div></p>MSc Positions at University of Victoria2013-11-11T11:57:00-05:00cfarmertag:carsonfarmer.com,2013-11-11:2013/11/spar-msc-positions-2014/<p>Two MSc positions are available at the <a href="http://www.uvic.ca/">University of Victoria</a> in the
<a href="http://geography.uvic.ca/">Department of Geography</a>‘s
<a href="http://www.geog.uvic.ca/spar">Spatial Pattern Analysis and Research (<span class="caps">SPAR</span>) Lab</a>.
Students will be involved in developing their thesis topics; potential research
areas include web mapping (e.g., bike accidents), mining and mapping social
media (e.g., people’s hunting activities), spatial environmental modelling,
and spatial ecological research.</p>
<p>Preference will be given to students with experience in <span class="caps">GIS</span>, spatial analysis,
spatial statistics, programming, and/or statistics. Funding includes a graduate
student stipend as well as support through teaching assistantships, research
assistantships, and internal fellowships. Students can anticipate funding of
about $15K per year. Start date is <strong>September 2014</strong>. </p>
<p>If interested, please contact <a href="mailto:trisalyn@uvic.ca">Dr. Trisalyn Nelson</a> by <strong>January
15th, 2014</strong>. </p>Data: Cases, Variables, Samples2013-11-09T12:00:00-05:00cfarmertag:carsonfarmer.com,2013-11-09:2013/11/statistical-modeling-python-data/<p>The second in a <a href="http://www.carsonfarmer.com/category/statistical-modeling-for-python.html">series of tutorials</a> on using Python for introductory
statistical analysis, this tutorial covers <strong>data</strong>, including cases, variables,
samples, and a whole lot more. As always, the IPython Notebook associated with
this tutorial is <a href="https://github.com/cfarmer/stat-mod-fresh-approach-python">available here on github</a>.</p>
<p>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Data used in statistical modeling are usually organized into tables, often created using spreadsheet software. Most people presume that the same software used to create a table of data should be used to display and analyze it. This is part of the reason for the popularity of spreadsheet programs such as ‘Excel’ and ‘Google Spreadsheets’.</p>
<p>For serious statistical work, it’s helpful to take another approach that strictly separates the processes of data collection and of data analysis: use one program to create data files and another program to analyze the data stored in those files. By doing this, one guarantees that the original data are not modified accidentally in the process of analyzing them. This also makes it possible to perform many different analyses of the data; modelers often create and compare many different models of the same data.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Reading-Tabular-Data-into-Python">Reading Tabular Data into Python<a class="anchor-link" href="#Reading-Tabular-Data-into-Python">¶</a></h2><p>Data are central to statistics, and the tabular arrangement of data is very common. Accordingly, Python provides a large number of ways to read in tabular data. These vary depending on how the data are stored, where they are located, etc. To keep things as simple as possible, the ‘pandas’ Python library provides an operator, <code>read_csv()</code>, that allows you to access data files in tabular format on your computer as well as data stored in online repositories such as the one associated with the ‘Statistical Modeling: A Fresh Approach’ book, or one that a course instructor might set up for his or her students.</p>
<p>The ‘pandas’ library is <a href="http://pandas.pydata.org/">available here</a>, and you can follow these <a href="http://pandas.pydata.org/pandas-docs/stable/install.html">installation instructions</a> to get it working on your computer (installation via <code>pip</code> is the easiest method). Once you have ‘pandas’ installed, you need to <code>import pandas</code> in order to use <code>read_csv()</code>, as well as a variety of other ‘pandas’ operators that you will encounter later (it is also usually a good idea to <code>import numpy as np</code> at the same time that you <code>import pandas as pd</code>).</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note left shadow">
An alternative to writing <code>pd.xxx</code> when calling each ‘pandas’ operator is to import all available operators from ‘pandas’ at once: <code>from pandas import *</code>. This makes things a bit easier in terms of typing, but can sometimes lead to confusion when operators from different libraries have the same name.
</span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [1]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>You need to do this only once in each Python session, and on systems such as IPython, the library will sometimes be reloaded automatically. If you get an error message, it’s likely that the ‘pandas’ library has not been installed on your system; follow the installation instructions provided at the link above.</p>
<p>Reading in a data table with <code>read_csv()</code> is simply a matter of knowing the name and location of the data set. For instance, one data table used in examples in the ‘Statistical Modeling: A Fresh Approach’ book is <code>"swim100m.csv"</code>. To read in this data table and create an object in Python that contains the data, use a command like this:</p>
<p><span class="dataset shadow"><i class="icon-flag" style="font-size: 1.5em;"></i> <a href="http://www.mosaic-web.org/go/datasets/swim100m.csv"><code>swim100m.csv</code></a></span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [2]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"http://www.mosaic-web.org/go/datasets/swim100m.csv"</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The <code>.csv</code> part of the name <code>"swim100m.csv"</code> indicates that the file has been stored in a particular data format, comma-separated values (CSV), which is handled by spreadsheet software as well as many other kinds of software. The part of this command that requires creativity is choosing a name for the Python object that will hold the data. In the above command it is called <code>swim</code>, but you might prefer another name (e.g., <code>s</code> or <code>sdata</code> or even <code>ralph</code>). Of course, it’s sensible to choose names that are short, easy to type and remember, and remind you what the contents of the object are about.</p>
<p>To help you identify data tables that can be accessed through <code>read_csv()</code>, examples from these tutorials will be marked with a flag <i class="icon-flag"></i> containing the name of the data file. The files themselves are mostly available automatically through the web site for the ‘Statistical Modeling: A Fresh Approach’ book.</p>
</div>
</div>
</div>
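<p>The same <code>read_csv()</code> call works for a file on your own computer (pass a path such as <code>"swim100m.csv"</code> instead of a URL) and for any file-like object. The sketch below runs without a network connection. It feeds <code>read_csv()</code> a short CSV string holding the first few rows of the swim data, as displayed later in this tutorial. The name <code>swim_small</code> is just illustrative.</p>

```python
import io
import pandas as pd

# The first rows of swim100m.csv, as shown by swim.head() in this tutorial
csv_text = """year,time,sex
1905,65.8,M
1908,65.6,M
1910,62.8,M
"""

# read_csv() accepts a URL, a local file path, or any file-like object;
# io.StringIO wraps the string so that it behaves like an open file
swim_small = pd.read_csv(io.StringIO(csv_text))
print(swim_small)
```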
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Data-Frames">Data Frames<a class="anchor-link" href="#Data-Frames">¶</a></h3><p>The type of Python object created by <code>read_csv()</code> is called a data frame and is essentially a tabular layout. To illustrate, here are the first several cases of the <code>swim</code> data frame created by the previous use of <code>read_csv()</code>:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [3]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[3]:</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<div style="max-height:1000px;max-width:1500px;overflow:auto;">
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>year</th>
<th>time</th>
<th>sex</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td> 1905</td>
<td> 65.8</td>
<td> M</td>
</tr>
<tr>
<th>1</th>
<td> 1908</td>
<td> 65.6</td>
<td> M</td>
</tr>
<tr>
<th>2</th>
<td> 1910</td>
<td> 62.8</td>
<td> M</td>
</tr>
<tr>
<th>3</th>
<td> 1912</td>
<td> 61.6</td>
<td> M</td>
</tr>
<tr>
<th>4</th>
<td> 1918</td>
<td> 61.4</td>
<td> M</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
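<p><code>head()</code> shows five cases by default. It also takes an optional number of cases, and its companion <code>tail()</code> shows the cases at the end of the table. The small sketch below uses a data frame typed in by hand from the rows displayed above rather than the full downloaded file.</p>

```python
import pandas as pd

# A hypothetical miniature data frame, typed in from the rows displayed above
swim_small = pd.DataFrame({
    "year": [1905, 1908, 1910, 1912, 1918],
    "time": [65.8, 65.6, 62.8, 61.6, 61.4],
    "sex": ["M", "M", "M", "M", "M"],
})

print(swim_small.head(2))  # first two cases (head() shows five by default)
print(swim_small.tail(2))  # last two cases
```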
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Note that <code>head()</code>, one of many functions built into ‘pandas’ data frames, is a method of the Python object (the data frame) itself, not a function from the main ‘pandas’ library.</p>
<p>Data frames, like tabular data generally, involve variables and cases. In ‘pandas’ data frames, each of the variables is given a name, and you can refer to a variable by name in a couple of different ways. To see the variable names in a data frame, something you might want to do to remind yourself of how names are spelled and capitalized, use the <code>columns</code> attribute of the data frame object:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [4]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="o">.</span><span class="n">columns</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[4]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>Index([u'year', u'time', u'sex'], dtype=object)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Note that we have <strong>not</strong> used parentheses <code>()</code> in the above command. This is because <code>columns</code> is not a function; it is an <em>attribute</em> of the data frame. Attributes add ‘extra’ information (or metadata) to objects in the form of additional Python objects. In this case, the attribute holds the names of the columns. Another way to get quick information about the variables in a data frame is with <code>describe()</code>:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [5]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="o">.</span><span class="n">describe</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[5]:</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<div style="max-height:1000px;max-width:1500px;overflow:auto;">
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>year</th>
<th>time</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td> 62.000000</td>
<td> 62.000000</td>
</tr>
<tr>
<th>mean</th>
<td> 1952.145161</td>
<td> 59.924194</td>
</tr>
<tr>
<th>std</th>
<td> 29.472881</td>
<td> 9.916588</td>
</tr>
<tr>
<th>min</th>
<td> 1905.000000</td>
<td> 47.840000</td>
</tr>
<tr>
<th>25%</th>
<td> 1924.500000</td>
<td> 53.642500</td>
</tr>
<tr>
<th>50%</th>
<td> 1956.500000</td>
<td> 56.880000</td>
</tr>
<tr>
<th>75%</th>
<td> 1975.750000</td>
<td> 65.200000</td>
</tr>
<tr>
<th>max</th>
<td> 2004.000000</td>
<td> 95.000000</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>This provides a numerical summary of each of the numeric variables in the data frame; the categorical variable <code>sex</code> is left out. To keep things simple, the output from <code>describe()</code> is itself a data frame.</p>
<p>There are lots of different functions and attributes available for data frames (as there are for most Python objects). For instance, to see how many cases and variables there are in a data frame, you can use the <code>shape</code> attribute:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [6]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="o">.</span><span class="n">shape</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[6]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>(62, 3)</pre>
</div>
</div>
</div>
</div>
</div>
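<p>Because <code>shape</code> is an ordinary Python tuple of (cases, variables), it can be unpacked into two names. Two related tools are also worth knowing. The <code>dtypes</code> attribute lists each variable’s data type, and <code>describe(include="all")</code> extends the summary to non-numeric variables such as <code>sex</code>. The sketch below uses a small hand-built frame standing in for the downloaded data.</p>

```python
import pandas as pd

# A hypothetical miniature version of the swim data
swim_small = pd.DataFrame({
    "year": [1905, 1908, 1910],
    "time": [65.8, 65.6, 62.8],
    "sex": ["M", "M", "M"],
})

n_cases, n_vars = swim_small.shape  # shape is a (rows, columns) tuple
print(n_cases, "cases and", n_vars, "variables")

print(swim_small.dtypes)                      # data type of each variable
summary = swim_small.describe(include="all")  # also summarizes 'sex'
print(summary)
```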
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Variables-in-Data-Frames">Variables in Data Frames<a class="anchor-link" href="#Variables-in-Data-Frames">¶</a></h3><p>Perhaps the most common operation on a data frame is to refer to the values in a single variable. The two ways you will most commonly use involve referring to a variable by string-quoted name (<code>swim["year"]</code>) and as an attribute of a data frame without quotes (<code>swim.year</code>).</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note right shadow">
Each column or variable in a ‘pandas’ data frame is called a ‘series’, and each series can hold one of many different data types. For more information on series, data frames, and other objects in ‘pandas’, see the introduction to data structures in the ‘pandas’ documentation.
</span></p>
</div>
</div>
</div>
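<p>The two ways of referring to a variable return the same series, but the bracket form is the more general one. The sketch below uses a hypothetical frame with a column name containing a space, which cannot be written as an attribute.</p>

```python
import pandas as pd

# Hypothetical frame; "start year" contains a space, so it is not a valid attribute name
df = pd.DataFrame({"year": [1905, 1908], "start year": [1900, 1900]})

by_name = df["year"]  # string-quoted name
by_attr = df.year     # attribute-style access
print(by_name.equals(by_attr))  # both return the same series

# Names containing spaces, or names that clash with existing methods
# (a column called "count", say), only work with the bracket form
print(df["start year"])
```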
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Most of the statistical modeling functions you will encounter in these tutorials are designed to work with data frames and allow you to refer directly to variables within a data frame. For instance:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [7]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="o">.</span><span class="n">year</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[7]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>1952.1451612903227</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [8]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="p">[</span><span class="s">"year"</span><span class="p">]</span><span class="o">.</span><span class="n">min</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[8]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>1905</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>It is also possible to combine ‘numpy’ operators with ‘pandas’ variables:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [9]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">min</span><span class="p">(</span><span class="n">swim</span><span class="p">[</span><span class="s">"year"</span><span class="p">])</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[9]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>1905</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [10]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">min</span><span class="p">(</span><span class="n">swim</span><span class="o">.</span><span class="n">year</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[10]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>1905</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The <code>swim</code> portion of the above commands tells Python which data frame we want to operate on. Leaving it off leads to an error:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [11]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">year</span><span class="o">.</span><span class="n">min</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_subarea output_text output_error">
<pre>
<span class="ansired">---------------------------------------------------------------------------</span>
<span class="ansired">NameError</span> Traceback (most recent call last)
<span class="ansigreen"><ipython-input-11-2ef03df1cde8></span> in <span class="ansicyan"><module></span><span class="ansiblue">()</span>
<span class="ansigreen">----> 1</span><span class="ansiyellow"> </span>year<span class="ansiyellow">.</span>min<span class="ansiyellow">(</span><span class="ansiyellow">)</span><span class="ansiyellow"></span>
<span class="ansired">NameError</span>: name 'year' is not defined</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Of course, you know that the variable <code>year</code> is defined <em>within</em> the data frame <code>swim</code>, but you have to tell Python explicitly which data frame you want to operate on; otherwise it doesn’t know where to find the variable(s). Think of this notation as referring to the variable by both its family name (the data frame’s name, <code>swim</code>) and its given name (<code>year</code>), something like <code>einstein.albert</code>.</p>
<p>The advantage of referring to variables by name becomes evident when you construct statements that involve more than one variable within a data frame. For instance, here’s a calculation of the mean year, separately for (grouping by) the different sexes:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [12]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="s">'sex'</span><span class="p">)[</span><span class="s">'year'</span><span class="p">]</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[12]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>sex
F 1950.677419
M 1953.612903
Name: year, dtype: float64</pre>
</div>
</div>
</div>
</div>
</div>
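<p><code>groupby()</code> can also compute several summaries in one step if you pass a list of function names to <code>agg()</code>. The sketch below uses a tiny made-up data frame. The years are illustrative and do not come from <code>swim100m.csv</code>.</p>

```python
import pandas as pd

# Hypothetical miniature data; the years are illustrative, not from swim100m.csv
df = pd.DataFrame({
    "year": [1905, 1912, 1915, 1920],
    "sex": ["M", "M", "F", "F"],
})

# Several summaries of 'year' for each level of 'sex' in one step
summary = df.groupby("sex")["year"].agg(["mean", "min", "max", "count"])
print(summary)
```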
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>You will see much more of the <code>groupby</code> function, starting in Tutorial 4 (Group-wise Models). It’s the ‘pandas’ way of grouping or aggregating data frames. In subsequent tutorials, we will build on this notion to develop more complex ways of “grouping” and “modeling” variables “by” other variables.</p>
<p>Both <code>mean()</code> and <code>min()</code> are available as methods of ‘pandas’ series (the variables in a data frame), but not every mathematical function is. For instance, there is no <code>sqrt()</code> method:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [13]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="o">.</span><span class="n">year</span><span class="o">.</span><span class="n">sqrt</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_subarea output_text output_error">
<pre>
<span class="ansired">---------------------------------------------------------------------------</span>
<span class="ansired">AttributeError</span> Traceback (most recent call last)
<span class="ansigreen"><ipython-input-13-e6382fdf6716></span> in <span class="ansicyan"><module></span><span class="ansiblue">()</span>
<span class="ansigreen">----> 1</span><span class="ansiyellow"> </span>swim<span class="ansiyellow">.</span>year<span class="ansiyellow">.</span>sqrt<span class="ansiyellow">(</span><span class="ansiyellow">)</span><span class="ansiyellow"></span>
<span class="ansired">AttributeError</span>: 'Series' object has no attribute 'sqrt'</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>When you encounter a function that isn’t available as a series method, you can pass a ‘numpy’ function to the special <code>apply()</code> method built into series (writing the argument name <code>func=</code> is optional; <code>swim.year.apply(np.sqrt)</code> works too):</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [14]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="o">.</span><span class="n">year</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">func</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span> <span class="c"># There are 62 cases in total</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[14]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>0 43.646306
1 43.680659
2 43.703547
3 43.726422
4 43.794977
Name: year, dtype: float64</pre>
</div>
</div>
</div>
</div>
</div>
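<p><code>apply()</code> is not limited to ‘numpy’ functions. It accepts any function of one argument, including a lambda expression written on the spot. The minimal sketch below works on a short, hand-typed series of years. Subtracting 1900 is only an illustration.</p>

```python
import pandas as pd

years = pd.Series([1905, 1908, 1910], name="year")

# apply() calls the function once for each value; any one-argument function
# works, including a lambda (subtracting 1900 is just an illustration)
since_1900 = years.apply(lambda y: y - 1900)
print(since_1900.tolist())

# For simple arithmetic, the vectorized expression gives the same values
print((years - 1900).equals(since_1900))
```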
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Alternatively, since each column is built on a ‘numpy’ array, ‘numpy’ functions such as <code>np.sqrt()</code> can be applied directly to a column; the result is again a series:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [15]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">swim</span><span class="o">.</span><span class="n">year</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span> <span class="c"># Again, there are 62 cases in total</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[15]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>0 43.646306
1 43.680659
2 43.703547
3 43.726422
4 43.794977
Name: year, dtype: float64</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Adding-a-New-Variable">Adding a New Variable<a class="anchor-link" href="#Adding-a-New-Variable">¶</a></h3><p>Sometimes you will compute a new quantity from the existing variables and want to treat this as a new variable. Adding a new variable to a data frame can be done similarly to <em>accessing</em> a variable. For instance, here is how to create a new variable in <code>swim</code> that holds the <code>time</code> converted from seconds to units of minutes:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [16]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="p">[</span><span class="s">'minutes'</span><span class="p">]</span> <span class="o">=</span> <span class="n">swim</span><span class="o">.</span><span class="n">time</span><span class="o">/</span><span class="mf">60.</span> <span class="c"># or swim['time']/60.</span>
</pre></div>
</div>
</div>
</div>
</div>
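<p>Assigning with square brackets modifies <code>swim</code> in place. If you would rather keep the original data frame untouched, ‘pandas’ data frames also provide an <code>assign()</code> method that returns a modified copy. The sketch below uses a two-row frame typed in from the rows shown earlier.</p>

```python
import pandas as pd

swim_small = pd.DataFrame({"year": [1905, 1908], "time": [65.8, 65.6], "sex": ["M", "M"]})

# assign() returns a new data frame with the extra column appended;
# swim_small itself is left unchanged
with_minutes = swim_small.assign(minutes=swim_small["time"] / 60.0)

print(list(swim_small.columns))    # ['year', 'time', 'sex']
print(list(with_minutes.columns))  # ['year', 'time', 'sex', 'minutes']
```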
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>By default, new columns are added at the end. To place a column at a particular position, use the <code>insert()</code> method, which takes the position, the new column’s name, and its values:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [17]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">'mins'</span><span class="p">,</span> <span class="n">swim</span><span class="o">.</span><span class="n">time</span><span class="o">/</span><span class="mf">60.</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
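<p>As a minimal sketch of the difference between the two approaches, the code below builds a small, hypothetical stand-in for <code>swim</code> (a few illustrative rows, not the real data set) and checks where each new column ends up. It assumes only <code>pandas</code>; note that <code>insert</code> returns <code>None</code> and changes the data frame in place.</p>

```python
import pandas as pd

# Hypothetical stand-in for the swim data: year, record time in seconds, sex
swim = pd.DataFrame({"year": [1905, 1908, 1910],
                     "time": [65.8, 65.6, 62.8],
                     "sex": ["M", "M", "M"]})

# Assignment with [] appends the new column after the existing ones
swim["minutes"] = swim["time"] / 60.0

# insert(position, name, values) puts the column at position 1 (0 is the first column);
# it modifies swim in place and returns None
result = swim.insert(1, "mins", swim["time"] / 60.0)

print(list(swim.columns))  # ['year', 'mins', 'time', 'sex', 'minutes']
```

<p>Calling <code>insert</code> again with a column name that already exists raises a <code>ValueError</code> by default, which guards against accidentally creating duplicate columns.</p>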
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>You can also redefine an existing variable by assigning new values to it. For instance, this converts <code>time</code> itself from seconds to minutes:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [18]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="p">[</span><span class="s">'time'</span><span class="p">]</span> <span class="o">=</span> <span class="n">swim</span><span class="o">.</span><span class="n">time</span><span class="o">/</span><span class="mf">60.</span>
</pre></div>
</div>
</div>
</div>
</div>
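<p>Overwriting a column changes the data frame you are working with. If you would rather keep the original column intact, pandas also provides <code>DataFrame.assign</code>, which returns a <em>new</em> data frame. The sketch below, using hypothetical values, contrasts the two:</p>

```python
import pandas as pd

swim = pd.DataFrame({"year": [1905, 1908], "time": [65.8, 65.6]})  # hypothetical rows, seconds

# assign() returns a new data frame with the converted column; swim is not modified
converted = swim.assign(time=swim["time"] / 60.0)
seconds_still = swim["time"].tolist()   # still [65.8, 65.6]

# Plain assignment overwrites the column inside swim itself
swim["time"] = swim["time"] / 60.0
```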
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>As always, we can take a quick look at the results of our operations by using the <code>head()</code> method of our data frame:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [19]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">swim</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[19]:</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<div style="max-height:1000px;max-width:1500px;overflow:auto;">
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>year</th>
<th>mins</th>
<th>time</th>
<th>sex</th>
<th>minutes</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td> 1905</td>
<td> 1.096667</td>
<td> 1.096667</td>
<td> M</td>
<td> 1.096667</td>
</tr>
<tr>
<th>1</th>
<td> 1908</td>
<td> 1.093333</td>
<td> 1.093333</td>
<td> M</td>
<td> 1.093333</td>
</tr>
<tr>
<th>2</th>
<td> 1910</td>
<td> 1.046667</td>
<td> 1.046667</td>
<td> M</td>
<td> 1.046667</td>
</tr>
<tr>
<th>3</th>
<td> 1912</td>
<td> 1.026667</td>
<td> 1.026667</td>
<td> M</td>
<td> 1.026667</td>
</tr>
<tr>
<th>4</th>
<td> 1918</td>
<td> 1.023333</td>
<td> 1.023333</td>
<td> M</td>
<td> 1.023333</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Such assignment operations do not change the original file from which the data were read, only the data frame in the current session of Python. This is an advantage, since it means that your data in the data file stay in their original state and therefore won’t be corrupted by operations made during analysis.</p>
</div>
</div>
</div>
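<p>A quick way to convince yourself of this is to write a small CSV file, modify the data frame read from it, and read the file again. The sketch below uses a temporary file with hypothetical contents; nothing is written back unless you explicitly ask for it (for example with <code>to_csv</code>).</p>

```python
import os
import tempfile

import pandas as pd

# Write a tiny, hypothetical CSV file (times in seconds) to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "swim_demo.csv")
with open(path, "w") as f:
    f.write("year,time,sex\n1905,65.8,M\n1908,65.6,M\n")

swim = pd.read_csv(path)
swim["time"] = swim["time"] / 60.0   # changes only the in-memory data frame

reloaded = pd.read_csv(path)         # the file on disk still holds the seconds
print(swim["time"].round(4).tolist(), reloaded["time"].tolist())
```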
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Sampling-from-a-Sample-Frame">Sampling from a Sample Frame<a class="anchor-link" href="#Sampling-from-a-Sample-Frame">¶</a></h2><p>Much of statistical analysis is concerned with the consequences of drawing a sample from the population. Ideally, you will have a sampling frame that lists every member of the population from which the sample is to be drawn. With this in hand, you could treat the individual cases in the sampling frame as if they were cards in a deck. To pick your random sample, shuffle the deck and deal out the desired number of cards.</p>
<p>When doing real work in the field, you would use the randomly dealt cards to locate the real-world cases they correspond to. Sometimes in these tutorials, however, in order to let you explore the consequences of sampling, you will select a sample from an existing data set. For example, the <code>"kidsfeet.csv"</code> data set has <code>n=39</code> cases.</p>
<p><span class="dataset shadow"><i class="icon-flag" style="font-size: 1.5em;"></i> <a href="http://www.mosaic-web.org/go/datasets/kidsfeet.csv"><code>kidsfeet.csv</code></a></span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [20]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">kids</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"http://www.mosaic-web.org/go/datasets/kidsfeet.csv"</span><span class="p">)</span>
<span class="n">kids</span><span class="o">.</span><span class="n">shape</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[20]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>(39, 8)</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>There are several ways to draw a random sample of 5 cases from this data frame. The preferred option, however, is to randomly select a subset of case ids (here, 5 of them) using <code>np.random.choice</code>, and then return the corresponding rows of the data frame using the <code>ix[]</code> indexer.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note left shadow">
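The <code>ix[]</code> indexer is a bit tricky to figure out at first; for more information, see the official pandas documentation on indexing and selecting data. Note that <code>ix[]</code> was deprecated in later versions of pandas and removed in pandas 1.0; with a default integer index like the one here, <code>loc[]</code> selects the same rows.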
</span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [21]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">rows</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">kids</span><span class="o">.</span><span class="n">index</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="k">False</span><span class="p">)</span>
<span class="n">kids</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="n">rows</span><span class="p">]</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[21]:</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<div style="max-height:1000px;max-width:1500px;overflow:auto;">
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>birthmonth</th>
<th>birthyear</th>
<th>length</th>
<th>width</th>
<th>sex</th>
<th>biggerfoot</th>
<th>domhand</th>
</tr>
</thead>
<tbody>
<tr>
<th>23</th>
<td> Erica</td>
<td> 9</td>
<td> 88</td>
<td> 24.5</td>
<td> 9.0</td>
<td> G</td>
<td> L</td>
<td> R</td>
</tr>
<tr>
<th>16</th>
<td> Caroline</td>
<td> 12</td>
<td> 87</td>
<td> 24.0</td>
<td> 8.7</td>
<td> G</td>
<td> R</td>
<td> L</td>
</tr>
<tr>
<th>4 </th>
<td> Lang</td>
<td> 2</td>
<td> 88</td>
<td> 25.1</td>
<td> 8.9</td>
<td> B</td>
<td> L</td>
<td> R</td>
</tr>
<tr>
<th>32</th>
<td> Leigh</td>
<td> 3</td>
<td> 88</td>
<td> 24.5</td>
<td> 8.6</td>
<td> G</td>
<td> L</td>
<td> R</td>
</tr>
<tr>
<th>7 </th>
<td> Caitlin</td>
<td> 6</td>
<td> 88</td>
<td> 23.0</td>
<td> 8.8</td>
<td> G</td>
<td> L</td>
<td> R</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
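<p>Because <code>ix[]</code> no longer exists in current pandas, here is a hedged sketch of the same idea for newer versions, using a hypothetical miniature data frame rather than the full <code>kidsfeet.csv</code> file. It selects the sampled ids with <code>loc[]</code>, and also shows <code>DataFrame.sample</code>, a built-in pandas method that draws rows directly.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical miniature version of the kids data frame
kids = pd.DataFrame({"name": ["Josh", "Lang", "Caitlin", "Ray", "Caroline", "Heather"],
                     "length": [25.2, 25.1, 23.0, 24.8, 24.0, 25.5]})

np.random.seed(1237)                                   # make the draw reproducible
rows = np.random.choice(kids.index, 3, replace=False)  # 3 distinct index labels
by_label = kids.loc[rows]                              # .loc selects rows by index label

# Built-in alternative: draw 3 rows without replacement directly
by_method = kids.sample(n=3, replace=False, random_state=1237)
```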
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note right shadow">
To make things a bit more concise, you can use <code>from numpy.random import choice</code>, which lets you simply type <code>choice()</code> without including the library <em>and</em> module names each time.
</span></p>
</div>
</div>
</div>
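<p>A minimal sketch of that shortcut: the <code>from ... import</code> form brings the <code>choice</code> function itself into scope. When its first argument is an integer <em>n</em>, <code>choice</code> samples from the ids 0 to <em>n</em>-1, so the call below draws 5 distinct ids out of 39 cases without needing the data frame at all.</p>

```python
import numpy as np
from numpy.random import choice  # the function itself, so no np.random. prefix is needed

np.random.seed(1237)                 # choice uses the same global generator as np.random
ids = choice(39, 5, replace=False)   # 5 distinct ids from 0, 1, ..., 38
```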
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>This can also be done in a single line:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [22]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">kids</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">kids</span><span class="o">.</span><span class="n">index</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="k">False</span><span class="p">)]</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[22]:</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<div style="max-height:1000px;max-width:1500px;overflow:auto;">
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>birthmonth</th>
<th>birthyear</th>
<th>length</th>
<th>width</th>
<th>sex</th>
<th>biggerfoot</th>
<th>domhand</th>
</tr>
</thead>
<tbody>
<tr>
<th>19</th>
<td> Heather</td>
<td> 3</td>
<td> 88</td>
<td> 25.5</td>
<td> 9.5</td>
<td> G</td>
<td> R</td>
<td> R</td>
</tr>
<tr>
<th>4 </th>
<td> Lang</td>
<td> 2</td>
<td> 88</td>
<td> 25.1</td>
<td> 8.9</td>
<td> B</td>
<td> L</td>
<td> R</td>
</tr>
<tr>
<th>3 </th>
<td> Josh</td>
<td> 1</td>
<td> 88</td>
<td> 25.2</td>
<td> 9.8</td>
<td> B</td>
<td> L</td>
<td> R</td>
</tr>
<tr>
<th>31</th>
<td> Caitlin</td>
<td> 7</td>
<td> 88</td>
<td> 22.5</td>
<td> 8.6</td>
<td> G</td>
<td> R</td>
<td> R</td>
</tr>
<tr>
<th>7 </th>
<td> Caitlin</td>
<td> 6</td>
<td> 88</td>
<td> 23.0</td>
<td> 8.8</td>
<td> G</td>
<td> L</td>
<td> R</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The results returned by the above methods will never contain the same case more than once (because we told the function <em>not</em> to sample with replacement), just as if you were dealing cards from a shuffled deck. In contrast, ‘re-sampling with replacement’ replaces each case after it is dealt so that it can appear more than once in the result. You wouldn’t want to do this to select from a sampling frame, but it turns out that there are valuable statistical uses for this sort of sampling with <strong>replacement</strong>. You’ll make use of re-sampling in Tutorial 5 (Confidence Intervals).</p>
</div>
</div>
</div>
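<p>The difference between the two modes can be checked directly. In this sketch the “deck” is just five hypothetical case ids; dealing ten cards with replacement must repeat at least one id, while asking for more cards than the deck holds without replacement raises an error.</p>

```python
import numpy as np

deck = np.arange(5)  # five hypothetical case ids: 0, 1, 2, 3, 4

dealt = np.random.choice(deck, 5, replace=False)      # a shuffled deck: every id exactly once
resampled = np.random.choice(deck, 10, replace=True)  # ids go back in the deck, so repeats occur

try:
    np.random.choice(deck, 6, replace=False)          # only 5 cards to deal
except ValueError as err:
    print("ValueError:", err)
```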
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [23]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1237</span><span class="p">)</span> <span class="c"># Set seed so results are reproducible</span>
<span class="n">kids</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">kids</span><span class="o">.</span><span class="n">index</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="k">True</span><span class="p">)]</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[23]:</div>
<div class="output_html rendered_html output_subarea output_execute_result">
<div style="max-height:1000px;max-width:1500px;overflow:auto;">
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>birthmonth</th>
<th>birthyear</th>
<th>length</th>
<th>width</th>
<th>sex</th>
<th>biggerfoot</th>
<th>domhand</th>
</tr>
</thead>
<tbody>
<tr>
<th>11</th>
<td> Ray</td>
<td> 3</td>
<td> 88</td>
<td> 24.8</td>
<td> 8.9</td>
<td> B</td>
<td> L</td>
<td> R</td>
</tr>
<tr>
<th>25</th>
<td> Glen</td>
<td> 7</td>
<td> 88</td>
<td> 27.1</td>
<td> 9.4</td>
<td> B</td>
<td> L</td>
<td> R</td>
</tr>
<tr>
<th>36</th>
<td> Teshanna</td>
<td> 3</td>
<td> 88</td>
<td> 26.0</td>
<td> 9.0</td>
<td> G</td>
<td> L</td>
<td> R</td>
</tr>
<tr>
<th>7 </th>
<td> Caitlin</td>
<td> 6</td>
<td> 88</td>
<td> 23.0</td>
<td> 8.8</td>
<td> G</td>
<td> L</td>
<td> R</td>
</tr>
<tr>
<th>25</th>
<td> Glen</td>
<td> 7</td>
<td> 88</td>
<td> 27.1</td>
<td> 9.4</td>
<td> B</td>
<td> L</td>
<td> R</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Notice that ‘Glen’ was sampled twice.</p>
</div>
</div>
</div>
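<p>With larger samples, repeats are easier to find programmatically than by eye. A hedged sketch: the rows below are hypothetical, modeled on the resample shown above, and <code>Index.duplicated()</code> flags the second and later copies of each case id.</p>

```python
import pandas as pd

# Hypothetical frame indexed by case id, and a resample in which id 25 was drawn twice
kids = pd.DataFrame({"name": ["Ray", "Glen", "Teshanna", "Caitlin"]},
                    index=[11, 25, 36, 7])
resample = kids.loc[[11, 25, 36, 7, 25]]

repeats = resample.index.duplicated()      # array([False, False, False, False, True])
print(resample[repeats]["name"].tolist())  # ['Glen']
print(resample.index.nunique())            # 4 distinct cases among 5 draws
```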
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Reference">Reference<a class="anchor-link" href="#Reference">¶</a></h3><p>As with all ‘Statistical Modeling: A Fresh Approach for Python’ tutorials, this tutorial is based directly on material from <a href="http://www.mosaic-web.org/go/StatisticalModeling/">‘Statistical Modeling: A Fresh Approach (2nd Edition)’</a> by <a href="http://www.macalester.edu/~kaplan/">Daniel Kaplan</a>. This tutorial is based on Chapter 2: Data: Cases, Variables, Samples.</p>
<p>I have made an effort to keep the text and explanations consistent between the original (R-based) version and the Python tutorials, in order to keep things comparable. With that in mind, any errors, omissions, and/or differences between the two versions are mine, and any questions, comments, and/or concerns should be <a href="mailto:carson.farmer@gmail.com">directed to me</a>.</p>
</div>
</div>
</div></p>A Fresh Approach using Python: Introduction2013-11-01T12:00:00-04:00cfarmertag:carsonfarmer.com,2013-11-01:2013/11/statistical-modeling-python-introduction/<p>Welcome to the first in a series of tutorials on using Python for introductory
statistical analysis. As I put more of these tutorials online, you should be
able to access them easily by <a href="http://www.carsonfarmer.com/category/statistical-modeling-for-python.html">clicking</a> or searching for the relevant
category: “Statistical Modeling for Python”.</p>
<p>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>This series of tutorials is based on the ‘Computational Technique’ sections of each chapter from <a href="http://www.mosaic-web.org/go/StatisticalModeling/">‘Statistical Modeling: A Fresh Approach (2nd Edition)’</a>. The goal of the series is to show how all of the R analyses and commands used in the book can be done just as easily using the <a href="http://python.org/">Python</a> programming language. This serves two purposes: introducing ‘Scientific Python’ to students learning statistics, and showcasing the advances in statistical computing that have come to Python in recent years. Each tutorial in the series will cover the Computational Technique section of a different chapter from the book, starting with Section 1.4.3 from the introduction (which technically <em>isn’t</em> a Computational Technique section, but is a useful introduction nonetheless), which is <a href="http://www.mosaic-web.org/go/StatisticalModeling/Chapters/">available online here</a>. Note that many of these tutorials will only be useful if you have read the corresponding chapter(s) from the book.</p>
<p>All the tutorials assume that Python is installed and running on your computer. To use the notebooks <a href="https://github.com/cfarmer/stat-mod-fresh-approach-python">associated with these tutorials</a>, you’ll also need to have <a href="http://ipython.org/">IPython</a> (with notebook) installed. There are plenty of resources online with information on <a href="http://ipython.org/">IPython</a> and <a href="http://ipython.org/notebook.html">Notebooks</a>, including <a href="http://ipython.org/install.html">installation instructions</a> for both. You’ll also need to have the <a href="http://www.scipy.org/">Scientific Python</a> libraries installed for additional statistical functionality (we’ll also introduce some <a href="http://statsmodels.sourceforge.net/">other statistical libraries</a> in later tutorials), and <a href="http://matplotlib.org/">matplotlib</a> for plotting and visualization.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="The-IPython-Command-Console">The IPython Command Console<a class="anchor-link" href="#The-IPython-Command-Console">¶</a></h2><p>Once you have IPython installed, you are ready to perform all sorts of statistical, mathematical, and scientific operations. However, to use the powerful range of tools, functions, commands, and statistical methods available in Python, you first need to learn a little bit about the syntax and meaning of Python commands. Once you have learned this, operations become simple to perform. Before starting this tutorial, read section 1.4 from ‘Statistical Modeling: A Fresh Approach’, which outlines some general concepts surrounding computational statistics (in the context of R). In particular, it explains the ‘language-based approach’ to statistical computing, which is an important concept throughout the book and this series of tutorials.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Invoking-an-Operation">Invoking an Operation<a class="anchor-link" href="#Invoking-an-Operation">¶</a></h2><p>People often think of computers as <em>doing things</em>: sending email, playing music, storing files. Your job in using a computer is to tell the computer <em>what</em> to do. There are many different words used to refer to the “what”: a procedure, a task, a function, a routine, and so on. As in the book, I’ll use the word <strong>computation</strong>. Admittedly, this is a bit circular, but it is easy to remember: computers perform computations.</p>
<p>Complex computations are built up from simpler computations. This may seem obvious, but it is a powerful idea. An <strong>algorithm</strong> is just a description of a computation in terms of other computations that you already know how to perform. To help distinguish between the computation as a whole and the simpler parts, it is helpful to introduce a new word: an <strong>operator</strong> performs a computation.</p>
<p>It’s helpful to think of the computation carried out by an operator as involving four parts:</p>
<ol>
<li>The name of the operator</li>
<li>The input arguments</li>
<li>The output value</li>
<li>Side effects</li>
</ol>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>A typical operation takes one or more <strong>input arguments</strong> and uses the information in these to produce an <strong>output value</strong>. Along the way, the computer might take some action: display a graph, store a file, make a sound, etc. These actions are called <strong>side effects</strong>.</p>
<p>Because R is a programming language designed specifically for statistical analysis, many of the ‘base’ commands (such as <code>sqrt()</code>) are available ‘out-of-the-box’. However, since Python is a more general-purpose programming language, we usually need to <code>import</code> statistical commands (think of this as adding words to a language) before we can use them. For Scientific Python, the most important library that we need is <code>numpy</code> (Numerical Python), which can be loaded like this:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [1]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note left shadow">
It is best to ensure you are using the latest version of <code>numpy</code>, which is available from the NumPy project website.
</span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>To tell the computer to perform a computation (call this <strong>invoking an operation</strong> or giving a <strong>command</strong>), you need to provide the name and the input arguments in a specific format. The computer then returns the output value. For example, the command <code>np.sqrt(25)</code> invokes the square root operator (named <code>sqrt</code> from the <code>numpy</code> library) on the argument <code>25</code>. The output from the computation will, of course, be <code>5</code> (displayed as <code>5.0</code>, because <code>numpy</code> returns a floating-point number).</p>
<p>The syntax of invoking an operation consists of the operator’s name, followed by round parentheses. The input arguments go inside the parentheses.</p>
<p>The software program that you use to invoke operators is called an <strong>interpreter</strong> (the interpreter is the program you are running when you start Python). You enter your commands as a ‘dialog’ between you and the interpreter (just like when converting between any two languages!). Commands can be entered as part of a script (a text file with a list of commands to perform) or directly at a ‘command prompt’:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [2]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">25</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[2]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>5.0</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>In the above situation, the ‘prompt’ is <code>In [ ]:</code>, and the ‘command’ is <code>np.sqrt(25)</code>. When you press ‘Enter’, the interpreter reads your command and performs the computation. For commands such as the one above, the interpreter will print the output value from the computation:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [3]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">25</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[3]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>5.0</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>In the above example, the ‘output marker’ is <code>Out[ ]:</code>, and the output value is <code>5.0</code>. The dialog continues as the interpreter prints another prompt and waits for your further command.</p>
</div>
</div>
</div>
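<p>The four parts of an operation can be seen side by side in a short sketch. <code>np.sqrt</code> has a name, an input argument, and an output value but no side effect; the built-in <code>print</code> has a side effect (text appears on screen), but its output value is just <code>None</code>.</p>

```python
import numpy as np

result = np.sqrt(25)     # name: np.sqrt, input argument: 25, output value: 5.0
returned = print("hi")   # side effect: prints "hi"; output value: None
```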
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Often, operations involve more than one argument. The various arguments are separated by commas. For example, here is an operation named <code>arange</code> from the <code>numpy</code> library that produces a range of numbers (increasing integers from 3 up to, but not including, 10):</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [4]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[4]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>array([3, 4, 5, 6, 7, 8, 9])</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The first argument tells where to start the range, and the second tells where to stop it; the stop value itself is not included. The order of the arguments is important. For instance, <em>here</em> is the range produced when 10 is the first argument, 3 is the second, and the third argument (the step size) is -1, giving decreasing values from 10 down to, but not including, 3:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [5]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[5]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>array([10, 9, 8, 7, 6, 5, 4])</pre>
</div>
</div>
</div>
</div>
</div>
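<p>A small sketch, assuming only <code>numpy</code>, that makes the start/stop/step rules explicit: the stop value is never included, and because the default step is +1, reversing the first two arguments without a negative step produces an empty range.</p>

```python
import numpy as np

up = np.arange(3, 10)        # start=3, stop=10 (excluded), default step=1
down = np.arange(10, 3, -1)  # start=10, stop=3 (excluded), step=-1
empty = np.arange(10, 3)     # step=+1 can never get from 10 down to 3, so no values
print(up.tolist(), down.tolist(), empty.tolist())
```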
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>For some operators, particularly those that have many input arguments, some of the arguments can be referred to by name rather than position. This is particularly useful when the named argument has a sensible default value. For example, the <code>arange</code> operator from the <code>numpy</code> library can be told what type of output values to produce (integers, floats, etc.). This is done using an argument named <code>dtype</code>:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [6]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="s">'float'</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[6]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>array([ 10., 9., 8., 7., 6., 5., 4.])</pre>
</div>
</div>
</div>
</div>
</div>
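<p>Named arguments are most useful when they have defaults. As an additional illustration (not from the original tutorial), <code>np.round</code> has an argument named <code>decimals</code> that defaults to 0; the sketch below leaves it at its default, sets it by name, and sets it by position.</p>

```python
import numpy as np

value = 3.14159
default = np.round(value)                 # decimals defaults to 0
two_places = np.round(value, decimals=2)  # decimals given by name
by_position = np.round(value, 2)          # the same argument given by position
floats = np.arange(3, 6, dtype=float)     # dtype given by name
```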
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Note that all the values in the range are now floating-point numbers, shown with a decimal point. Depending on the circumstances, not all four parts of an operation need be present. For example, the <code>ctime</code> operation from the <code>time</code> library returns the current time and date; no input arguments are required:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [7]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="kn">import</span> <span class="nn">time</span>
<span class="n">time</span><span class="o">.</span><span class="n">ctime</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[7]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>'Mon Sep 23 15:58:25 2013'</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>In the above example, we first imported the <code>time</code> library, which provides a series of commands that help us work with dates and times. Next, even though there are no arguments, the parentheses are still used when calling the <code>ctime</code> command. Think of the pair of parentheses as meaning ‘<em>do this</em>’.</p>
</div>
</div>
</div>
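<p>The role of the parentheses can be seen by leaving them off. In this sketch, <code>time.ctime</code> without parentheses refers to the operator itself, which is not run; adding <code>()</code> runs it and gives back the output value, a string.</p>

```python
import time

ctime_op = time.ctime   # no parentheses: the operator itself, not yet run
stamp = time.ctime()    # parentheses: "do this", giving the current time as a string
print(callable(ctime_op), type(stamp).__name__)
```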
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Naming-and-Storing-Values">Naming and Storing Values<a class="anchor-link" href="#Naming-and-Storing-Values">¶</a></h3><p>Often the value returned by an operation will be used later on. Values can be stored for later use with the <strong>assignment operator</strong>. This has a different syntax that reminds the user that a value is being stored. Here’s an example of a simple assignment:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [8]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">x</span> <span class="o">=</span> <span class="mi">16</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The command has stored the value 16 under the name <code>x</code>. The syntax is always the same: an equal sign (=) with a name on the left side and a value on the right.
Such stored values are called <strong>objects</strong>. Making an assignment to an object defines the object. Once an object has been defined, it can be referred to and used in later computations. Notice that an assignment operation does not return a value or display a value. Its sole purpose is to have the side effects of defining the object and thereby storing a value under the object’s name.</p>
<p>To refer to the value stored in the object, just use the object’s name itself. For instance:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [9]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">x</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[9]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>16</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Computing with the value stored in an object works the same way, and this gives you an extremely rich syntax for performing complex calculations:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [10]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[10]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>4.0</pre>
</div>
</div>
</div>
</div>
</div>
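<p>Here is a small illustration of combining stored values. The sketch stores two made-up side lengths of a rectangle and computes the diagonal with the Pythagorean theorem. It assumes NumPy is imported as <code>np</code>, as in the cells above.</p>

```python
import numpy as np

# Store the side lengths of a rectangle under descriptive names
width = 3.0
height = 4.0

# Stored values can be combined freely; here, the Pythagorean theorem
diagonal = np.sqrt(width**2 + height**2)
print(diagonal)  # 5.0
```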
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>You can create as many objects as you like and give them names that remind you of their purpose. Some examples: <code>wilma</code>, <code>ages</code>, <code>temp</code>, <code>dog_houses</code>, <code>foo3</code>. There <em>are</em> some general rules for object names:</p>
<ul>
<li>Use only letters and numbers and ‘underscores’ (_)</li>
<li>Do <span class="caps">NOT</span> use spaces anywhere in the name (Python won’t let you)</li>
<li>A number cannot be the first character in the name</li>
<li>Capital letters are treated as distinct from lower-case letters (i.e., Python is <em>case-sensitive</em>)<ul>
<li>the objects named <code>wilma</code>, <code>Wilma</code>, and <code>WILMA</code> are all different</li>
</ul>
</li>
<li>If possible, use an ‘underscore’ between words (e.g., <code>my_object</code>)</li>
</ul>
</div>
</div>
</div>
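<p>Python can check these rules for you. The sketch below uses the built-in <code>str.isidentifier()</code> method together with the standard <code>keyword</code> module. It also enforces one rule that the list above does not mention: reserved words such as <code>for</code> or <code>if</code> cannot be used as object names. The candidate names are made up for illustration.</p>

```python
import keyword

# Candidate names: the first three follow the rules, the rest do not
candidates = ["wilma", "dog_houses", "foo3", "3foo", "dog houses", "my-object", "for"]

# isidentifier() checks the character rules; iskeyword() rejects reserved words
valid = {c: c.isidentifier() and not keyword.iskeyword(c) for c in candidates}

for candidate, ok in valid.items():
    print(f"{candidate!r:14} valid name: {ok}")
```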
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>For the sake of readability, keep object names short. But if you really must have an object named something like <code>ages_of_children_from_the_clinical_trial</code>, feel free (it’s just more typing for you later!).</p>
<p>Objects can store all sorts of things, for example a range of numbers:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [11]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">7</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>When you assign a new value to an existing object, as we just did with <code>x</code> above, the new value replaces the former one, and the former value can no longer be reached through that name. The former value of <code>x</code> was 16, but after the new assignment above, it is:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [12]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">x</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[12]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>array([1, 2, 3, 4, 5, 6])</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The value of an object is changed only via the assignment operator. Using an object in a computation does not change the value. For example, suppose you invoke the square-root operator on <code>x</code>:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [13]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[13]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>array([ 1. , 1.41421356, 1.73205081, 2. , 2.23606798,
2.44948974])</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The square roots have been returned as a value, but this doesn’t change the value of <code>x</code>:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [14]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">x</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[14]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>array([1, 2, 3, 4, 5, 6])</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note right shadow" markdown="1">
An assignment command like <code>x=np.sqrt(x)</code> can be confusing to people who are used to algebraic notation. In algebra, the equal sign describes a relationship between the left and right sides. So, $x = \sqrt{x}$ tells us about how the quantity $x$ and the quantity $\sqrt{x}$ are related. Students are usually trained to ‘solve’ such a relationship, going through a series of algebraic steps to find values for $x$ that are consistent with the mathematical statement (for $x = \sqrt{x}$, the solutions are $x = 0$ and $x = 1$). In contrast, the assignment command <code>x = np.sqrt(x)</code> is a way of replacing the previous values stored in <code>x</code> with new values that are the square-root of the old ones.
</span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>If you want to change the value of <code>x</code>, you need to use the assignment operator:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [15]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
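<p>The cells above can be condensed into one self-contained sketch. It keeps a copy of the original values so the “computation does not change <code>x</code>” claim can be checked directly. The final loop goes slightly beyond the text. It shows that repeating <code>x = np.sqrt(x)</code> keeps replacing the stored values, and for positive numbers they drift toward 1, one of the two algebraic solutions mentioned in the note.</p>

```python
import numpy as np

x = np.arange(1, 7)
original = x.copy()          # keep a copy so we can compare later

roots = np.sqrt(x)           # computes square roots; x itself is not modified
unchanged = np.array_equal(x, original)

x = np.sqrt(x)               # assignment: x now refers to the square roots
replaced = np.allclose(x, roots)

# Repeating the assignment keeps replacing the stored values; numerically they
# approach 1, one of the solutions of the algebraic equation x = sqrt(x)
for _ in range(60):
    x = np.sqrt(x)

print(unchanged, replaced, x)
```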
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Connecting-Computations">Connecting Computations<a class="anchor-link" href="#Connecting-Computations">¶</a></h3><p>The brilliant thing about organizing operators in terms of input arguments and output values is that the output of one operator can be used as an input to another. This lets complicated computations be built out of simpler ones.</p>
<p>For example, suppose you have a list of 10000 voters in a precinct and you want to select a random sample of 20 of them for a survey. The <code>np.arange</code> operator can be used to generate a set of 10000 choices. The <code>np.random.choice</code> operator can then be used to select a subset of these values at random.</p>
<p>One way to connect the computations is by using objects to store the intermediate outputs:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [16]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">choices</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">10001</span><span class="p">)</span> <span class="c"># 1 to 10000; arange stops before its second argument</span>
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">choices</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="k">False</span><span class="p">)</span> <span class="c"># sample _without_ replacement</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[16]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>array([7863, 8378, 9128, 3340, 5674, 9055, 6374, 8668, 3768, 6798, 8066,
6443, 5154, 5991, 1535, 3580, 8516, 4872, 8618, 7240])</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>You can also pass the output of an operator <em>directly</em> as an argument to another operator. Here’s another way to accomplish exactly the same thing as the above (note that the values will differ because we are performing a <em>random</em> sample):</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [17]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">10001</span><span class="p">),</span> <span class="mi">20</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="k">False</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[17]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>array([5732, 6833, 7705, 4459, 3131, 3515, 4177, 6312, 2820, 2705, 4580,
9125, 7395, 1927, 728, 4725, 1854, 6147, 4421, 2756])</pre>
</div>
</div>
</div>
</div>
</div>
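<p>Because the samples above are random, each rerun picks different voters. If you need a reproducible sample (for example, to share with a colleague), one option is NumPy’s <code>Generator</code> interface with a fixed seed. This sketch assumes NumPy 1.17 or later, where <code>np.random.default_rng</code> is available. The seed value 2015 is arbitrary.</p>

```python
import numpy as np

# Voter IDs 1 through 10000; arange stops *before* its second argument
voter_ids = np.arange(1, 10001)

# A Generator with a fixed seed makes the "random" sample reproducible
rng = np.random.default_rng(seed=2015)
sample = rng.choice(voter_ids, size=20, replace=False)  # no voter picked twice

print(len(voter_ids), sample)
```

With the same seed, the same 20 IDs come back on every run. Drop the seed when you want a fresh sample.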
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Numbers-and-Arithmetic">Numbers and Arithmetic<a class="anchor-link" href="#Numbers-and-Arithmetic">¶</a></h3><p>The <code>Python</code> language has a concise notation for arithmetic that looks very much like the traditional one:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [18]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="mf">7.</span> <span class="o">+</span> <span class="mf">2.</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[18]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>9.0</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [19]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="mf">3.</span> <span class="o">*</span> <span class="mf">4.</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[19]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>12.0</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [20]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="mf">5.</span> <span class="o">/</span> <span class="mf">2.</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[20]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>2.5</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [21]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="mf">3.</span> <span class="o">-</span> <span class="mf">8.</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[21]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>-5.0</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [22]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="o">-</span><span class="mf">3.</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[22]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>-3.0</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [23]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="mf">5.</span><span class="o">**</span><span class="mf">2.</span> <span class="c"># 5 to the power of 2 (in Python, ^ is bitwise XOR, not a power)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[23]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>25.0</pre>
</div>
</div>
</div>
</div>
</div>
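<p>The examples above write every number with a trailing dot, which makes it a floating-point number. In Python 3 this is not needed for division, because <code>/</code> always performs true division, even on integers. (In Python 2, <code>5 / 2</code> gave <code>2</code>.) The sketch below also shows two related operators that the text does not cover: <code>//</code> (floor division) and <code>%</code> (remainder).</p>

```python
# In Python 3, / performs true division even when both operands are integers
true_div = 5 / 2       # 2.5
floor_div = 5 // 2     # 2: floor division discards the fractional part
remainder = 5 % 2      # 1: what is left over after floor division
power = 5 ** 2         # 25: stays an integer when both operands are integers

print(true_div, floor_div, remainder, power)
```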
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Arithmetic operators, like any other operators, can be connected to form more complicated computations. For instance:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [24]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="mf">8.</span> <span class="o">+</span> <span class="mf">4.</span> <span class="o">/</span> <span class="mf">2.</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[24]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>10.0</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>To a human reader, the command <code>8+4/2</code> might seem ambiguous. Is it intended to be <code>(8+4)/2</code> or <code>8+(4/2)</code>? The computer uses unambiguous precedence rules to interpret the expression (here, division before addition). Even so, it’s a good idea for you to use parentheses so that you can make sure that what you <em>intend</em> is what the computer carries out:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [25]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="p">(</span><span class="mf">8.</span> <span class="o">+</span> <span class="mf">4.</span><span class="p">)</span> <span class="o">/</span> <span class="mf">2.</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[25]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>6.0</pre>
</div>
</div>
</div>
</div>
</div>
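<p>Python follows the usual mathematical precedence: exponentiation first, then multiplication and division, then addition and subtraction. The sketch below checks the two readings of <code>8+4/2</code>. It also shows one common surprise: <code>**</code> binds more tightly than a leading minus sign.</p>

```python
# Division happens before addition unless parentheses say otherwise
a = 8 + 4 / 2        # 8 + 2.0
b = (8 + 4) / 2      # parentheses force the addition first

# ** binds more tightly than a leading minus sign
c = -3 ** 2          # parsed as -(3 ** 2)
d = (-3) ** 2        # parentheses square the negative number

print(a, b, c, d)    # 10.0 6.0 -9 9
```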
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Traditional mathematical notation uses superscripts and radicals to indicate exponentials and roots, e.g. $3^2$ or $\sqrt{3}$ or $\sqrt[3]{8}$. This special typography doesn’t work well with an ordinary keyboard, so <code>Python</code> and most other computer languages use a different notation:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [26]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="mf">3.</span><span class="o">**</span><span class="mf">2.</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[26]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>9.0</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [27]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mf">3.</span><span class="p">)</span> <span class="c"># or 3.**0.5</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[27]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>1.7320508075688772</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [28]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="mf">8.</span><span class="o">**</span><span class="p">(</span><span class="mf">1.</span><span class="o">/</span><span class="mf">3.</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[28]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>2.0</pre>
</div>
</div>
</div>
</div>
</div>
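<p>Fractional exponents are convenient, but they rely on a floating-point approximation of values like 1/3, so the result can be off in the last digit. This sketch shows that effect and NumPy’s <code>np.cbrt</code> for cube roots. It also shows that in Python 3 a negative base raised to a fractional power gives a complex number, not a negative real one.</p>

```python
import numpy as np

# 1/3 is not exactly representable as a float, so this may print
# something like 3.9999999999999996 rather than 4.0
approx = 64 ** (1 / 3)

# np.cbrt computes cube roots directly and handles negative inputs
cube_root = np.cbrt(64.0)
negative_root = np.cbrt(-8.0)

# A negative base with a fractional power gives a complex result in Python 3
principal = (-8) ** (1 / 3)

# Compare floats with a tolerance rather than ==
print(approx, np.isclose(approx, 4.0))
print(cube_root, negative_root, principal)
```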
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>There is a large set of mathematical functions: exponentials, logs, trigonometric and inverse trigonometric functions, etc. Some examples:</p>
<table border="0">
<tr>
<th>Traditional</th>
<th>Python</th>
</tr>
<tr>
<td>$e^2$</td>
<td><code>np.exp(2)</code></td>
</tr>
<tr>
<td>$\log_{e}(100)$</td>
<td><code>np.log(100)</code></td>
</tr>
<tr>
<td>$\log_{10}(100)$</td>
<td><code>np.log10(100)</code></td>
</tr>
<tr>
<td>$\log_{2}(100)$</td>
<td><code>np.log2(100)</code></td>
</tr>
<tr>
<td>$\cos(\frac{\pi}{2})$</td>
<td><code>np.cos(np.pi/2)</code></td>
</tr>
<tr>
<td>$\sin(\frac{\pi}{2})$</td>
<td><code>np.sin(np.pi/2)</code></td>
</tr>
<tr>
<td>$\tan(\frac{\pi}{2})$</td>
<td><code>np.tan(np.pi/2)</code></td>
</tr>
<tr>
<td>$\cos^{-1}(-1)$</td>
<td><code>np.arccos(-1)</code></td>
</tr>
</table><p>Numbers can be written in <strong>scientific notation</strong>. For example, the ‘universal gravitational constant’ that describes the gravitational attraction between masses is $6.67428 \times 10^{-11}$ (with units meters-cubed per kilogram per second squared). In computer notation, this would be written as <code>6.67428e-11</code>. The Avogadro constant, which gives the number of particles (such as atoms or molecules) in a mole, is $6.02214179 \times 10^{23}$ per mole, or <code>6.02214179e+23</code>.</p>
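<p>To check a few rows of the table and the scientific-notation values, this sketch evaluates them with NumPy. NumPy’s inverse cosine is <code>np.arccos</code>. NumPy 2.0 and later also accept <code>np.acos</code> as an alias, but older releases do not.</p>

```python
import numpy as np

# Evaluate a few rows of the table
e_squared = np.exp(2)            # about 7.389
ln_100 = np.log(100)             # natural log, about 4.605
log10_100 = np.log10(100)        # 2.0
log2_100 = np.log2(100)          # about 6.644
arccos_minus_1 = np.arccos(-1)   # pi

# Scientific notation: mantissa, the letter e, then the power of ten
G = 6.67428e-11                  # gravitational constant, m^3 kg^-1 s^-2
N_A = 6.02214179e+23             # Avogadro constant, per mole

print(e_squared, ln_100, log10_100, log2_100, arccos_minus_1)
print(G, N_A)
```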
<p>Python does not directly support the recording of units. This is unfortunate, since in the real world numbers often have units and the units matter. For example, in 1999 the Mars Climate Orbiter was lost at Mars because one team’s software reported thruster impulse in pound-force seconds, while the navigation team assumed the values were in newton-seconds.</p>
<p><span class="note right shadow">
There are <em>some</em> Python packages for handling units, including <code>pint</code>, <code>quantities</code>, <code>units</code>, and <code>sympy.physics.units</code>, among others.
</span></p>
<p>Computer arithmetic is accurate and reliable, but it often involves very slight rounding of numbers. Ordinarily, this is not noticeable. However, it can become apparent in calculations whose results should be zero or very close to it. For example, mathematically $\sin(\pi) = 0$, but the computer does not reproduce this relationship exactly:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [29]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">pi</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[29]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>1.2246467991473532e-16</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Whether a number like this is properly interpreted as ‘close to zero’ depends on the context and, for quantities that have units, on the units themselves. For instance, the unit ‘parsec’ is used in astronomy in reporting distances between stars. The closest star to the Sun is Proxima Centauri, at a distance of about 1.3 parsecs. A distance of $1.22 \times 10^{-16}$ parsecs is tiny in astronomy but translates to about 3.8 meters, which is not so small on the human scale. In statistics, many calculations relate to probabilities, which are always in the range 0 to 1. On this scale, <code>1.22e-16</code> is very close to zero.</p>
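<p>The parsec arithmetic can be checked directly, and so can the idea that ‘close to zero’ depends on the scale you care about. In this sketch the tolerance <code>atol=1e-12</code> is an arbitrary choice suited to probabilities. The conversion factor of about <code>3.0857e16</code> meters per parsec is a rounded value.</p>

```python
import numpy as np

residual = np.sin(np.pi)                # about 1.22e-16 rather than exactly 0

# Read as a probability, this is zero for all practical purposes
is_effectively_zero = np.isclose(residual, 0.0, atol=1e-12)

# Read as a distance in parsecs, it is a few meters
PARSEC_IN_METERS = 3.0857e16            # rounded conversion factor
meters = residual * PARSEC_IN_METERS    # roughly 3.8

print(is_effectively_zero, meters)
```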
<p>There are several ‘special’ numbers in the <code>Python</code> world. Two of them are <code>inf</code>, which stands for $\infty$ (infinity), and <code>nan</code>, which stands for ‘not a number’ and results when a numerical operation isn’t defined. For instance:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note right shadow">
Mathematically oriented readers will wonder why <code>Python</code> should have any trouble with a computation like $\sqrt{-9}$; the result is the imaginary number $3\jmath$ (imaginary numbers may be represented by a $\jmath$ or a $\imath$, depending on the field). <code>Python</code> works with complex numbers, but you have to explicitly tell the system that this is what you want to do. To calculate $\sqrt{-9}$ for example, simply use <code>np.sqrt(-9+0j)</code>.
</span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [30]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">float64</span><span class="p">(</span><span class="mf">1.</span><span class="p">)</span> <span class="o">/</span> <span class="mf">0.</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_subarea output_stream output_stderr output_text">
<pre>-c:1: RuntimeWarning: divide by zero encountered in double_scalars
</pre>
</div>
</div>
<div class="output_area"><div class="prompt output_prompt">Out[30]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>inf</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [31]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">np</span><span class="o">.</span><span class="n">float64</span><span class="p">(</span><span class="mf">0.</span><span class="p">)</span> <span class="o">/</span> <span class="mf">0.</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_subarea output_stream output_stderr output_text">
<pre>-c:1: RuntimeWarning: invalid value encountered in double_scalars
</pre>
</div>
</div>
<div class="output_area"><div class="prompt output_prompt">Out[31]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>nan</pre>
</div>
</div>
</div>
</div>
</div>
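<p>A few practical points about these special values are worth knowing. NumPy can suppress the warnings shown above with <code>np.errstate</code>. Because <code>nan</code> never compares equal to anything, not even itself, you test for it with <code>np.isnan</code>. Ordinary Python floats raise an error instead of returning <code>inf</code>. The sketch also returns to the margin note’s $\sqrt{-9}$ example.</p>

```python
import numpy as np

# Plain Python floats raise an error instead of returning inf
plain_python_raises = False
try:
    1.0 / 0.0
except ZeroDivisionError:
    plain_python_raises = True

# NumPy returns inf/nan; errstate silences the RuntimeWarnings shown above
with np.errstate(divide="ignore", invalid="ignore"):
    pos_inf = np.float64(1.0) / 0.0
    not_a_number = np.float64(0.0) / 0.0
    real_sqrt = np.sqrt(-9.0)          # undefined for real numbers: nan

# Asking for complex arithmetic explicitly gives the imaginary result
complex_sqrt = np.sqrt(-9 + 0j)

# nan is not equal to anything, not even itself, so use np.isnan to detect it
print(np.isinf(pos_inf), np.isnan(not_a_number), not_a_number == not_a_number)
print(real_sqrt, complex_sqrt)
```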
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Types-of-Objects">Types of Objects<a class="anchor-link" href="#Types-of-Objects">¶</a></h3><p>Most of the examples used so far have dealt with numbers. But computers work with other kinds of information as well: text, photographs, sounds, sets of data, and so on. The word <strong>type</strong> is used to refer to the <em>kind</em> of information. Modern computer languages support a great variety of types. It’s important to know about the types of data because operators expect their input arguments to be of specific types. When you use the wrong type of input, the computer might not be able to process your command.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><span class="note left shadow">
In Python, data frames are not ‘built in’ as part of the basic language, but the excellent ‘pandas’ library provides data frames and a whole slew of other functionality to researchers doing data analysis with Python. We will be learning more about ‘pandas’ in future tutorials.
</span></p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>For the purposes of starting computational statistics with Python, it’s important to distinguish among three basic types:</p>
<ul>
<li><strong>numeric</strong>: positive and negative numbers, including decimal numbers (<code>float</code>) and whole numbers (<code>int</code>); these are numbers of the sort encountered so far</li>
<li><strong>data frames</strong>: collections of data (more or less) in the form of a spreadsheet table; the ‘Computational Technique’ section from chapter 2 will introduce data frames and the operators and libraries for working with them</li>
<li><strong>strings</strong>: textual data; you indicate string data to the computer by enclosing the text in quotation marks (e.g., <code>name = "python"</code>)</li>
</ul>
</div>
</div>
</div>
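<p>The built-in <code>type()</code> function reports which kind of information an object holds. This sketch uses made-up values to show the numeric and string types. It also shows what happens when an operator receives the wrong type: adding a string to a number raises a <code>TypeError</code> unless you convert one of them explicitly.</p>

```python
# type() reports the kind of information an object holds
count = 16          # an integer (int)
ratio = 2.5         # a decimal number (float)
name = "python"     # a string (str)

print(type(count), type(ratio), type(name))

# Operators expect particular types: mixing a string and a number fails
try:
    name + count
except TypeError as err:
    print("TypeError:", err)

# Converting explicitly makes the intent clear
label = name + str(count)   # 'python16'
print(label)
```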
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h4 id="A-Note-on-Strings">A Note on Strings<a class="anchor-link" href="#A-Note-on-Strings">¶</a></h4><p>There is something a bit subtle going on in the previous command involving the string <code>"python"</code>, so look at it carefully. The purpose of the command is to create a new object, called <code>name</code>, which stores a little bit of text data. Notice that the name of the object is not put in quotes, but the text characters are.</p>
<p>Whenever you refer to an object name, make sure that you don’t use quotes. For example, in the following, we are first assigning the string <code>"python"</code> to the <code>name</code> object, and then returning (and printing automatically) the <code>name</code> object.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [32]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="s">"python"</span>
<span class="n">name</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[32]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>'python'</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>If you make a command with the object name in quotes, it won’t be treated as referring to an object. Instead, it will merely mean the text itself:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [33]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="s">"name"</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt output_prompt">Out[33]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>'name'</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Similarly, if you omit the quotation marks from around the text, the computer will treat it as if it were an object name and will look for the object of that name. For instance, the following command directs the computer to look up the value contained in an object named <code>python</code> and insert that value into the object <code>name</code>:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [34]:</div>
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span class="n">name</span> <span class="o">=</span> <span class="n">python</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area"><div class="prompt"></div>
<div class="output_subarea output_text output_error">
<pre>
<span class="ansired">---------------------------------------------------------------------------</span>
<span class="ansired">NameError</span> Traceback (most recent call last)
<span class="ansigreen"><ipython-input-34-43a69bc65ba8></span> in <span class="ansicyan"><module></span><span class="ansiblue">()</span>
<span class="ansigreen">----> 1</span><span class="ansiyellow"> </span>name <span class="ansiyellow">=</span> python<span class="ansiyellow"></span>
<span class="ansired">NameError</span>: name 'python' is not defined</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>As it happens, there was no object named <code>python</code> because it had not been defined by any previous assignment command, so the computer generated an error (a <code>NameError</code>). For the most part, you will not need to use very many operators on text data when doing statistical analyses; you just need to remember to include text, such as file names, in quotation marks, <code>"like this"</code>.</p>
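<p>The three cases above can be put side by side in one short script. This is a sketch for illustration; catching the error with <code>try</code>/<code>except</code> (and storing it in a made-up variable, <code>error_message</code>) is only done here so the script keeps running instead of stopping at the traceback.</p>

```python
# Quoted text is data; unquoted words are treated as object names.
name = "python"          # store the text "python" in an object called name
print(name)              # no quotes: look up the object -> prints python
print("name")            # quotes: just the text itself -> prints name

error_message = None
try:
    name = python        # no quotes: Python looks for an object called python
except NameError as err:
    # Nothing called python has been assigned yet, so the lookup fails
    error_message = str(err)

print(error_message)     # e.g. name 'python' is not defined
print(name)              # still "python": the failed assignment never happened
```

<p>Because the right-hand side is evaluated before anything is assigned, the failed command leaves <code>name</code> exactly as it was.</p>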
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Reference">Reference<a class="anchor-link" href="#Reference">¶</a></h3><p>As with all ‘Statistical Modeling: A Fresh Approach for Python’ tutorials, this tutorial is based directly on material from <a href="http://www.mosaic-web.org/go/StatisticalModeling/">‘Statistical Modeling: A Fresh Approach (2nd Edition)’</a> by <a href="http://www.macalester.edu/~kaplan/">Daniel Kaplan</a>. This tutorial is based on Chapter 1: Introduction.</p>
<p>Another useful source of information for statistics in Python is this <a href="http://work.thaslwanter.at/Stats/html/">introduction to statistics</a> web-book by Thomas Haslwanter, which is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.</p>
<p>I have made an effort to keep the text and explanations consistent between the original (R-based) version and the Python tutorials, in order to keep things comparable. With that in mind, any errors, omissions, and/or differences between the two versions are mine, and any questions, comments, and/or concerns should be <a href="mailto:carson.farmer@gmail.com">directed to me</a>.</p>
</div>
</div>
</div></p>ftools is dead… long live ftools!2013-10-15T09:40:00-04:00cfarmertag:carsonfarmer.com,2013-10-15:2013/10/ftools-is-no-more/<p>I recently decided to drop <code>ftools.ca</code>, since I hadn’t updated it in a very long
time, and it was really just costing me money to keep a ‘dead’ website up and
running. Additionally, with the new <span class="caps">QGIS</span> <a href="http://plugins.qgis.org/">plugin infrastructure</a>,
hosting my own plugins (the website’s primary purpose) was no longer needed.
The site served me well for many years and really helped get fTools (the
plugin) into the <span class="caps">QGIS</span> core codebase. It has done its job, and
now that I have very little involvement with fTools and the <span class="caps">QGIS</span> Processing
Toolbox that is poised to replace it, I’m moving on: <code>ftools.ca</code> is dead, long
live <code>ftools.ca</code>!</p>
<p>However, now that <a href="http://www.qgis.org/en/docs/index.html#documentation-for-qgis-2-0"><span class="caps">QGIS</span> 2.0</a> has rolled out, it seems that at least
one part of ftools.ca is missed: my old cartogram plugin. If I
have some spare time, I’ll try to update the plugin to the latest and
greatest <a href="http://www.qgis.org/en/docs/pyqgis_developer_cookbook/plugins.html"><span class="caps">QGIS</span> 2.0 standards</a> and upload it to the new <span class="caps">QGIS</span> plugins
system. In the meantime, for those out there who would like to use it right
away, you can get the <a href="http://carsonfarmer.com/uploads/cartogram.zip">original code from here</a> or <a href="https://github.com/cfarmer/cartogram-plugin">grab it from
GitHub</a>. In fact, if someone is able and willing, they can grab
the code from GitHub, update it for <span class="caps">QGIS</span> 2.0, and submit a pull request, which I
will (more than likely) happily accept.</p>
<!--more-->Nathan Storey Guest Lecture2013-10-10T12:41:00-04:00cfarmertag:carsonfarmer.com,2013-10-10:2013/10/guest-speaker-nathan-storey/<p><a href="https://twitter.com/npstorey">Nathan Storey</a> will be speaking to my spatial data analysis class
later today, October 10th, 2013, and you are invited to attend! Nathan is a
former <a href="http://www.hunteruap.org/">Hunter Urban Affairs</a> student and current Open Data Guru working
with <a href="http://www.ontodia.com/">Ontodia</a> doing <span class="caps">GIS</span>/open data projects for <span class="caps">NYC</span>. Check out
<a href="http://nyc.pediacities.com/">PediaCities</a>, their platform to curate, organize, and link data
about cities.</p>
<p>This will be a great opportunity to see <span class="caps">GIS</span> data applied to real-world problem
sets here in <span class="caps">NYC</span>. The discussion will last approximately one hour, beginning at 5:35 in
the large lab of Hunter College, <span class="caps">CUNY</span>, North Building, Room 1090B. Due to the
location, no refreshments will be served, but it’s a good opportunity to feed
your brain :)</p>
<p>Here is what Nathan (the speaker) has to say about this talk:</p>
<blockquote>
<p>I will be speaking to Carson Farmer’s spatial data analysis class next week
about open data in <span class="caps">NYC</span>, data discovery, and <a href="http://nyc.pediacities.com/">PediaCities</a>, a
data encyclopedia project I’ve been working on for the past 6 months. </p>
</blockquote>
<!--more-->Re-imagining New York Streets2013-10-08T14:15:00-04:00cfarmertag:carsonfarmer.com,2013-10-08:2013/10/reimagine-new-york-streets/<p>Here’s a <a href="http://www.ted.com/talks/janette_sadik_khan_new_york_s_streets_not_so_mean_any_more.html"><span class="caps">TED</span> talk</a> from Janette Sadik-Khan, the New York City Transportation
Commissioner, on how they’ve transformed New York City streets over the past
several years.</p>
<p>I love the <a href="http://citibikenyc.com/">Citi Bike</a> program (especially the data), which she
helped introduce, so it is interesting to hear her talk about how and why the
program was started. There is also an interesting article on Sadik-Khan on the
<span class="caps">TED</span> <a href="http://blog.ted.com/2013/10/08/better-roads-for-bikes-and-walkers-what-cities-inspire-janette-sadik-khan/">blog</a> which provides some additional insight into her and the various
programs she’s helped develop.</p>
<div class="youtube" align="center">
<iframe width="560" height="315"
src="http://embed.ted.com/talks/janette_sadik_khan_new_york_s_streets_not_so_mean_any_more.html"
frameborder="0" scrolling="no" webkitAllowFullScreen mozallowfullscreen allowFullScreen wmode="Opaque">
</iframe>
</div>
<!--more-->Public transportation time warp2013-10-01T13:15:00-04:00cfarmertag:carsonfarmer.com,2013-10-01:2013/10/public-transportation-time-warp/<p>I recently came across two extremely cool videos while preparing lectures for my
transportation geography course. It is fascinating to watch the regions around
each transportation network develop while the network itself remains largely
unchanged. Worth a quick watch!</p>
<p>The first video depicts the London to Brighton Train Journey for three time
periods. In 1953, the <span class="caps">BBC</span> made a point-of-view film from a London to Brighton
train, 30 years later (1983) they did the same trip again, and again
30 years after that (2013).</p>
<div class="youtube" align="center">
<iframe width="640" height="360"
src="//www.youtube.com/embed/tGTwSNPqAqs?wmode=transparent"
frameborder="0" allowfullscreen wmode="Opaque">
</iframe>
</div>
<p>The second video is from the Vancouver SkyTrain, with older footage (1985) from
<span class="caps">BCRTC</span>/Translink, and newer footage (2013) by Celgen Studios. You can get the
original Translink footage <a href="http://www.youtube.com/watch?feature=player_detailpage&v=SV7rwEB_SOA">from here</a>.</p>
<div class="youtube" align="center">
<iframe width="640" height="390"
src="//www.youtube.com/embed/erodkwTS7sQ?wmode=transparent"
frameborder="0"
allowfullscreen wmode="Opaque">
</iframe>
</div>
<!--more-->Maps as Art and Other Experiments2013-09-27T17:25:00-04:00cfarmertag:carsonfarmer.com,2013-09-27:2013/09/maps-art-other-experiments/<p>With the recent (and long anticipated) release of <a href="http://www.qgis.org/">Quantum <span class="caps">GIS</span> 2.0</a>,
there has been a lot of ‘buzz’ in the open source geospatial community about
all the cool new features that <span class="caps">QGIS</span> now boasts, and how far it has come in such
a short time. I was recently inspired by <a href="http://anitagraser.com/2013/09/17/fun-with-data-defined-labels/">one such post by Anita Graser (aka
Underdark)</a>, a wonderfully talented cartographer/designer, on
data-driven labeling in <span class="caps">QGIS</span>, so I thought I’d throw something together on a
gray Friday afternoon to test it out. I also wanted an excuse to play around
with <a href="http://lab.hakim.se/reveal-js/#/">Reveal.js</a> slides in <a href="http://ipython.org/notebook.html">IPython notebook</a>, so I produced
the following <a href="http://www.carsonfarmer.com/examples/map_art/">slide show</a> using the images from <span class="caps">QGIS</span> and some
IPython magic:
<iframe src=http://www.carsonfarmer.com/examples/map_art/ width=700 height=500></iframe></p>
<!--more-->Early Stage Researcher Position in GeoInformatics2013-09-23T11:57:00-04:00cfarmertag:carsonfarmer.com,2013-09-23:2013/09/researcher-position-geoInformatics/<p>This is just a quick note about a great opportunity for early career researchers
interested in the field of <a href="http://en.wikipedia.org/wiki/Geoinformatics">geoinformatics</a>. The <a href="http://www.st-andrews.ac.uk/geoinformatics/">Centre for GeoInformatics</a> at the
<a href="http://www.st-andrews.ac.uk/">University of St Andrews</a> in Scotland has two new early career researcher positions
available to start right away. These are really great opportunities for someone
in the first four years (full-time equivalent) of their research career who has not
yet been awarded a doctoral degree. They are also especially good for foreign
students, as the <a href="http://ec.europa.eu/research/mariecurieactions/index_en.htm">Marie Curie</a> regulations require that candidates must not have
resided or carried out their main activity in the <span class="caps">UK</span> for more than 12 months
in the past 3 years.</p>
<p>For more information, check out <a href="http://www.st-andrews.ac.uk/geoinformatics/jobs-and-studentship/">these links</a> on the Centre for GeoInformatics <a href="http://www.st-andrews.ac.uk/geoinformatics/">website</a>.</p>Essential Python Geospatial Libraries2013-07-12T17:25:00-04:00cfarmertag:carsonfarmer.com,2013-07-12:2013/07/essential-python-geo-libraries/<p>Just so I don’t forget, here is a list of really awesome Python libraries that
I’m using these days to do lots of fun things with spatial data [<span class="caps">UPDATE</span>: I’ve
added a few more]:</p>
<ul>
<li><a href="http://pandas.pydata.org/">pandas</a> - For data handling and munging</li>
<li><a href="https://pypi.python.org/pypi/Shapely">shapely</a> - For geometry handling</li>
<li><a href="http://scitools.org.uk/cartopy/">cartopy</a> - For plotting spatial data</li>
<li><a href="http://toblerity.github.io/rtree/">rtree</a> - For efficiently querying spatial data</li>
<li><a href="http://www.cityinabottle.org/nodebox/">nodebox-opengl</a> - For playing around with animations</li>
<li><a href="http://statsmodels.sourceforge.net/">statsmodels</a> - For models and stats in Python (otherwise I’d use R)</li>
<li><a href="http://www.numpy.org/">numpy</a> - For pretty much anything that involves arrays</li>
<li><a href="https://code.google.com/p/geopy/">geopy</a> - For geolocating and things like that</li>
<li><a href="http://ipython.org/">ipython</a> - For a wondering interactive environment in which to play</li>
<li><a href="https://code.google.com/p/freetype-py/">freetype-py</a> - For converting font glyphs to polygons (odd I know…)</li>
<li><a href="https://pypi.python.org/pypi/GDAL/">ogr/gdal</a> - For reading, writing, and transforming geospatial data formats</li>
<li><a href="http://www.qgis.org/pyqgis-cookbook/">pyqgis</a> - For anything and everything <span class="caps">GIS</span></li>
<li><a href="http://toblerity.github.io/fiona/">fiona</a> - For making it <em>easy</em> to read/write geospatial data formats</li>
<li><a href="http://matplotlib.org/">matplotlib</a> - For all my plotting needs</li>
<li><a href="http://networkx.github.io/">networkx</a> - For working with networks (duh!)</li>
<li><a href="http://docs.getpelican.com/en/3.2/">pelican</a> - For blogging about all this stuff…</li>
<li><a href="http://pythonhosted.org/PySAL/">pysal</a> - For all your spatial econometrics needs (and more)</li>
<li><a href="https://pypi.python.org/pypi/descartes">descartes</a> - For plotting geometries in matplotlib</li>
</ul>
<p>Based on Twitter and some of the comments below, I should also add:</p>
<ul>
<li><a href="http://geographiclib.sourceforge.net/">geographiclib</a> - For solving geodesic problems</li>
<li><a href="https://code.google.com/p/pyshp/">pyshp</a> - For reading and writing shapefiles (in <em>pure</em> Python)</li>
<li><a href="https://code.google.com/p/pyproj/">pyproj</a> - For conversions between projections</li>
</ul>
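<p>If you want to see which of these are already installed in your environment, a small standard-library helper like the one below can check without importing anything heavy. This is a sketch: the import names in the dictionary (for example <code>osgeo</code> for GDAL/OGR, <code>shapefile</code> for pyshp, <code>qgis</code> for PyQGIS) are worth double-checking against each project's documentation, and nodebox-opengl is left out because its import name is less certain.</p>

```python
import importlib.util

# Package name as listed above -> the module name you actually import.
# Several packages are imported under a different name than they are installed as.
LIBRARIES = {
    "pandas": "pandas",
    "shapely": "shapely",
    "cartopy": "cartopy",
    "rtree": "rtree",
    "statsmodels": "statsmodels",
    "numpy": "numpy",
    "geopy": "geopy",
    "ipython": "IPython",
    "freetype-py": "freetype",
    "ogr/gdal": "osgeo",
    "pyqgis": "qgis",
    "fiona": "fiona",
    "matplotlib": "matplotlib",
    "networkx": "networkx",
    "pelican": "pelican",
    "pysal": "pysal",
    "descartes": "descartes",
    "geographiclib": "geographiclib",
    "pyshp": "shapefile",
    "pyproj": "pyproj",
}


def check_libraries(libraries):
    """Return {package: True/False}, locating each top-level module without importing it."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in libraries.items()
    }


if __name__ == "__main__":
    for package, found in sorted(check_libraries(LIBRARIES).items()):
        print(f"{package:15s} {'installed' if found else 'missing'}")
```

<p>Using <code>find_spec</code> rather than a plain <code>import</code> keeps the check fast and avoids side effects from packages with heavy start-up code.</p>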
<p>Any others I’ve missed?</p>
<!--more-->New York City Panel on Climate Change2013-06-12T15:55:00-04:00cfarmertag:carsonfarmer.com,2013-06-12:2013/06/nyc-panel-climate-change/<p>This was recently announced on the <a href="http://www.geo.hunter.cuny.edu/">Hunter Geography Department</a> website:</p>
<blockquote>
<h3><span class="caps">CISC</span> Produced Map Spearhead of Mayor Bloomberg’s <span class="caps">SIRR</span> Announcement</h3>
<p>The <a href="http://www.cunysustainablecities.org/climate-projections-future-flood-risk-maps-inform-a-stronger-resilient-york/">New York City Panel on Climate Change (<span class="caps">NPCC2</span>) Climate Risk Information
2013 Report</a> was released on June 11, 2013 in conjunction with the
release of the <span class="caps">NYC</span> Special Initiative for Rebuilding and Resiliency’s (<span class="caps">SIRR</span>)
report entitled <a href="http://www.nyc.gov/html/sirr/html/report/report.shtml">“A Stronger, More Resilient New York.”</a> The reports
were released by <span class="caps">NYC</span> Mayor Michael Bloomberg during a
<a href="http://www.nytimes.com/2013/06/12/nyregion/bloomberg-outlines-20-billion-plan-to-protect-city-from-future-storms.html?hp&_r=0">press conference</a> on Tuesday June 11 at the Brooklyn Navy Yard. Work
by the <a href="http://www.cunysustainablecities.org/"><span class="caps">CUNY</span> Institute for Sustainable Cities (<span class="caps">CISC</span>)</a> is heavily
featured in the reports. Hunter College Geography Professor
<a href="http://www.geo.hunter.cuny.edu/people/fac/solecki.html">William Solecki</a> and director of the <span class="caps">CUNY</span> Institute for Sustainable
Cities, is co-chair of the <span class="caps">NPCC2</span> and <a href="http://www.cunysustainablecities.org/lesley-patrick-program-manager/">Lesley Patrick</a>, <span class="caps">CISC</span> program
manager is the <span class="caps">CISC</span> lead of the <span class="caps">NPCC2</span> Technical Team.</p>
<p><a href="http://www.cunysustainablecities.org/climate-projections-future-flood-risk-maps-inform-a-stronger-resilient-york/"><img alt="image1" class="left" src="http://www.geo.hunter.cuny.edu/images/climate_risk_info2013.jpg" title="A Stronger, More Resilient New York" /></a>
<a href="http://www.nyc.gov/html/sirr/html/report/report.shtml"><img alt="image2" src="http://www.geo.hunter.cuny.edu/images/sirr_report_cover.jpg" /></a></p>
</blockquote>New Journal: Spatial Demography2013-06-06T13:11:00-04:00cfarmertag:carsonfarmer.com,2013-06-06:2013/06/new-journal-spatial-demography/<p>I have recently joined the <a href="http://spatialdemography.org/editors-2/editorial-board/">editorial team</a> at <a href="http://spatialdemography.org/">Spatial Demography</a>
— a new journal outlet for demographers and others who use spatial data,
methods, and theory. </p>
<p>A bit more <a href="http://spatialdemography.org/about/">about the journal</a>:</p>
<blockquote>
<p>Spatial Demography [<span class="caps">ISSN</span>: 2164-7070 (online)] focuses on the spatial analysis
of demographic processes. This cross-disciplinary work involves modern
demographic data visualization, enhanced geo-referenced data availability, and
spatial statistics, facilitated through full color graphics, motion video
tools, and a quick time-to-publication. The journal publishes research
articles, essays, research reports, data sources, computing software, teaching
notes, and book reviews on a wide range of topics of interest to the social demographer.</p>
</blockquote>
<!--more-->
<p>Spatial Demography is more than just another new online journal - the editors
strongly promote the use of the <a href="http://spatialdemography.org/?page_id=669">forums</a> as a place to engage with others on
general research topics, as well as discuss specific articles from the journal
and <a href="http://spatialdemography.org/?page_id=42">Online First</a> area. One of the really nice thing about the forums
is that they have a familiar ‘blog-feel’, where users are encourage to post
their thoughts about the work being done by others as well as their own work. </p>
<p>Another thing I really like about Spatial Demography is that the Associate
Editors embrace technology and sharing of ideas in a very open way. For example,
Corey Sparks does a cool column/forum for the journal on <a href="http://spatialdemography.org/introduction-to-software-and-code-forum/">‘Software and Code’</a>:</p>
<blockquote>
<p>The purpose of this forum is to highlight the tools of the trade, our
methodological toolbox, if you will. With so many scientists in so many
disciplines contributing to the area known as “Spatial Demography”, we all
have our old stand by routines, our tricks and our tips for new researchers.</p>
</blockquote>
<p>The ‘Software and Code’ forum will routinely feature ‘how-to guides’ for various
spatial analysis techniques using primarily open source computing applications,
with lots of annotated code to help others learn everything from the basics to
the more advanced spatial statistics methods.</p>
<p>A big thanks to Frank Howell and Jeremy Porter (Editors-in-Chief) for inviting
me on-board. I’m excited to see where this journal goes, and I hope <em>you’ll</em>
check it out as well.</p>Tim Stojanovic Guest Lecture2013-05-28T18:29:00-04:00cfarmertag:carsonfarmer.com,2013-05-28:2013/05/guest-speaker-june-2013/<p><a href="http://www.st-andrews.ac.uk/gsd/people/tas21/">Dr. Tim Stojanovic</a> will be coming to the <a href="http://www.geo.hunter.cuny.edu/">Department of Geography
at Hunter College</a>, <span class="caps">CUNY</span> next week to give a talk entitled: “Analysing
Change in the Human Development of the World’s Oceans”. The talk will be held
on <strong>June 5th, 2013</strong> at <strong>3:00 p.m.</strong> in the Hunter North building, room 1004.
The talk is hosted by the Department of Geography, and the poster is <a href="http://carsonfarmer.com/uploads/stojanovic_human_dev_oceans.pdf">available
here</a>.
<!--more--></p>
<h2>Abstract</h2>
<p><a href="http://carsonfarmer.com/images/global-oceans.png"><img alt="image1" class="right" src="http://carsonfarmer.com/images/global-oceans-400x265.png" /></a></p>
<p>The World’s oceans are a frontline for sustainability, given the expansion of
human activity in coasts and seas, changes brought about by human forcing
factors, and changes in the global ocean-atmosphere system. Making an
assessment of the extent and significance of change is a challenge for science,
given the relative paucity of data and the multiple variables involved. This
paper reports on a study to empirically test notions of ‘industrialisation’ and
‘colonisation’ in the oceans for the first time. The approach draws on a key
global spatial dataset developed by Halpern et al (2008). The methods include
the combined use of Raster and R to overcome methodological challenges in
analysing large spatial datasets which map the footprint of human activity in
the world’s oceans. The findings show that human activity in the oceans has
increased by multiple factors in the most recent long term wave of economic
development and show distinct spatial patterns of development. Further
refinement of this assessment is required and is likely to be driven by a
recently established <span class="caps">UN</span> Global Assessment of the State of the Marine Environment.</p>
<h2>Speaker bio</h2>
<p><a href="http://www.st-andrews.ac.uk/gsd/people/tas21/"><img alt="image2" class="left" src="http://carsonfarmer.com/images/stojanovic.jpg" title="Dr. Tim Stojanovic, Lecturer in Geography & Sustainable Development, Department of Geography & Sustainable Development, Scottish Oceans Institute, Sustainability Institute, University of St Andrews, UK" /></a></p>
<p>Dr. Tim Stojanovic is a Lecturer (i.e. Associate Professor) in Geography and
Sustainable Development at the University of St Andrews, <span class="caps">UK</span>. He has a PhD and
BSc in Marine Geography from Cardiff University, <span class="caps">UK</span>. Dr Stojanovic’s research
interests relate to the sustainability of oceans and coasts, including work
with interdisciplinary research teams in issues such as climate adaptation and
ecosystem services, and social science research on effective governance.</p>
<p>If you would like to learn more about the talk, the venue, or Dr Stojanovic’s
work, please <a href="http://carsonfarmer.com/contact/">contact me</a> for further details.</p>Making the switch to Pelican2013-05-12T11:16:00-04:00cfarmertag:carsonfarmer.com,2013-05-12:2013/05/making-the-switch-to-pelican/<p>Welcome to the new and improved <code>carsonfarmer.com</code>! If you are reading this, then
you are enjoying my new, responsive static website/blog. The new site is powered
by <a href="http://blog.getpelican.com/">Pelican</a> — a static website generator written in Python — and is hosted
on <a href="https://github.com/">GitHub</a> using <a href="http://pages.github.com/">GitHub Pages</a>. Most of the content on the
site is written in Markdown, which makes it really easy to add headings, anchors,
and all sorts of goodies to simplify writing blog posts and web-pages.
<!--more--></p>
<p>The move from WordPress to Pelican was relatively painless, though there were
some issues with comments and converting (some) existing posts to Markdown. I
also took the opportunity to update the site, change the page structure a bit
and try out a few things like adding icons (<a href="http://fortawesome.github.io/Font-Awesome/">FontAwesome</a>), using
Twitter <a href="http://twitter.github.io/bootstrap/">Bootstrap</a> for some of the <span class="caps">UI</span>, and some other tweaks. To get
me through the process, I took advantage of several blogs and sites dedicated to
documenting the switch to Pelican:</p>
<ul>
<li><a href="http://docs.getpelican.com/">Pelican documentation</a> (which is great)</li>
<li><a href="http://magically.us/2013-02-03/creating-a-pelican-powered-site-on-github-pages.html">Creating A Pelican-Powered Site on GitHub Pages</a></li>
<li><a href="http://www.macdrifter.com/2012/08/pelican-guide-moving-from-wordpress-and-initial-setup.html">Pelican Guide - Moving From WordPress and Initial Setup</a></li>
<li><a href="http://blog.aclark.net/2012/09/21/yes-this-blog-is-now-powered-by-pelican/">Yes, this blog is now powered by Pelican</a></li>
</ul>
<p>Once I get things working, I’ll also start to think about some of the
<a href="http://arunrocks.com/moving-blogs-to-pelican/">points here</a>, to make things even <em>more</em> responsive and readable.</p>
<p><span class="note right shadow">
One of the things I did have trouble with was getting my <code>RSS</code> feeds set
up the way they were on my WordPress site: <code>/?feed=rss2</code>.
For now, I’m just rerouting things to <code>/feeds/all.rss.xml</code>, but search engines
won’t recognize this, and I’m sure there is a better solution out there… any
thoughts?
</span>
I am still missing some things that WordPress did quite nicely, including
comments (I’m now relying on <a href="http://disqus.com/">Disqus</a> for comments), site search (I’ve
started using <a href="http://tapirgo.com/">Tapir</a> for this, and have implemented a cool search tool
that I may turn into a Pelican plug-in if I find some time), and the plethora of
plug-ins and themes available for WordPress sites. Having said that, it <em>is</em>
relatively easy to create new themes, and adding social networking components
like a Twitter feed using standard <span class="caps">HTML</span> is pretty simple.
</br></p>Paper published in Marine Policy2013-03-20T14:52:00-04:00cfarmertag:carsonfarmer.com,2013-03-20:2013/03/marine-policy-paper/<p>An article I worked on with <a href="http://www.st-andrews.ac.uk/gsd/people/tas21/">Dr Tim Stojanovic</a>, “The development of
world oceans <span class="amp">&</span> coasts and concepts of sustainability”, has recently been
<a href="http://www.sciencedirect.com/science/article/pii/S0308597X13000481">published online</a>, with the journal <a href="http://www.sciencedirect.com/science/journal/0308597X">Marine Policy</a>. The article
can be cited as:</p>
<blockquote>
<p>Stojanovic T. <span class="amp">&</span> Farmer <span class="caps">C. J.</span> Q., (2013) The development of world oceans <span class="amp">&</span>
coasts and concepts of sustainability. <em>Marine Policy</em> 42 157-165</p>
</blockquote>
<!--more-->
<p>I am particularly excited about this article because it marks
a major shift from any previous work that I’ve been involved with, and
is aimed directly at the world of policy - something I’ve been thinking
about for a while now. Additionally, the <a href="http://www.nceas.ucsb.edu/globalmarine">data that we used</a> for this
paper opens up innumerable avenues for further research and questions, so
hopefully this paper is the first in a series of collaborative works
with Dr. Stojanovic… stay tuned for more to come!</p>
<p>If you would like a copy of the paper but do not have access to the
article online, please <a href="http://carsonfarmer.com/contact/">contact me</a> and I will forward you a <span class="caps">PDF</span>
version. The <span class="caps">DOI</span> for the paper is:
<a href="http://dx.doi.org/10.1016/j.marpol.2013.02.005">http://dx.doi.org/10.1016/j.marpol.2013.02.005</a>.</p>Geocomputational landscapes and spaces2013-03-15T17:15:00-04:00cfarmertag:carsonfarmer.com,2013-03-15:2013/03/geocomp-special-session/<p>Special session titled “Geocomputational landscapes and spaces” at the
<a href="http://www.ammcs2013.wlu.ca/">International Conference on Applied Mathematics, Modeling and
Computational Science</a> (<span class="caps">AMMCS</span>-2013), to be held at <a href="https://www.wlu.ca/">Wilfrid Laurier
University</a>, August 26-30, 2013.</p>
<p>The deadline for abstracts is <strong>April 15, 2013</strong>.</p>
<p>Description of the session:</p>
<blockquote>
<p>This session focuses on new applications and theoretical developments
in geocomputation. Increasingly, spatial analysis and modelling
requires advanced computational techniques (<span class="caps">MCMC</span>, <span class="caps">INLA</span>, GAs etc.) to
address problems related to land and resource management. This session
is devoted to current developments and advances in geocomputation for
understanding landscape patterns, processes, and building spatial
decision-support systems. For example, talks will explore
computational solutions to problems related to multi-objective
optimization of spatial configuration of land-uses, new computational
approaches to spatial model development and assessment and map
comparison analysis. We encourage submissions of additional
presentations related to geocomputation including spatial modelling,
genetic algorithm development, spatial databases, spatial
representation, visualization, and spatial analysis of ‘Big Data’.</p>
</blockquote>
<p>Abstract <a href="http://www.ammcs2013.wlu.ca/submit-abstract.html">submission is here</a>, and be sure to note the <strong>special session</strong>.</p>
<p>Please share with colleagues and feel free to contact the organizers of
this session <a href="mailto:crobertson@wlu.ca">Dr Colin Robertson</a> and <a href="mailto:sroberts@wlu.ca">Dr Steve Roberts</a>.</p>Humans as systems2013-02-14T03:38:00-05:00cfarmertag:carsonfarmer.com,2013-02-14:2013/02/humans-as-systems/<p>From <a href="http://xkcd.com/1173/">xkcd</a> comes another brilliant insight into the human condition:</p>
<blockquote>
<p>A human is a system for converting dust billions of years ago into
dust billions of years from now via a roundabout process which
involves checking email a lot.</p>
</blockquote>
<!--more-->
<p>And since it is Valentine’s Day…</p>
<p><a href="http://xkcd.com/1016/" title="The worst resolution to the Valentine Prisoner's Dilemma when YOU decide not to give your partner a present but your PARTNER decides to testify against you in the armed robbery case."><img alt="image" src="http://imgs.xkcd.com/comics/valentine_dilemma.png" /></a></p>Will it Python?2013-02-12T22:31:00-05:00cfarmertag:carsonfarmer.com,2013-02-12:2013/02/will-it-python/<p>Over the past few weeks, I’ve been following a really great blog by
<a href="http://slendrmeans.wordpress.com/">Carl Vogel</a>. This blog has an excellent (growing) collection of
Python examples based on porting code and examples from R to Python. In
general, it is useful for those “interested in the Python data analysis
toolkit and its viability as an alternative to R”. Carl draws on
examples from <em><a href="http://shop.oreilly.com/product/0636920018483.do">Machine Learning for Hackers</a></em> by <a href="http://www.drewconway.com/">Drew Conway</a> and
<a href="http://www.johnmyleswhite.com/">John Myles White</a>, as well as <a href="http://www.stat.columbia.edu/~gelman/">Gelman</a> and <a href="http://steinhardt.nyu.edu/faculty_bios/view/Jennifer_Hill">Hill’s</a> <em><a href="http://www.stat.columbia.edu/~gelman/arm/">Data
Analysis Using Regression and Multilevel/Hierarchical Models</a></em>.
<!--more--></p>
<p>From the blog:</p>
<blockquote>
<p>The objective [of this blog] isn’t to just make a key that translates
functions and methods in R into Python equivalents. Instead, the goal
is to reproduce the results and insights of the analysis in idiomatic
Python […] Sometimes there will be a direct translation from a line
of R to a line of Python; other times Python will suggest an
altogether different approach to the problem.</p>
</blockquote>
<p>To make things even more useful for us, Carl has made the code,
examples, and <a href="http://ipython.org/">IPython</a> notebooks available in his <a href="https://github.com/carljv/Will_it_Python">GitHub repo</a>,
which makes it really easy to work through the examples. He’s also open
to requests/suggestions, so if you have a good R resource you’d like to
see ported to Python, maybe give him a shout?</p>
<p>As an extra bonus, I suggest you check out Blendtec’s hilarious <a href="http://www.willitblend.com/">“Will
it blend?”</a> ad campaign, from which Carl derived his “Will it Python”
name and logo…</p>
<p>Happy coding!</p>Science, Canadian Style!2013-01-20T21:55:00-05:00cfarmertag:carsonfarmer.com,2013-01-20:2013/01/science-canadian-style/<p>Recently, a <a href="http://www.wlu.ca/homepage.php?grp_id=12616">colleague of mine</a> from Wilfrid Laurier University and
several co-workers started a website called RinkWatch
(<a href="http://rinkwatch.org/">http://rinkwatch.org/</a>), which is designed to track
climate change by keeping tabs on local community ice rinks. This
innovative use of <a href="http://en.wikipedia.org/wiki/Citizen_science">citizen science</a> is getting quite a bit of buzz
in Canada, but the idea is relevant to any country with a culture
of outdoor skating rinks. It is also a great way to get the public
involved in the climate debate, and in science in general! RinkWatch
received a mention on the Canadian Association of Geographers’ mailing
list, which I’m reposting here to help spread the good word. You can also
check out the <a href="http://www.cbc.ca/hamilton/news/story/2013/01/18/hamilton-climate-change-rinks.html">full story and video here</a>.</p>
<!--more-->
<blockquote>
<p>The site, created by a
group of geographers at Wilfred Laurier University, invites people to
register online and record the state of their homemade ice surfaces.
Researchers will use the data - when the flooding is done, how many
weeks the ice is useable - to track the progression of climate change.
According to Robert McLeman, one of the project’s creators, it’s also a
chance to educate Canadians about the real-world implications of climate
change, an issue that can seem abstract when discussed on <span class="caps">TV</span> or in the
classroom. “People appreciate the scale of the problem, but don’t
understand personally how it fits in with their life,” McLeman, an
associate professor of geography and environmental science, told <span class="caps">CBC</span>
Hamilton. “We thought this sort of provides an opportunity to connect
people to environmental research literally through their own backyards.”
He said the website, which has already registered 375 rinks across
Canada and the northern U.S., has spawned some unintended, but very
welcome uses. Participants, he said, are sharing ice-making tips with
each other, creating a richer, more interactive online community than
he’d anticipated. “People who are coming to it really just are passionate.”</p>
</blockquote>Postdocs and Lectureships at St Andrews2012-11-08T16:54:00-05:00cfarmertag:carsonfarmer.com,2012-11-08:2012/11/postdocs-lectureships-st-andrews/<p>The <a href="http://www.st-andrews.ac.uk/geoinformatics">Centre for GeoInformatics (<span class="caps">CGI</span>)</a> at the <a href="http://www.st-andrews.ac.uk/">University of St
Andrews</a> in Scotland has the following open positions:</p>
<ol>
<li><a href="http://www.st-andrews.ac.uk/geoinformatics/jobs-and-studentship/lectureship-in-geoinformatics/">Lectureship in Geoinformatics</a> (permanent position, application
deadline 21 Dec 2012)</li>
<li><a href="http://www.st-andrews.ac.uk/geoinformatics/jobs-and-studentship/research-fellow-in-geoinformatics/">Postdoctoral researcher in Geoinformatics</a>, preferably Visual
Analytics/Visualization (three-year fixed-term post, application deadline 7 Dec 2012)</li>
<li><a href="http://www.st-andrews.ac.uk/geoinformatics/research-fellow-in-geoinformatics-2/">Postdoctoral researcher in Geoinformatics</a> (two-year fixed-term post,
application deadline 21 Dec 2012)</li>
</ol>
<p>In addition, there are three lectureships open in the Department of
Geography <span class="amp">&</span> Sustainable Development, where the <span class="caps">CGI</span> is located:</p>
<ol>
<li><a href="http://www.st-andrews.ac.uk/geoinformatics/jobs-and-studentship/environmental-geography-lectureships/">Two lectureships</a> in Environmental Geography</li>
<li><a href="http://www.st-andrews.ac.uk/geoinformatics/jobs-and-studentship/lecturer-in-human-geography/">Lectureship</a> in Human Geography</li>
</ol>
<!--more-->
<p>I’ve just recently moved on from the <span class="caps">CGI</span> and St Andrews and I
can tell you that it is a great place to work! The people are fun to
work with, there are plenty of opportunities for collaborations and
cross-disciplinary research, there are funding opportunities for
geoinformatics research in the <span class="caps">UK</span> and <span class="caps">EU</span> right now, and you simply can’t
beat the setting: St Andrews is a beautiful coastal town on the east
coast of Scotland. If you are interested in any of the above positions,
don’t hesitate to contact the <span class="caps">CGI</span>. The <span class="caps">CGI</span> would also greatly
appreciate it if you could forward this message to any potential
candidates. You can find more information about the above positions,
plus other <a href="http://www.st-andrews.ac.uk/geoinformatics/jobs-and-studentship/">exciting opportunities at <span class="caps">CGI</span></a>.</p>PhD position in GeoInformatics and Housing research2012-10-25T14:37:00-04:00cfarmertag:carsonfarmer.com,2012-10-25:2012/10/phd-position-in-geoinformatics-and-housing-research/<p>The <span class="caps">CGI</span> is looking for a talented PhD student to fill an
interdisciplinary PhD Studentship co-funded by the Economic and Social
Research Council and <span class="caps">HOME</span> Housing Association. Suitable candidates
should apply by <strong>23 November 2012</strong> for a start date of <strong>1 January
2013</strong>. Have a look at the <a href="http://www.st-andrews.ac.uk/geoinformatics/jobs-and-studentship/"><span class="caps">CGI</span> vacancies webpage</a> for more details.
This is a great opportunity for students interested in using and
developing state-of-the-art methods in geoinformatics and applying them
to fundamental questions in the social housing sector.</p>Carson moves to the Big Apple2012-10-16T15:05:00-04:00cfarmertag:carsonfarmer.com,2012-10-16:2012/10/carson-moves-to-the-big-apple/<p><a href="http://carsonfarmer.com/images/postcards.png"><img alt="image" src="http://carsonfarmer.com/images/postcards-300x135.png" /></a></p>
<p>I am very excited to announce that I will soon be moving to New York to
take up a faculty position in the <a href="http://www.geo.hunter.cuny.edu/">Department of Geography</a> at <a href="http://www.hunter.cuny.edu/">Hunter
College</a> of the <a href="http://www.cuny.edu/">City University of New York</a>. The plans have been
in motion for quite a while, and now that my wife and I have started to
pack up our stuff, I thought I’d post something here to make it
‘official’. I start full time in <span class="caps">NY</span> at the end of January, at which
point I will be diving head first into teaching and research. I am
really looking forward to engaging with the active <span class="caps">NY</span> open source
community, teaching a whole slew of new courses, interacting more
directly with students, and working with my new colleagues on all sorts
of cool research projects.</p>
<p>That said, I will also be sad to leave <a href="http://www.st-andrews.ac.uk/">St
Andrews</a>. The past year working in the <a href="http://www.st-andrews.ac.uk/gg/">School of Geography and
Geosciences</a> has been a great experience; I’ve learned a lot about the
inner workings of a research centre by helping to set up the <a href="http://www.st-andrews.ac.uk/geoinformatics/">Centre for
GeoInformatics</a> and I’ve made a lot of great personal and professional
connections. Combined, the past four years in Ireland and the <span class="caps">UK</span> have
been very exciting, and I have had excellent opportunities to make
friends, build collaborative networks, hone my research skills, carve
out my own research niche, and develop a longer-term research program
that should keep me busy for years to come!</p>
<p>I’ll try to continue posting the occasional blog post over the next few
months, and hopefully compile a decent record of my progression from
postdoc to professor (lecturer to my <span class="caps">EU</span> friends) along the way. In the
meantime, feel free to <a href="http://carsonfarmer.com/contact/">get in touch</a> with me if you are ever in the
<span class="caps">NY</span> area!</p>manageR and rpy2 installation problems2012-10-06T14:26:00-04:00cfarmertag:carsonfarmer.com,2012-10-06:2012/10/manager-and-rpy2-installation-problems/<p>Unfortunately, I haven’t had much time recently to update or work on
<code>manageR</code>, but I’m hoping that will change in the next few months…
Having said that, quite a few people out there have been having trouble
installing <code>manageR</code> (and the required <code>rpy2</code>) on their
systems and can’t get things working at all! Several people have provided
possible fixes and suggestions for getting things working properly on
various platforms, and I’m going to use this post to gather them into
a one-stop post for all your <code>rpy2</code> and <code>manageR</code>
needs. I’m also hoping that people will post potential fixes in the
comments to help others with more specific problems.
<!--more-->
For now, I have the following potential fix for <code>OSX</code> (Lion 10.7.2 at least):</p>
<ol>
<li>Update <code>R</code> to latest version (<a href="http://www.r-project.org/">binary from r-project.org</a>);</li>
<li>Reinstall <code>QGIS</code> (<a href="http://www.kyngchaos.com/software/qgis">Kyngchaos installer</a>);</li>
<li>Reinstall <code>GDAL_complete</code> (<a href="http://www.kyngchaos.com/software/frameworks">Kyngchaos again</a>);</li>
<li>Reinstall <code>rpy2</code> (via <a href="http://rpy.sourceforge.net/rpy2_download.html">pip</a>);</li>
<li>Reboot…</li>
</ol>
<p>Apparently the (potential) problem is related to previous <code>R</code>
builds, where symlinks were referring to the wrong location. This
potential fix is courtesy of <a href="http://gis.stackexchange.com/questions/17169/is-it-possible-to-run-an-r-script-on-a-layer-in-qgis" title="Run an R Script on layer in QGIS">this Stack Exchange thread</a>.</p>
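<p>Since the suspected culprit is a stale symlink, one way to check your own setup is to compare where the <code>R</code> on your <code>PATH</code> really points with the installation that R reports for itself. The following is a minimal diagnostic sketch, assuming R is on your <code>PATH</code>. The helper names <code>resolve_executable</code> and <code>r_home</code> are hypothetical. It uses only the Python standard library plus R’s own <code>R RHOME</code> command, and it diagnoses rather than fixes anything:</p>

```python
import os
import shutil
import subprocess


def resolve_executable(name):
    """Return (path found on PATH, fully resolved target), or None if absent."""
    found = shutil.which(name)
    if found is None:
        return None
    # realpath follows every symlink in the chain, so a link that still
    # points into an old R build shows up in the second element.
    return found, os.path.realpath(found)


def r_home():
    """Ask R where its installation lives (`R RHOME` prints R's home directory)."""
    try:
        result = subprocess.run(
            ["R", "RHOME"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return os.path.realpath(result.stdout.strip())


if __name__ == "__main__":
    info = resolve_executable("R")
    if info is None:
        print("No R executable found on PATH")
    else:
        on_path, resolved = info
        print("R on PATH:       ", on_path)
        print("resolves to:     ", resolved)
        print("R RHOME reports: ", r_home())
        print("R_HOME env var:  ", os.environ.get("R_HOME"))
```

<p>If the resolved path points into a different (older) R installation than the one <code>R RHOME</code> reports, an old symlink is a likely suspect.</p>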
<p>Ok, so does anyone else have some suggestions to get things working on
Windows? Perhaps someone out there has a Windows build they’d like to
share? As far as I know (i.e., it works for me), things are working fine
on Linux, but if someone else has a different experience, please share
in the comments.</p>cartogram updates2012-08-10T14:07:00-04:00cfarmertag:carsonfarmer.com,2012-08-10:2012/08/cartogram-updates/<p>It seems my <a href="http://www.carsonfarmer.com/examples/olympic_countries/">Olympic medals cartogram</a> is getting a bit more attention
(<a href="http://www.guardian.co.uk/sport/datablog/interactive/2012/aug/10/olympics-medals-visualised">Guardian data blog</a>, and <a href="http://www.telegraph.co.uk/sport/olympics/olympic_infographics_and_data/9467077/London-2012-Olympics-interactive-world-medal-map-by-population-GDP-and-geographical-size.html">Telegraph data and graphics blog</a>), so
I’ve updated a few things and wanted to highlight/explain them a bit here.</p>
<p><a href="http://www.carsonfarmer.com/examples/olympic_countries/"><img alt="image" src="http://carsonfarmer.com/images/grenada.png" title="Grenada" /></a></p>
<p>First, you can now explore the medal data together with
population <strong>and</strong> <span class="caps">GDP</span>, as well as without any warping, to get a feel for
how much things change. Second, in order to display the map
in a way that is familiar to most people (i.e., landscape style), I had
to take a few liberties in terms of representation. For instance, the
‘projection’ used is not area-preserving (in fact, it is just the
geographical coordinates), so some countries appear larger (or smaller)
than they should, even with the warping. We all make compromises though,
right? Third, because I wasn’t happy with adding artificial medal counts
to make my algorithm happy, I decided to create a more ‘realistic’
graphic, so this time around, countries with zero medals have zero area
(i.e. are removed). Related to this, because Grenada only has about
110,821 people, its <a href="http://www.medalspercapita.com/">per capita medal ranking</a> is off the charts (note
that I’m not using the <a href="http://www.medalspercapita.com/">http://www.medalspercapita.com/</a> figures for the graphic)! This, coupled with its tiny
size, means that my cartogram algorithm emphasises Grenada at the
expense of all the other countries, making the graphic pretty boring
unless you live in Grenada. As a result, I’ve excluded all countries
with fewer than two medals in total (I know, I’m sorry, but it had to be
done). Having said that, if you click to view the ‘normal’ map, then all
countries are added back, and you get medal counts for all of them.</p>Cross-browser iframe scaling2012-08-06T18:18:00-04:00cfarmertag:carsonfarmer.com,2012-08-06:2012/08/cross-browser-iframe-scaling/<p>This is just a quick post to document an annoyance (and solution) that
I’ve recently discovered when trying to scale a webpage embedded in
another page using an <code>iframe</code>. When trying to come up with a nice way
to embed <a href="examples/olympic_countries/">my Olympic cartogram</a> inside <a href="http://www.st-andrews.ac.uk/geoinformatics/">the <span class="caps">CGI</span> website</a>, I found that WebKit-based
browsers were not behaving as they should. After a lot of fiddling
about, I discovered that the following <code>css</code> seems to fix the issues:</p>
<div class="highlight"><pre><span class="nf">#wrap</span> <span class="p">{</span>
<span class="k">width</span><span class="o">:</span> <span class="m">630px</span><span class="p">;</span>
<span class="k">height</span><span class="o">:</span> <span class="m">300px</span><span class="p">;</span>
<span class="k">padding</span><span class="o">:</span> <span class="m">0</span><span class="p">;</span>
<span class="k">overflow</span><span class="o">:</span> <span class="k">hidden</span><span class="p">;</span>
<span class="p">}</span>
<span class="nf">#frame</span> <span class="p">{</span>
<span class="o">-</span><span class="n">ms</span><span class="o">-</span><span class="n">zoom</span><span class="o">:</span> <span class="m">0</span><span class="o">.</span><span class="m">5</span><span class="p">;</span>
<span class="o">-</span><span class="n">ms</span><span class="o">-</span><span class="n">transform</span><span class="o">-</span><span class="n">origin</span><span class="o">:</span> <span class="m">0</span> <span class="m">0</span><span class="p">;</span>
<span class="o">-</span><span class="n">moz</span><span class="o">-</span><span class="n">transform</span><span class="o">:</span> <span class="n">scale</span><span class="p">(</span><span class="m">0</span><span class="o">.</span><span class="m">5</span><span class="p">);</span>
<span class="o">-</span><span class="n">moz</span><span class="o">-</span><span class="n">transform</span><span class="o">-</span><span class="n">origin</span><span class="o">:</span> <span class="m">0px</span> <span class="m">50px</span><span class="p">;</span>
<span class="o">-</span><span class="n">o</span><span class="o">-</span><span class="n">transform</span><span class="o">:</span> <span class="n">scale</span><span class="p">(</span><span class="m">0</span><span class="o">.</span><span class="m">5</span><span class="p">);</span>
<span class="o">-</span><span class="n">o</span><span class="o">-</span><span class="n">transform</span><span class="o">-</span><span class="n">origin</span><span class="o">:</span> <span class="m">0px</span> <span class="m">50px</span><span class="p">;</span>
<span class="o">-</span><span class="n">webkit</span><span class="o">-</span><span class="n">transform</span><span class="o">:</span> <span class="n">scale</span><span class="p">(</span><span class="m">0</span><span class="o">.</span><span class="m">5</span><span class="p">);</span>
<span class="o">-</span><span class="n">webkit</span><span class="o">-</span><span class="n">transform</span><span class="o">-</span><span class="n">origin</span><span class="o">:</span> <span class="m">0</span> <span class="m">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="nf">#frame</span> <span class="p">{</span>
<span class="k">width</span><span class="o">:</span> <span class="m">1230px</span><span class="p">;</span>
<span class="k">height</span><span class="o">:</span> <span class="m">530px</span><span class="p">;</span>
<span class="k">overflow</span><span class="o">:</span> <span class="k">hidden</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Note that if you use <code>zoom</code> instead of <code>-ms-zoom</code>, WebKit browsers seem
to ‘double scale’ everything (presumably because they honour the non-standard <code>zoom</code>
property as well as <code>-webkit-transform</code>), which turned out to be the root of my
problem. With the above tweaks, everything works fine (for now) using the following <code>HTML</code>:</p>
<div class="highlight"><pre><span class="nt">&lt;div</span> <span class="na">id=</span><span class="s">"wrap"</span><span class="nt">&gt;</span>
<span class="nt">&lt;iframe</span> <span class="na">id=</span><span class="s">"frame"</span> <span class="na">src=</span><span class="s">"http://www.website.com/"</span><span class="nt">&gt;&lt;/iframe&gt;</span>
<span class="nt">&lt;/div&gt;</span>
</pre></div>
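<p>To see why these particular numbers work, it helps to do the arithmetic. A CSS scale about a transform origin maps each point p to origin + s·(p − origin). Here is a small Python sketch of that calculation. The function names are illustrative, and the sketch ignores the iframe’s default border. It checks that the scaled 1230×530 frame fits inside the 630×300 wrapper for both origins used in the CSS above:</p>

```python
def scaled_box(width, height, scale, origin=(0.0, 0.0)):
    """Bounding box (left, top, right, bottom), in px, of a width x height
    element after a CSS scale(scale) about transform-origin `origin`.

    Assumes 0 < scale, so the element is not mirrored.
    """
    ox, oy = origin

    def move(px, py):
        # Scaling about the origin: p' = origin + scale * (p - origin).
        return ox + scale * (px - ox), oy + scale * (py - oy)

    left, top = move(0, 0)
    right, bottom = move(width, height)
    return left, top, right, bottom


def fits(box, wrap_width, wrap_height):
    """True if the scaled box lies entirely inside a wrap_width x wrap_height wrapper."""
    left, top, right, bottom = box
    return left >= 0 and top >= 0 and right <= wrap_width and bottom <= wrap_height


if __name__ == "__main__":
    # Sizes from the CSS above: #frame is 1230x530, #wrap is 630x300.
    for origin in [(0, 0), (0, 50)]:  # -webkit- origin vs. -moz-/-o- origin
        box = scaled_box(1230, 530, 0.5, origin)
        print(origin, box, fits(box, 630, 300))
```

<p>Both origins give a 615×265 px box. The <code>0px 50px</code> origin shifts it down so that it spans 25 px to 290 px vertically, which still fits inside the 300 px wrapper, so the two sets of prefixed rules end up showing the same content.</p>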
<p>Hopefully this post will save someone (or me in the future) some
frustration and time. The above fix was cobbled together based on
suggestions from <a href="http://stackoverflow.com/questions/166160/how-can-i-scale-the-content-of-an-iframe">here</a> (see answers from <code>Kip</code>, <code>lxs</code>, and <code>r3cgm</code>).</p>
<p>Carson</p>Olympic cartogram2012-08-02T12:18:00-04:00cfarmertag:carsonfarmer.com,2012-08-02:2012/08/olympic-cartogram/<p>The London 2012 Summer Olympics have generated quite a bit of buzz in
terms of visualizations and interesting data analysis. In fact, news
sites here in the <span class="caps">UK</span> are doing all sorts of cool things with Olympic
data, and <a href="http://www.guardian.co.uk/">The Guardian</a> has an <a href="http://www.guardian.co.uk/sport/series/london-2012-olympics-data">entire series</a> devoted to Olympic
data. A colleague of mine also pointed out a <a href="http://www.telegraph.co.uk/sport/olympics/9436640/London-2012-Olympics-dynamic-world-medal-map.html">cool graphic</a> on <a href="http://www.telegraph.co.uk/">The
Telegraph</a> website, which is essentially a live cartogram of Olympic
medal counts. The cartogram is basically a spatial bubble plot, with the
size of the bubbles representing the number of medals obtained by each
country. The location of each bubble is based on the corresponding
country’s approximate geographic location. The graphic is pretty
effective, and it certainly tells a clear story.</p>
<p>I’m a big fan of these types of abstract <a href="http://www.unicef.org/sowc2012/urbanmap/">representations of space</a>,
so I thought The Telegraph’s graphic was pretty fun. Having said that,
I’m always a sucker for a more ‘traditional’ rubber-sheet cartogram,
which is generally less abstract than a bubble plot, but can sometimes
lead to <a href="http://www.worldmapper.org/">dramatic results</a>. Since I felt like the only person on the
internet without their own Olympics visualization, I decided to throw
together a cartogram to visualize Olympic medal achievements. Drawing
inspiration from The Telegraph graphic, I created a rubber-sheet
cartogram based on an <a href="http://lambert.nico.free.fr/tp/biblio/Dougeniketal1985.pdf">iterative warping method</a>. The ‘live’ version
of the cartogram is <a href="http://www.carsonfarmer.com/examples/olympic_countries/">available here</a> (or by clicking on the image
below). [<span class="caps">UPDATE</span>] If you’d like to include the map on a web page, you can
now do that by including this in your <span class="caps">HTML</span> source:</p>
<div class="highlight"><pre><span class="nt">&lt;iframe</span> <span class="na">src=</span><span class="s">"http://www.carsonfarmer.com/examples/olympic_countries/map.html"</span> <span class="na">width=</span><span class="s">"1230"</span> <span class="na">height=</span><span class="s">"545"</span><span class="nt">&gt;&lt;/iframe&gt;</span>
</pre></div>
<!--more-->
<p><a href="http://www.carsonfarmer.com/examples/olympic_countries/"><img alt="image" src="http://carsonfarmer.com/images/olympic_carto.png" title="Olympic cartogram" /></a></p>
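<p>For readers curious about the “iterative warping method” linked above (Dougenik et al., 1985), here is a rough, self-contained Python sketch of that algorithm as it is commonly described. In each iteration, every polygon pushes or pulls all map vertices according to how far its current area is from its desired area. This is not the code behind the graphic, which used shapely. The plain-tuple polygon format and the function names are assumptions for illustration, and every target value must be positive:</p>

```python
import math


def ring_area_centroid(ring):
    """Area and centroid of a simple polygon ring [(x, y), ...] (shoelace formula)."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area; dividing by it keeps the centroid orientation-independent
    return abs(a), (cx / (6 * a), cy / (6 * a))


def dougenik_step(rings, values):
    """One rubber-sheet iteration: move every vertex by the summed polygon forces."""
    stats = [ring_area_centroid(r) for r in rings]
    total_area = sum(area for area, _ in stats)
    total_value = sum(values)
    forces, size_errors = [], []
    for (area, centroid), value in zip(stats, values):
        desired = total_area * value / total_value
        radius = math.sqrt(area / math.pi)
        mass = math.sqrt(desired / math.pi) - radius  # > 0 grows, < 0 shrinks
        forces.append((centroid, radius, mass))
        size_errors.append(max(area, desired) / min(area, desired))
    # Damp each step by the mean size error so the map converges gradually.
    reduction = 1.0 / (1.0 + sum(size_errors) / len(size_errors))

    new_rings = []
    for ring in rings:
        new_ring = []
        for x, y in ring:
            dx_total = dy_total = 0.0
            for (cx, cy), radius, mass in forces:
                dx, dy = x - cx, y - cy
                dist = math.hypot(dx, dy)
                if dist == 0:
                    continue
                if dist > radius:
                    force = mass * radius / dist
                else:
                    force = mass * (dist ** 2 / radius ** 2) * (4 - 3 * dist / radius)
                dx_total += force * dx / dist
                dy_total += force * dy / dist
            new_ring.append((x + reduction * dx_total, y + reduction * dy_total))
        new_rings.append(new_ring)
    return new_rings


def make_cartogram(rings, values, iterations=10):
    """Apply several damped iterations."""
    for _ in range(iterations):
        rings = dougenik_step(rings, values)
    return rings
```

<p>Because each vertex moves according to its position alone, a vertex shared by two neighbouring countries moves identically in both polygons. This is what keeps the borders stitched together and the topology roughly intact.</p>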
<p>The cartogram is interactive, and was created using <a href="http://www.python.org/">Python</a>
(<a href="http://pandas.pydata.org/">pandas</a>, <a href="https://github.com/esnme/ultrajson">ujson</a>, and <a href="http://toblerity.github.com/shapely/manual.html">shapely</a>) and <a href="http://d3js.org/">D3.js</a>. If I get a
chance, I’ll try to post the code at some stage. In the meantime, here
is a bit more info about the graphic:</p>
<p>The size of each country is based on the total number of medals that
they have achieved, weighted by the type of medal (gold, silver,
bronze). For example, a country with one gold medal should be
approximately the same size as a country with three bronze medals.
Because the algorithm attempts to maintain the topological relationships
between countries, this relationship might not be perfect, but the
general trend is clear: the <span class="caps">US</span> and China are cleaning up! [<span class="caps">UPDATE</span>] I’ve
also added relative per capita medal counts and counts by <span class="caps">GDP</span>, which
shrinks China down significantly… now who’s winning ‘big’?!</p>
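<p>The medal weighting described above can be sketched in a few lines of Python. The post only says that one gold should count about the same as three bronze, so gold = 3 and bronze = 1 follow from that. Silver = 2 is a guess. The table layout, the optional population/<span class="caps">GDP</span> denominator, and the <code>min_total</code> cut-off (mirroring the later update that dropped countries with fewer than two medals) are likewise hypothetical and are not the code behind the map:</p>

```python
# Hypothetical weights: gold = 3 x bronze follows the post; silver = 2 is a guess.
WEIGHTS = {"gold": 3, "silver": 2, "bronze": 1}


def weighted_score(counts, weights=WEIGHTS):
    """Weighted medal total for one country, e.g. {"gold": 1, "bronze": 2}."""
    return sum(w * counts.get(kind, 0) for kind, w in weights.items())


def target_shares(table, denominator=None, min_total=0):
    """Fraction of the map's total area each country should receive.

    table:       {country: {"gold": g, "silver": s, "bronze": b}}
    denominator: optional {country: population or GDP} for the relative maps
    min_total:   drop countries with fewer medals than this (0 keeps everyone)
    """
    scores = {}
    for country, counts in table.items():
        if sum(counts.values()) < min_total:
            continue
        score = weighted_score(counts)
        if denominator is not None:
            score /= denominator[country]
        scores[country] = score
    total = sum(scores.values())
    # Countries with no medals get a share of zero, i.e. zero area.
    return {country: score / total for country, score in scores.items()}
```

<p>Dividing by population or <span class="caps">GDP</span> is what shrinks large countries on the relative maps. With a very small denominator, a single country can swamp everything else, which is where a cut-off such as <code>min_total=2</code> helps.</p>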
<p>I don’t have full access to the server that this website is running on,
<strike>so I couldn’t get things set up with regular updates,</strike> however, the
entire process is now automated, so <strike>I’ll try to have the code run at
fairly regular intervals so that the results are relatively up-to-date</strike>
I now have the maps updating every hour. In any case, have a play around
and let me know what you think, and if you know of any other cool
Olympics graphics or cool applications of cartograms, please let me know
in the comments!</p>Paper published in Environmental Monitoring and Assessment2012-07-30T18:02:00-04:00cfarmertag:carsonfarmer.com,2012-07-30:2012/07/paper-published-in-environmental-monitoring-and-assessment/<p>An article I worked on with Margaret E. Andrew, Trisalyn A. Nelson,
Michael A. Wulder, George W. Hobart, and Nicholas C. Coops, “Ecosystem
classifications based on summer and winter conditions”, has recently
been <a href="http://www.springerlink.com/content/q9gnq10vp35m34r2/?MUD=MP">published online</a> in <a href="http://www.springerlink.com/content/0167-6369/">Environmental Monitoring and
Assessment</a>. For now, the article can be cited as:</p>
<blockquote>
<p>Andrew, M. E., <span class="caps">T. A.</span> Nelson, <span class="caps">M. A.</span> Wulder, <span class="caps">G. W.</span> Hobart, <span class="caps">N. C.</span> Coops,
and <span class="caps">C. J. Q.</span> Farmer (in press). Ecosystem classifications based on
summer and winter conditions. Environmental Monitoring and Assessment. doi:10.1007/s10661-012-2773-z</p>
</blockquote>
<p>If you would like a copy, but do not have access to the article, please
email me and I can forward you a <span class="caps">PDF</span> version.</p>Visualization featured on wired.co.uk2012-05-29T00:01:00-04:00cfarmertag:carsonfarmer.com,2012-05-29:2012/05/Visualization-featured-in-wired-magazine/<p>A few weeks back the <a href="http://www.st-andrews.ac.uk/geoinformatics/"><span class="caps">CGI</span></a> and I were approached by <a href="http://nacenta.com/about/">Miguel Nacenta</a>
from <a href="http://sachi.cs.st-andrews.ac.uk/"><span class="caps">SACHI</span></a> about putting together an infographic for an article on
<a href="http://fatfonts.org/">FatFonts</a> (also see my <a href="http://carsonfarmer.com/2012/05/visualising-data-with-fatfonts/">previous article</a>) to be featured on the
<a href="http://www.wired.co.uk/">Wired (<span class="caps">UK</span>) magazine</a> website. Since Wired is one of my favourite
magazines, I jumped at the chance! Using data that <a href="http://www.st-andrews.ac.uk/gsd/people/tas21/">Dr Timothy
Stojanovic</a> and I are working with as part of Tim’s work linking
off-shore cumulative human impacts to on-shore terrestrial urbanization,
I put together several infographics depicting human impacts on the
oceans surrounding Edinburgh and London. One of these graphics was
selected for the article, which is now <a href="http://www.wired.co.uk/news/archive/2012-05/25/fatfonts">available online</a>. So go check
out the article, and click on “View Gallery” to check out my
infographic, along with several cool images from <a href="http://nacenta.com/about/" title="About Miguel">Miguel Nacenta</a>,
<a href="http://www.utahinrichs.de/" title="FatFonts: Uta Hinrichs">Uta Hinrichs</a>, and <a href="http://pages.cpsc.ucalgary.ca/~sheelagh/wiki/pmwiki.php" title="Sheelagh's home page">Sheelagh Carpendale</a>!</p>
<p>The infographic in the article doesn’t have a title or legend per se, so
I’m going to create an amended version soon with an explanation of what
the values represent, and a bit of context to the data and what we are
hoping to show with it. So stay tuned for updates!</p>Presentation at CAG 2012 in Waterloo2012-05-28T17:23:00-04:00cfarmertag:carsonfarmer.com,2012-05-28:2012/05/presentation-at-cag-2012-in-waterloo/<p>I am giving a talk and chairing a session at the <a href="http://www.cag2012.org/">Canadian Association
of Geographers (<span class="caps">CAG</span>) Annual Meeting</a> this week in Waterloo, Ontario,
Canada. My session, entitled “Spatial Modelling”, is <a href="http://www.cag2012.org/program.html">scheduled</a> for
Thursday at 08:30, and is supported by the <span class="caps">CAG</span> <span class="caps">GIS</span> Study Group. If you
are in the neighbourhood, check out the session, and stick around for my
talk: “Spatial interaction modelling of commuting flows within local
labour markets”, which is also <a href="/presentations/cag_2012/">available online</a> (best viewed at 90%
on full screen).</p>
<p>Hope I see you there!</p>Visualising data with FatFonts2012-05-26T20:30:00-04:00cfarmertag:carsonfarmer.com,2012-05-26:2012/05/visualising-data-with-fatfonts/<p>I recently posted an <a href="http://www.st-andrews.ac.uk/geoinformatics/visualising-with-fatfonts/">article on the <span class="caps">CGI</span> blog</a> about some
visualizations that I produced with researchers from <a href="http://sachi.cs.st-andrews.ac.uk/">St Andrews’
Computer Human Interaction Research Group (<span class="caps">SACHI</span>)</a> using <a href="http://fatfonts.org/">FatFonts</a>,
a typographic visualization technique developed by <span class="caps">SACHI</span> co-founder
<a href="http://nacenta.com/about/">Miguel Nacenta</a> and colleagues (<a href="http://www.utahinrichs.de/">Uta Hinrichs</a>, and <a href="http://pages.cpsc.ucalgary.ca/~sheelagh/wiki/pmwiki.php">Sheelagh
Carpendale</a>).
The initial visualizations are now online, and feature flow matrices for
<a href="http://www.st-andrews.ac.uk/geoinformatics/examples/fatfonts/uk-migration/">English internal migration</a> and <a href="http://www.st-andrews.ac.uk/geoinformatics/examples/fatfonts/irish-commuting/">commuting between Irish local labour markets</a>. We
have also produced several infographics based on global oceans data
that <a href="http://www.st-andrews.ac.uk/gsd/people/tas21/">Dr Timothy Stojanovic</a> and I are working with as part of
Tim’s work linking off-shore cumulative human impacts to
on-shore terrestrial urbanization (more on these graphics soon). I’m
still experimenting with FatFonts at this stage, but so far, I’m quite
pleased with the results, and find that they offer a nice way to add
beauty to my potentially boring data!
<!--more--></p>
<p><a href="http://carsonfarmer.com/images/fat_fonts.png"><img alt="image" src="http://carsonfarmer.com/images/fat_fonts-300x122.png" title="Subset of FatFont image" /></a></p>
<p>Carson</p>Introducing the Centre for GeoInformatics2012-05-24T09:58:00-04:00cfarmertag:carsonfarmer.com,2012-05-24:2012/05/introducing-the-centre-for-geoinformatics/<p><a href="http://carsonfarmer.com/images/cgi_logo_full.png"><img alt="image" src="http://carsonfarmer.com/images/cgi_logo_full-300x78.png" title="cgi_logo_full" /></a></p>
<p>The new Centre for GeoInformatics (<span class="caps">CGI</span>) within the <a href="http://www.st-andrews.ac.uk/gg/">School of Geography
and Geosciences</a> and the <a href="http://www.st-andrews.ac.uk/gsd/">Department of Geography and Sustainable
Development</a> is now an official university <a href="http://www.st-andrews.ac.uk/gsd/research/centres/#cgi">Research Centre</a> at the
<a href="http://www.st-andrews.ac.uk/">University of St Andrews</a> in Scotland! A few months ago, I started my
new job as <a href="http://www.st-andrews.ac.uk/gsd/people/cjqf/">Research Fellow</a> with <span class="caps">CGI</span>, where I am continuing to work
with <a href="http://www.st-andrews.ac.uk/gsd/people/asf7/">Prof. A. Stewart Fotheringham</a>, who is the director of the new
centre, along with several new and continuing <a href="http://www.st-andrews.ac.uk/geoinformatics/people/students/">PhD students</a> and
<a href="http://www.st-andrews.ac.uk/geoinformatics/people/faculty/">faculty members</a>. Now that we have the centre officially up and
running, we are quickly getting ready for our <a href="http://www.st-andrews.ac.uk/geoinformatics/featured-item/">official launch</a> next
month, as well as the launch of <a href="http://www.st-andrews.ac.uk/geoinformatics/">our website</a> (which is still ‘in
development’). If you are interested in any aspect of GeoInformatics,
GIScience, Geocomputation, or spatial analysis, I strongly recommend you
check out our website to get a feel for what we are working on. We will
also be featuring a <a href="http://www.st-andrews.ac.uk/geoinformatics/blog/">research blog</a>, where we will post code, ideas,
thoughts, and results from our various research endeavours. Check it
out, and let us know what you think!</p>Paper published in International Journal of Applied Earth Observation and Geoinformation2012-04-13T15:48:00-04:00cfarmertag:carsonfarmer.com,2012-04-13:2012/04/paper-in-int-journal-applied-earth-observation-geoinformation/<p>An article I worked on with <a href="http://www2.le.ac.uk/departments/geography/people/ajc36">Lex Comber</a> and <a href="http://www.liv.ac.uk/geography/staff/brunsdon.htm">Chris Brunsdon</a>,
“Community detection in spatial networks: Inferring land use from a
planar graph of land cover objects”, has recently been <a href="http://www.sciencedirect.com/science/article/pii/S0303243412000220">published
online</a> in the <a href="http://www.journals.elsevier.com/international-journal-of-applied-earth-observation-and-geoinformation/">International Journal of Applied Earth Observation
and Geoinformation</a>. The article can be cited as:</p>
<blockquote>
<p>Comber, A. J., Brunsdon, C. F., and Farmer, <span class="caps">C. J. Q.</span> (2012) Community
detection in spatial networks: Inferring land use from a planar graph
of land cover objects. <em>International Journal of Applied Earth
Observation and Geoinformation</em>, 18: 274–282</p>
</blockquote>
<p>If you would like a copy, but do not have access to the article, please
email me and I can forward you a <span class="caps">PDF</span> version.</p>Research dissemination and interactive visuals2012-04-10T22:12:00-04:00cfarmertag:carsonfarmer.com,2012-04-10:2012/04/research-dissemination-and-interactive-visuals/<p>One of my goals for this year is to spend more time and effort
developing effective visualizations for my various research projects, in
an effort to make my research more accessible to others. This is one
thing that I think many academics are particularly bad at: letting
others know what they are up to, and why it might be something worth
looking at. In order to avoid this pitfall, I plan to focus on producing
interactive, web-based visuals suitable for a more general audience <em>in
addition</em> to more traditional forms of research dissemination such as
journals and conference papers. It is my hope that by doing this, I will
be making my research more readily available to those who might actually
be able to use it, and maybe even create some compelling visualizations
in the process. While I’m not quite ready to start creating full-blown
interactive websites yet, I thought it might be a good idea to start
with something small to get the ball rolling, so I put together an
<a href="/examples/visitors/">upgraded version</a> of my <a href="http://carsonfarmer.com/2012/03/because-its-fun-to-map-stuff/">previous map</a> of visitors to www.carsonfarmer.com.
<!--more--></p>
<p>I used the excellent <a href="http://mbostock.github.com/d3/">D3 JavaScript library</a> to re-create the visitor
map, this time providing some basic interaction with the data. Most of
the functionality is based on <a href="http://mbostock.github.com/d3/ex/">examples</a> from the D3 website, and at
this point, the map is really more of a learning tool than anything. As
in my previous static map, the <span class="caps">IP</span> addresses were geocoded using the
<a href="http://www.datasciencetoolkit.org/">Data Science Toolkit <span class="caps">API</span></a> via the <a href="https://github.com/rtelmore/RDSTK"><span class="caps">RDSTK</span> R package</a>, and all data
processing and manipulation was done using R. Additionally, the colour
scheme functionality comes from <a href="http://mbostock.github.com/d3/talk/20111018/#25">slide 25</a> of <a href="http://mbostock.github.com/d3/talk/20111018/#0">this presentation</a>,
and uses the sequential colour palettes from <a href="http://colorbrewer2.org/">colorbrewer.org</a>. There
are still a few kinks to work out (like how to get the <code>onmouseout</code> event
to work properly in Internet Explorer), and tonnes of additional
features and functions could be added, so comments and suggestions are
welcome. Having said that, I think the new version looks quite nice, and
will likely form the basis for more complex visuals as I become more
familiar with JavaScript and various other web-development tools.</p>Because its fun to map stuff…2012-03-30T17:46:00-04:00cfarmertag:carsonfarmer.com,2012-03-30:2012/03/because-its-fun-to-map-stuff/<p>It’s been quite a while since my last post, and it’s Friday and I was
feeling creative, so I decided to map something! I’ve been looking for
an excuse to produce a nice graphic like the one <a href="http://underdark.wordpress.com/about/">Anita Graser</a>
created to represent Vienna’s green-spaces. She used <a href="http://qgis.org/">Quantum
<span class="caps">GIS</span></a> to produce a <a href="http://underdark.wordpress.com/2012/03/04/mapping-density-with-hexagonal-grids/">hexagonal grid</a> for representing the density of
Viennese trees instead of the standard heat map or kernel density map,
and the results are quite nice! I’m a huge fan of <span class="caps">QGIS</span>, but I tend to do
most of my work in R, so I decided to see if I could produce something
similar using R. Turns out you can, and the final results are displayed
below (read on to see the full work-flow). Instead of trees, I went
ahead and mapped the locations of unique visitors to
<code>http://www.carsonfarmer.com/</code> between 2009 and 2011:</p>
<p><a href="http://carsonfarmer.com/images/website_visitors.svg"><img alt="Visitors map" src="http://carsonfarmer.com/images/website_visitors.svg" /></a></p>
<h3>Work-flow</h3>
<p>Firstly, I downloaded the logs for <code>www.carsonfarmer.com</code>. I did this
directly from the console, though I’m pretty sure I could have done this
from R as well. Next, I needed to extract the unique <span class="caps">IP</span> addresses from
the logs. I found this nice grep one-liner from <a href="http://blogs.law.harvard.edu/djcp/2009/04/how-to-extract-uniq-ips-from-apache-via-grep-cut-and-uniq/">here</a>, which I
modified to grab all unique <span class="caps">IP</span> addresses that ‘<span class="caps">GET</span>’ something from the site:</p>
<div class="highlight"><pre>grep <span class="s1">'GET'</span> access.log <span class="p">|</span> cut -d<span class="s1">' '</span> -f1 <span class="p">|</span> sort <span class="p">|</span> uniq > ip_addresses.log
</pre></div>
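<p>The same filtering can also be sketched in Python, for anyone without a Unix shell handy. This is an illustrative equivalent, not the command actually used above; it assumes Apache-style log lines in which the client <span class="caps">IP</span> address is the first space-separated field, and the function name <code>unique_get_ips</code> is a hypothetical placeholder.</p>

```python
def unique_get_ips(log_lines):
    """Rough equivalent of: grep 'GET' | cut -d' ' -f1 | sort | uniq

    Assumes Apache-style log lines whose first space-separated field
    is the client IP address.
    """
    ips = set()
    for line in log_lines:
        if "GET" in line:  # grep 'GET' matches anywhere in the line
            ips.add(line.split(" ", 1)[0])  # cut -d' ' -f1
    # sort | uniq: the set removes duplicates, sorted() orders them
    # (plain string order; the shell's sort order depends on the locale)
    return sorted(ips)
```

<p>For example, <code>unique_get_ips(open('access.log'))</code> would return the addresses that the pipeline writes to <code>ip_addresses.log</code>, possibly in a slightly different order.</p>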
<p>To actually map the <span class="caps">IP</span> addresses, I obviously needed some way to convert
the raw <span class="caps">IP</span> addresses to latitude and longitude coordinates. Enter the
very nice <a href="http://www.datasciencetoolkit.org/">Data Science Toolkit (<span class="caps">DSTK</span>)</a> from Pete Warden and the very
handy <a href="https://github.com/rtelmore/RDSTK"><span class="caps">RDSTK</span> R package</a> from Ryan Elmore! Basically, the <span class="caps">DSTK</span> has an
<span class="caps">API</span> that can be queried for all sorts of information useful for ‘data
science’ applications, and the <span class="caps">RDSTK</span> makes it possible to query the <span class="caps">API</span>
directly from within R. I first heard about both these projects from the
<a href="http://blog.revolutionanalytics.com">Revolution Analytics blog</a>, where there is an <a href="http://blog.revolutionanalytics.com/2011/05/mapping-locations-in-r-with-the-data-science-toolkit.html">article</a> summarising
Ryan Elmore’s <a href="http://thelogcabin.wordpress.com/2011/05/02/r-and-the-data-science-toolkit/">work on <span class="caps">RDSTK</span></a>, and a few other handy links. <span class="caps">RDSTK</span>
isn’t (yet?) available on <a href="http://cran.r-project.org/"><span class="caps">CRAN</span></a>, so I downloaded it directly from GitHub:</p>
<div class="highlight"><pre>wget https://github.com/rtelmore/RDSTK/raw/master/src/RDSTK_1.0.tar.gz
</pre></div>
<p>Then I installed it via <code>R CMD INSTALL</code> (note that it requires other R
packages: <code>RCurl</code>, <code>rjson</code>, and <code>plyr</code>):</p>
<div class="highlight"><pre>R CMD INSTALL RDSTK_1.0.tar.gz
</pre></div>
<p>Once I had all that stuff installed and ready to go, I actually started
up an R session and got working:</p>
<div class="highlight"><pre>addresses <span class="o">=</span> read.table<span class="p">(</span><span class="s">'ip_addresses.log'</span><span class="p">,</span> col.names<span class="o">=</span><span class="s">'address'</span><span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>RDSTK<span class="p">)</span>
<span class="o">?</span> ip2coordinates
</pre></div>
<p>The <code>ip2coordinates</code> function expects multiple IPs to be passed as
a single string of comma-separated <span class="caps">IP</span> addresses, but it can
only handle a limited number of IPs per request (about 100, I think), so I
processed the addresses in batches of 80 in a loop (it probably isn’t polite to slam <span class="caps">DSTK</span> with 1,000s of
requests, so be nice!).</p>
<div class="highlight"><pre><span class="c1"># query the DSTK API in batches of at most 80 addresses</span>
batch_size <span class="o">=</span> <span class="m">80</span>
out <span class="o">=</span> <span class="kc">NULL</span>
<span class="kr">for</span> <span class="p">(</span>first <span class="kr">in</span> <span class="kp">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="kp">nrow</span><span class="p">(</span>addresses<span class="p">),</span> by<span class="o">=</span>batch_size<span class="p">))</span> <span class="p">{</span>
  last <span class="o">=</span> <span class="kp">min</span><span class="p">(</span>first <span class="o">+</span> batch_size <span class="o">-</span> <span class="m">1</span><span class="p">,</span> <span class="kp">nrow</span><span class="p">(</span>addresses<span class="p">))</span>
  ips <span class="o">=</span> <span class="kp">paste</span><span class="p">(</span><span class="kp">as.character</span><span class="p">(</span>addresses<span class="o">$</span>address<span class="p">[</span>first<span class="o">:</span>last<span class="p">]),</span> collapse<span class="o">=</span><span class="s">', '</span><span class="p">)</span>
  <span class="c1"># rbind(NULL, df) simply returns df on the first pass</span>
  out <span class="o">=</span> <span class="kp">rbind</span><span class="p">(</span>out<span class="p">,</span> ip2coordinates<span class="p">(</span>ips<span class="p">))</span>
<span class="p">}</span>
</pre></div>
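<p>The batching step can be sketched in Python as well. The function <code>batched_ip_strings</code> is a hypothetical helper: it only builds the comma-separated request strings (in batches of 80, as above) and does not call the <span class="caps">DSTK</span> <span class="caps">API</span>, whose exact per-request limit is not confirmed here.</p>

```python
def batched_ip_strings(addresses, batch_size=80):
    """Split IP addresses into comma-separated strings of at most
    batch_size addresses each (the last batch may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        ", ".join(addresses[start:start + batch_size])
        for start in range(0, len(addresses), batch_size)
    ]


if __name__ == "__main__":
    ips = ["10.0.0.%d" % i for i in range(1, 166)]  # 165 made-up addresses
    for batch in batched_ip_strings(ips):
        print(len(batch.split(", ")))  # 80, 80, 5
```

<p>Stepping through start offsets with <code>range()</code> means a final partial batch, or a list shorter than one batch, needs no special handling.</p>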
<p>Once that is done running, the next steps are to a) convert the
returned <code>data.frame</code> to a <code>SpatialPointsDataFrame</code>, b) create a
<code>SpatialGrid</code> based on the points, c) create a <code>SpatialPolygons</code> object
from a hexagonal sample of the grid, and then finally d) create a
<code>SpatialPolygonsDataFrame</code> for plotting:</p>
<div class="highlight"><pre><span class="kn">library</span><span class="p">(</span>sp<span class="p">)</span>
<span class="c1"># make the output into a Spatial* object</span>
coordinates<span class="p">(</span>out<span class="p">)</span> <span class="o">=</span> <span class="o">~</span>longitude<span class="o">+</span>latitude
<span class="kn">library</span><span class="p">(</span>maptools<span class="p">)</span> <span class="c1"># need this for the following function</span>
sg <span class="o">=</span> Sobj_SpatialGrid<span class="p">(</span>out<span class="p">,</span> maxDim<span class="o">=</span><span class="m">200</span><span class="p">,</span> n<span class="o">=</span><span class="kc">NULL</span><span class="p">)</span><span class="o">$</span>SG
hex_pts <span class="o">=</span> spsample<span class="p">(</span>sg<span class="p">,</span> type<span class="o">=</span><span class="s">'hexagonal'</span><span class="p">,</span> cellsize<span class="o">=</span>sg<span class="o">@</span>grid<span class="o">@</span>cellsize<span class="p">)</span>
hex_poly <span class="o">=</span> HexPoints2SpatialPolygons<span class="p">(</span>hex_pts<span class="p">)</span>
pts_poly <span class="o">=</span> over<span class="p">(</span>hex_poly<span class="p">,</span> out<span class="p">,</span> returnList<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span>
pts_poly_count <span class="o">=</span> <span class="kp">sapply</span><span class="p">(</span>pts_poly<span class="p">,</span> <span class="kr">function</span><span class="p">(</span>x<span class="p">)</span> <span class="kp">nrow</span><span class="p">(</span>x<span class="p">))</span>
poly <span class="o">=</span> SpatialPolygonsDataFrame<span class="p">(</span>hex_poly<span class="p">,</span> <span class="kt">data.frame</span><span class="p">(</span><span class="s">'count'</span><span class="o">=</span>pts_poly_count<span class="p">),</span>
match.ID<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span>
</pre></div>
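<p>The idea behind the hexagonal counts can also be illustrated without <code>sp</code>. This sketch swaps in a different technique from <code>spsample</code>/<code>HexPoints2SpatialPolygons</code>/<code>over</code>: it assigns each point directly to a pointy-top hexagon using axial coordinates and cube rounding. It treats longitude/latitude as plain planar x/y (an assumption), and its <code>size</code> parameter (centre-to-corner distance) is not the same thing as <code>sp</code>’s <code>cellsize</code>.</p>

```python
import math
from collections import Counter


def point_to_hex(x, y, size):
    """Return axial (q, r) coordinates of the pointy-top hexagon with the
    given size (centre-to-corner distance) that contains the point (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round in cube coordinates (cx + cy + cz == 0), then recompute the
    # component with the largest rounding error so the constraint holds.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hex_bin_counts(points, size):
    """Count how many (x, y) points fall in each hexagonal cell."""
    return Counter(point_to_hex(x, y, size) for x, y in points)
```

<p>Only non-empty cells appear as keys in the returned <code>Counter</code>, which is roughly what keeping polygons with a count greater than zero achieves in the R version.</p>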
<p>Ok, now for some plotting!</p>
<div class="highlight"><pre><span class="c1"># pick some reasonable break points</span>
breaks <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">1.0</span><span class="p">,</span> <span class="m">10.0</span><span class="p">,</span> <span class="m">20.0</span><span class="p">,</span> <span class="m">50.0</span><span class="p">,</span> <span class="m">100.0</span><span class="p">,</span> <span class="m">500.0</span><span class="p">,</span> <span class="m">2000.0</span><span class="p">)</span>
<span class="c1"># use RColorBrewer to get a nice blue palette</span>
<span class="kn">library</span><span class="p">(</span>RColorBrewer<span class="p">)</span>
<span class="c1"># don't use the lightest colour, it looks washed out</span>
cols <span class="o">=</span> brewer.pal<span class="p">(</span><span class="m">7</span><span class="p">,</span><span class="s">"Blues"</span><span class="p">)[</span><span class="m">-1</span><span class="p">]</span>
<span class="c1"># plot the grid, which produces something close to our final product</span>
spplot<span class="p">(</span>poly<span class="p">[</span>poly<span class="o">$</span>count<span class="o">></span><span class="m">0</span><span class="p">,],</span> col<span class="o">=</span><span class="s">'white'</span><span class="p">,</span> col.regions<span class="o">=</span>cols<span class="p">,</span> at<span class="o">=</span>breaks<span class="p">,</span>
par.settings<span class="o">=</span><span class="kt">list</span><span class="p">(</span>axis.line<span class="o">=</span><span class="kt">list</span><span class="p">(</span>col<span class="o">=</span><span class="s">'transparent'</span><span class="p">)))</span>
</pre></div>
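<p>The seven break points define six classes, one for each of the six colours left in <code>cols</code> after dropping the lightest one. A small Python sketch of that interval classification follows; it assumes half-open intervals [a, b), and <code>spplot</code>’s own handling of values that fall exactly on a break may differ.</p>

```python
from bisect import bisect_right

# The break points used in the R code above
BREAKS = [1.0, 10.0, 20.0, 50.0, 100.0, 500.0, 2000.0]


def break_class(count, breaks=BREAKS):
    """Return the index i of the interval [breaks[i], breaks[i+1]) that
    contains count, or None if count lies outside all intervals."""
    i = bisect_right(breaks, count) - 1
    if i < 0 or i >= len(breaks) - 1:
        return None
    return i
```

<p>Classes 0 to 5 would then map to the six colours in order, from lightest to darkest.</p>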
<p>I then used <a href="http://inkscape.org/">Inkscape</a> to tweak the final product, adding titles and
labels and modifying the colour key to look like something a bit more
suited to the map at hand. In the end I had a nice map of my blog
readers, and an excellent way to procrastinate on a sunny Friday afternoon!</p>Environment and Planning A paper published2011-12-01T13:52:00-05:00cfarmertag:carsonfarmer.com,2011-12-01:2011/12/environment-and-planning-a-paper-published/<p>My latest article, “Network-based functional regions”, has recently been
<a href="http://www.envplan.com/abstract.cgi?id=a44136">published online</a> in <a href="http://www.envplan.com/A.html">Environment and Planning A</a>. The article
can be cited as:</p>
<blockquote>
<p>Farmer C J Q, Stewart Fotheringham A, 2011, “Network-based functional
regions” <em>Environment and Planning A</em> 43(11) 2723 – 2741</p>
</blockquote>
<p>If you would like a copy, but do not have access to the article, please
email me and I can forward you a <span class="caps">PDF</span> version.</p>It’s about time…2011-11-09T12:16:00-05:00cfarmertag:carsonfarmer.com,2011-11-09:2011/11/its-about-time/<p>Well, it’s been a long time since my last post, but I <em>do</em> have a
relatively good reason: I was finishing up my PhD thesis. The good news
is that I’m now done and graduated! I’m hoping I’ll have a bit more time
to blog and continue working on side-projects that I had to put on-hold
while finishing up. My plan for the next few months is to finish up here
in Maynooth, (unofficially) start some post-doc work, and finish/get
going on several papers on my PhD research. I’m also going to try to
learn Bayesian statistics, fiddle about with some visualizations I’ve
been working on, and start getting back into <span class="caps">QGIS</span> and Python development again.</p>
<p>In the meantime, I’ve put together a fun little visualization of my
PhD thesis in the form of a word-cloud.
<!--more--></p>
<p><a href="http://carsonfarmer.com/images/wordcloud.png"><img alt="image" src="http://carsonfarmer.com/images/wordcloud-300x280.png" title="Thesis wordcloud" /></a></p>
<p>This is actually a pretty rough version, and I suspect there are a few
issues with hyphenated words and things like that; but it does give a
pretty good impression of what my thesis is all about, so I’ll leave it
at that for now. For those who might be interested (and for my own
reference), the <code>R</code> code to generate this figure is here (requires the
<code>wordcloud</code> and <code>tm</code> packages):</p>
<div class="highlight"><pre><span class="c1"># read in all the lines as a character vector</span>
lines <span class="o"><-</span> <span class="kp">readLines</span><span class="p">(</span><span class="s">'modified.txt'</span><span class="p">)</span>
<span class="kp">head</span><span class="p">(</span>lines<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>tm<span class="p">)</span> <span class="c1"># text mining package</span>
<span class="kn">library</span><span class="p">(</span>wordcloud<span class="p">)</span>
<span class="c1"># create a corpus object, with one document per line</span>
corpus <span class="o"><-</span> Corpus<span class="p">(</span>VectorSource<span class="p">(</span>lines<span class="p">))</span>
<span class="c1"># now start processing the text and removing punctuation etc</span>
corpus <span class="o"><-</span> tm_map<span class="p">(</span>corpus<span class="p">,</span> removePunctuation<span class="p">)</span>
<span class="c1"># wrap base functions like tolower so the documents stay valid tm documents</span>
corpus <span class="o"><-</span> tm_map<span class="p">(</span>corpus<span class="p">,</span> content_transformer<span class="p">(</span><span class="kp">tolower</span><span class="p">))</span>
corpus <span class="o"><-</span> tm_map<span class="p">(</span>corpus<span class="p">,</span> removeWords<span class="p">,</span> stopwords<span class="p">(</span><span class="s">"english"</span><span class="p">))</span>
<span class="c1"># create a term-document matrix: one row per term, one column per document (line)</span>
tdm <span class="o"><-</span> TermDocumentMatrix<span class="p">(</span>corpus<span class="p">)</span>
<span class="c1"># convert to matrix</span>
m <span class="o"><-</span> <span class="kp">as.matrix</span><span class="p">(</span>tdm<span class="p">)</span>
<span class="c1"># count how often each word occurs across all lines</span>
v <span class="o"><-</span> <span class="kp">sort</span><span class="p">(</span><span class="kp">rowSums</span><span class="p">(</span>m<span class="p">),</span> decreasing<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span>
<span class="c1"># create dataframe for word cloud</span>
d <span class="o"><-</span> <span class="kt">data.frame</span><span class="p">(</span>word <span class="o">=</span> <span class="kp">names</span><span class="p">(</span>v<span class="p">),</span> freq<span class="o">=</span>v<span class="p">)</span>
png<span class="p">(</span><span class="s">"wordcloud.png"</span><span class="p">,</span> width<span class="o">=</span><span class="m">1280</span><span class="p">,</span> height<span class="o">=</span><span class="m">800</span><span class="p">)</span>
wordcloud<span class="p">(</span>d<span class="o">$</span>word<span class="p">,</span> d<span class="o">$</span>freq<span class="p">,</span> scale<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">8</span><span class="p">,</span> <span class="m">.3</span><span class="p">),</span> min.freq<span class="o">=</span><span class="m">2</span><span class="p">,</span> max.words<span class="o">=</span><span class="m">100</span><span class="p">,</span> random.order<span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span> rot.per<span class="o">=</span><span class="m">.15</span><span class="p">,</span> vfont<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="s">"sans serif"</span><span class="p">,</span><span class="s">"plain"</span><span class="p">))</span>
dev.off<span class="p">()</span>
</pre></div>
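<p>The term-document matrix step boils down to counting how often each word appears. A rough Python equivalent of the cleaning and counting steps is sketched here; <code>STOPWORDS</code> is a deliberately tiny, hypothetical list standing in for tm’s <code>stopwords("english")</code>, and the regular expression only approximates <code>removePunctuation</code>.</p>

```python
import re
from collections import Counter

# Tiny illustrative stopword list; tm's English list is much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def word_frequencies(lines, stopwords=STOPWORDS):
    """Lower-case each line, delete punctuation, drop stopwords, and count
    the remaining words across all lines, most frequent first."""
    counts = Counter()
    for line in lines:
        text = re.sub(r"[^\w\s]", "", line.lower())  # delete punctuation
        counts.update(w for w in text.split() if w not in stopwords)
    return counts.most_common()
```

<p>Note that deleting punctuation outright also fuses hyphenated words, so “geo-data” becomes “geodata”.</p>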
<p>I actually got this snippet from <a href="http://onertipaday.blogspot.com/2011/07/word-cloud-in-r.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+OneRTipADay+(One+R+Tip+A+Day)">One R Tip A Day</a> via <a href="http://www.r-bloggers.com/">R-bloggers</a>.</p>Adding direct editing of geometry fields in QGIS2011-03-12T18:43:00-05:00cfarmertag:carsonfarmer.com,2011-03-12:2011/03/adding-direct-editing-of-geometry-fields-in-qgis/<p>Being able to add/remove attributes isn’t actually a very new feature
for <span class="caps">QGIS</span> at this point. However, to date none of the fTools functions
(Vector menu) have taken advantage of this capability. If a tool needed
to create a new field in the input vector layer, it simply wrote a new
version of the vector layer to disk with the additional fields added.
There have been several requests to allow some tools to add/update
attributes directly on the input layers, so I went ahead and created a
script to test this functionality out. I’ve
<a href="|filname|/uploads/add_geometry_information.py">provided a copy here</a>
for anyone who would like to test it out before I add it to <span class="caps">QGIS</span>
permanently. Basically, the script will replace/update three of the
Vector menu tools, namely <code>Analysis &gt; Sum line lengths</code>, <code>Analysis
&gt; Points in polygon</code>, and <code>Geometry tools &gt; Add/Export geometry
info</code>.
<!--more-->
Here are some examples of the script’s usage from the <span class="caps">QGIS</span> Python console:</p>
<ol>
<li>Add geometry information (assumes that the target layer is the first
layer in the layer-list):</li>
</ol>
<div class="highlight"><pre><span class="o">>>></span> <span class="n">mc</span> <span class="o">=</span> <span class="n">qgis</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">iface</span><span class="o">.</span><span class="n">mapCanvas</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">layer</span> <span class="o">=</span> <span class="n">mc</span><span class="o">.</span><span class="n">layer</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="o">>>></span> <span class="kn">import</span> <span class="nn">add_geometry_information</span>
<span class="o">>>></span> <span class="n">add_geometry_information</span><span class="o">.</span><span class="n">addGeometryInformation</span><span class="p">(</span><span class="n">layer</span><span class="p">)</span>
<span class="bp">True</span>
</pre></div>
<ol start="2">
<li>Count the number of points or length of lines in each polygon of an
input polygon layer (assumes the polygon layer is second, and the point or line
layer is first, in the layer-list):</li>
</ol>
<div class="highlight"><pre><span class="o">>>></span> <span class="n">polygonLayer</span> <span class="o">=</span> <span class="n">mc</span><span class="o">.</span><span class="n">layer</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">inputLayer</span> <span class="o">=</span> <span class="n">mc</span><span class="o">.</span><span class="n">layer</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">add_geometry_information</span><span class="o">.</span><span class="n">countFeaturesInPolygon</span><span class="p">(</span><span class="n">polygonLayer</span><span class="p">,</span> <span class="n">inputLayer</span><span class="p">)</span>
<span class="bp">True</span>
</pre></div>
<p>Note that to import the module properly (and easily), make sure it is
somewhere that PyQGIS can find it, such as <code>~/.qgis/python</code>. If the layer
is in editing mode, updates can be undone; otherwise, they are
written automatically to the provider. The <code>countFeaturesInPolygon()</code>
function automatically recognizes whether an input layer is a point or line
layer, and computes the correct information accordingly (outputting a
count for points, and line lengths for lines). For both functions, the
last argument can be a boolean specifying whether to update selected
features only (<code>default=False</code>).</p>
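<p>The point-counting half of <code>countFeaturesInPolygon()</code> rests on a point-in-polygon test. The sketch below illustrates that idea with a plain ray-casting test on coordinate lists; it is not the script’s implementation (which works on <span class="caps">QGIS</span> layers and geometries), the function names are hypothetical, and polygon holes are ignored.</p>

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon given as a list of
    (x, y) vertices? Points exactly on an edge may go either way."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_in_polygons(polygons, points):
    """Return, for each polygon ring, how many points fall inside it."""
    return [sum(point_in_polygon(x, y, ring) for x, y in points)
            for ring in polygons]
```

<p>Each crossing of the ray flips the inside/outside state, so an odd number of crossings means the point is inside.</p>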
<p>C</p>Because its Friday2010-11-06T01:00:00-04:00cfarmertag:carsonfarmer.com,2010-11-06:2010/11/because-its-friday/<p>My two favorite scientific programming languages are <a href="http://www.python.org">Python</a> and
<a href="http://www.r-project.org/">R</a>, each for their own specific strengths. I stick with R for most of
my serious stats stuff, but for everyday processing, analysis, and <span class="caps">GUI</span>
building, Python is my <em>modus operandi</em>. Lately, however, I’ve been doing
more and more things in Python… even the stats stuff. When doing
statistical analysis in Python, I usually use the excellent <a href="http://rpy.sourceforge.net/rpy2.html">rpy2</a>
library to communicate between Python and R. As a result, I have put
together quite a few little code snippets to work with R commands in
Python. Recently, I decided to put a bunch of these snippets together to
create what I’ve called fakeR. Basically it is a simple Python script
that emulates a very basic (toy) R console. The fakeR console supports
multi-line commands and pretty much all regular R commands, but has no
history or any nice features like that. The cool (and handy) thing
about it is that it separates the input/output from the actual
processing via Python’s excellent <a href="http://docs.python.org/library/multiprocessing.html">multiprocessing</a> package, which makes it possible
to run the input/output and the processing as two processes in parallel, with communication back
and forth done via a duplex (two-way) pipe. I’ve <a href="http://carsonfarmer.com/uploads/faker.py">uploaded the script</a>
for anyone interested in having a play with it. If anyone has any ideas
on how to (safely) cancel a currently running R command on the
processing side, I’d be very interested to hear them.</p>
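<p>The two-process design described above can be sketched with <code>multiprocessing.Pipe</code>. This is not the fakeR script itself: <code>evaluate()</code> is a hypothetical stand-in for the R evaluation (which in fakeR presumably goes through rpy2), and sending <code>None</code> as a shutdown signal is an assumed convention.</p>

```python
import multiprocessing as mp


def evaluate(command):
    """Hypothetical stand-in for the real work (e.g. evaluating an R
    command); here it just upper-cases the command string."""
    return command.upper()


def processing_side(conn):
    """Runs in its own process: receive commands, send back results."""
    while True:
        command = conn.recv()
        if command is None:  # sentinel: the console side is shutting down
            break
        conn.send(evaluate(command))
    conn.close()


def run_session(commands):
    """Console side: send each command over a duplex pipe, collect replies."""
    # Prefer 'fork' where available so the child does not re-import this
    # module; otherwise fall back to the platform's default start method.
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)
    console_end, processing_end = ctx.Pipe(duplex=True)
    worker = ctx.Process(target=processing_side, args=(processing_end,))
    worker.start()
    results = []
    for command in commands:
        console_end.send(command)
        results.append(console_end.recv())
    console_end.send(None)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_session(["summary(x)", "mean(y)"]))
```

<p>Because both ends of a duplex pipe can send and receive, a single pipe carries commands one way and results the other.</p>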
<p>Carson</p>Evolution is beautiful…2010-10-27T16:33:00-04:00cfarmertag:carsonfarmer.com,2010-10-27:2010/10/evolution-is-beautiful/<p>Check out <a href="http://vimeo.com/16148504">this video</a> of the evolution of OpenStreetMap in Europe! It
sure is cool to see how much ground a bunch of nerds with <span class="caps">GPS</span> units can
cover! Link courtesy of <a href="http://slashgeo.org/">slashgeo.org</a>.</p>pgRouting, OpenStreetMap, and QGIS2010-10-14T12:26:00-04:00cfarmertag:carsonfarmer.com,2010-10-14:2010/10/pgrouting-openstreetmap-and-qgis/<p>I mentioned <a href="http://carsonfarmer.com/2010/05/osm-data-by-country/">a few posts back</a> that there was a great resource for
downloading OpenStreetMap data, and that it was relatively easy to
import osm data into <code>PostgreSQL</code>/<code>PostGIS</code> for use with <code>pgRouting</code> to
calculate shortest paths and various other network-based operations. In
this post, I’ll outline the steps required to get all this up and going,
and provide a quick example to show how this can be combined with <span class="caps">QGIS</span>
to visualise the computed shortest path directly.
<!--more--></p>
<p>Firstly, we need to install all the required packages. I’m assuming you
already have <code>PostgreSQL</code> and <code>PostGIS</code> installed, but if not, have a <a href="http://carsonfarmer.com/2008/11/quick-guide-to-setting-up-postgis-database/">look
here</a> for a quick guide to getting things set up (note that the commands below
assume <code>PostgreSQL</code> 8.4).</p>
<div class="highlight"><pre>sudo apt-get install postgresql-server-dev-8.4 libboost-graph-dev
</pre></div>
<p>If you don’t have them already, you might also need tools for building
the required software packages, as well as subversion and cmake.</p>
<div class="highlight"><pre>sudo apt-get install build-essential subversion cmake
</pre></div>
<p>To be able to run the driving distance algorithms we need <span class="caps">CGAL</span>:</p>
<div class="highlight"><pre>sudo apt-get install libcgal*
</pre></div>
<p>And the traveling salesman (<span class="caps">TSP</span>) algorithm requires <span class="caps">GAUL</span>:</p>
<div class="highlight"><pre>wget http://downloads.sourceforge.net/gaul/gaul-devel-0.1850-0.tar.gz
tar -xzf gaul-devel-0.1850-0.tar.gz
<span class="nb">cd </span>gaul-devel-0.1850-0/
./configure --disable-slang
make
sudo make install
sudo ldconfig
</pre></div>
<p>Now it’s time to download, build, and install <code>pgRouting</code>. If you don’t
have subversion, or you don’t want to have the latest trunk version of
<code>pgRouting</code>, you can also <a href="http://pgrouting.postlbs.org/wiki/pgRoutingDownload">download it manually</a>.</p>
<div class="highlight"><pre>svn checkout http://pgrouting.postlbs.org/svn/pgrouting/trunk pgrouting
<span class="nb">cd </span>pgrouting/
cmake -DWITH_TSP<span class="o">=</span>ON -DWITH_DD<span class="o">=</span>ON .
make
sudo make install
</pre></div>
<p>Once we’ve got that installed and ready to go, we need to set up
<code>PostgreSQL</code> so that it ‘trusts’ local database connections.</p>
<div class="highlight"><pre>sudo gedit /etc/postgresql/8.4/main/pg_hba.conf
</pre></div>
<p>Scroll to the bottom, and make sure you change the <code>METHOD</code> to ‘trust’.</p>
<div class="highlight"><pre># TYPE DATABASE USER CIDR-ADDRESS METHOD
local all all trust
</pre></div>
<p>And now that we’ve made these changes, we need to restart <code>PostgreSQL</code>:</p>
<div class="highlight"><pre>sudo /etc/init.d/postgresql-8.4 restart
</pre></div>
<p>Next we simply create a routing database to store our data in…</p>
<div class="highlight"><pre>createdb -U postgres routing createlang -U postgres plpgsql routing
</pre></div>
<p>…add the <code>PostGIS</code> functions…</p>
<div class="highlight"><pre>psql -U postgres -f /usr/share/postgresql/8.4/contrib/postgis-1.5/postgis.sql routing
psql -U postgres -f /usr/share/postgresql/8.4/contrib/postgis-1.5/spatial_ref_sys.sql routing
</pre></div>
<p>…add all the <code>pgRouting</code> functions…</p>
<div class="highlight"><pre>psql -U postgres -f /usr/share/postlbs/routing_core.sql routing
psql -U postgres -f /usr/share/postlbs/routing_core_wrappers.sql routing
psql -U postgres -f /usr/share/postlbs/routing_topology.sql routing
</pre></div>
<p>…including the traveling salesman functions…</p>
<div class="highlight"><pre>psql -U postgres -f /usr/share/postlbs/routing_tsp.sql routing
psql -U postgres -f /usr/share/postlbs/routing_tsp_wrappers.sql routing
</pre></div>
<p>…and finally the driving distance functions.</p>
<div class="highlight"><pre>psql -U postgres -f /usr/share/postlbs/routing_dd.sql routing
psql -U postgres -f /usr/share/postlbs/routing_dd_wrappers.sql routing
</pre></div>
<p>We now have a fully working <code>pgRouting</code> database ready to be populated
with data! So in order to do that relatively painlessly, we first
install the <a href="http://pgrouting.postlbs.org/wiki/tools/osm2pgrouting">osm2pgrouting</a> tool, which will help us import our osm
data directly into our <code>pgRouting</code> database with the correct structure and everything.</p>
<div class="highlight"><pre>svn checkout http://pgrouting.postlbs.org/svn/pgrouting/tools/osm2pgrouting/trunk osm2pgrouting
<span class="nb">cd </span>osm2pgrouting/
make
</pre></div>
<p>Once that’s finished building, we can go ahead and download our <span class="caps">OSM</span> data
from <a href="http://download.geofabrik.de/osm/">http://download.geofabrik.de/osm/</a>. See this <a href="http://carsonfarmer.com/2010/05/osm-data-by-country/">previous post</a>
for details. For this example, I’ll be using the <span class="caps">OSM</span> data for Ireland:</p>
<div class="highlight"><pre>wget http://download.geofabrik.de/osm/europe/ireland.osm.bz2
bzip2 -d ireland.osm.bz2
</pre></div>
<p>Once we have that downloaded and extracted, we’re ready to import the
<span class="caps">OSM</span> data into our database using the <code>osm2pgrouting</code> tool:</p>
<div class="highlight"><pre>./osm2pgrouting -file /home/cfarmer/Downloads/ireland.osm -conf mapconfig.xml -dbname routing -user postgres -clean
</pre></div>
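<p>Once that is finished (this could take a long time), we’re ready to query the
network (note that 52343 and 39219 are the ids of the source and target nodes, and the two <code>false</code> arguments tell <code>shortest_path</code> to treat the network as undirected and not to use reverse costs)…</p>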
<div class="highlight"><pre>psql -U postgres routing
<span class="k">select</span> * from shortest_path
<span class="o">(</span><span class="s1">'select gid as id,</span>
<span class="s1"> source::int4,</span>
<span class="s1"> target::int4,</span>
<span class="s1"> length::double precision as cost</span>
<span class="s1"> from ways'</span>,
52343, 39219, <span class="nb">false</span>, <span class="nb">false</span><span class="o">)</span><span class="p">;</span>
</pre></div>
<p>…to produce something like this:</p>
<div class="highlight"><pre>vertex_id | edge_id | cost
----------+---------+---------------------
52343 | 78055 | 0.217641978736602
52341 | 78052 | 0.0230665826613562
52342 | 78053 | 0.0839311516838216
20390 | 28717 | 0.166809293071158
20389 | 28716 | 0.493120178133836
20388 | 28715 | 0.271165901884914
20387 | 112841 | 0.101669458767093
14183 | 22893 | 0.106433172954507
...
</pre></div>
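<p>Conceptually, <code>shortest_path</code> runs Dijkstra’s algorithm over the edge list returned by the inner query (<code>id</code>, <code>source</code>, <code>target</code>, <code>cost</code>). The pure-Python sketch below illustrates that idea on a made-up toy network; it is not pgRouting’s code. Because both flags in the query above are <code>false</code>, edges are treated as undirected. The output imitates the <code>vertex_id | edge_id | cost</code> rows above, and the final row for the target vertex uses edge id -1. That last detail is an assumption about pgRouting’s convention.</p>

```python
import heapq


def shortest_path(edges, source, target):
    """Dijkstra's algorithm on an undirected edge list of (id, source, target, cost).

    Returns (vertex_id, edge_id, cost) rows from source to target, ending
    with (target, -1, 0.0), or [] if the target cannot be reached.
    """
    # Undirected adjacency list: vertex -> [(neighbour, edge_id, cost), ...]
    adjacency = {}
    for edge_id, u, v, cost in edges:
        adjacency.setdefault(u, []).append((v, edge_id, cost))
        adjacency.setdefault(v, []).append((u, edge_id, cost))

    best = {source: 0.0}  # shortest distance found so far to each vertex
    previous = {}         # vertex -> (previous vertex, edge used, edge cost)
    queue = [(0.0, source)]
    visited = set()
    while queue:
        dist, u = heapq.heappop(queue)
        if u in visited:
            continue  # stale queue entry
        visited.add(u)
        if u == target:
            break
        for v, edge_id, cost in adjacency.get(u, []):
            new_dist = dist + cost
            if v not in best or new_dist < best[v]:
                best[v] = new_dist
                previous[v] = (u, edge_id, cost)
                heapq.heappush(queue, (new_dist, v))

    if target not in visited:
        return []
    # Walk back from the target, then reverse into source-to-target order
    rows = [(target, -1, 0.0)]
    vertex = target
    while vertex != source:
        prev_vertex, edge_id, cost = previous[vertex]
        rows.append((prev_vertex, edge_id, cost))
        vertex = prev_vertex
    return rows[::-1]


# Hypothetical toy network with two routes from vertex 1 to vertex 4
edges = [(10, 1, 2, 0.5), (11, 2, 4, 0.5), (12, 1, 3, 0.2), (13, 3, 4, 1.0)]
for row in shortest_path(edges, 1, 4):
    print(row)
```

<p>Each row pairs a vertex with the edge that leaves it on the route and that edge’s cost, so summing the <code>cost</code> column gives the route length.</p>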
<p>Assuming your database is structured as <code>pgRouting</code> expects (which it
should be if you’ve used <code>osm2pgrouting</code>), you can also use some of the
functions that return geometries for use with other <code>PostGIS</code> functions:</p>
<div class="highlight"><pre><span class="k">select</span> * from dijkstra_sp<span class="o">(</span><span class="s1">'ways'</span>, 52343, 39219<span class="o">)</span><span class="p">;</span>
</pre></div>
<div class="highlight"><pre>id | gid | the_geom
----+--------+----------------------------
1 | 78055 | 0105000020E610000001000...
2 | 78052 | 0105000020E610000001000...
3 | 78053 | 0105000020E610000001000...
4 | 28717 | 0105000020E610000001000...
5 | 28716 | 0105000020E610000001000...
6 | 28715 | 0105000020E610000001000...
7 | 112841 | 0105000020E610000001000...
8 | 22893 | 0105000020E610000001000...
...
</pre></div>
<p>These queries are all fine and dandy, and can easily be used to
calculate the lengths of shortest paths etc., but what I really want to
do is visualise this output in a <span class="caps">GIS</span> so I can get an idea of what these
shortest paths look like. In <em>another</em> <a href="http://carsonfarmer.com/2010/04/postgis-select-statement-as-vector-layer-in-qgis/">previous post</a>, I described
how we could visualise spatial <code>SQL</code> queries directly in <span class="caps">QGIS</span>, both from
the Python console and using a handy plugin. We can do the same thing
here using <code>pgRouting</code> to produce a lovely spatial representation of our
shortest path query:</p>
<p><a href="http://carsonfarmer.com/images/shortest_path.png"><img alt="image" src="http://carsonfarmer.com/images/shortest_path-300x213.png" title="shortest_path" /></a></p>
<p>And there you go, a full-fledged routing library built right into our database!</p>Happy 42 day!2010-10-10T19:37:00-04:00cfarmertag:carsonfarmer.com,2010-10-10:2010/10/happy-42-day/<p>Happy <a href="http://fortytwoday.com/42daycities.html">42 day</a> to all those nerds and geeks out there! May this day
bring you the question to the ultimate answer!</p>
<p><center>
<a href="http://carsonfarmer.com/images/hitchhikersguide.jpg"><img alt="image" src="http://carsonfarmer.com/images/hitchhikersguide-300x123.jpg" /></a>
</center></p>Adding a bit of class(ification) to QGIS…2010-09-29T00:40:00-04:00cfarmertag:carsonfarmer.com,2010-09-29:2010/09/adding-a-bit-of-classification-to-qgis/<p>In my <a href="http://carsonfarmer.com/2010/09/playing-around-with-classification-algorithms-python-and-qgis/">last post</a>, I implemented several classification algorithms for
quantitative data which could be used directly from the Python console
in <span class="caps">QGIS</span>. While this was a handy addition to my PyQGIS toolkit, it wasn’t
quite handy enough for me, so I decided to (re)implement the same
algorithms in C++ so that they could be added directly to the <span class="caps">QGIS</span> <span class="caps">API</span>.
Before I did that, however, I wanted to fix a few issues and speed
things up a bit, particularly for the Jenks Natural Breaks algorithm,
which can be quite slow for large datasets.</p>
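<p>For reference, the simpler schemes involved here (Equal Interval, Quantiles, Standard Deviation and R’s ‘Pretty’) are easy to sketch. The Python below is an illustrative version of the textbook definitions. It is not the code from the original script, the C++ patch, or R. The quantile rule uses simple linear interpolation, and the standard deviation breaks sit at whole standard deviations from the mean. <code>pretty_breaks</code> only picks a step of 1, 2 or 5 times a power of 10 and omits the extra heuristics of R’s <code>pretty()</code>.</p>

```python
import math
import statistics


def equal_interval(values, k):
    """k classes of equal width spanning [min, max]; returns k + 1 breaks."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k)] + [hi]


def quantile_breaks(values, k):
    """k classes with roughly equal counts (linear interpolation between ranks)."""
    data = sorted(values)
    n = len(data)
    breaks = []
    for i in range(k + 1):
        pos = (n - 1) * i / k  # fractional position in the sorted data
        lower = math.floor(pos)
        upper = min(lower + 1, n - 1)
        breaks.append(data[lower] + (pos - lower) * (data[upper] - data[lower]))
    return breaks


def std_dev_breaks(values):
    """Breaks at whole (population) standard deviations from the mean, plus min and max."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    lo, hi = min(values), max(values)
    if sd == 0:
        return [lo, hi]
    first = math.ceil((lo - mean) / sd)
    last = math.floor((hi - mean) / sd)
    inner = [mean + i * sd for i in range(first, last + 1) if lo < mean + i * sd < hi]
    return [lo] + inner + [hi]


def pretty_breaks(values, n=5):
    """Simplified 'pretty': about n + 1 round breaks (1, 2 or 5 x 10^p) covering the data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [lo, hi]
    raw_step = (hi - lo) / n
    magnitude = 10 ** math.floor(math.log10(raw_step))
    # Pick the 'nice' step closest to the raw step
    step = min((m * magnitude for m in (1, 2, 5, 10)), key=lambda s: abs(s - raw_step))
    digits = max(0, -math.floor(math.log10(step)))  # decimals needed to show the step
    first, last = math.floor(lo / step), math.ceil(hi / step)
    return [round(i * step, digits) for i in range(first, last + 1)]


data = [1, 2, 2, 3, 4, 7, 9, 12, 15, 30]
for name, breaks in [("equal", equal_interval(data, 5)),
                     ("quantile", quantile_breaks(data, 5)),
                     ("std dev", std_dev_breaks(data)),
                     ("pretty", pretty_breaks(data, 5))]:
    print(name, breaks)
```

<p>Running all four on the same skewed data shows how differently the class boundaries can fall.</p>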
<p>After porting everything over to C++, I noticed that things were still
a little too slow for large datasets. My first thought was to limit the
amount of data that the algorithm had to go through by taking a random
sample (without replacement) and only running the algorithm on this
sample. Based on trial and error, I found that about 1000 values was a
good number to use, as it was still relatively fast, but generally large
enough to be representative of the overall distribution. In the
end, I went with a sample size of <code>max(1000, n*0.10)</code> for layers with more than
1000 features.</p>
<p>Since several of my colleagues here still use <span class="caps">ESRI</span> products, I also
decided to compare my version with the Natural Breaks algorithm in
ArcMap. I noticed right away that their version was much faster (when I
didn’t use the random sampling scheme), so I decided to search around
for information on their implementation. Unsurprisingly, <span class="caps">ESRI</span>’s documentation
didn’t explain the specifics of their algorithm, but it does produce
very similar class breaks to mine (and to the implementation <a href="http://cran.r-project.org/web/packages/classInt/index.html">available in
R</a>),
so I was relatively confident that the main underlying algorithm
was similar. I then stumbled upon this question in an
<a href="http://mappingcenter.esri.com/index.cfm?fa=ask.answers&q=541"><span class="caps">ESRI</span> forum</a>:</p>
<blockquote>
<p>Does ArcMap use the Jenks-Caspall or the Fisher-Jenks algorithm for
classifying data into natural breaks. I did some support.esri.com
research and found that ArcView 3.x appeared to have used the
Fisher-Jenks, but ArcGIS Desktop only generically talked about Jenks
Optimization without eluded to what algorithm it was using.</p>
</blockquote>
<p>The initial response was quite good, but unfortunately, as is typical of
proprietary software companies, when asked exactly how the algorithm was
implemented, they responded with:</p>
<blockquote>
<p>That’s proprietary information, along the lines of a trade secret, and
so corporate policy is that we do not provide it.</p>
</blockquote>
<p>Bummer… I <em>was</em> able to figure out a few things from simply clicking
around the various options in ArcMap, and I found that for
large datasets, ArcMap’s version of the Jenks algorithm also uses
sampling to reduce computation time. However, I was surprised to find
that their sampling technique appears to simply take the first <code>x</code>
data values (where <code>x</code> defaults to 10,000, but can be changed by the
user). Depending on how the data was created/digitised, this may not
produce a statistically representative sample at all! In my opinion, it
is better to use a random sampling scheme, with the constraint that both
the minimum and maximum values are included in the random sample so that
we don’t lose values off the ends of our class intervals. That is
exactly what I’ve done…</p>
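<p>Here is a minimal Python sketch of that sampling scheme, under these assumptions: layers with 1000 or fewer values are left alone; otherwise <code>max(1000, n*0.10)</code> values are drawn without replacement, and the minimum and maximum are always included. The function and parameter names are hypothetical, and the actual C++ implementation may differ in its details.</p>

```python
import random


def sample_for_breaks(values, min_size=1000, fraction=0.10, seed=None):
    """Random sample (without replacement) of values for computing class breaks.

    Inputs of min_size values or fewer are returned unchanged. Otherwise
    max(min_size, n * fraction) values are drawn, always including the
    overall minimum and maximum so no values fall off the ends of the classes.
    """
    n = len(values)
    if n <= min_size:
        return list(values)
    size = max(min_size, int(n * fraction))
    rng = random.Random(seed)
    # Work with positions so that repeated values are handled correctly
    extremes = {min(range(n), key=values.__getitem__),
                max(range(n), key=values.__getitem__)}
    others = [i for i in range(n) if i not in extremes]
    chosen = rng.sample(others, size - len(extremes))
    return [values[i] for i in sorted(extremes)] + [values[i] for i in chosen]


data = [random.gauss(0, 1) for _ in range(5000)]
sample = sample_for_breaks(data, seed=42)
print(len(sample), min(sample) == min(data), max(sample) == max(data))
```

<p>Sampling positions rather than values keeps the sample size exact even when the data contain many tied values, such as a column of identical population counts.</p>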
<p>At the moment I’m still looking at ways to optimize my rather basic
implementation. So far I use a relatively simple sampling procedure, but
it might be possible to do something more ‘intelligent’ here to speed
things up while maintaining a statistically representative sample.
Comments are more than welcome. I have also posted the diff file to the
<span class="caps">QGIS</span> developers mailing list for evaluation, and once I tidy up a few
issues I’ll commit it to trunk, making it available in future
versions of <span class="caps">QGIS</span>.</p>
<p>Just to remind you of how important it is to carefully consider the
classification scheme you use when presenting your data, here is a
graphic of the 5 different classification schemes (soon to be) available
in <span class="caps">QGIS</span>. All 5 methods are grouping the same underlying data (2007
population) into the same number of classes (5).</p>
<p><a href="http://carsonfarmer.com/images/class_intervals.png"><img alt="image" src="http://carsonfarmer.com/images/class_intervals-300x118.png" /></a></p>Playing around with classification algorithms: Python and QGIS2010-09-23T10:56:00-04:00cfarmertag:carsonfarmer.com,2010-09-23:2010/09/playing-around-with-classification-algorithms-python-and-qgis/<p>Data visualization is part of my everyday work-flow. More often than
not, I’m playing around with my data in a <span class="caps">GIS</span> to tease out interesting
or informative spatial patterns, or to ensure that I’m getting the
results that I’m expecting. As a result, I am constantly trying out
different classification schemes to help me generalize spatial patterns,
highlight outliers and/or patterns, or just plain <a href="http://www.markmonmonier.com/how_to_lie_with_maps_14880.htm">mess around with my
data</a>.
<!--more--></p>
<p>Unfortunately, <a href="http://www.qgis.org/"><span class="caps">QGIS</span></a> (which has been my primary <span class="caps">GIS</span> for several years
now) only <strike>has</strike> had ‘Equal Interval’ and ‘Quantiles’ classification algorithms
implemented. While these classification schemes are no doubt useful and
revealing when used in the right context, I often need something that
better represents the ‘actual’ distribution of values in my data. For
this, I usually turn to the <a href="http://en.wikipedia.org/wiki/Jenks_Natural_Breaks_Optimization">Jenks Optimisation</a> (or Natural Breaks)
classification. Essentially, this classification algorithm generates
class intervals that minimize the within-class variance (the sum of squared deviations from each class mean) and, equivalently, maximize the
between-class variance. In this way, given a certain number (<code>k</code>) of
classes, we arrive at an ‘optimal’ classification of our data into <code>k</code>
classes. In the past, I would import my data into <a href="http://www.r-project.org/">R</a>, and calculate
class intervals using the very handy <a href="http://cran.r-project.org/web/packages/classInt/index.html">classInt</a> package. However,
moving data between <span class="caps">QGIS</span> and R, while slightly easier using my <a href="http://code.google.com/p/ftools-qgis/">manageR
tool</a> (shameless plug!), is not optimal when all I really want to do
is fiddle around with different classification schemes. So I decided to
reimplement the Jenks algorithm in Python so that I could do things
directly from the Python console in <span class="caps">QGIS</span>.</p>
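<p>To make the ‘minimize within-class variance’ idea concrete, here is a compact Python sketch of the classic dynamic-programming (Fisher) formulation of natural breaks. It is written from the textbook algorithm for illustration and is not taken from classInt or from the script mentioned in this post. It runs in O(k·n²) time, so it is only practical for modest numbers of values.</p>

```python
def jenks_breaks(values, k):
    """Optimal (Fisher-Jenks) natural breaks via dynamic programming.

    Returns k + 1 values: the minimum followed by the upper bound of each
    class, where the classes minimise the total within-class sum of
    squared deviations.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    # Prefix sums give the sum of squared deviations of any run data[i:j] in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):  # sum of squared deviations of data[i:j], with j > i
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total SSD when data[:j] is split into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # the last class is data[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i
    # Walk back through the split points to recover the class boundaries
    breaks = [data[-1]]
    j = n
    for c in range(k, 1, -1):
        i = split[c][j]
        breaks.append(data[i - 1])  # upper bound of the preceding class
        j = i
    breaks.append(data[0])
    return breaks[::-1]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [1, 3, 12, 22]
```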
<p>Obviously I didn’t really want to implement this algorithm from scratch,
so I had a look at the R code from the <code>classIntervals</code> function in the
classInt package (ah open source!), as well as the handy Python script
<a href="http://danieljlewis.org/2010/06/07/jenks-natural-breaks-algorithm-in-python/">from here</a>. Once I had the code in hand, it didn’t take long to have
a nice Python script ready to be run on my data directly from within
<span class="caps">QGIS</span>. While I was at it, I also implemented a few other classification
algorithms to play around with, including ‘Equal Interval’, ‘Quantiles’,
‘Standard Deviation’, and R’s ‘Pretty’ algorithm. For those of you who
don’t know, R’s pretty algorithm basically computes a sequence of about
‘n+1’ equally spaced ‘round’ values which cover the range of our input
data, such that the class breaks are 1, 2 or 5 times a power of 10. The
Python script is <a href="http://carsonfarmer.com/uploads/class_intervals.py">available here</a>, and has a version of the pretty
algorithm based on code from the <a href="https://r-forge.r-project.org/projects/labeling/">labeling</a> package.</p>New online GIS resource2010-07-29T20:59:00-04:00cfarmertag:carsonfarmer.com,2010-07-29:2010/07/new-online-gis-resource/<p>The new Geographic Information Systems Stack Exchange site is now open
to the public!</p>
<p>It’s brand new, and has the potential to be an extremely valuable
resource for <span class="caps">GIS</span> professionals, academics, enthusiasts, and just about
anyone else looking for answers to <span class="caps">GIS</span>-related questions.
Check it out here: <a href="http://gis.stackexchange.com">http://gis.stackexchange.com</a></p>OSM data by country2010-05-13T12:05:00-04:00cfarmertag:carsonfarmer.com,2010-05-13:2010/05/osm-data-by-country/<p>For part of a traffic simulation project I am currently working on, we
need country-wide road network data for Ireland. In the past, getting
decent road network data for an area this large was quite a task (not to
mention expensive and time-consuming); however, with OpenStreetMap we
have access to this type of data instantly, and for free! In order to
download full country coverage all at once, all I had to do was turn to
this <a href="http://download.geofabrik.de/osm/">extremely useful site</a>, which provides links for daily excerpts
of OpenStreetMap data for any country in Europe plus several non-country
regions such as the Alps region, as well as select countries outside of
Europe. It currently also features <a href="http://labs.geofabrik.de/haiti/">special coverage of Haiti</a>.</p>
<p>Now that I have the <span class="caps">OSM</span> data downloaded, it should be relatively easy to
import it into my PostGIS database using <a href="http://pgrouting.postlbs.org/">pgRouting</a> and the
<a href="http://pgrouting.postlbs.org/wiki/tools/osm2pgrouting">osm2pgrouting</a> import tool. More to come on this topic once I get
things working nicely…</p>PostGIS ‘select’ statement as vector layer in QGIS2010-04-27T17:40:00-04:00cfarmertag:carsonfarmer.com,2010-04-27:2010/04/postgis-select-statement-as-vector-layer-in-qgis/<p>Several colleagues of mine have asked whether it is possible to
visualise, in <a href="http://www.qgis.org/"><span class="caps">QGIS</span></a>, the results of a <code>SELECT</code> statement
that returns spatial data from a <a href="http://postgis.refractions.net/">PostGIS</a> database. In other words, can we map the
results of something like:</p>
<div class="highlight"><pre>SELECT id, st_union<span class="o">(</span>the_geom<span class="o">)</span> FROM spatial_table GROUP BY id<span class="p">;</span>
</pre></div>
<p>My usual answer to this in the past has been “not yet…”, but now thanks
to Giuseppe Sucameli and Jürgen E. Fischer, the answer is a resounding
“yes!”. A <a href="https://trac.osgeo.org/qgis/changeset/13340">recent patch</a> to <span class="caps">QGIS</span> trunk now makes custom Postgres
queries possible via the postgres data provider.
<!--more--></p>
<p><strike>Unfortunately there is no user interface implemented to take advantage
of this functionality (yet!)</strike> There is now a plugin available from the
<a href="http://www.faunalia.it/qgis/plugins.xml">Faunalia Python plugin repository</a> called <code>RT Sql Layer</code>, which provides
a <span class="caps">GUI</span> for loading <code>PostGIS</code> <code>SELECT</code> statements as layers, but you can also
access this handy feature via the <code>QGIS</code> <code>Python</code> console:</p>
<div class="highlight"><pre><span class="n">db_conn</span> <span class="o">=</span> <span class="s">"dbname='gis' host=localhost port=5432 user='cfarmer' password='xxxx'"</span>
<span class="n">id_field</span> <span class="o">=</span> <span class="s">"id"</span>
<span class="n">table</span> <span class="o">=</span> <span class="s">"(select id, st_union(the_geom) as the_geom from spatial_table group by id)"</span>
<span class="n">uri</span> <span class="o">=</span> <span class="s">"</span><span class="si">%s</span><span class="s"> key=</span><span class="si">%s</span><span class="s"> table=</span><span class="si">%s</span><span class="s"> (the_geom) sql="</span> <span class="o">%</span> <span class="p">(</span><span class="n">db_conn</span><span class="p">,</span><span class="n">id_field</span><span class="p">,</span><span class="n">table</span><span class="p">,)</span>
<span class="n">layer</span> <span class="o">=</span> <span class="n">QgsVectorLayer</span><span class="p">(</span><span class="n">uri</span><span class="p">,</span> <span class="s">"testlayer"</span><span class="p">,</span> <span class="s">"postgres"</span><span class="p">)</span>
</pre></div>
<p>We can then add the layer to the map canvas via:</p>
<div class="highlight"><pre><span class="n">QgsMapLayerRegistry</span><span class="o">.</span><span class="n">instance</span><span class="p">()</span><span class="o">.</span><span class="n">addMapLayer</span><span class="p">(</span><span class="n">layer</span><span class="p">)</span>
</pre></div>
<p>and even query/measure it via something like:</p>
<div class="highlight"><pre><span class="n">provider</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">dataProvider</span><span class="p">()</span>
<span class="n">feat</span> <span class="o">=</span> <span class="n">QgsFeature</span><span class="p">()</span>
<span class="n">provider</span><span class="o">.</span><span class="n">select</span><span class="p">([],</span> <span class="n">QgsRectangle</span><span class="p">())</span>
<span class="n">provider</span><span class="o">.</span><span class="n">nextFeature</span><span class="p">(</span><span class="n">feat</span><span class="p">)</span>
<span class="n">dist</span> <span class="o">=</span> <span class="n">QgsDistanceArea</span><span class="p">()</span>
<span class="n">dist</span><span class="o">.</span><span class="n">measure</span><span class="p">(</span><span class="n">feat</span><span class="o">.</span><span class="n">geometry</span><span class="p">())</span>
</pre></div>
<p>Just another one of the many new features being added to <span class="caps">QGIS</span> every day!</p>Parallel bootstrapping with R2010-04-21T15:34:00-04:00cfarmertag:carsonfarmer.com,2010-04-21:2010/04/parallel-bootstrapping-with-r/<p>In a <a href="http://carsonfarmer.com/2009/10/community-structure-in-directed-weighted-networks/">recent post</a>, I mentioned that I was testing the stability of
clusters generated from a modified network partitioning algorithm using
bootstrap resampling techniques. I also mentioned that I was doing this
in R, using the very nice <a href="http://cran.r-project.org/web/packages/foreach/index.html">foreach</a> package published by <a href="http://www.revolution-computing.com/">REvolution
Computing</a>. To show just how nice this package is, below is a minimal
example of bootstrapping a network partitioning algorithm which takes
advantage of a multicore processor:
<!--more--></p>
<div class="highlight"><pre><span class="kn">library</span><span class="p">(</span>doMC<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>foreach<span class="p">)</span>
<span class="kn">library</span><span class="p">(</span>igraph<span class="p">)</span>
<span class="c1"># Jaccard coefficient function (taken from package fpc)</span>
clujaccard <span class="o">=</span> <span class="kr">function</span> <span class="p">(</span>c1<span class="p">,</span> c2<span class="p">,</span> zerobyzero <span class="o">=</span> <span class="kc">NA</span><span class="p">)</span> <span class="p">{</span>
<span class="kr">if</span> <span class="p">(</span><span class="kp">sum</span><span class="p">(</span>c1<span class="p">)</span> <span class="o">+</span> <span class="kp">sum</span><span class="p">(</span>c2<span class="p">)</span> <span class="o">-</span> <span class="kp">sum</span><span class="p">(</span>c1 <span class="o">&</span> c2<span class="p">)</span> <span class="o">==</span> <span class="m">0</span><span class="p">)</span>
out <span class="o">=</span> zerobyzero
<span class="kp">else</span>
out <span class="o">=</span> <span class="kp">sum</span><span class="p">(</span>c1 <span class="o">&</span> c2<span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="kp">sum</span><span class="p">(</span>c1<span class="p">)</span> <span class="o">+</span> <span class="kp">sum</span><span class="p">(</span>c2<span class="p">)</span> <span class="o">-</span> <span class="kp">sum</span><span class="p">(</span>c1 <span class="o">&</span> c2<span class="p">))</span>
<span class="kr">return</span><span class="p">(</span>out<span class="p">)</span>
<span class="p">}</span>
registerDoMC<span class="p">()</span> <span class="c1"># registers the parallel backend</span>
B <span class="o">=</span> <span class="m">1000</span> <span class="c1"># number of bootstrap replicates to create</span>
<span class="kp">load</span><span class="p">(</span><span class="s">"igraph_network.Rdata"</span><span class="p">)</span> <span class="c1"># load a previously saved network (network name: g)</span>
fg <span class="o">=</span> fastgreedy.community<span class="p">(</span>g<span class="p">)</span> <span class="c1"># compute original clustering</span>
mm <span class="o">=</span> <span class="kp">which.max</span><span class="p">(</span>fg<span class="o">$</span>modularity<span class="p">)</span> <span class="o">-</span> <span class="m">1</span> <span class="c1"># merge steps at max modularity (modularity[1] is before any merges)</span>
moc <span class="o">=</span> community.to.membership<span class="p">(</span>g<span class="p">,</span> fg<span class="o">$</span>merges<span class="p">,</span> mm<span class="p">)</span><span class="o">$</span>membership <span class="c1"># get membership</span>
noc <span class="o">=</span> <span class="kp">length</span><span class="p">(</span><span class="kp">unique</span><span class="p">(</span>moc<span class="p">))</span> <span class="c1"># count the number of original clusters</span>
bg <span class="o">=</span> g <span class="c1"># make a copy of g for bootstrapping</span>
clusters <span class="o">=</span> foreach<span class="p">(</span>i<span class="o">=</span><span class="kp">seq</span><span class="p">(</span>B<span class="p">),</span> <span class="m">.</span>combine<span class="o">=</span><span class="kp">cbind</span><span class="p">)</span> <span class="o">%dopar%</span> <span class="p">{</span>
E<span class="p">(</span>bg<span class="p">)</span><span class="o">$</span>weight <span class="o">=</span> <span class="kp">sample</span><span class="p">(</span>E<span class="p">(</span>g<span class="p">)</span><span class="o">$</span>weight<span class="p">,</span> replace<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span> <span class="c1"># resample the edge weights</span>
fg <span class="o">=</span> fastgreedy.community<span class="p">(</span>bg<span class="p">)</span> <span class="c1"># compute bootstrap clustering</span>
mm <span class="o">=</span> <span class="kp">which.max</span><span class="p">(</span>fg<span class="o">$</span>modularity<span class="p">)</span> <span class="o">-</span> <span class="m">1</span> <span class="c1"># merge steps at max modularity (modularity[1] is before any merges)</span>
mbc <span class="o">=</span> community.to.membership<span class="p">(</span>bg<span class="p">,</span> fg<span class="o">$</span>merges<span class="p">,</span> mm<span class="p">)</span><span class="o">$</span>membership <span class="c1"># get membership</span>
nbc <span class="o">=</span> <span class="kp">length</span><span class="p">(</span><span class="kp">unique</span><span class="p">(</span>mbc<span class="p">))</span> <span class="c1"># count the number of new clusters</span>
bootresult <span class="o">=</span> <span class="kt">c</span><span class="p">()</span>
<span class="kr">for</span> <span class="p">(</span>j <span class="kr">in</span> <span class="kp">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> noc<span class="m">-1</span><span class="p">))</span> <span class="p">{</span> <span class="c1"># for each of the original clusters...</span>
maxgamma <span class="o">=</span> <span class="m">0</span>
<span class="kr">if</span> <span class="p">(</span>nbc <span class="o">></span> <span class="m">0</span><span class="p">)</span> <span class="p">{</span>
<span class="kr">for</span> <span class="p">(</span>k <span class="kr">in</span> <span class="kp">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> nbc<span class="m">-1</span><span class="p">))</span> <span class="p">{</span> <span class="c1"># for each of the new clusters...</span>
bv <span class="o">=</span> <span class="kp">as.vector</span><span class="p">(</span>mbc <span class="o">==</span> k<span class="p">)</span>
ov <span class="o">=</span> <span class="kp">as.vector</span><span class="p">(</span>moc <span class="o">==</span> j<span class="p">)</span>
jc <span class="o">=</span> clujaccard<span class="p">(</span>ov<span class="p">,</span> bv<span class="p">,</span> zerobyzero<span class="o">=</span><span class="m">0</span><span class="p">)</span>
<span class="kr">if</span> <span class="p">(</span>jc <span class="o">></span> maxgamma<span class="p">)</span> <span class="c1"># if these two clusters are most similar...</span>
maxgamma <span class="o">=</span> jc
<span class="p">}</span>
<span class="p">}</span>
bootresult <span class="o">=</span> <span class="kt">c</span><span class="p">(</span>bootresult<span class="p">,</span> maxgamma<span class="p">)</span> <span class="c1"># combine results</span>
<span class="p">}</span>
bootresult <span class="c1"># value of this iteration (cbind-ed with the rest)</span>
<span class="p">}</span>
bootmean <span class="o">=</span> <span class="kp">apply</span><span class="p">(</span>clusters<span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="kp">mean</span><span class="p">)</span> <span class="c1"># mean Jaccard coefficient for each cluster</span>
</pre></div>
<p>The above example might not produce great results, as it simply
resamples (with replacement) the weights of all the network edges, and
therefore a more sophisticated resampling regime might be warranted.
Having said that, it’s quite a useful example, and as you can see, the
only ‘extra’ bits required to make this run on multiple cores are the
<code>registerDoMC()</code> command, which registers the parallel backend
(using the multicore package), and the <code>foreach ... %dopar%</code> construct, which tells <code>R</code>
to run the loop iterations in parallel. I ran a similar analysis using a different
community detection algorithm on a computer with 4 cores, and was
(finally) able to take full advantage of my processing power:</p>
<p><a href="http://carsonfarmer.com/images/foreachcpu.png"><img alt="cpu_usage" src="http://carsonfarmer.com/images/foreachcpu-300x114.png" title="foreachcpu" /></a></p>Bootstrapping network partitioning methods2010-04-17T14:26:00-04:00cfarmertag:carsonfarmer.com,2010-04-17:2010/04/bootstrapping-network-partitioning-methods/<p>My PhD research at the moment focuses on network-based algorithms for
delineating functional regions (geographical regions within which a
large majority of the local population seeks employment, and the
majority of local employers recruit their labour). Currently I’m using a
network partitioning algorithm based on <a href="http://en.wikipedia.org/wiki/Modularity_(networks)">modularity maximisation</a>. I
have found my results to be quite good so far, but ‘quite good’ isn’t
really a very scientific description of validity, so some
other means of validation is required. Enter bootstrap resampling!
<!--more--></p>
<p>Bootstrapping can be used to assess the <strong>validity</strong> of a
particular network partitioning by measuring the <strong>stability</strong> of the
detected partitions (or clusters). Here, a cluster may be thought of as
stable if, for example, it remains relatively invariant to random- or
sampling-error and noise. In this sense, we’re interested in
distinguishing between clusters which reflect the true nature of the
dataset, and those generated as a result of random effects, data
uncertainties, or measurement error.</p>
<p>The process works like this:</p>
<ol>
<li>Generate a large number of random ‘bootstrap samples’ from a
(directed) weighted network,</li>
<li>Apply some network partitioning algorithm to the original network,</li>
<li>(Re)apply the network partitioning algorithm to each bootstrap sample,</li>
<li>For each cluster in the original network partitioning, the most
similar cluster in each bootstrap replicate is found using the
<a href="http://en.wikipedia.org/wiki/Jaccard_index">Jaccard coefficient</a> <code>γ</code> as a measure of similarity, and
this similarity is recorded,</li>
<li>The stability of each cluster is assessed based on the mean Jaccard
similarity over all resampled datasets.</li>
</ol>
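<p>Steps 4 and 5 can be sketched in a few lines of Python. This sketch assumes the partitioning has already been run on the original network and on each bootstrap sample. It also assumes each partition is given as a list of cluster labels, one per node. The function names are hypothetical, and comparing two empty sets is treated as a Jaccard value of 0.</p>

```python
def jaccard(a, b):
    """Jaccard coefficient: size of intersection over size of union (0 if both empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def clusters_of(membership):
    """Convert a membership list (node i -> label) into {label: set of nodes}."""
    groups = {}
    for node, label in enumerate(membership):
        groups.setdefault(label, set()).add(node)
    return groups


def cluster_stability(original, replicates):
    """Mean over replicates of the best Jaccard match for each original cluster."""
    orig_clusters = clusters_of(original)
    totals = {label: 0.0 for label in orig_clusters}
    for membership in replicates:
        boot_clusters = list(clusters_of(membership).values())
        for label, nodes in orig_clusters.items():
            # Step 4: similarity to the most similar cluster in this replicate
            totals[label] += max(jaccard(nodes, b) for b in boot_clusters)
    # Step 5: average over all resampled datasets
    return {label: total / len(replicates) for label, total in totals.items()}


original = [0, 0, 0, 1, 1, 1]
replicates = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]]
print(cluster_stability(original, replicates))
```

<p>Cluster labels only need to be consistent within a single partition. Matching is done on node sets, so the bootstrap partitions can number their clusters however they like.</p>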
<p>Once the above process is run, we get an estimate of how stable each
cluster is. We can then use this information to decide which clusters to
keep, and which ones need to be merged with their closest neighbour.
There are several ways to specify how we resample the data. If we assume
no specific structure in the dataset, regular non-parametric bootstrap
resampling will work fine; however, alternative resampling strategies
include: a) replacing network edge weights with noise, b) adding a small
amount of noise to (a percentage of) the network edges, or c) using only
a subset of the original network (i.e., generating a subgraph of the
original network).</p>
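<p>The three alternative strategies might look something like the following Python sketch. It assumes the network is stored as a dictionary mapping <code>(from, to)</code> edges to non-negative weights. The specific noise models are assumptions for illustration only: uniform noise for (a), and relative Gaussian noise clipped at zero for (b). All parameter names are assumptions too.</p>

```python
import random


def replace_with_noise(weights, rng, scale=1.0):
    """(a) Replace every edge weight with pure noise, uniform on [0, scale]."""
    return {edge: rng.uniform(0, scale) for edge in weights}


def perturb_fraction(weights, rng, k=0.1, sd=0.1):
    """(b) Add Gaussian noise (sd relative to the weight) to a random fraction k of edges."""
    edges = list(weights)
    chosen = set(rng.sample(edges, round(k * len(edges))))
    return {
        edge: max(0.0, w + rng.gauss(0, sd * w)) if edge in chosen else w
        for edge, w in weights.items()
    }


def subgraph(weights, rng, keep=0.8):
    """(c) Keep a random fraction of the nodes and only the edges among them."""
    nodes = sorted({n for edge in weights for n in edge})
    kept = set(rng.sample(nodes, round(keep * len(nodes))))
    return {(u, v): w for (u, v), w in weights.items() if u in kept and v in kept}


rng = random.Random(1)
weights = {(0, 1): 5.0, (1, 2): 3.0, (2, 0): 4.0, (2, 3): 1.0}
print(perturb_fraction(weights, rng, k=0.5))
```

<p>Calling <code>perturb_fraction</code> for increasing values of <code>k</code> is one way to increase the level of uncertainty step by step.</p>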
<p>I tested this process on a computer-generated network with three
predefined clusters using resampling strategy (<em><code>b</code></em>) above, by adding
random noise to <em><code>k</code></em> percent of the network edges, and observed the
effect of increasing levels of uncertainty by applying the resampling
technique to increasing values of <em><code>k</code></em>. The results show just what we
would expect: as more noise is added to the dataset, the stability of
the detected clusters goes down. The nice bit, however, is that for
<em><code>k <= 0.5</code></em>, the detected clusters remained relatively stable
(<em><code>γ >= 0.6</code></em>), meaning the network partitioning algorithm I was using
is doing a pretty good job. Nice!</p>
<p>This bootstrapping process is part of a paper I’m working on at the
moment, and uses a geographical variant of <a href="http://carsonfarmer.com/2009/10/community-structure-in-directed-weighted-networks/">this algorithm</a> to detect
functional regions in travel to work data. I’ll post more on the
algorithm and my bootstrapping implementation in R (using the very cool
<a href="http://cran.r-project.org/web/packages/foreach/index.html">foreach</a> package) here soon.</p>
<h3>References</h3>
<p>Leicht, E. A., <span class="amp">&</span> Newman, <span class="caps">M. E. J.</span> (2008). <a href="http://prl.aps.org/abstract/PRL/v100/i11/e118703">Community structure in
directed networks</a>. <em>Physical Review Letters</em>, 100(11), 118703.</p>
<p>Hennig, C. (2007). <a href="http://www.sciencedirect.com/science/article/B6V8V-4MJJMV8-1/2/303f8dd772cd73d54aea3a224b188005">Cluster-wise assessment of cluster stability</a>.
<em>Computational Statistics <span class="amp">&</span> Data Analysis</em>, 52(1), 258-271.</p>Why I’m *not* going to use Mendeley2010-04-14T23:57:00-04:00cfarmertag:carsonfarmer.com,2010-04-14:2010/04/why-im-not-going-to-use-mendeley/<p>Besides the obvious: “It’s not open source!”, I’m also not making the
switch from <a href="http://www.zotero.org/">Zotero</a> to <a href="http://www.mendeley.com/">Mendeley</a> for my academic reference
management needs due to the answer to this question on the Mendeley <span class="caps">FAQ</span> page:</p>
<blockquote>
<p><strong>Is Mendeley free?</strong></p>
<p>The straight answer would be yes and no. Yes, it’s free, because:
Everything you get when you sign up to Mendeley is completely free and
will always remain free - including the features described in <a href="http://www.mendeley.com/faq/#what-is-mendeley">What is Mendeley?</a></p>
<p>No, it’s not <em>completely</em> free, because: At a later point in time, we
will expand upon the existing features and introduce additional ones
for professional users —- these will be available for a (very
reasonable) fee.</p>
</blockquote>
<p>It’s also not <em>free</em> as in <em>freedom</em>…</p>Speeding up geoprocessing in QGIS2010-04-01T14:55:00-04:00cfarmertag:carsonfarmer.com,2010-04-01:2010/04/speeding-up-geoprocessing-in-qgis/<p>Last night I had an uncontrollable urge to make geoprocessing in <span class="caps">QGIS</span>
better, faster and more fun! I had come across a couple of posts
(<a href="http://lin-ear-th-inking.blogspot.com/search?q=cascaded">here</a>, <a href="http://blog.cleverelephant.ca/2009/01/must-faster-unions-in-postgis-14.html">here</a>) on the idea of a cascaded union operation, and
since it has recently been added to <a href="http://trac.osgeo.org/geos/"><span class="caps">GEOS</span></a> (which <span class="caps">QGIS</span> uses for its
geometry operations), I thought I’d give a much-needed boost to the
fTools union tool and related functions.
<!--more--></p>
<p>This required a bit of mucking about with <a href="http://doc.qgis.org/stable/classQgsGeometry.html">QgsGeometry</a>, but in the
end it really didn’t take too much hacking to get things working
properly. I was able to add a <code>combineCascaded(QList<QgsGeometry*>)</code>
function to the <code>QgsGeometry</code> class, as well as the required Python
bindings. Basically what this function does is union small subsets of
the input layer, then union groups of the resulting features, and so on
recursively until the final union of all features in the input list is
computed. There is a nice explanation of the algorithm straight from the
horse’s mouth <a href="http://lin-ear-th-inking.blogspot.com/search?q=cascaded">here</a>.</p>
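<p>To illustrate the idea (this is not the <span class="caps">GEOS</span> code itself), here is a minimal Python sketch of a cascaded union, with 1-D intervals standing in for polygons so it runs without any <span class="caps">GIS</span> libraries. The names <code>merge_intervals</code>, <code>cascaded_union</code> and <code>sequential_union</code> are hypothetical. One simplification: this version just splits the input list in half at each level, whereas, as I understand it, the <span class="caps">GEOS</span>/<span class="caps">JTS</span> implementation first groups geometries by spatial proximity using an <span class="caps">STR</span>-tree, so that nearby features get unioned together early.</p>

```python
from functools import reduce

def merge_intervals(a, b):
    """Union of two lists of disjoint (start, end) intervals.
    Plays the role of a geometry union: 1-D intervals instead of polygons."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            # overlaps or touches the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def cascaded_union(geoms, union=merge_intervals):
    """Union in a balanced binary tree: small groups first, then unions
    of the partial results, recursively, until one result remains."""
    if not geoms:
        return []
    if len(geoms) == 1:
        return geoms[0]
    mid = len(geoms) // 2
    return union(cascaded_union(geoms[:mid], union),
                 cascaded_union(geoms[mid:], union))

def sequential_union(geoms, union=merge_intervals):
    """The 'one by one' approach: fold each geometry into a growing result."""
    return reduce(union, geoms, [])

pieces = [[(0, 2)], [(1, 3)], [(5, 6)], [(6, 8)]]
print(cascaded_union(pieces))    # [(0, 3), (5, 8)]
print(sequential_union(pieces))  # [(0, 3), (5, 8)]
```

<p>Both give the same answer; the difference is that in the sequential version every step unions against an ever-growing result, while the tree keeps most intermediate unions small, which is where the speedup comes from with real polygons.</p>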
<p>I haven’t yet committed these additions, as I’m not quite sure I like
how I’ve done things, but just to prove how much faster things can be,
here is a quick little demo that can be run from the built-in <span class="caps">QGIS</span>
Python console:</p>
<div class="highlight"><pre><span class="kn">import</span> <span class="nn">time</span>
<span class="n">canvas</span> <span class="o">=</span> <span class="n">qgis</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">iface</span><span class="o">.</span><span class="n">mapCanvas</span><span class="p">()</span>
<span class="n">layer</span> <span class="o">=</span> <span class="n">canvas</span><span class="o">.</span><span class="n">layer</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c"># assumes target layer is first in layer list</span>
<span class="n">provider</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">dataProvider</span><span class="p">()</span>
<span class="n">attrs</span> <span class="o">=</span> <span class="n">provider</span><span class="o">.</span><span class="n">attributeIndexes</span><span class="p">()</span>
<span class="n">provider</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="n">attrs</span><span class="p">)</span>
<span class="n">geoms</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">feat</span> <span class="o">=</span> <span class="n">QgsFeature</span><span class="p">()</span>
<span class="c"># get a list of all the feature geometries in the layer</span>
<span class="k">while</span><span class="p">(</span><span class="n">provider</span><span class="o">.</span><span class="n">nextFeature</span><span class="p">(</span><span class="n">feat</span><span class="p">)):</span>
    <span class="n">geom</span> <span class="o">=</span> <span class="n">QgsGeometry</span><span class="p">(</span><span class="n">feat</span><span class="o">.</span><span class="n">geometry</span><span class="p">())</span>
    <span class="n">geoms</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">geom</span><span class="p">)</span>
</pre></div>
<p>First, using the current method, which adds all the geometries together one by one:</p>
<div class="highlight"><pre><span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">regular_geom</span> <span class="o">=</span> <span class="n">geom</span> <span class="c"># start with the last geometry in the layer</span>
<span class="k">for</span> <span class="n">geometry</span> <span class="ow">in</span> <span class="n">geoms</span><span class="p">:</span>
    <span class="n">regular_geom</span> <span class="o">=</span> <span class="n">QgsGeometry</span><span class="p">(</span><span class="n">regular_geom</span><span class="o">.</span><span class="n">combine</span><span class="p">(</span><span class="n">geometry</span><span class="p">))</span>
<span class="n">end</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">total</span> <span class="o">=</span> <span class="n">end</span><span class="o">-</span><span class="n">start</span>
<span class="k">print</span><span class="p">(</span><span class="n">total</span><span class="p">)</span>
</pre></div>
<p>Secondly, using cascaded union, which unions small groups of geometries and then
combines the partial results, rather than adding geometries one at a time. It also requires fewer lines of code!</p>
<div class="highlight"><pre><span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">cascaded_geom</span> <span class="o">=</span> <span class="n">geom</span><span class="o">.</span><span class="n">combineCascaded</span><span class="p">(</span><span class="n">geoms</span><span class="p">)</span>
<span class="n">end</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">total</span> <span class="o">=</span> <span class="n">end</span><span class="o">-</span><span class="n">start</span>
<span class="k">print</span><span class="p">(</span><span class="n">total</span><span class="p">)</span>
</pre></div>
<p>When I tested this last night on the <code>grassland.shp</code> layer from the
<a href="http://www.qgis.org/en/download/sample-data.html"><span class="caps">QGIS</span> sample dataset</a> the results were about <strong>86.89</strong> seconds for the
‘old’ method, and <strong>6.14</strong> seconds for the cascaded union method. That’s
a <strong>14.15</strong> times speedup on a relatively small layer (about 143
features of varying complexity)! I’ve tested the function on both
polygons and lines so far, and it appears to work quite nicely.
Eventually I’ll add this to the official <span class="caps">QGIS</span> <span class="caps">API</span> so that others can
take advantage of the speedup. Additionally, the other fTools functions
which rely to some degree on unioning will also benefit from the extra
speed, which is always a good thing.</p>QGIS developer meeting update2009-11-11T17:33:00-05:00cfarmertag:carsonfarmer.com,2009-11-11:2009/11/qgis-developer-meeting-update/<p>Last week I attended the 2009 <span class="caps">QGIS</span> Developers Meeting in Vienna,
Austria. We all had a really good time, <a href="http://blog.qgis.org/node/139">met many new people</a>, and
actually got a lot done in the process. There have been updates about
the meeting <a href="http://qgis.org/en/developer-meeting.html">(hackfest)</a> on the <span class="caps">QGIS</span> blog, and Tim Sutton has <a href="http://linfiniti.com/2009/11/report-back-on-the-qgis-hackfest-in-vienna-november-2009/">written
a few words</a> about our progress as well. I’m not going to repeat what
others have said, but I <em>would</em> like to give a quick update on the work
that I was doing at the meeting, and show off the new geoprocessing
features now available to all <span class="caps">QGIS</span> developers (Python and C++).
<!--more--></p>
<p>My main goal for the meeting was to start/continue work on the new
‘Analysis Library’ for <span class="caps">QGIS</span>. Basically, this was intended to be a port
(to C++ from Python) of the <a href="http://www.ftools.ca/">fTools</a> suite of tools already available
in <span class="caps">QGIS</span>. These tools currently provide geoprocessing, geometry, and
various other analysis functionality natively within <span class="caps">QGIS</span>. However, the
Python implementation is simply a Python plugin, and so does not provide
these functions to other developers hoping to use or add geoprocessing
capabilities to their own plugins or tools. In addition, in some cases
the Python fTools functions can be quite slow, and would benefit greatly
from the speed-ups afforded by a compiled language like C++.</p>
<p>Porting these functions to C++ is still underway, and so far we have
implementations for buffering, dissolving, centroids, convex hulls,
layer extents in the QgsGeometryAnalyzer class, and intersections in the
QgsOverlayAnalyzer class. We have set it up so that it should be
relatively easy to add functions and classes to the library in the
future, though I expect there will be many changes to the library along
the way. We have also created Python bindings for the library, so these
functions will also be available to Python developers. By doing so, it
is now extremely easy to perform geoprocessing functions directly from
the <span class="caps">QGIS</span> Python console, or from <span class="caps">QGIS</span> Python plugins, with only a few
lines of code.</p>
<h3>The old way</h3>
<p>In the past, Python developers had to operate directly on the
geometries of the layers in order to do any sort of geoprocessing. For
example, to perform a (very) simple buffer, one had to use the following
code in the <span class="caps">QGIS</span> Python console:</p>
<div class="highlight"><pre><span class="kn">from</span> <span class="nn">qgis.core</span> <span class="kn">import</span> <span class="n">QgsFeature</span><span class="p">,</span> <span class="n">QgsRectangle</span><span class="p">,</span> <span class="n">QgsVectorFileWriter</span><span class="p">,</span> <span class="n">QGis</span>
<span class="kn">from</span> <span class="nn">qgis.utils</span> <span class="kn">import</span> <span class="n">iface</span>
<span class="n">mc</span> <span class="o">=</span> <span class="n">iface</span><span class="o">.</span><span class="n">mapCanvas</span><span class="p">()</span> <span class="c"># get a reference to the map canvas</span>
<span class="n">layer</span> <span class="o">=</span> <span class="n">mc</span><span class="o">.</span><span class="n">layer</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c"># get a reference to the first layer in the layer list</span>
<span class="n">provider</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">dataProvider</span><span class="p">()</span> <span class="c"># data provider for the layer</span>
<span class="n">provider</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="n">layer</span><span class="o">.</span><span class="n">pendingAllAttributesList</span><span class="p">(),</span> <span class="n">QgsRectangle</span><span class="p">(),</span> <span class="bp">True</span><span class="p">,</span> <span class="bp">True</span><span class="p">)</span> <span class="c"># select features to buffer</span>
<span class="n">in_feat</span> <span class="o">=</span> <span class="n">QgsFeature</span><span class="p">()</span> <span class="c"># empty input feature</span>
<span class="n">out_feat</span> <span class="o">=</span> <span class="n">QgsFeature</span><span class="p">()</span> <span class="c"># empty output feature</span>
<span class="n">writer</span> <span class="o">=</span> <span class="n">QgsVectorFileWriter</span><span class="p">(</span> <span class="s">"output_path.shp"</span><span class="p">,</span> <span class="n">provider</span><span class="o">.</span><span class="n">encoding</span><span class="p">(),</span> <span class="n">provider</span><span class="o">.</span><span class="n">fields</span><span class="p">(),</span>
    <span class="n">QGis</span><span class="o">.</span><span class="n">WKBPolygon</span><span class="p">,</span> <span class="n">provider</span><span class="o">.</span><span class="n">crs</span><span class="p">()</span> <span class="p">)</span> <span class="c"># use this to write results to disk (as shapefile)</span>
<span class="k">while</span><span class="p">(</span><span class="n">provider</span><span class="o">.</span><span class="n">nextFeature</span><span class="p">(</span><span class="n">in_feat</span><span class="p">)):</span> <span class="c"># for each feature that we selected...</span>
    <span class="n">geometry</span> <span class="o">=</span> <span class="n">in_feat</span><span class="o">.</span><span class="n">geometry</span><span class="p">()</span> <span class="c"># grab its geometry</span>
    <span class="n">buffered</span> <span class="o">=</span> <span class="n">geometry</span><span class="o">.</span><span class="n">buffer</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span> <span class="c"># buffer the geometry</span>
    <span class="n">out_feat</span><span class="o">.</span><span class="n">setAttributeMap</span><span class="p">(</span><span class="n">in_feat</span><span class="o">.</span><span class="n">attributeMap</span><span class="p">())</span> <span class="c"># set the attributes for the output feature</span>
    <span class="n">out_feat</span><span class="o">.</span><span class="n">setGeometry</span><span class="p">(</span><span class="n">buffered</span><span class="p">)</span> <span class="c"># set the buffer as the output geometry</span>
    <span class="n">writer</span><span class="o">.</span><span class="n">addFeature</span><span class="p">(</span><span class="n">out_feat</span><span class="p">)</span> <span class="c"># write the feature to file</span>
<span class="k">del</span> <span class="n">writer</span> <span class="c"># delete/close the writer to save to disk</span>
</pre></div>
<h3>The new way</h3>
<p>With the new <span class="caps">QGIS</span> Analysis Library, things are much simpler, and the
same (or more complex) buffering example can be completed with only 5
lines of code in the <span class="caps">QGIS</span> Python Console:</p>
<div class="highlight"><pre><span class="kn">from</span> <span class="nn">qgis.utils</span> <span class="kn">import</span> <span class="n">iface</span> <span class="c"># import iface (interface)</span>
<span class="kn">from</span> <span class="nn">qgis.analysis</span> <span class="kn">import</span> <span class="n">QgsGeometryAnalyzer</span> <span class="c"># import (part of) the analysis library</span>
<span class="n">mc</span> <span class="o">=</span> <span class="n">iface</span><span class="o">.</span><span class="n">mapCanvas</span><span class="p">()</span> <span class="c"># get a reference to the map canvas</span>
<span class="n">layer</span> <span class="o">=</span> <span class="n">mc</span><span class="o">.</span><span class="n">layer</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c"># get a reference to the first layer in the layer list</span>
<span class="n">QgsGeometryAnalyzer</span><span class="p">()</span><span class="o">.</span><span class="n">buffer</span><span class="p">(</span><span class="n">layer</span><span class="p">,</span> <span class="s">"output_path.shp"</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="bp">False</span><span class="p">,</span> <span class="bp">False</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="c"># perform the buffer</span>
</pre></div>
<p>Note: only the first two parameters in the buffer function are really
required, and the additional parameters control the buffer distance,
whether to use a subset (selected features) of the layer, whether to
dissolve the output buffer regions, and whether to use a field (field
<span class="caps">ID</span>) from the input layer’s attribute table as the buffer distance.</p>
<p>Pretty nice, eh? For now, the <span class="caps">QGIS</span> Analysis Library is only available in
Trunk; however, before long these tools should be available as part of
the official releases, so I hope we will start to see more plugins
taking advantage of these new capabilities in the future…</p>Community structure in directed, weighted networks2009-10-20T16:24:00-04:00cfarmertag:carsonfarmer.com,2009-10-20:2009/10/community-structure-in-directed-weighted-networks/<p>Many natural and human systems can be represented as networks, including
the Internet, social interactions, food webs, and transportation and
communication flows. One thing that these types of networks have in
common is that they can each be represented as a series of vertices (or
nodes) and edges (or links). This <a href="http://toreopsahl.com/2008/11/28/network-weighted-network/">blog entry</a> presents a nice
description of networks, highlighting the differences between various
network types (directed, undirected, weighted, unweighted, etc.).
<!--more--></p>
<p>According to <a href="http://arxiv.org/abs/0709.4500">this paper</a>, many
networks are found to display “community structure”, which basically
refers to groupings of vertices where <em>within</em>-group edge connections
are more dense than <em>between</em>-group edge connections. In order to detect
and delineate these groupings, <a href="http://arxiv.org/abs/0709.4500">Leicht <span class="amp">&</span> Newman (2008)</a> present a nice “modularity” optimisation algorithm which is
designed to find a “good” division of a network by maximising</p>
<div class="math">$$Q = \frac{1}{2m}s^T\mathbf{B}s,$$</div>
<p>where <span class="math">\(s\)</span> is a vector whose elements <span class="math">\(s_i = \pm 1\)</span> indicate which of the two groups each node
belongs to, and <span class="math">\(\mathbf{B}\)</span> is the so-called modularity matrix, with elements</p>
<div class="math">$$B_{ij} = A_{ij} - \frac{k_{i}^{in} k_{j}^{out}}{m},$$</div>
<p>where <span class="math">\(A_{ij}\)</span> is an element in the adjacency matrix <span class="math">\(\mathbf{A}\)</span>, <span class="math">\(k_{i}^{in}\)</span>
and <span class="math">\(k_{j}^{out}\)</span> are the in- and out-degrees of the vertices, and <span class="math">\(m\)</span> is
the total number of edges in the network. Because <span class="math">\(\mathbf{B}\)</span> is not symmetric for a directed
network, in practice the algorithm works with the symmetric matrix <span class="math">\(\mathbf{B} + \mathbf{B}^T\)</span> (for an
explanation of why this is the case, see <a href="http://arxiv.org/abs/0709.4500">Leicht <span class="amp">&</span> Newman</a>).</p>
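<p>A short derivation shows how the two-group form of <span class="math">\(Q\)</span> relates to the symmetric matrix. This is a sketch assuming a two-way split with <span class="math">\(s_i = \pm 1\)</span> and Leicht <span class="amp">&</span> Newman's definition <span class="math">\(Q = \frac{1}{m}\sum_{ij} B_{ij}\,\delta_{g_i g_j}\)</span>, where <span class="math">\(g_i\)</span> is the group of vertex <span class="math">\(i\)</span>, so that <span class="math">\(\delta_{g_i g_j} = \tfrac{1}{2}(s_i s_j + 1)\)</span>:</p>
<div class="math">$$\begin{aligned} Q &= \frac{1}{m}\sum_{ij} B_{ij}\,\tfrac{1}{2}(s_i s_j + 1) = \frac{1}{2m}s^T\mathbf{B}s + \frac{1}{2m}\sum_{ij} B_{ij} \\ &= \frac{1}{2m}s^T\mathbf{B}s = \frac{1}{4m}s^T(\mathbf{B}+\mathbf{B}^T)s, \end{aligned}$$</div>
<p>since <span class="math">\(\sum_{ij} B_{ij} = m - \frac{1}{m}\sum_i k_i^{in}\sum_j k_j^{out} = m - m = 0\)</span>, and <span class="math">\(s^T\mathbf{B}s = s^T\mathbf{B}^T s\)</span> because a scalar equals its own transpose.</p>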
<p>It is relatively straightforward to extend the above modularity
optimisation algorithm to the case of a weighted network by computing
the modularity matrix using the in- and out-<em>strength</em> (see link to blog
post above) of the vertices instead of the degree. This is similar to
the concept presented in <a href="http://arxiv.org/abs/cond-mat/0407503">Newman (2004)</a>, and indeed the
theory of the modularity algorithm holds for this more general case
(note that an unweighted network can simply be represented as a weighted
network where the edge weights are all set to 1). As such, our new
modularity matrix can be computed as</p>
<div class="math">$$B_{ij} = A_{ij} - \frac{s_{i}^{in} s_{j}^{out}}{m},$$</div>
<p>where <span class="math">\(m = \sum_{i}s_{i}^{in} = \sum_{j} s_j^{out}\)</span>, and <span class="math">\(s\)</span> represents the vertex
strength. As such, using the above <em>new</em> definition of <span class="math">\(\mathbf{B}\)</span>, the
modularity of a directed, weighted network is computed as</p>
<div class="math">$$Q = \frac{1}{4m}s^{T}(\mathbf{B}+\mathbf{B}^{T})s.$$</div>
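<p>As a sanity check on the weighted, directed formula, here is a minimal pure-Python sketch that computes <span class="math">\(Q\)</span> for a two-way split in two ways: from the symmetric form, and directly as <span class="math">\(\frac{1}{m}\sum_{ij} B_{ij}\,\delta_{g_i g_j}\)</span>. It assumes Leicht <span class="amp">&</span> Newman's convention that <span class="math">\(A_{ij}\)</span> holds the weight of the edge from <span class="math">\(j\)</span> to <span class="math">\(i\)</span>, so row sums are in-strengths and column sums are out-strengths. The function names and the example network (two directed triangles joined by one edge) are made up for illustration; this only evaluates <span class="math">\(Q\)</span>, it does not optimise it.</p>

```python
def adjacency(n, edges):
    """Weighted adjacency matrix in Leicht & Newman's convention:
    A[i][j] holds the weight of the edge from j to i."""
    A = [[0.0] * n for _ in range(n)]
    for src, dst, w in edges:
        A[dst][src] += w
    return A

def modularity_matrix(A):
    """B_ij = A_ij - s_i^in * s_j^out / m, with in-strengths as row sums and
    out-strengths as column sums; also returns m, the total edge weight."""
    n = len(A)
    s_in = [sum(A[i]) for i in range(n)]
    s_out = [sum(A[i][j] for i in range(n)) for j in range(n)]
    m = sum(s_in)  # equals sum(s_out)
    B = [[A[i][j] - s_in[i] * s_out[j] / m for j in range(n)] for i in range(n)]
    return B, m

def modularity_two_groups(A, groups):
    """Q = 1/(4m) s^T (B + B^T) s with s_i = +1/-1; `groups` must hold exactly two labels."""
    B, m = modularity_matrix(A)
    n = len(A)
    s = [1 if g == groups[0] else -1 for g in groups]
    total = sum(s[i] * (B[i][j] + B[j][i]) * s[j] for i in range(n) for j in range(n))
    return total / (4 * m)

def modularity_direct(A, groups):
    """Q = (1/m) * sum of B_ij over pairs i, j in the same group (any number of groups)."""
    B, m = modularity_matrix(A)
    n = len(A)
    return sum(B[i][j] for i in range(n) for j in range(n) if groups[i] == groups[j]) / m

# Two directed triangles joined by a single edge 2 -> 3, all weights 1.
edges = [(0, 1, 1), (1, 2, 1), (2, 0, 1), (3, 4, 1), (4, 5, 1), (5, 3, 1), (2, 3, 1)]
A = adjacency(6, edges)
groups = ["a", "a", "a", "b", "b", "b"]
print(modularity_two_groups(A, groups), modularity_direct(A, groups))  # both about 0.367
```

<p>Here <span class="math">\(m = 7\)</span>, the within-group weight is 6, and the expected within-group weight is <span class="math">\((3 \cdot 4 + 4 \cdot 3)/7 = 24/7\)</span>, so <span class="math">\(Q = (6 - 24/7)/7 = 18/49\)</span>. An unweighted network is just the special case where every weight is 1.</p>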
<p>My current research uses a modified modularity optimisation algorithm to
compute <a href="http://en.wikipedia.org/wiki/Functional_region">functional regions</a> for Ireland based on a range of
socio-economic variables. The goal is to provide a consistent framework
for computing functional regions which are comparable across different
countries and/or regions.</p>
<p>C</p>
<h3>References</h3>
<p>Leicht, <span class="caps">E. A.</span> <span class="amp">&</span> Newman, <span class="caps">M. E. J.</span> (2008). <a href="http://arxiv.org/abs/0709.4500">Community structure in directed networks</a>.
<em>Physical Review Letters</em>, 100(11), 118703.</p>
<p>Newman, <span class="caps">M. E. J.</span> (2004). <a href="http://arxiv.org/abs/cond-mat/0407503">Analysis of weighted networks</a>. <em>Physical Review E</em>, 70(5), 056131.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
var location_protocol = (false) ? 'https' : document.location.protocol;
if (location_protocol !== 'http' && location_protocol !== 'https') location_protocol = 'https:';
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = location_protocol + '//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Canadian Geographer paper published2009-10-12T12:32:00-04:00cfarmertag:carsonfarmer.com,2009-10-12:2009/10/canadian-geographer-paper-published/<p>My article “Spatial-temporal patterns of snow cover in western Canada”,
has now been <a href="http://www3.interscience.wiley.com/journal/122636489/abstract">published on-line</a> with <a href="http://www.wiley.com/bw/journal.asp?ref=0008-3658">The Canadian Geographer</a>.
<strike>At this stage, only the proof copy is available online.</strike>
The article is now officially published, and can be cited as:</p>
<blockquote>
<p>Farmer, <span class="caps">C. J.</span> Q., Nelson, T. A., Wulder, <span class="caps">M.A.</span> and Derksen, C. (2009).
Spatial-temporal patterns of snow cover in western Canada, <em>Canadian
Geographer</em>, 53 (4): 473-487.</p>
</blockquote>
<p>If you would like a copy, but do not have access to the article, please
email me and I can forward you a <span class="caps">PDF</span> version.</p>
<p>C</p>Remote Sensing of Environment paper published2009-10-06T13:30:00-04:00cfarmertag:carsonfarmer.com,2009-10-06:2009/10/remote-sensing-of-environment-paper-published/<p>My latest article, “Identification of snow cover regimes through spatial
and temporal clustering of satellite microwave brightness temperatures”,
has recently been <a href="http://dx.doi.org/10.1016/j.rse.2009.09.002">published on-line</a> in the journal <a href="http://www.sciencedirect.com/science/journal/00344257">Remote Sensing
of Environment</a>. <strike>For now,</strike>
<span class="caps">UPDATE</span>: The article can be cited as:</p>
<blockquote>
<p>Farmer, <span class="caps">C. J.</span> Q., Nelson, T. A., Wulder, <span class="caps">M.A.</span> and Derksen, C. (2010).
Identification of snow cover regimes through spatial and temporal
clustering of satellite microwave brightness temperatures, <em>Remote
Sensing of Environment</em>, 114 (1): 199-210.</p>
</blockquote>
<p>If you would like a copy, but do not have access to the article, please
email me and I can forward you a <span class="caps">PDF</span> version.</p>Introduction to Open Source Geospatial Software2009-09-25T10:48:00-04:00cfarmertag:carsonfarmer.com,2009-09-25:2009/09/introduction-to-open-source-geospatial-software/<p>Announcing an opportunity to learn about the leading edge free and
open-source technologies for desktop and web-based mapping and data
analysis. This is a two-day Masterclass focusing on introducing
participants to the wonderful world of open source geospatial software.
Check out the <a href="http://shortcourses.maths.lancs.ac.uk/geospatial">announcement</a> from the Postgraduate Statistics Centre
at Lancaster University.</p>Voronoi polygons with R2009-09-16T22:30:00-04:00cfarmertag:carsonfarmer.com,2009-09-16:2009/09/voronoi-polygons-with-r/<p>To create a nice bounded Voronoi polygons tessellation of a point layer
in <code>R</code>, we need two libraries: <code>sp</code> and <code>deldir</code>. The following function
takes a <code>SpatialPointsDataFrame</code> as input, and returns a
<code>SpatialPolygonsDataFrame</code> that represents the Voronoi tessellation of
the input point layer.
<!--more--></p>
<div class="highlight"><pre>voronoipolygons <span class="o">=</span> <span class="kr">function</span><span class="p">(</span>layer<span class="p">)</span> <span class="p">{</span>
<span class="kn">require</span><span class="p">(</span>deldir<span class="p">)</span>
crds <span class="o">=</span> layer<span class="o">@</span>coords
z <span class="o">=</span> deldir<span class="p">(</span>crds<span class="p">[,</span><span class="m">1</span><span class="p">],</span> crds<span class="p">[,</span><span class="m">2</span><span class="p">])</span>
w <span class="o">=</span> tile.list<span class="p">(</span>z<span class="p">)</span>
polys <span class="o">=</span> <span class="kt">vector</span><span class="p">(</span>mode<span class="o">=</span><span class="s">'list'</span><span class="p">,</span> length<span class="o">=</span><span class="kp">length</span><span class="p">(</span>w<span class="p">))</span>
<span class="kn">require</span><span class="p">(</span>sp<span class="p">)</span>
<span class="kr">for</span> <span class="p">(</span>i <span class="kr">in</span> <span class="kp">seq</span><span class="p">(</span>along<span class="o">=</span>polys<span class="p">))</span> <span class="p">{</span>
pcrds <span class="o">=</span> <span class="kp">cbind</span><span class="p">(</span>w<span class="p">[[</span>i<span class="p">]]</span><span class="o">$</span>x<span class="p">,</span> w<span class="p">[[</span>i<span class="p">]]</span><span class="o">$</span>y<span class="p">)</span>
pcrds <span class="o">=</span> <span class="kp">rbind</span><span class="p">(</span>pcrds<span class="p">,</span> pcrds<span class="p">[</span><span class="m">1</span><span class="p">,])</span>
polys<span class="p">[[</span>i<span class="p">]]</span> <span class="o">=</span> Polygons<span class="p">(</span><span class="kt">list</span><span class="p">(</span>Polygon<span class="p">(</span>pcrds<span class="p">)),</span> ID<span class="o">=</span><span class="kp">as.character</span><span class="p">(</span>i<span class="p">))</span>
<span class="p">}</span>
SP <span class="o">=</span> SpatialPolygons<span class="p">(</span>polys<span class="p">)</span>
voronoi <span class="o">=</span> SpatialPolygonsDataFrame<span class="p">(</span>SP<span class="p">,</span> data<span class="o">=</span><span class="kt">data.frame</span><span class="p">(</span>x<span class="o">=</span>crds<span class="p">[,</span><span class="m">1</span><span class="p">],</span>
y<span class="o">=</span>crds<span class="p">[,</span><span class="m">2</span><span class="p">],</span> row.names<span class="o">=</span><span class="kp">sapply</span><span class="p">(</span>slot<span class="p">(</span>SP<span class="p">,</span> <span class="s">'polygons'</span><span class="p">),</span>
<span class="kr">function</span><span class="p">(</span>x<span class="p">)</span> slot<span class="p">(</span>x<span class="p">,</span> <span class="s">'ID'</span><span class="p">))))</span>
<span class="p">}</span>
</pre></div>
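<p>For comparison, here is a minimal Python sketch of the same idea, bounded Voronoi polygons, without <code>deldir</code> or <code>sp</code>. Rather than going through a Delaunay triangulation as <code>deldir</code> does, it uses the definition directly: each cell is the bounding rectangle clipped, for every other point, to the half-plane of locations closer to this point than to that one. The function names are hypothetical, the bounding box is passed explicitly (roughly the role of <code>deldir</code>'s enclosing rectangle), and the brute-force clipping is O(n²), so it's only meant for small layers.</p>

```python
import random

def clip_halfplane(poly, a, b, c):
    """Keep the part of convex polygon `poly` where a*x + b*y <= c
    (one Sutherland-Hodgman clipping step)."""
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            out.append(p)  # p is on the kept side
        if (fp < 0 < fq) or (fq < 0 < fp):
            t = fp / (fp - fq)  # edge crosses the line: add the crossing point
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def voronoi_polygons(points, bbox):
    """One bounded Voronoi cell (list of (x, y) vertices) per input point."""
    xmin, ymin, xmax, ymax = bbox
    box = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    cells = []
    for i, (xi, yi) in enumerate(points):
        cell = box
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            # |x - p_i|^2 <= |x - p_j|^2  <=>  2(p_j - p_i).x <= |p_j|^2 - |p_i|^2
            cell = clip_halfplane(cell, 2 * (xj - xi), 2 * (yj - yi),
                                  xj ** 2 + yj ** 2 - xi ** 2 - yi ** 2)
            if not cell:
                break
        cells.append(cell)
    return cells

def polygon_area(poly):
    """Shoelace formula."""
    return abs(sum(x0 * y1 - x1 * y0
                   for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]))) / 2

rng = random.Random(42)
pts = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(20)]
cells = voronoi_polygons(pts, (0, 0, 10, 10))
print(sum(polygon_area(c) for c in cells))  # should be very close to 100, the box area
```

<p>Because the cells tile the bounding box, their areas should add up to the area of the box, which makes a handy quick check on any Voronoi output, including the <code>SpatialPolygonsDataFrame</code> from the R function.</p>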
<p>To save the output to shapefile, simply use <code>writeOGR</code> from the <code>rgdal</code>
library:</p>
<div class="highlight"><pre><span class="kn">library</span><span class="p">(</span>rgdal<span class="p">)</span>
<span class="o">?</span>writeOGR
</pre></div>Journal of Spatial Information Science2009-09-10T11:19:00-04:00cfarmertag:carsonfarmer.com,2009-09-10:2009/09/journal-of-spatial-information-science/<p>Check out the <a href="http://josis.org/index.php/josis/index">Journal of Spatial Information Science</a>, a new,
peer-reviewed, open-access journal with a range of well established
geospatial academics on the editorial board.
<!--more--></p>
<blockquote>
<p>The Journal of Spatial Information Science (<span class="caps">JOSIS</span>) is an
international, interdisciplinary, open-access journal dedicated to
publishing high-quality, original research articles in spatial
information science. The journal aims to publish research spanning the
theoretical foundations of spatial and geographical information
science, through computation with geospatial information, to
technologies for geographical information use.</p>
<p><span class="caps">JOSIS</span> is run as a service to the geographic information science
community, supported entirely through the efforts of volunteers. <span class="caps">JOSIS</span>
does not aim to profit from the articles published in the journal,
which are open access.</p>
</blockquote>
<p>Since it’s run entirely by volunteers, they provide a link to <a href="http://josis.org/index.php/josis/user/register">become
involved</a> as a reader, reviewer, or author.</p>
<p>Thanks to <a href="http://www.le.ac.uk/geography/staff/academic_brunsdon.html">Professor Chris Brunsdon</a> for the heads-up!</p>Python, Matlab, and R2009-08-12T15:29:00-04:00cfarmertag:carsonfarmer.com,2009-08-12:2009/08/python-matlab-and-r/<p>One project I’m working on at the moment involves exploring a dynamic
extension of the Isomap algorithm for visualising constantly varying
real-world road networks. Currently, we are testing out the method on a
small-scale simulated road network, and most of the original code
(written by <a href="http://ticc.uvt.nl/~lvdrmaaten/Laurens_van_der_Maaten/Home.html">Laurens van der Maaten</a>, with updates by <a href="http://ncg.nuim.ie/ncg/people/staff/pozdnoukhov/index.shtml">Alexei
Pozdnoukhov</a>) was done in Matlab. Since this work is eventually going
to have to run on relatively large datasets, and probably behind the
scenes on a server somewhere, we decided that Python was the way to go.
The goal therefore was to reproduce the Matlab code using only Python
libraries, and the fewer additional libraries required, the better.
<!--more--></p>
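<p>For anyone unfamiliar with Isomap: instead of straight-line distances, it works with approximate geodesic distances (shortest paths through a k-nearest-neighbour graph of the data), and then embeds those distances with classical multidimensional scaling. Below is a minimal, standard-library-only sketch of just the geodesic-distance stage. It is plain Isomap rather than the dynamic extension described above, and the function name and the use of Floyd-Warshall for the shortest paths are illustrative choices, not taken from the original Matlab code.</p>

```python
import math


def geodesic_distances(points, k):
    """Approximate geodesic distances, as in the first stage of Isomap.

    Each point is linked to its k nearest neighbours (by Euclidean
    distance), and all-pairs shortest paths through that graph are found
    with Floyd-Warshall. Pairs with no connecting path stay at infinity.
    """
    n = len(points)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Straight-line distances to every other point, nearest first.
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in nearest:
            # Treat the neighbour graph as undirected.
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)
    # Floyd-Warshall: allow paths through each intermediate point m in turn.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][m] + dist[m][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist


# An L-shaped "road": the geodesic distance between its ends follows the
# corner (2 + 2 = 4) instead of cutting across it (about 2.83).
road = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(geodesic_distances(road, k=2)[0][4])  # 4.0
```

<p>Floyd-Warshall takes O(n^3) time, which is fine for a small simulated network; on larger datasets something like Dijkstra's algorithm run from each node of the sparse neighbour graph would scale better.</p>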
<p>The most difficult stage in all this was converting the Matlab code to
Python while keeping it relatively fast and simple. The
solution, of course, is the NumPy library, and nothing could have
made this conversion simpler than this <a href="http://carsonfarmer.com/uploads/matlab-python-xref.pdf">pdf document</a>. It is
basically a syntax conversion chart between Matlab/Octave, Python, and
R… brilliant!</p>
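<p>To give a flavour of the translations such a chart covers, here are a few common Matlab idioms (shown in the comments as illustrative examples, not lines from our code) alongside their NumPy equivalents. The main gotchas are that <code>*</code> means element-wise multiplication on NumPy arrays and that indexing starts at 0:</p>

```python
import numpy as np

# Matlab: A = [1 2; 3 4]
A = np.array([[1.0, 2.0], [3.0, 4.0]])

# Matlab: B = A'            (transpose)
B = A.T

# Matlab: C = A * B         (matrix product)
C = A @ B

# Matlab: D = A .* B        (element-wise product; plain * in NumPy)
D = A * B

# Matlab: x = A(1, :)       (first row; NumPy indexes from 0)
x = A[0, :]

# Matlab: s = sum(A(:))     (sum of all elements)
s = A.sum()

# Matlab: I = eye(3)        (3x3 identity matrix)
I = np.eye(3)
```

<p>The <code>@</code> operator needs Python 3.5 or later; older NumPy code writes the matrix product as <code>np.dot(A, B)</code>.</p>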
<p>Check out Vidar Bronken Gundersen’s <a href="http://mathesaurus.sourceforge.net/">Mathesaurus</a> site for this, and
other useful resources for converting between different mathematical
computation environments.</p>FOSS4G and teaching GIS2009-07-20T10:45:00-04:00cfarmertag:carsonfarmer.com,2009-07-20:2009/07/foss4g-and-teaching-gis/<p>Two quick notes to share:</p>
<p>Firstly, please check out this <a href="http://linfiniti.com/dla/">excellent introduction to <span class="caps">GIS</span></a> by Tim
Sutton, Otto Dassau, and Marcelle Sutton in partnership with the Chief
Directorate for Spatial Planning <span class="amp">&</span> Information, Department of Land
Affairs, Eastern Cape, South Africa, and the Spatial Information
Management Unit, Office of the Premier, Eastern Cape, South Africa. They
use <span class="caps">QGIS</span> to present some basic <span class="caps">GIS</span> concepts and skills, and I
particularly like their section on Coordinate Reference Systems.</p>
<p>Secondly, don’t forget to check out the <a href="http://2009.foss4g.org/"><span class="caps">FOSS4G</span> 2009 Free and open source
software for geospatial</a> conference in Sydney in October. There will
be loads of excellent presentations and exhibitors, and the atmosphere
is always very cool. I will be presenting <a href="http://www.ftools.ca/plugins.html">some software</a> that I’ve
been developing for a while now, and will hopefully get a chance to
represent <a href="http://www.qgis.org/"><span class="caps">QGIS</span></a> there as well!</p>‘Watch’ long running processes2009-07-08T12:23:00-04:00cfarmertag:carsonfarmer.com,2009-07-08:2009/07/keep-an-eye-on-long-running-processes/<p>The other day I was loading a shapefile of approximately 11 million
records into a PostGIS database (stay tuned for more on that later) and
I wanted to know when shp2pgsql was done. Instead of continually
checking the console, I decided to ‘watch’ the process using the *nix
command <code>watch</code>. I discovered this handy tool a while ago, and have
found that for long running processes, I can use <code>watch</code> to notify me
when the process has finished, using the following command:</p>
<div class="highlight"><pre>watch -ben <span class="m">1</span> <span class="s2">"ps u -C shp2pgsql"</span>
</pre></div>
<!--more-->
<p>In this case, the three options <code>b</code>, <code>e</code>, and <code>n</code> tell <code>watch</code> to
<code>[b]</code>eep if the command has a non-zero exit (in this case when <code>shp2pgsql</code>
is no longer running), <code>[e]</code>xit watch if the command has a non-zero exit
(again when <code>shp2pgsql</code> is done), and use the given i<code>[n]</code>terval (in seconds)
between updates (in this case 1 second). The rest of the command,
<code>ps u -C shp2pgsql</code>, is what <code>watch</code> runs each second. It
uses <code>ps</code> to report info on the running process, where the <code>-C</code> flag
tells <code>ps</code> to select processes by command name, here <code>shp2pgsql</code>. When
<code>shp2pgsql</code> is no longer running, <code>ps u -C shp2pgsql</code> has a non-zero exit,
and I get my beep: very handy! (On newer procps-ng versions of <code>watch</code>, <code>-e</code>
may instead freeze the display and wait for a key press before exiting, so check
<code>man watch</code> on your system.)</p>
<p>This can be made even more useful by changing the above command to:</p>
<div class="highlight"><pre>watch -ben <span class="m">1</span> <span class="s2">"ps u -C shp2pgsql"</span><span class="p">;</span> mail -s <span class="s2">"Process complete!"</span> email.address@some.one < /home/username/email_text.txt
</pre></div>
<p>Here I’ve added the <code>mail</code> command to send me an email when <code>watch</code>
exits (the ‘;’ simply allows me to have two commands on one line). If
you’re really smart, you could probably have <code>watch</code> save important info
about the running process to a file and send this with the email, but
for my purposes, the above works just fine.</p>
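<p>If <code>watch</code> isn’t available, the same poll-until-it-stops idea can be sketched in Python. This is an illustrative sketch, not part of my original setup: <code>process_running</code> and <code>wait_until_false</code> are hypothetical helper names, and <code>process_running</code> assumes a procps-style <code>ps</code> that supports <code>-C</code> (as on Linux).</p>

```python
import subprocess
import time


def process_running(name):
    """Return True while `ps -C name` finds a matching process.

    Assumes a procps-style ps (as on Linux), which exits with status 0
    when at least one process matched and non-zero otherwise.
    """
    result = subprocess.run(
        ["ps", "-C", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_false(check, interval=1.0, sleep=time.sleep):
    """Call check() every `interval` seconds until it returns False.

    Returns how many times check() was called. The `sleep` argument exists
    so the loop can be tested without actually waiting.
    """
    calls = 1
    while check():
        sleep(interval)
        calls += 1
    return calls
```

<p>Calling <code>wait_until_false(lambda: process_running("shp2pgsql"))</code> blocks until the load finishes; after that you could ring the terminal bell with <code>print("\a")</code> or send the email as above.</p>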
<p>The next step is figuring out how to make my computer text me when a
long process is complete… and thanks to <a href="https://twitter.com/w_dowling">Will</a>, I may be <a href="http://o2sms.sourceforge.net/">one step
closer</a> to this goal.</p>Syntax highlighting with PyQt2009-07-02T15:28:00-04:00cfarmertag:carsonfarmer.com,2009-07-02:2009/07/syntax-highlighting-with-pyqt/<p>A few months ago I decided to add syntax highlighting capabilities to a
<a href="http://www.ftools.ca/plugins.html">piece of software</a> that I have been working on. Since it is a PyQt-based
application, the obvious choice for implementing syntax
highlighting was to use Qt’s QSyntaxHighlighter. Unfortunately, there
weren’t many examples around that implemented syntax highlighting in
Python, so I decided to post my own.
<!--more--></p>
<p>The Python file used in this example is <a href="http://carsonfarmer.com/uploads/highlighter.py">available here</a>.
To implement syntax highlighting, we need to subclass
QSyntaxHighlighter, reimplement the <code>highlightBlock</code> function, and
specify several highlighting rules. Generally, a rule consists of a
QRegExp pattern and a QTextCharFormat instance. For this example, the
syntax rules are based on the R statistical programming language. The
various rules can be stored using a Python list.</p>
<div class="highlight"><pre><span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">from</span> <span class="nn">PyQt4.QtGui</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">PyQt4.QtCore</span> <span class="kn">import</span> <span class="o">*</span>

<span class="k">class</span> <span class="nc">MyHighlighter</span><span class="p">(</span> <span class="n">QSyntaxHighlighter</span> <span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span> <span class="bp">self</span><span class="p">,</span> <span class="n">parent</span><span class="p">,</span> <span class="n">theme</span> <span class="o">=</span> <span class="bp">None</span> <span class="p">):</span>
        <span class="c"># theme is not used in this excerpt, so it defaults to None</span>
        <span class="n">QSyntaxHighlighter</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span> <span class="bp">self</span><span class="p">,</span> <span class="n">parent</span> <span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">parent</span> <span class="o">=</span> <span class="n">parent</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">highlightingRules</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="c"># Keyword rule: bold, dark blue text for the common R keywords</span>
        <span class="n">keyword</span> <span class="o">=</span> <span class="n">QTextCharFormat</span><span class="p">()</span>
        <span class="n">keyword</span><span class="o">.</span><span class="n">setForeground</span><span class="p">(</span> <span class="n">Qt</span><span class="o">.</span><span class="n">darkBlue</span> <span class="p">)</span>
        <span class="n">keyword</span><span class="o">.</span><span class="n">setFontWeight</span><span class="p">(</span> <span class="n">QFont</span><span class="o">.</span><span class="n">Bold</span> <span class="p">)</span>
        <span class="n">keywords</span> <span class="o">=</span> <span class="n">QStringList</span><span class="p">(</span> <span class="p">[</span> <span class="s">"break"</span><span class="p">,</span> <span class="s">"else"</span><span class="p">,</span> <span class="s">"for"</span><span class="p">,</span> <span class="s">"if"</span><span class="p">,</span> <span class="s">"in"</span><span class="p">,</span>
                                      <span class="s">"next"</span><span class="p">,</span> <span class="s">"repeat"</span><span class="p">,</span> <span class="s">"return"</span><span class="p">,</span> <span class="s">"switch"</span><span class="p">,</span>
                                      <span class="s">"try"</span><span class="p">,</span> <span class="s">"while"</span> <span class="p">]</span> <span class="p">)</span>
        <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">keywords</span><span class="p">:</span>
            <span class="c"># HighlightingRule (defined in the attached file) pairs a pattern with a format</span>
            <span class="n">pattern</span> <span class="o">=</span> <span class="n">QRegExp</span><span class="p">(</span><span class="s">"</span><span class="se">\\</span><span class="s">b"</span> <span class="o">+</span> <span class="n">word</span> <span class="o">+</span> <span class="s">"</span><span class="se">\\</span><span class="s">b"</span><span class="p">)</span>
            <span class="n">rule</span> <span class="o">=</span> <span class="n">HighlightingRule</span><span class="p">(</span> <span class="n">pattern</span><span class="p">,</span> <span class="n">keyword</span> <span class="p">)</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">highlightingRules</span><span class="o">.</span><span class="n">append</span><span class="p">(</span> <span class="n">rule</span> <span class="p">)</span>
</pre></div>
<p><code>MyHighlighter</code> is the subclassed <code>QSyntaxHighlighter</code> class, and will
contain our reimplemented <code>highlightBlock</code> function. The above example
is for the <code>keyword</code> rule, which recognizes the most common R keywords.
We give <code>keyword</code> a bold, dark blue font. For each keyword, we assign
the keyword and the specified format to a <code>HighlightingRule</code> object (see
the attached Python file) and append the object to our list of rules.
We can specify further syntax rules, including <code>reservedClasses</code>,
<code>assignmentOperators</code>, and <code>numbers</code>:</p>
<div class="highlight"><pre><span class="n">reservedClasses</span> <span class="o">=</span> <span class="n">QTextCharFormat</span><span class="p">()</span>
<span class="n">reservedClasses</span><span class="o">.</span><span class="n">setForeground</span><span class="p">(</span> <span class="n">Qt</span><span class="o">.</span><span class="n">darkRed</span> <span class="p">)</span>
<span class="n">reservedClasses</span><span class="o">.</span><span class="n">setFontWeight</span><span class="p">(</span> <span class="n">QFont</span><span class="o">.</span><span class="n">Bold</span> <span class="p">)</span>
<span class="n">keywords</span> <span class="o">=</span> <span class="n">QStringList</span><span class="p">(</span> <span class="p">[</span> <span class="s">"array"</span><span class="p">,</span> <span class="s">"character"</span><span class="p">,</span> <span class="s">"complex"</span><span class="p">,</span>
<span class="s">"data.frame"</span><span class="p">,</span> <span class="s">"double"</span><span class="p">,</span> <span class="s">"factor"</span><span class="p">,</span>
<span class="s">"function"</span><span class="p">,</span> <span class="s">"integer"</span><span class="p">,</span> <span class="s">"list"</span><span class="p">,</span>
<span class="s">"logical"</span><span class="p">,</span> <span class="s">"matrix"</span><span class="p">,</span> <span class="s">"numeric"</span><span class="p">,</span>
<span class="s">"vector"</span> <span class="p">]</span> <span class="p">)</span>
<span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">keywords</span><span class="p">:</span>
    <span class="n">pattern</span> <span class="o">=</span> <span class="n">QRegExp</span><span class="p">(</span><span class="s">"</span><span class="se">\\</span><span class="s">b"</span> <span class="o">+</span> <span class="n">word</span> <span class="o">+</span> <span class="s">"</span><span class="se">\\</span><span class="s">b"</span><span class="p">)</span>
    <span class="n">rule</span> <span class="o">=</span> <span class="n">HighlightingRule</span><span class="p">(</span> <span class="n">pattern</span><span class="p">,</span> <span class="n">reservedClasses</span> <span class="p">)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">highlightingRules</span><span class="o">.</span><span class="n">append</span><span class="p">(</span> <span class="n">rule</span> <span class="p">)</span>
<span class="n">assignmentOperator</span> <span class="o">=</span> <span class="n">QTextCharFormat</span><span class="p">()</span>
<span class="n">pattern</span> <span class="o">=</span> <span class="n">QRegExp</span><span class="p">(</span> <span class="s">"(<){1,2}-"</span> <span class="p">)</span>
<span class="n">assignmentOperator</span><span class="o">.</span><span class="n">setForeground</span><span class="p">(</span> <span class="n">Qt</span><span class="o">.</span><span class="n">green</span> <span class="p">)</span>
<span class="n">assignmentOperator</span><span class="o">.</span><span class="n">setFontWeight</span><span class="p">(</span> <span class="n">QFont</span><span class="o">.</span><span class="n">Bold</span> <span class="p">)</span>
<span class="n">rule</span> <span class="o">=</span> <span class="n">HighlightingRule</span><span class="p">(</span> <span class="n">pattern</span><span class="p">,</span> <span class="n">assignmentOperator</span> <span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">highlightingRules</span><span class="o">.</span><span class="n">append</span><span class="p">(</span> <span class="n">rule</span> <span class="p">)</span>
<span class="n">number</span> <span class="o">=</span> <span class="n">QTextCharFormat</span><span class="p">()</span>
<span class="n">pattern</span> <span class="o">=</span> <span class="n">QRegExp</span><span class="p">(</span> <span class="s">r"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?"</span> <span class="p">)</span>
<span class="n">pattern</span><span class="o">.</span><span class="n">setMinimal</span><span class="p">(</span> <span class="bp">True</span> <span class="p">)</span>
<span class="n">number</span><span class="o">.</span><span class="n">setForeground</span><span class="p">(</span> <span class="n">Qt</span><span class="o">.</span><span class="n">blue</span> <span class="p">)</span>
<span class="n">rule</span> <span class="o">=</span> <span class="n">HighlightingRule</span><span class="p">(</span> <span class="n">pattern</span><span class="p">,</span> <span class="n">number</span> <span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">highlightingRules</span><span class="o">.</span><span class="n">append</span><span class="p">(</span> <span class="n">rule</span> <span class="p">)</span>
</pre></div>
<p>Once a QSyntaxHighlighter object has been created, the rich text engine
calls its <code>highlightBlock</code> function automatically whenever necessary,
passing it the text block to highlight. To perform the
actual formatting, the QSyntaxHighlighter class provides the <code>setFormat</code>
function. This function operates on the text block that is passed as the
argument to <code>highlightBlock</code>: the specified format is
applied to the text from the given start position for the given length.
The formatting properties set in the given format are merged at display
time with the formatting information stored directly in the document.</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">highlightBlock</span><span class="p">(</span> <span class="bp">self</span><span class="p">,</span> <span class="n">text</span> <span class="p">):</span>
    <span class="k">for</span> <span class="n">rule</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">highlightingRules</span><span class="p">:</span>
        <span class="n">expression</span> <span class="o">=</span> <span class="n">QRegExp</span><span class="p">(</span> <span class="n">rule</span><span class="o">.</span><span class="n">pattern</span> <span class="p">)</span>
        <span class="n">index</span> <span class="o">=</span> <span class="n">expression</span><span class="o">.</span><span class="n">indexIn</span><span class="p">(</span> <span class="n">text</span> <span class="p">)</span>
        <span class="c"># Apply the rule's format to every match in this block</span>
        <span class="k">while</span> <span class="n">index</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">length</span> <span class="o">=</span> <span class="n">expression</span><span class="o">.</span><span class="n">matchedLength</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">setFormat</span><span class="p">(</span> <span class="n">index</span><span class="p">,</span> <span class="n">length</span><span class="p">,</span> <span class="n">rule</span><span class="o">.</span><span class="n">format</span> <span class="p">)</span>
            <span class="n">index</span> <span class="o">=</span> <span class="n">expression</span><span class="o">.</span><span class="n">indexIn</span><span class="p">(</span> <span class="n">text</span><span class="p">,</span> <span class="n">index</span> <span class="o">+</span> <span class="n">length</span> <span class="p">)</span>
    <span class="c"># No multi-line state is tracked, so every block ends in state 0</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">setCurrentBlockState</span><span class="p">(</span> <span class="mi">0</span> <span class="p">)</span>
</pre></div>
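<p>The matching loop itself doesn’t depend on Qt. As a rough, Qt-free sketch (an illustration only, not the code from the attached file), here is the same idea using Python’s <code>re</code> module, with plain strings standing in for <code>QTextCharFormat</code> objects and the keyword list merged into a single alternation for brevity; the <code>highlight_block</code> name is hypothetical:</p>

```python
import re

# (pattern, format) pairs; strings such as "keyword" stand in for QTextCharFormat objects.
RULES = [
    (re.compile(r"\b(?:break|else|for|if|in|next|repeat|return|switch|try|while)\b"), "keyword"),
    (re.compile(r"(<){1,2}-"), "assignment"),
    (re.compile(r"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?"), "number"),
]


def highlight_block(text, rules=RULES):
    """Return (start, length, format) spans, mimicking repeated setFormat calls."""
    spans = []
    for pattern, fmt in rules:
        # finditer visits every non-overlapping match, like the loop that
        # restarts the search at index + length after each match.
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end() - match.start(), fmt))
    return spans


print(highlight_block("for (i in 1:10) x <- i"))
# [(0, 3, 'keyword'), (7, 2, 'keyword'), (18, 2, 'assignment'), (10, 1, 'number'), (12, 2, 'number')]
```

<p>Because the <code>\b</code> word boundaries are part of the keyword pattern, a word like <code>format</code> is not highlighted just because it starts with <code>for</code>.</p>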
<p>The <code>while</code> loop in <code>highlightBlock</code> repeats until the last occurrence of the pattern in the
current text block has been found. For rules that apply over multiple blocks
or lines, further logic is needed; for an example, see the
<a href="http://doc.trolltech.com/4.2/richtext-syntaxhighlighter.html">QSyntaxHighlighter</a> documentation.</p>
<p>To apply the syntax highlighter to a QTextEdit, we simply
create an instance of our QSyntaxHighlighter subclass and pass it the
QTextEdit or QTextDocument that should be highlighted, as the following test application demonstrates:</p>
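<div class="highlight"><pre><span class="k">class</span> <span class="nc">TestApp</span><span class="p">(</span> <span class="n">QMainWindow</span> <span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">QMainWindow</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
        <span class="n">editor</span> <span class="o">=</span> <span class="n">QTextEdit</span><span class="p">()</span>
        <span class="n">highlighter</span> <span class="o">=</span> <span class="n">MyHighlighter</span><span class="p">(</span> <span class="n">editor</span> <span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">setCentralWidget</span><span class="p">(</span> <span class="n">editor</span> <span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">setWindowTitle</span><span class="p">(</span> <span class="s">"Syntax Highlighter Example"</span> <span class="p">)</span>

<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">app</span> <span class="o">=</span> <span class="n">QApplication</span><span class="p">(</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span> <span class="p">)</span>
    <span class="n">window</span> <span class="o">=</span> <span class="n">TestApp</span><span class="p">()</span>
    <span class="n">window</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
    <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span> <span class="n">app</span><span class="o">.</span><span class="n">exec_</span><span class="p">()</span> <span class="p">)</span>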
</pre></div>
<p>Once implemented, the above example produces output like this:</p>
<p><a href="http://carsonfarmer.com/images/screenshot.png" title="screenshot"><img alt="image" src="http://carsonfarmer.com/images/screenshot.png" title="screenshot" /></a></p>R featured in New York Times2009-01-28T12:14:00-05:00cfarmertag:carsonfarmer.com,2009-01-28:2009/01/r-featured-in-new-york-times/<p>I’m sure everyone has seen this already, but I’m going to post it
anyway, as I think the more exposure open-source tools get, the better
off we’ll all be!</p>
<p>Check out this <a href="http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?e%3Cbr%20/%3Em" title="Data Analysts Captivated by R’s Power">New York Times article</a> which features <code>R</code>, the
open-source statistical programming language. <code>R</code> now has quite an
extensive range of <a href="http://cran.r-project.org/web/views/Spatial.html" title="CRAN Task View: Analysis of Spatial Data">spatial analysis options</a>, and is a popular choice among
researchers working in spatial statistics and geographic
information analysis.</p>Open up your online maps with OpenStreetMap2009-01-26T23:27:00-05:00cfarmertag:carsonfarmer.com,2009-01-26:2009/01/open-up-your-web-maps-with-openstreetmap/<p>OpenStreetMap (<span class="caps">OSM</span>) is a project designed to create and provide free
spatial data (street maps) to anyone and everyone who wants them. It is
based on an open-source <a href="http://wiki.openstreetmap.org/wiki/FAQ#Why_OpenStreetMap.3F" title="OpenStreetMap Philosophy">philosophy</a>, and combines wiki-like, user-generated
data with <a href="http://www.opengeodata.org/?p=262">free access</a>, allowing users to create, edit,
download, and use <span class="caps">OSM</span> data to their hearts’ content. According to the
<a href="http://www.openstreetmap.org/index.html"><span class="caps">OSM</span> website</a>, “the project was started because most maps you think of
as free actually have legal or technical restrictions on their use,
holding back people from using them in creative, productive or
unexpected ways.” There are now tons of websites and open-source
software projects that incorporate <span class="caps">OSM</span> data, and the growing popularity
of the site means that the data is only going to get better (more
accurate) and bigger (more data).
<!--more--></p>
<p>Essentially, OpenStreetMap contributors go out into the world
with handheld <span class="caps">GPS</span> units and an insatiable need to map everything around
them. They track their own movements down streets and trails, and along
the way they record street names, parks, towns, cities, and other points
of interest (POIs). All these <span class="caps">GPS</span> tracks can be uploaded into the <span class="caps">OSM</span>
database, along with place names, street names, and any other pertinent
information (such as road type or intersections). All the while,
others can do the same thing, in the same area, adjusting the same data,
making more and more accurate maps of the region.</p>
<p>If all this free data isn’t enough, <span class="caps">OSM</span> also processes the uploaded
data, and produces detailed street-level maps which are freely available
for publishing on websites. In fact, in five simple steps, you too can have
a beautiful <span class="caps">OSM</span> powered map on your website.</p>
<h3>How to put an OpenStreetMap on your own site</h3>
<ol>
<li>Browse OpenStreetMap and find the area you want to map, using
the zoom tools to zoom right in to your area of interest.</li>
<li>Select the export tab.</li>
<li>When choosing the export format, select “Embeddable <span class="caps">HTML</span>”.</li>
<li>If you want, pop in a marker symbol so everyone knows where to look…</li>
<li>Copy the provided <span class="caps">HTML</span> code and paste it into your site wherever you want the map to show up.</li>
</ol>
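<p>The exported snippet is essentially an <code>iframe</code> pointing at OpenStreetMap’s embed page. If you would rather generate it yourself, here is a small sketch; the URL format (<code>export/embed.html</code> with <code>bbox</code>, <code>layer</code>, and <code>marker</code> parameters) is an assumption based on the code the export tab currently generates and may change, and <code>osm_embed_html</code> is a hypothetical helper name:</p>

```python
from html import escape
from urllib.parse import urlencode


def osm_embed_html(min_lon, min_lat, max_lon, max_lat, marker=None,
                   width=425, height=350):
    """Return an iframe showing an OpenStreetMap bounding box.

    The bbox runs west, south, east, north (longitude first), while the
    optional marker is a (lat, lon) pair. The embed.html URL format is an
    assumption based on the code OSM's export tab generates.
    """
    params = {
        "bbox": f"{min_lon},{min_lat},{max_lon},{max_lat}",
        "layer": "mapnik",
    }
    if marker is not None:
        params["marker"] = f"{marker[0]},{marker[1]}"
    # Keep commas readable in the query string.
    url = ("https://www.openstreetmap.org/export/embed.html?"
           + urlencode(params, safe=","))
    # escape() turns & into &amp; so the URL is valid inside an HTML attribute.
    return (f'<iframe width="{width}" height="{height}" frameborder="0" '
            f'scrolling="no" src="{escape(url)}"></iframe>')
```

<p>Note the differing coordinate orders: the bounding box is given longitude first, but the marker is latitude first, which is an easy thing to get backwards.</p>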
<p>For an example, check out <a href="/contact">my contact page</a>. As you can see, <strike>there
aren’t a lot of people collecting data in Maynooth…</strike> recent work in
Maynooth has created an extremely rich, usable, and downright decent map
of Maynooth. Nice work, Blazej Ciepluch!</p>Understanding spatial reference systems2009-01-12T15:31:00-05:00cfarmertag:carsonfarmer.com,2009-01-12:2009/01/understanding-spatial-reference-systems/<p>For those of you who are still unclear about what exactly a spatial
reference system is, how it is used, and what it means for your data, I
found a pretty good quick guide to <a href="http://www.sharpgis.net/post/2007/05/05/Spatial-references2c-coordinate-systems2c-projections2c-datums2c-ellipsoids-e28093-confusing.aspx">spatial references, coordinate
systems, projections, datums and ellipsoids</a>. This article was written
by Morten Nielsen (who works for <span class="caps">ESRI</span>), and it does a good job of
quickly and simply describing what makes up a spatial reference system,
and some of the errors that people make when talking about their spatial data.</p>
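<p>To make the idea of a projection a little more concrete, here is a small sketch of one particular case: the forward equations of spherical (“Web”) Mercator, EPSG:3857, which treats WGS84 longitude/latitude as lying on a sphere whose radius is the WGS84 semi-major axis (6,378,137 m). It is chosen purely as an illustration, is not taken from the article above, and ignores datum shifts entirely; <code>lonlat_to_web_mercator</code> is a hypothetical helper name:</p>

```python
import math

# EPSG:3857 uses a sphere with the WGS84 semi-major axis as its radius (metres).
R = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project WGS84 longitude/latitude in degrees to EPSG:3857 x/y in metres.

    x = R * lambda
    y = R * ln(tan(pi/4 + phi/2))
    """
    if abs(lat) >= 90:
        raise ValueError("Mercator cannot represent the poles")
    lam = math.radians(lon)
    phi = math.radians(lat)
    return R * lam, R * math.log(math.tan(math.pi / 4 + phi / 2))
```

<p>For example, <code>lonlat_to_web_mercator(180, 0)</code> gives an x of about 20,037,508 m (that is, πR), which is why web map tiles in this projection span roughly ±20,037,508 m.</p>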
<p>Having a good grasp of this stuff is important when working with spatial
data, so guides like the one above should really only be used as a quick
reference alongside more in-depth material covering these concepts. Check out
the links below if you want to learn a little bit more:</p>
<ul>
<li><a href="http://webhelp.esri.com/arcgisdesktop/9.3/body.cfm?tocVisable=0&ID=87&TopicName=An%20overview%20of%20map%20projections#">An <span class="caps">ESRI</span> overview of map projections</a></li>
<li><a href="http://spatialreference.org/">spatialreference.org/</a></li>
<li><a href="http://www.progonos.com/furuti/MapProj/Normal/TOC/cartTOC.html">Carlos A. Furuti’s Map Projection Pages</a></li>
<li><a href="http://www.kartografie.nl/geometrics/Map%20projections/mappro.html">Knippers, <span class="caps">R.A.</span>- Map projections</a></li>
</ul>gedit: The ultimate LaTeX editor2008-12-12T00:56:00-05:00cfarmertag:carsonfarmer.com,2008-12-12:2008/12/gedit-the-ultimate-latex-editor/<p>Out of the box, <em>gedit</em> is a basic text editor, but it comes equipped
with about 12 standard plugins, and another 9 readily available. In
addition to this, there is a range of ‘third-party’ plugins developed
to do various specific tasks, such as assist you in writing and
exporting LaTeX documents!
<!--more-->
First, get all the basic plugins:</p>
<div class="highlight"><pre>sudo apt-get install gedit-plugins<span class="sb">`</span>
</pre></div>
<p>and enable them in gedit by going to <code>Edit > Preferences > Plugins</code>, and
checking the ones that you want.</p>
<p>Second, make sure you have all the required dependencies for the actual
<span class="math">\(\LaTeX\)</span> plugin:</p>
<ol>
<li>The plugin is written in Python 2.4 and relies on PyGTK 2.4: <code>sudo apt-get install python-gtk2</code></li>
<li>Ensure that you have rubber installed. It is used for automated document
compiling: <code>sudo apt-get install rubber</code></li>
<li>To use the <span class="caps">DVI</span> inverse search you need the Python bindings for D-<span class="caps">BUS</span>: <code>sudo apt-get install python-dbus</code></li>
</ol>
<p>Third, download the latest version of the <span class="math">\(\LaTeX\)</span> plugin from <a href="http://live.gnome.org/Gedit/LaTeXPlugin">here</a>,
and extract it, then copy the contained folder and file to
<code>~/.gnome2/gedit/plugins</code>. You may have to create <code>gedit/plugins</code> if you
haven’t installed any other plugins yet.</p>
<p>After that, restart <code>gedit</code> and activate the plugin in the settings
dialog as we did with the other plugins.</p>
<p>Now you have an editor with all sorts of handy functions, including
inline spell check, code completion, tag, symbol, and character
insertion, a file and document browser, and an embedded terminal, as
well as tools to automatically create new <span class="math">\(\LaTeX\)</span> files, insert graphics,
tables, and matrices, and a fantastic dialog for automatically inserting
BibTeX entries. Also, if you’re an R user who writes reports, you
can use Sweave directly from gedit to embed R code in your LaTeX documents.</p>
<p>All this in a lightweight text editor, nice!</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
var location_protocol = (false) ? 'https' : document.location.protocol;
if (location_protocol !== 'http' && location_protocol !== 'https') location_protocol = 'https:';
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = location_protocol + '//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Quick guide to setting up a PostGIS database2008-11-28T12:01:00-05:00cfarmertag:carsonfarmer.com,2008-11-28:2008/11/quick-guide-to-setting-up-postgis-database/<p>Recently I decided to seriously start using PostGIS to manage my spatial
data. As I have several projects on the go, organizing and managing my
data effectively has become extremely important, and PostGIS is by far
the most convenient way to do this. There is plenty of documentation out
there that explains in detail how to set up PostGIS, but the best
reference I’ve found is from <a href="http://tim.linfiniti.com/" title="Tim Sutton's blog">Tim Sutton’s blog</a>, mainly because he
uses Ubuntu, where <code>apt-get</code> installs everything you need to have PostGIS
working in minutes.
<!--more--></p>
<p>Here is a <a href="http://tim.linfiniti.com/page/3" title="Setting Up Postgis on Ubuntu">link to the article</a>, and below is a quote from his blog:</p>
<blockquote>
<p>Another reason I love Ubuntu - getting postgis + postgresql is really easy…</p>
</blockquote>
<div class="highlight"><pre>sudo apt-get install postgis postgresql-8.3-postgis
sudo su postgres
createuser -s -d -r -l -P -E -e timlinux
<span class="nb">exit</span>
</pre></div>
<blockquote>
<p>Enter prompts following above commands as needed. Now you have
postgres installed and a user created. Next create an empty spatial database:</p>
</blockquote>
<div class="highlight"><pre>createdb qgis
createlang plpgsql qgis
psql qgis < /usr/share/postgresql-8.3-postgis/lwpostgis.sql
psql qgis < /usr/share/postgresql-8.3-postgis/spatial_ref_sys.sql
</pre></div>
<blockquote>
<p>Easy peasy.</p>
</blockquote>View spatial data attribute tables in R2008-10-14T13:44:00-04:00cfarmertag:carsonfarmer.com,2008-10-14:2008/10/view-spatial-data-attribute-tables-in-r/<p>Many <span class="caps">GIS</span> offer the ability to view the attribute table of a vector
layer. While this is perhaps less obvious in the R environment, it is
not impossible. The following command lets you visually inspect and edit
any data.frame (or other vector, matrix, etc.), including the attribute
table of a Spatial*DataFrame.
<!--more--></p>
<div class="highlight"><pre><span class="kp">invisible</span><span class="p">(</span>edit<span class="p">(</span>spatial_layer<span class="o">@</span>data<span class="p">))</span>
</pre></div>
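<p>Note: <code>edit</code> returns the (possibly modified) data rather than changing <code>spatial_layer</code>, so wrapping it in <code>invisible</code> lets you close the viewer without R printing the whole attribute table to the
console; any changes are discarded. You could also use:</p>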
<div class="highlight"><pre>new_data <span class="o">=</span> edit<span class="p">(</span>spatial_layer<span class="o">@</span>data<span class="p">)</span>
</pre></div>
<p>to assign changes made to the data to a new variable, or use:</p>
<div class="highlight"><pre>spatial_layer<span class="o">@</span>data <span class="o">=</span> edit<span class="p">(</span>spatial_layer<span class="o">@</span>data<span class="p">)</span>
</pre></div>
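<p>to make changes to the Spatial*DataFrame itself. The <code>fix</code> function can do the same job, but it only accepts a plain variable name (<code>fix(spatial_layer@data)</code> stops with the error <code>'fix' requires a name</code>), so copy the attributes to a variable first:</p>
<div class="highlight"><pre>attrs <span class="o">=</span> spatial_layer<span class="o">@</span>data
fix<span class="p">(</span>attrs<span class="p">)</span>
spatial_layer<span class="o">@</span>data <span class="o">=</span> attrs
</pre></div>R spatial identify tool2008-09-23T12:36:00-04:00cfarmertag:carsonfarmer.com,2008-09-23:2008/09/r-spatial-indentify-tool/<p>The following snippet is useful for visually exploring R spatial data such as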
<code>SpatialPointsDataFrame</code> or <code>SpatialGridDataFrame</code> objects. Clicking on a feature in the plot
labels it with the attribute value at that point.</p>
<div class="highlight"><pre><span class="kn">library</span><span class="p">(</span>rgdal<span class="p">)</span>
<span class="c1"># read band 1 of the R logo image that ships with rgdal</span>
y <span class="o">=</span> readGDAL<span class="p">(</span><span class="kp">system.file</span><span class="p">(</span><span class="s">"pictures/Rlogo.jpg"</span><span class="p">,</span> package<span class="o">=</span><span class="s">"rgdal"</span><span class="p">)[</span><span class="m">1</span><span class="p">],</span> band<span class="o">=</span><span class="m">1</span><span class="p">)</span>
<span class="c1"># cell-centre coordinates of the grid, as a two-column (x, y) matrix</span>
y.grid <span class="o">=</span> y<span class="o">@</span>grid
y.coords <span class="o">=</span> coordinates<span class="p">(</span>y.grid<span class="p">)</span>
image<span class="p">(</span>y<span class="p">)</span>
<span class="c1"># click once on the plot: the nearest cell is labelled with its band1 value</span>
identify<span class="p">(</span>x<span class="o">=</span>y.coords<span class="p">,</span> y<span class="o">=</span><span class="kc">NULL</span><span class="p">,</span> n<span class="o">=</span><span class="m">1</span><span class="p">,</span> labels<span class="o">=</span>y<span class="o">$</span>band1<span class="p">)</span>
</pre></div>
<p>Here the <code>x</code> and <code>y</code> arguments of <code>identify</code> give the point coordinates (because <code>y.coords</code>
already holds both the x and y columns, the <code>y</code> argument can be set to <code>NULL</code>), <code>n</code> is the
number of features to identify, and <code>labels</code> sets the text drawn next to each identified feature. Without <code>labels</code>, <code>identify</code> labels each point with its index instead; the indices are also returned, so <code>y@data[i, ]</code> retrieves the full attributes of feature <code>i</code>.</p>Find and replace multiple files2008-09-08T09:09:00-04:00cfarmertag:carsonfarmer.com,2008-09-08:2008/09/find-and-replace-multiple-files/<p>Recently, I had to do a find and replace over several individual Python
files. There are plenty of scripts out there that will accomplish this,
but I was interested in something simple, preferably a single-line
command. After a lot of Googling, I ended up finding <a href="http://rushi.wordpress.com/2008/08/05/find-replace-across-multiple-files-in-linux/#comment-26487">this post</a>,
which does a great job of explaining how to do this in Linux. The basic
command is:</p>
<div class="highlight"><pre>find . -name <span class="s2">"*.py"</span> -print <span class="p">|</span> xargs sed -i <span class="s1">'s/foo/bar/g'</span>
</pre></div>
<p>where <code>find . -name "*.py" -print</code> lists all Python files in the current directory and its
subdirectories, and <code>xargs</code> passes those file names to <code>sed -i 's/foo/bar/g'</code>, which replaces every
occurrence of ‘foo’ with ‘bar’, editing each file in place.
The link above gives a good explanation of each command (find, xargs, sed)
and how they combine to create this useful single-line command.</p>
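<p>The one-liner hands file names to <code>xargs</code> separated by whitespace, so a file name containing a space would be split in two. Below is a slightly more defensive sketch, assuming GNU <code>grep</code>, <code>find</code>, <code>xargs</code> and <code>sed</code> as found on Linux (BSD/macOS <code>sed</code> needs <code>-i ''</code> instead of <code>-i</code>). The helper names <code>list_py_matches</code> and <code>replace_in_py_files</code> are hypothetical; the first is a dry run showing which files would change.</p>

```shell
#!/usr/bin/env bash
# Sketch only: list_py_matches and replace_in_py_files are hypothetical names.
# Assumes GNU grep, find, xargs and sed (as on Linux); BSD/macOS sed needs
# `-i ''` instead of `-i`, and BSD xargs may not accept -r.
# OLD is a sed regular expression and NEW a sed replacement string, so
# characters such as '/' or '&' must be escaped by the caller.

# Dry run: print every .py file under the current directory that matches OLD.
list_py_matches() {
    grep -rl --include='*.py' -e "$1" .
}

# Replace OLD with NEW in every .py file under the current directory.
replace_in_py_files() {
    local old="$1" new="$2"
    # -type f skips directories; -print0 and xargs -0 separate names with NUL
    # bytes so spaces survive; -r skips running sed when no files are found.
    find . -type f -name '*.py' -print0 |
        xargs -0 -r sed -i "s/${old}/${new}/g"
}
```

<p>For example, run <code>list_py_matches foo</code> to check the affected files, then <code>replace_in_py_files foo bar</code> to make the change.</p>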