It is exciting to do truly interdisciplinary research that is part of the revolution bringing advances in computation and automated observation (data) to bear on problems in science and engineering. Even when I worked in pure mathematics, I valued and exploited automatic computation on concrete examples to gain insight. In moving from mathematics to computer science, I applied my background first to networking in Yemini’s lab (Columbia) and then to computer vision in Nayar’s lab (Columbia). There I worked on general models of camera geometry, modeling of radiometric response functions, and the development of systems combining projectors and cameras.

At CCNY, my collaboration with Gladkova introduced me to the NOAA-CREST center. Initially we focused on compressing sounder (infrared spectrometer) data. As a baseline, we comprehensively evaluated current compression algorithms on many samples of sounder data, sampled globally and over time. We developed a method to estimate the best achievable compression rate by estimating the entropy of the data. We then developed and patented a novel compression algorithm for the sounder on NOAA’s upcoming GOES-R satellite. The algorithm showed superior performance in an international compression competition, but the instrument was cancelled, ending the project.
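As a rough illustration of how entropy bounds lossless compression, the sketch below computes a zeroth-order (memoryless) entropy estimate of prediction residuals and converts it into a compression-ratio estimate. The first-difference predictor and the function names are hypothetical simplifications chosen for illustration; they are not the estimator used in the work described above, which would exploit much richer (for example, spectral) correlations.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Zeroth-order (memoryless) entropy estimate, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def residual_entropy(samples):
    """Entropy of prediction residuals, using a hypothetical predictor that
    guesses each sample from the previous one (first differences)."""
    residuals = [cur - prev for prev, cur in zip(samples, samples[1:])]
    return empirical_entropy(residuals)


def estimated_compression_ratio(samples, bits_per_sample):
    """Rough ratio a memoryless entropy coder could reach on the residuals:
    raw bits per sample divided by estimated bits per sample."""
    h = residual_entropy(samples)
    return math.inf if h == 0 else bits_per_sample / h
```

For example, `estimated_compression_ratio([0, 1, 3, 4, 6], 16)` treats the values as 16-bit samples; the residuals alternate between 1 and 2, so they carry 1 bit each and the estimate is 16. A better predictor lowers the residual entropy and therefore raises the achievable ratio.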

While designing an implementation of an operational compression algorithm, we addressed the reconstruction of data from lost scan-lines. This led us to study the 1.6-micron band of NASA’s MODIS instrument, which is critical in separating snow and ice from cloud. On the Aqua satellite, MODIS is missing 75% of the scan-lines in this band due to detector damage. Over two years we developed and evaluated an algorithm that accurately estimates the missing data through statistical regression. This accurate estimation improved both NASA’s snow and cloud products. Our work is currently in operational use at NASA for the Collection 6 snow product.
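The core idea of regression-based restoration can be sketched as: learn the relation between the damaged band and an intact, co-registered band on the scan-lines where both are valid, then apply it to the dead lines. The sketch below assumes a single predictor band, a purely linear per-pixel model, and `None` as the marker for a dead scan-line; these are illustrative simplifications, not the operational algorithm, whose predictors and model are not described here.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y ≈ a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def restore_scan_lines(target_rows, predictor_rows):
    """Fill dead scan-lines (None rows) of the target band using a linear
    regression on a co-registered predictor band, trained on good lines."""
    train_x, train_y = [], []
    for t_row, p_row in zip(target_rows, predictor_rows):
        if t_row is not None:  # only intact lines are used for training
            train_x.extend(p_row)
            train_y.extend(t_row)
    a, b = fit_linear(train_x, train_y)
    # Keep valid lines as observed; predict pixels of dead lines.
    return [
        t_row if t_row is not None else [a + b * p for p in p_row]
        for t_row, p_row in zip(target_rows, predictor_rows)
    ]
```

With predictor rows `[[1, 2], [3, 4], [5, 6]]` and target rows `[[3, 5], None, [11, 13]]`, the fit recovers the relation target = 1 + 2 × predictor and fills the middle line with `[7.0, 9.0]`. A practical version would use several predictor bands and neighboring pixels, which turns the single slope into a multivariate regression.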

I also worked on other projects that restored and enhanced data from satellite remote sensors using statistical regression. One project created virtual sensors with learning-based super-resolution. Gladkova and I developed algorithms to reconstruct a virtual green band and true-color images for the upcoming GOES-R mission. We also developed an algorithm to reconstruct the 13.3-micron band for VIIRS, which is needed to estimate cloud-top pressure. This algorithm is being evaluated for use within the official NOAA product.

With Gladkova and Ahmed, we applied machine learning classification to detect harmful algal blooms from satellites. More recently, Gladkova and I developed a sea-ice classifier for the National Ice Center. Our ice product combines microwave and visible imagery. We also built a web app for monitoring, visualizing, and comparing all the major ice products. The design of the app is based on a prototype I built for NOAA, now operational, to monitor regional sea surface temperature.

In addition to remote sensing, I have worked with Krakauer and my PhD student Aizenman on climate research, applying information-theoretic measures to evaluate long-term forecasting models. We generalized evaluations of point forecasts to information-gain metrics of probabilistic predictions. Our most recent work, which has been accepted and is undergoing revision, shows that current long-term physics-based forecasts improve when fused with statistical models. A second climate-related project, with the same student and Vörösmarty’s Crossroads Initiative, addresses the risk climate change poses to the world’s river deltas. Unsupervised machine learning allowed us to identify deltas that depend heavily on engineering and will be at risk as sea levels rise and construction costs increase. This work appears in the journal Science.
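One common information-theoretic way to score probabilistic predictions is the ignorance (logarithmic) score; its improvement over a climatological reference is an information gain measured in bits. The sketch below shows that idea for categorical forecasts (for example, below, near, or above normal). The function name and data layout are assumptions for illustration; the metrics in the papers may be defined differently, for instance for continuous predictive distributions.

```python
import math


def information_gain(forecast_probs, climatology_probs, outcomes):
    """Mean information gain, in bits, of probabilistic forecasts over a
    climatological reference: the average of log2(p_f(o) / p_c(o)), where
    o is the index of the category that was actually observed.

    This equals the mean reduction in ignorance (logarithmic) score.
    A forecast giving zero probability to the observed category would be
    infinitely penalized, so all probabilities used must be > 0.
    """
    total = 0.0
    for p_f, p_c, o in zip(forecast_probs, climatology_probs, outcomes):
        total += math.log2(p_f[o] / p_c[o])
    return total / len(outcomes)
```

A forecast identical to climatology gains 0 bits; doubling the probability assigned to what actually happens gains 1 bit; a confident forecast that misses scores negative. Unlike an error measure on point forecasts, this rewards forecasts that are honest about their uncertainty.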

I also collaborate with Mageras and my PhD student Hu at Memorial Sloan Kettering Cancer Center (MSKCC) on segmentation of volumetric medical images. Current clinical practice involves manual contouring of each structure by a medical professional. Automatic methods work in some cases (e.g., healthy organs), but problems remain: legal and ethical considerations necessitate human oversight, and automatic methods often produce errors that require manual intervention. The semi-automatic method we developed dramatically accelerates manual segmentation by automating where possible and integrating user correction with online statistical learning. The interface replaces labor-intensive contour delineation with rough brush strokes that indicate examples of different tissue types. These examples are input to a statistical model from which a proposed segmentation is built. A user can correct and then accept the segmentation, which updates the statistical model, so subsequent segmentations progressively automate much of the work. Over the years we extended the method from a Markov Random Field to a Conditional Random Field, which makes fewer assumptions. Starting with greyscale CT images, we eventually tackled multi-modal images (e.g., MRI). Initially we produced two-class discrimination for normal tissues, whereas our recent work provides multi-class segmentation of tumors. Our algorithm was incorporated into MSKCC’s next-generation system.
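To make the brush-stroke loop concrete, here is a deliberately simplified sketch: each tissue class gets a Gaussian intensity model estimated from stroke samples, voxels are labeled by maximum likelihood, and accepted corrections update the model online (Welford's running mean and variance). It omits the spatial coupling between neighboring voxels that an MRF or CRF provides, as well as multi-modal features, so it illustrates the interaction pattern only and is not the MSKCC algorithm. The class name, the flat list standing in for a volume, and the variance floor are all assumptions.

```python
import math
from collections import defaultdict


class OnlineIntensityModel:
    """Hypothetical per-class Gaussian model of voxel intensity, updated
    online with Welford's algorithm as strokes/corrections arrive."""

    def __init__(self):
        self.n = defaultdict(int)
        self.mean = defaultdict(float)
        self.m2 = defaultdict(float)  # running sum of squared deviations

    def update(self, label, intensities):
        """Add brush-stroke (or accepted-correction) samples for a class."""
        for x in intensities:
            self.n[label] += 1
            d = x - self.mean[label]
            self.mean[label] += d / self.n[label]
            self.m2[label] += d * (x - self.mean[label])

    def log_likelihood(self, label, x):
        # Sample variance with a small floor (assumed) to avoid division by 0.
        var = self.m2[label] / max(self.n[label] - 1, 1) + 1e-6
        return -0.5 * (math.log(2 * math.pi * var) + (x - self.mean[label]) ** 2 / var)

    def classify(self, x):
        """Maximum-likelihood label for one voxel (no spatial smoothing)."""
        return max(self.n, key=lambda label: self.log_likelihood(label, x))

    def segment(self, volume):
        """Propose labels for a flattened volume of intensities."""
        return [self.classify(x) for x in volume]
```

In use, strokes such as `model.update("liver", [100, 102, 98])` and `model.update("fat", [-100, -90, -110])` seed the model, `model.segment(...)` proposes a labeling, and each accepted correction is fed back through `update`, so later proposals need fewer edits. Adding a pairwise term that favors equal labels for similar neighbors is what moves this toward a random-field formulation.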

Over the years, my work has been supported by individual grants from NIH (P20), NOAA, and NASA. I have also collaborated on ONR-supported research and participated in several successful interdisciplinary and center grant proposals.

Data Visualization, The Graduate Center 83060

The subject matter of this course will be similar to CSc 59969. However, students will be required to read and present research papers from the field, and there will be guest speakers from the data visualization field. There will be some hands-on instruction and lectures, as well as a group project.

Data Visualization CSc 59969

One of the fastest-growing job opportunities is that of "Data Scientist." The future belongs to the companies and people that turn data into successful products.

The revolution driven by vast amounts of data from measurements and transactions on an unprecedented scale is transforming not only business but also science and engineering. Interpreting data and making it useful depends critically on visualization methods. Visualization is important both for exploring data and for crafting a story with data that convinces your audience of your results.

This course will give an overview of data visualization as well as the overlapping fields of information and scientific visualization. Students will learn to programmatically process and analyze data with Python libraries widely used in statistics, engineering, science, and finance. We will cover the design of effective visualizations. Students will learn to build data visualizations directly using matplotlib (Python) and interactive web-based visualizations using JavaScript and D3. Each project group will propose, design, and build a visualization of a data set. The course requires programming experience such as CSc 102/103 or equivalent.

The goals of the course are for students to:

  • Recognize the appropriate applications and value of visualizations
  • Critically evaluate visualizations and suggest improvements and refinements
  • Apply a structured design process to create effective visualizations
  • Use programmatic tools to scrape, clean, and process data
  • Use principles of human perception and cognition in visualization design
  • Use visualization tools to explore data
  • Create web-based interactive visualizations
  • Use statistical tools to aid visualization of data

Advanced Topics in Internet Programming

In this course we will talk mostly about web applications and web services, as well as other topics in building web-based internet applications. It begins with a lecture component.

In lectures I will discuss:

  • History of web technologies
  • Python clients
  • Test frameworks
  • REST web services; we will study RESTful APIs such as those of Twitter, Google, and Amazon Web Services
  • Web frameworks for building web services, e.g., Flask and Django
  • HTML5/JavaScript/CSS for consuming data

These topics will be covered more cursorily and in much less depth than in the undergraduate course CSc 473. You will be expected to pick up what you need more rapidly.

The course starts with lectures and exercises presenting some foundations; it then incorporates a seminar-like component in which students present focused expositions on the latest developments in web technologies. Topics will be negotiated with each student. In the past, topics have included NoSQL databases such as MongoDB, Redis, and Neo4j; JavaScript front-end frameworks like AngularJS and React; APIs such as Twilio and SoundCloud; and map services such as CartoDB. Groups of 3-4 students will build non-trivial web service projects, with or without front ends. For project technology, see the CSc 473 course; however, projects in this course often use more bleeding-edge technology instead.

Web Development (a.k.a. Web Site Design) CSc 473

The design and implementation of web sites and web applications. This course will focus on foundational tools in "full-stack" web development. You will be building a real, or at least realistic, web solution to a "business problem". There will be an emphasis on testing, working in a small team, and software engineering best practices.

Why do you need to know this?
Full-stack web development is a critical skill. One shouldn’t think of web technologies as being “for the web” but rather as general-purpose software development skills for a range of applications. Most mobile and desktop applications use web technologies to communicate with remote servers. More and more user interface development, even for desktops, now exploits web technologies such as HTML5, CSS, and JavaScript. Even server-to-server communication often uses web APIs to enforce modularity and allow access from many different platforms. You should be able to stand up an application using a cloud service such as those provided by Amazon, Google, Microsoft Azure, Rackspace, or Heroku. Once you can build a basic application, from database communication to user interface, and deploy it, you have some software development superpowers. You can take almost any new idea, such as a new social app, marketplace, or multiplayer game, and build, deploy, and distribute it with very little resource or investment besides your imagination, your time, and your sweat.

What technology will I learn/use?
The course uses HTML5, CSS, JavaScript, and Python for server-side programming. Initially we use the Python micro-framework “Flask”. Flask is a light “pay for what you eat” framework providing routing and (server-side) templates without much ceremony. Once projects begin, most teams use Django, as it provides nearly every basic service and component a web application needs (at the cost of some learning curve).

In addition, you will be expected to use software engineering/development best practices. Your code must pass a linter such as pylint. You will need to write pure unit tests using mocking, integration tests using the framework (Django/Flask), and acceptance tests using Selenium or an equivalent. You will need to write developer and project documentation. You and your team will need to track project issues and maintain the code using a collection of forks of code repositories in a distributed version control system (e.g., git or Mercurial). Individual grades on group projects are determined both by overall project quality and by individual contribution, as visible from code commits and project management in the code repository. Weekly status summaries on project progress become an important part of the second half of the course.
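As an example of what a "pure" unit test with mocking looks like, the sketch below tests a small function that depends on a web API client, replacing the client with `unittest.mock.Mock` so no network call is made. The function `count_open_issues` and the client method `get_issues` are hypothetical names invented for this illustration, not part of any real library.

```python
import unittest
from unittest import mock


def count_open_issues(client, repo):
    """Hypothetical helper under test: counts open issues for a repository.

    `client.get_issues(repo)` is an assumed interface standing in for a
    real web API client; the tests replace it with a mock.
    """
    return sum(1 for issue in client.get_issues(repo) if issue["state"] == "open")


class CountOpenIssuesTest(unittest.TestCase):
    def test_counts_only_open_issues(self):
        client = mock.Mock()
        client.get_issues.return_value = [
            {"state": "open"},
            {"state": "closed"},
            {"state": "open"},
        ]
        self.assertEqual(count_open_issues(client, "team/project"), 2)
        # The mock also records how the dependency was called.
        client.get_issues.assert_called_once_with("team/project")

    def test_empty_repository(self):
        client = mock.Mock()
        client.get_issues.return_value = []
        self.assertEqual(count_open_issues(client, "team/empty"), 0)
```

Saved as, for example, `test_issues.py`, this runs with `python -m unittest test_issues`; pytest also discovers `unittest.TestCase` classes. Integration tests would instead exercise the real framework (for example, Django's or Flask's test client), and acceptance tests would drive a browser.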

There is not enough time to delve into rich user interface frameworks like React or AngularJS. We cover core JavaScript and Ajax, and most projects use a front-end framework such as Twitter Bootstrap or Foundation. We also touch on the foundations of design: use of color, fonts, and accessibility.

Why can’t I just use PHP/Go/Scala-Play/Node.js/.net/Ruby on Rails/etc.?
The course is built around a whole package of technologies that work together, from testing to documentation, from database to front end. There isn’t even enough time in the course to teach all you need to know with the set of technologies I have chosen. Certainly, for the purposes of instruction, the choices must be limited. Because we have group projects, often with students who have limited web development experience, it would not be reasonable to ask them to learn a whole new set of tools unsupported by the initial part of the course. That said, if a team can show that it has every part of the software support in place (e.g., automated testing, documentation, linting, and a full-featured web framework), I will consider the argument.

For server-side technology, there is much more room for debate. It is hard to establish solid market-share numbers, but Ruby on Rails, Python, and Node.js tend to be the most popular server-side choices for newer projects. PHP is on the decline, and while many important mature platforms such as WordPress and Drupal are written in PHP, PHP presents challenges for software engineering best practices. Ruby, like Python, is a good teaching language; it has excellent full-featured frameworks like Rails, light ones like Sinatra, and robust testing packages, and it is probably slightly more popular and mature than Python as a server-side technology. Python has the advantage of being more broadly used outside of server-side web development, providing synergy with data science, the Internet of Things, and system administration. Moreover, students more commonly arrive with some knowledge of Python than of Ruby.

More recently, the rise of server-side JavaScript based on Node.js has led me to consider it every semester. Unfortunately, JavaScript is so badly fragmented and so intensely in flux that it is very challenging to settle on a stable set of choices. There is even great debate on what “best practice” means for basic JavaScript programming. On one hand, ES2015 and later standards introduce traditional classes found in most other object-oriented languages, and TypeScript adds static types, addressing criticisms of JS by reducing boilerplate code and adding type safety. On the other hand, fans of functional programming regard much of this as wrong-headed, repeating the mistakes of other languages such as Java or C++. Server-side frameworks such as ExpressJS or Meteor do not yet provide the full set of features that Django or Ruby on Rails currently do. JavaScript tooling remains in flux, with battles such as Grunt vs. Gulp or Jasmine vs. Mocha raging on.

The choice of JavaScript for client-side technology is hardly a choice: for the moment, the client-side wars are over and JavaScript has prevailed. Moreover, the capabilities of client-side JavaScript become more impressive each week. With solid knowledge of Python and Django/Flask on the back end, (usually) a PostgreSQL database, and HTML5, CSS3, and JavaScript/jQuery on the front end, students are reasonably well equipped to build a wide range of applications.

Software Engineering and Object-Oriented Design MIS 2010

This course teaches both the theory and practice of software engineering and object-oriented design. We discuss the classical waterfall model and its variants, such as iterative waterfall, the spiral model, and the Rational Unified Process.

This is contrasted with Agile methodology; we focus on Scrum and Extreme Programming techniques as representative of these ideas. So that students understand the software engineering process concretely, they will design and implement a software development project in Python using an agile, iterative methodology and applying best practices. These best practices include distributed version control (git, hg), software testing including TDD and BDD, agile-style user stories and user profiles, and issue tracking. Classical UML diagramming and software estimation will also be presented.

Why do I need to program?
It is possible to manage programmers and understand software engineering best practices without being an elite programmer, but it is very difficult to do so without some hands-on experience. This hands-on experience will be provided by using a Scrum process to build a basic programming project in Python.

Python is one of the easiest programming languages to learn, and it runs on nearly all platforms. It has become the most popular language nationwide for introductory programming courses, and it is frequently taught in middle and high schools. Moreover, there is extensive online material for learning and application. The language is multi-paradigm, supporting imperative, procedural, and object-oriented programming; it is even possible to program functionally in Python, although not as naturally. Python also has well-established, popular libraries and tools for unit, integration, and system testing, documentation, profiling, and linting, as well as an agreed-upon best-practice coding style (PEP 8).
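To illustrate the multi-paradigm point, the sketch below computes the same quantity (the sum of squares of the even numbers in a list, an arbitrary example) in procedural, object-oriented, and functional styles. The functional version needs `functools.reduce` and lambdas, which is part of why functional Python feels less natural; the generator expression at the end is the usual Python idiom.

```python
from functools import reduce


# Imperative/procedural style: explicit loop and a mutable accumulator.
def sum_even_squares_procedural(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


# Object-oriented style: the state lives in an object with methods.
class EvenSquareAccumulator:
    def __init__(self):
        self.total = 0

    def add(self, n):
        if n % 2 == 0:
            self.total += n * n
        return self  # returning self allows chained calls


# Functional style: no mutation, composed from filter/map/reduce.
def sum_even_squares_functional(numbers):
    evens = filter(lambda n: n % 2 == 0, numbers)
    return reduce(lambda acc, sq: acc + sq, map(lambda n: n * n, evens), 0)


# Idiomatic Python: a generator expression passed to sum().
def sum_even_squares_idiomatic(numbers):
    return sum(n * n for n in numbers if n % 2 == 0)
```

All four return 56 for `[1, 2, 3, 4, 5, 6]`, and `EvenSquareAccumulator().add(2).add(3).add(4).total` is 20.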

Students have produced projects during independent studies, as part of one of my courses, or as lab projects. Examples include:

I have also informally provided technical mentoring to Zahn Prize winners: Jeremy Neiman (though not on his winning 2013 Zahn project); Amali Nassereddine and Teona Lazashvili, who won the Zahn Prize in 2014; and Shawn Augustine, whose buildonthego won the 2015 Zahn Prize.

Mentoring Guidelines

Frequently students want to work with me on data science or web development projects. Before you ask for an appointment, consider the following:

  1. Your time: You will need to commit a significant amount of time if you want to get anything but frustration out of this experience. I can promise, though, that you will learn a lot if you commit to it. If you can't devote significant time to working on a project in the lab, you won't be able to make much progress. Anticipate spending between 5 and 20 hours every week on lab work, with at least 4 hours per week in the lab on a regular schedule. If you can only come in once a week for an hour between classes, you are not going to learn anything useful; don't bother. You will need to be around the lab to talk to other students, especially senior students who are familiar with the work.

    Students looking for a project often try to "sample" working in multiple research labs at the same time. They may also try to work largely from home and come in occasionally, and to do this while balancing a full load of classes. Overloading like this has so far not been successful for any of the students I have seen. The pattern is wild enthusiasm for a month, followed by gradually increasing stress, followed by vanishing as classes become more demanding.

  2. My time: I am usually in my lab, NAC 7/311, Monday through Thursday; early in the morning is often the best time to catch me. Make an appointment by emailing grossberg@cs.ccny.cuny.edu

    My inbox is often full and sometimes it takes me a while to get back to you, so you may have to follow up a few times. If you expect to be able to drift in late in the afternoon, I will often be busy or gone. Plan accordingly. Again, if you decide you want to take on a project, please carefully evaluate your ongoing time commitments. Starting and then leaving wastes everybody's time and is disheartening for you. Be decisive and stick to it.

  3. Experience required: I almost never have well-defined tasks that you can jump into with little or no experience. I often provide toy versions of the problem, but these are essentially "homework exercises" to familiarize you with the data and tools specific to a given project. If they take more than 1-2 weeks to complete, there is no point in completing them, since they are intended solely to get you up to speed quickly. While they are instructive for learning, they have often been done numerous times in the past and are therefore not projects in and of themselves.

  4. Progress and review: Project goals will often be vague. A fundamental difference between research projects and homework exercises is that not only is the answer not in the back of the book, but we often start out asking the wrong question. One only finds this out by getting some kind of answer (quickly) and seeing whether it points in the right direction. This is why short feedback loops are critical.

    You need to be prepared to:

    • do some research
    • build something
    • show me
    • get feedback

    That loop should ideally happen on a weekly basis. If necessary for your project, you will also be working with a senior student in the lab. You are responsible for keeping me updated on your progress; this can be accomplished through emails, meetings, repository check-ins, and other methods as appropriate for the project.

Project Guidelines

  1. Prior work: For any project you work on, you will want to thoroughly research previous work on the topic. If you find that it has been done before, you must have a good reason for redoing it. Often, a tiny technical reason (like implementing the project in a new language) is not a good one. You need to improve on the existing work somehow: by adding important features, making it open source, reducing the resources it uses, improving the user interface, etc.

    In searching for prior work, don't limit yourself to the keywords you first think of. Try synonyms and more general topics. Read articles, even on Wikipedia, to get some idea of the area around your project topic; this will also yield more keywords to search for.

    Don't rely only on Google; use other search engines, such as Bing, as well. Also explore vlrc and iseek. For academic papers and research:

  2. Big Tools: Don't get lost in big tools. Some students get very involved in learning a system, a programming language, or a library and never see their way out. Be skeptical about diving completely into something just because it is new and you think it will magically solve all your problems.

    Very often it solves some problems, but the ones it solves may not be core to what you want to get done. Sometimes an old and dusty library in Fortran actually does almost exactly what you need. Figure out why you need this thing, for what, and how much of it you really need. Only learn as much as you need to.

    For example, if you are doing a data science project, you don't need to learn everything about Python software development. It may not even be essential that you master classes, and you likely won't need to know Python properties. In this situation, a general programming book for the language is a poor place to start. Look at the examples related to your task list and flip back to general references to figure out what you don't understand.

  3. Avoid the "Not Invented Here" Syndrome: NIH is often a problem with stronger developers who "just want to code." They don't want to spend hours learning some third-party library. They also feel they will learn more if they build something from the ground up. At least in this context (projects here), you don't want to write a line of code that you don't have to; in other words, be lazy in the right way.

    For example, if somebody has built a good machine learning library, that's a whole lot of code you won't have to debug. An established library has also often dealt with dead ends and mistakes you won't have to go down. If it doesn't do what you need, or has bugs or cases the authors didn't think of, you can always extend it and learn that way. If it is open source, you can also contribute back to the project, thereby improving a tool used by many people.

Technical Resources

We mostly work with some flavor of Unix (Mac OS X included) in my lab. Unix skills are not optional.

Linux

We mostly use Ubuntu in our own lab, but some servers run CentOS. The CS department uses CentOS. I recommend Ubuntu, as it is very popular, well supported, and generally well documented. Learn Command Line the Hard Way is a good tutorial on the typical way of working with the Linux terminal (the shell).

Windows

Depending on the project, you may be able to work solely on Windows. If you do not need a full Unix environment, Cygwin and Git Bash provide an excellent Unix-like command-line environment, and Anaconda Python is the easiest way to get a working scientific Python stack on Windows.

Sometimes, though, you will need to run a Linux operating system on Windows. You can use VirtualBox to virtualize your hardware and run Ubuntu as your guest OS. You can also find pre-built VirtualBox images. Alternatively, you will get better performance if you dual-boot, but that may involve tweaking your hardware.

OS X

OS X is a Unix (BSD-derived) operating system; therefore all the typical Unix terminal commands are built in. You should access them using the Terminal app or, better yet, install iTerm2. Also install the Homebrew package manager and avoid Fink or MacPorts, as they have become problematic over the years.

Software Carpentry

The Software Carpentry website has excellent, highly condensed materials for learning scientific programming. If you can, go through them quickly and do the hands-on exercises before you come in for a project. Make sure you go through the sections on:

  • The Unix Shell
  • Version Control with Git
  • Programming with Python

Data Science

What you need to know will depend on the project. It will probably help if you install Anaconda Python and become familiar with using Jupyter notebooks.

A good place to start with scientific Python is the Scipy Lecture Notes. You are going to want the basics of NumPy, SciPy, and matplotlib, so go through:

  • Chapter 1: all sections
  • Chapter 2: 2.6 "Image manipulation and processing using Numpy and Scipy"
  • Chapter 3 (most of our projects involve some of these):
      • 3.1 pandas: Statistics in Python
      • 3.3 scikit-image: image processing
      • 3.6 scikit-learn: machine learning in Python

Remote Sensing and Climate:

Most of our work that deals with remote sensing requires putting things on a map, so you should also know how to use the Basemap plotting library. Go through:

  • Visualizing Earthquakes
  • Plotting Satellite images
  • Plotting netcdf data

If you want to dive deeper into Basemap, there is the Basemap tutorial. For some great free courses on climate and remote sensing, check out MetEd.

Medical Imaging

For medical imaging work you will need to be able to read DICOM files, so please go through the pydicom tutorial. If you can get it installed, MedPy is a new and promising library.

Web Development

The best introduction would be my class CSc47300 Web Site Design (Web Development).

Most of the projects use a Python-based backend, usually either Django or Flask. For the front end, we have mostly been using jQuery and D3.js for 2D visualization, and Three.js for 3D. You should also be comfortable with HTML5/CSS3 and the Twitter Bootstrap or Foundation framework. At least a passing familiarity with React.js would also be beneficial.