F# is composed of expressions:
// expressions separated by `;`s
1; 2; 3
// expressions separated by new lines
4
5
6
6
If you wrap expressions with `[||]`, you get an array:
// This is an array.
[| 1; 2; 3 |]
// This is also an array.
[|
1
2
3
|]
[ 1, 2, 3 ]
Every expression in F# has a type:
1 // int
2. // float
2.0 // float
"abc" // string
abc
All elements of an array must be of the same type, so `[| 1; "a"; true |]` is not valid.
In F#, the `,` separates tuple elements, not collection elements.
// two-ple
1, 2
(1, 2)
Item1 | 1 |
Item2 | 2 |
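A quick sketch of why this distinction matters (the names `threeInts` and `oneTuple` are only for illustration): swapping `;` for `,` inside array brackets still compiles, but it produces an array that holds a single tuple.

```fsharp
// Semicolons separate elements: an array of three ints.
let threeInts = [| 1; 2; 3 |]

// Commas build a tuple: an array holding one (int * int * int) tuple.
let oneTuple = [| 1, 2, 3 |]

printfn "%d" threeInts.Length // 3
printfn "%d" oneTuple.Length  // 1
```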
Tuples are useful for all kinds of things in F#, and the language comes with a terse syntax for representing them:
// 4-ary tuple
3, 4, 5, 6
// Unlike collections (lists and arrays), tuples can hold elements of different types.
"Erica", 34, false
// Sometimes parentheses are required
(6, 7)
(6, 7)
Item1 | 6 |
Item2 | 7 |
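Getting values back out of a tuple is just as terse. A minimal sketch, with illustrative names (`person`, `point`): a pattern on the left of `=` names each element, and the built-in `fst` and `snd` functions work on pairs only.

```fsharp
let person = "Erica", 34, false

// A tuple pattern on the left of `=` names each element.
let name, age, isAdmin = person

// `fst` and `snd` only accept two-element tuples.
let point = 6, 7
let x = fst point
let y = snd point

printfn "%s is %d (admin: %b); point = %d, %d" name age isAdmin x y
// Erica is 34 (admin: false); point = 6, 7
```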
Let's use tuples and arrays together to plot some points. We can install Plotly.NET from NuGet and use it all in one go:
// Use this syntax to install packages from **NuGet**.
// (Only necessary in interactive mode. Otherwise, packages can be installed from the command line with `dotnet add package <PackageName>`.)
#r "nuget: Plotly.NET"
#r "nuget: Plotly.NET.Interactive"
// Use this syntax to open a module or a namespace.
open Plotly.NET
open Plotly.NET.LayoutObjects
// set some default styling (ignore for now)
let margin = 30.
Defaults.DefaultHeight <- 400
Defaults.DefaultWidth <- 0
Defaults.DefaultTemplate <- Template.init(Layout.init(AutoSize = true, Margin = Margin.init(Top = margin, Left = margin, Right = margin, Bottom = margin)))
- Plotly.NET, 4.2.0
- Plotly.NET.Interactive, 4.2.1
Chart.Point([|
1, 2
2, 4
3, 3
|])
You can also create arrays using the range operator `..`,
// start..end (both inclusive)
[| 1 .. 10 |]
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
// ..step..
[| 5 .. -1 .. -5 |]
[ 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5 ]
or by using sequence expressions:
[| for i in 1..10 -> i * i |]
[ 1, 4, 9, 16, 25, 36, 49, 64, 81, 100 ]
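The `->` form is shorthand for `do yield`. As a rough sketch (the name `evenSquares` is made up for this example), the longer form lets a sequence expression skip elements with a condition:

```fsharp
// Keep only the even numbers, then square them.
let evenSquares =
    [| for i in 1 .. 10 do
         if i % 2 = 0 then
             yield i * i |]

printfn "%A" evenSquares // [|4; 16; 36; 64; 100|]
```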
Let's use them together to plot the array of integer squares up to 10:
open Plotly.NET
Chart.Line([| for x in 1 .. 10 -> x, x * x |])
In math, elements belong to one or more sets, and functions map elements between those sets.
Here is a function that maps an element $x$ to itself plus 2: $$f(x) = x + 2$$
We didn't write out the sets that $f$ maps elements between. We can just infer that information based on how $x$ is used in $f$. That information is still there though. If we wanted to specifically say $f$ maps a complex number to a complex number, we could write a domain constraint: $$f : \mathbb{C} \rightarrow \mathbb{C}$$
We could write a function resembling $f$ in F# like so:
fun (x: int) -> (x + 2): int
Here, we take an `x` (which we state is an `int`) and add `2` to it, which evaluates to `x + 2` (which we also state is an `int`).
`int` is a type, and types do not overlap (unlike in math, where sets can enclose each other). However, it works a lot like a set here in that it helps us define what our function maps to and from. Our function looks kind of like this:
Similar to our first math notation, we can omit the types and infer them just from how we use `x`:
fun x -> x + 2
What is key though, is that even though we didn't write the types, much like our domain constraint, they're still there. The compiler is doing the hard work of inferring the types for us and making sure we're writing code that makes sense.
We can illustrate this by writing another function that evaluates to our function above and manually specifying a result type, which should be `int -> int` (since our inner function takes an `int` and evaluates to an `int`):
▶️ For notebook users
To test the above statement, change `int` to something else (like `string`) below and watch the code break. Take note of which expressions the red squiggles appear under, depending on whether you change the first or the second `int`.
fun () -> (fun x -> x + 2): int -> int
Functions aren't very useful unless we can evaluate them, which we do by supplying arguments for their parameters, like so:
(fun x -> x + 2) 3
5
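To see inference following usage, here is a small sketch (the names `addTwoInt`, `addTwoFloat`, and `addTwoAnnotated` are only for illustration). The literal in the body decides the inferred type, and an annotation on the whole function value has to agree with the body.

```fsharp
// The literal `2` is an int, so this is inferred as int -> int.
let addTwoInt = fun x -> x + 2

// The literal `2.0` is a float, so this is inferred as float -> float.
let addTwoFloat = fun x -> x + 2.0

// Annotating the function value itself; the compiler checks the body against it.
let addTwoAnnotated : int -> int = fun x -> x + 2

printfn "%d %.1f %d" (addTwoInt 3) (addTwoFloat 3.0) (addTwoAnnotated 3) // 5 5.0 5
```

Changing the annotation on `addTwoAnnotated` to `float -> float` without also changing the literal should produce a compile error, much like the notebook exercise above.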
ℹ️ Note for C# developers
`int -> int` is an F# type, but what does this look like when compiled to an assembly? F# types are a superset of .NET types. All .NET types can be represented in F#, but F# function types compile to `FSharpFunc`. Reading on to Partial Application and taking a look at some F# decompiled to C# explains why.
`let` Bindings
Functions allow us to create scopes (the `()`s) wherein we can assign names to values:
(fun x -> // name in (
x + 2 // scope
) 3 // ) = value
5
We can rewrite this use of a function with a `let` binding:
let x = 3 in // name = value in (
    x + 2 // scope
// )
5
These two pieces of code effectively represent the same thing in F#.
The `in` and indentation are used to explicitly define the scope where `x` is defined. If you want a binding to be defined for as long as possible (up until the parent scope ends), you can leave out the `in` and indentation:
let y = 2 in
    let x = 3
    x + y // `x` is defined here
// `x` is not defined here
5
Assigning a name to an expression in F# is called a binding, because the value can't change once set. Using `=` without a `let` compares two values for equality.
let a = 3
a = 4
False
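As a hedged sketch of the difference (the names `isFour` and `counter` are invented for this example): comparison with `=` produces a `bool` and leaves the binding untouched, while actual mutation has to be opted into with `mutable` and the `<-` operator.

```fsharp
let a = 3

// `a = 4` is a comparison, so this binds a bool; `a` itself is unchanged.
let isFour = (a = 4)

// Mutation is opt-in: declare with `mutable`, update with `<-`.
let mutable counter = 0
counter <- counter + 1

printfn "%b %d %d" isFour a counter // false 3 1
```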
Partial Application
We can move the body up into the same line with `in`:
let x = 3 in x + 2
5
And assign the whole `let` expression to another `let` binding:
let five = let x = 3 in x + 2
five
5
Rewriting our inner `let` binding back to a `fun` looks like this:
let five = (fun x -> x + 2) 3
five
5
We can remove the `3` to delay binding the parameter to our function:
let add2 = fun x -> x + 2
add2 3
5
We can rewrite the above function by moving `x` to the left of the `=`. This results in the same behavior.
let add2 x = x + 2
add2 3
5
We can replace `x + 2` with another `fun`, one that introduces a parameter `y` and uses it in tandem with `x` (this is called a closure):
let add x = fun y -> x + y
add 3 2
5
We can also move `y` to the left of `=`:
let add x y = x + y
add 3 2
5
Practically, we've come across a function `add` that can take not just one parameter, but two! It may not surprise you that we can keep repeating this process to allow for many parameters. However, under the hood, we can treat functions that take multiple parameters like `add` as if they were nested `fun`s, which means we can bind them to names without supplying an argument for every one of their parameters:
let add2 = add 2
add2 3
5
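One place this pays off is when passing functions to other functions. A minimal sketch, assuming the `add` defined above and using `Array.map` from the core library (the names `shifted` and `plusTwo` are illustrative):

```fsharp
let add x y = x + y
let add2 = add 2

// `add 10` is itself a function (int -> int), so it can be handed to Array.map directly.
let shifted = [| 1; 2; 3 |] |> Array.map (add 10)
let plusTwo = [| 1; 2; 3 |] |> Array.map add2

printfn "%A %A" shifted plusTwo // [|11; 12; 13|] [|3; 4; 5|]
```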
This feature is called partial application. It can help you express complexity using simple, modular pieces:
Here we build `add` and `divide` from `combine` by passing `+` and `/` to it. This code doesn't do anything useful yet, though...
// y applied to x applied to f
let combine f x y = f x y
let add = combine (+) // a way to pass the + function
let divide = combine (/)
add 3 4, divide 9 4
(7, 2)
Item1 | 7 |
Item2 | 2 |
We can pass a `check` function to `combine` that can perform a check and decide whether we want to continue the computation or not.
let combine f check x y =
    check f x y
We can build all different kinds of "adders" from `combine`:
let normalize f x y = f (abs x) (abs y)
let normalizeThenAdd = combine (+) normalize
normalizeThenAdd -4 5 |> printfn "%d"
9
let printThenAdd = combine (+) (fun f x y -> printfn "Adding %d + %d..." x y; f x y)
printThenAdd 5 6 |> printfn "%d"
Adding 5 + 6... 11
let add = combine (+) id // id is a special function that means "do nothing" in this context
add 3 5 |> printfn "%d"
8
We can build "safe dividers" that check whether the denominator is 0 and change behavior in response.
`safeDivide` replaces `y` with `NaN` when `y = 0`:
let ``convert divBy0 to NaN`` f x y =
    // `nan` is a float, so `y` has to be compared against a float zero
    f x (if y = 0.0 then nan else y)
let safeDivide = combine (/) ``convert divBy0 to NaN``
safeDivide 4.0 0.0 |> printfn "%f"
NaN
`safeDivide` implicitly evaluates to a `float`, though, because `NaN` is not a valid value for `int`s. Sometimes you absolutely do want integer division, which evaluates to an `int` and ignores the remainder.
`tryDivide` checks if `y = 0`, and if it is, it avoids doing the division altogether (by not evaluating `cont`).
ℹ️ Note
I should move this example down further to when I explain Option types, perhaps referencing this example. It is not 100% clear what is going on here without explaining option types.
let ``convert divBy0 to None`` cont x y =
    if y = 0 then None else Some(cont x y)
// remove this example for now and reference it when teaching the Option type
let tryDivide = combine (/) ``convert divBy0 to None``
tryDivide 4 1 |> printfn "%O"
tryDivide 4 0 |> printfn "%O"
Some(4) <null>
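Until Option types are covered properly, here is a rough, standalone sketch of how a caller might consume such a result (this `tryDivide` is a simplified stand-in that skips `combine`, and `describe` is a made-up helper). Pattern matching forces both cases to be handled. The `<null>` printed above is most likely just how `%O` renders `None`, which .NET represents as `null` at runtime.

```fsharp
// Simplified stand-in for the `tryDivide` built from `combine` above.
let tryDivide x y =
    if y = 0 then None else Some (x / y)

// `match` makes us deal with both the Some and the None case.
let describe result =
    match result with
    | Some quotient -> sprintf "quotient = %d" quotient
    | None -> "cannot divide by zero"

printfn "%s" (describe (tryDivide 4 1)) // quotient = 4
printfn "%s" (describe (tryDivide 4 0)) // cannot divide by zero
```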
Let's start with a simple exercise comparing with Python:
#!connect jupyter --kernel-name pythonkernel --conda-env base --kernel-spec python3
The `#!connect jupyter` feature is in preview. Please report any feedback or issues at https://github.com/dotnet/interactive/issues/new/choose.
Kernel added: #!pythonkernel
We create a function that prints the combined age of two people:
def add_person_age(person1, person2):
    print(f"{person1.name} and {person2.name}'s combined age is {person1.age + person2.age}")
We construct two objects that each contain the attributes our function uses:
# using the type function
person1 = type("", (), {})
person1.name, person1.age = "Rebecca", 23
# using a class
class Person:
    # optionally write an __init__ function to instead pass attribute values to a constructor
    pass
person2 = Person()
person2.name, person2.age = "Eric", 27
And then call the function with our objects:
add_person_age(person1, person2)
Rebecca and Eric's combined age is 50
person3 = type("", (), {})
person3.name = "Eric"
# oops we forgot to assign Age...
# ❌ this code compiles but fails in the function call ⬇️ when trying to add an Age of None
add_person_age(person1, person3)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 5
      2 person3.name = "Eric"
      3 # oops we forgot to assign Age...
      4 # ❌ this code compiles but fails in the function call ⬇️ when trying to add an Age of None
----> 5 add_person_age(person1, person3)

Cell In[1], line 2, in add_person_age(person1, person2)
      1 def add_person_age(person1, person2):
----> 2     print(f"{person1.name} and {person2.name}'s combined age is {person1.age + person2.age}")

AttributeError: type object '' has no attribute 'age'
We can tell just by reading the code that invoking `add_person_age(person1, person3)` will always fail. However, the error isn't raised until we actually run our code.
Let's write one possible F# alternative:
let inline addPersonAge person1 person2 =
    printfn "%s and %s's combined age is %d"
        (^T: (member Name : string) person1)
        (^U: (member Name : string) person2)
        ((^T: (member Age : int) person1) + (^U: (member Age : int) person2))
`^T: (member Name : string)` is called a type constraint. When applied to `person1`, it's the same as accessing `person1.Name` but also tells the F# compiler that `person1` has to have a `Name` of type `string`.
We can evaluate `addPersonAge` with anonymous records:
addPersonAge
    {| Name = "Rebecca"; Age = 23 |}
    {| Name = "Eric"; Age = 27 |}
Rebecca and Eric's combined age is 50
// ❌ this code does not compile
addPersonAge
    {| Name = "Rebecca"; Age = 23 |}
    {| Name = "Eric" |} // oops... we forgot to add `Age`
input.fsx (4,5)-(4,24) typecheck error The type '{| Name: string |}' does not support the operator 'get_Age'. See also input.fsx(2,0)-(2,12).
This feature is called Statically Resolved Type Parameters. It's pretty useful for when you want to make assertions about the data your function accepts, but you'd like to write your function in a way that accepts all kinds of data.
For example, Jonny's record has an additional property below, but we can still use it with `addPersonAge` because it satisfies the necessary type constraints (it requires members `Name` and `Age`):
addPersonAge
    {| Name = "Rebecca"; Age = 23 |}
    {| Name = "Jonny"; Age = 34; IsAdmin = true |}
Rebecca and Jonny's combined age is 57
We can make the body of `addPersonAge` more concise by moving the type constraints to the function signature:
let inline addPersonAge<'T, 'U
    when 'T : (member Name : string)
    and 'T : (member Age : int)
    and 'U : (member Name : string)
    and 'U : (member Age : int)>(person1: 'T) (person2: 'U) =
    printfn "%s and %s's combined age is %d"
        person1.Name
        person2.Name
        (person1.Age + person2.Age)
However... in turn, it makes the function signature a little cluttered...
When we have control of our source data, it's often better to use explicit types:
type Person = {
    Name : string
    Age : int
}
The following code creates a `person1` of type `Person`, which denotes that `person1` has the exact shape of `Person`: no more members, no fewer.
let person1 : Person = { Name = "Rebecca"; Age = 23 }
The `: Person` is called a type annotation. It's often not needed, which we can demonstrate with our new `addPersonAge`:
let addPersonAge person1 person2 =
    printfn "%s and %s's combined age is %d"
        person1.Name
        person2.Name
        (person1.Age + person2.Age)
addPersonAge { Name = "Rebecca"; Age = 23 } { Name = "Rebecca"; Age = 23 }
Rebecca and Rebecca's combined age is 46
Just because we didn't have to write the type doesn't mean it's not there. The compiler looks at how we used the parameter and infers the type. This is - perhaps unsurprisingly - called type inference.
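Inference does have limits, which is where an annotation earns its keep. The sketch below assumes a second, hypothetical record type `Pet` that shares the `Name` field with `Person`. When field names overlap, the compiler resolves an unannotated `p.Name` to the most recently declared record type, so the function meant for `Person` needs an explicit annotation.

```fsharp
type Person = { Name : string; Age : int }
type Pet = { Name : string; Species : string } // hypothetical second type sharing `Name`

// Without `: Person`, `p.Name` would resolve to Pet (declared last) and `p.Age` would not compile.
let personLabel (p : Person) = sprintf "%s (%d)" p.Name p.Age

// No annotation needed here: Pet is the latest record type with a `Name` field.
let petLabel p = sprintf "%s the %s" p.Name p.Species

let rebecca : Person = { Name = "Rebecca"; Age = 23 }
let rex = { Name = "Rex"; Species = "dog" }

printfn "%s; %s" (personLabel rebecca) (petLabel rex) // Rebecca (23); Rex the dog
```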
Type Providers
We did still have to define the structure of `Person` up front though. This is called domain modeling, and it's useful when you are writing an application to model a business function and want to reduce the number of possible error states to a minimum.
Sometimes, however, we're not writing an application but instead writing a script that deals with large amounts of data, and our stupid mistakes might come from accidentally misinterpreting the structure of our data.
F# has a feature for this called type providers. Whenever we're dealing with external data imports, type providers create a type for us based on the structure of the data we're importing, meaning we don't have to manually create types for huge data sets and our data and types never get out of sync.
Here's a quick example using the FSharp.Data WorldBank type provider that I took from their documentation:
#r "nuget: FSharp.Data"
open FSharp.Data
let data = WorldBankData.GetDataContext()
data.Countries.``United Kingdom``.Indicators.``Gross capital formation (% of GDP)``
|> Seq.maxBy fst
- FSharp.Data, 6.4.0
(2022, 18.6020102387308)
Item1 | 2022 |
Item2 | 18.6020102387308 |
The WorldBank provider is a premade example where a type is specifically created and republished using a well-known data source. But we can also use "data type" (CSV, JSON, SQL, etc.) type providers to create types for our own data:
// define our data source
[<Literal>]
let uri = "http://query1.finance.yahoo.com/v7/finance/download/MSFT?period1=1678116713&period2=1709739113&interval=1d&events=history&includeAdjustedClose=true"
// create a type using it
// (normally, we'd pass a smaller and local file with the same structure here, but passing the same `uri` is fine for example / notebooks)
type Stocks = CsvProvider<uri>
// get a sample of the data from the new type using the default data source
let msft = Stocks.GetSample()
// plot the high vs low daily difference over time
Chart.Line([ for row in msft.Rows -> row.Date, row.High - row.Low ])
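The same idea applies to other formats. As a small sketch modeled on the FSharp.Data documentation (the type name `Simple` and the sample values are just illustrative), `JsonProvider` can infer a type from an inline sample, which makes it easy to see that the sample is only used to derive the shape, not the data:

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// The sample JSON is only used at compile time to infer the shape of the type.
type Simple = JsonProvider<""" { "name": "John", "age": 94 } """>

// At runtime we parse different data that has the same structure.
let person = Simple.Parse(""" { "name": "Tomas", "age": 4 } """)

printfn "%s is %d" person.Name person.Age // Tomas is 4
```

Misspelling a property (say `person.Nmae`) would be flagged by the compiler before the script ever runs, which is the same safety net the `CsvProvider` example relies on.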
Working with type providers can sometimes be less convenient than dynamically typed data access libraries like `pandas`'s `DataFrame` when your data isn't well structured.
For example, if you had data with a similar structure as above (`Date`, `Open`, `Adj Close`, etc., columns) but for multiple companies all in one CSV file, the columns might be addressed with the name of the company first, then the column name, such as `MSFT_Date`.
You could access this data in an unsafe way using programmatic access, such as with f-strings in Python (`frame[f"{ticker}_{column_name}"]`), but F# type providers would have no knowledge of the implicit structure via column naming because CSV does not allow for multi-level indexing, unlike other data formats like JSON. In fact, iterating over all or a subset of columns from a `CsvProvider` type requires a hack, and you'd be better off importing into a semi-strongly-typed `DataFrame` using a package called Deedle than using type providers.
However, when you need to access a few named columns of homogeneous types, it's actually not too difficult to work with the data provided from type providers as pure collections and not use a data frame at all:
open System

let resample (interval : TimeSpan) (observations : (DateTime * decimal array) seq) =
    // round each date down to the start of its interval and group by that key
    let groups = observations |> Seq.groupBy (fun (date, _) -> DateTime((date.Ticks / interval.Ticks) * interval.Ticks))
    let flattenByAverage =
        Seq.reduce (fun acc next ->
            (acc |> Array.zip next)
            |> Array.map (fun (a, b) -> (a + b) * decimal 0.5)
        )
    let flattenByTakeFirst = Seq.head
    groups
    |> Seq.map (fun (key, group) ->
        key,
        let rows = group |> Seq.map snd
        flattenByAverage rows
    )
let print observations =
    observations
    |> Seq.map (fun (date, (cols : 'a array)) -> {| Date = date; Low = cols[0]; High = cols[1] |})
    |> Array.ofSeq
    |> _.DisplayTable()
msft.Rows
|> Seq.map (fun row -> row.Date, [| row.Low; row.High |])
|> resample (TimeSpan.FromDays 7)
|> print
Date | High | Low |
2023-03-06 00:00:00Z | 255.4656199375 | 249.8818779375 |
2023-03-13 00:00:00Z | 276.5512450625 | 267.7306269375 |
2023-03-20 00:00:00Z | 280.2400038125 | 274.1731263125 |
2023-03-27 00:00:00Z | 285.7424945000 | 280.8943768125 |
2023-04-03 00:00:00Z | 290.167492125 | 282.947505875 |
2023-04-10 00:00:00Z | 288.5650063125 | 283.2793785000 |
2023-04-17 00:00:00Z | 287.9837437500 | 284.1906227500 |
2023-04-24 00:00:00Z | 303.6206265000 | 296.6893751250 |
2023-05-01 00:00:00Z | 310.1125010625 | 304.0624923750 |
2023-05-08 00:00:00Z | 310.9949970625 | 306.5987567500 |
2023-05-15 00:00:00Z | 317.4143754375 | 314.0462454375 |
2023-05-22 00:00:00Z | 328.3193703125 | 320.5950012500 |
2023-05-29 00:00:00Z | 336.092498750 | 329.686241250 |
2023-06-05 00:00:00Z | 330.5868720625 | 325.0318795625 |
2023-06-12 00:00:00Z | 347.3925016875 | 337.8481349375 |
2023-06-19 00:00:00Z | 338.986244000 | 333.551254000 |
2023-06-26 00:00:00Z | 339.6562518125 | 334.5931281250 |
2023-07-03 00:00:00Z | 341.961250500 | 336.287502250 |
2023-07-10 00:00:00Z | 346.1118680000 | 339.7806226250 |
2023-07-17 00:00:00Z | 354.5606174375 | 343.0856190625 |
(33 more) |
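One thing worth noting about `flattenByAverage` above: reducing pairwise with `(a + b) * 0.5` weights later rows more heavily than earlier ones, so it is not the arithmetic mean of the group. The sketch below contrasts the two on made-up numbers (`pairwiseAverage`, `columnMeans`, and `sample` are names invented for this example, and both functions assume a non-empty group of equally sized rows).

```fsharp
// Pairwise averaging, as in `flattenByAverage`: later rows count more.
let pairwiseAverage (rows : decimal array seq) =
    rows
    |> Seq.reduce (fun acc next ->
        Array.zip acc next
        |> Array.map (fun (a, b) -> (a + b) * 0.5m))

// Arithmetic mean per column: every row counts equally.
let columnMeans (rows : decimal array seq) =
    let rows = Array.ofSeq rows
    let width = (Array.head rows).Length
    Array.init width (fun col ->
        rows |> Array.averageBy (fun row -> row[col]))

let sample = [ [| 1m; 10m |]; [| 2m; 20m |]; [| 6m; 60m |] ]

printfn "%A" (pairwiseAverage sample) // 3.75 and 37.5
printfn "%A" (columnMeans sample)     // 3 and 30
```

Whether that difference matters depends on the analysis. For a weekly chart it may be negligible, but swapping in something like `columnMeans` would make the "average" label exact.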
If you do need access to a data frame, it's pretty trivial to import data into a `DataFrame` using Deedle and perform operations on it. You won't get the helpful IntelliSense hints informing you of valid navigations, and you'll get most type errors at runtime instead of compile time, but the errors should be clearer than Python's in most cases.
#r "nuget: Deedle"
- Deedle, 3.0.0
open System.Net
open System.Net.Http
open Deedle
[<Literal>]
let url = "http://query1.finance.yahoo.com/v7/finance/download/MSFT?period1=1678116713&period2=1709739113&interval=1d&events=history&includeAdjustedClose=true"
let frame =
    Frame.ReadCsv((new HttpClient()).GetStreamAsync(url).Result)
    |> Frame.indexRowsDate "Date"
frame.Print()
                         Open       High       Low        Close      Adj Close  Volume
3/6/2023 12:00:00 AM  -> 256.429993 260.119995 255.979996 256.869995 254.778961 24109800
3/7/2023 12:00:00 AM  -> 256.299988 257.690002 253.389999 254.149994 252.081085 21473200
3/8/2023 12:00:00 AM  -> 254.039993 254.539993 250.809998 253.699997 251.634766 17340200
3/9/2023 12:00:00 AM  -> 255.820007 259.559998 251.580002 252.320007 250.266006 26653400
3/10/2023 12:00:00 AM -> 251.080002 252.789993 247.600006 248.589996 246.566345 28333900
3/13/2023 12:00:00 AM -> 247.399994 257.910004 245.729996 253.919998 251.852966 33339700
3/14/2023 12:00:00 AM -> 256.750000 261.070007 255.860001 260.790009 258.667053 33620300
3/15/2023 12:00:00 AM -> 259.980011 266.480011 259.209991 265.440002 263.279175 46028000
3/16/2023 12:00:00 AM -> 265.209991 276.559998 263.279999 276.200012 273.951630 54768800
3/17/2023 12:00:00 AM -> 278.260010 283.329987 276.320007 279.429993 277.155304 69527400
3/20/2023 12:00:00 AM -> 276.980011 277.480011 269.850006 272.230011 270.013916 43466600
3/21/2023 12:00:00 AM -> 274.880005 275.000000 269.519989 273.779999 271.551300 34558700
3/22/2023 12:00:00 AM -> 273.399994 281.040009 272.179993 272.290009 270.073456 34873300
3/23/2023 12:00:00 AM -> 277.940002 281.059998 275.200012 277.660004 275.399750 36610900
3/24/2023 12:00:00 AM -> 277.239990 280.630005 275.279999 280.570007 278.286041 28172000
:                        ...        ...        ...        ...        ...        ...
2/14/2024 12:00:00 AM -> 408.070007 409.839996 404.570007 409.489990 409.489990 20401200
2/15/2024 12:00:00 AM -> 408.140015 409.130005 404.290009 406.559998 406.559998 21825500
2/16/2024 12:00:00 AM -> 407.959991 408.290009 403.440002 404.059998 404.059998 22281100
2/20/2024 12:00:00 AM -> 403.239990 404.489990 398.010010 402.790009 402.790009 24307900
2/21/2024 12:00:00 AM -> 400.170013 402.290009 397.220001 402.179993 402.179993 18631100
2/22/2024 12:00:00 AM -> 410.190002 412.829987 408.570007 411.649994 411.649994 27009900
2/23/2024 12:00:00 AM -> 415.670013 415.859985 408.970001 410.339996 410.339996 16295900
2/26/2024 12:00:00 AM -> 411.459991 412.160004 407.359985 407.540009 407.540009 16193500
2/27/2024 12:00:00 AM -> 407.989990 408.320007 403.850006 407.480011 407.480011 14835800
2/28/2024 12:00:00 AM -> 408.179993 409.299988 405.320007 407.720001 407.720001 13183100
2/29/2024 12:00:00 AM -> 408.640015 414.200012 405.920013 413.640015 413.640015 31947300
3/1/2024 12:00:00 AM  -> 411.269989 415.869995 410.880005 415.500000 415.500000 17800300
3/4/2024 12:00:00 AM  -> 413.440002 417.350006 412.320007 414.920013 414.920013 17596000
3/5/2024 12:00:00 AM  -> 413.959991 414.250000 400.640015 402.649994 402.649994 26919200
3/6/2024 12:00:00 AM  -> 402.970001 405.160004 398.390015 402.089996 402.089996 22344100
let df =
    frame?Low
    |> Series.sampleTime (TimeSpan.FromDays 7) Direction.Forward
    |> Series.mapValues (fun v -> Stats.mean v)
df.Print()
3/6/2023 12:00:00 AM -> 251.8720002
3/13/2023 12:00:00 AM -> 260.0799988
3/20/2023 12:00:00 AM -> 272.40599979999996
3/27/2023 12:00:00 AM -> 278.0919984
4/3/2023 12:00:00 AM -> 283.64250925
4/10/2023 12:00:00 AM -> 283.0340024
4/17/2023 12:00:00 AM -> 285.17000160000003
4/24/2023 12:00:00 AM -> 289.076001
5/1/2023 12:00:00 AM -> 304.1639953999999
5/8/2023 12:00:00 AM -> 306.58600459999997
5/15/2023 12:00:00 AM -> 311.6499938
5/22/2023 12:00:00 AM -> 317.95
5/29/2023 12:00:00 AM -> 328.77999124999997
6/5/2023 12:00:00 AM -> 327.41800539999997
6/12/2023 12:00:00 AM -> 333.5020082
... -> ...
11/27/2023 12:00:00 AM -> 375.7160034
12/4/2023 12:00:00 AM -> 366.2200012
12/11/2023 12:00:00 AM -> 367.547998
12/18/2023 12:00:00 AM -> 370.3599976
12/25/2023 12:00:00 AM -> 373.48750325000003
1/1/2024 12:00:00 AM -> 367.237503
1/8/2024 12:00:00 AM -> 376.31000359999996
1/15/2024 12:00:00 AM -> 389.012497
1/22/2024 12:00:00 AM -> 398.5859986
1/29/2024 12:00:00 AM -> 402.6699952
2/5/2024 12:00:00 AM -> 408.3839966
2/12/2024 12:00:00 AM -> 406.0880066
2/19/2024 12:00:00 AM -> 403.19250475
2/26/2024 12:00:00 AM -> 406.6660032
3/4/2024 12:00:00 AM -> 403.78334566666666